Dedispersion, the removal of the deleterious smearing of impulsive signals caused by interstellar matter, is one of the most computationally intensive processing steps in any radio survey for pulsars and fast transients. For real-life pulsar and FRB surveys such as LOTAAS on LOFAR and ALERT on Apertif, and for the SKA, dedispersion therefore accounts for a significant fraction of the required compute power.
In our recent paper in Astronomy and Computing, we present a study of the parallelization of this algorithm on many-core accelerators, including GPUs from AMD and NVIDIA (shown in the background of the figure above) and the Intel Xeon Phi. The team behind this paper is a mix of computer scientists and astronomers, and includes ASTRON's Alessio Sclocco and Joeri van Leeuwen. Many-core accelerators such as GPUs can deliver performance that is orders of magnitude higher than that of traditional CPUs, along with better performance per watt and performance per dollar. They are, however, more difficult to program and optimize, which is why they are the focus of our work.
We find that dedispersion is an inherently memory-bound algorithm: its performance is limited by the memory bandwidth of the platform it runs on, a resource scarcer than computational power. We also find that auto-tuning, a technique that automatically selects the best parameters for each execution, is needed to adapt the same code to different accelerators, observations, and even telescopes.
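To make these two findings concrete, here is a minimal Python/NumPy sketch. It is not the code from the paper. It implements plain brute-force dedispersion, in which every input sample that is loaded contributes just one addition, which is why memory traffic rather than arithmetic tends to dominate. It then times a few candidate values of a single tunable and keeps the fastest, which is auto-tuning in its simplest form. The function names, the `block` parameter, and the synthetic observation are all assumptions made for illustration. On a real accelerator the tunables would be things like thread-block size or work per thread.

```python
import time
import numpy as np

K_DM = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3

def delays_in_samples(dms, freqs_mhz, t_samp):
    """Delay of every channel relative to the highest frequency, in samples."""
    f_ref = freqs_mhz.max()
    delay_s = K_DM * dms[:, None] * (freqs_mhz[None, :] ** -2 - f_ref ** -2)
    return np.rint(delay_s / t_samp).astype(np.int64)

def dedisperse(data, shifts, n_out, block):
    """Brute-force dedispersion: shift each channel by its delay, then sum.

    data   : (n_chan, n_samp) array, with n_samp >= n_out + shifts.max()
    shifts : (n_dm, n_chan) integer delays in samples
    block  : output samples handled per step (hypothetical tunable)
    """
    n_dm, n_chan = shifts.shape
    out = np.zeros((n_dm, n_out), dtype=np.float32)
    for d in range(n_dm):
        for start in range(0, n_out, block):
            stop = min(start + block, n_out)
            acc = out[d, start:stop]  # view into out, updated in place
            for c in range(n_chan):
                s = shifts[d, c]
                # One addition per loaded input value: low arithmetic intensity.
                acc += data[c, start + s:stop + s]
    return out

def autotune(data, shifts, n_out, candidates):
    """Time every candidate block size and return the fastest one."""
    timings = {}
    for block in candidates:
        t0 = time.perf_counter()
        dedisperse(data, shifts, n_out, block)
        timings[block] = time.perf_counter() - t0
    return min(timings, key=timings.get), timings

# Synthetic observation (all values are illustrative assumptions).
freqs = np.linspace(1200.0, 1500.0, 64)   # channel frequencies in MHz
t_samp = 1e-3                             # sampling time in seconds
dms = np.array([0.0, 50.0, 100.0])        # trial DMs in pc cm^-3
shifts = delays_in_samples(dms, freqs, t_samp)
n_out = 256
data = np.zeros((freqs.size, n_out + int(shifts.max())), dtype=np.float32)

# Inject a pulse dispersed at DM = 50 that arrives at sample 100
# in the highest-frequency channel.
for c in range(freqs.size):
    data[c, 100 + shifts[1, c]] = 1.0

candidates = [16, 32, 64, 128, 256]
best, timings = autotune(data, shifts, n_out, candidates)
out = dedisperse(data, shifts, n_out, best)
print("fastest block size on this machine:", best)
print("brightest sample in the DM = 50 series:", int(out[1].argmax()))
```

Only the trial DM that matches the injected pulse sums all 64 channels into a single sample. The other trials leave the pulse smeared over many samples. Which block size wins depends on the machine and on the input dimensions, and that dependence is exactly what tuning has to resolve.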
In the figure we present two of the plots included in the paper. The plot on the left shows the performance of our dedispersion algorithm on different platforms. You can see that GPUs are faster than the Xeon Phi and traditional CPUs, and are able to compute thousands of DMs in real time. The plot on the right shows the shape of the optimization space for one particular instance. The take-home point is that optimal configurations can lie far from the average, and are thus difficult to predict without tuning.
To conclude, thanks to the contributions of our paper, dedispersion for ALERT will require only 50 GPUs.