Each major observing mode has its own set of post-processing pipelines and associated options. The first pipelines, which will be available by mid-2012, are summarised in Table 1 and described below.
Figure 1: Schematic form of the LOFAR Imaging Pipeline
The Imaging Pipeline is shown schematically in Figure 1. A short overview of the pipeline is given by Heald et al. (2010).
Following the data path from the left, visibility data are created in Measurement Sets at the IBM Blue Gene/P correlator in Groningen (Online Processing, or OLAP), and recorded to storage nodes in the current LOFAR offline processing cluster.
The first standard data processing steps are encapsulated within a sub-pipeline called the Preprocessing Pipeline. Its role is to flag the data in time and frequency, and optionally to compress the data in time, frequency, or both (the software that performs this step is labelled New Default Pre-Processing Pipeline, or NDPPP). This stage of the processing also includes a subtraction of the contributions of the brightest sources in the sky (Cygnus A, Cassiopeia A, etc.) from the visibilities through the 'demixing' algorithm.
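As a rough illustration of the compression step, the kind of weighted time/frequency averaging NDPPP performs can be sketched as below. This is a minimal numpy sketch under the assumption of a single-baseline (time, frequency) visibility grid, not the NDPPP implementation; flagged samples are given zero weight so that RFI does not leak into the averaged data.

```python
import numpy as np

def average_visibilities(vis, flags, tstep=2, fstep=4):
    """Average visibilities in time and frequency, skipping flagged samples.

    vis   : complex array of shape (ntime, nfreq)
    flags : boolean array of the same shape, True where data are bad
    tstep, fstep : compression factors in time and frequency

    Returns the averaged visibilities and, per output cell, the fraction
    of unflagged input samples that contributed to it.
    """
    ntime, nfreq = vis.shape
    nt, nf = ntime // tstep, nfreq // fstep
    # Zero out flagged samples and trim to a whole number of cells.
    v = np.where(flags, 0.0, vis)[: nt * tstep, : nf * fstep]
    w = (~flags)[: nt * tstep, : nf * fstep].astype(float)
    # Sum each (tstep x fstep) cell, then normalise by the unflagged count.
    v = v.reshape(nt, tstep, nf, fstep).sum(axis=(1, 3))
    w = w.reshape(nt, tstep, nf, fstep).sum(axis=(1, 3))
    out = np.where(w > 0, v / np.maximum(w, 1), 0.0)
    return out, w / (tstep * fstep)
```

An output cell whose inputs were entirely flagged comes out as zero with zero weight, which a downstream flagger can treat as missing.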
Next, an initial calibration is performed. Using the BlackBoard Selfcal (BBS) system that has been developed for LOFAR, a local sky model (LSM) is generated from a Global Sky Model (GSM) that is stored in a database. Calibration of the complex station gains is achieved using this LSM, or optionally from externally determined calibration parameters (as shown in the figure; such external parameters could be produced by a simultaneous calibrator observation, for example). At this stage, an additional flagging operation (not shown in the Figure) is performed with NDPPP in order to clip any remaining RFI or bad data.
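The gain solve itself can be illustrated with a toy alternating-least-squares iteration of the general kind BBS performs. This is a StEFCal-style sketch, not the BBS implementation; `vis` and `model` are hypothetical per-baseline matrices for a single time/frequency cell.

```python
import numpy as np

def solve_gains(vis, model, niter=100):
    """Solve vis[j, i] ~ g[j] * conj(g[i]) * model[j, i] for the
    per-station complex gains g, by alternating least squares with
    damping (a StEFCal-style iteration)."""
    nant = vis.shape[0]
    g = np.ones(nant, dtype=complex)
    for _ in range(niter):
        g_old = g.copy()
        for i in range(nant):
            # Predicted column i (up to the factor conj(g[i])).
            z = g_old * model[:, i]
            # Least-squares estimate of g[i] from observed column i.
            g[i] = np.conj(np.vdot(z, vis[:, i]) / np.vdot(z, z))
        g = 0.5 * (g + g_old)  # damping aids convergence
    return g
```

Note the usual degeneracy: an overall phase is unconstrained, so real calibration fixes the phase of a reference station; the product g[j]·conj(g[i]) is what the data determine.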
Following the preprocessing stage, the calibrated data are further processed in the Imaging Pipeline, which begins with an imaging step that uses the AWImager, an updated version of the CASA imager that performs both w-projection and A-projection.
Source-finding software is used to identify the sources detected in the image and to generate an updated local sky model. One or more 'major cycle' loops of calibration (with BBS), flagging, imaging, and LSM updates will then be performed. At the end of the process, the final LSM will be used to update the GSM, and the final image products will be made available via the archive.
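The control flow of these major cycles can be sketched as follows; the four callables are placeholders for BBS, NDPPP, the AWImager and the source finder, not their real interfaces, and the demonstration steps are trivial stand-ins.

```python
def run_imaging_major_cycles(vis, lsm, calibrate, flag, image, find_sources,
                             ncycles=2):
    """Drive ncycles 'major cycles': calibrate against the current local
    sky model, flag residual bad data, image, and extract a new source
    list.  Only the control flow mirrors the real pipeline."""
    img = None
    for _ in range(ncycles):
        vis = calibrate(vis, lsm)   # BBS-style gain solve and apply
        vis = flag(vis)             # NDPPP-style residual clipping
        img = image(vis)            # AWImager stand-in
        lsm = find_sources(img)     # updated local sky model
    return img, lsm

# Toy demonstration with trivial stand-in steps:
img, lsm = run_imaging_major_cycles(
    vis=[1.0, 2.0], lsm=["sourceA"],
    calibrate=lambda v, m: v,
    flag=lambda v: v,
    image=lambda v: sum(v),
    find_sources=lambda im: ["sourceA", "sourceB"],
)
```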
The entire end-to-end process, from performing the observation through obtaining the final images is overseen by the SCHEDULER, software specially developed for LOFAR. In addition to scheduling the observing blocks at the telescope level, it keeps an overview of the storage resources in order to decide where to store the raw visibilities. It also keeps an overview of the computational resources on the cluster, so that runs of the Preprocessing Pipeline and Imaging Pipeline can be scheduled and distributed over cluster nodes with available processing power.
Figure 2: A schematic overview of the overall Pulsar Pipeline, as it runs online on the BG/P, followed by offline scientific processing on the offline cluster. Offline pipeline processing can be run on data directly out of the BG/P or on RFI-filtered data.
The offline pulsar processing is shown schematically in Figure 2, and is described in more detail by Stappers et al (2011), where the online beam-formed pipeline and its various sub-modes are also discussed (see also Mol & Romein 2011).
The Beam-Formed data written by BG/P are stored on the LOFAR offline processing cluster in the HDF5 format (Hierarchical Data Format). Several conversion tools have been developed to convert these data into other formats, e.g. PSRFITS, suitable for direct input into standard pulsar data reduction packages such as PSRCHIVE, PRESTO, and SIGPROC.
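Pending native HDF5 support in those packages, the files can already be explored with generic tools such as h5py. The sketch below assumes nothing about the internal group layout: it simply walks whatever hierarchy the file contains and reports datasets and group attributes.

```python
import h5py

def summarise(path):
    """List the groups and datasets in an HDF5 file.

    Returns a list of entries: (name, shape, dtype) for datasets and
    (name, "group", attrs-dict) for groups.  h5py's visititems recurses
    through the whole hierarchy, so no fixed layout is assumed.
    """
    entries = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            entries.append((name, obj.shape, str(obj.dtype)))
        else:
            entries.append((name, "group", dict(obj.attrs)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return entries
```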
However, the long-term goal is to adapt these packages to natively read HDF5, using classes which exist for interpreting the HDF5 files.
Among other things, these reduction packages allow for RFI masking, dedispersion, and searching of the data for single pulses and periodic signals. A test mode already exists to perform coherent dedispersion online, including for multiple beams and dispersion measures simultaneously. Likewise, online RFI excision is being implemented in order to remove corrupted data from individual stations before they are added together to form an array beam.
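Incoherent dedispersion, the simplest of these operations, can be sketched as follows. The dispersion constant of about 4.1488×10³ MHz² s per pc cm⁻³ is standard; the function itself is an illustration of the shift-and-sum idea, not any package's implementation.

```python
import numpy as np

# Dispersion delay relative to infinite frequency:
#   dt(s) = 4.1488e3 * DM * f_MHz**-2,  with DM in pc cm^-3.
DISP_CONST = 4.1488e3  # MHz^2 s per (pc cm^-3)

def dedisperse(dynspec, freqs_mhz, dm, tsamp):
    """Incoherently dedisperse a dynamic spectrum.

    dynspec   : (nchan, ntime) power array
    freqs_mhz : centre frequency of each channel, in MHz
    dm        : dispersion measure, pc cm^-3
    tsamp     : sampling time, s

    Each channel is shifted back by its dispersion delay relative to the
    highest-frequency channel, so a pulse lines up across channels and
    the array can then be summed over frequency.
    """
    f_ref = freqs_mhz.max()
    delays = DISP_CONST * dm * (freqs_mhz ** -2 - f_ref ** -2)
    shifts = np.round(delays / tsamp).astype(int)
    out = np.empty_like(dynspec)
    for i, s in enumerate(shifts):
        out[i] = np.roll(dynspec[i], -s)
    return out
```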
Since the Beam-Formed data serve a much larger community than just pulsar astronomers, various other analysis tools are also being developed, e.g. a dynamic spectrum tool.
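As an example of what such a tool computes, a dynamic spectrum (power versus time and frequency) can be formed from a raw voltage series by FFT-ing consecutive blocks. This is a minimal filterbank sketch, not the actual tool being developed.

```python
import numpy as np

def dynamic_spectrum(voltages, nchan):
    """Form a dynamic spectrum from a real-valued voltage time series.

    Consecutive blocks of 2*nchan samples are Fourier transformed; the
    squared magnitude of each block gives one time sample of the
    spectrum.  Returns an array of shape (nchan, ntime)."""
    blk = 2 * nchan
    nblocks = len(voltages) // blk
    x = voltages[: nblocks * blk].reshape(nblocks, blk)
    spec = np.fft.rfft(x, axis=1)[:, 1:]   # drop DC, keep nchan channels
    return (np.abs(spec) ** 2).T
```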
An automatic processing pipeline is not yet available for TBB data. The data are stored as raw voltages per station in HDF5 format, including some of the metadata. Python code to access these (meta)data and perform tasks such as FFTs or RFI mitigation exists at CEP (pycrtools).
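A simple flavour of such RFI mitigation is a robust median/MAD cut on a time-averaged power spectrum, flagging narrow-band channels that stand far above the noise floor. This is a generic sketch of the technique, not the pycrtools API.

```python
import numpy as np

def flag_rfi_channels(spectrum, threshold=5.0):
    """Flag narrow-band RFI channels in an averaged power spectrum.

    A channel is flagged when its power exceeds the median by more than
    `threshold` robust standard deviations; the factor 1.4826 converts
    the median absolute deviation (MAD) to a Gaussian-equivalent sigma.
    Returns a boolean mask, True where a channel should be discarded."""
    med = np.median(spectrum)
    mad = np.median(np.abs(spectrum - med))
    return spectrum > med + threshold * 1.4826 * mad
```

Using the median and MAD rather than the mean and standard deviation keeps strong RFI spikes from inflating the threshold that is supposed to catch them.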
A full dump of one second of data from a single RCU is about 400 MB. A full dump of one 48-antenna station (96 RCUs, one per polarisation) therefore amounts to approximately 40 GB.
There is no central processing available for station-correlator data at the CEP in Groningen; this is a very specialised observing mode. The data will be transferred from the station directly to the storage network in Groningen, after which the PI is responsible for any further processing of the data and/or data transfer off-site. The data consist of binary blocks of 64-bit floating-point numbers. The start date is encoded in the file name; there is no other metadata associated with this type of dataset.
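Reading such a dump therefore reduces to interpreting the file as a flat stream of 64-bit floats. The sketch below assumes native byte order; the block structure (ordering of baselines, channels, polarisations, and any real/imaginary interleaving) is not recorded in the file and must come from the observing setup, and the file name used in the example is only illustrative.

```python
import numpy as np

def read_station_correlations(path, count=-1):
    """Read a station-correlator dump as a flat float64 array.

    count=-1 reads the whole file.  Reshaping into blocks and any
    interpretation as complex correlations is left to the PI, since the
    file carries no metadata beyond the start date in its name."""
    return np.fromfile(path, dtype=np.float64, count=count)
```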
PIs who intend to use this mode should contact the LOFAR science support group (sciencesupport@astron.nl).