Netherlands Institute for Radio Astronomy
Example astronomical applications: single-source imaging, surveys, polarization imaging, and transient detection and follow-up with interferometry.
The interferometric imaging mode provides correlated visibility data (similar to traditional aperture-synthesis radio telescope arrays consisting of antenna elements). Station beams are transferred to the Central Processing (CEP) facility (currently an IBM Blue Gene/P computer is used as the correlator), where they are correlated to produce raw visibility data. These are written in Measurement Set format to the post-processing cluster (known as the CEP2 cluster), also located at CEP. In LOFAR Version 1.0, two observing modes, with associated "recipes" for the imaging pipeline, are suggested for targeted observations:
Observations with the Low Band Antennae (LBA):
Observations continuous in time/hour angle, with half of the available bandwidth on the target field (<=24 MHz, <=122 subbands) and half on a strong calibrator source (likewise <=24 MHz, <=122 subbands).
Observations in the 10-80 MHz band.
Processing with the Standard Imaging Pipeline
Observations with the High Band Antennae (HBA):
Short calibrator observations (e.g. 2 min) interleaved with target-field observations (e.g. ~11-30 min), quasi-continuous in HA, using up to the full available bandwidth.
Observations in one of the three HBA bands: 110-190 MHz, 170-230 MHz, 210-250 MHz
Processing with the Standard Imaging Pipeline
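The subband counts quoted in these recipes follow from the subband width. The sketch below assumes a 200 MHz sampling clock, for which each of the 512 subbands in a Nyquist zone is 100 MHz / 512 = 195.3125 kHz wide; the function names are hypothetical and the 160 MHz clock mode is not modelled.

```python
# Assumption: 200 MHz sampling clock, 512 subbands per Nyquist zone.
CLOCK_MHZ = 200.0
SUBBANDS_PER_ZONE = 512
SUBBAND_WIDTH_MHZ = CLOCK_MHZ / 2 / SUBBANDS_PER_ZONE  # 0.1953125 MHz


def bandwidth_mhz(n_subbands: int) -> float:
    """Total bandwidth covered by n_subbands subbands."""
    return n_subbands * SUBBAND_WIDTH_MHZ


def max_subbands(bandwidth: float) -> int:
    """Largest number of whole subbands that fit in the given bandwidth (MHz)."""
    return int(bandwidth // SUBBAND_WIDTH_MHZ)


def subband_frequency_mhz(subband: int, nyquist_zone: int) -> float:
    """Nominal subband frequency; zone 1 spans 0-100 MHz, zone 2 spans 100-200 MHz."""
    return (nyquist_zone - 1 + subband / SUBBANDS_PER_ZONE) * CLOCK_MHZ / 2


print(max_subbands(24.0))             # 122
print(bandwidth_mhz(122))             # 23.828125
print(subband_frequency_mhz(256, 2))  # 150.0
```

Under this assumption, splitting the bandwidth evenly between target and calibrator gives at most 122 subbands, or about 23.8 MHz, on each.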
Note: each "observation" includes not only the acquisition of uv data (as in traditional interferometers) but also processing with the "Standard Imaging Pipeline" (see below), within the capabilities of the pipeline as defined in each software release.
The processing uses parameters defined by the proposers and approved by the Science Support Group.
Hence, all proposals should also take into account the processing needed to reach their scientific goals.
Users can request export of averaged uv data for further processing at their own computing facilities.
In the future, computational resources for further processing will be available at the Long Term Archive sites or on the Grid. Users will be able to propose for the use of these resources in future Calls for Proposals (when available).
The "Standard Imaging Pipeline"
Further processing of the raw uv data, which consists of calibration and imaging, is handled offline via a series of automated pipelines (see "Software Pipelines"). Calibration is an iterative process of obtaining the best estimates of instrumental and environmental effects such as electronic station gains and ionospheric delays.
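To illustrate what "iterative" means here, the sketch below solves for complex station gains g_i in the model V_ij = g_i M_ij conj(g_j), where M holds model visibilities predicted from a sky model. It uses a StEFCal-style alternating least-squares update, a well-known technique swapped in for illustration and not necessarily what BBS does. It assumes noise-free data, a single unpolarized point source at the phase centre, and one scalar gain per station; all names are hypothetical.

```python
import numpy as np


def solve_station_gains(vis, model, n_iter=500, tol=1e-12):
    """StEFCal-style solver for V_ij = g_i M_ij conj(g_j).

    vis, model: (N, N) complex arrays; autocorrelations (diagonal) are ignored.
    Returns gains, which are only defined up to an overall phase.
    """
    n = vis.shape[0]
    cross = ~np.eye(n, dtype=bool)               # mask out autocorrelations
    v = np.where(cross, vis, 0)
    m = np.where(cross, model, 0)
    g = np.ones(n, dtype=complex)
    for it in range(n_iter):
        z = m * np.conj(g)[np.newaxis, :]        # z[i, j] = M_ij conj(g_j)
        # Least-squares estimate of g_i from V[i, :] ~ g_i z[i, :]
        g_new = np.sum(np.conj(z) * v, axis=1) / np.sum(np.abs(z) ** 2, axis=1)
        if it % 2 == 1:                          # damp every other step
            g_new = 0.5 * (g_new + g)
        if np.linalg.norm(g_new - g) <= tol * np.linalg.norm(g_new):
            return g_new
        g = g_new
    return g


def reference_phase(gains):
    """Remove the unconstrained overall phase by referencing to station 0."""
    return gains * np.exp(-1j * np.angle(gains[0]))


# Hypothetical test case: 12 stations observing a point-source calibrator.
rng = np.random.default_rng(1)
n = 12
g_true = rng.uniform(0.8, 1.2, n) * np.exp(1j * rng.uniform(-1.0, 1.0, n))
model = np.ones((n, n), dtype=complex)
vis = g_true[:, None] * model * np.conj(g_true)[None, :]
g = solve_station_gains(vis, model)
print(np.allclose(reference_phase(g), reference_phase(g_true), atol=1e-6))
```

Averaging every second update damps the amplitude oscillation that the plain update shows (a gain scale error c is mapped to 1/conj(c)); in a real calibration the model would come from the sky model and the data would be noisy.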
Figure 1: The Imaging Pipeline shown schematically. A short overview of the pipeline is given by Heald et al. (2010).
The Standard Imaging Pipeline is still under development; its first version, deployed in LOFAR Version 1.0, is illustrated in Figure 1.
The first standard data processing steps are encapsulated within a sub-pipeline called the Preprocessing Pipeline, which itself has two steps:
(a) The Calibrator Pipeline is the first step: it flags the data in time and frequency and, optionally, compresses the data in time, frequency, or both (the software that performs this step is called the New Default Pre-Processing Pipeline, or NDPPP). This stage of the processing also subtracts the contributions of the brightest sources in the sky (Cygnus A, Cassiopeia A, etc.) from the visibilities using the 'demixing' algorithm.
(b) In the next step, the Target Pre-processing pipeline, an initial calibration is performed. Using the BlackBoard Selfcal (BBS) system that has been developed for LOFAR, a local sky model (LSM) is generated from a Global Sky Model (GSM) that is stored in a database. Calibration of the complex station gains is achieved using this LSM, or optionally from externally determined calibration parameters (as shown in the figure; such external parameters could be produced by a simultaneous calibrator observation, for example). At this stage, an additional flagging operation (not shown in the Figure) is performed with NDPPP in order to clip any remaining RFI or bad data.
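Generating an LSM from the GSM can be pictured as a cone search: keep the catalogue sources within some radius of the phase centre and above a flux cut. The sketch below assumes the GSM is an in-memory list rather than a database; the selection radius, flux cut and source entries are hypothetical toy values, and the actual BBS/GSM selection criteria are not described here.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    ra_deg: float
    dec_deg: float
    flux_jy: float


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula (degrees in, degrees out)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def make_local_sky_model(gsm, ra0, dec0, radius_deg, min_flux_jy):
    """Sources within radius_deg of (ra0, dec0) above min_flux_jy, brightest first."""
    lsm = [s for s in gsm
           if s.flux_jy >= min_flux_jy
           and angular_separation_deg(ra0, dec0, s.ra_deg, s.dec_deg) <= radius_deg]
    return sorted(lsm, key=lambda s: s.flux_jy, reverse=True)


# Hypothetical toy catalogue, not real sources.
gsm = [Source("A", 10.0, 60.0, 5.0),
       Source("B", 10.5, 60.2, 0.5),
       Source("C", 40.0, 30.0, 50.0)]
lsm = make_local_sky_model(gsm, 10.0, 60.0, radius_deg=2.0, min_flux_jy=1.0)
print([s.name for s in lsm])  # ['A']
```

Source B lies inside the cone but below the flux cut, and source C is bright but far from the phase centre, so only A is kept.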
Following the preprocessing stage, the calibrated data are further processed in the Imaging Pipeline, which begins with an imaging step that uses the AWImager, an updated version of the CASA imager that performs both w-projection and A-projection.
Source-finding software identifies the sources detected in the image and generates an updated local sky model. One or more "major cycle" loops of calibration (with BBS), flagging, imaging and LSM updates will then be performed.
At the end of the process, the final LSM will be used to update the GSM, and the final image products will be produced.
The entire end-to-end process, from performing the observation to obtaining the final images, is overseen by the SCHEDULER, software specially developed for LOFAR. In addition to scheduling the observing blocks at the telescope level, it keeps an overview of the storage resources in order to decide where to store the raw visibilities. It also keeps an overview of the computational resources on the cluster, so that runs of the Preprocessing Pipeline and Imaging Pipeline can be scheduled and distributed over cluster nodes with available processing power.
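How the SCHEDULER distributes work is not described here. As a hypothetical sketch of the idea, the greedy longest-processing-time-first rule below assigns each job (here, one subband) to the node with the least accumulated load; job costs and node names are invented for illustration.

```python
import heapq


def distribute(jobs, nodes):
    """Greedy longest-processing-time-first assignment.

    jobs:  list of (name, estimated_cost) tuples
    nodes: list of node names
    Each job, largest first, goes to the currently least-loaded node.
    """
    heap = [(0.0, node) for node in nodes]
    heapq.heapify(heap)
    assignment = {node: [] for node in nodes}
    for name, cost in sorted(jobs, key=lambda job: job[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(name)
        heapq.heappush(heap, (load + cost, node))
    return assignment


# Hypothetical example: 122 equal-cost subbands spread over 61 nodes.
jobs = [(f"SB{i:03d}", 1.0) for i in range(122)]
nodes = [f"node{k:02d}" for k in range(61)]
plan = distribute(jobs, nodes)
print(sorted({len(v) for v in plan.values()}))  # [2]
```

With equal costs this reduces to two subbands per node, the distribution used in the test observations; unequal costs (for example, subbands with heavier flagging or demixing) would be balanced by load rather than by count.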
Final Data Products
The final data products include the calibrated uv data, optionally averaged in time and frequency, and the corresponding images/image cubes. Visibilities are averaged enough to reduce the data volume to a manageable size, while minimizing the effects of time and bandwidth smearing. The averaging parameters are set according to the astronomers'/users' requests, in consultation with the Radio Observatory's Science Support Group.
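The trade-off between averaging and smearing can be estimated with the standard first-order criteria: bandwidth smearing stays small while (dnu/nu)(theta/theta_b) is small, and time smearing while omega_E dt (theta/theta_b) is small, where theta is the distance from the phase centre and theta_b the synthesized beam. The sketch below is a rule of thumb only; the tolerance of 0.1 and the example field are hypothetical, and the exact amplitude loss also depends on taper, bandpass shape and declination.

```python
import math

C = 299_792_458.0          # speed of light [m/s]
OMEGA_EARTH = 7.2921e-5    # Earth's rotation rate [rad/s]


def beam_width_rad(freq_hz, max_baseline_m):
    """Approximate synthesized-beam width, lambda / B_max."""
    return C / freq_hz / max_baseline_m


def max_channel_width_hz(freq_hz, offset_rad, beam_rad, tolerance=0.1):
    """Largest channel width keeping (dnu / nu) * (theta / theta_b) <= tolerance."""
    return tolerance * freq_hz * beam_rad / offset_rad


def max_integration_s(offset_rad, beam_rad, tolerance=0.1):
    """Largest averaging time keeping omega_E * dt * (theta / theta_b) <= tolerance."""
    return tolerance * beam_rad / (OMEGA_EARTH * offset_rad)


# Hypothetical case: HBA at 150 MHz, baselines up to 12 km, source 2.5 deg out.
beam = beam_width_rad(150e6, 12e3)
offset = math.radians(2.5)
dnu = max_channel_width_hz(150e6, offset, beam)
dt = max_integration_s(offset, beam)
print(f"channel width <= {dnu / 1e3:.0f} kHz, integration <= {dt:.1f} s")
```

Both limits scale inversely with baseline length and with distance from the phase centre, which is why wide-field, high-resolution products allow less averaging.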
The final products will be stored at the LOFAR Long Term Archive (LTA) where, in the future, significant computing facilities will become available for further reprocessing. It will be possible to routinely export datasets from the LTA to investigators for reduction and analysis at their Science Centre or through the use of suitable resources on the GRID.
This mode requires medium- to long-term storage of uncalibrated or partially calibrated data at the central processing facility until processing is complete, following detailed inspection of the results by the user. The resulting storage and processing requirements will limit the amount of such customized reprocessing that can be conducted in the early years of LOFAR operation.
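The storage pressure can be estimated from the number of visibilities. The sketch below counts only the raw visibility payload, assuming one complex64 value (8 bytes) per baseline, channel, time step and polarization product (4 products), and ignores weights, flags, UVW coordinates and Measurement Set overhead. The 64 channels per subband and the averaged setup are hypothetical.

```python
def visibility_bytes(n_stations, n_subbands, channels_per_subband,
                     duration_s, integration_s, n_pol=4, bytes_per_vis=8,
                     include_autocorrelations=True):
    """Raw visibility payload in bytes (8-byte complex64 per sample)."""
    n_baselines = n_stations * (n_stations - 1) // 2
    if include_autocorrelations:
        n_baselines += n_stations
    n_times = int(duration_s // integration_s)
    return (n_baselines * n_subbands * channels_per_subband
            * n_times * n_pol * bytes_per_vis)


# Hypothetical: 51 stations, 120 subbands, 64 channels/subband, 6 h at 2 s.
raw = visibility_bytes(51, 120, 64, 6 * 3600, 2.0)
# Same observation averaged to 4 channels/subband and 10 s.
avg = visibility_bytes(51, 120, 4, 6 * 3600, 10.0)
print(f"raw {raw / 1e12:.1f} TB, averaged {avg / 1e9:.0f} GB, factor {raw / avg:.0f}")
```

The volume grows roughly with the square of the number of stations and inversely with the averaging intervals, so keeping unaveraged data for later reprocessing is costly.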
Current Status and Performance of the SIP
The first implementation of the imaging pipeline in LOFAR Version 1.0 will produce images and calibrated data after only the first imaging step. Development of "Major Cycle" calibration will follow.
To provide an estimate of the performance of the Standard Imaging Pipeline, a series of test observations has been performed and analysed.
The observations were based on the recommended observing and processing setups mentioned above.
An LBA and an HBA observation of L227+69, a relatively empty field already observed in the MSSS survey, were performed in April 2012. Both observations used 3C295 as the calibrator.
Hence this first set of test observations characterizes the imaging of both a relatively empty field (L227+69, which does not contain a strong source) and a field containing a strong source (the field of the calibrator 3C295).
The characteristics of the observations are as follows:
LBA: 6 hours of observation with 24 MHz on the target field and 24 MHz on the calibrator, 3C295.
Bandwidth: 48 MHz (24 MHz on 3C295 and 24 MHz on the target field)
Number of stations: 31 (22CS+9RS)
Duration: 6 hours
Integration time: 1 second
Data distribution: 2 subbands per node (61 nodes used)
HBA: 6 hours of observation, alternating 1-min observations of the calibrator, 3C295, with 15-min observations of the target field.
Bandwidth: 24 MHz (120 Subbands)
Number of stations: 51 (42CS+9RS)
Integration time: 2 seconds
Data distribution: 2 subbands per node (61 nodes used)
A further observation, of a source close to an A-team source, has also been conducted. Its analysis has started, and the results are expected in the week of May 29 to June 2.
The Pre-processing and Image Processing subpipelines of the SIP were run following the method that will be used operationally.
Processing on the CEP2 cluster was carried out under a range of conditions assumed to be representative of the operational period, namely concurrently with observations and with other (not extremely computationally intensive) processing.
The run times of various pipelines are assumed to be indicative of the current software.
Pre-processing pipelines
To gain more insight into the required processing time, observations with the same characteristics as above were made and processed for various combinations of stations.
The run times of the pre-processing pipelines are summarised in Table 1 below:
| Type | Number of stations | Pre-processing time (h) | Processing/observation ratio |
|---|---|---|---|
| HBA Superterp | 12 | 4.5 | 0.7 |
| HBA CS | 46 | 25 | 4 |
| HBA CS+RS | 51 | 55 | 9 |
| LBA Superterp | 6 | 0.5 | 0.1 |
| LBA CS | 21 | 3.3 | 0.5 |
| LBA CS+RS | 31 | 11 | 1.8 |
Table 1: Pre-processing Stage Performance - Summary Table
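The ratio column appears to be the pre-processing time divided by the 6-hour observing time; recomputing it, as sketched below, agrees with the tabulated ratios to the precision quoted. The baseline count N(N-1)/2 is printed alongside only as a rough indicator of the data volume per configuration; no scaling law is implied.

```python
OBSERVING_HOURS = 6.0

# (type, number of stations, pre-processing hours), transcribed from Table 1
TABLE_1 = [
    ("HBA Superterp", 12, 4.5),
    ("HBA CS", 46, 25.0),
    ("HBA CS+RS", 51, 55.0),
    ("LBA Superterp", 6, 0.5),
    ("LBA CS", 21, 3.3),
    ("LBA CS+RS", 31, 11.0),
]


def p_over_o(processing_h, observing_h=OBSERVING_HOURS):
    """Processing time divided by observing time."""
    return processing_h / observing_h


for name, n_stations, hours in TABLE_1:
    n_baselines = n_stations * (n_stations - 1) // 2
    print(f"{name:14s} {n_stations:3d} stations {n_baselines:5d} baselines  "
          f"P/O = {p_over_o(hours):.2f}")
```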
Imaging Pipeline
Each dataset has been imaged using the "imaging sub-pipeline" with the current implementation of the AWImager. To derive useful estimates of the achievable sensitivity and the required processing load, a range of baseline tapers (3 km, 6 km, 12 km and 24 km) and data weightings (robust parameters -1, 0 and 1), each yielding a different resolution, have been tried.
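The resolution associated with each taper can be estimated from theta ~ lambda / B_max. The sketch below uses this approximation only; the robust weighting changes the actual beam width by a factor of order unity, which is not modelled, and the two frequencies are example values.

```python
import math

C = 299_792_458.0                          # speed of light [m/s]
ARCSEC_PER_RAD = math.degrees(1.0) * 3600


def resolution_arcsec(freq_hz, max_baseline_m):
    """Approximate resolution theta ~ lambda / B_max, in arcsec."""
    return C / freq_hz / max_baseline_m * ARCSEC_PER_RAD


for taper_km in (3, 6, 12, 24):
    lba = resolution_arcsec(60e6, taper_km * 1e3)
    hba = resolution_arcsec(150e6, taper_km * 1e3)
    print(f"{taper_km:2d} km taper: {lba:6.1f} arcsec at 60 MHz, "
          f"{hba:6.1f} arcsec at 150 MHz")
```

Each doubling of the taper halves the beam width, so going from 3 km to 24 km improves the resolution by a factor of eight.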
Furthermore, to investigate the effects of observing frequency, each dataset was split into 12 "bands" of ~2 MHz each, created by averaging about 10 subbands per band.
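A split of this kind can be sketched as grouping contiguous subbands into near-equal bands. The example below assumes contiguous subbands of 195.3125 kHz (200 MHz clock) starting at a hypothetical 120 MHz; the actual band boundaries used in the tests are not given here, and some bands contained fewer subbands.

```python
SUBBAND_WIDTH_MHZ = 0.1953125  # assumption: 200 MHz clock


def group_into_bands(subband_freqs_mhz, n_bands):
    """Split sorted subband frequencies into n_bands contiguous, near-equal groups.

    Returns a list of (centre_mhz, bandwidth_mhz, n_subbands) per band.
    """
    n = len(subband_freqs_mhz)
    bands = []
    for b in range(n_bands):
        group = subband_freqs_mhz[b * n // n_bands:(b + 1) * n // n_bands]
        centre = sum(group) / len(group)
        bands.append((centre, len(group) * SUBBAND_WIDTH_MHZ, len(group)))
    return bands


# Hypothetical example: 120 contiguous HBA subbands starting at 120 MHz.
freqs = [120.0 + i * SUBBAND_WIDTH_MHZ for i in range(120)]
for centre, width, count in group_into_bands(freqs, 12):
    print(f"{centre:7.2f} MHz  {width:.3f} MHz  ({count} subbands)")
```

With 120 subbands each band holds exactly 10 subbands (about 1.95 MHz); with 122 subbands the integer split gives two bands of 11.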
[NOTE: The information in this section is updated as the results of the system characterization are produced. A detailed analysis of the results will be added as more insight is gained. An additional web page will give the full tabular results for interested users.
The current results can be found in an Excel spreadsheet.]
Plots of the achieved sensitivity and the corresponding processing time as a function of frequency, for different baseline tapers and data weightings, are given below.
Note that, to illustrate realistic processing conditions, the plots below show all the available results, including certain outliers caused by: RFI still present in some bands (see e.g. the higher noise in the upper 3 bands of the 3C295 HBA observations); unusually long processing times in a few bands (due mainly to other computationally intensive jobs running in parallel on the same node); and a narrower fractional bandwidth in some bands, where fewer subbands were included in the average (e.g. the 77.5 MHz band of the LBA observations of L227+69).
Figure 2: Imaging time (in minutes) and Sensitivity (RMS noise) of the image, across frequency for the LBA observation of the L227+69 field for different baseline tapering.
Figure 3: Resolution, Sensitivity and Imaging time across frequency for the HBA observations of the L227+69 field for tapering at the 3 km and 6 km baselines. Note that the first band, at 116 MHz, has problems, so the numbers derived for it are not reliable.
Figure 4: Imaging time (in minutes) and Sensitivity (RMS noise) of the image, across frequency for the LBA and the HBA observation of the 3C295 field for different baseline tapering.
System Performance Trends
We consider only the trends from the plots in Figures 2-6 to derive the following general conclusions on the current performance of the array and the associated Standard Imaging Pipeline.
a) Image Sensitivity
In the LBA, the achieved RMS noise in total intensity (Stokes I) is generally ~125 times the thermal noise, and in Stokes Q, U and V about 10 times the thermal noise. The sensitivity for the field containing a strong source is slightly lower (i.e. the RMS noise is ~1.2 to 2 times higher) than that of the empty field.
In the HBA, for the relatively empty field, the RMS noise in total intensity (Stokes I) is about 100 times the thermal noise, and in Stokes Q, U and V about 10 times.
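A "factor above thermal noise" compares the measured image RMS with the radiometer-equation prediction sigma = SEFD / (eta sqrt(n_pol N (N-1) dnu t)). The sketch below assumes identical stations (LOFAR core, remote and international stations differ) and uses placeholder SEFD and RMS values, not LOFAR measurements; it is not necessarily how the thermal noise was computed for the figures.

```python
import math


def thermal_noise_jy(sefd_jy, n_stations, bandwidth_hz, time_s, n_pol=2, efficiency=1.0):
    """Radiometer-equation image noise for an array of identical stations:
    sigma = SEFD / (eta * sqrt(n_pol * N * (N - 1) * bandwidth * time))."""
    return sefd_jy / (efficiency * math.sqrt(
        n_pol * n_stations * (n_stations - 1) * bandwidth_hz * time_s))


def factor_above_thermal(measured_rms_jy, **array):
    """Ratio of a measured image RMS to the expected thermal noise."""
    return measured_rms_jy / thermal_noise_jy(**array)


# Placeholder numbers: SEFD and measured RMS are hypothetical, not LOFAR values.
array = dict(sefd_jy=3000.0, n_stations=51, bandwidth_hz=2e6, time_s=6 * 3600)
sigma = thermal_noise_jy(**array)
print(f"thermal noise ~ {sigma * 1e3:.2f} mJy/beam")
print(f"factor above thermal: {factor_above_thermal(0.02, **array):.0f}")
```

Because the noise scales as 1/sqrt(bandwidth x time), the ~2 MHz bands used here are individually much noisier than the full-bandwidth image would be.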
b) Imaging Processing/Observing Time Ratio
To illustrate more clearly the amount of processing needed for the imaging step (the run time of the AWImager), the "Imaging Time" plots (Figures 2-4) above are scaled to the observing time (~360 min) to give a processing-over-observing-time ratio (Figures 5 and 6).
Figure 5: The processing time over observing time ratio (P/O) for the imaging step for the field of L227+69 and the same parameters as above (note that these diagrams use the same data as the "Imaging Time" plots in Figures 2 and 3 above).
Figure 6: The processing time over observing time ratio (P/O) for the imaging step for the field of 3C295 and the same parameters as above (note that these diagrams use the same data as the "Imaging Time" plots in Figure 4 above).
Given the finite computing resources of the post-processing cluster, special consideration is needed when choosing the proposed parameters of the final images.
Figures 5 and 6 give the imaging P/O ratio for the corresponding imaging parameters.
The processing time appears to increase significantly when baselines longer than 12 km are imaged over the full field of view, particularly for the HBA. With the currently available processing power, imaging a dataset on a practical timescale at high resolution (i.e. including the long baselines) therefore requires limiting the field of view.
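One contributor to this cost can be sketched with simple pixel counting: to sample a beam of width ~lambda / B_max with a fixed number of pixels across a fixed field of view, the pixel count per axis grows linearly with the longest baseline, so the image area grows quadratically. The 5 deg field and 3 pixels per beam below are hypothetical, and other costs (for example, larger w-projection kernels for wide fields and long baselines) are not modelled.

```python
import math

C = 299_792_458.0  # speed of light [m/s]


def image_axis_pixels(fov_deg, freq_hz, max_baseline_m, pixels_per_beam=3):
    """Pixels along one axis needed to cover fov_deg while sampling the
    synthesized beam (~lambda / B_max) with pixels_per_beam pixels."""
    beam_deg = math.degrees(C / freq_hz / max_baseline_m)
    return math.ceil(fov_deg * pixels_per_beam / beam_deg)


# Hypothetical: a 5 deg HBA field at 150 MHz.
for b_km in (6, 12, 24):
    n = image_axis_pixels(5.0, 150e6, b_km * 1e3)
    print(f"{b_km:2d} km: {n} x {n} pixels ({n * n / 1e6:.1f} Mpix)")
```

Halving the field of view while doubling the longest baseline keeps the pixel count roughly constant, which is the trade-off described above.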
Expected Short Term Improvements
We expect further improvements in processing time, particularly as, by June, the Standard Imaging Pipeline will incorporate the new demixing method.
Image sensitivity is expected to improve as the following improvements are applied:
a) Station static-delay calibration tables will improve.
b) Synchronization of the analogue processors through specialized optics in the single clock board will improve the beam and the station sensitivity.
c) Cycles of self-calibration and imaging. A "Major Cycle" is expected to be incorporated in the SIP in the summer of 2012.
While the current implementation of the Standard Imaging Pipeline produces results with the characteristics above, further processing and extra data reduction steps have been shown to produce images with much higher sensitivity. Further processing and data reduction can be done either at CEP (additional processing by experts must be requested when proposing for observations) or at external computing facilities, using data extracted via the LOFAR Long Term Archive.
Examples of further analysis that produced improved results compared with the current SIP output include Abell 2256 (van Weeren et al., astro-ph) and the images of the field around 3C196 produced by the EoR Key Science Project.
Using International Stations
Baselines provided by the international array range from 53 km (DE601-DE605) to 1292 km (SE607-FR606). The corresponding resolution extends far into the sub-arcsec range for the high band and reaches 1 arcsec even in the low band (see e.g. Table 2 of "LOFAR Imaging Capabilities").
Note that the correlator accepts at most 64 input data streams, so some Dutch stations have to be excluded when all international stations are selected in HBA_DUAL mode.
Even in the high band, ionospheric effects can be quite severe. At night and under quiet ionospheric conditions, (differential) ionospheric delays are generally below 0.5 microsec at frequencies around 150 MHz, but these values can easily increase by an order of magnitude at low elevation or during the day. A delay of 5 microsec corresponds to about one full phase turn across a subband, so the data have to be kept channelised before calibration; keeping 16 channels per subband generally seems to be safe for the high band. Phase rates can be below 10 mHz (again around 150 MHz) under good conditions, but increase to 100 mHz during the day. This implies that time-averaging beyond 1 or at most a few seconds generally has to be avoided.
This leads to a considerably larger data volume before calibration than when only baselines in the Netherlands are used. Ionospheric effects are much stronger still in the low band.
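The numbers above can be checked with a short calculation. A delay tau produces tau x dnu phase turns across a bandwidth dnu, a fringe rate r produces r x dt turns over an averaging time dt, and averaging a linear phase ramp of x turns reduces the amplitude by |sin(pi x) / (pi x)|. The sketch assumes 195.3125 kHz subbands (200 MHz clock) and an idealized linear phase; the delay, channel count and rate are the values quoted in the text.

```python
import math

SUBBAND_HZ = 195.3125e3    # assumption: 200 MHz clock subband width
CHANNEL_HZ = SUBBAND_HZ / 16


def phase_turns_across(delay_s, bandwidth_hz):
    """Phase wrap (turns) across a bandwidth for a delay: tau * dnu."""
    return delay_s * bandwidth_hz


def phase_turns_in(rate_hz, interval_s):
    """Phase drift (turns) over an averaging interval for a fringe rate."""
    return rate_hz * interval_s


def decorrelation_loss(turns):
    """Amplitude retained after averaging a linear phase ramp of `turns` turns."""
    if turns == 0:
        return 1.0
    return abs(math.sin(math.pi * turns) / (math.pi * turns))


print(f"5 microsec across a subband: {phase_turns_across(5e-6, SUBBAND_HZ):.2f} turns")
print(f"5 microsec across a channel: amplitude "
      f"{decorrelation_loss(phase_turns_across(5e-6, CHANNEL_HZ)):.3f}")
print(f"100 mHz over 1 s: amplitude {decorrelation_loss(phase_turns_in(0.1, 1.0)):.3f}")
print(f"100 mHz over 10 s: amplitude {decorrelation_loss(phase_turns_in(0.1, 10.0)):.3f}")
```

A full turn across the averaging interval wipes out the signal entirely, while 16 channels per subband or 1 s integrations keep the loss to a few per cent or less.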
Besides the total ionospheric delays, long baselines are also strongly affected by differential Faraday rotation (DFR). Under optimal conditions, DFR can be small enough to be neglected in the high band, but often (and almost always in the low band) the effect is so strong that it introduces closure errors of order unity. In this general case, DFR has to be solved for explicitly (which will eventually be possible with BBS), or the procedure described below has to be applied to avoid the problem.
The point-source sensitivity of the international array is higher than that of the Dutch array alone. Because international stations (IS) have more antennas/elements, even the sensitivity per baseline is higher for international baselines. Nevertheless, sensitivity is one of the limiting factors in international observations, because almost all known sources are more or less resolved on international baselines. Due to interstellar scattering, this may be true even for intrinsically compact sources. As a result, flux densities on international baselines are typically reduced by one or even several orders of magnitude compared with shorter baselines. With the exception of very bright sources, the flux per subband and integration time is not sufficient for calibration. Direct phase solutions are therefore generally insufficient; at least linear trends in time (rates) and frequency (delays) have to be fitted. Contributions from clock offsets (non-dispersive) and the ionosphere (dispersive) generally both have to be taken into account.
BBS will eventually be able to solve for these parameters. For the time being, fringe-fitting with other software packages is the best option for all but the brightest sources.
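The core idea of fringe fitting is that fitting a delay and a rate lets the whole time-frequency block be summed coherently, instead of requiring enough signal in each subband and integration. The sketch below shows only the coarse search stage for one baseline: a zero-padded 2-D FFT over time and frequency whose peak gives the delay and rate. It is an illustration under stated assumptions (noise-free data, a single non-dispersive delay, hypothetical parameter values), not the algorithm of any particular package; a least-squares refinement and the separation of dispersive and non-dispersive terms are omitted.

```python
import numpy as np


def coarse_fringe_search(vis, dt_s, dnu_hz, pad=4):
    """Coarse delay/rate search for one baseline.

    vis: complex array of shape (n_time, n_chan), sampled every dt_s seconds
    and every dnu_hz hertz. Returns (delay_s, rate_hz) at the peak of the
    zero-padded 2-D FFT, i.e. the grid point that maximizes
    |sum V(t, nu) exp(-2 pi i (rate * t + delay * nu))|.
    """
    n_t, n_f = vis.shape
    spectrum = np.fft.fft2(vis, s=(pad * n_t, pad * n_f))
    i_t, i_f = np.unravel_index(np.argmax(np.abs(spectrum)), spectrum.shape)
    rates = np.fft.fftfreq(pad * n_t, d=dt_s)      # fringe rates [Hz]
    delays = np.fft.fftfreq(pad * n_f, d=dnu_hz)   # delays [s]
    return delays[i_f], rates[i_t]


# Hypothetical baseline: 100 s at 1 s, 64 channels of 12.207 kHz,
# with a 1.6 microsec delay and a 50 mHz fringe rate.
dt, dnu = 1.0, 195.3125e3 / 16
t = np.arange(100) * dt
nu = np.arange(64) * dnu
vis = np.exp(2j * np.pi * (0.05 * t[:, None] + 1.6e-6 * nu[None, :]))
delay, rate = coarse_fringe_search(vis, dt, dnu)
print(f"delay = {delay * 1e6:.2f} microsec, rate = {rate * 1e3:.1f} mHz")
```

The search grid spacing is 1 / (pad x total bandwidth) in delay and 1 / (pad x total time) in rate, so longer solution intervals and wider bands give finer, more sensitive searches.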
Until fringe-fitting and DFR-correction are implemented into BBS and are fully operational, the following procedure is used to image long-baseline observations. Note that not all the required software is publicly available at the moment. This will change in the near future.