===== Data Processing School :: Exercise 08 =====

^ Source directory | /data/lofarschool/data/Exercise-08 |
^ Contact person | Joris van Zwieten (zwieten -at- astron.nl) |

==== Context ====

The goal of this exercise is to acquaint you with the calibration of ''LOFAR'' data using the ''BBS'' software package. ''BBS'' is intended to operate as an integral part of the standard imaging pipeline. Though it can be run manually, it was originally designed to be run in an automated fashion. You will probably be reminded of this fact now and then in the course of this exercise. 

We encourage you to explore the software and try out things other than those presented here if you feel like it. We very much welcome any feedback and/or suggestions you may have. Just send me an email at zwieten -at- astron.nl. Good luck with the exercise!


==== Prerequisites ====

Basic knowledge of Python and the ''ipython'' shell, [[http://www.astron.nl/aips++/docs/notes/229/229.html|MeasurementSet]] format, ''casaviewer'' or any available ''fits'' image viewer, and a notion radio data reduction.

==== Description ====

The goal of this exercise is to get hands-on experience with the reduction of ''LOFAR'' data in general and with the ''BBS'' software package in particular. In the course of the exercise you will:
  * Inspect an uncalibrated ''LOFAR'' image
  * Perform a simple simulation with ''BBS''
  * Perform rudimentary self-calibration of ''LOFAR'' data using ''BBS''
  * Inspect calibration solutions
  * Flag data based on the quality of calibration solutions
  * Inspect a calibrated ''LOFAR'' image


==== Files & Directories ====

The following is a list of the files present in the data directory for this exercise (''/data/lofarschool/data/Exercise-08/'') along with a short description of each file.

<code>
/data/lofarschool/data/Exercise-08/
  |
  |-- image-plane-cal.parset    # BBS parameter set file (image-plane calibration)
  |-- instrument                # Initial instrument parameter database
  |-- lioffen.clusterdesc       # Description of the offline cluster
  |-- make-image.sh             # Script to make an image
  |-- plot.py                   # Script to plot calibration solutions
  |-- simulation.parset         # BBS parameter set file (simulation)
  |-- sky.in                    # Initial sky model
  |-- solflag.py                # Script to flag data based on solution quality
  |-- solplot.py                # Helper script for plot.py
  |-- uv-plane-cal.parset       # BBS parameter set file (uv-plane calibration)
  |-- L2007_03463_SB10_compressed.MS    #
  |-- L2007_03463_SB11_compressed.MS    #
  |-- L2007_03463_SB12_compressed.MS    # LOFAR measurement sets
  |-- L2007_03463_SB13_compressed.MS    #
  |-- L2007_03463_SB14_compressed.MS    # Each set contains a single
  |-- L2007_03463_SB15_compressed.MS    # subband. You will only need
  |-- L2007_03463_SB16_compressed.MS    # one of these measurement sets
  |-- L2007_03463_SB17_compressed.MS    # for this exercise.
  |-- L2007_03463_SB18_compressed.MS    #
  |-- L2007_03463_SB19_compressed.MS    #
</code>


==== Step-by-step instructions ====

This section starts with a step-by-step guide to correctly setup the environment for this exercise. Next is a short description of the scripts that will be used regularly in the course of this exercise. Below that are the step-by-step instructions for the exercise itself. The topics 'Observed data' up to and including 'Image-plane clalibration' are self contained, so you can pick the ones that interest you most to save time.

  - [[#Setting up the environment]]
  - [[#Scripts]]
  - [[#Observed data]]
  - [[#Simulation]]
  - [[#UV-plane calibration]]
  - [[#Image-plane calibration]]
  - [[#Calibration solutions]]
  - [[#Solution based flagging]]
  - [[#A deeper image]]
  - [[#Clean up]]

=== Setting up the environment ===

Initialize the ''LOFAR'' software environment.
<code>
source /app/lofar/dev/lofarinit.csh
</code>

Initialize ''pyrap'' (the Python wrappings for ''casacore'').
<code>
source /app/scripts/doPyrap
</code>

Create a private directory below ''/data/lofarschool/users''. **NB.** Please //do not use spaces or special characters// to avoid quoting issues later.
<code>
cd /data/lofarschool/users
mkdir <your directory>
</code>

Copy all files needed for the exercise. Several measurement sets can be found under ''/data/lofarschool/data/Exercise-08''. You'll only need a single measurement set for the exercise, so //just pick one//. Later you can look for participants who have reduced a different measurement sets and create an averaged (deeper) output image. The name of the measurement set used in these instructions is always ''L2007_03463_SB10_compressed.MS''. Please replace it with the name of the measurement set you picked (starting with the copy command below).
<code>
cd <your directory>
cp /data/lofarschool/data/Exercise-08/* .
cp -r /data/lofarschool/data/Exercise-08/instrument .
cp -r /data/lofarschool/data/Exercise-08/L2007_03463_SB10_compressed.MS .
</code>


Create a measurement description (VDS) file.
<code>
makevds lioffen.clusterdesc L2007_03463_SB10_compressed.MS
combinevds L2007_03463_compressed.vds L2007_03463_SB10_compressed.MS.vds
rm L2007_03463_SB10_compressed.MS.vds
</code>

=== Scripts ===

This section contains a short description of two scripts that will be used regularly in the course of this exercise: ''make-image.sh'' and ''calibrate''.

The ''make-image.sh'' script will be used to create images. The script takes three arguments: the measurement set name, the column name (one of ''DATA'', ''MODEL_DATA'', ''CORRECTED_DATA''), and the output image name, respectively. The script will create both a ''.img'' (''casa'') image and a ''.fits'' image. Imaging should take approximately 5 minutes.

The ''calibrate'' script will be used to start ''BBS''. If you run this script without arguments, it will print a short help text. Try it if you like. The following options should always be specified:
<code>
--cluster-desc lioffen.clusterdesc
--db cepmaster0
--db-user postgres
</code>

The ''lioffen.clusterdesc'' file contains a description of the offline processing cluster. It should be located in the same directory where you start the calibrate script, or you have to specify a path to it. The ''--db'' options specify the database server and user to use.

The ''calibrate'' script starts processes via ''ssh''. When you start it for the first time, ''ssh'' may report that it cannot confirm the authenticity of the host and will ask if you want to continue connecting. Answer 'yes' to continue.

Running the ''calibrate'' script generates a large number of log files. You may want to delete them from time to time as you work your way through the exercise.

Should you need to terminate the ''calibrate'' script (i.e. by hitting Control-C), make sure you also kill any remaining ''BBS'' processes:
<code>
killall GlobalControl KernelControl SolverControl
</code>


=== Observed data ===

The measurement set you picked is one of the 36 subbands (38-60 MHz) of the measurement called ''L2007_03463.MS''. This measurement was performed on July 27, 2007. It is a 24-hour measurement (2861 timeslots of 30.2 s. each) using 16 single ''LBA'' (Low Band Antenna) //micro-stations//, targeted at the North celestial pole. The individual subband measurement sets were ''RFI'' flagged and compressed to a single frequency channel.

We will start by imaging the raw visibility data using the ''make-image.sh'' script.
<code>
./make-image.sh L2007_03463_SB10_compressed.MS DATA observed
</code>
This will create ''observed.img'' (a ''casa'' image) and ''observed.fits'' (a ''fits'' image). You can use ''casaviewer'' to view ''observed.img'', or you can use a ''fits'' viewer (e.g. ''ds9'') to view ''observed.fits''.

=== Simulation ===

Next, we will simulate visbility data using ''BBS''. The sky model, ''sky.in'', contains the two strongest radio sources at ''LOFAR'' frequencies: Cassiopeia A and Cygnus A. The sky model is just a flat ''ASCII'' file, so you could inspect it with a text editor.

''BBS'' expects a //parameter set file// that describes the processing steps that will be performed. A parameter set file is simply a text file that contains a list of ''key = value'' pairs. The file we will use is called ''simulation.parset''. If you like, open it in a text editor and try to figure out what is does. An overview of the keys recognized by ''BBS'' can be found in the {{exercise-08-parset_syntax.pdf|parset documentation}}.

Run ''BBS'' using the ''calibrate'' script as follows:
<code>
calibrate -f --cluster-desc lioffen.clusterdesc --db cepmaster0
--db-user postgres --instrument-db instrument L2007_03463_compressed.vds
simulation.parset sky.in /data/lofarschool/users/<your directory>
</code>

Make an image of the ''MODEL_DATA'' column:
<code>
./make-image.sh L2007_03463_SB10_compressed.MS MODEL_DATA simulation
</code>

Have a look at either ''simualtion.img'' or ''simulation.fits''. You should see an all sky map, centered on the North celestial pole, containing two strong point sources.

=== UV-plane calibration ===

Ok. Time to calibrate the visibility data. Open ''uv-plane-cal.parset'' in a text editor and figure out what it does. Two separate operations will be performed on the data. First, the (uv-plane) complex antenna gains will be fitted. Second, the data will be corrected for these gains. The calibration should take approximately 5 minutes. Of course, if many people are running  at the same time, it may take longer.

Run ''BBS'' using the ''calibrate'' script as follows:
<code>
calibrate -f --cluster-desc lioffen.clusterdesc --db cepmaster0
--db-user postgres --instrument-db instrument L2007_03463_compressed.vds
uv-plane-cal.parset sky.in /data/lofarschool/users/<your directory>
</code>

Make an image of the ''CORRECTED_DATA'' column:
<code>
./make-image.sh L2007_03463_SB10_compressed.MS CORRECTED_DATA uv-plane
</code>

Have a look at ''uv-plane.img'' or ''uv-plane.fits''. The image you've just created should look similar to the result of the previous step. Of course, the images will not be exactly the same. There may be calibration errors. For instance, if gain amplitude has gone to zero during the fitting process, then correcting for it will create an image with a very prominent ringing structure across the entire map. Furthermore, the real sky contains many sources, so a sky model that consists of CasA and CygA is only an approximation.

=== Image-plane calibration ===

In this exercise we will subtract CasA and CygA to have a look at a number of fainter radio sources. Open ''image-plane-cal.parset'' in a text editor and figure out what it does. Three operations will be performed on the data. First, the complex antenna gains in the direction of CasA and CygA will be fitted. Second, CasA and CygA will be subtracted from the visibility data (each corrupted with their own complex gains). Third, the visibility data will be corrected for the complex gains in the direction of CasA. The calibration should take approximately 10 minutes. Of course, if many people are running at the same time, it may take longer.

Run ''BBS'' using the ''calibrate'' script as follows:
<code>
calibrate -f --cluster-desc lioffen.clusterdesc --db cepmaster0
--db-user postgres --instrument-db instrument L2007_03463_compressed.vds
image-plane-cal.parset sky.in /data/lofarschool/users/<your directory>
</code>

Make an image of the ''CORRECTED_DATA'' column:
<code>
./make-image.sh L2007_03463_SB10_compressed.MS CORRECTED_DATA image-plane
</code>

Have a look at ''image-plane.img'' or ''image-plane.fits''. As an example, for ''L2007_03463_SB13_compressed.MS'' the calibrated image looks like this (**NB**. Note the remnant of CygA in the upper left corner):

{{exercise-08-sb13.png|}}


=== Calibration solutions ===

Did the image look reasonable? If not, it may that the fitting has failed for some time samples. In this exercise we will inspect the complex gains obtained during the previous step. The fitted model parameters are stored in a parameter database. By default, this database is stored inside the measurement set as a ''casa'' table called ''instrument''.

To plot the gains we will use a simple Python script (''pylab'' is used for the plotting). First, run the ''ipython'' interpreter in its ''pylab'' aware mode:
<code>
ipython -pylab
</code>

The first time you run ''ipython'' you may be presented with a welcome screen. Just press enter to continue. Now let's plot the complex gains for the x-dipole in the direction of CasA and CygA:
<code>
import plot
help(plot.plot)    # Use 'q' to exit help

plot.plot("L2007_03463_SB10_compressed.MS/instrument", "CasA")
plot.plot("L2007_03463_SB10_compressed.MS/instrument", "CygA")
</code>

You can plot the complex gains for the y-dipole using ''parm="Gain:22"'' as an extra argument.
<code>
plot.plot("L2007_03463_SB10_compressed.MS/instrument", "CasA", parm="Gain:22")
plot.plot("L2007_03463_SB10_compressed.MS/instrument", "CygA", parm="Gain:22")
</code>

The help text of the ''plot.plot()'' function shows a number of options we have not used yet. If you like, try different settings and have a look at the plots.

=== Solution based flagging ===

In this exercise we will try to improve the image by flagging visibility data that has become corrupted due to bad fitting solutions for the antenna gains. In particular, amplitudes close to zero have a large impact on the image (because correcting for such gains creates visibilities with very large amplitudes in the residuals).

Before you start, copy your measurement set. This is important because it is not trivial to undo flagging.
<code>
cp -r L2007_03463_SB10_compressed.MS L2007_03463_SB10_compressed_solflag.MS
</code>

The script we will use (called ''solflag.py'') does two things: It flags visibility data for which the corresponding gain amplitudes fall outside a given range and it flags visibility data based on the (windowed) median of the absolute distance to the median of the gain amplitudes.

The range of valid amplitudes must be supplied by the user. Therefore, we will have another look at the gain plots to pick suitable upper and lower amplitude thresholds. The same thresholds are applied to the gains for both directions (CasA and CygA), so make sure you choose thresholds that are reasonable for both plots.

If you left the ipython interpreter, restart it. Picking thresholds is easiest when the amplitude plots are not stacked (offset).
<code>
import plot

plot.plot("L2007_03463_SB10_compressed_solflag.MS/instrument", "CasA", stack=False)
plot.plot("L2007_03463_SB10_compressed_solflag.MS/instrument", "CygA", stack=False)
</code>

Pick a suitable upper and lower threshold. Typical values are 0.0008 for the upper and 0.000075 for the lower threshold. Substitute the values you picked in the ''solflag.flag()'' command below.
<code>
import solflag
help(solflag.flag)    # Use 'q' to exit help

solflag.flag("L2007_03463_SB10_compressed_solflag.MS",
    "L2007_03463_SB10_compressed_solflag.MS/instrument",
    half_window=5, threshold=3, cutoffLow=<your lower threshold>,
    cutoffHigh=<your upper threshold>)
</code>

The flagging script outputs the percentage of timestamps it flagged for each antenna. After the script finishes, make an image of the ''CORRECTED_DATA'' column.
<code>
./make-image.sh L2007_03463_SB10_compressed_solflag.MS CORRECTED_DATA
image-plane-solflag
</code>

Have a look at ''image-plane-solflag.img'' or ''image-plane-solflag.fits''. Did you see any improvement in the image? You can try to fiddle with the settings of the flagging script to see if you can improve the image further. Remember to start from a clean copy of your measurement set.

=== A deeper image ===

If everything went well, you should now have a quite decent subband image. As a last step, you could try to find other participants who reduced different subbands and average all images to obtain a deeper image.

The ''average.py'' script can be used as a starting point. Open it in a text editor and set the list of images to the images you have available. Then run it as followes:
<code>
python average.py
</code>

Inspect ''averaged.img''. If you averaged images of all 10 measurement sets the output will look like (without solution based flagging):

{{exercise-08-averaged.png|}}

=== Clean up ===

If you want to keep (part of) your work, make sure you copy it to an external machine (for instance using ''scp''). Once this is done, please remove your directory:
<code>
rm -rf /data/lofarschool/users/<your directory>
</code>