Differences

This shows you the differences between two versions of the page.

--- public:user_software:documentation:ndppp [2017-04-20 07:20] – Tammo Jan Dijkema
+++ public:user_software:documentation:ndppp [2021-02-26 14:18] (current) – [DPPP] Tammo Jan Dijkema
@@ Line 1: / Line 1: @@
-===== NDPPP =====
+===== DPPP =====
+==== Important ====
+A newer version of this documentation is available at https://www.astron.nl/citt/DP3
+==== Old documentation ====
 DPPP (the Default Preprocessing Pipeline, previously NDPPP for New Preprocessing Pipeline) is the LOFAR pipelined data processing program. It can be used to do all kinds of operations on the data in a pipelined way, so the data are read and written only once.
@@ Line 27: / Line 34: @@
   * **[[#Counter]]** to count the number of flags per baseline, frequency, and correlation. A flagging step also counts how many visibilities it flagged. Counts can be saved to a table to be plotted later using function ''plotflags'' in python module ''lofar.dppp''.
   * **Data calibration** and **[[#Data scaling]]**
-    * **[[#ApplyCal]]** to apply an existing calibration (from BBS) to a MeasurementSet.
+    * **[[#ApplyCal]]** to apply an existing calibration to a MeasurementSet.
     * **[[#GainCal]]** to calibrate gains using StefCal.
+    * **[[#DDECal]]** to calibrate direction-dependent gains.
     * **[[#Predict]]** to predict the visibilities of a given sky model.
+    * **[[#H5ParmPredict]]** to subtract multiple directions of visibilities corrupted by an instrument model (in H5Parm) generated by DDECal.
     * **[[#ApplyBeam]]** to apply the LOFAR beam model, or the inverse of it.
+    * **[[#SetBeam]]** to set the beam keywords after prediction.
     * **[[#ScaleData]]** to scale the data with a polynomial in frequency (based on SEFD of LOFAR stations).
+    * **[[#Upsample]]** to upsample visibilities in time.
+    * **[[#Intermediate_output_step|Out]]** to add intermediate output steps.
+  * **[[#Interpolate]]** for improving the accuracy of data averaging.
   * **[[#User defined]]** steps provide a plugin mechanism for arbitrary steps implemented in C++.
   * **[[#Python defined]]** steps provide a plugin mechanism for arbitrary steps implemented in Python.
@@ Line 64: / Line 77: @@
 </code>
 where WGHT is the weight put in by RTCP (number of samples used / total number of samples).
-\\ {{ DPPP_weights.pdf | This note}} discusses weighting in some more detail.
+\\ {{:public:user_software:documentation:ndppp_weights.pdf|This note}} discusses weighting in some more detail.
 === Flagging ===
@@ Line 128: / Line 141: @@
   * [[#PhaseShift|Data can be shifted]] to another phase center.
   * A shift step can shift back to the original phase center (by giving an empty center). If that is done by the last shift step, no new MS needs to be created.
+=== Upsample ===
+  * [[#Upsample|Upsampling]] data is useful in at least one case. Consider data integrated over two seconds by a correlator (the AARTFAAC correlator) that sometimes misses one second of data. The times of the visibilities will then look like [0, 2, 4, 7, 9, 12], each with an integration time of 2 seconds. DPPP automatically fills missing time slots, which leads to times [0, 2, 4, 6, 7, 9, 11, 12]. This is still a nonuniform time coverage, which is not desirable. Calling the upsample step with ''timestep=2'' on this data creates times [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], removing the upsampled dummy time slots that overlap with real data (i.e. at 7 and 12). The result is suitable for further processing, e.g. averaging to 10 seconds.
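A minimal parset sketch of the AARTFAAC case above. As in the intermediate output example on this page, the averaging step is named ''average'' and no explicit type is given for it; the MeasurementSet names are hypothetical.
<code>
msin=aartfaac.MS
msout=aartfaac-10s.MS
steps=[upsample,average]
# Split each 2-second time slot into two 1-second time slots
upsample.type=upsample
upsample.timestep=2
# Average the now uniform 1-second time slots to 10 seconds
average.timestep=10
</code>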
 === Station summation ===
@@ Line 157: / Line 173: @@
   * The ''plotflags'' function in the Python module ''lofar.dppp'' can be used to plot those tables. It can plot multiple subbands by giving it a list of table names. The flags per station will be averaged for those subbands.
+=== Intermediate output step ===
+The step ''out'' can write data to disk at an intermediate stage. It takes the same arguments as the [[#Output|'msout' step]]. As an example, the following reduction will flag, save flagged data at high resolution, then average and save the result in another measurement set. On the averaged data, it will also apply a calibration table and save that in the ''CORRECTED_DATA'' column.
+<code>
+msin=L123.MS
+steps=[aoflag,out1,average,out2,applycal]
+# Write out flagged data at full resolution
+out1.type=out
+out1.name=L123-flagged.MS
+average.timestep=4
+# Write out averaged data
+out2.type=out
+out2.name=L123-averaged.MS
+out2.datacolumn=DATA
+applycal.parmdb=instrument.parmdb
+# Write the corrected data to CORRECTED_DATA
+msout=L123-averaged.MS
+msout.datacolumn=CORRECTED_DATA
+</code>
 === User defined step ===
@@ Line 166: / Line 206: @@
 DPPP.
 The name of such a shared library has to be the step type name. DPPP will try to load the
-library libxxx.so (or .dylib on OS-X) for a step type xxx.
+library libdppp_xxx.so (or .dylib on OS-X) for a step type xxx.
 To make this a bit more flexible it is possible to define multiple
@@ Line 186: / Line 226: @@
 === Python defined step ===
 The mechanism described above is used to make it possible to implement a user
-step in Python. The step type has to be ''pythoDPPP'' and the name of the Python module and class containing the code have to be given. DPPP will load the library ''libpythoDPPP.so'', which
+step in Python. The step type has to be ''pythonDPPP'' and the name of the Python module and class containing the code have to be given. DPPP will load the library ''libdppp_pythonDPPP.so'', which
 will start an embedded Python shell, load the module, and instantiate an object of the class.
 \\ A [[engineering:software:tools:DPPP:pythonstep| detailed description]] is available.
@@ Line 268: / Line 308: @@
 ==== Input ====
-| msin \\ msin.name | string | | Name of the input MeasurementSets. If a single name is given, it can be a glob-pattern (like L23456_SAP000_SB*) meaning that all MSs matching the pattern will be used. A glob-pattern can contain *, ?, [], and {} pattern characters (as used in bash). \\ If multiple MSs are to be used, their data are concatenated in frequency, thus multiple subbands are combined to a single band. In principle all MSs should exist, but if 'missingdata=true' and 'orderms=false' flagged zero data will be inserted for missing MS(s) and their frequency info will be deduced from the other MSs. |
-| msin.sort | bool | false | Does the MS need to be sorted in TIME order? |
+|msin \\ msin.name|string| |Name of the input MeasurementSets. If a single name is given, it can be a glob-pattern (like L23456_SAP000_SB*) meaning that all MSs matching the pattern will be used. A glob-pattern can contain *, ?, [], and {} pattern characters (as used in bash). \\ If multiple MSs are to be used, their data are concatenated in frequency, thus multiple subbands are combined to a single band. In principle all MSs should exist, but if 'missingdata=true' and 'orderms=false' flagged zero data will be inserted for missing MS(s) and their frequency info will be deduced from the other MSs.|
-| msin.orderms | bool | true | Do the MSs need to be ordered on frequency? If true, all MSs must exist, otherwise they cannot be ordered. If false, the MSs must be given in order of frequency. |
+|msin.sort|bool|false|Does the MS need to be sorted in TIME order?|
-| msin.missingdata | bool | false | true = it is allowed that a data column in an MS does not exist. In that case its data will be 0 and flagged. It can be useful if the CORRECTED_DATA of subbands are combined, but a BBS run for one of them failed. \\ If 'orderms=false', it also makes it possible that a MS is specified but does not exist.  In such a case flagged data will be used instead. The missing frequency info will be deduced from the other MSs where all MSs have to have the same number of channels and must be defined in order of frequency. |
+|msin.orderms|bool|true|Do the MSs need to be ordered on frequency? If true, all MSs must exist, otherwise they cannot be ordered. If false, the MSs must be given in order of frequency.|
-| msin.baseline | string | | Baselines to be selected (default is all baselines). See [[#Description of baseline selection parameters]]. Only the CASA baseline selection syntax as described in {{msselection.pdf | this note}} can be used. |
+|msin.missingdata|bool|false|true = it is allowed that a data column in an MS does not exist. In that case its data will be 0 and flagged. It can be useful if the CORRECTED_DATA of subbands are combined, but a BBS run for one of them failed. \\ If 'orderms=false', it also makes it possible that an MS is specified but does not exist. In such a case flagged data will be used instead. The missing frequency info will be deduced from the other MSs, in which case all MSs have to have the same number of channels and must be defined in order of frequency.|
-| msin.band | integer | -1 | Band (spectral window) to select (<0 is no selection). This is mainly useful for WSRT data. |
+|msin.baseline|string| |Baselines to be selected (default is all baselines). See [[#description_of_baseline_selection_parameters|Description of baseline selection parameters]]. Only the CASA baseline selection syntax as described in {{:public:user_software:documentation:msselection.pdf| this note}}  can be used.|
-| msin.startchan | integer | 0 | First channel to use from the input MS (channel numbers start counting at 0). Note that skipped channels will not be written into the output MS. It can be an expression with `nchan` (nr of input channels) as parameter. E.g. \\ ''  nchan/32'' \\ will be fine for LOFAR observations with 64 and 256 channels. |
+|msin.band|integer|-1|Band (spectral window) to select (<0 is no selection). This is mainly useful for WSRT data.|
-| msin.nchan | integer | 0 | Number of channels to use from the input MS (0 means till the end). It can be an expression with `nchan` (nr of input channels) as parameter. E.g. \\ ''15*nchan/16'' |
+|msin.startchan|integer|0|First channel to use from the input MS (channel numbers start counting at 0). Note that skipped channels will not be written into the output MS. It can be an expression with `nchan` (nr of input channels) as parameter. E.g. \\  ''nchan/32'' \\ will be fine for LOFAR observations with 64 and 256 channels.|
-| msin.starttime | string | first time in MS | Center of first time slot to use; if < first time in MS, dummy time slots are inserted. A date/time must be specified in the casacore MVTime format, e.g. 19Feb2010/14:01:23.817 |
+|msin.nchan|integer|0|Number of channels to use from the input MS (0 means till the end). It can be an expression with `nchan` (nr of input channels) as parameter. E.g. \\  ''15*nchan/16'' |
-| msin.endtime | string | last time in MS | Center of last time slot to use; if > last time in MS, dummy time slots are inserted. |
+|msin.starttime|string|first time in MS|Center of first time slot to use; if < first time in MS, dummy time slots are inserted. A date/time must be specified in the casacore MVTime format, e.g. 19Feb2010/14:01:23.817|
-| msin.useflag | bool | true | Use the current flags in the MS? If false, all flags in the MS are ignore and the data (except NaN and infinite values) are assumed to be good and will be used in later steps. |
+|msin.starttimeslot|integer|0|Starting time slot. This can be negative to insert flagged time slots before the beginning of the MS.|
-| msin.datacolumn | string | DATA | Data column to use. |
+|msin.endtime|string|last time in MS|Center of last time slot to use; if > last time in MS, dummy time slots are inserted.|
-| msin.weightcolumn | string | WEIGHT_SPECTRUM or WEIGHT | Weight column to use. Defaults to WEIGHT_SPECTRUM if this exists, otherwise the WEIGHT column is used. |
+|msin.ntimes|integer|0|Number of time slots to use (0 means till the end).|
-| msin.modelcolumn | string | MODEL_DATA | Model data column. Currently only used in gaincal |
+|msin.useflag|bool|true|Use the current flags in the MS? If false, all flags in the MS are ignored and the data (except NaN and infinite values) are assumed to be good and will be used in later steps.|
-| msin.autoweight | bool | false | Calculate weights using the auto-correlation data? It is meant for setting the proper weights for a raw LOFAR MeasurementSet. |
+|msin.datacolumn|string|DATA|Data column to use, i.e. the name of the column in which the visibilities are written.|
-| msin.forceautoweight | bool | false | In principle the calculation of the weights should only be done for the raw LOFAR data. It appeared that sometimes the ''autoweight'' switch was accidently set in a DPPP run on already dppp-ed data. To make it harder to make such mistakes, the ''forceautoweight'' flag has to be set as well for MSs containing dppp-ed data. |
+|msin.weightcolumn|string|WEIGHT_SPECTRUM or WEIGHT|Weight column to use. Defaults to WEIGHT_SPECTRUM if this exists, otherwise the WEIGHT column is used.|
+|msin.modelcolumn|string|MODEL_DATA|Model data column. Currently only used in gaincal and ddecal.|
+|msin.autoweight|bool|false|Calculate weights using the auto-correlation data? It is meant for setting the proper weights for a raw LOFAR MeasurementSet.|
+|msin.forceautoweight|bool|false|In principle the calculation of the weights should only be done for the raw LOFAR data. It appeared that sometimes the ''autoweight'' switch was accidentally set in a DPPP run on already dppp-ed data. To make it harder to make such mistakes, the ''forceautoweight'' flag has to be set as well for MSs containing dppp-ed data.|
+\\
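A sketch of the ''msin.startchan'' and ''msin.nchan'' expressions for a single subband MS (file names hypothetical). With 64 input channels, ''nchan/32'' evaluates to 2 and ''15*nchan/16'' to 60, so channels 2 to 61 are used and two channels are dropped at each edge; with 256 channels the values are 8 and 240, so channels 8 to 247 are used. Because ''steps'' is empty, the selected channels are simply copied to the output MS.
<code>
msin=L23456_SAP000_SB000_uv.MS
msin.datacolumn=DATA
# Drop nchan/32 channels at the start and at the end of the band
msin.startchan=nchan/32
msin.nchan=15*nchan/16
msout=L23456_SAP000_SB000_trimmed.MS
steps=[]
</code>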
 ==== Output ====
@@ Line 295: / Line 341: @@
 | msout.clusterdesc | string | "" | If not empty, create the VDS file using this ClusterDesc file. |
 | msout.vdsdir | string | "" | Directory where to put the VDS file; if empty, the MS directory is used. |
-| msout.storagemanager | string | "" | What storage manager to use. When empty (default), the data will be stored uncompressed. When set to "dysco", the data will be compressed. Settings below will set the compression settings; see [[https://github.com/aroffringa/dysco/wiki|the Dysco wiki]] and [[https://arxiv.org/abs/1609.02019|the paper]] for more info. The default settings are reasonably conservative and safe. |
+| msout.storagemanager \\ msout.storagemanager.name| string | "" | What storage manager to use. When empty (default), the data will be stored uncompressed. When set to "dysco", the data will be compressed. Settings below will set the compression settings; see [[https://github.com/aroffringa/dysco/wiki|the Dysco wiki]] and [[https://arxiv.org/abs/1609.02019|the paper]] for more info. The default settings are reasonably conservative and safe. |
 | msout.storagemanager.databitrate | integer | 10 | Number of bits per float used for columns containing visibilities. Can be set to zero to compress weights only. |
-| msout.storagemanager.weightbitrate | integer | 12 | Number of bits per float used for WEIGHT_SPECTRUM column. |
+| msout.storagemanager.weightbitrate | integer | 12 | Number of bits per float used for WEIGHT_SPECTRUM column. Can be set to zero to compress data only. Note that compressing weights will set all polarizations to the same weight (determined by the minimum weight over the polarizations). |
 | msout.storagemanager.distribution | string | "TruncatedGaussian" | Assumed distribution for compression; "Uniform", "TruncatedGaussian", "Gaussian" or "StudentsT".|
 | msout.storagemanager.disttruncation | double | 2.5 | Truncation level for compression with the Truncated Gaussian distribution.|
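The fragment below requests Dysco compression for the output MS and spells out the default values listed in the table above for the four compression parameters; the output name is hypothetical.
<code>
msout=L123-compressed.MS
msout.storagemanager=dysco
# Bits per float for the visibility and WEIGHT_SPECTRUM columns (defaults)
msout.storagemanager.databitrate=10
msout.storagemanager.weightbitrate=12
# Assumed distribution and its truncation level (defaults)
msout.storagemanager.distribution=TruncatedGaussian
msout.storagemanager.disttruncation=2.5
</code>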
@@ Line 310: / Line 356: @@
 | <step>.corrtype | string | "" | Correlation type to match? Must be auto, cross, or an empty string. |
 | <step>.remove | bool | false | If true, the stations not used in any baseline will be removed from the ANTENNA subtable and the antenna ids in the main table will be renumbered accordingly. To have a consistent output MeasurementSet, other subtables (FEED, POINTING, SYSCAL, LOFAR_ANTENNA_FIELD, LOFAR_ELEMENT_FAILURE, and QUALITY_BASELINE_STATISTIC) will also be updated. \\ Note that stations filtered previously (e.g. using msselect) will also be removed, even if no baseline selection is done in the filter step. |
+==== Upsample ====
+| <step>.type | string | | Case-insensitive step type; must be 'upsample'|
+| <step>.timestep | integer |  | Number of time slots into which each input time slot will be split. |
 ==== AOFlagger ====
@@ Line 316: / Line 366: @@
 | <step>.count.path | string | "" | The directory where to create the flag percentages table. If empty, the path of the input MS is used. |
 | <step>.strategy | string | "" | The name of the strategy file to use. If no name is given, the default strategy is used which is fine for HBA. For LBA data the strategy ''LBAdefault'' should be used. \\ A strategy file is looked up as given. If not found, it is looked up in $LOFARROOT/share/rfistrategies that contains the standard strategies. |
-| <step>.memoryperc | integer | 0 | If >0, percentage of the machine's memory to use. If ''memorymax'' nor ''memoryperc'' is given, all memory will be used (minus 2 GB (at most 50%) for other purposes). Accepts only integer values (LOFAR v2.16). |
+| <step>.memoryperc | integer | 0 | If >0, percentage of the machine's memory to use. If neither ''memorymax'' nor ''memoryperc'' is given, all memory will be used (minus 2 GB (at most 50%) for other purposes). Accepts only integer values (LOFAR v2.16). Limiting the available memory too much affects flagging accuracy; in general try to use at least 10 GB of memory. |
-| <step>.memorymax | double | 0 | Maximum amount of memory (in GB) to use. <=0 means no maximum. |
+| <step>.memorymax | double | 0 | Maximum amount of memory (in GB) to use. <=0 means no maximum. As stated above, this affects flagging accuracy.|
 | <step>.timewindow | integer | 0 | Number of time slots to be flagged jointly. The larger the time window, the better the flagging performs. 0 means that it will be deduced from the memory to use. Note that the time window can be extended with an overlap on the left and right side to minimize possible boundary effects.|
 | <step>.overlapperc | double | 0 or 1 | If >0, percentage of time window to be added to the left and right side for overlap purposes (to minimize boundary effects). If ''overlapmax'' is not given, it defaults to 1%. |
@@ Line 345: / Line 395: @@
 | <step>.type | string | | Case-insensitive step type; must be 'demixer' (or 'demix'). |
 | <step>.baseline | string | "" | Baselines to demix. See [[#Description of baseline selection parameters]]. |
-| <step>.blrange | double vector | "" | Baselines to demix. See [[#Description of baseline selection parameters]]. |
+| <step>.blrange | double vector | [] | Baselines to demix. See [[#Description of baseline selection parameters]]. |
 | <step>.corrtype | string | cross | Baselines to demix. Correlation type to match? Must be auto, cross, or an empty string. |
-| <step>.timestep | integer | 1 | Number of time slots to average when subtracting. It is truncated if exceeding the actual number of times. |
+| <step>.timestep | integer | 1 | Number of time slots to average when subtracting. It is truncated if exceeding the actual number of times. Note that the data itself will also be averaged by this amount. |
-| <step>.freqstep | integer | 1 | Number of channels to average when subtracting. It is truncated if exceeding the actual number of channels. |
+| <step>.freqstep | integer | 1 | Number of channels to average when subtracting. It is truncated if exceeding the actual number of channels.  Note that the data itself will also be averaged by this amount. |
 | <step>.demixtimestep | integer | timestep | Number of time slots to average when demixing. It is truncated if exceeding the actual number of times. It defaults to the averaging used for the subtract. |
 | <step>.demixfreqstep | integer | freqstep | Number of channels to average when demixing. It is truncated if exceeding the actual number of channels. It defaults to the averaging used for the subtract. |
@@ Line 372: / Line 422: @@
 | <step>.target.blrange | double vector | [] | Baselines to use in prediction of median target amplitude. See [[#Description of baseline selection parameters]]. |
 | <step>.target.corrtype | string | cross | Baselines to use in prediction of median target amplitude. Correlation type to match? Must be auto, cross, or an empty string. |
-| <step>.timestep | integer | 1 | Number of time slots to average when subtracting. It is truncated if exceeding the actual number of times. |
+| <step>.timestep | integer | 1 | Number of time slots to average when subtracting. It is truncated if exceeding the actual number of times.  Note that the data itself will also be averaged by this amount. |
-| <step>.freqstep | integer | 1 | Number of channels to average when subtracting. It is truncated if exceeding the actual number of channels. |
+| <step>.freqstep | integer | 1 | Number of channels to average when subtracting. It is truncated if exceeding the actual number of channels.  Note that the data itself will also be averaged by this amount. |
 | <step>.demixtimestep | integer | timestep | Number of time slots to average when demixing. It is truncated if exceeding the actual number of times. It defaults to the averaging used for the subtract. |
 | <step>.demixfreqstep | integer | freqstep | Number of channels to average when demixing. It is truncated if exceeding the actual number of channels. It defaults to the averaging used for the subtract. |
@@ Line 382: / Line 432: @@
 | <step>.target.skymodel | string | | The sky model of the target. It is the name of the SourceDB to use (i.e., the output of makesourcedb). |
 | <step>.target.delta | double | 60 | Angular distance uncertainty (in arcsec) to determine if an A-team source is at the same position as a target source. |
-| <step>.instrumentmodel | string | instrument | The name of the ParmDB to use. The ParmDB does not need to exist. If it does not exist it will be created. \\ Note that the ParmDB is created after the output MS is created, so it can be a subdirectory of the output MS. |
+| <step>.instrumentmodel | string | instrument | The name of the ParmDB to use. The ParmDB does not need to exist. If it does not exist it will be created. \\ Note that the ParmDB is created after the output MS is created, so it can be a subdirectory of the output MS.|
 | <step>.sources | string vector | "" | Names of the A-team sources to use. If none are given, all sources in the A-team sky model will be used. |
 | <step>.ateam.threshold | double | 50 for LBA \\ 5 for HBA | Take a source/baseline into account if its maximum estimated amplitude > threshold. |
@@ Line 444: / Line 494: @@
 | .uvmmax | double | -1 | If uvmmax > 0, baselines with UV-distance > uvmmax meter will match. |
 | .freqrange | string vector | [] | Channels in the given frequency ranges will match. Each value in the vector is a range which can be given as start..end or start+-delta. A value can be followed by a unit like KHz. If only one value in a range has a unit, the unit is also applied to the other value. If a range has no unit, it defaults to MHz. For example: ''freqrange=[1.2 .. 1.4 MHz, 1.8MHz+-50KHz]'' flags channels between 1.2MHz and 1.4MHz and between 1.75MHz and 1.85MHz. The example shows that blanks can be used at will. |
-| .chan | string vector | [] | The given channels will match (start counting at 0). Channels exceeding the number of channels are ignored. Similar to ''msin'', it is possible to specify the channels as an expression of ''nchan''. Furthermore, .. can be used to specify ranges. For example: ''chan=[0..nchan/32-1, 31*nchan/32..nchan]'' to flag the first and last 2 or 8 channels (depending on 64 or 256 channels in the observation). |
+| .chan | string vector | [] | The given channels will match (start counting at 0). Channels exceeding the number of channels are ignored. Similar to ''msin'', it is possible to specify the channels as an expression of ''nchan''. Furthermore, .. can be used to specify ranges. For example: ''chan=[0..nchan/32-1, 31*nchan/32..nchan-1]'' to flag the first and last 2 or 8 channels (depending on 64 or 256 channels in the observation). |
 | .amplmin | float vector | -1e30 | Correlation data with amplitude < amplmin will match. It can be given per correlation. For example, ''amplmin=[100,,,100]'' matches data points with XX or YY amplitude < 100. The non-specified amplitudes get the default value. \\ It is also possible to give a single value (without brackets) meaning that it is used as the minimum for all correlations. |
 | .amplmax | float vector | 1e30 | Correlation data with amplitude > amplmax will match. |
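Working out the ''chan'' example above: with 64 channels, ''nchan/32-1'' is 1 and ''31*nchan/32'' is 62, so channels 0-1 and 62-63 are flagged; with 256 channels the flagged ranges are 0-7 and 248-255. A preflagger step doing this could look as follows. The step name ''flagedges'' is hypothetical, and the step type ''preflagger'' is an assumption since the type row is not part of this diff.
<code>
steps=[flagedges]
flagedges.type=preflagger
# Flag nchan/32 channels at each edge of the band
flagedges.chan=[0..nchan/32-1, 31*nchan/32..nchan-1]
</code>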
@@ Line 456: / Line 506: @@
 ==== ApplyCal ====
 | <step>.type | string | | Case-insensitive step type; must be 'applycal' (or 'correct'). |
-| <step>.parmdb | string | | Path of parmdb in which the parameters are stored. |
+| <step>.parmdb | string | | Path of parmdb in which the parameters are stored. This can also be an H5Parm file; in that case the filename has to end in '.h5'. |
-| <step>.correction | string | gain | Type of correction to perform, can be one of 'gain', 'tec', 'clock', 'commonrotationangle', 'commonscalarphase', 'commonscalaramplitude' or 'rotationmeasure' (create multiple ApplyCal steps for multiple corrections). |
+| <step>.solset | string | "" | In case of applying an H5Parm file: the name of the solset to be used. If empty, defaults to the name of the one solset present in the H5Parm (if more than one solset is present and solset is left empty, an error is thrown). |
+| <step>.correction | string | gain | Type of correction to perform, can be one of 'gain', 'tec', 'clock', '(common)rotationangle' / 'rotation', '(common)scalarphase', '(common)scalaramplitude' or 'rotationmeasure' (create multiple ApplyCal steps for multiple corrections). When using H5Parm, this is for now the name of the soltab; the type will be deduced from the metadata in that soltab, except for full Jones, in which case correction should be 'fulljones'.  |
+| <step>.soltab | string vector | from correction | The name or names of the H5 soltab. Currently only used when correction=fulljones, in which case soltab should list two names (amplitude and phase soltab). |
+| <step>.direction | string | "" | If using H5Parm, the direction of the solution to use |
 | <step>.updateweights | bool | false | Update the weights column, in a way consistent with the weights being inverse proportional to the autocorrelations (e.g. if 'autoweights' was used before). |
+| <step>.interpolation | string | nearest | If using H5Parm, the type of interpolation (in time and frequency) to use, can be one of 'nearest' or 'linear'. |
 | <step>.invert | bool | true | Invert the corrections, to correct the data. Default is true. If you want to corrupt the data, set it to 'false' |
 | <step>.timeslotsperparmupdate | int | 100 | Number of time slots to handle after one read of the parameter file. Optimization to prevent spurious reading from the parmdb. |
+| <step>.steps | list | [] | (new in version 3.1) ApplyCal substeps, e.g. [myApplyCal1, myApplyCal2]. Their parameters can be specified through e.g. <step>.myApplyCal1.correction=tec. If a parameter is not given for the substep, it takes the value from ''<step>.''. |
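A sketch of applying phase and amplitude solutions from one H5Parm with two ApplyCal substeps (LOFAR 3.1 or later). The file, solset, soltab and substep names are hypothetical. ''parmdb'', ''solset'' and ''interpolation'' are given once at the ''applycal.'' level and are inherited by both substeps, while each substep sets its own ''correction'' (the soltab name).
<code>
steps=[applycal]
applycal.type=applycal
# The .h5 extension makes ApplyCal read the file as an H5Parm
applycal.parmdb=solutions.h5
applycal.solset=sol000
applycal.interpolation=nearest
# Substeps inherit the parameters above unless they override them
applycal.steps=[applyphase,applyamp]
applycal.applyphase.correction=phase000
applycal.applyamp.correction=amplitude000
</code>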
 ==== GainCal ====
 | <step>.type | string | | Case-insensitive step type; must be 'gaincal' or 'calibrate'. |
 | <step>.caltype | string | | The type of calibration that needs to be performed, can be one of 'fulljones', 'diagonal', 'phaseonly', 'scalarphase'. Experimental values are 'amplitude' or 'scalaramplitude', 'tec', 'tecandphase' |
-| <step>.parmdb | string | | Path of parmdb in which the computed parameters are to be stored. If the parmdb already exists, it will be overwritten. **Note**: You cannot use this parmdb in an applycal step in the same run of DPPP. To apply the solutions of the gaincal directly, use 'gaincal.applysolution' (see below)|
+| <step>.parmdb | string | | Path of parmdb in which the computed parameters are to be stored. If the parmdb already exists, it will be overwritten. **Note**: You cannot use this parmdb in an applycal step in the same run of DPPP. To apply the solutions of the gaincal directly, use 'gaincal.applysolution' (see below). **New in LOFAR 3.1:** if the parmdb name ends in ''.h5'', an H5Parm will be written.|
 | <step>.blrange | vector | | Vector of baseline lengths to use for calibration. See [[#Description of baseline selection parameters]]. New in version 2.20 |
+| <step>.uvlambdamin | double | 0 | Ignore baselines / channels with UV < uvlambdamin wavelengths. **Note**: also all other variants of uv flagging described in [[#UVWFlagger]] (uvmmin, uvmrange, uvlambdarange, etc) are supported (New in 3.1)|
 | <step>.baseline | string | | Baseline selection filter for calibration. See [[#Description of baseline selection parameters]]. New in version 2.20 |
 | <step>.applysolution | bool | false | Apply the calibration solution to the visibilities. Note that you should always also inspect the parmdb afterwards to check that the solutions look reasonable. |
 | <step>.solint | int | 1 | Number of time slots on which a solution is assumed to be constant (same as CellSize.Time in BBS). 0 means all time slots. Note that for larger settings of solint, and especially for solint = 0, the memory usage of gaincal will be large (all visibilities for a solint should fit in memory).|
-| <step>.nchan | int | 0 | Number of channels on which a solution is assumed to be constant (same as CellSize.Freq in BBS). 0 means all channels. |
+| <step>.nchan | int | 0 | Number of channels on which a solution is assumed to be constant (same as CellSize.Freq in BBS). 0 means all channels. When caltype = 'tec' or 'tecandphase', the default is 1, meaning that a TEC will be fitted through a phase for each channel. |
 | <step>.usemodelcolumn | bool | false | Use model column. The model column name can be specified with msin.modelcolumn (default MODEL_DATA) |
 | <step>.applybeamtomodelcolumn | bool | false | Apply the beam model (at the phase center) to the visibilities in the model column. If this option is true, all options from [[#applybeam]] are valid as well (except .invert, since the model data will always be corrupted for the beam)|
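A minimal GainCal sketch that solves for diagonal gains against the model column (''MODEL_DATA'' by default, see ''msin.modelcolumn''), writes the solutions to an H5Parm (LOFAR 3.1 or later) and applies them directly. The file name and the ''solint'' and ''uvlambdamin'' values are illustrative assumptions, not recommendations.
<code>
steps=[gaincal]
gaincal.type=gaincal
gaincal.caltype=diagonal
gaincal.usemodelcolumn=true
# One solution per 10 time slots, constant over all channels
gaincal.solint=10
gaincal.nchan=0
# Ignore visibilities with a UV distance below 100 wavelengths (illustrative)
gaincal.uvlambdamin=100
# Written as an H5Parm because the name ends in .h5
gaincal.parmdb=instrument.h5
gaincal.applysolution=true
</code>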
@@ Line 483: / Line 538: @@
 | <step>.sources | | | Same as in **Predict** step |
 | <step>.usebeammodel | | | Same as in **Predict** step |
-| <step>.operation | | | Same as in **Predict** step |
 | <step>.applycal.* | | | ApplyCal sub-step, same as in **Predict** step |
 | <step>.onebeamperpatch | | | Same as in **ApplyBeam** step |
 | <step>.usechannelfreq | | | Same as in **ApplyBeam** step |
 | <step>.beammode | | | Same as in **ApplyBeam** step |
+==== DDECal ====
+|<step>.type|string| |Case-insensitive step type; must be 'ddecal'.|
+|<step>.sourcedb|string| |Sourcedb (created with `makesourcedb`) with the sky model to calibrate on.|
+|<step>.directions|list|[]|List of directions to calibrate on. Every element of this list should be a list of facets. Default: every facet is a direction.|
+|<step>.usemodelcolumn|bool|false|Use model data from the measurement set. This implies solving for one direction, namely the pointing of the measurement set. If usemodelcolumn is true, directions and sourcedb are not required.|
+|<step>.maxiter|int|50|Maximum number of iterations.|
+|<step>.detectstalling|bool|true|Stop iterating when no improvement is measured anymore (after a minimum of 30 iterations).|
+|<step>.stepsize|double|0.2|Step size between iterations.|
+|<step>.h5parm|string| |Filename of output H5Parm (to be read by e.g. losoto). If empty, defaults to ''instrument.h5'' within the measurement set.|
+|<step>.solint|int|1|Solution interval in timesteps.|
+|<step>.usebeammodel|bool|false|Use the beam model. All beam-related options of the Predict step are also valid.|
+|<step>.mode|string|diagonal|Type of constraint to apply. Options are scalarcomplexgain, scalarphase, scalaramplitude, tec, tecandphase. Modes in development are fulljones, diagonal, phaseonly, amplitudeonly, rotation, rotation+diagonal.|
+|<step>.tolerance|double|1e-5|Controls the accuracy to be reached: when the normalized solutions move less than this value, the solutions are considered to be converged and the algorithm finishes. Lower values will cause more iterations to be performed.|
+|<step>.minvisratio|double|0|Minimum fraction of unflagged visibilities within a solution interval, e.g. 0.6 for at least 60% unflagged visibilities. Intervals with fewer unflagged visibilities will be flagged.|
+|<step>.propagatesolutions|bool|false|Initialize solver with the solutions of the previous time slot.|
+|<step>.propagateconvergedonly|bool|false|Propagate solutions of the previous time slot only if the solve converged. Only effective when propagatesolutions=true.|
+|<step>.flagunconverged|bool|false|Flag unconverged solutions (i.e., those from solves that did not converge within maxiter iterations).|
+|<step>.flagdivergedonly|bool|false|Flag only the unconverged solutions for which divergence was detected. At the moment, this option is effective only for rotation+diagonal solves, where divergence is detected when the amplitudes of any station are found to be more than a factor of 5 from the mean amplitude over all stations. If divergence for any one station is detected, all stations are flagged for that solution interval. Only effective when flagunconverged=true and mode=rotation+diagonal.|
+|<step>.approximatetec|bool|false|Uses an approximation stage in which the phases are constrained with the piece-wise fitter, to avoid getting stuck in local minima. Only effective when mode=tec or mode=tecandphase.|
+|<step>.maxapproxiter|int|maxiter/2|Maximum number of iterations during approximating stage.|
+|<step>.approxchunksize|int|0|Chunk size, in number of channels, used for the fit during the approximation stage. With approxchunksize=1 the constraint is disabled during the approx stage (so channels are solved for independently). Once converged, the solutions are constrained and more iterations are performed until that has converged too. The default is approxchunksize=0, which calculates the chunksize from the bandwidth (resulting in 10 chunks per octave of bandwidth).|
+|<step>.approxtolerance|double|tolerance*10|Tolerance at which the approximating first stage is considered to be converged and the second full-constraining stage is started. The second stage converges when the tolerance set by the 'tolerance' keyword is reached. Setting approxtolerance to lower values will cause more approximating iterations. Since tolerance is by default 1e-5, approxtolerance is by default 1e-4.|
+|<step>.nchan|int|1|Number of channels in each channel block, for which the solution is assumed to be constant. The default is 1, meaning one solution per channel (or in the case of constraints, fitting the constraint over all channels individually). 0 means one solution for the whole channel range. If the total number of channels is not divisible by nchan, some channel blocks will become slightly larger.|
+|<step>.coreconstraint|double|0|Distance in meters. When nonzero, all stations within the given distance from the reference station (0) will be constrained to have the same solution.|
+|<step>.antennaconstraint|list|[]|A list of lists specifying groups of antennas that are to be constrained to have the same solution. Example: "[ [CS002HBA0,CS002HBA1],[CS003HBA0,CS003HBA1] ]" will keep the solutions of CS002HBA0 and CS002HBA1 the same, and likewise for CS003HBA0 and CS003HBA1.|
+|<step>.smoothnessconstraint|double|0|Kernel size in Hz. When nonzero, will constrain the solutions to be smooth over frequency by convolving the solutions with a kernel of the given size (bandwidth). The default kernel is a Gaussian kernel, and the kernel size parameter is the 3 sigma point where the kernel is cut off.|
+|<step>.statfilename|string| |File to write the step-sizes to. Form of the file is: "<iterationnr> <normalized-stepsize> <unnormalized-stepsize>", and all solution intervals are concatenated. File is not written when this parameter is empty.|
+|<step>.uvlambdamin|double|0|Ignore baselines / channels with UV < uvlambdamin wavelengths. **Note**: also all other variants of uv flagging described in [[#uvwflagger|UVWFlagger]] (uvmmin, uvmrange, uvlambdarange, etc) are supported (New in 3.1).|
+|<step>.subtract|bool|false|Subtracts the corrected model from the data. **NOTE** This may not work when you apply a uv-cut.|
+|<step>.useidg|bool|false|Do image-based prediction using IDG.|
+|<step>.idg.images|list|[]|Filename of ''.fits'' model images, one per frequency term. The terms are defined as for a polynomial source spectra (not logarithmic), e.g. see [[https://sourceforge.net/p/wsclean/wiki/ComponentList/|this WSClean page]]. The frequency in the metadata of the fits files is used as nu<sub>0</sub> in the polynomial evaluation.|
+|<step>.idg.regions|string|""|DS9 regions file describing the facets for IDG prediction.|
+|<step>.idg.buffersize|int|Based on memory|Number of timesteps to use for each IDG buffer.|
+|<step>.savefacets|bool|false|Write out each facet as a fits file (named facet<N>.fits). Only useful when useidg=true.|
+|<step>.onlypredict|bool|false|Instead of solving, output the predicted visibilities. This is useful for testing, although for faceted prediction with IDG it can also be a fast way of predicting in certain cases.|
+|<step>.applycal.*| | |ApplyCal sub-step, same as in Predict step. One can pass an h5parm with as many directions as set in "directions" and each direction model is corrupted accordingly.|
+\\
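The ''nchan'' parameter groups channels into blocks that share one solution. A minimal sketch of such a partitioning, assuming the channels are spread as evenly as possible over ''n_channels // nchan'' blocks (the helper ''channel_blocks'' is hypothetical and not DP3's implementation):

```python
def channel_blocks(n_channels: int, nchan: int) -> list:
    """Split n_channels into solution blocks of roughly nchan channels each.

    Hypothetical helper for illustration; not DP3's implementation.
    nchan == 0 means a single block covering all channels.
    """
    if n_channels <= 0:
        return []
    if nchan <= 0:
        return [range(0, n_channels)]
    # Number of blocks; at least one, even if nchan > n_channels.
    n_blocks = max(1, n_channels // nchan)
    # Distribute the channels as evenly as possible over the blocks, so
    # leftover channels make some blocks slightly larger than nchan.
    base, extra = divmod(n_channels, n_blocks)
    blocks = []
    start = 0
    for i in range(n_blocks):
        size = base + (1 if i < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


# Example: 10 channels with nchan=3 gives blocks of 4, 3 and 3 channels.
print([len(b) for b in channel_blocks(10, 3)])  # prints [4, 3, 3]
```

Which blocks receive the extra channels is an assumption here; DP3 may distribute them differently.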
 ==== Predict ====
@@ Line 493: / Line 588: @@
 | <step>.sourcedb | string | | Path of sourcedb in which a sky model is stored (the output of makesourcedb)|
 | <step>.sources | string vector | [] | Patches to use in the predict step of the calibration |
+| <step>.usebeammodel | bool | false | Use the LOFAR beam in the predict part of the calibration |
+| <step>.operation | string | replace | Should the predicted visibilities replace those being processed (''replace'', default), should they be subtracted from those being processed (''subtract'') or added to them (''add'') |
+| <step>.applycal.* | | | Set of options for applycal to apply to this predict. For this applycal-substep, .invert is off by default, so the predicted visibilities will be corrupted with the parmdb |
+| <step>.onebeamperpatch | | | Same as in **ApplyBeam** step |
+| <step>.usechannelfreq | | | Same as in **ApplyBeam** step |
+| <step>.beammode | | | Same as in **ApplyBeam** step |
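The three values of ''operation'' correspond to simple element-wise arithmetic on the visibilities. A minimal sketch, assuming the data and the predicted model are NumPy arrays of the same shape (''combine_prediction'' is a hypothetical helper, not part of DP3):

```python
import numpy as np


def combine_prediction(data: np.ndarray, model: np.ndarray, operation: str = "replace") -> np.ndarray:
    """Combine predicted visibilities with the visibilities being processed.

    Hypothetical helper mirroring the three values of <step>.operation.
    """
    if data.shape != model.shape:
        raise ValueError("data and model must have the same shape")
    if operation == "replace":
        return model.copy()   # output is the prediction only
    if operation == "subtract":
        return data - model   # e.g. remove a bright source from the data
    if operation == "add":
        return data + model   # e.g. add model visibilities to the data
    raise ValueError(f"unknown operation: {operation!r}")
```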
+==== H5ParmPredict ====
+| <step>.type | string | | Case-insensitive step type; must be 'h5parmpredict' |
+| <step>.sourcedb | string | | Path of sourcedb in which a sky model is stored (the output of makesourcedb)|
+| <step>.applycal.parmdb | string | | Path of the h5parm in which the corruptions are stored |
+| <step>.applycal.correction | string | | SolTab which contains the directions to be predicted, or "fulljones".|
+| <step>.directions | string vector | [] | List of directions to include. Each of those directions needs to be in the h5parm soltab. If empty, all directions in the soltab are predicted. The names of the directions need to look like ''[dir1,dir2]'', where ''dir1'' and ''dir2'' are patches in the sourcedb; this is the convention DDECal uses for naming directions in an H5Parm. This parameter can be used to predict / subtract a subset of the directions. |
 | <step>.usebeammodel | bool | false | Use the LOFAR beam in the predict part of the calibration |
 | <step>.operation | string | replace | Should the predicted visibilities replace those being processed (''replace'', default), should they be subtracted from those being processed (''subtract'') or added to them (''add'') |
@@ Line 502: / Line 610: @@
 ==== ApplyBeam ====
 | <step>.type | string | | Case-insensitive step type; must be 'applybeam' |
-| <step>.onebeamperpatch | bool | true | Compute the beam only for the center of each patch (saves computation time, but you should set this to false for large patches. This option is only useful if the beam is applied as part of a [[#predict]] step. |
+| <step>.direction | string vector | [] | An RA/Dec value specifying in what direction to correct the beam. See phaseshift.phasecenter for syntax. If empty, the beam is corrected in the direction of the current phase center. |
+| <step>.onebeamperpatch | bool | false | Compute the beam only for the center of each patch (saves computation time, but you should set this to false for large patches). In the ApplyBeam step, this setting does not make sense (but it does if the applybeam is part of predict, ddecal, gaincal, h5parmpredict, etc.). Generally, FALSE is the right setting for this option. The default has changed to false in a recent (Nov 2018) version. |
 | <step>.usechannelfreq | bool | **true** | Compute the beam for each channel of the measurement set separately. This is useful for merged / concatenated measurement sets. For raw LOFAR data you should set it to false, so that the beam will be formed as in the station hardware. Also, setting it to false is faster. |
 | <step>.updateweights | bool | false | Update the weights column, in a way consistent with the weights being inverse proportional to the autocorrelations (e.g. if 'autoweights' was used before). |
 | <step>.invert | bool | **true** | Invert the beam. When applying the beam to transfer calibration solutions, this should be true. In other words: ''invert=true'' means correcting for the beam, ''invert=false'' means corrupting with the beam. When using the beam in a predict (or gaincal) step, this option defaults to ''false'' (so it will corrupt with the beam). |
 | <step>.beammode | string | "default" | Beam mode to apply, can be "array_factor", "element" or "default". Default is to apply both the element beam and the array factor. |
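The difference between ''invert=true'' (correcting) and ''invert=false'' (corrupting) can be illustrated with the standard 2x2 Jones-matrix form of the measurement equation: corrupting the visibility matrix of baseline p-q multiplies it by the beam of station p on the left and by the conjugate transpose of the beam of station q on the right; correcting applies the inverses. A minimal sketch, assuming NumPy and per-station 2x2 beam matrices; ''apply_beam'' is hypothetical and ignores options such as ''beammode'' and ''updateweights'':

```python
import numpy as np


def apply_beam(vis: np.ndarray, beam_p: np.ndarray, beam_q: np.ndarray, invert: bool = True) -> np.ndarray:
    """Correct (invert=True) or corrupt (invert=False) one 2x2 visibility matrix.

    vis:    2x2 complex visibility matrix of baseline p-q.
    beam_p: 2x2 complex beam Jones matrix of station p.
    beam_q: 2x2 complex beam Jones matrix of station q.
    """
    if invert:
        # Correct: V_corr = B_p^-1 V (B_q^-1)^H
        return np.linalg.inv(beam_p) @ vis @ np.linalg.inv(beam_q).conj().T
    # Corrupt: V_obs = B_p V B_q^H
    return beam_p @ vis @ beam_q.conj().T
```

Corrupting and then correcting with the same beams returns the original visibilities, which is why the two modes are each other's inverse.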
+==== SetBeam ====
+SetBeam is an expert option and should only be used in rare cases. It allows direct manipulation of the beam keywords for a column in a measurement set. Normally, DP3 registers whether the visibilities in a column are corrected for a beam and, if so, in which direction the beam was corrected. This avoids incorrect corrections or scaling by the beam. However, certain actions can change the scaling of the visibilities without changing the beam keywords, in particular predicting (either with DP3 or with another tool). When predicting a single source without applying the beam, the visibilities are effectively 'corrected' for the beam in the direction of the source. In that situation, SetBeam can be used to update the beam keywords: set ''direction'' to the source direction and ''beammode'' to ''default''.
+| <step>.type | string | | Case-insensitive step type; must be 'setbeam' |
+| <step>.direction | string vector | [] | An RA/Dec value specifying in what direction the beam is corrected. |
+| <step>.beammode | string | "default" | Beam mode to apply, can be "array_factor", "element" or "default". Default means that sources in the given direction have corrected (intrinsic) flux values, i.e. they are corrected for the full beam. |
 ==== UVWFlagger ====
-| <step>.type | string | | Case-insensitive step type; must be 'uvwflagger' or 'uvwflag'. |
-| <step>.count.save | bool | false | If true, the flag percentages per frequency are saved to a table with extension ''.flagfreq'' and percentages per station to a table with extension ''.flagstat''. The basename of the table is the MS name (without extension) followed by the stepname and extension. |
+|<step>.type|string| |Case-insensitive step type; must be 'uvwflagger' or 'uvwflag'.|
-| <step>.count.path | string | "" | The directory where to create the flag percentages table. If empty, the path of the input MS is used. |
+|<step>.count.save|bool|false|If true, the flag percentages per frequency are saved to a table with extension ''.flagfreq'' and percentages per station to a table with extension ''.flagstat''. The basename of the table is the MS name (without extension) followed by the stepname and extension.|
-| <step>.uvmrange | string vector | [] | Flag baselines with UV within one the given ranges (in meters). Delimiters .. and +- can be used to specify a range. E.g., ''uvmrange = [20..30, 40+-5]'' flags baselines with UV in range 20-30 meter and 35-45 meter. |
+|<step>.count.path|string|""|The directory where to create the flag percentages table. If empty, the path of the input MS is used.|
-| <step>.uvmmin | double | 0 | Flag baselines with UV < uvmmin meter. |
+|<step>.uvmrange|string vector|[]|Flag baselines with UV within one of the given ranges (in meters). Delimiters .. and +- can be used to specify a range. E.g., ''uvmrange = [20..30, 40+-5]'' flags baselines with UV in range 20-30 meter and 35-45 meter.|
-| <step>.uvmmax | double | 1e15 | Flag baselines with UV > uvmmax meter. |
+|<step>.uvmmin|double|0|Flag baselines with UV < uvmmin meter.|
-| <step>.umrange | string vector | [] | Flag baselines with U within one of the given ranges (in meters). |
+|<step>.uvmmax|double|1e15|Flag baselines with UV > uvmmax meter.|
-| <step>.ummin | double | 0 | Flag baselines with U < ummin meter. |
+|<step>.umrange|string vector|[]|Flag baselines with U within one of the given ranges (in meters).|
-| <step>.ummax | double | 1e15 | Flag baselines with U > ummax meter. |
+|<step>.ummin|double|0|Flag baselines with U < ummin meter.|
-| <step>.vmrange | string vector | [] | Flag baselines with V within one of the given ranges (in meters). |
+|<step>.ummax|double|1e15|Flag baselines with U > ummax meter.|
-| <step>.vmmin | double | 0 | Flag baselines with V < vmmin meter. |
+|<step>.vmrange|string vector|[]|Flag baselines with V within one of the given ranges (in meters).|
-| <step>.vmmax | double | 1e15 | Flag baselines with V > vmmax meter. |
+|<step>.vmmin|double|0|Flag baselines with V < vmmin meter.|
-| <step>.wmrange | string vector | [] | Flag baselines with W within one of the given ranges (in meters). |
+|<step>.vmmax|double|1e15|Flag baselines with V > vmmax meter.|
-| <step>.wmmin | double | 0 | Flag baselines with W < wmmin meter. |
+|<step>.wmrange|string vector|[]|Flag baselines with W within one of the given ranges (in meters).|
-| <step>.wmmax | double | 1e15 | Flag baselines with W > wmmax meter. |
+|<step>.wmmin|double|0|Flag baselines with W < wmmin meter.|
-| <step>.uvlambdarange | string vector | [] | Flag baselines/channels with UV within one the given ranges (in wavelengths). Delimiters .. and +- can be used to specify a range. E.g., ''uvlambdarange = [20..30, 40+-5]'' flags baselines/channels with UV in range 20-30 wavelengths and 35-45 wavelengths. |
+|<step>.wmmax|double|1e15|Flag baselines with W > wmmax meter.|
-| <step>.uvlambdamin | double | 0 | Flag baselines/channels with UV < uvlambdamin wavelengths |
+|<step>.uvlambdarange|string vector|[]|Flag baselines/channels with UV within one of the given ranges (in wavelengths). Delimiters .. and +- can be used to specify a range. E.g., ''uvlambdarange = [20..30, 40+-5]'' flags baselines/channels with UV in range 20-30 wavelengths and 35-45 wavelengths.|
-| <step>.uvlambdamax | double | 1e15 | Flag baselines/channels with UV > uvlambdamax wavelengths |
+|<step>.uvlambdamin|double|0|Flag baselines/channels with UV < uvlambdamin wavelengths|
-| <step>.ulambdarange | string vector | [] | Flag baselines/channels with U within one the given ranges (in wavelengths). |
+|<step>.uvlambdamax|double|1e15|Flag baselines/channels with UV > uvlambdamax wavelengths|
-| <step>.ulambdamin | double | 0 | Flag baselines/channels with U < ulambdamin wavelengths |
+|<step>.ulambdarange|string vector|[]|Flag baselines/channels with U within one the given ranges (in wavelengths).|
-| <step>.ulambdamax | double | 1e15 | Flag baselines/channels with U > ulambdamax wavelengths |
+|<step>.ulambdamin|double|0|Flag baselines/channels with U < ulambdamin wavelengths|
-| <step>.vlambdarange | string vector | [] | Flag baselines/channels with V within one the given ranges (in wavelengths). |
+|<step>.ulambdamax|double|1e15|Flag baselines/channels with U > ulambdamax wavelengths|
-| <step>.vlambdamin | double | 0 | Flag baselines/channels with V < vlambdamin wavelengths |
+|<step>.vlambdarange|string vector|[]|Flag baselines/channels with V within one the given ranges (in wavelengths).|
-| <step>.vlambdamax | double | 1e15 | Flag baselines/channels with V > vlambdamax wavelengths |
+|<step>.vlambdamin|double|0|Flag baselines/channels with V < vlambdamin wavelengths|
-| <step>.wlambdarange | string vector | [] | Flag baselines/channels with W within one the given ranges (in wavelengths). |
+|<step>.vlambdamax|double|1e15|Flag baselines/channels with V > vlambdamax wavelengths|
-| <step>.wlambdamin | double | 0 | Flag baselines/channels with W < wlambdamin wavelengths |
+|<step>.wlambdarange|string vector|[]|Flag baselines/channels with W within one the given ranges (in wavelengths).|
-| <step>.wlambdamax | double | 1e15 | Flag baselines/channels with W > wlambdamax wavelengths |
+|<step>.wlambdamin|double|0|Flag baselines/channels with W < wlambdamin wavelengths|
-| <step>.phasecenter | string vector | [] | If given, use this phase center to calculate the UVW coordinates to flag on. The vector can consist of 1, 2 or, 3 values. If one value is given, it must be the name of a moving source (e.g. SUN or JUPITER). Otherwise the first two values must contain a source position that can be given in sexagesimal format or as a value followed by a unit. The third value can contain the direction type; it defaults to J2000. Possible types are GALACTIC, ECLIPTIC, SUPERGAL, J2000, B1950 (as defined in the casacore ''Measures'' system). |
+|<step>.wlambdamax|double|1e15|Flag baselines/channels with W > wlambdamax wavelengths|
+|<step>.phasecenter|string vector|[]|If given, use this phase center to calculate the UVW coordinates to flag on. The vector can consist of 1, 2, or 3 values. If one value is given, it must be the name of a moving source (e.g. SUN or JUPITER). Otherwise the first two values must contain a source position that can be given in sexagesimal format or as a value followed by a unit. The third value can contain the direction type; it defaults to J2000. Possible types are GALACTIC, ECLIPTIC, SUPERGAL, J2000, B1950 (as defined in the casacore ''Measures'' system).|
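The range syntax (''a..b'' or ''c+-d'') and the meter/wavelength variants of the UVWFlagger can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, range boundaries are treated as inclusive, and the conversion to wavelengths uses the channel frequency (UV length times frequency divided by the speed of light). It is not DP3's parser.

```python
import math

SPEED_OF_LIGHT = 299792458.0  # m/s


def parse_range(text: str) -> tuple:
    """Parse 'low..high' or 'centre+-halfwidth' into a (low, high) tuple."""
    text = text.strip()
    if ".." in text:
        low, high = text.split("..", 1)
        return float(low), float(high)
    if "+-" in text:
        centre, half_width = (float(x) for x in text.split("+-", 1))
        return centre - half_width, centre + half_width
    raise ValueError(f"not a range: {text!r}")


def parse_range_list(text: str) -> list:
    """Parse a parset-style list such as '[20..30, 40+-5]'."""
    inner = text.strip().lstrip("[").rstrip("]")
    return [parse_range(item) for item in inner.split(",") if item.strip()]


def uv_length_m(u: float, v: float) -> float:
    """Projected baseline length in the uv-plane, in meters."""
    return math.hypot(u, v)


def to_wavelengths(length_m: float, freq_hz: float) -> float:
    """Convert a length in meters to wavelengths at the given frequency."""
    return length_m * freq_hz / SPEED_OF_LIGHT


def in_any_range(value: float, ranges: list) -> bool:
    """True if value lies in one of the (inclusive) ranges."""
    return any(low <= value <= high for low, high in ranges)
```

For example, ''parse_range_list("[20..30, 40+-5]")'' yields the ranges 20-30 and 35-45, matching the ''uvmrange'' example in the table.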
+==== Split ====
+|<step>.type|string| |Case-insensitive step type; must be 'split' or 'explode'|
+|<step>.steps|string vector|[]|List of next steps; each step will run after this step. E.g. ''[average, msout]'' |
+|<step>.replaceparms|string vector|[]|The sub-step parameters that should differ between the output branches. Each parameter listed here must then be given as a list, with one value per branch, instead of a single value. E.g. ''[average.timestep, msout.name]'' |
+\\
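One way to picture ''replaceparms'' is as an expansion of list-valued parameters into one parameter set per output branch. A minimal sketch, assuming every listed key is given as a list with one value per branch and that all lists have the same length; ''expand_branches'' and the file names are hypothetical:

```python
def expand_branches(replaceparms: list, values: dict) -> list:
    """Return one {key: value} dict per output branch of a split step.

    Hypothetical helper: assumes every key in replaceparms maps to a list in
    `values`, with one element per branch, and that all lists have equal length.
    """
    if not replaceparms:
        return [{}]
    lengths = {len(values[key]) for key in replaceparms}
    if len(lengths) != 1:
        raise ValueError("all replaced parameters need the same number of values")
    n_branches = lengths.pop()
    return [{key: values[key][i] for key in replaceparms} for i in range(n_branches)]


# Two branches: one without time averaging, one averaging 4 timesteps.
branches = expand_branches(
    ["average.timestep", "msout.name"],
    {"average.timestep": [1, 4], "msout.name": ["out_t1.ms", "out_t4.ms"]},
)
```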
+==== Interpolate ====
+The interpolate step replaces flagged values by interpolating them from "neighbouring" samples (samples close in time and frequency). It calculates the Gaussian-weighted sum over non-flagged samples, with a sigma of one timestep in time and one channel in frequency. The flags are removed after interpolation. This is particularly useful in combination with averaging: by replacing flagged values before averaging, the output visibilities more accurately represent the true sky. The step was designed to reduce the spectral structure that flagging and averaging introduce in the EoR experiment, but it may also be useful in other cases as a more accurate averaging step. Details are published in [[https://arxiv.org/abs/1901.04752|Offringa, Mertens and Koopmans (2018)]].
+| <step>.type | string | | Case-insensitive step type; must be 'interpolate'. |
+| <step>.windowsize | int | 15 | Size of the window over which a value is interpolated. Should be odd. |
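The interpolation described above can be sketched with NumPy as a normalised Gaussian-weighted mean over unflagged neighbours. This is a sketch under assumptions, not DP3's implementation: the window is taken to be a square of ''windowsize'' timesteps by ''windowsize'' channels centred on the flagged sample, the window is clipped at the data edges, and a sample with no unflagged neighbour inside its window is left flagged. The function name is hypothetical.

```python
import numpy as np


def interpolate_flagged(data, flags, window_size=15):
    """Replace flagged samples by a Gaussian-weighted mean of unflagged neighbours.

    data:  2D complex array indexed as (time, channel).
    flags: boolean array of the same shape; True means flagged.
    Returns (new_data, new_flags). A sample is unflagged only if at least
    one unflagged neighbour was found inside its window (an assumption).
    """
    if window_size % 2 == 0:
        raise ValueError("window_size should be odd")
    half = window_size // 2
    offsets = np.arange(-half, half + 1)
    # Square Gaussian kernel with sigma = 1 timestep and 1 channel.
    kernel = np.exp(-0.5 * (offsets[:, None] ** 2 + offsets[None, :] ** 2))
    n_time, n_chan = data.shape
    new_data = data.copy()
    new_flags = flags.copy()
    for t, c in zip(*np.nonzero(flags)):
        # Clip the window at the edges of the data.
        t0, t1 = max(t - half, 0), min(t + half + 1, n_time)
        c0, c1 = max(c - half, 0), min(c + half + 1, n_chan)
        weights = kernel[t0 - t + half:t1 - t + half, c0 - c + half:c1 - c + half]
        weights = weights * ~flags[t0:t1, c0:c1]  # ignore flagged neighbours
        total = weights.sum()
        if total > 0:
            new_data[t, c] = (weights * data[t0:t1, c0:c1]).sum() / total
            new_flags[t, c] = False
    return new_data, new_flags
```

Only the flagged samples are modified; unflagged visibilities pass through unchanged, so any later averaging step sees the original values plus the interpolated replacements.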
 ==== Description of baseline selection parameters ====