===== Raw OLAP data formats (obsolete) =====
OLAP produces several raw data formats. These are intermediate formats, intended to be replaced by a final format such as HDF5.
===== After 2011-10-24 =====
Files adhere to the following naming scheme: ''Liiiii_SAPsssss_Bbbb_Sz_bf.raw'', with:
- ''iiiii'' = SAS observation ID
- ''sssss'' = Station beam number (SAP)
- ''bbb'' = Tied-array beam number (TAB)
- ''z'' = Stokes number
The Stokes numbers are interpreted as follows:
|z|Complex voltages|Coherent/incoherent Stokes|
|0|Xr (X polarisation, real part)|I|
|1|Xi (X polarisation, imaginary part)|Q|
|2|Yr (Y polarisation, real part)|U|
|3|Yi (Y polarisation, imaginary part)|V|
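A minimal C sketch of parsing this naming scheme with `sscanf`. The function name `parse_bf_name` is hypothetical; the trailing `%n` conversion is used only so that names with extra characters after `_bf.raw` are rejected.

```c
#include <stdio.h>

/* Parse a post-2011-10-24 file name of the form
   L<obsid>_SAP<sap>_B<tab>_S<stokes>_bf.raw.
   Returns 1 on success and 0 if the name does not match. */
int parse_bf_name(const char *name, unsigned *obsid, unsigned *sap,
                  unsigned *tab, unsigned *stokes)
{
    int end = 0; /* %n stores the number of characters consumed so far */

    if (sscanf(name, "L%u_SAP%u_B%u_S%u_bf.raw%n",
               obsid, sap, tab, stokes, &end) != 4)
        return 0;
    /* end stays 0 if "_bf.raw" did not match; also reject trailing characters */
    return end > 0 && name[end] == '\0';
}
```

For example, `L12345_SAP000_B001_S2_bf.raw` yields observation 12345, SAP 0, TAB 1 and z = 2 (Yr for complex voltages, U for Stokes).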
The data is encoded as follows. Each .raw file is a sequence of the following blocks, one per time sample, without any header. All data is written as big-endian 32-bit IEEE floats.
struct block {
float sample[SUBBANDS][CHANNELS];
};
The constants used can be derived from the parset:
SUBBANDS = len(parset["Observation.subbandList"])
if (complex voltages || coherent stokes) {
CHANNELS = parset["OLAP.CNProc_CoherentStokes.channelsPerSubband"]
if (CHANNELS == 0) CHANNELS = parset["Observation.channelsPerSubband"]
} elif (incoherent stokes) {
CHANNELS = parset["OLAP.CNProc_IncoherentStokes.channelsPerSubband"]
if (CHANNELS == 0) CHANNELS = parset["Observation.channelsPerSubband"]
}
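A sketch, assuming the post-2011-10-24 layout above, of how the block size determines the number of time samples in a file and the byte offset of a single value. The function names are hypothetical; values read at those offsets are still big-endian and need byte swapping on little-endian hosts.

```c
#include <stdint.h>

/* Bytes occupied by one time sample: float sample[SUBBANDS][CHANNELS]. */
uint64_t bytes_per_sample(unsigned subbands, unsigned channels)
{
    return (uint64_t)subbands * channels * 4;
}

/* Number of complete time samples in a file of file_size bytes.
   A nonzero *remainder suggests a truncated file or wrong constants. */
uint64_t nr_samples(uint64_t file_size, unsigned subbands, unsigned channels,
                    uint64_t *remainder)
{
    uint64_t per_sample = bytes_per_sample(subbands, channels);
    *remainder = file_size % per_sample;
    return file_size / per_sample;
}

/* Byte offset of sample[subband][channel] at time index t. */
uint64_t sample_offset(uint64_t t, unsigned subband, unsigned channel,
                       unsigned subbands, unsigned channels)
{
    return ((t * subbands + subband) * channels + channel) * 4;
}
```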
The sampling rate can be derived as follows:
# clock frequency (e.g. 200 MHz)
clock_hz = parset["Observation.sampleClock"] * 1.0e6
# subband width (e.g. 195 kHz)
base_subband_hz = clock_hz / 1024
# channel width (e.g. 763 Hz)
base_nrchannels = parset["Observation.channelsPerSubband"]
base_channel_hz = base_subband_hz / base_nrchannels
if(complex voltages || coherent stokes) {
cs_temporalintegration = parset["OLAP.CNProc_CoherentStokes.timeIntegrationFactor"]
sample_hz = base_channel_hz / cs_temporalintegration
} elif(incoherent stokes) {
is_temporalintegration = parset["OLAP.CNProc_IncoherentStokes.timeIntegrationFactor"]
sample_hz = base_channel_hz / is_temporalintegration
}
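The same recipe as a small C function, assuming `Observation.sampleClock` is given in MHz as in the pseudocode above; `sample_rate_hz` is a hypothetical name.

```c
/* Sample rate (Hz) following the recipe above.
   sample_clock_mhz     = Observation.sampleClock
   channels_per_subband = Observation.channelsPerSubband
   time_integration     = timeIntegrationFactor of the coherent or
                          incoherent Stokes pipeline */
double sample_rate_hz(double sample_clock_mhz, unsigned channels_per_subband,
                      unsigned time_integration)
{
    double clock_hz = sample_clock_mhz * 1.0e6;
    double subband_hz = clock_hz / 1024;                 /* e.g. 195312.5 Hz */
    double channel_hz = subband_hz / channels_per_subband;
    return channel_hz / time_integration;
}
```

For a 200 MHz clock, 256 channels per subband and a time integration factor of 1 this gives 762.939453125 Hz, i.e. roughly 1.31 ms per sample.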
===== Before 2011-10-24 =====
Data can be recorded either as complex voltages (yielding the X and Y polarisations) or as one or more Stokes parameters. In either case, a sequence of blocks is stored, each consisting of a header and data. The header is defined as:
struct header {
uint32 sequence_number; /* big endian */
char padding[508];
};
in which sequence_number starts at 0 and is increased by 1 for every block. Missing sequence numbers imply missing data. The padding can have any value and is to be ignored.
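A sketch of gap detection, assuming the 512-byte header layout above. Decoding the sequence number byte by byte avoids any dependence on the host's endianness; `header_sequence_number` and `missing_blocks` are hypothetical names. Each missing block stands for SAMPLES samples of absent data.

```c
#include <stdint.h>

/* Decode the big-endian sequence_number from the first 4 bytes of a
   512-byte block header, independent of the host byte order. */
uint32_t header_sequence_number(const unsigned char header[512])
{
    return ((uint32_t)header[0] << 24) | ((uint32_t)header[1] << 16) |
           ((uint32_t)header[2] << 8) | (uint32_t)header[3];
}

/* Number of blocks missing between two consecutively read blocks.
   Sequence numbers increase by 1 per block, so a larger step is a gap.
   Out-of-order or repeated numbers are reported as 0 missing blocks. */
uint32_t missing_blocks(uint32_t previous, uint32_t current)
{
    return current > previous ? current - previous - 1 : 0;
}
```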
==== Complex Voltages ====
Each (pencil) beam produces two files: one containing the X polarisation, and one containing the Y polarisation. The names of these files adhere to the following scheme:
|Lxxxxx_Byyy_S0_bf.raw|X polarisation of beam yyy of observation xxxxx|
|Lxxxxx_Byyy_S1_bf.raw|Y polarisation of beam yyy of observation xxxxx|
The following scheme has been proposed:
|Lxxxxx_Byyy_S0_bf.raw|X polarisation (real part) of beam yyy of observation xxxxx|
|Lxxxxx_Byyy_S1_bf.raw|X polarisation (imaginary part) of beam yyy of observation xxxxx|
|Lxxxxx_Byyy_S2_bf.raw|Y polarisation (real part) of beam yyy of observation xxxxx|
|Lxxxxx_Byyy_S3_bf.raw|Y polarisation (imaginary part) of beam yyy of observation xxxxx|
Each file is a sequence of blocks of the following structure:
struct block {
struct header header;
/* each block contains SAMPLES samples. For technical reasons the sample dimension is
SAMPLES|2 (a bitwise OR, which adds 0 or 2 samples depending on SAMPLES). Those extra
samples do not actually exist, and thus should be read and immediately discarded.
Time should just be incremented SAMPLES samples per block. */
/* big endian */
// 2010-09-20 release and later:
fcomplex voltages[SAMPLES|2][SUBBANDS][CHANNELS];
/*
// 2010-06-29 release and earlier stored data per subband instead of per beam:
fcomplex voltages[BEAMS][CHANNELS][SAMPLES|2][POLARIZATIONS];
*/
};
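To locate one value inside a block of the 2010-09-20-and-later layout, the offset can be computed as below. This is a sketch assuming the 512-byte header and the 8-byte fcomplex defined under Types; `voltage_offset` is a hypothetical name. The padded sample count does not enter the formula because time is the outermost dimension.

```c
#include <stdint.h>

#define HEADER_SIZE 512u /* sizeof(struct header) */

/* Byte offset, relative to the start of the block, of the real part of
   voltages[t][subband][channel]; the imaginary part follows 4 bytes later.
   Only t < SAMPLES holds real data. */
uint64_t voltage_offset(unsigned t, unsigned subband, unsigned channel,
                        unsigned subbands, unsigned channels)
{
    return HEADER_SIZE +
           (((uint64_t)t * subbands + subband) * channels + channel) * 8;
}
```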
Older releases:
2010-09-20:
- filenames ended in -bf.raw instead of _bf.raw
==== Coherent Stokes ====
Each (pencil) beam produces one or four files: one containing the Stokes I (power) values, and optionally three files for Stokes Q, U, and V, respectively. The names of these files adhere to the following scheme:
|Lxxxxx_Byyy_S0_bf.raw|Stokes I of beam yyy of observation xxxxx|
|Lxxxxx_Byyy_S1_bf.raw|Stokes Q of beam yyy of observation xxxxx|
|Lxxxxx_Byyy_S2_bf.raw|Stokes U of beam yyy of observation xxxxx|
|Lxxxxx_Byyy_S3_bf.raw|Stokes V of beam yyy of observation xxxxx|
Each file is a sequence of blocks of the following structure:
// Since 2011-10-24, Stokes are just a continuous stream of samples:
struct block {
float stokes[SAMPLES][SUBBANDS][CHANNELS];
};
// Before 2011-10-24:
struct block {
struct header header;
/* each block contains SAMPLES samples. For technical reasons the sample dimension is
SAMPLES|2 (a bitwise OR, which adds 0 or 2 samples depending on SAMPLES). Those extra
samples do not actually exist, and thus should be read and immediately discarded.
Time should just be incremented SAMPLES samples per block. */
/* big endian */
// 2010-09-20 release and later:
float stokes[SAMPLES|2][SUBBANDS][CHANNELS];
/*
// 2010-06-29 release and earlier stored data per subband instead of per beam:
fcomplex voltages[BEAMS][CHANNELS][SAMPLES|2][STOKES];
*/
};
Older releases:
2010-09-20:
- Values of Stokes U and V are multiplied by 1/2
- filenames ended in -bf.raw instead of _bf.raw
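For data from the 2010-09-20 release, the original U and V values can be recovered by multiplying by 2. A minimal sketch, assuming one buffer holds a single Stokes parameter already converted to host byte order; `fix_uv_scaling` is a hypothetical name.

```c
/* Undo the factor 1/2 that the 2010-09-20 release applied to Stokes U and V.
   stokes_index follows the file naming: 0 = I, 1 = Q, 2 = U, 3 = V. */
void fix_uv_scaling(float *values, unsigned long count, unsigned stokes_index)
{
    if (stokes_index == 2 || stokes_index == 3)
        for (unsigned long i = 0; i < count; i++)
            values[i] *= 2.0f;
}
```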
==== Incoherent Stokes ====
Incoherent Stokes data are stored per subband, with one or four Stokes parameters per file, using the following naming convention:
|Lxxxxx_SByyy_bf.incoherentstokes|Stokes of subband yyy of observation xxxxx|
Each file is a sequence of blocks of the following structure:
struct block {
struct header header;
/* each block contains SAMPLES samples. For technical reasons the sample dimension is
SAMPLES|2 (a bitwise OR, which adds 0 or 2 samples depending on SAMPLES). Those extra
samples do not actually exist, and thus should be read and immediately discarded.
Time should just be incremented SAMPLES samples per block. */
/* big endian */
// 2010-10-25 release and later:
float stokes[STOKES][CHANNELS][SAMPLES|2];
/*
// 2010-09-20 release:
float stokes[STOKES][SAMPLES|2][CHANNELS];
// 2010-06-29 release and earlier:
float stokes[CHANNELS][SAMPLES|2][STOKES];
*/
};
The order in which the Stokes values are stored is: I, Q, U, V.
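A sketch of indexing into the 2010-10-25-and-later incoherent Stokes layout, assuming the 512-byte header; `incoherent_offset` is a hypothetical name and `s` follows the I, Q, U, V order above. Unlike the beam-formed layouts, the padded sample count matters here, because time is the innermost dimension.

```c
#include <stdint.h>

/* Byte offset, relative to the start of a block, of stokes[s][channel][t]
   in the layout float stokes[STOKES][CHANNELS][SAMPLES|2]. */
uint64_t incoherent_offset(unsigned s, unsigned channel, unsigned t,
                           unsigned channels, unsigned samples)
{
    uint64_t padded = samples | 2; /* stored samples per channel (bitwise OR) */
    return 512 + (((uint64_t)s * channels + channel) * padded + t) * 4;
}
```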
Older releases:
2010-09-20:
- Values of Stokes U and V are multiplied by 1/2
- filenames ended in -bf.raw instead of _bf.raw
- data order changed
==== BFRaw format ====
Raw station data can be stored in a format called BFRaw. This format is intended for debugging and is not a regular observation mode, as recording it takes more manpower. The BFRaw format is documented below for those who need to access it.
A BFRaw file starts with a file header containing the configuration:
struct file_header
{
// 0x3F8304EC, also determines endianness
uint32_t magic;
// The number of bits per sample (16)
uint8_t bitsPerSample;
// The number of polarizations (2)
uint8_t nrPolarizations;
// Number of subbands, maximum of 62
uint16_t nrSubbands;
// 155648 (160 MHz) or 196608 (200 MHz)
uint32_t nrSamplesPerSubband;
// Name of the station
char station[20];
// Number of samples per second for each subband: 156250.0 or 195312.5
double sampleRate;
// The frequency of each subband
double subbandFrequencies[62];
// The beam pointing directions (RA, DEC in J2000)
double beamDirections[8][2];
// mapping from subbands to beams (SAPs)
int16_t subbandToSAPmapping[62];
// Explicit padding for 8-byte alignment
uint32_t padding;
};
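Since the magic number also determines endianness, a reader can compare the first four bytes against both byte orders. A minimal sketch; `bfraw_needs_swap` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

#define BFRAW_FILE_MAGIC 0x3F8304ECu

/* Inspect the first 4 bytes of a BFRaw file. Returns 0 if the file was
   written in the reader's byte order, 1 if every multi-byte field must
   be byte-swapped, and -1 if the magic is not recognised at all. */
int bfraw_needs_swap(const unsigned char first4[4])
{
    uint32_t magic;
    memcpy(&magic, first4, sizeof magic); /* interpret in host byte order */
    if (magic == BFRAW_FILE_MAGIC)
        return 0;
    /* compare against the byte-reversed value */
    uint32_t swapped = (magic >> 24) | ((magic >> 8) & 0x0000FF00u) |
                       ((magic << 8) & 0x00FF0000u) | (magic << 24);
    return swapped == BFRAW_FILE_MAGIC ? 1 : -1;
}
```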
After the file header, a series of blocks follows until the end of the file. Their dimensions are taken from the file header:
struct block
{
// 0x2913D852
uint32_t magic;
// per-SAP information (up to 8 SAPs can be defined, but typically only 1 is used)
// number of samples the signal is shifted to align the station beam to the reference
// phase center (=Observation.referencePhaseCenter in the parset)
int32_t coarseDelayApplied[8];
// Explicit padding so that the following doubles are 8-byte aligned
uint8_t padding[4];
// the sub-sample delay which still has to be compensated for (in seconds),
// at the beginning and at the end of the block
double fineDelayRemainingAtBegin[8];
double fineDelayRemainingAfterEnd[8];
// Compatible with TimeStamp class (see below)
int64_t time[8];
struct marshalledFlags
{
// up to 16 ranges of flagged samples within this block
uint32_t nrFlagsRanges;
struct range
{
uint32_t begin; // inclusive
uint32_t end; // exclusive
} flagsRanges[16];
} flags[8];
std::complex<int16_t>
samples[fileHeader.nrSubbands][fileHeader.nrSamplesPerSubband][fileHeader.nrPolarizations];
};
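The block size follows from the field sizes above. The sketch below assumes that `bitsPerSample` counts the bits of each real and imaginary part (so a 16-bit complex sample occupies 4 bytes) and that the structs are stored without further padding; `bfraw_block_size` is a hypothetical name.

```c
#include <stdint.h>

/* Fixed part of a BFRaw block, summed from the fields listed above:
   magic (4) + coarseDelayApplied (8*4) + padding (4)
   + fineDelayRemainingAtBegin/AfterEnd (2*8*8) + time (8*8)
   + flags (8 * (4 + 16*8)) = 1288 bytes. */
#define BFRAW_BLOCK_FIXED 1288u

/* Total size of one block (assumption: real and imaginary parts have
   bits_per_sample bits each). */
uint64_t bfraw_block_size(uint16_t nr_subbands, uint32_t nr_samples_per_subband,
                          uint8_t nr_polarizations, uint8_t bits_per_sample)
{
    uint64_t sample_bytes = 2u * bits_per_sample / 8;
    return BFRAW_BLOCK_FIXED +
           (uint64_t)nr_subbands * nr_samples_per_subband * nr_polarizations * sample_bytes;
}
```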
To convert a TimeStamp-compatible int64_t to a struct timespec, use:
/* clockspeed is in Hz; one timestamp unit spans 1024 clock ticks */
int64_t nanoseconds = (int64_t) (timestamp * 1024 * 1e9 / clockspeed);
struct timespec ts;
ts.tv_sec = nanoseconds / 1000000000LL;
ts.tv_nsec = nanoseconds % 1000000000LL;
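The floating-point formula above loses precision for present-day timestamps: a double has only 53 mantissa bits, so a nanosecond count around 1.3e18 is rounded to a multiple of 256 ns. An alternative sketch using only 64-bit integer arithmetic, assuming the same 1024 clock ticks per timestamp unit; `timestamp_to_timespec` is a hypothetical name.

```c
#include <stdint.h>
#include <time.h>

/* Convert a TimeStamp-compatible sample count to a struct timespec using
   integer arithmetic only. clock_hz is the station clock in Hz
   (e.g. 200000000). Assumes timestamp >= 0 and timestamp * 1024 fits
   in an int64_t. */
struct timespec timestamp_to_timespec(int64_t timestamp, int64_t clock_hz)
{
    int64_t ticks = timestamp * 1024;
    struct timespec ts;
    ts.tv_sec = (time_t)(ticks / clock_hz);
    /* remainder < clock_hz, so remainder * 1e9 stays well inside int64_t */
    ts.tv_nsec = (long)((ticks % clock_hz) * 1000000000LL / clock_hz);
    return ts;
}
```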
==== Types and constants ====
=== Types ===
A 'float' is a 32-bit IEEE floating point number. An 'fcomplex' is a complex number defined as
struct fcomplex {
float real;
float imag;
};
=== Constants ===
Constants can be computed using the parset file. Below is a translation between the C constants used above and their respective parset keys:
|SAMPLES |The number of time samples in a block |OLAP.CNProc.integrationSteps / OLAP.Stokes.integrationSteps|
|SUBBANDS|The number of subbands (beamlets) specified |len(Observation.subbandList)|
|CHANNELS|The number of channels per subband |Observation.channelsPerSubband|
|STOKES |The number of Stokes parameters calculated (1 or 4) |len(OLAP.Stokes.which)|
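Combining these constants, the size of one pre-2011-10-24 block can be computed as sketched below (hypothetical function name, assuming the 512-byte header). For incoherent Stokes files, which hold a single subband, pass `subbands = 1` and `floats_per_value = STOKES`. Dividing the file size by the block size gives the number of blocks.

```c
#include <stdint.h>

/* Size in bytes of one pre-2011-10-24 block: a 512-byte header followed by
   (SAMPLES|2) * SUBBANDS * CHANNELS values of floats_per_value floats each
   (1 for Stokes, 2 for an fcomplex voltage). SAMPLES|2 is a bitwise OR. */
uint64_t pre2011_block_size(unsigned samples, unsigned subbands,
                            unsigned channels, unsigned floats_per_value)
{
    return 512 + (uint64_t)(samples | 2) * subbands * channels * floats_per_value * 4;
}
```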
==== Useful routines ====
The following routines might be useful when reading raw OLAP data.
=== Byte swapping ===
Needed if you read data on a machine that uses a different endianness than the machine that wrote it. x86 machines (Intel, AMD) are little-endian, while SPARC and PowerPC machines (including the BlueGene/P) are big-endian.
#include <stdint.h> // for uint32_t. On Windows, use UINT32.
uint32_t swap_uint32( uint32_t x )
{
union {
char c[4];
uint32_t i;
} src,dst;
src.i = x;
dst.c[0] = src.c[3];
dst.c[1] = src.c[2];
dst.c[2] = src.c[1];
dst.c[3] = src.c[0];
return dst.i;
}
/* Do NOT take a float as an argument. An incorrectly read float
(because it has the wrong endianness) is subject to modification
by the platform/compiler (normalisation etc). */
float swap_float( char *x )
{
union {
char c[4];
float f;
} dst;
dst.c[0] = x[3];
dst.c[1] = x[2];
dst.c[2] = x[1];
dst.c[3] = x[0];
return dst.f;
}
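An alternative that avoids both the union trick and the endianness test assembles the 32-bit pattern with shifts and copies it into a float with `memcpy`. A sketch, assuming `float` is a 32-bit IEEE float; `read_be_float` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Decode a big-endian IEEE float from 4 raw bytes. Building the bit
   pattern with shifts gives the right value on any host byte order. */
float read_be_float(const unsigned char *p)
{
    uint32_t bits = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    float f;
    memcpy(&f, &bits, sizeof f); /* reinterpret the bits without aliasing issues */
    return f;
}
```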
=== Variable-sized arrays ===
Since the dimensions of the arrays produced by OLAP depend on the parset, it's handy to have access to arrays with variable size. The easiest way is to use C++ and the boost library (which is often installed by default):
#include "boost/multi_array.hpp"
int main() {
/* create an array of floats with 2 dimensions, and initialise it to have dimensions [2][3] */
boost::multi_array<float, 2> myarray(boost::extents[2][3]);
/* getting and setting is the same as with regular C arrays */
myarray[1][2] = 1.0;
/* note: &myarray[0][0] (or myarray.origin()) is the address of the first element, which can be
used if the full array needs to be read from disk. */
return 0;
}
See also http://www.boost.org/doc/libs/1_43_0/libs/multi_array/doc/user.html
If you need to use C, things become a bit more cumbersome. You need to roll your own multi-dimensional array, and to keep the code readable you will have to customise it for each number of dimensions. For example:
#include <stdlib.h>
/* an array of floats with 2 dimensions, max1 and max2 in size respectively */
struct myarray {
float *data;
unsigned max1,max2;
};
/* return array[one][two] */
float get( struct myarray *array, unsigned one, unsigned two )
{
return *(array->data + one * array->max2 + two);
}
/* set array[one][two] to value */
void set( struct myarray *array, unsigned one, unsigned two, float value )
{
*(array->data + one * array->max2 + two) = value;
}
int main() {
/* create an array of floats */
struct myarray myarray;
/* allocate the array with dimensions [2][3] */
myarray.max1 = 2;
myarray.max2 = 3;
myarray.data = malloc( myarray.max1 * myarray.max2 * sizeof *myarray.data );
if( !myarray.data )
return 1;
/* emulate myarray[1][2] = 1.0 */
set(&myarray,1,2,1.0f);
/* note: myarray.data is the address of the first element, which can be used if the full
array needs to be read from disk. */
/* free the array */
free( myarray.data );
return 0;
}
Keep in mind that if you need to switch endianness as well, you first need to read into a char array, and convert it to a float array after reading from disk. This is included in the example below.
=== Example reading of OLAP data using (minimal) C++ and Boost ===
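The following code reads raw beam-formed data in the pre-2011-10-24 block format from disk. As configured, it reads the Stokes I example file; set FLOATSPERSAMPLE to 2 to read complex voltages instead.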
#include "boost/multi_array.hpp"
#include <cstdio>
#include <stdint.h> // for uint32_t. On Windows, use UINT32.
struct header {
uint32_t sequence_number;
char padding[508];
};
int is_bigendian() {
union {
char c[4];
uint32_t i;
} u;
u.i = 0x12345678;
return u.c[0] == 0x12;
}
uint32_t swap_uint32( uint32_t x )
{
union {
char c[4];
uint32_t i;
} src,dst;
src.i = x;
dst.c[0] = src.c[3];
dst.c[1] = src.c[2];
dst.c[2] = src.c[1];
dst.c[3] = src.c[0];
return dst.i;
}
float swap_float( char *x )
{
union {
char c[4];
float f;
} dst;
dst.c[0] = x[3];
dst.c[1] = x[2];
dst.c[2] = x[1];
dst.c[3] = x[0];
return dst.f;
}
int main()
{
// example file (60MB!) is available at
// http://www.astron.nl/~mol/L09330_B000_S0-example-stokes-I-248-subbands-16-channels-763-samples.raw
unsigned SUBBANDS = 248; // |Observation.subbandList|
unsigned CHANNELS = 16; // Observation.channelsPerSubband
unsigned SAMPLES = 12208 / 16; // OLAP.CNProc.integrationSteps / OLAP.Stokes.integrationSteps
unsigned FLOATSPERSAMPLE = 1; // 1 for Stokes, 2 for Complex Voltages (real and imaginary parts)
struct header header;
int swap_endian = !is_bigendian();
// the raw_array is read from disk and converted to the float_array
// the extra dimension [4] covers the size of a float in chars in the raw_array
boost::multi_array<char, 5> raw_array(boost::extents[SAMPLES|2][SUBBANDS][CHANNELS][FLOATSPERSAMPLE][4]);
boost::multi_array<float, 4> float_array(boost::extents[SAMPLES|2][SUBBANDS][CHANNELS][FLOATSPERSAMPLE]);
FILE *f = fopen( "L09330_B000_S0-example-stokes-I-248-subbands-16-channels-763-samples.raw", "rb" );
if (!f) {
puts( "Could not open input file." );
return 1;
}
while( !feof(f) ) {
// read header
if( fread( &header, sizeof header, 1, f ) < 1 )
break;
if( swap_endian )
header.sequence_number = swap_uint32( header.sequence_number );
printf( "Reading block %u...\n", header.sequence_number );
// read data
if( swap_endian ) {
if( fread( raw_array.origin(), raw_array.num_elements(), 1, f ) < 1 )
break;
// swap all data regardless of array dimensions
char *src = raw_array.origin();
float *dst = float_array.origin();
for( unsigned i = 0; i < float_array.num_elements(); i++ ) {
*dst = swap_float( src );
dst++; src += 4;
}
} else
if( fread( float_array.origin(), sizeof(float) * float_array.num_elements(), 1, f ) < 1 )
break;
// process block here
}
fclose( f );
return 0;
}
==== Changelog for each release ====
|2010-10-25|Incoherent Stokes data order changed|
| |File naming scheme changed (-bf -> _bf)|
| |Stokes U and V are no longer multiplied by 1/2|
|2010-09-20|First release documented|