Tables
[tables package (libtables)]


Detailed Description

Tables are the data storage mechanism for AIPS++.

See below for an overview of the classes in this module.

Intended use:

Part of API

Review Status

Reviewed By:
jhorstko
Date Reviewed:
1994/08/30

Prerequisite

Etymology

"Table" is a formal term from relational database theory: <cite> "The organizing principle in a relational database is the TABLE, a rectangular, row/column arrangement of data values."</cite> AIPS++ tables are extensions to traditional tables, but are similar enough that we use the same name. There is also a strong resemblance between the uses of AIPS++ tables, and FITS binary tables, which provides another reason to use "Tables" to describe the AIPS++ data storage mechanism.

Synopsis

Tables are the fundamental storage mechanism for AIPS++. This document explains why they had to be made, what their properties are, and how to use them. The last subject is discussed and illustrated in a sequence of sections:

Motivation

The AIPS++ tables are mainly based upon the ideas of Allen Farris, as laid out in the AIPS++ Database document, from which the following paragraph is taken:

Traditional relational database tables have two features that decisively limit their applicability to scientific data. First, an item of data in a column of a table must be atomic -- it must have no internal structure. A consequence of this restriction is that relational databases are unable to deal with arrays of data items. Second, an item of data in a column of a table must not have any direct or implied linkages to other items of data or data aggregates. This restriction makes it difficult to model complex relationships between collections of data. While these restrictions may make it easy to define a mathematically complete set of data manipulation operations, they are simply intolerable in a scientific data-handling context. Multi-dimensional arrays are frequently the most natural modes in which to discuss and think about scientific data. In addition, scientific data often requires complex calibration operations that must draw on large bodies of data about equipment and its performance in various states. The restrictions imposed by the relational model make it very difficult to deal with complex problems of this nature.

In response to these limitations, and other needs, the AIPS++ tables were designed.

Table Properties

AIPS++ tables have the following properties:

Tables can be in one of three forms:

Concurrent access from different processes to the same plain table is fully supported by means of a locking/synchronization mechanism. Concurrent access over NFS is also supported.

A (somewhat primitive) mechanism is available to do a table lookup based on the contents of a key. In the future this might be replaced by a proper B+-tree index mechanism.

Opening an Existing Table

To open an existing table you just create a Table object giving the name of the table, like:

        Table readonly_table ("tableName");
        // or
        Table read_and_write_table ("tableName", Table::Update);

The constructor option determines whether the table will be opened as readonly or as read/write. A readonly table file must be opened as readonly, otherwise an exception is thrown. The functions Table::isWritable(...) can be used to determine if a table is writable.

When the table is opened, the data managers are reinstantiated according to their definition at table creation.

Reading from a Table

You can read data from a table column with the "get" functions in the classes ROScalarColumn<T> and ROArrayColumn<T>. For scalars of a standard data type (i.e. Bool, uChar, Int, Short, uShort, uInt, float, double, Complex, DComplex and String) you could instead use ROTableColumn::getScalar(...) or ROTableColumn::asXXX(...). These functions offer an extra: they do automatic data type promotion, so that you can, for example, get a double value from a float column.

These "get" functions are used in the same way as the simple "put" functions described in the section "Writing into a Table".

ScalarColumn<T> is derived from ROScalarColumn<T>, and therefore has the same "get" functions. However, if a ScalarColumn<T> object is constructed for a non-writable column, an exception is thrown. Only ROScalarColumn<T> objects can be constructed for nonwritable columns. The same is true for ArrayColumn<T> and TableColumn .

A typical program could look like:

    #include <tables/Tables/Table.h>
    #include <tables/Tables/ScalarColumn.h>
    #include <tables/Tables/ArrayColumn.h>
    #include <casa/Arrays/Vector.h>
    #include <casa/Arrays/Slicer.h>
    #include <casa/Arrays/ArrayMath.h>
    #include <iostream>
    
    int main()
    {
        // Open the table (readonly).
        Table tab ("some.name");
   
        // Construct the various column objects.
        // Their data type has to match the data type in the table description.
        ROScalarColumn<Int> acCol (tab, "ac");
        ROArrayColumn<Float> arr2Col (tab, "arr2");
   
        // Loop through all rows in the table.
        uInt nrrow = tab.nrow();
        for (uInt i=0; i<nrrow; i++) {
            // Read the row for both columns.
            cout << "Column ac in row " << i << " = " << acCol(i) << endl;
            Array<Float> array = arr2Col.get (i);
        }
   
        // Show the entire column ac,
        // and show the 10th element of arr2 in each row.
        cout << acCol.getColumn();
        cout << arr2Col.getColumn (Slicer(Slice(10)));
    }

Creating a Table

The creation of a table is a multi-step process:

  1. Create a table description.
  2. Create a SetupNewTable object with the name of the new table.
  3. Create the necessary data managers.
  4. Bind each column to the appropriate data manager. The system will bind unbound columns to data managers which are created internally using the default data manager name defined in the column description.
  5. Define the shape of direct columns (if that was not already done in the column description).
  6. Create the Table object from the SetupNewTable object. Here, a final check is performed and the necessary files are created.

The recipe above is meant for the creation of a plain table, but the creation of a memory table is exactly the same. The only difference is that the Table::Memory type has to be given in the call that constructs the Table object. Note that in the SetupNewTable object the columns can be bound to any data manager. MemoryTable will rebind stored columns to the MemoryStMan storage manager, but virtual column bindings are not changed.

The following example shows how you can create a table. An example specifically illustrating the creation of the table description is given in that section. Other sections discuss the access to the table.

    #include <tables/Tables/TableDesc.h>
    #include <tables/Tables/SetupNewTab.h>
    #include <tables/Tables/Table.h>
    #include <tables/Tables/ScaColDesc.h>
    #include <tables/Tables/ScaRecordColDesc.h>
    #include <tables/Tables/ArrColDesc.h>
    #include <tables/Tables/StandardStMan.h>
    #include <tables/Tables/IncrementalStMan.h>
    
    int main()
    {
        // Step 1 -- Build the table description.
        TableDesc td("tTableDesc", "1", TableDesc::Scratch);
        td.comment() = "A test of class SetupNewTable";
        td.addColumn (ScalarColumnDesc<Int> ("ab" ,"Comment for column ab"));
        td.addColumn (ScalarColumnDesc<Int> ("ac"));
        td.addColumn (ScalarColumnDesc<uInt> ("ad","comment for ad"));
        td.addColumn (ScalarColumnDesc<Float> ("ae"));
        td.addColumn (ScalarRecordColumnDesc ("arec"));
        td.addColumn (ArrayColumnDesc<Float> ("arr1",3,ColumnDesc::Direct));
        td.addColumn (ArrayColumnDesc<Float> ("arr2",0));
        td.addColumn (ArrayColumnDesc<Float> ("arr3",0,ColumnDesc::Direct));
    
        // Step 2 -- Setup a new table from the description.
        SetupNewTable newtab("newtab.data", td, Table::New);
   
        // Step 3 -- Create storage managers for it.
        StandardStMan stmanStand_1;
        StandardStMan stmanStand_2;
        IncrementalStMan stmanIncr;
    
        // Step 4 -- First, bind all columns to the first storage
        // manager. Then, bind a few columns to another storage manager
        // (which will overwrite the previous bindings).
        newtab.bindAll (stmanStand_1);
        newtab.bindColumn ("ab", stmanStand_2);
        newtab.bindColumn ("ae", stmanIncr);
        newtab.bindColumn ("arr3", stmanIncr);
    
        // Step 5 -- Define the shape of the direct columns.
        // (this could have been done in the column description).
        newtab.setShapeColumn( "arr1", IPosition(3,2,3,4));
        newtab.setShapeColumn( "arr3", IPosition(3,3,4,5));
    
        // Step 6 -- Finally, create the table consisting of 10 rows.
        Table tab(newtab, 10);
    
        // Now we can fill the table, which is shown in a next section.
        // The Table destructor will flush the table to the files.
    }
To create a table in memory, only step 6 has to be modified slightly to:
        Table tab(newtab, Table::Memory, 10);

Writing into a Table

Once a table has been created or has been opened for read/write, you want to write data into it. Before doing that you may have to add one or more rows to the table. Tip: When a table was created with a given number of rows, you do not need to add rows; you may not even be able to do so.

When adding new rows to the table, either via the Table(...) constructor or via the Table::addRow(...) function, you can choose to have those rows initialized with the default values given in the description.

To actually write the data into the table you need the classes ScalarColumn<T> and ArrayColumn<T>. For each column you can construct one or more of these objects. Their put(...) functions let you write a value at a time or the entire column in one go. For arrays you can "put" subsections of the arrays.

As an alternative for scalars of a standard data type (i.e. Bool, uChar, Int, Short, uShort, uInt, float, double, Complex, DComplex and String) you could use the functions TableColumn::putScalar(...). These functions offer an extra: automatic data type promotion, so that you can, for example, put a float value in a double column.

A typical program could look like:

    #include <tables/Tables/TableDesc.h>
    #include <tables/Tables/SetupNewTab.h>
    #include <tables/Tables/Table.h>
    #include <tables/Tables/ScaColDesc.h>
    #include <tables/Tables/ArrColDesc.h>
    #include <tables/Tables/ScalarColumn.h>
    #include <tables/Tables/ArrayColumn.h>
    #include <casa/Arrays/Vector.h>
    #include <casa/Arrays/Slicer.h>
    #include <casa/Arrays/ArrayMath.h>
    #include <iostream>
    
    int main()
    {
        // First build the table description.
        TableDesc td("tTableDesc", "1", TableDesc::Scratch);
        td.comment() = "A test of class SetupNewTable";
        td.addColumn (ScalarColumnDesc<Int> ("ac"));
        td.addColumn (ArrayColumnDesc<Float> ("arr2",0));
    
        // Setup a new table from the description,
        // and create the (still empty) table.
        // Note that since we do not explicitly bind columns to
        // data managers, all columns will be bound to the default
        // standard storage manager StandardStMan.
        SetupNewTable newtab("newtab.data", td, Table::New);
        Table tab(newtab);
   
        // Construct the various column objects.
        // Their data type has to match the data type in the description.
        ScalarColumn<Int> ac (tab, "ac");
        ArrayColumn<Float> arr2 (tab, "arr2");
        Vector<Float> vec2(100);
   
        // Write the data into the columns.
        // In each cell arr2 will be a vector of length 100.
        // Since its shape is not set explicitly, it is done implicitly.
        for (uInt i=0; i<10; i++) {
            tab.addRow();               // First add a row.
            ac.put (i, i+10);           // value is i+10 in row i
            indgen (vec2, float(i+20)); // vec2 gets i+20, i+21, ..., i+119
            arr2.put (i, vec2); 
        }
   
        // Finally, show the entire column ac,
        // and show the 10th element of arr2.
        cout << ac.getColumn();
        cout << arr2.getColumn (Slicer(Slice(10)));
   
        // The Table destructor writes the table.
    }

In this example we added rows in the for loop, but we could also have created 10 rows straightaway by constructing the Table object as:

        Table tab(newtab, 10);
in which case we would not include
        tab.addRow()

The classes TableColumn , ScalarColumn<T> , and ArrayColumn<T> contain several functions to put values into a single cell or into the whole column. This may look confusing, but is actually quite simple. The functions can be divided in two groups:

  1. Put the given value into the column cell(s).

  2. Copy values from another column to this column.
    These functions have the advantage that the data type of the input and/or output column can be unknown. The generic (RO)TableColumn objects can be used for this purpose. The put(Column) function checks the data types and, if possible, converts them. If the conversion is not possible, it throws an exception. Each class has its own set of these functions.

Accessing rows in a Table

Apart from accessing a table column-wise as described in the previous two sections, it is also possible to access a table row-wise. The TableRow class makes it possible to access multiple fields in a table row as a whole. Note that like the XXColumn classes described above, there is also an ROTableRow class for access to readonly tables.

On construction of a TableRow object it has to be specified which fields (i.e. columns) are part of the row. For these fields a fixed structured TableRecord object is constructed as part of the TableRow object. The TableRow::get function will fill this record with the table data for the given row. The user has access to the record and can use RecordFieldPtr objects for speedier access to the record.

The class could be used as shown in the following example.

    // Open the table as readonly and define a row object to contain
    // the given columns.
    // Note that the function stringToVector is a very convenient
    // way to construct a Vector<String>.
    // Show the description of the fields in the row.
    Table table("Some.table");
    ROTableRow row (table, stringToVector("col1,col2,col3"));
    cout << row.record().description();
    // Since the structure of the record is known, the RecordFieldPtr
    // objects could be used to allow for easy and fast access to
    // the record which is refilled for each get.
    RORecordFieldPtr<String> col1(row.record(), "col1");
    RORecordFieldPtr<Double> col2(row.record(), "col2");
    RORecordFieldPtr<Array<Int> > col3(row.record(), "col3");
    for (uInt i=0; i<table.nrow(); i++) {
        row.get (i);
        someString = *col1;
        somedouble = *col2;
        someArrayInt = *col3;
    }
The description of TableRow contains some more extensive examples.

Table Selection and Sorting

The result of a select and sort of a table is another table, which references the original table. This means that an update of a sorted or selected table results in the update of the original table. The result is, however, a table in itself, so all table functions (including select and sort) can be used with it. Note that a true copy of such a reference table can be made with the Table::deepCopy function.
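
The reference semantics described above can be pictured with a small toy model in standard C++. This is only a sketch and not how the Tables classes are implemented: ToyTable, ToyRefTable and selectGreater are hypothetical names, and the "table" is reduced to one integer column. The point it illustrates is that a selection holds row numbers of the root table rather than a copy of the data, so a put through the selection changes the root.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a table with a single integer column (hypothetical).
struct ToyTable {
    std::vector<int> ra;
};

// Toy stand-in for a reference table: it stores row numbers of the
// root table instead of copying the data.
class ToyRefTable {
public:
    ToyRefTable(ToyTable& root, std::vector<std::size_t> rows)
        : root_(root), rows_(std::move(rows)) {}

    std::size_t nrow() const { return rows_.size(); }

    int get(std::size_t i) const {
        assert(i < rows_.size());
        return root_.ra[rows_[i]];
    }

    // Writing through the reference table updates the root table.
    void put(std::size_t i, int value) {
        assert(i < rows_.size());
        root_.ra[rows_[i]] = value;
    }

private:
    ToyTable& root_;
    std::vector<std::size_t> rows_;
};

// Select the rows whose value is greater than the given limit.
ToyRefTable selectGreater(ToyTable& table, int limit) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < table.ra.size(); ++i) {
        if (table.ra[i] > limit) rows.push_back(i);
    }
    return ToyRefTable(table, rows);
}
```

In this model a deep copy would correspond to copying the selected values into a new ToyTable, after which changes no longer reach the root.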

Rows or columns can be selected from a table. Columns can be selected by the Table::project(...) function, while rows can be selected by the various Table operator() functions. Usually a row is selected by giving a select expression with TableExprNode objects. These objects represent the various nodes in an expression, e.g. a constant, a column, or a subexpression. The Table function Table::col(...) creates a TableExprNode object for a column. The function Table::key(...) does the same for a keyword by reading the keyword value and storing it as a constant in an expression node. All column nodes in an expression must belong to the same table, otherwise an exception is thrown. In the following example we select all rows with RA>10:

       #include <tables/Tables/ExprNode.h>
       Table table ("Table.name");
       Table result = table (table.col("RA") > 10);
while in the next one we select rows with RA and DEC in the given intervals:
       Table result = table (table.col("RA") > 10
                          && table.col("RA") < 14
                          && table.col("DEC") >= -10
                          && table.col("DEC") <= 10);
The following operators can be used to form arbitrarily complex expressions:

Many functions (like sin, max, conj) can be used in an expression. Class TableExprNode shows the available functions. E.g.
       Table result = table (sin (table.col("RA")) > 0.5);
Function in can be used to select from a set of values. A value set can be constructed using class TableExprNodeSet .
       TableExprNodeSet set;
       set.add (TableExprNodeSetElem ("abc"));
       set.add (TableExprNodeSetElem ("defg"));
       set.add (TableExprNodeSetElem ("h"));
       Table result = table (table.col("NAME").in (set));
This selects rows with a NAME equal to abc, defg, or h.

You can sort a table on one or more columns containing scalars. In this example we simply sort on column RA (default is ascending):

       Table table ("Table.name");
       Table result = table.sort ("RA");
Multiple Table::sort(...) functions exist which allow for more flexible control over the sort order. In the next example we sort first on RA in descending order and then on DEC in ascending order:
       Table table ("Table.name");
       Block<String> sortKeys(2);
       Block<Int>    sortOrders(2);
       sortKeys[0]   = "RA";
       sortOrders[0] = Sort::Descending;
       sortKeys[1]   = "DEC";
       sortOrders[1] = Sort::Ascending;
       Table result = table.sort (sortKeys, sortOrders);

Tables stemming from the same root can be combined in several ways with the help of the various logical Table operators (operator|, etc.).
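
One way to picture such combinations is as set operations on the row numbers that each table holds in the common root. The sketch below is a toy model in standard C++ and not the Tables implementation; Rows, rowsUnion and rowsIntersection are hypothetical names. It assumes each table is represented by a sorted list of root row numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Sorted row numbers in the common root table (toy representation).
using Rows = std::vector<std::size_t>;

// Rows that are in a or in b: the analogue of combining two tables
// with a logical "or".
Rows rowsUnion(const Rows& a, const Rows& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    Rows out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}

// Rows that are in both a and b: the analogue of a logical "and".
Rows rowsIntersection(const Rows& a, const Rows& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    Rows out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

The model also shows why the tables must share a root: row numbers from different root tables are not comparable.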

Table Query Language

The selection and sorting mechanism described above can only be used in a hard-coded way in a C++ program. There is, however, another way. Strings containing selection and sorting commands can be used. The syntax of these commands is based on SQL and is described in the Table Query Language (TaQL).
Such a command can be executed with the static function TableParse::tableCommand defined in class TableParse .

Table Iterators

You can iterate through a table in an arbitrary order by getting a subset of the table consisting of the rows in which the iteration columns have the same value. An iterator object is created by constructing a TableIterator object with the appropriate column names.

In the next example we define an iteration on the columns Time and Baseline. Each iteration step returns a table subset in which Time and Baseline have the same value.

       // Iterate over Time and Baseline (by default in ascending order).
       // Time is the main iteration order, thus the first column specified.
       Table t;
       Table tab ("UV_Table.data");
       Block<String> iv0(2);
       iv0[0] = "Time";
       iv0[1] = "Baseline";
       //
       // Create the iterator. This will prepare the first subtable.
       TableIterator iter(tab, iv0);
       Int nr = 0;
       while (!iter.pastEnd()) {
           // Get the current subtable.
           // This will contain rows with equal Time and Baseline.
           t = iter.table();
           cout << t.nrow() << " ";
           nr++;
           // Prepare the next subtable with the next Time,Baseline value.
           iter.next();
       }
       cout << endl << nr << " iteration steps" << endl;

You can define more than one iterator on the same table; they operate independently.

Note that the result of each iteration step is a table in itself which references the original table, just as in the case of a sort or select. This means that the resulting table can be used again in a sort, select, iteration, etc.
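
The grouping performed by an iterator can be sketched in standard C++ as follows. This is a toy model under stated assumptions, not the TableIterator implementation: ToyRow and ToyIterator are hypothetical, rows carry only integer Time and Baseline values, and each step yields the row numbers of one group instead of a table. The pastEnd/next loop mirrors the usage shown in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Toy row with the two iteration columns (hypothetical).
struct ToyRow {
    int time;
    int baseline;
};

// Groups row numbers by (time, baseline). std::map keeps the keys in
// ascending order, with time as the major key.
class ToyIterator {
public:
    explicit ToyIterator(const std::vector<ToyRow>& rows) {
        std::map<std::pair<int, int>, std::vector<std::size_t>> groups;
        for (std::size_t i = 0; i < rows.size(); ++i) {
            groups[{rows[i].time, rows[i].baseline}].push_back(i);
        }
        for (const auto& group : groups) steps_.push_back(group.second);
    }

    bool pastEnd() const { return pos_ >= steps_.size(); }
    void next() { ++pos_; }

    // Row numbers (in the original table) of the current step.
    const std::vector<std::size_t>& rows() const {
        assert(!pastEnd());
        return steps_[pos_];
    }

private:
    std::vector<std::vector<std::size_t>> steps_;
    std::size_t pos_ = 0;
};
```

Because each step is a list of row numbers of the original table, it behaves like the reference tables produced by a sort or select.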

Table Vectors

A table vector makes it possible to treat a column in a table as a vector. Almost all operators and functions defined for normal vectors are also defined for table vectors. So it is, for instance, possible to add a constant to a table vector. This has the effect that the underlying column gets changed.

You can use the templated classes ROTableVector and TableVector to define a table vector (readonly and read/write, respectively) for a scalar column. Columns containing arrays or tables are not supported. The data type of the (RO)TableVector object must match the data type of the column. A table vector can also hold a normal vector so that (temporary) results of table vector operations can be handled.

In the following example we double the data in column COL1 and store the result in a temporary table vector.

       // Create a table vector for column COL1.
       // It has to be a ROTableVector, because the table is opened
       // as readonly.
       Table tab ("Table.data");
       ROTableVector<Int> tabvec(tab, "COL1");
       // Multiply it by a constant.
       // The result has to be stored in a TableVector,
       // since a ROTableVector cannot be written to.
       TableVector<Int> temp = 2 * tabvec;

In the next example we double the data in COL1 and put the result back in the column.

       // Create a table vector for column COL1.
       // It has to be a TableVector to be able to change the column.
       Table tab ("Table.data", Table::Update);
       TableVector<Int> tabvec(tab, "COL1");
       // Multiply it by a constant.
       tabvec *= 2;
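
The difference between the two examples (a temporary result versus an in-place update of the column) can be illustrated with a toy model in standard C++. ToyTableVector is a hypothetical class and not the TableVector implementation; the column is assumed to be a plain std::vector<int>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy view on a column: it refers to the column data, it does not copy it.
class ToyTableVector {
public:
    explicit ToyTableVector(std::vector<int>& column) : column_(column) {}

    std::size_t size() const { return column_.size(); }

    int operator()(std::size_t row) const {
        assert(row < column_.size());
        return column_[row];
    }

    // In-place operation: the underlying column gets changed.
    ToyTableVector& operator*=(int factor) {
        for (int& value : column_) value *= factor;
        return *this;
    }

private:
    std::vector<int>& column_;
};

// Binary operation: the result is a temporary vector and the
// underlying column is left untouched.
std::vector<int> operator*(int factor, const ToyTableVector& tv) {
    std::vector<int> result(tv.size());
    for (std::size_t i = 0; i < tv.size(); ++i) result[i] = factor * tv(i);
    return result;
}
```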

Table Keywords

Any number of keyword/value pairs may be attached to the table as a whole, or to any individual column. They may be freely added, retrieved, re-assigned, or deleted. They are, in essence, a self-resizing list of values (any of the primitive types) indexed by Strings (the keyword).

A table keyword/value pair might be

         Observer = Grote Reber
         Date = 10 october 1942
Column keyword/value pairs might be
         Units = mJy
         Reference Pixel = 320
The class TableRecord represents the keywords in a table. It is (indirectly) derived from the standard record classes in the class Record.

Table Description

A table contains a description of itself, which defines the layout of the columns and the keyword sets for the table and for the individual columns. It may also define initial keyword sets and default values for the columns. Such a default value is automatically stored in a cell in the table column, whenever a row is added to the table.

The creation of the table descriptor is the first step in the creation of a new table. The description is part of the table itself, but may also exist in a separate file. This is useful when you need to create a number of tables with the same structure; in other circumstances it probably should be avoided.

The public classes to set up a table description are:

Here follows a typical example of the construction of a table description. For more specialized things -- like the definition of a default data manager -- we refer to the descriptions of the above mentioned classes.

    #include <tables/Tables/TableDesc.h>
    #include <tables/Tables/ScaColDesc.h>
    #include <tables/Tables/ArrColDesc.h>
    #include <tables/Tables/ScaRecordColDesc.h>
    #include <tables/Tables/TableRecord.h>
    #include <casa/Arrays/IPosition.h>
    #include <casa/Arrays/Vector.h>
   
    int main()
    {
        // Create a new table description
        // Define a comment for the table description.
        // Define some keywords.
        ColumnDesc colDesc1, colDesc2;
        TableDesc td("tTableDesc", "1", TableDesc::New);
        td.comment() = "A test of class TableDesc";
        td.rwKeywordSet().define ("ra", float(3.14));
        td.rwKeywordSet().define ("equinox", double(1950));
        td.rwKeywordSet().define ("aa", Int(1));
   
        // Define an integer column ab.
        td.addColumn (ScalarColumnDesc<Int> ("ab", "Comment for column ab"));
   
        // Add a scalar integer column ac, define keywords for it
        // and define a default value 0.
        // Overwrite the value of keyword unit.
        ScalarColumnDesc<Int> acColumn("ac");
        acColumn.rwKeywordSet().define ("scale", Complex(0,0));
        acColumn.rwKeywordSet().define ("unit", "");
        acColumn.setDefault (0);
        td.addColumn (acColumn);
        td.rwColumnDesc("ac").rwKeywordSet().define ("unit", "DEG");
   
        // Add a scalar string column ad and define its comment string.
        td.addColumn (ScalarColumnDesc<String> ("ad","comment for ad"));
   
        // Now define array columns.
        // This one is indirect and has no dimensionality mentioned yet.
        td.addColumn (ArrayColumnDesc<Complex> ("Arr1","comment for Arr1"));
        // This one is indirect and has 3-dim arrays.
        td.addColumn (ArrayColumnDesc<Int> ("Arr2","comment for Arr2",3));
        // This one is direct and has 2-dim arrays with axes length 4 and 7.
        td.addColumn (ArrayColumnDesc<uInt> ("Arr3","comment for Arr3",
                                             IPosition(2,4,7),
                                             ColumnDesc::Direct));
   
        // Add columns containing records.
        td.addColumn (ScalarRecordColumnDesc ("Rec1"));
    }

Data Managers

Data managers take care of the actual access to the data in a column. There are two kinds of data managers:

  1. Storage managers -- which store the data as such. They can only handle the standard data types (Bool, ..., String), as discussed in the section about the table properties.
  2. Virtual column engines -- which manipulate the data. An engine could be a simple thing like scaling the data (as done in classic AIPS to reduce data storage), but it could also be an elaborate thing like applying corrections on-the-fly.
    An engine must be used to store data objects with a non-standard type. It has to break down the object into items with standard data types which can be stored with a storage manager.

In general the user of a table does not need to be aware which data managers are being used underneath. Data managers have to be bound to the columns only when the table is created. Thereafter it is completely transparent.

Storage Managers

Several storage managers are currently supported. The default and preferred storage manager is StandardStMan. Other storage managers should only be used when they pay off in file space (like IncrementalStMan for slowly varying data) or access speed (like the tiled storage managers for large data arrays).
The storage managers store the data in a big or little endian canonical format. The format can be specified when the table is created. By default it uses the endian format as specified in the aipsrc variable table.endianformat which can have the value local, big, or little. The default is local.

  1. StandardStMan stores all the values in so-called buckets (equally sized chunks in the file). It requires little memory.
    It replaces the old StManAipsIO.

  2. IncrementalStMan uses a storage mechanism resembling "incremental backups". A value is only stored when it is different from the previous row. It is very well suited for slowly varying data.
    The class ROIncrementalStManAccessor can be used to tune the behaviour of the IncrementalStMan. It contains functions to deal with the cache size and to show the behaviour of the cache.

  3. The Tiled Storage Managers store the data as a tiled hypercube allowing for more or less equally efficient data access along all main axes. It can be used for UV-data as well as for image data.

  4. StManAipsIO uses AipsIO to store the data in the columns. It supports all table functionality, but its I/O is probably not as efficient as other storage managers. It also requires that a large part of the table fits in memory.
    It should not be used anymore, because it uses a lot of memory for larger tables and because it is not very robust in case an application or system crashes.

  5. MemoryStMan holds the data in memory. This means that data 'stored' with this storage manager are NOT persistent.
    This storage manager is primarily meant for tables held in memory, but it can also be useful for temporary columns in normal tables. Note, however, that when a table is accessed concurrently from multiple processes, MemoryStMan data cannot be synchronized.
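
The "incremental" idea behind IncrementalStMan, storing a value only when it differs from the value in the previous row, can be sketched with a toy column in standard C++. ToyIncrementalColumn is a hypothetical class for illustration only; it says nothing about the actual file layout or caching of the storage manager.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy column that stores a value only when it changes.
class ToyIncrementalColumn {
public:
    // Append the value for the next row.
    void append(int value) {
        if (values_.empty() || values_.back() != value) {
            starts_.push_back(nrow_);   // first row holding this value
            values_.push_back(value);
        }
        ++nrow_;
    }

    // Get the value of a row by finding the last stored entry that
    // starts at or before that row.
    int get(std::size_t row) const {
        assert(row < nrow_);
        auto it = std::upper_bound(starts_.begin(), starts_.end(), row);
        std::size_t index = static_cast<std::size_t>(it - starts_.begin()) - 1;
        return values_[index];
    }

    std::size_t nrow() const { return nrow_; }
    std::size_t nstored() const { return values_.size(); }

private:
    std::vector<std::size_t> starts_;
    std::vector<int> values_;
    std::size_t nrow_ = 0;
};
```

For slowly varying data the number of stored values stays far below the number of rows, which is the saving in file space mentioned above; for rapidly varying data the scheme gains nothing.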

The storage manager framework makes it possible to support arbitrary files as tables. This has been used in a case where a file is filled by the data acquisition system of a telescope. The file is simultaneously used as a table using a dedicated storage manager. The table system and storage manager provide a sync function to synchronize the processes, i.e. to make the table system aware of changes in the file size (thus in the table size) by the filling process.

Tip: Not all data managers support all the table functionality. So, the choice of a data manager can greatly influence the type of operations you can do on the table as a whole. For example, if a column uses the tiled storage manager, it is not possible to delete rows from the table, because that storage manager will not support deletion of rows. However, it is always possible to delete all columns of a data manager in one single call.

Tiled Storage Manager

The Tiled Storage Managers allow one to store the data of one or more columns in a tiled way. Tiling means that the data are stored without a preferred order to make access along the different main axes equally efficient. This is done by storing the data in so-called tiles (i.e. equally shaped subsets of an array) to increase data locality. The user can define the tile shape to optimize for the most frequently used access.
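
The effect of tiling on data locality can be illustrated with some index arithmetic for a 2-dimensional array. The sketch below is standard C++ with hypothetical names (TilePos, locate, tilesCrossed); the numbering of tiles and of pixels within a tile is an assumption made for the illustration and is not taken from the storage manager code.

```cpp
#include <cassert>
#include <cstddef>

// Position of a pixel in tiled storage: the tile number and the
// offset of the pixel inside that tile.
struct TilePos {
    std::size_t tile;
    std::size_t offset;
};

// Locate pixel (x, y) of an array with nx pixels along x, stored in
// tiles of tx by ty pixels. Tiles are numbered with x varying fastest,
// and so are the pixels inside a tile (assumed ordering).
TilePos locate(std::size_t x, std::size_t y,
               std::size_t nx, std::size_t tx, std::size_t ty) {
    assert(tx > 0 && ty > 0 && x < nx);
    std::size_t tilesInX = (nx + tx - 1) / tx;
    std::size_t tile = (y / ty) * tilesInX + (x / tx);
    std::size_t offset = (y % ty) * tx + (x % tx);
    return {tile, offset};
}

// Number of tiles crossed by a full line of `length` pixels along an
// axis on which the tiles are `tileLength` pixels long.
std::size_t tilesCrossed(std::size_t length, std::size_t tileLength) {
    assert(tileLength > 0);
    return (length + tileLength - 1) / tileLength;
}
```

For a 1024 x 1024 array with 32 x 32 tiles, a line along either axis crosses 32 tiles. With tiles of 1024 x 1 (plain row order), a line along x crosses 1 tile but a line along y crosses 1024, which is the preferred order that tiling avoids.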

The Tiled Storage Manager has the following properties:

The Tiled Storage Managers use internal caches to minimize IO. It is possible to define a maximum cache size. The description of class ROTiledStManAccessor contains a discussion about the effect of defining a maximum cache size.

The following Tiled Storage Managers are available:

TiledCellStMan
creates (automatically) a new hypercube for each row. Thus each row of the hypercolumn is stored in a separate hypercube. Note that the row number serves as the id value. So an id column is not needed, although there are multiple hypercubes.
This storage manager is meant for tables where the data arrays in the different rows are not accessed together. One can think of a column containing images. Each row contains an image and only one image is shown at a time.
TiledColumnStMan
creates one hypercube for the entire hypercolumn. Thus all cells in the hypercube have to have the same shape and therefore this storage manager is only possible when all columns in the hypercolumn have the attribute FixedShape.
This storage manager could be used for a table with a column containing images for the Stokes parameters I, Q, U, and V. By storing them in one hypercube, it is possible to retrieve the 4 Stokes values for a subset of the image or for an individual pixel in a very efficient way.
TiledDataStMan
allows one to control the creation and extension of hypercubes. This is done by means of the class TiledDataStManAccessor. This makes it possible to store, say, rows 0-9 in hypercube A, rows 10-34 in hypercube B, rows 35-54 in hypercube A again, etc.
This storage manager could be used to store UV-data with a mix of continuum and line data.

TiledShapeStMan
can be seen as a specialization of TiledDataStMan by using the array shape as the id value. Similarly to TiledDataStMan it can maintain multiple hypercubes and store multiple rows in a hypercube, but it is easier to use, because the special addHypercube and extendHypercube functions are not needed. A hypercube is automatically added when a new array shape is encountered.
This storage manager could be used for a table with a column containing line and continuum data, which will result in 2 hypercubes.

For example:
UV-data and weights have to be stored in a table. The data have the coordinates Pol, Freq, Baseline and Time. There is continuum and line data, which have to be stored in 2 separate hypercubes. This could lead to the following scenario when creating/filling the table:

An alternative scenario could be that the data in the source is not in time order, but that the size of the data is known. In that case the hypercubes can be defined with their correct shape and putColumn (with a Slicer) can be used to put the data (and reorder them implicitly).
Another alternative is to use TiledShapeStMan, so the hypercubes are added or extended automatically.
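
The automatic behaviour described for TiledShapeStMan (a hypercube is added when a new array shape is encountered, and extended otherwise) can be illustrated with a small standalone model. This is only a sketch of the bookkeeping idea; the class ShapeKeyedCubes and its functions are hypothetical names and are not part of the Table System.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for an array shape, e.g. {nPol, nFreq}.
using Shape = std::vector<int>;

// Toy model (not the Table System API): one hypercube per distinct cell
// shape, so the shape itself acts as the id value.
class ShapeKeyedCubes {
public:
    // Returns the index of the hypercube that receives the row. A new
    // hypercube is created the first time a shape is seen; otherwise the
    // existing hypercube for that shape is extended by one row.
    std::size_t addRow(const Shape& shape) {
        assert(!shape.empty());
        auto it = cubeOfShape_.find(shape);
        if (it == cubeOfShape_.end()) {
            it = cubeOfShape_.emplace(shape, rowsPerCube_.size()).first;
            rowsPerCube_.push_back(0);
        }
        ++rowsPerCube_[it->second];
        return it->second;
    }

    std::size_t nCubes() const { return rowsPerCube_.size(); }
    std::size_t nRows(std::size_t cube) const { return rowsPerCube_.at(cube); }

private:
    std::map<Shape, std::size_t> cubeOfShape_;   // shape -> hypercube index
    std::vector<std::size_t> rowsPerCube_;       // rows stored per hypercube
};
```

Feeding this model alternating continuum rows (say shape {4, 1}) and line rows (say shape {4, 64}) yields exactly two hypercubes, which matches the line/continuum case mentioned above.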

Virtual Column Engines

Virtual column engines are used to implement the virtual (i.e. calculated-on-the-fly) columns. The Table system provides an abstract base class (or "interface class") VirtualColumnEngine that specifies the protocol for these engines. The programmer must derive a concrete class to implement the application-specific virtual column.

For example: the programmer needs a column in a table which is the difference between two other columns. (Perhaps these two other columns are updated periodically during the execution of a program.) A good way to handle this would be to have a virtual column in the table, and write a virtual column engine which knows how to calculate the difference between corresponding cells of the two other columns. So the result is that accessing a particular cell of the virtual column invokes the virtual column engine, which then gets the values from the other two columns, and returns their difference. This particular example could be done using VirtualTaQLColumn .
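
The difference example can be sketched without the Table System to show what "calculated on the fly" means. The class DifferenceColumn below is a hypothetical stand-in, not a class derived from VirtualColumnEngine: it stores no values of its own and computes each cell from the two underlying columns at the moment of access.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a virtual column engine. It only keeps
// references to the two stored columns, so the caller must keep those
// columns alive for as long as this object is used.
class DifferenceColumn {
public:
    DifferenceColumn(const std::vector<double>& a,
                     const std::vector<double>& b)
        : a_(a), b_(b) {
        assert(a.size() == b.size());
    }

    std::size_t nrow() const { return a_.size(); }

    // Computed at access time, so later updates of the stored columns
    // are reflected automatically.
    double get(std::size_t row) const { return a_.at(row) - b_.at(row); }

private:
    const std::vector<double>& a_;
    const std::vector<double>& b_;
};
```

Because nothing is cached, changing a value in one of the stored columns changes the result of the next get on that row, which is the behaviour wanted when the two columns are updated periodically.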

Several virtual column engines exist:

  1. The class VirtualTaQLColumn makes it possible to define a column as an arbitrary expression of other columns. It uses the TaQL CALC command. The virtual column can be a scalar or an array and can have one of the standard data types supported by the Table System.
  2. The class CompressFloat compresses a single precision floating point array by scaling the values to shorts (16-bit integer).
  3. The class CompressComplex compresses a single precision complex array by scaling the values to shorts (16-bit integer). In fact, the 2 parts of the complex number are combined into a 32-bit integer.
  4. The class CompressComplexSD does the same as CompressComplex, but optimizes for the case where the imaginary part is zero (which is often the case for Single Dish data).
  5. The double templated class ScaledArrayEngine scales the data in an array from, for example, float to short before putting it.
  6. The double templated class MappedArrayEngine converts the data from one data type to another. Sometimes it might be needed to store the residual data in an MS in double precision. Because the imaging task can only handle single precision, this engine can be used to map the data from double to single precision.
  7. The double templated class RetypedArrayEngine converts the data from one data type to another with the possibility to reduce the number of dimensions. For example, it can be used to store a 2-d array of StokesVector objects as a 3-d array of floats by treating the 4 data elements as an extra array axis. When the StokesVector class is simple, it can be done very efficiently.
  8. The class ForwardColumnEngine forwards the gets and puts on a row in a column to the same row in a column with the same name in another table. This provides a virtual copy of the referenced column.
  9. The class ForwardColumnIndexedRowEngine is similar to ForwardColumnEngine. However, instead of forwarding to the same row, it uses a column to map its row number to a row number in the referenced table. In this way multiple rows can share the same data. This data manager only allows for get operations.

  10. The calibration module has implemented a virtual column engine to do on-the-fly calibration in a transparent way.
To handle arbitrary data types the templated abstract base class VSCEngine has been written. An example of how to use this class can be found in the demo program dVSCEngine.cc.
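
The idea of compressing floats by scaling them to shorts can be sketched as a plain scale-and-offset quantization. This is a generic illustration under assumed conventions (the Scaling struct, the rounding and the clamping are choices made for this sketch), not a description of how CompressFloat itself determines or stores its scale and offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Assumed convention for this sketch: value ~= stored * scale + offset.
struct Scaling {
    float scale;
    float offset;
};

// Quantize floats to 16-bit integers. Values that do not fit are clamped
// to the 16-bit range, so the scale must be chosen to cover the data.
std::vector<std::int16_t> compress(const std::vector<float>& values,
                                   Scaling s) {
    assert(s.scale != 0.0f);
    std::vector<std::int16_t> out;
    out.reserve(values.size());
    for (float v : values) {
        long q = std::lround((v - s.offset) / s.scale);
        q = std::max(-32768L, std::min(32767L, q));
        out.push_back(static_cast<std::int16_t>(q));
    }
    return out;
}

// Restore approximate floats; the error is at most scale/2 per value
// for values that were not clamped.
std::vector<float> expand(const std::vector<std::int16_t>& stored,
                          Scaling s) {
    std::vector<float> out;
    out.reserve(stored.size());
    for (std::int16_t q : stored) {
        out.push_back(static_cast<float>(q) * s.scale + s.offset);
    }
    return out;
}
```

The stored array takes half the space of the original floats at the cost of a bounded rounding error, which is the trade-off such a compression engine makes.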

Table locking and synchronization

Multiple concurrent readers and writers (also via NFS) of a table are supported by means of a locking/synchronization mechanism. This mechanism is not very sophisticated in the sense that it is very coarsely grained. When locking, the entire table gets locked. A special lock file is used to lock the table. This lock file also contains some synchronization data.

Five ways of locking are supported (see class TableLock ):

TableLock::PermanentLocking(Wait)
locks the table permanently (from open till close). This means that one writer OR multiple readers are possible.
TableLock::AutoLocking
does the locking automatically. This is the default mode. This mode makes it possible that a table is shared amongst processes without the user needing to write any special code. It also means that a lock is only released when needed.
TableLock::AutoNoReadLocking
is similar to AutoLocking. However, no lock is acquired when reading the table making it possible to read the table while another process holds a write-lock. It also means that for read purposes no automatic synchronization is done when the table is updated in another process. Explicit synchronization can be done by means of the function Table::resync.
TableLock::UserLocking
requires that the programmer explicitly acquires and releases a lock on the table. This makes some kind of transaction processing possible. E.g. set a write lock, add a row, write all data into the row and release the lock. The Table functions lock and unlock have to be used to acquire and release a (read or write) lock.
TableLock::UserNoReadLocking
is similar to UserLocking. However, similarly to AutoNoReadLocking no lock is needed to read the table.
Synchronization of the processes accessing the same table is done by means of the lock file. When a lock is released, the storage managers flush their data into the table files. Some synchronization data is written into the lock file telling the new number of table rows and telling which storage managers have written data. This information is read when another process acquires the lock and is used to determine which storage managers have to refresh their internal caches.
Note that for the NoReadLocking modes (see above) explicit synchronization might be needed using Table::resync.

The function Table::hasDataChanged can be used to check if a table is (being) changed by another process. In this way a program can react to it. E.g. the table browser can refresh its screen when the underlying table is changed.

In general the default locking option will do. From the above it should be clear that heavy concurrent access results in a lot of flushing, thus will have a negative impact on performance. When uninterrupted access to a table is needed, the PermanentLocking option should be used. When transaction-like processing is done (e.g. updating a table containing an observation catalogue), the UserLocking option is probably best.

Creation or deletion of a table is not possible when that table is still open in another process. The function Table::isMultiUsed() can be used to check if a table is open in other processes.
The function deleteTable should be used to delete a table. Before deleting the table it ensures that it is writable and that it is not open in the current or another process.

The following example wants to read the table uninterrupted, thus it uses the PermanentLocking option. It also wants to wait until the lock is actually acquired. Note that the destructor closes the table and releases the lock.

    // Open the table (readonly).
    // Acquire a permanent (read) lock.
    // It waits until the lock is acquired.
    Table tab ("some.name",
               TableLock(TableLock::PermanentLockingWait));

The following example uses the automatic locking. It tells the system to check about every 20 seconds if another process wants access to the table.

    // Open the table (readonly).
    Table tab ("some.name",
               TableLock(TableLock::AutoLocking, 20));

The following example gets data (say from a GUI) and writes it as a row into the table. To lock the table as little as possible, the lock is acquired just before writing and released immediately thereafter.

    // Open the table (writable).
    Table tab ("some.name",
               TableLock(TableLock::UserLocking),
               Table::Update);
    while (True) {
        // ... get input data ...
        tab.lock();     // Acquire a write lock and wait for it.
        tab.addRow();
        // ... write data into the row ...
        tab.unlock();   // Release the lock.
    }

The following example deletes a table when it is not used in another process.

    Table tab ("some.name");
    if (! tab.isMultiUsed()) {
        tab.markForDelete();
    }

Table lookup based on a key

Class ColumnsIndex offers the user a means to find the rows matching a given key or key range. It is a somewhat primitive replacement of a B-tree index and in the future it may be replaced by a proper B+-tree implementation.

The ColumnsIndex class makes it possible to build an in-core index on one or more columns. Looking up a key or key range is done using a binary search on that index. It returns a vector containing the row numbers of the rows matching the key (range).

The class is not capable of tracing changes in the underlying column(s). It detects a change in the number of rows and updates the index accordingly. However, it has to be told explicitly when a value in the underlying column(s) changes.
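
The lookup technique described above (an in-core index searched with a binary search, returning row numbers) can be sketched in standalone form. SimpleColumnIndex is a hypothetical class written for this illustration; it handles a single integer column and is not the ColumnsIndex implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy in-memory index on one integer column: (key, row number) pairs
// sorted by key. It is a snapshot; like the class described above, it
// does not notice later changes of the column values.
class SimpleColumnIndex {
public:
    using Entry = std::pair<int, std::size_t>;

    explicit SimpleColumnIndex(const std::vector<int>& column) {
        entries_.reserve(column.size());
        for (std::size_t row = 0; row < column.size(); ++row) {
            entries_.emplace_back(column[row], row);
        }
        std::sort(entries_.begin(), entries_.end());
    }

    // Row numbers of all rows whose key lies in the closed range
    // [lower, upper]. Use lower == upper to look up a single key.
    std::vector<std::size_t> rows(int lower, int upper) const {
        assert(lower <= upper);
        auto first = std::lower_bound(
            entries_.begin(), entries_.end(), lower,
            [](const Entry& e, int key) { return e.first < key; });
        auto last = std::upper_bound(
            first, entries_.end(), upper,
            [](int key, const Entry& e) { return key < e.first; });
        std::vector<std::size_t> result;
        for (auto it = first; it != last; ++it) {
            result.push_back(it->second);
        }
        return result;
    }

private:
    std::vector<Entry> entries_;
};
```

Both searches are logarithmic in the number of rows, so the cost of a lookup is dominated by the number of matching rows rather than by the table size.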

The following example shows how the class can be used.

Example

Suppose one has an antenna table with key ANTENNA.

    // Open the table and make an index for column ANTENNA.
    Table tab("antenna.tab");
    ColumnsIndex colInx(tab, "ANTENNA");
    // Make a RecordFieldPtr for the ANTENNA field in the index key record.
    // Its data type has to match the data type of the column.
    RecordFieldPtr<Int> antFld(colInx.accessKey(), "ANTENNA");
    // Now loop in some way and find the row for the antenna
    // involved in that loop.
    Bool found;
    while (...) {
        // Fill the key field and get the row number.
        // ANTENNA is a unique key, so only one row number matches.
        // Otherwise function getRowNumbers had to be used.
        *antFld = antenna;
        uInt antRownr = colInx.getRowNumber (found);
        if (!found) {
            cout << "Antenna " << antenna << " is unknown" << endl;
        } else {
            // antRownr can now be used to get data from that row in
            // the antenna table.
        }
    }

ColumnsIndex itself contains a more advanced example. It shows how to use a private compare function to adjust the lookup when the index does not contain single key values, but intervals instead. This is useful when a row in a (sub)table is valid for, say, a time range instead of a single timestamp.

Performance and robustness considerations

The Table System resembles a database system, but it is not as robust. It lacks the transaction and logging facilities common to database systems. This means that data might be lost in case of a crash. To reduce the risk of data loss to a minimum, it is advisable to regularly do a flush, optionally with an fsync to ensure that all data are really written. However, that can degrade the performance because it involves extra writes. So one should find the right balance between robustness and performance.

To get a good feeling for the performance issues, it is important to understand some of the internals of the Table System.
The storage managers drive the performance. All storage managers use buckets (called tiles for the TiledStMan) which contain the data. All IO is done by bucket. The bucket/tile size is defined when creating the storage manager objects. Sometimes the default will do, but usually it is better to set it explicitly.

It is best to do a flush when a tile is full. For example:
When creating a MeasurementSet containing N antennae (thus N*(N-1)/2 baselines, or N*(N+1)/2 if auto-correlations are stored as well) it makes sense to store, say, N/2 rows in a tile and do a flush each time all baselines are written. In that way tiles are fully filled when doing the flush, so no extra IO is involved.
Here is some code showing this when creating a MeasurementSet. The code should speak for itself.

    MS* createMS (const String& msName, int nrchan, int nrant)
    {
      // Get the MS main default table description.
      TableDesc td = MS::requiredTableDesc();
      // Add the data column and its unit.
      MS::addColumnToDesc(td, MS::DATA, 2);
      td.rwColumnDesc(MS::columnName(MS::DATA)).rwKeywordSet().
                                                    define("UNIT","Jy");
      // Store the DATA and FLAG column in two separate files.
      // In this way accessing FLAG only is much cheaper than
      // when combining DATA and FLAG.
      // All data have the same shape, thus use TiledColumnStMan.
      // Also store UVW with TiledColumnStMan.
      Vector<String> tsmNames(1);
      tsmNames[0] = MS::columnName(MS::DATA);
      // Four correlations are assumed here, matching the tile shapes below.
      td.rwColumnDesc(tsmNames[0]).setShape (IPosition(2,4,nrchan));
      td.defineHypercolumn("TiledData", 3, tsmNames);
      tsmNames[0] = MS::columnName(MS::FLAG);
      td.rwColumnDesc(tsmNames[0]).setShape (IPosition(2,4,nrchan));
      td.defineHypercolumn("TiledFlag", 3, tsmNames);
      tsmNames[0] = MS::columnName(MS::UVW);
      td.defineHypercolumn("TiledUVW", 2, tsmNames);
      // Setup the new table.
      SetupNewTable newTab(msName, td, Table::New);
      // Most columns vary slowly and use the IncrStMan.
      IncrementalStMan incrStMan("ISMData");
      // A few columns use the StandardStMan (set an appropriate bucket size).
      StandardStMan    stanStMan("SSMData", 32768);
      // Store all pol and freq and some rows in a single tile.
      // Autocorrelations are written, thus in total there are
      // nrant*(nrant+1)/2 baselines. Ensure the baselines take up an
      // integer number of tiles.
      TiledColumnStMan tiledData("TiledData",
                                 IPosition(3,4,nrchan,(nrant+1)/2));
      TiledColumnStMan tiledFlag("TiledFlag",
                                 IPosition(3,4,nrchan,8*(nrant+1)/2));
      TiledColumnStMan tiledUVW("TiledUVW",
                                IPosition(2,3,nrant*(nrant+1)/2));
      newTab.bindAll (incrStMan);
      newTab.bindColumn(MS::columnName(MS::ANTENNA1),stanStMan);
      newTab.bindColumn(MS::columnName(MS::ANTENNA2),stanStMan);
      newTab.bindColumn(MS::columnName(MS::DATA),tiledData);
      newTab.bindColumn(MS::columnName(MS::FLAG),tiledFlag);
      newTab.bindColumn(MS::columnName(MS::UVW),tiledUVW);
      // Create the MS and its subtables.
      // Get access to its columns.
      MS* msp = new MeasurementSet(newTab);
      // Create all subtables.
      // Do this after the creation of optional subtables,
      // so the MS will know about those optional subtables.
      msp->createDefaultSubtables (Table::New);
      return msp;
    }
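
The tile arithmetic behind the DATA tile shape in the example can be checked with a few lines of standalone code. This is a sketch under the same assumptions as the example (autocorrelations stored, (nrant+1)/2 rows per tile); the struct TileLayout and the functions dataTileLayout and tileBytes are hypothetical helpers, not Table System functions.

```cpp
#include <cassert>

// Hypothetical summary of how one time slot maps onto DATA tiles.
struct TileLayout {
    int baselines;         // rows written per time slot
    int rowsPerTile;       // last axis of the tile shape
    int tilesPerSlot;      // tiles needed for one time slot
    bool fillsWholeTiles;  // true if no tile is left partly filled
};

// Layout for the DATA column of the example: nrant*(nrant+1)/2 baselines
// per time slot and (nrant+1)/2 rows per tile (integer division).
TileLayout dataTileLayout(int nrant) {
    assert(nrant > 0);
    TileLayout t;
    t.baselines = nrant * (nrant + 1) / 2;
    t.rowsPerTile = (nrant + 1) / 2;
    t.tilesPerSlot = t.baselines / t.rowsPerTile;
    t.fillsWholeTiles = (t.baselines % t.rowsPerTile == 0);
    return t;
}

// Size of one tile in bytes for a given element size
// (e.g. 8 bytes for a single precision complex value).
long tileBytes(int ncorr, int nchan, int rowsPerTile, int bytesPerValue) {
    return static_cast<long>(ncorr) * nchan * rowsPerTile * bytesPerValue;
}
```

For 7 antennae this gives 28 baselines in 7 tiles of 4 rows, and for 8 antennae 36 baselines in 9 tiles of 4 rows, so in both cases a flush after each time slot writes only full tiles. With 4 correlations and 64 channels such a tile holds 8192 bytes of single precision complex data.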

Some more performance considerations

Which storage managers to use and how to use them depends heavily on the type of data and the access patterns to the data. Here follow some guidelines:

  1. Scalar data can be stored with the StandardStMan (SSM) or IncrementalStMan (ISM). For slowly varying data (e.g. the TIME column in a MeasurementSet) it is best to use the ISM. Otherwise use the SSM. Note that very long strings (longer than the bucket size) can only be stored with the SSM.
  2. Any number of storage managers can be used. In fact, each column can have a storage manager of its own resulting in column-wise stored data which is more and more used in data base systems. In that way a query or sort on that column is very fast, because the buckets to read only contain data of that column. In practice one can decide to combine a few frequently used columns in a storage manager.
  3. Array data can be stored with any column manager. Small fixed size arrays can be stored directly with the SSM (or ISM if not changing much). However, they can also be stored with a TiledStMan (TSM) as shown for the UVW column in the example above.
    Large arrays should usually be stored with a TSM. However, if it must be possible to change the shape of an array after it was stored, the SSM (or ISM) must be used. Note that in that case a lot of disk space can be wasted, because the SSM and ISM store the array data at the end of the file when the array got bigger and do not reuse the old space. The only way to reclaim it is by making a deep copy of the entire table.
  4. When an array is stored with a TSM, it is important to decide which TSM to use.
    1. The TiledColumnStMan is the most efficient, but only suitable for arrays having the same shape in the entire column.
    2. The TiledShapeStMan is suitable for columns where the arrays can have a few shapes.
    3. The TiledCellStMan is suitable for columns where the arrays can have many different shapes.
    This is discussed in more detail above.
  5. When storing an array with a TSM, it can be very important to choose the right tile shape. Not only does this define the size of a tile, but it also defines if access in other directions than the natural direction can be fast. It is also discussed in more detail above.
  6. Columns can be combined in a single TiledStMan. For instance, combining DATA and FLAG is advantageous if FLAG is always used with DATA. However, if FLAG is used on its own (e.g. in combination with CORRECTED_DATA), it is better to separate them, otherwise tiles containing FLAG also contain DATA making the tiles much bigger, thus more expensive to access.


Modules

 Tables_internal_classes
 Internal Tables classes and functions.

Classes

class  casa::ROArrayColumn< T >
 Readonly access to an array table column with arbitrary data type.
class  casa::ArrayColumn< T >
 Read/write access to an array table column with arbitrary data type.
class  casa::ArrayColumnDesc< T >
 Templated class for description of table array columns.
class  casa::BaseMappedArrayEngine< VirtualType, StoredType >
 Templated virtual column engine for a table array of any type.
class  casa::ColumnDesc
 Envelope class for the description of a table column.
class  casa::ColumnsIndex
 Index to one or more columns in a table.
class  casa::ColumnsIndexArray
 Index to an array column in a table.
class  casa::CompressComplex
 Virtual column engine to scale a table Complex array.
class  casa::CompressComplexSD
 Virtual column engine to scale a table Complex array for Single Dish data.
class  casa::CompressFloat
 Virtual column engine to scale a table float array.
class  casa::DataManError
 Base error class for table data manager.
class  casa::DataManInternalError
 Internal table data manager error.
class  casa::DataManUnknownCtor
 Table DataManager error; invalid data manager.
class  casa::DataManInvDT
 Table DataManager error; invalid data type.
class  casa::DataManInvOper
 Table DataManager error; invalid operation.
class  casa::DataManUnknownVirtualColumn
 Table DataManager error; unknown virtual column.
class  casa::TSMError
 Table DataManager error; error in TiledStMan.
class  casa::TableExprNode
 Handle class for a table column expression tree.
class  casa::TableExprNodeSetElem
 Class to hold the table expression nodes for an element in a set.
class  casa::TableExprNodeSet
 Class to hold multiple table expression nodes.
class  casa::ForwardColumnEngine
 Virtual column engine forwarding to other columns.
class  casa::ForwardColumnIndexedRowEngine
 Virtual column engine forwarding to other columns/rows.
class  casa::IncrementalStMan
 The Incremental Storage Manager.
class  casa::ROIncrementalStManAccessor
 Give access to some IncrementalStMan functions.
class  casa::MappedArrayEngine< VirtualType, StoredType >
 Templated virtual column engine to map the data type of a table array.
class  casa::MemoryStMan
 Memory-based table storage manager class.
struct  casa::ReadAsciiTable_global_functions_readAsciiTable
 Filling a table from an Ascii file.
struct  casa::RecordExpr_global_functions_RecordExpr
 Global functions to make a expression node for a record field.
class  casa::RetypedArrayEngine< VirtualType, StoredType >
 Virtual column engine to retype and reshape arrays.
struct  casa::RetypedArraySetGet_global_functions_RetypedArrayEngineSetGet
 Helper functions for users of RetypedArrayEngine.
class  casa::RowCopier
 RowCopier copies all or part of a row from one table to another.
class  casa::ScalarColumnDesc< T >
 Templated class to define columns of scalars in tables.
class  casa::ROScalarColumn< T >
 Readonly access to a scalar table column with arbitrary data type.
class  casa::ScalarColumn< T >
 Read/write access to a scalar table column with arbitrary data type.
class  casa::ScaledArrayEngine< VirtualType, StoredType >
 Templated virtual column engine to scale a table array.
class  casa::ScaledComplexData< VirtualType, StoredType >
 Templated virtual column engine to scale a complex table array.
class  casa::ScalarRecordColumnDesc
 Class to define columns of scalar records in tables.
class  casa::SetupNewTable
 Create a new table - define shapes, data managers, etc.
class  casa::StandardStMan
 The Standard Storage Manager.
class  casa::ROStandardStManAccessor
 Give access to some StandardStMan functions.
class  casa::StManAipsIO
 AipsIO table storage manager class.
class  casa::SubTableDesc
 Description of columns containing tables.
class  casa::Table
 Main interface class to a read/write table.
class  casa::ROTableColumn
 Readonly access to a table column.
class  casa::TableColumn
 Non-const access to a table column.
class  casa::TableCopy
 Class with static functions for copying a table.
class  casa::TableDesc
 Define the structure of an AIPS++ table.
class  casa::TableError
 Base error class for storage manager.
class  casa::TableInternalError
 Internal table error.
class  casa::TableDuplFile
 Table error; table (description) already exists.
class  casa::TableNoFile
 Table error; table (description) not found.
class  casa::TableDescNoName
 Table error; no name given to table description.
class  casa::TableInvOpt
 Table error; invalid table (description) option.
class  casa::TableInvType
 Table error; table type mismatch.
class  casa::TableInvColumnDesc
 Table error; invalid column description.
class  casa::TableInvHyperDesc
 Table error; invalid hypercolumn description.
class  casa::TableUnknownDesc
 Table error; unknown column description.
class  casa::TableInvDT
 Table error; invalid data type.
class  casa::TableInvOper
 Table error; invalid operation.
class  casa::TableArrayConformanceError
 Table error; non-conformant array.
class  casa::TableConformanceError
 Table error; table length conformance error.
class  casa::TableInvSort
 Table error; invalid sort.
class  casa::TableInvLogic
 Table error; invalid logical operation.
class  casa::TableInvExpr
 Table error; invalid select expression.
class  casa::TableVectorNonConform
 Table error; non-conformant table vectors.
class  casa::TableParseError
 Table error; invalid table command.
class  casa::TableExprData
 Abstract base class for data object in a TaQL expression.
class  casa::TableExprId
 The identification of a TaQL selection subject.
class  casa::TableIndexProxy
 Proxy for table index access.
class  casa::TableIterator
 Iterate through a Table.
class  casa::TableIterProxy
 Proxy for table iterator access.
class  casa::TableLocker
 Class to hold a (user) lock on a table.
class  casa::TableProxy
 High-level interface to tables.
class  casa::TableRecord
 A hierarchical collection of named fields of various types.
class  casa::ROTableRow
 Readonly access to a table row.
class  casa::TableRow
 Read/write access to a table row.
class  casa::TableRowProxy
 Proxy for table row access.
class  casa::ROTableVector< T >
 Templated readonly table column vectors.
class  casa::TableVector< T >
 Templated read/write table column vectors.
struct  casa::TabVecMath_global_functions_basicMath
 Basic math for table vectors.
struct  casa::TabVecMath_global_functions_basicTransMath
 Transcendental math for table vectors.
struct  casa::TabVecMath_global_functions_advTransMath
 Further transcendental math for table vectors.
struct  casa::TabVecMath_global_functions_miscellaneous
 Miscellaneous table vector operations.
struct  casa::TabVecMath_global_functions_vectorMath
 Vector operations on a table vector.
class  casa::TiledCellStMan
 Tiled Cell Storage Manager.
class  casa::TiledColumnStMan
 Tiled Column Storage Manager.
class  casa::TiledDataStMan
 Tiled Data Storage Manager.
class  casa::TiledDataStManAccessor
 Give access to some TiledDataStMan functions.
class  casa::TiledFileAccess
 Tiled access to an array in a file.
class  casa::TiledShapeStMan
 Tiled Data Storage Manager using the shape as id.
class  casa::TiledStMan
 Base class for Tiled Storage Manager classes.
struct  casa::VirtScaCol_global_functions_getVirtualScalarColumn
 Global functions to get or put data of a virtual column.
class  casa::VirtualTaQLColumn
 Virtual scalar column using TaQL.
class  casa::VSCEngine< T >
 Base virtual column for a scalar column with any type.


Generated on Tue Nov 6 11:37:44 2007 for AIPS++ by  doxygen 1.4.4