See below for an overview of the classes in this module.
Part of API
"Table" is a formal term from relational database theory: <cite> "The organizing principle in a relational database is the TABLE, a rectangular, row/column arrangement of data values."</cite> AIPS++ tables are extensions to traditional tables, but are similar enough that we use the same name. There is also a strong resemblance between the uses of AIPS++ tables, and FITS binary tables, which provides another reason to use "Tables" to describe the AIPS++ data storage mechanism.
Tables are the fundamental storage mechanism for AIPS++. This document explains why they had to be made, what their properties are, and how to use them. The last subject is discussed and illustrated in a sequence of sections:
The AIPS++ tables are mainly based upon the ideas of Allen Farris, as laid out in the AIPS++ Database document, from where the following paragraph is taken:
<BLOCKQUOTE> Traditional relational database tables have two features that decisively limit their applicability to scientific data. First, an item of data in a column of a table must be atomic -- it must have no internal structure. A consequence of this restriction is that relational databases are unable to deal with arrays of data items. Second, an item of data in a column of a table must not have any direct or implied linkages to other items of data or data aggregates. This restriction makes it difficult to model complex relationships between collections of data. While these restrictions may make it easy to define a mathematically complete set of data manipulation operations, they are simply intolerable in a scientific data-handling context. Multi-dimensional arrays are frequently the most natural modes in which to discuss and think about scientific data. In addition, scientific data often requires complex calibration operations that must draw on large bodies of data about equipment and its performance in various states. The restrictions imposed by the relational model make it very difficult to deal with complex problems of this nature. </BLOCKQUOTE>
In response to these limitations, and other needs, the AIPS++ tables were designed.
AIPS++ tables have the following properties:
table.endianformat which defaults to Table::LocalEndian (thus the endian format of the machine being used). Tables can be in one of three forms:
A (somewhat primitive) mechanism is available to do a table lookup based on the contents of a key. In the future this might be replaced by a proper B+-tree index mechanism.
To open an existing table you just create a Table object giving the name of the table, like:
    Table readonly_table ("tableName");
    // or
    Table read_and_write_table ("tableName", Table::Update);
The constructor option determines whether the table will be opened as readonly or as read/write. A readonly table file must be opened as readonly, otherwise an exception is thrown. The functions Table::isWritable(...) can be used to determine if a table is writable.
When the table is opened, the data managers are reinstantiated according to their definition at table creation.
You can read data from a table column with the "get" functions in the classes ROScalarColumn<T> and ROArrayColumn<T>. For scalars of a standard data type (i.e. Bool, uChar, Int, Short, uShort, uInt, float, double, Complex, DComplex and String) you could instead use ROTableColumn::getScalar(...) or ROTableColumn::asXXX(...). These functions offer an extra: they do automatic data type promotion, so that you can, for example, get a double value from a float column.
These "get" functions are used in the same way as the simple "put" functions described in the section on writing data into a table.
ScalarColumn<T> is derived from ROScalarColumn<T>, and therefore has the same "get" functions. However, if a ScalarColumn<T> object is constructed for a non-writable column, an exception is thrown. Only ROScalarColumn<T> objects can be constructed for nonwritable columns. The same is true for ArrayColumn<T> and TableColumn .
A typical program could look like:
    #include <tables/Tables/Table.h>
    #include <tables/Tables/ScalarColumn.h>
    #include <tables/Tables/ArrayColumn.h>
    #include <casa/Arrays/Vector.h>
    #include <casa/Arrays/Slicer.h>
    #include <casa/Arrays/ArrayMath.h>
    #include <iostream>

    int main()
    {
        // Open the table (readonly).
        Table tab ("some.name");
        // Construct the various column objects.
        // Their data type has to match the data type in the table description.
        ROScalarColumn<Int> acCol (tab, "ac");
        ROArrayColumn<Float> arr2Col (tab, "arr2");
        // Loop through all rows in the table.
        uInt nrrow = tab.nrow();
        for (uInt i=0; i<nrrow; i++) {
            // Read the row for both columns.
            cout << "Column ac in row " << i << " = " << acCol(i) << endl;
            Array<Float> array = arr2Col.get (i);
        }
        // Show the entire column ac,
        // and show the 10th element of arr2 in each row.
        cout << acCol.getColumn();
        cout << arr2Col.getColumn (Slicer(Slice(10)));
        return 0;
    }
The creation of a table is a multi-step process:
MemoryTable will rebind stored columns to the MemoryStMan storage manager, but virtual column bindings are not changed.
The following example shows how you can create a table. An example specifically illustrating the creation of the table description is given in that section. Other sections discuss the access to the table.
    #include <tables/Tables/TableDesc.h>
    #include <tables/Tables/SetupNewTab.h>
    #include <tables/Tables/Table.h>
    #include <tables/Tables/ScaColDesc.h>
    #include <tables/Tables/ScaRecordColDesc.h>
    #include <tables/Tables/ArrColDesc.h>
    #include <tables/Tables/StandardStMan.h>
    #include <tables/Tables/IncrementalStMan.h>

    int main()
    {
        // Step 1 -- Build the table description.
        TableDesc td("tTableDesc", "1", TableDesc::Scratch);
        td.comment() = "A test of class SetupNewTable";
        td.addColumn (ScalarColumnDesc<Int> ("ab" ,"Comment for column ab"));
        td.addColumn (ScalarColumnDesc<Int> ("ac"));
        td.addColumn (ScalarColumnDesc<uInt> ("ad","comment for ad"));
        td.addColumn (ScalarColumnDesc<Float> ("ae"));
        td.addColumn (ScalarRecordColumnDesc ("arec"));
        td.addColumn (ArrayColumnDesc<Float> ("arr1",3,ColumnDesc::Direct));
        td.addColumn (ArrayColumnDesc<Float> ("arr2",0));
        td.addColumn (ArrayColumnDesc<Float> ("arr3",0,ColumnDesc::Direct));

        // Step 2 -- Setup a new table from the description.
        SetupNewTable newtab("newtab.data", td, Table::New);

        // Step 3 -- Create storage managers for it.
        StandardStMan stmanStand_1;
        StandardStMan stmanStand_2;
        IncrementalStMan stmanIncr;

        // Step 4 -- First, bind all columns to the first storage
        // manager. Then, bind a few columns to another storage manager
        // (which will overwrite the previous bindings).
        newtab.bindAll (stmanStand_1);
        newtab.bindColumn ("ab", stmanStand_2);
        newtab.bindColumn ("ae", stmanIncr);
        newtab.bindColumn ("arr3", stmanIncr);

        // Step 5 -- Define the shape of the direct columns
        // (this could have been done in the column description).
        newtab.setShapeColumn ("arr1", IPosition(3,2,3,4));
        newtab.setShapeColumn ("arr3", IPosition(3,3,4,5));

        // Step 6 -- Finally, create the table consisting of 10 rows.
        Table tab(newtab, 10);

        // Now we can fill the table, which is shown in a next section.
        // The Table destructor will flush the table to the files.
        return 0;
    }
Table tab(newtab, Table::Memory, 10);
Once a table has been created or has been opened for read/write, you want to write data into it. Before doing that you may have to add one or more rows to the table. Tip: When a table was created with a given number of rows, you do not need to add rows; you may not even be able to do so.
When adding new rows to the table, either via the Table(...) constructor or via the Table::addRow(...) function, you can choose to have those rows initialized with the default values given in the description.
To actually write the data into the table you need the classes ScalarColumn<T> and ArrayColumn<T>. For each column you can construct one or more of these objects. Their put(...) functions let you write a value at a time or the entire column in one go. For arrays you can "put" subsections of the arrays.
As an alternative for scalars of a standard data type (i.e. Bool, uChar, Int, Short, uShort, uInt, float, double, Complex, DComplex and String) you could use the functions TableColumn::putScalar(...). These functions offer an extra: automatic data type promotion, so that you can, for example, put a float value in a double column.
A typical program could look like:
    #include <tables/Tables/TableDesc.h>
    #include <tables/Tables/SetupNewTab.h>
    #include <tables/Tables/Table.h>
    #include <tables/Tables/ScaColDesc.h>
    #include <tables/Tables/ArrColDesc.h>
    #include <tables/Tables/ScalarColumn.h>
    #include <tables/Tables/ArrayColumn.h>
    #include <casa/Arrays/Vector.h>
    #include <casa/Arrays/Slicer.h>
    #include <casa/Arrays/ArrayMath.h>
    #include <iostream>

    int main()
    {
        // First build the table description.
        TableDesc td("tTableDesc", "1", TableDesc::Scratch);
        td.comment() = "A test of class SetupNewTable";
        td.addColumn (ScalarColumnDesc<Int> ("ac"));
        td.addColumn (ArrayColumnDesc<Float> ("arr2",0));

        // Setup a new table from the description,
        // and create the (still empty) table.
        // Note that since we do not explicitly bind columns to
        // data managers, all columns will be bound to the default
        // standard storage manager StandardStMan.
        SetupNewTable newtab("newtab.data", td, Table::New);
        Table tab(newtab);

        // Construct the various column objects.
        // Their data type has to match the data type in the description.
        ScalarColumn<Int> ac (tab, "ac");
        ArrayColumn<Float> arr2 (tab, "arr2");
        Vector<Float> vec2(100);

        // Write the data into the columns.
        // In each cell arr2 will be a vector of length 100.
        // Since its shape is not set explicitly, it is done implicitly.
        for (uInt i=0; i<10; i++) {
            tab.addRow();                  // First add a row.
            ac.put (i, i+10);              // value is i+10 in row i
            indgen (vec2, float(i+20));    // vec2 gets i+20, i+21, ..., i+119
            arr2.put (i, vec2);
        }

        // Finally, show the entire column ac,
        // and show the 10th element of arr2.
        cout << ac.getColumn();
        cout << arr2.getColumn (Slicer(Slice(10)));

        // The Table destructor writes the table.
        return 0;
    }
In this example we added rows in the for loop, but we could also have created 10 rows straightaway by constructing the Table object as:
Table tab(newtab, 10);
in which case the loop must not call tab.addRow().
The classes TableColumn , ScalarColumn<T> , and ArrayColumn<T> contain several functions to put values into a single cell or into the whole column. This may look confusing, but is actually quite simple. The functions can be divided in two groups:
Apart from accessing a table column-wise as described in the previous two sections, it is also possible to access a table row-wise. The TableRow class makes it possible to access multiple fields in a table row as a whole. Note that like the XXColumn classes described above, there is also an ROTableRow class for access to readonly tables.
On construction of a TableRow object it has to be specified which fields (i.e. columns) are part of the row. For these fields a fixed structured TableRecord object is constructed as part of the TableRow object. The TableRow::get function will fill this record with the table data for the given row. The user has access to the record and can use RecordFieldPtr objects for speedier access to the record.
The class could be used as shown in the following example.
    // Open the table as readonly and define a row object to contain
    // the given columns.
    // Note that the function stringToVector is a very convenient
    // way to construct a Vector<String>.
    // Show the description of the fields in the row.
    Table table("Some.table");
    ROTableRow row (table, stringToVector("col1,col2,col3"));
    cout << row.record().description();
    // Since the structure of the record is known, the RecordFieldPtr
    // objects could be used to allow for easy and fast access to
    // the record which is refilled for each get.
    RORecordFieldPtr<String> col1(row.record(), "col1");
    RORecordFieldPtr<Double> col2(row.record(), "col2");
    RORecordFieldPtr<Array<Int> > col3(row.record(), "col3");
    for (uInt i=0; i<table.nrow(); i++) {
        row.get (i);
        someString = *col1;
        somedouble = *col2;
        someArrayInt = *col3;
    }
The result of a select and sort of a table is another table, which references the original table. This means that an update of a sorted or selected table results in the update of the original table. The result is, however, a table in itself, so all table functions (including select and sort) can be used with it. Note that a true copy of such a reference table can be made with the Table::deepCopy function.
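The reference behaviour described above can be pictured with a small standalone sketch. It uses only the C++ standard library, not the Tables classes; RefColumn and selectGreater are hypothetical names introduced for illustration, and the real reference table is of course more general than a list of row numbers on one column.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a selection result: it holds no data of its
// own, only the row numbers in the original column it refers to.
struct RefColumn {
    std::vector<int>*        root;  // the original column
    std::vector<std::size_t> rows;  // selected row numbers in the original

    int  get(std::size_t i) const  { return (*root)[rows[i]]; }
    void put(std::size_t i, int v) { (*root)[rows[i]] = v; }
};

// Select the rows whose value exceeds the given limit.
RefColumn selectGreater(std::vector<int>& column, int limit) {
    RefColumn result{&column, {}};
    for (std::size_t r = 0; r < column.size(); ++r) {
        if (column[r] > limit) result.rows.push_back(r);
    }
    return result;
}
```

Because put() writes through the stored row number, a change made via the selection shows up in the original column, which mirrors the statement that updating a selected or sorted table updates the original table.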
Rows or columns can be selected from a table. Columns can be selected by the Table::project(...) function, while rows can be selected by the various Table operator() functions. Usually a row is selected by giving a select expression with TableExprNode objects. These objects represent the various nodes in an expression, e.g. a constant, a column, or a subexpression. The Table function Table::col(...) creates a TableExprNode object for a column. The function Table::key(...) does the same for a keyword by reading the keyword value and storing it as a constant in an expression node. All column nodes in an expression must belong to the same table, otherwise an exception is thrown. In the following example we select all rows with RA>10:
    #include <tables/Tables/ExprNode.h>
    Table table ("Table.name");
    Table result = table (table.col("RA") > 10);
Table result = table (table.col("RA") > 10 && table.col("RA") < 14 && table.col("DEC") >= -10 && table.col("DEC") <= 10);
Table result = table (sin (table.col("RA")) > 0.5);
The function in can be used to select from a set of values. A value set can be constructed using class TableExprNodeSet.
    TableExprNodeSet set;
    set.add (TableExprNodeSetElem ("abc"));
    set.add (TableExprNodeSetElem ("defg"));
    set.add (TableExprNodeSetElem ("h"));
    Table result = table (table.col("NAME").in (set));
This selects the rows in which NAME equals abc, defg, or h.
You can sort a table on one or more columns containing scalars. In this example we simply sort on column RA (default is ascending):
    Table table ("Table.name");
    Table result = table.sort ("RA");
    Table table ("Table.name");
    Block<String> sortKeys(2);
    Block<int> sortOrders(2);
    sortKeys[0] = "RA";
    sortOrders[0] = Sort::Descending;
    sortKeys[1] = "DEC";
    sortOrders[1] = Sort::Ascending;
    Table result = table.sort (sortKeys, sortOrders);
Tables stemming from the same root can be combined in several ways with the help of the various logical Table operators (operator|, etc.).
The selection and sorting mechanism described above can only be used in a hard-coded way in a C++ program. There is, however, another way. Strings containing selection and sorting commands can be used. The syntax of these commands is based on SQL and is described in the Table Query Language (TaQL).
Such a command can be executed with the static function TableParse::tableCommand defined in class TableParse .
You can iterate through a table in an arbitrary order by getting a subset of the table consisting of the rows in which the iteration columns have the same value. An iterator object is created by constructing a TableIterator object with the appropriate column names.
In the next example we define an iteration on the columns Time and Baseline. Each iteration step returns a table subset in which Time and Baseline have the same value.
    // Iterate over Time and Baseline (by default in ascending order).
    // Time is the main iteration order, thus the first column specified.
    Table t;
    Table tab ("UV_Table.data");
    Block<String> iv0(2);
    iv0[0] = "Time";
    iv0[1] = "Baseline";
    // Create the iterator. This will prepare the first subtable.
    TableIterator iter(tab, iv0);
    Int nr = 0;
    while (!iter.pastEnd()) {
        // Get the current subtable.
        // This will contain rows with equal Time and Baseline.
        t = iter.table();
        cout << t.nrow() << " ";
        nr++;
        // Prepare the next subtable with the next Time,Baseline value.
        iter.next();
    }
    cout << endl << nr << " iteration steps" << endl;
You can define more than one iterator on the same table; they operate independently.
Note that the result of each iteration step is a table in itself which references the original table, just as in the case of a sort or select. This means that the resulting table can be used again in a sort, select, iteration, etc.
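What an iteration step delivers can be sketched without the Tables classes. The following standalone fragment uses only the C++ standard library; groupRows and Key are hypothetical names, and the columns are plain vectors. It collects, for each distinct (Time, Baseline) pair, the row numbers that a subtable for that pair would reference.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Key = std::pair<double, int>;  // (Time, Baseline)

// Hypothetical helper: collect, per distinct (Time, Baseline) pair, the row
// numbers having that pair. std::map keeps the keys in ascending order with
// Time as the major key, like the default iteration order.
std::map<Key, std::vector<std::size_t>>
groupRows(const std::vector<double>& time, const std::vector<int>& baseline) {
    std::map<Key, std::vector<std::size_t>> groups;
    for (std::size_t row = 0; row < time.size(); ++row) {
        groups[Key(time[row], baseline[row])].push_back(row);
    }
    return groups;
}
```

Each map entry corresponds to one iteration step: the key is the common column value and the vector lists the rows of the original table that belong to the step.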
A table vector makes it possible to treat a column in a table as a vector. Almost all operators and functions defined for normal vectors are also defined for table vectors. So it is, for instance, possible to add a constant to a table vector. This has the effect that the underlying column gets changed.
You can use the templated classes ROTableVector and TableVector to define a table vector (readonly and read/write, respectively) for a scalar column. Columns containing arrays or tables are not supported. The data type of the (RO)TableVector object must match the data type of the column. A table vector can also hold a normal vector so that (temporary) results of table vector operations can be handled.
In the following example we double the data in column COL1 and store the result in a temporary table vector.
    // Create a table vector for column COL1.
    // It has to be a ROTableVector, because the table is opened
    // as readonly.
    Table tab ("Table.data");
    ROTableVector<Int> tabvec(tab, "COL1");
    // Multiply it by a constant.
    // The result has to be stored in a TableVector,
    // since a ROTableVector cannot be written to.
    TableVector<Int> temp = 2 * tabvec;
In the next example we double the data in COL1 and put the result back in the column.
    // Create a table vector for column COL1.
    // It has to be a TableVector to be able to change the column.
    Table tab ("Table.data", Table::Update);
    TableVector<Int> tabvec(tab, "COL1");
    // Multiply it by a constant.
    tabvec *= 2;
Any number of keyword/value pairs may be attached to the table as a whole, or to any individual column. They may be freely added, retrieved, re-assigned, or deleted. They are, in essence, a self-resizing list of values (any of the primitive types) indexed by Strings (the keyword).
A table keyword/value pair might be
Observer = Grote Reber
Date = 10 october 1942
Units = mJy
Reference Pixel = 320
A table contains a description of itself, which defines the layout of the columns and the keyword sets for the table and for the individual columns. It may also define initial keyword sets and default values for the columns. Such a default value is automatically stored in a cell in the table column, whenever a row is added to the table.
The creation of the table descriptor is the first step in the creation of a new table. The description is part of the table itself, but may also exist in a separate file. This is useful when you need to create a number of tables with the same structure; in other circumstances it probably should be avoided.
The public classes to set up a table description are:
Here follows a typical example of the construction of a table description. For more specialized things -- like the definition of a default data manager -- we refer to the descriptions of the above mentioned classes.
    #include <tables/Tables/TableDesc.h>
    #include <tables/Tables/ScaColDesc.h>
    #include <tables/Tables/ArrColDesc.h>
    #include <tables/Tables/ScaRecordColDesc.h>
    #include <tables/Tables/TableRecord.h>
    #include <casa/Arrays/IPosition.h>
    #include <casa/Arrays/Vector.h>

    int main()
    {
        // Create a new table description.
        // Define a comment for the table description.
        // Define some keywords.
        ColumnDesc colDesc1, colDesc2;
        TableDesc td("tTableDesc", "1", TableDesc::New);
        td.comment() = "A test of class TableDesc";
        td.rwKeywordSet().define ("ra", float(3.14));
        td.rwKeywordSet().define ("equinox", double(1950));
        td.rwKeywordSet().define ("aa", Int(1));

        // Define an integer column ab.
        td.addColumn (ScalarColumnDesc<Int> ("ab", "Comment for column ab"));

        // Add a scalar integer column ac, define keywords for it
        // and define a default value 0.
        // Overwrite the value of keyword unit.
        ScalarColumnDesc<Int> acColumn("ac");
        acColumn.rwKeywordSet().define ("scale", Complex(0,0));
        acColumn.rwKeywordSet().define ("unit", "");
        acColumn.setDefault (0);
        td.addColumn (acColumn);
        td.rwColumnDesc("ac").rwKeywordSet().define ("unit", "DEG");

        // Add a scalar string column ad and define its comment string.
        td.addColumn (ScalarColumnDesc<String> ("ad","comment for ad"));

        // Now define array columns.
        // This one is indirect and has no dimensionality mentioned yet.
        td.addColumn (ArrayColumnDesc<Complex> ("Arr1","comment for Arr1"));
        // This one is indirect and has 3-dim arrays.
        td.addColumn (ArrayColumnDesc<Int> ("A2r1","comment for Arr1",3));
        // This one is direct and has 2-dim arrays with axes length 4 and 7.
        td.addColumn (ArrayColumnDesc<uInt> ("Arr3","comment for Arr1",
                                             IPosition(2,4,7),
                                             ColumnDesc::Direct));

        // Add columns containing records.
        td.addColumn (ScalarRecordColumnDesc ("Rec1"));
        return 0;
    }
Data managers take care of the actual access to the data in a column. There are two kinds of data managers:
Several storage managers are currently supported. The default and preferred storage manager is StandardStMan. Other storage managers should only be used when they pay off in file space (like IncrementalStMan for slowly varying data) or access speed (like the tiled storage managers for large data arrays).
The storage managers store the data in a big or little endian canonical format. The format can be specified when the table is created. By default it uses the endian format as specified in the aipsrc variable table.endianformat which can have the value local, big, or little. The default is local.
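To make the notion of a local versus canonical endian format concrete, here is a small standalone sketch. It uses only the C++ standard library and is not how the storage managers are implemented; localIsLittleEndian and swapBytes are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when this machine stores the least
// significant byte first (little endian).
bool localIsLittleEndian() {
    const std::uint32_t probe = 1;
    unsigned char firstByte;
    std::memcpy(&firstByte, &probe, 1);
    return firstByte == 1;
}

// Hypothetical helper: reverse the byte order of a 32-bit value, which is
// what a reader has to do when the stored format differs from the local one.
std::uint32_t swapBytes(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}
```

Storing data in the local format avoids such a swap on the machine that wrote the table, which is presumably why local is the default; a machine with the other byte order has to swap on every read and write.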
StManAipsIO.
IncrementalStMan. It contains functions to deal with the cache size and to show the behaviour of the cache.
AipsIO to store the data in the columns. It supports all table functionality, but its I/O is probably not as efficient as other storage managers. It also requires that a large part of the table fits in memory.
MemoryStMan holds the data in memory. It means that data 'stored' with this storage manager are NOT persistent.
This storage manager is primarily meant for tables held in memory, but it can also be useful for temporary columns in normal tables. Note, however, that when a table is accessed concurrently from multiple processes, MemoryStMan data cannot be synchronized.
The storage manager framework makes it possible to support arbitrary files as tables. This has been used in a case where a file is filled by the data acquisition system of a telescope. The file is simultaneously used as a table using a dedicated storage manager. The table system and storage manager provide a sync function to synchronize the processes, i.e. to make the table system aware of changes in the file size (thus in the table size) by the filling process.
Tip: Not all data managers support all the table functionality. So, the choice of a data manager can greatly influence the type of operations you can do on the table as a whole. For example, if a column uses the tiled storage manager, it is not possible to delete rows from the table, because that storage manager will not support deletion of rows. However, it is always possible to delete all columns of a data manager in one single call.
The Tiled Storage Managers allow one to store the data of one or more columns in a tiled way. Tiling means that the data are stored without a preferred order to make access along the different main axes equally efficient. This is done by storing the data in so-called tiles (i.e. equally shaped subsets of an array) to increase data locality. The user can define the tile shape to optimize for the most frequently used access.
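The effect of tiling on data locality can be illustrated with a standalone sketch for a 2-dim array. It uses only the C++ standard library; TileLocation and locate are hypothetical names, and the layout (tiles numbered along x first, elements inside a tile stored with x varying fastest) is an assumption chosen for illustration, not a description of the actual file format.

```cpp
#include <cstddef>

struct TileLocation {
    std::size_t tile;    // sequence number of the tile
    std::size_t offset;  // element offset inside that tile
};

// Hypothetical helper: locate element (x, y) of a 2-dim array of width nx
// when stored in tiles of tx by ty elements; x is the fastest varying axis.
TileLocation locate(std::size_t x, std::size_t y,
                    std::size_t nx, std::size_t tx, std::size_t ty) {
    const std::size_t tilesPerRow = (nx + tx - 1) / tx;  // round up
    TileLocation loc;
    loc.tile   = (y / ty) * tilesPerRow + (x / tx);
    loc.offset = (y % ty) * tx + (x % tx);
    return loc;
}
```

With an array of width 8 and tiles of 4 by 4 elements, the elements (0,0) to (3,0) along x and the elements (0,0) to (0,3) along y all fall in tile 0, so a short walk along either axis touches a single tile. A different tile shape, say 8 by 2, would favour access along x.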
The Tiled Storage Manager has the following properties:
(see TableDesc::defineHypercolumn).
A hypercolumn consists of up to three types of columns:
The following Tiled Storage Managers are available:
TiledDataStManAccessor. This makes it possible to store, say, row 0-9 in hypercube A, row 10-34 in hypercube B, row 35-54 in hypercube A again, etc.
This storage manager could be used to store UV-data with a mix of continuum and line data.
TiledDataStMan by using the array shape as the id value. Similarly to TiledDataStMan it can maintain multiple hypercubes and store multiple rows in a hypercube, but it is easier to use, because the special addHypercube and extendHypercube functions are not needed. A hypercube is automatically added when a new array shape is encountered.
For example:
UV-data and weights have to be stored in a table. The data have the coordinates Pol, Freq, Baseline and Time. There is continuum and line data, which have to be stored in 2 separate hypercubes. This could lead to the following scenario when creating/filling the table:
Virtual column engines are used to implement the virtual (i.e. calculated-on-the-fly) columns. The Table system provides an abstract base class (or "interface class") VirtualColumnEngine that specifies the protocol for these engines. The programmer must derive a concrete class to implement the application-specific virtual column.
For example: the programmer needs a column in a table which is the difference between two other columns. (Perhaps these two other columns are updated periodically during the execution of a program.) A good way to handle this would be to have a virtual column in the table, and write a virtual column engine which knows how to calculate the difference between corresponding cells of the two other columns. So the result is that accessing a particular cell of the virtual column invokes the virtual column engine, which then gets the values from the other two columns, and returns their difference. This particular example could be done using VirtualTaQLColumn .
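The idea of the difference column can be sketched in a few lines of standalone C++. This is not a VirtualColumnEngine implementation (the real protocol is defined by that base class); DifferenceColumn is a hypothetical class over plain vectors that only shows the calculated-on-the-fly behaviour.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a virtual column: it stores nothing itself and
// computes each cell from two stored columns when the cell is read.
class DifferenceColumn {
public:
    DifferenceColumn(const std::vector<double>& a, const std::vector<double>& b)
        : a_(a), b_(b) {}

    // Called for every access, so later changes to a or b are picked up.
    double get(std::size_t row) const { return a_[row] - b_[row]; }

private:
    const std::vector<double>& a_;
    const std::vector<double>& b_;
};
```

Since nothing is cached, updating one of the two underlying columns during the run is immediately reflected in the next read of the virtual column, which is the property the example in the text relies on.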
Several virtual column engines exist:
ForwardColumnEngine forwards the gets and puts on a row in a column to the same row in a column with the same name in another table. This provides a virtual copy of the referenced column.
ForwardColumnIndexedRowEngine is similar to ForwardColumnEngine. However, instead of forwarding it to the same row, it uses a column to map its row number to a row number in the referenced table. In this way multiple rows can share the same data. This data manager only allows for get operations.
dVSCEngine.cc.
Multiple concurrent readers and writers (also via NFS) of a table are supported by means of a locking/synchronization mechanism. This mechanism is not very sophisticated in the sense that it is very coarsely grained. When locking, the entire table gets locked. A special lock file is used to lock the table. This lock file also contains some synchronization data.
Five ways of locking are supported (see class TableLock ):
Table::resync. lock and unlock have to be used to acquire and release a (read or write) lock. Table::resync.
The function Table::hasDataChanged can be used to check if a table is (being) changed by another process. In this way a program can react to it. E.g. the table browser can refresh its screen when the underlying table is changed.
In general the default locking option will do. From the above it should be clear that heavy concurrent access results in a lot of flushing, thus will have a negative impact on performance. When uninterrupted access to a table is needed, the PermanentLocking option should be used. When transaction-like processing is done (e.g. updating a table containing an observation catalogue), the UserLocking option is probably best.
Creation or deletion of a table is not possible when that table is still open in another process. The function Table::isMultiUsed() can be used to check if a table is open in other processes.
The function deleteTable should be used to delete a table. Before deleting the table it ensures that it is writable and that it is not open in the current or another process.
The following example wants to read the table uninterrupted, thus it uses the PermanentLocking option. It also wants to wait until the lock is actually acquired. Note that the destructor closes the table and releases the lock.
    // Open the table (readonly).
    // Acquire a permanent (read) lock.
    // It waits until the lock is acquired.
    Table tab ("some.name", TableLock(TableLock::PermanentLockingWait));
The following example uses automatic locking. It tells the system to check about every 20 seconds if another process wants access to the table.
    // Open the table (readonly).
    Table tab ("some.name", TableLock(TableLock::AutoLocking, 20));
The following example gets data (say from a GUI) and writes it as a row into the table. To lock the table as little as possible, the lock is acquired just before writing and released immediately thereafter.
    // Open the table (writable).
    Table tab ("some.name", TableLock(TableLock::UserLocking), Table::Update);
    while (True) {
        // ... get input data ...
        tab.lock();      // Acquire a write lock and wait for it.
        tab.addRow();
        // ... write data into the row ...
        tab.unlock();    // Release the lock.
    }
The following example deletes a table when it is not used in another process.
    Table tab ("some.name");
    if (! tab.isMultiUsed()) {
        tab.markForDelete();
    }
Class ColumnsIndex offers the user a means to find the rows matching a given key or key range. It is a somewhat primitive replacement of a B-tree index and in the future it may be replaced by a proper B+-tree implementation.
The ColumnsIndex class makes it possible to build an in-core index on one or more columns. Looking up a key or key range is done using a binary search on that index. It returns a vector containing the row numbers of the rows matching the key (range).
The class is not capable of tracing changes in the underlying column(s). It detects a change in the number of rows and updates the index accordingly. However, it has to be told explicitly when a value in the underlying column(s) changes.
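The principle of an in-core index with binary search can be shown with a standalone sketch. It uses only the C++ standard library and a single integer column held in a vector; SimpleIndex is a hypothetical class and makes no claim about how ColumnsIndex is implemented internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical in-core index on one integer column: (key, row number) pairs
// sorted on key.
class SimpleIndex {
    using Entry = std::pair<int, std::size_t>;
    std::vector<Entry> entries_;

public:
    explicit SimpleIndex(const std::vector<int>& column) {
        for (std::size_t row = 0; row < column.size(); ++row) {
            entries_.emplace_back(column[row], row);
        }
        std::sort(entries_.begin(), entries_.end());
    }

    // Binary search for the key; returns the matching row numbers.
    std::vector<std::size_t> getRowNumbers(int key) const {
        auto lessKey = [](const Entry& e, int k) { return e.first < k; };
        auto it = std::lower_bound(entries_.begin(), entries_.end(),
                                   key, lessKey);
        std::vector<std::size_t> rows;
        for (; it != entries_.end() && it->first == key; ++it) {
            rows.push_back(it->second);
        }
        return rows;
    }
};
```

The sketch copies the key values when the index is built, so a later change of a value in the column is not seen by the index. That is the same limitation as described above: the index has to be told (here: rebuilt) when the underlying values change.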
The following example shows how the class can be used.
Suppose one has an antenna table with key ANTENNA.
    // Open the table and make an index for column ANTENNA.
    Table tab("antenna.tab");
    ColumnsIndex colInx(tab, "ANTENNA");
    // Make a RecordFieldPtr for the ANTENNA field in the index key record.
    // Its data type has to match the data type of the column.
    RecordFieldPtr<Int> antFld(colInx.accessKey(), "ANTENNA");
    // Now loop in some way and find the row for the antenna
    // involved in that loop.
    Bool found;
    while (...) {
        // Fill the key field and get the row number.
        // ANTENNA is a unique key, so only one row number matches.
        // Otherwise function getRowNumbers had to be used.
        *antFld = antenna;
        uInt antRownr = colInx.getRowNumber (found);
        if (!found) {
            cout << "Antenna " << antenna << " is unknown" << endl;
        } else {
            // antRownr can now be used to get data from that row in
            // the antenna table.
        }
    }
ColumnsIndex itself contains a more advanced example. It shows how to use a private compare function to adjust the lookup when the index does not contain single key values, but intervals instead. This is useful when a row in a (sub)table is valid for, say, a time range instead of a single timestamp.
The Table System resembles a database system, but it is not as robust. It lacks the transaction and logging facilities common to database systems. This means that data might be lost in case of a crash. To reduce the risk of data loss to a minimum, it is advisable to regularly do a flush, optionally with an fsync to ensure that all data are really written. However, that can degrade the performance because it involves extra writes. So one should find the right balance between robustness and performance.
To get a good feeling for the performance issues, it is important to understand some of the internals of the Table System.
The storage managers drive the performance. All storage managers use buckets (called tiles for the TiledStMan) which contain the data. All IO is done by bucket. The bucket/tile size is defined when creating the storage manager objects. Sometimes the default will do, but usually it is better to set it explicitly.
It is best to do a flush when a tile is full. For example:
When creating a MeasurementSet containing N antennae (thus N*(N-1)/2 baselines, or N*(N+1)/2 if auto-correlations are stored as well) it makes sense to store, say, N/2 rows in a tile and do a flush each time all baselines are written. In that way tiles are fully filled when doing the flush, so no extra IO is involved.
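The arithmetic behind this advice can be checked with a small standalone sketch. It uses the baseline count nrant*(nrant+1)/2 that appears in the MeasurementSet code of this section; nrBaselines and tilesPerTimeStep are hypothetical helper names.

```cpp
// Number of baselines for nrant antennas.
int nrBaselines(int nrant, bool withAutoCorr) {
    return withAutoCorr ? nrant * (nrant + 1) / 2 : nrant * (nrant - 1) / 2;
}

// Number of tiles filled per time step when each tile holds rowsPerTile rows
// (rounded up; a remainder means the last tile is only partly filled).
int tilesPerTimeStep(int nrbl, int rowsPerTile) {
    return (nrbl + rowsPerTile - 1) / rowsPerTile;
}
```

For 14 antennas with auto-correlations there are 105 baselines. With (14+1)/2 = 7 rows per tile (integer division) this gives exactly 15 full tiles per time step, so a flush after each time step never writes a partly filled tile.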
Here is some code showing this when creating a MeasurementSet. The code should speak for itself.
MS* createMS (const String& msName, int nrchan, int nrant)
{
  // Number of correlations; 4 is assumed here, as in the tile shapes below.
  const int nrcorr = 4;
  // Get the MS main default table description.
  TableDesc td = MS::requiredTableDesc();
  // Add the data column and its unit.
  MS::addColumnToDesc(td, MS::DATA, 2);
  td.rwColumnDesc(MS::columnName(MS::DATA)).rwKeywordSet().
                                                  define("UNIT","Jy");
  // Store the DATA and FLAG columns in two separate files.
  // In this way accessing FLAG only is much cheaper than
  // when combining DATA and FLAG.
  // All data have the same shape, thus use TiledColumnStMan.
  // Also store UVW with TiledColumnStMan.
  Vector<String> tsmNames(1);
  tsmNames[0] = MS::columnName(MS::DATA);
  td.rwColumnDesc(tsmNames[0]).setShape (IPosition(2,nrcorr,nrchan));
  td.defineHypercolumn("TiledData", 3, tsmNames);
  tsmNames[0] = MS::columnName(MS::FLAG);
  td.rwColumnDesc(tsmNames[0]).setShape (IPosition(2,nrcorr,nrchan));
  td.defineHypercolumn("TiledFlag", 3, tsmNames);
  tsmNames[0] = MS::columnName(MS::UVW);
  td.defineHypercolumn("TiledUVW", 2, tsmNames);
  // Setup the new table.
  SetupNewTable newTab(msName, td, Table::New);
  // Most columns vary slowly and use the IncrStMan.
  IncrementalStMan incrStMan("ISMData");
  // A few columns use the StandardStMan (set an appropriate bucket size).
  StandardStMan stanStMan("SSMData", 32768);
  // Store all pol and freq and some rows in a single tile.
  // Autocorrelations are written, thus in total there are
  // nrant*(nrant+1)/2 baselines. Ensure a baseline takes up an
  // integer number of tiles.
  TiledColumnStMan tiledData("TiledData",
                             IPosition(3,nrcorr,nrchan,(nrant+1)/2));
  TiledColumnStMan tiledFlag("TiledFlag",
                             IPosition(3,nrcorr,nrchan,8*(nrant+1)/2));
  TiledColumnStMan tiledUVW("TiledUVW",
                            IPosition(2,3,nrant*(nrant+1)/2));
  newTab.bindAll (incrStMan);
  newTab.bindColumn(MS::columnName(MS::ANTENNA1),stanStMan);
  newTab.bindColumn(MS::columnName(MS::ANTENNA2),stanStMan);
  newTab.bindColumn(MS::columnName(MS::DATA),tiledData);
  newTab.bindColumn(MS::columnName(MS::FLAG),tiledFlag);
  newTab.bindColumn(MS::columnName(MS::UVW),tiledUVW);
  // Create the MS and its subtables.
  // Get access to its columns.
  MS* msp = new MeasurementSet(newTab);
  // Create all subtables.
  // Do this after the creation of optional subtables,
  // so the MS will know about those optional subtables.
  msp->createDefaultSubtables (Table::New);
  return msp;
}
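Because all IO is done per tile, it can help to work out how many bytes a given tile shape represents before settling on it. The helper below is a hypothetical sketch, not part of the Table System; it simply multiplies the tile shape by the element size, which is an assumption about how the data are laid out (for instance 8 bytes for a single-precision complex value).

```cpp
#include <cstddef>
#include <vector>

// Size in bytes of one tile: the product of the tile shape's axis
// lengths times the size of one element.
std::size_t tileBytes(const std::vector<std::size_t>& tileShape,
                      std::size_t bytesPerElement)
{
    std::size_t nbytes = bytesPerElement;
    for (std::size_t axisLength : tileShape) {
        nbytes *= axisLength;
    }
    return nbytes;
}
```

For example, a data tile of shape (4, 256, 32), which corresponds to 4 correlations, 256 channels and 64 antennae in the code above, holds 4*256*32*8 = 262144 bytes when each element takes 8 bytes.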
Which storage managers to use and how to use them depends heavily on the type of data and the access patterns to the data. Here follow some guidelines:
Modules
| Tables_internal_classes | Internal Tables classes and functions. |
Classes
| class | casa::ROArrayColumn< T > | Readonly access to an array table column with arbitrary data type. |
| class | casa::ArrayColumn< T > | Read/write access to an array table column with arbitrary data type. |
| class | casa::ArrayColumnDesc< T > | Templated class for description of table array columns. |
| class | casa::BaseMappedArrayEngine< VirtualType, StoredType > | Templated virtual column engine for a table array of any type. |
| class | casa::ColumnDesc | Envelope class for the description of a table column. |
| class | casa::ColumnsIndex | Index to one or more columns in a table. |
| class | casa::ColumnsIndexArray | Index to an array column in a table. |
| class | casa::CompressComplex | Virtual column engine to scale a table Complex array. |
| class | casa::CompressComplexSD | Virtual column engine to scale a table Complex array for Single Dish data. |
| class | casa::CompressFloat | Virtual column engine to scale a table float array. |
| class | casa::DataManError | Base error class for table data manager. |
| class | casa::DataManInternalError | Internal table data manager error. |
| class | casa::DataManUnknownCtor | Table DataManager error; invalid data manager. |
| class | casa::DataManInvDT | Table DataManager error; invalid data type. |
| class | casa::DataManInvOper | Table DataManager error; invalid operation. |
| class | casa::DataManUnknownVirtualColumn | Table DataManager error; unknown virtual column. |
| class | casa::TSMError | Table DataManager error; error in TiledStMan. |
| class | casa::TableExprNode | Handle class for a table column expression tree. |
| class | casa::TableExprNodeSetElem | Class to hold the table expression nodes for an element in a set. |
| class | casa::TableExprNodeSet | Class to hold multiple table expression nodes. |
| class | casa::ForwardColumnEngine | Virtual column engine forwarding to other columns. |
| class | casa::ForwardColumnIndexedRowEngine | Virtual column engine forwarding to other columns/rows. |
| class | casa::IncrementalStMan | The Incremental Storage Manager. |
| class | casa::ROIncrementalStManAccessor | Give access to some IncrementalStMan functions. |
| class | casa::MappedArrayEngine< VirtualType, StoredType > | Templated virtual column engine to map the data type of a table array. |
| class | casa::MemoryStMan | Memory-based table storage manager class. |
| struct | casa::ReadAsciiTable_global_functions_readAsciiTable | Filling a table from an Ascii file. |
| struct | casa::RecordExpr_global_functions_RecordExpr | Global functions to make an expression node for a record field. |
| class | casa::RetypedArrayEngine< VirtualType, StoredType > | Virtual column engine to retype and reshape arrays. |
| struct | casa::RetypedArraySetGet_global_functions_RetypedArrayEngineSetGet | Helper functions for users of RetypedArrayEngine. |
| class | casa::RowCopier | RowCopier copies all or part of a row from one table to another. |
| class | casa::ScalarColumnDesc< T > | Templated class to define columns of scalars in tables. |
| class | casa::ROScalarColumn< T > | Readonly access to a scalar table column with arbitrary data type. |
| class | casa::ScalarColumn< T > | Read/write access to a scalar table column with arbitrary data type. |
| class | casa::ScaledArrayEngine< VirtualType, StoredType > | Templated virtual column engine to scale a table array. |
| class | casa::ScaledComplexData< VirtualType, StoredType > | Templated virtual column engine to scale a complex table array. |
| class | casa::ScalarRecordColumnDesc | Class to define columns of scalar records in tables. |
| class | casa::SetupNewTable | Create a new table - define shapes, data managers, etc. |
| class | casa::StandardStMan | The Standard Storage Manager. |
| class | casa::ROStandardStManAccessor | Give access to some StandardStMan functions. |
| class | casa::StManAipsIO | AipsIO table storage manager class. |
| class | casa::SubTableDesc | Description of columns containing tables. |
| class | casa::Table | Main interface class to a read/write table. |
| class | casa::ROTableColumn | Readonly access to a table column. |
| class | casa::TableColumn | Non-const access to a table column. |
| class | casa::TableCopy | Class with static functions for copying a table. |
| class | casa::TableDesc | Define the structure of an AIPS++ table. |
| class | casa::TableError | Base error class for storage manager. |
| class | casa::TableInternalError | Internal table error. |
| class | casa::TableDuplFile | Table error; table (description) already exists. |
| class | casa::TableNoFile | Table error; table (description) not found. |
| class | casa::TableDescNoName | Table error; no name given to table description. |
| class | casa::TableInvOpt | Table error; invalid table (description) option. |
| class | casa::TableInvType | Table error; table type mismatch. |
| class | casa::TableInvColumnDesc | Table error; invalid column description. |
| class | casa::TableInvHyperDesc | Table error; invalid hypercolumn description. |
| class | casa::TableUnknownDesc | Table error; unknown column description. |
| class | casa::TableInvDT | Table error; invalid data type. |
| class | casa::TableInvOper | Table error; invalid operation. |
| class | casa::TableArrayConformanceError | Table error; non-conformant array. |
| class | casa::TableConformanceError | Table error; table length conformance error. |
| class | casa::TableInvSort | Table error; invalid sort. |
| class | casa::TableInvLogic | Table error; invalid logical operation. |
| class | casa::TableInvExpr | Table error; invalid select expression. |
| class | casa::TableVectorNonConform | Table error; non-conformant table vectors. |
| class | casa::TableParseError | Table error; invalid table command. |
| class | casa::TableExprData | Abstract base class for data object in a TaQL expression. |
| class | casa::TableExprId | The identification of a TaQL selection subject. |
| class | casa::TableIndexProxy | Proxy for table index access. |
| class | casa::TableIterator | Iterate through a Table. |
| class | casa::TableIterProxy | Proxy for table iterator access. |
| class | casa::TableLocker | Class to hold a (user) lock on a table. |
| class | casa::TableProxy | High-level interface to tables. |
| class | casa::TableRecord | A hierarchical collection of named fields of various types. |
| class | casa::ROTableRow | Readonly access to a table row. |
| class | casa::TableRow | Read/write access to a table row. |
| class | casa::TableRowProxy | Proxy for table row access. |
| class | casa::ROTableVector< T > | Templated readonly table column vectors. |
| class | casa::TableVector< T > | Templated read/write table column vectors. |
| struct | casa::TabVecMath_global_functions_basicMath | Basic math for table vectors. |
| struct | casa::TabVecMath_global_functions_basicTransMath | Transcendental math for table vectors. |
| struct | casa::TabVecMath_global_functions_advTransMath | Further transcendental math for table vectors. |
| struct | casa::TabVecMath_global_functions_miscellaneous | Miscellaneous table vector operations. |
| struct | casa::TabVecMath_global_functions_vectorMath | Vector operations on a table vector. |
| class | casa::TiledCellStMan | Tiled Cell Storage Manager. |
| class | casa::TiledColumnStMan | Tiled Column Storage Manager. |
| class | casa::TiledDataStMan | Tiled Data Storage Manager. |
| class | casa::TiledDataStManAccessor | Give access to some TiledDataStMan functions. |
| class | casa::TiledFileAccess | Tiled access to an array in a file. |
| class | casa::TiledShapeStMan | Tiled Data Storage Manager using the shape as id. |
| class | casa::TiledStMan | Base class for Tiled Storage Manager classes. |
| struct | casa::VirtScaCol_global_functions_getVirtualScalarColumn | Global functions to get or put data of a virtual column. |
| class | casa::VirtualTaQLColumn | Virtual scalar column using TaQL. |
| class | casa::VSCEngine< T > | Base virtual column for a scalar column with any type. |