Cuisine Framework.

A pipeline framework for the WSRT

document version 0.1

Adriaan Renting

1 Introduction

The usual view of a pipeline is a series of individual steps connected together, possibly grouped into higher-order operations, with a set of inputs and a set of outputs. A schematic example of a pipeline is shown in the figure below:

The goal of the Cuisine framework is to make it easy to connect heterogeneous individual steps together, with a uniform way to communicate what the inputs and outputs of each step are, to group steps into higher-level functions, and to redirect the messages they generate to the appropriate user interface or log file. The framework can also detect when a step reports a failure and halt the pipeline, and in the future it will be able to restart the pipeline from a selected point. In short, it is an abstraction layer between the individual steps and any user interface.
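The halt-on-failure behaviour, together with the planned restart from a selected point, can be sketched as follows. This is a minimal illustration, not Cuisine's code: it assumes each step is a Python callable that takes and returns an ingredient dictionary and reports failure by raising an exception. The names `run_pipeline`, `StepFailed`, `start_at` and the three example steps are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Ingredients = Dict[str, object]
Step = Tuple[str, Callable[[Ingredients], Ingredients]]


class StepFailed(Exception):
    """Raised by a step to report a failure (hypothetical convention)."""


def run_pipeline(steps: List[Step], ingredients: Ingredients,
                 start_at: int = 0) -> Tuple[Ingredients, int]:
    """Run the steps in order, feeding each step's outputs into the next.

    Returns the last good ingredients and the index of the step that failed
    (len(steps) if every step completed). Passing that index back as
    start_at resumes the pipeline from the failed step.
    """
    for index in range(start_at, len(steps)):
        name, step = steps[index]
        try:
            ingredients = step(ingredients)
        except StepFailed as err:
            print(f"Error: step {name} failed ({err}); pipeline halted")
            return ingredients, index
    return ingredients, len(steps)


# Hypothetical steps standing in for real flagging, calibration and imaging.
def flag(ing: Ingredients) -> Ingredients:
    return {**ing, "flagged": True}


def calibrate(ing: Ingredients) -> Ingredients:
    if "calibrator" not in ing:
        raise StepFailed("no calibrator given")
    return {**ing, "calibrated": True}


def image(ing: Ingredients) -> Ingredients:
    return {**ing, "image": "out.img"}


steps = [("flag", flag), ("calibrate", calibrate), ("image", image)]
result, stopped_at = run_pipeline(steps, {"MiriadUvFiles": ["10405568_ts"]})
# After fixing the problem, resume at the failed step instead of starting over.
result, stopped_at = run_pipeline(steps, {**result, "calibrator": "cal.MIR"},
                                  start_at=stopped_at)
```

Returning the index of the failed step, rather than only stopping, leaves the choice of restart point to the caller.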

2 Architecture

Because of the limited scope of the project, the choice was made to rely heavily on Python as the scripting language, including some of the features it provides. The framework is modeled after a kitchen, hence the name Cuisine. The model consists of four main parts: Recipes, Cooks, Ingredients and Messages.

Cook

The processing in the framework is done by cooks. A cook gets ingredients and a task as inputs, and delivers a set of ingredients as output for further processing. When first constructed, each cook is also assigned a message handling object. Several specialized cooks exist to handle different types of tasks, for example AIPS, Miriad, Glish or system calls. A special version is the cook that handles recipes, as those can contain calls to other cooks.

Each type of cook knows how to convert the ingredients to and from the format specific to the type of task it implements. It also knows how to catch the output of the assigned task and divert it to the message handling object.
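A minimal sketch of a cook for system calls follows. It assumes ingredients are passed to the task as Miriad-style `key=value` arguments (lists joined with commas) and that the message handler accepts `(level, text)` pairs. `Cook`, `SystemCook`, `ListMessages` and `to_argument` are hypothetical names, not Cuisine's actual classes, and a one-line Python program stands in for a real external task.

```python
import subprocess
import sys
from typing import Dict, List, Tuple

Ingredients = Dict[str, object]


class ListMessages:
    """Stand-in message handler that simply records (level, text) pairs."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, str]] = []

    def append(self, level: str, text: str) -> None:
        self.records.append((level, text))


def to_argument(key: str, value: object) -> str:
    # Lists become comma-separated values in Miriad-style key=value form.
    if isinstance(value, (list, tuple)):
        value = ",".join(str(v) for v in value)
    return f"{key}={value}"


class Cook:
    """Base class: a cook turns input ingredients plus a task into outputs."""

    def __init__(self, command: List[str], messages: ListMessages) -> None:
        self.command = command
        self.messages = messages  # message handler assigned at construction

    def cook(self, inputs: Ingredients) -> Ingredients:
        raise NotImplementedError


class SystemCook(Cook):
    """Runs the task as a system call, with ingredients as arguments."""

    def cook(self, inputs: Ingredients) -> Ingredients:
        # Convert the ingredients to the task's own format.
        args = [to_argument(key, value) for key, value in inputs.items()]
        proc = subprocess.run(self.command + args, capture_output=True, text=True)
        # Divert the task's output to the message handler, not the terminal.
        for line in proc.stdout.splitlines():
            self.messages.append("Message", line)
        for line in proc.stderr.splitlines():
            self.messages.append("Error", line)
        self.messages.append(
            "Notification",
            f"{self.command[0]} has ended with exitstatus: {proc.returncode}")
        # Convert the result back into ingredients for the next step.
        return {"exitstatus": proc.returncode}


# Usage: a tiny Python program stands in for an external task.
messages = ListMessages()
echo = SystemCook([sys.executable, "-c", "import sys; print(sys.argv[1:])"], messages)
outputs = echo.cook({"vis": "a.MIR", "options": "notv"})
```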

Recipe

A recipe is basically a generic Python script that either performs some functional step entirely in Python, or wraps one or more cooks that perform lower-level tasks. Constructing a pipeline amounts to writing a recipe in which cooks performing certain tasks are strung together. The flow of data between the cooks is coded in Python, through the way each cook's inputs and outputs are passed between them.
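A recipe in this style can be sketched as an ordinary Python function that hands one cook's output ingredients to the next. The sketch assumes a hypothetical `FunctionCook` that wraps a Python function and calls it with the ingredients as keyword arguments; `flag`, `average` and `reduce_recipe` are placeholder steps, not real Miriad tasks.

```python
from typing import Callable, Dict, List

Ingredients = Dict[str, object]


class FunctionCook:
    """Hypothetical cook wrapping a Python function (stand-in for a Miriad cook)."""

    def __init__(self, func: Callable[..., Ingredients]) -> None:
        self.func = func

    def cook(self, inputs: Ingredients) -> Ingredients:
        # Ingredient keywords become the function's keyword arguments.
        return self.func(**inputs)


def flag(MiriadUvFiles: List[str]) -> Ingredients:
    # Pretend to flag each dataset and pass the file list on.
    return {"MiriadUvFiles": MiriadUvFiles}


def average(MiriadUvFiles: List[str]) -> Ingredients:
    # Pretend to average each dataset into a new file.
    return {"MiriadUvFiles": [name + "_avg" for name in MiriadUvFiles]}


def reduce_recipe(MiriadUvFiles: List[str]) -> Ingredients:
    """A recipe: the data flow is plain Python passing ingredients along."""
    flagged = FunctionCook(flag).cook({"MiriadUvFiles": MiriadUvFiles})
    averaged = FunctionCook(average).cook(flagged)
    return averaged


# A recipe has the same ingredients-in, ingredients-out shape as a cook, so it
# can itself be wrapped and used as one step of a larger recipe.
outer_step = FunctionCook(reduce_recipe)
print(outer_step.cook({"MiriadUvFiles": ["10405568_ts"]}))
# → {'MiriadUvFiles': ['10405568_ts_avg']}
```

The nested call illustrates how a pipeline can become a task inside a larger pipeline without being aware of it.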

Ingredient

The Ingredient is a collection of keyword/value pairs, implemented as a Python dictionary. It is used to define both the inputs and the outputs of a task. This could be extended in the future.
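Because an ingredient is just a dictionary, inputs and outputs can be built and combined with ordinary dictionary operations. The `Ingredient` subclass below is a hypothetical illustration of how the type could be extended later without changing recipes that already treat it as a plain `dict`.

```python
class Ingredient(dict):
    """Hypothetical: keyword/value pairs describing a task's inputs or outputs.

    Subclassing dict keeps every dictionary operation available while leaving
    room to add behaviour later without changing the recipes that use it.
    """


inputs = Ingredient(MiriadUvFiles=["10405568_ts"], filepath=".")

# The same type describes outputs...
outputs = Ingredient(MiriadUvFiles=inputs["MiriadUvFiles"])

# ...which can be combined with extra keywords to form the next task's inputs.
next_inputs = Ingredient(outputs, options="notv")
print(sorted(next_inputs))  # → ['MiriadUvFiles', 'options']
```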

Message

All cooks pass the message output of their task to a message object. This message handler maintains a list of output devices to which it sends these messages. New output devices can be added and their verbosity level can be set, so that, for example, the graphical user interface shows only errors while everything up to and including debug information is written to a log file at the same time.

The message object can also store these messages, so that a subsequent or higher-level task can parse them to obtain information the task may have provided.
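The routing described above can be sketched as a handler that keeps a list of `(device, verbosity)` pairs and writes each message only to the devices whose verbosity admits it. The class and method names are hypothetical, `io.StringIO` objects stand in for a GUI window and a log file, and the ordering of levels (including the `Warning` level) is an assumption made for the example rather than taken from Cuisine.

```python
import io
from typing import List, TextIO, Tuple

# Verbosity order, most severe first. The level names and their ordering are
# illustrative assumptions, not Cuisine's definitions.
LEVELS = {"Error": 0, "Warning": 1, "Notification": 2, "Message": 3, "Debug": 4}


class MessageHandler:
    """Sends each message to every output device whose verbosity admits it."""

    def __init__(self) -> None:
        self.devices: List[Tuple[TextIO, int]] = []
        self.stored: List[Tuple[str, str]] = []  # kept so later tasks can parse it

    def add_device(self, stream: TextIO, verbosity: str) -> None:
        self.devices.append((stream, LEVELS[verbosity]))

    def append(self, level: str, text: str) -> None:
        self.stored.append((level, text))
        for stream, max_level in self.devices:
            if LEVELS[level] <= max_level:
                stream.write(f"{level}: {text}\n")


gui = io.StringIO()  # stands in for the GUI message window
log = io.StringIO()  # stands in for a log file
messages = MessageHandler()
messages.add_device(gui, "Error")  # GUI shows errors only
messages.add_device(log, "Debug")  # log file receives everything up to debug
messages.append("Debug", "opening 10405568_ts")
messages.append("Error", "tvclip failed")

# A subsequent or higher-level task can parse the stored messages afterwards.
errors = [text for level, text in messages.stored if level == "Error"]
```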

3 Design

There are several reasons why the framework for passing parameters from one task to the next was separated from the code that actually executes the tasks and performs the sequencing:

The technical details of how the framework operates can differ on another machine, for example a cluster or a Grid, while the pipeline recipes continue to use the same interface.
It hides the complexity needed to support different user interfaces from the end user writing or modifying a script.
It allows pipelines to be tasks within a pipeline of larger scope, without the lower-level pipeline needing to be aware of this.
It allows small, independent modules to be written and then strung together to form higher-level pipelines.
It allows low-level modules to be reused easily once written.

The framework is written in Python for a couple of reasons:

The scripts can be easily modified, because no recompiling is necessary.
The Python language is well supported in general and within the astronomical community.
Because Python always first tries to load a module from the local directory, you can easily customize one step of a pipeline for a specific project, problem or dataset, while still using all the other pipelines and steps from the system-wide library unaltered.
Python supports object-oriented programming, but does not force you to program that way.
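The override mechanism relies on Python's import system searching the directories in `sys.path` in order; when a script is run, `sys.path[0]` is the directory containing that script (the current directory in an interactive session). The sketch below reproduces the effect with two temporary directories; the module name `flag_step` and the directory layout are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two directories holding a module with the same name: a system-wide
# library and a project-specific copy (hypothetical layout).
library = Path(tempfile.mkdtemp())
project = Path(tempfile.mkdtemp())
(library / "flag_step.py").write_text("WHERE = 'library'\n")
(project / "flag_step.py").write_text("WHERE = 'project'\n")

# Python searches sys.path in order; putting the project directory first
# mirrors how the local directory precedes the library path.
sys.path[:0] = [str(project), str(library)]
importlib.invalidate_caches()  # make sure newly written files are seen
flag_step = importlib.import_module("flag_step")
print(flag_step.WHERE)  # → project
```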

An implementation that is as simple as possible, with few of the details predefined, was chosen on purpose. The Python language allows code to be written this way without burdening users with many details, and it lets users keep much of the flexibility inherent in the language.

In essence, if you write a pipeline that you want to be usable as a module in larger pipelines, the framework requires that you use the recipe format, that you execute external tasks using one of the predefined cooks, and that you pass input and output parameters around as ingredients.

4 User Interfaces

There are currently 3 interfaces under development:

Command-line interface

The recipe class is defined in such a way that, when a file defining a pipeline recipe is called from the system command line, it parses the input ingredients from optional command-line parameters, executes the steps defined in the script, writes logging either to the screen or to a log file, and prints the output ingredients as a list at the end of its output before finishing execution. The script will only execute if at least all inputs without default values are specified on the command line.

This is currently fully operational. An example is given below:

10:26-118> tvclip_flagging.py
This function runs tvclip for automatic flagging
of a Miriad UV dataset. If flags both for amplitude
and for phase.
Step 2.6 in SWOM
Inputs:
MiriadUvFiles : [No Default: can be a list]
filepath : [Default: '.']
Outputs:
MiriadUvFiles
Use --help or -h for help, --verbose or -v for more verbose output
Commandline inputs can either be written in full or
abbreviated to one letter, example:
Inputs: number [No Default]
can be entered from the commandline as:
"--number=123456" or "-n123456"
lists can be entered as comma separated values for example:
"--filenames=test1.txt,test2.txt,test3.txt"

10:27-119> tvclip_flagging.py -M10405568_ts
2006-02-01 09:28:13 Notification: recipe /local/pipeline/python/WSRTpipeline_library/tvclip_flagging.py started
2006-02-01 09:28:35 Notification: recipe /local/pipeline/python/WSRTpipeline_library/tvclip_flagging.py completed
Results:
MiriadUvFiles = ['10405568_ts']
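The option conventions in the help text above (full or one-letter input names, comma-separated lists, a results list at the end) map naturally onto Python's standard `getopt` module. The sketch below is one possible implementation, not Cuisine's: `parse_inputs`, `print_results`, `NO_DEFAULT` and the `list_inputs` argument are hypothetical, and it assumes every input name starts with a different letter.

```python
import getopt
from typing import Collection, Dict, List

NO_DEFAULT = object()  # marker for inputs that must be given on the command line


def parse_inputs(argv: List[str], defaults: Dict[str, object],
                 list_inputs: Collection[str] = ()) -> Dict[str, object]:
    """Turn --name=value or -Nvalue options into an ingredient dictionary.

    Assumes every input name starts with a different letter, so the first
    letter can serve as the one-letter abbreviation.
    """
    by_letter = {name[0]: name for name in defaults}
    shortopts = "".join(letter + ":" for letter in by_letter)
    longopts = [name + "=" for name in defaults]
    opts, _ = getopt.getopt(argv, shortopts, longopts)
    values = dict(defaults)
    for opt, arg in opts:
        name = opt[2:] if opt.startswith("--") else by_letter[opt[1]]
        # List inputs are entered as comma-separated values.
        values[name] = arg.split(",") if name in list_inputs else arg
    missing = [name for name, value in values.items() if value is NO_DEFAULT]
    if missing:
        raise SystemExit("Missing inputs without default: " + ", ".join(missing))
    return values


def print_results(outputs: Dict[str, object]) -> None:
    # The output ingredients are listed at the very end of the run.
    print("Results:")
    for name, value in outputs.items():
        print(f"{name} = {value!r}")


# A real recipe script would pass sys.argv[1:] here.
spec = {"MiriadUvFiles": NO_DEFAULT, "filepath": "."}
ingredients = parse_inputs(["-M10405568_ts"], spec, list_inputs={"MiriadUvFiles"})
print_results({"MiriadUvFiles": ingredients["MiriadUvFiles"]})
```

With `-M10405568_ts` this prints the same two result lines as the transcript. Note that `getopt` also accepts any unique prefix of a long option, such as `--Mir=...`.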

Interactive shell interface

Second, there is a Python module that allows the different cooks to be used interactively from within the Python shell, in a fashion similar to the Miriad or AIPS shell.

This is functional, but not yet thoroughly tested. An example is given below:

>>> from pipelineCLI import *
2005-08-31 12:12:36 Message : Pipeline logging started in variable PipelineLog
>>> tvclip = MiriadCook('tvclip')
>>> tvclip.com = 'diff,clip'
>>> tvclip.op = 'notv'
>>> tvclip.v = '/dop77_0/renting/10403396_S0_T0.MIR'
>>> tvclip.inputs()
'commands': 'diff,clip' [No Default]
'taver': [No Default]
'tvcorn': [No Default]
'server': [No Default]
'vis': '/dop77_0/renting/10403396_S0_T1.MIR' [No Default]
'range': [No Default]
'mode': [No Default]
'clip': [No Default]
'line': [No Default]
'tvchan': [No Default]
'options': 'notv' [No Default]
'select': [No Default]
>>> tvclip.go()
2005-08-31 12:17:30 Notification: tvclip has ended with exitstatus: 0
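In the example above, `com`, `op` and `v` are set and the listing shows them as `commands`, `options` and `vis`, so keywords appear to be accepted by unique prefix, much like the one-letter abbreviations of the command-line interface. A minimal sketch of that behaviour using Python's `__setattr__` and `__getattr__` hooks follows. `InteractiveTask` is a hypothetical class and does not claim to reproduce the `pipelineCLI` or `MiriadCook` code.

```python
from typing import List


class InteractiveTask:
    """Hypothetical interactive wrapper: keywords can be set by any unique prefix."""

    def __init__(self, task: str, keywords: List[str]) -> None:
        # Store bookkeeping attributes directly, bypassing __setattr__ below.
        object.__setattr__(self, "task", task)
        object.__setattr__(self, "values", {key: None for key in keywords})

    def _resolve(self, prefix: str) -> str:
        if prefix in self.values:  # an exact name always wins
            return prefix
        matches = [key for key in self.values if key.startswith(prefix)]
        if len(matches) != 1:
            raise AttributeError(f"{prefix!r} matches {len(matches)} keywords of {self.task}")
        return matches[0]

    def __setattr__(self, name: str, value: object) -> None:
        self.values[self._resolve(name)] = value

    def __getattr__(self, name: str) -> object:
        # Only called when normal attribute lookup fails, i.e. for keywords.
        return self.values[self._resolve(name)]

    def inputs(self) -> None:
        for key, value in self.values.items():
            print(f"{key!r}: {'' if value is None else repr(value)}")


tvclip = InteractiveTask("tvclip", ["commands", "taver", "tvcorn", "server", "vis", "range",
                                    "mode", "clip", "line", "tvchan", "options", "select"])
tvclip.com = "diff,clip"  # resolves to 'commands'
tvclip.op = "notv"        # resolves to 'options'
tvclip.v = "/dop77_0/renting/10403396_S0_T0.MIR"  # resolves to 'vis'
tvclip.inputs()
```

An ambiguous prefix such as `c` (matching `commands` and `clip`) raises `AttributeError` instead of silently picking one keyword.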

Graphical user interface

There is a graphical user interface under development that also allows the user to define pipeline recipes graphically. It is not yet finished. An example is shown below: