public:processing_at_juropa

Differences

This shows you the differences between two versions of the page.

public:processing_at_juropa [2013-08-30 13:45] Stefan Froehlich
public:processing_at_juropa [2017-03-08 15:27] (current) – external edit 127.0.0.1
Line 1: Line 1:
-===== LTA Pipeline Environment Libraries ======
+===== Juropa decommissioned =====
+The information below is kept only as a backup; some of the installation notes might still be useful to other people.
+The new system is Jureca, and you can find its documentation here:\\
 +[[http://www.lofar.org/wiki/doku.php?id=public:lofar_processing_juelich]]
  
-The following libraries with given versions are installed in a local directory on Juropa (for now /lustre/jhome17/htb00/htb003/LOFAR-R14-P275   will be moved to the top level home directory of htb003 most likely).
+===== Installation and Processing on Juropa =====
+These are notes on the LOFAR installation on Juropa at the Juelich Supercomputing Centre and on how to use the software there.\\
+Work in progress, but useful anyhow.
+The next section, "Using Juropa for LOFAR Processing", contains the information you need to use the system as of May 2014. The sections after it are kept for archival purposes (still useful).
 + 
+===== Using Juropa for LOFAR Processing =====
+This is the most recent information on how to use Juropa for LOFAR processing.
+Last edit June 2014.
 +==== Account ==== 
 +First of all you need an account on the system. The Project leader is Matthias Hoeft and the Project ID is HTB00 (needed for registration). 
 +The following website contains all necessary links for allocating computing time in the Jülich Supercomputing Centre (JSC). 
 +Click on the link "User Accounts for projects on JUQUEEN, JUROPA,..." and follow the instructions.\\ 
 +[[http://www.fz-juelich.de/ias/jsc/EN/Expertise/Services/JSConline/ComputingTime/_node.html]]\\ 
+German version:\\
 +[[http://www.fz-juelich.de/ias/jsc/DE/Leistungen/Dienstleistungen/JSCOnline/Rechenzeitvergabe/_node.html]]\\ 
 +Get in contact with Matthias so he can sign your account application and initiate the next steps. 
 +==== Acquiring Data ==== 
 +Take a look at this site on how to get the data from the LTA\\ 
 +[[http://www.lofar.org/operations/doku.php?id=public:lta_howto]]\\ 
+To download data from the web you need the full filename. You can look it up in the catalog\\
+[[http://lofar.target.rug.nl/Lofar]]\\
+The Juelich HTTP download server is here:\\
+[[https://lofar-download.fz-juelich.de/]]\\
+For SARA:\\
+[[https://lofar-download.grid.sara.nl/]]\\
+\\
+The recommended way to copy data is via SRM copy (srmcp). For this you need a Grid certificate, and you must register with the Virtual Organization (VO) as a LOFAR user.
 +==== Register with the Virtual Organization ==== 
 +You can register with the Lofar VO here: 
 +[[https://voms.grid.sara.nl:8443/voms/lofar]] 
 + 
 +==== Grid Certificate ==== 
 +To get direct srm copy access to the LTA storage you need a Grid Certificate.\\ 
+It's best to ask around in your institute where to get such a certificate and how to install it.
+General information about German Grid certificates can be found here:\\
 +[[http://dgi-2.d-grid.de/zertifikate.php]] 
 +==== SRM Copy from Juropa ==== 
+There are two ways to create proxies on Juropa: with the ltools and grid-proxy-init, or with voms-proxy-init.\\
+The main difference is the authentication at the VO server. voms-proxy-init can take a user role directly as input, whereas grid-proxy-init cannot. For grid-proxy-init, the user roles are compared with a local file that is synchronized with the SARA server once per day, so after registering with the LOFAR VO you have to wait until the next day before you can use your proxies.\\
 +== grid-proxy-init == 
+To use grid proxies, follow these steps:\\
+Store your private key in ''$HOME/.globus/userkey.pem''\\
+Execute:\\ <code>chmod 600 $HOME/.globus/userkey.pem</code>
+Store your signed certificate in ''$HOME/.globus/usercert.pem''\\
+Then generate a proxy. First source the script\\
+<code>. /lustre/jhome17/htb00/htb003/env_srm.sh</code>
+and then create your proxy:\\
+<code>grid-proxy-init -bits 2048</code>
+Test data retrieval:\\ <code>srmcp -server_mode=passive srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/lofar/ops/fifotest/file1M file:///file1M</code>
+When copying data with the --jobfile option, keep in mind that there is a 30-minute CPU time limit on the login node, so each srmcp run has to finish within 30 minutes.\\
 +\\ 
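Before running grid-proxy-init it can help to check the credential files from the steps above. The following Python sketch uses a hypothetical helper name (check_grid_credentials is not part of any grid toolkit); it only checks that usercert.pem and userkey.pem exist and that the key has no group or other permissions, which is what the chmod 600 step ensures. It does not create a proxy.

```python
import os
import stat


def check_grid_credentials(globus_dir):
    """Return a list of problems with the Grid credentials in globus_dir.

    Hypothetical helper: checks that usercert.pem and userkey.pem exist and
    that the private key has no group/other permissions (the chmod 600 step).
    """
    problems = []
    cert = os.path.join(globus_dir, "usercert.pem")
    key = os.path.join(globus_dir, "userkey.pem")
    if not os.path.isfile(cert):
        problems.append("missing certificate: " + cert)
    if not os.path.isfile(key):
        problems.append("missing private key: " + key)
    else:
        mode = stat.S_IMODE(os.stat(key).st_mode)
        # Any read/write/execute bit for group or others is too permissive.
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append("private key mode is %o, expected 600" % mode)
    return problems
```

For example, check_grid_credentials(os.path.expanduser("~/.globus")) returns an empty list when both files are in place and the key is private.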
 +== voms-proxy-init == 
+To use VOMS proxies, handle your key and certificate as above and source the script
 +<code>. /lustre/jhome17/htb00/htb003/lofar_grid/init_java6.sh</code> 
 +Optional: Set proxy environment variable to custom location:\\ <code>export X509_USER_PROXY=<proxy_location></code> 
 +Generate a proxy:\\ <code>voms-proxy-init -voms lofar:/lofar/user</code> 
 +Test data retrieval:\\ <code>srmcp -server_mode=passive srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/lofar/ops/fifotest/file1M file:///file1M</code> 
 +==== LOFAR Software ==== 
 +The LOFAR Software Framework is installed in the home directory of user htb003. You load the environment with 
 +<code>. /lustre/jhome17/htb00/htb003/env_lofar.sh</code>\\ 
+This loads release 2.1 (in the future probably always the latest release). You can also load release 2.3 with env_lofar_2.3.sh.\\
 +There is more software available: 
 + 
 +  * Casapy 4.2 -> env_casapy.sh 
 +  * Karma ->     env_karma.sh 
 +  * losoto ->    env_losoto.sh 
 + 
+In addition, you might need a copy of the measurement data in\\
+/lustre/jhome17/htb00/htb003/dataCEP\\
+Put it in your home directory and point to it in a file ''.casarc'' that just contains ''measures.directory: [yourhome]/dataCEP''.
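The two manual steps (copying dataCEP and writing .casarc) can be scripted. This is a minimal Python sketch with a hypothetical function name, assuming the shared copy is readable at the path given above. Note that it overwrites an existing .casarc, so merge by hand if you keep other settings there.

```python
import os
import shutil


def setup_measures(home, source, subdir="dataCEP"):
    """Copy the shared measurement data into home and point .casarc at it.

    Hypothetical helper. Warning: an existing home/.casarc is overwritten.
    """
    target = os.path.join(home, subdir)
    if not os.path.isdir(target):
        # Copy the shared directory once; later calls reuse the private copy.
        shutil.copytree(source, target)
    casarc = os.path.join(home, ".casarc")
    with open(casarc, "w") as handle:
        handle.write("measures.directory: %s\n" % target)
    return casarc
```

Typical use on Juropa would be setup_measures(os.path.expanduser("~"), "/lustre/jhome17/htb00/htb003/dataCEP").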
 + 
+If you require access to the GlobalSkyModel database, there is a copy of
+the database from the CEP cluster (hopefully) running on the Juropa
+login node juropa02. Access the database "gsm" on port 51000 with user
+"gsm" and password "msss".\\
 +\\ 
 +You can now run and test the executables on the login node from 
 +"lofar/release/bin" or run python scripts (your own or pipeline scripts 
 +in "local/lib/python2.7/site-packages/lofarpipe/recipes"). 
 + 
+To run your jobs on the compute nodes, you first have to set up and submit
+a job via the batch system. A detailed description can be found on the
+Juropa homepage:
+[[http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUROPA/UserInfo/QuickIntroduction.html]]
 + 
+Here is a simple example of the procedure.
+Basically you use two scripts: one to configure the job and one to set up
+the environment for your program and run it.\\
+Do not be confused by the comment character '#': the
+'#' in front of MSUB commands is necessary for the commands to be
+recognized by the Moab batch system.\\
+You submit the job with the command "msub [yourscript]". Check your
+status with "showq -u [username]". To see the whole machine in a GUI,
+try the "llview" program.
 + 
 +Contents of 'lofarmsub.sh': 
 +<file> 
 +#!/bin/bash -x 
 +#MSUB -N Lofar-test 
 +# just the name 
 +#MSUB -l nodes=1:ppn=16 
 +#MSUB -l walltime=00:30:00 
 +#MSUB -e error.txt 
 +# if keyword omitted : default is submitting directory 
 +#MSUB -o output.txt 
 +# if keyword omitted : default is submitting directory 
 +#MSUB -M your@mail.de 
 +#Mailadress 
 +#MSUB -m eab 
 +#send mail on end, abort, begin 
 +./lofarCalibratorPipelinePy2.7.sh 
 +</file> 
 + 
+The walltime is the time your job will be running on the machine. If it
+is too low and the job is not finished, it will be killed. If it is too high,
+your job might have to wait longer in the queue, but only the real computing
+time will be booked.\\
+The maximum walltime is 24h.\\
+It is good practice to name the error and output log files with some job-specific
+parameters and maybe the date.\\
+You can choose to have mails sent to you about the status of your job.
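If you submit many similar jobs, generating the job script can keep the log file names unique and catch a walltime above the 24h maximum before msub does. The sketch below only emits the MSUB directives shown in 'lofarmsub.sh'; make_msub_script and the file-naming scheme are hypothetical choices, not a Juropa convention.

```python
import datetime

MAX_WALLTIME_SECONDS = 24 * 3600  # maximum walltime on Juropa (24h, see above)


def walltime_seconds(walltime):
    """Convert an HH:MM:SS walltime string to seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def make_msub_script(name, command, walltime="00:30:00", nodes=1, ppn=16,
                     mail=None, date=None):
    """Return the text of a Moab job script in the style of 'lofarmsub.sh'."""
    if walltime_seconds(walltime) > MAX_WALLTIME_SECONDS:
        raise ValueError("walltime %s exceeds the 24h maximum" % walltime)
    stamp = (date or datetime.date.today()).strftime("%Y%m%d")
    lines = [
        "#!/bin/bash -x",
        "#MSUB -N %s" % name,
        "#MSUB -l nodes=%d:ppn=%d" % (nodes, ppn),
        "#MSUB -l walltime=%s" % walltime,
        # Log file names carry the job name and date, as recommended above.
        "#MSUB -e %s_%s_error.txt" % (name, stamp),
        "#MSUB -o %s_%s_output.txt" % (name, stamp),
    ]
    if mail:
        # Mail on end, abort and begin, as in 'lofarmsub.sh'.
        lines += ["#MSUB -M %s" % mail, "#MSUB -m eab"]
    lines.append(command)
    return "\n".join(lines) + "\n"
```

Write the returned text to a file and submit it with msub as described above.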
 + 
 + 
+Contents of 'lofarCalibratorPipelinePy2.7.sh' (the environment variables are the same as in env_lofar.sh; you could simply source that script first before setting up your own custom environment):
 +<file> 
+#!/bin/sh
 +#start of jobscript 
 +export OMP_NUM_THREADS=16 
 +
 +
 +export PYTHONPATH=/lustre/jhome17/htb00/htb003/local/lib/python2.7/site-packages:$PYTHONPATH 
 +export PYTHONPATH=/lustre/jhome17/htb00/htb003/lofar/release/lib/python2.7/site-packages:$PYTHONPATH 
 +
 +export PATH=/lustre/jhome17/htb00/htb003/local/bin:$PATH 
 +export PATH=/lustre/jhome17/htb00/htb003/lofar/release/bin:$PATH 
 +export PATH=/lustre/jhome17/htb00/htb003/lofar/release/sbin:$PATH 
 +
 +export LD_LIBRARY_PATH=/lustre/jhome17/htb00/htb003/lofar/release/lib:$LD_LIBRARY_PATH 
 +export LD_LIBRARY_PATH=/lustre/jhome17/htb00/htb003/lofar/release/lib64:$LD_LIBRARY_PATH 
 +export LD_LIBRARY_PATH=/lustre/jhome17/htb00/htb003/local/lib:$LD_LIBRARY_PATH 
 +export LD_LIBRARY_PATH=/lustre/jhome17/htb00/htb003/local/lib64:$LD_LIBRARY_PATH 
 +
 +export LOFARROOT=/lustre/jhome17/htb00/htb003 
 +
 +module load gsl 
 +module load GCC/4.6.3 
 +
 +/lustre/jhome17/htb00/htb003/lofar/release/bin/msss_target_pipeline.py /lustre/jhome17/htb00/htb003/pipeline_tests/Pipeline/target_pipeline/Observation64406 -c /lustre/jhome17/htb00/htb003/lofar/release/share/pipeline/pipeline.cfg --job target_test_omp16_descfile -d 
 +</file> 
 + 
+Simply replace the pipeline call with the command you want to run in
+your job.
+Example of Alexander's BBS test:\\
+''/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/lofar/release/bin/calibrate-stand-alone -v -n -f L104244_SB200_uv.dppp.MS BBS.parset skymodel.parset''\\
+One important remark about your working directory: use the file system
+mounted under $WORK for your data and jobs.\\
 +From the Juropa home page:\\  
 +$WORK\\  
 +File system for large temporary files with high I/O bandwidth demands 
 +(scratch file system). No backup of files residing here. Files not used 
 +for more than 28 days will be automatically deleted! 
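The exports in 'lofarCalibratorPipelinePy2.7.sh' can also be reproduced in Python, for example to build the env dictionary passed to Popen in the parallel-job snippets of the next section. This sketch assumes the htb003 installation layout used in that script; lofar_environment is a hypothetical name, and the "module load" lines are not covered (they still have to run in the job script).

```python
import os

# Root of the shared installation used in the job script above.
LOFAR_HOME = "/lustre/jhome17/htb00/htb003"


def prepend(env, name, *paths):
    """Prepend paths to a colon-separated variable, like 'export X=p:$X'."""
    parts = list(paths)
    if env.get(name):
        parts.append(env[name])
    env[name] = ":".join(parts)


def lofar_environment(base=None, root=LOFAR_HOME, omp_threads=16):
    """Return a copy of base (default: os.environ) set up like the job script."""
    env = dict(os.environ if base is None else base)
    release = os.path.join(root, "lofar", "release")
    local = os.path.join(root, "local")
    env["OMP_NUM_THREADS"] = str(omp_threads)
    env["LOFARROOT"] = root
    # Same final order as the successive 'export ...:$VAR' lines above.
    prepend(env, "PYTHONPATH",
            os.path.join(release, "lib", "python2.7", "site-packages"),
            os.path.join(local, "lib", "python2.7", "site-packages"))
    prepend(env, "PATH", os.path.join(release, "sbin"),
            os.path.join(release, "bin"), os.path.join(local, "bin"))
    prepend(env, "LD_LIBRARY_PATH",
            os.path.join(local, "lib64"), os.path.join(local, "lib"),
            os.path.join(release, "lib64"), os.path.join(release, "lib"))
    return env
```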
 +==== Jobs in parallel ==== 
+(This section has to be re-edited because of a bug involving the PSI_WAIT parameter. For working multinode-compatible scripts, ask Bjoern for the moment. I will edit this section with the correct information after 16.6.14.)
+You can start one job for every independent piece of data. You can use your old scripts and the pipeline scripts, but every subtask will then be processed serially on one node, so typically you allocate only one node per job.\\
+To circumvent this, start the subprocesses in your Python scripts differently: use the mpiexec command to start each subprocess. The ParaStation MPI daemon will then allocate free resources to your subprocess when they become available. For this behavior the environment variable PSI_WAIT has to be set. This means you can allocate a partition of more than one node to work on. Run your script and, whenever you make a subprocess call, use mpiexec with the number of processes equal to one (np=1).\\
+You can have up to 16 processes per node (eight CPU cores plus SMT). How many of these are allocated to each np=1 call depends on the number of threads you want to use for OpenMP. So with OMP_NUM_THREADS=4 you will be able to run 4 subprocesses on one node, with OMP_NUM_THREADS=16 one subprocess per node, and with OMP_NUM_THREADS=1 you will have 16.\\
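The slot arithmetic above can be written down as a small helper, for example to choose the number of nodes to request in the msub script. This is a sketch under the assumption stated above (16 slots per node, each np=1 subprocess using OMP_NUM_THREADS of them); for values that do not divide 16 it assumes the leftover slots stay unused, which this page does not confirm. Both function names are hypothetical.

```python
SLOTS_PER_NODE = 16  # eight cores with SMT, as stated above


def subprocesses_per_node(omp_num_threads, slots_per_node=SLOTS_PER_NODE):
    """Number of np=1 mpiexec subprocesses that fit on one node at once."""
    if not 1 <= omp_num_threads <= slots_per_node:
        raise ValueError("OMP_NUM_THREADS must be between 1 and %d"
                         % slots_per_node)
    # Assumption: leftover slots (e.g. 16 % 3) stay unused.
    return slots_per_node // omp_num_threads


def nodes_needed(n_tasks, omp_num_threads):
    """Nodes to request so that n_tasks subprocesses can all run at once."""
    per_node = subprocesses_per_node(omp_num_threads)
    return -(-n_tasks // per_node)  # ceiling division
```

With OMP_NUM_THREADS=4, ten subprocesses need nodes_needed(10, 4) == 3 nodes to run simultaneously; with PSI_WAIT set, fewer nodes also work, but the extra subprocesses then wait for free slots.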
+As an example, let's look at part of a script from Andreas (run_NDPPPs.py) (in progress):\\
+The snippet shows the subprocess call with subprocess.Popen and how its return value is handled. The commands to execute are in the list of tuples "cmds", where the first entry is the executable and the second a temporary parset file; hence the os.remove after a successful return. Each process is put into a list of processes, and the function returns when this list is empty.
 +<code> 
 +    while True: 
 +        while cmds and len(processes) < max_task: 
 +            task = cmds.pop() 
 +            print time.asctime()," : ",list2cmdline(task) 
 +            processes.append([Popen(task,env=myenv),task[1]]) 
 +            if waittime: 
 +                break 
+        for p in processes[:]:  # iterate over a copy because items are removed below
 +            if done(p[0]): 
 +                if success(p[0]): 
 +                    os.remove(p[1]) 
 +                    processes.remove(p) 
 +                else: 
 +                    fail() 
 +        if not processes and not cmds: 
 +            break 
 +        else: 
 +            time.sleep(sleeptime) 
 +</code> 
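The snippet above depends on helpers (done, success, fail) and variables (max_task, waittime, sleeptime, myenv) that are not shown. Below is a self-contained sketch of the same throttling pattern using Popen.poll(); run_throttled is a hypothetical name, and failed tasks are collected instead of calling fail(). It runs everything on one node, without mpiexec.

```python
import os
import subprocess
import time


def run_throttled(cmds, max_task=4, sleeptime=1.0, env=None):
    """Run (executable, parset) tasks with at most max_task running at once.

    Parset files of successful tasks are removed, as in the snippet above.
    Returns the list of tasks that exited with a non-zero status.
    """
    pending = list(cmds)
    running = []
    failed = []
    while pending or running:
        # Start new tasks while there are free slots.
        while pending and len(running) < max_task:
            task = pending.pop()
            running.append((subprocess.Popen(list(task), env=env), task))
        # Iterate over a copy so finished entries can be removed safely.
        for entry in running[:]:
            proc, task = entry
            returncode = proc.poll()  # None while the process is still running
            if returncode is None:
                continue
            running.remove(entry)
            if returncode == 0:
                os.remove(task[1])
            else:
                failed.append(task)
        if running:
            time.sleep(sleeptime)
    return failed
```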
+To use multiple nodes on Juropa, the command passed to Popen has to be changed as follows. The first list element is the executable, followed by its arguments. The command line for "/bin/sh -c" has to be passed as one string, not as additional elements of the list. In this example the command we want to run consists of the executable and its argument, stored as tuples in "cmds". mpiexec runs the command on one available slot ("-np=1"), which is sized according to the OMP_NUM_THREADS you specified. The argument "-x" passes all environment variables to the process started by mpiexec. We then wait until all processes in the list have returned. With the environment variable PSI_WAIT=1 we can start more mpiexec calls than there are available slots; the MPI daemon handles the scheduling for us.
 +<code> 
 +    for task in cmds: 
 +        command = ["mpiexec","-x","-np=1","/bin/sh", "-c", "hostname && "+task[0]+" "+task[1]] 
 +        print command 
 +        processes.append([Popen(command,env=myenv),task[1]]) 
 + 
 +    while True: 
+        for p in processes[:]:  # iterate over a copy because items are removed below
 +            if done(p[0]): 
 +                if success(p[0]): 
 +                    os.remove(p[1]) 
 +                    processes.remove(p) 
 +                else: 
 +                    print "Error in: ",p[1] 
 +                    os.remove(p[1]) 
 +                    processes.remove(p) 
 + 
+        if not processes:
+            break
+        time.sleep(sleeptime)  # avoid busy-waiting between polls
 +</code> 
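The key point of the multi-node variant is that everything after "/bin/sh -c" is a single string. A small helper can build that list and quote the executable and parset path so that paths with spaces survive the shell. This is a sketch; mpiexec_command is a hypothetical name, and the mpiexec flags are copied from the snippet above, not checked against the ParaStation documentation.

```python
try:
    from shlex import quote  # Python 3.3+
except ImportError:
    from pipes import quote  # Python 2.7


def mpiexec_command(executable, parset):
    """Return the Popen argument list for one (executable, parset) task.

    The whole shell line is a single string after "-c"; quoting keeps
    paths with spaces or shell metacharacters intact.
    """
    shell_line = "hostname && %s %s" % (quote(executable), quote(parset))
    return ["mpiexec", "-x", "-np=1", "/bin/sh", "-c", shell_line]
```

Its result can replace the inline list in the snippet above, e.g. processes.append([Popen(mpiexec_command(*task), env=myenv), task[1]]).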
 + 
 + 
+I hope this information is sufficient for some first tests and
+experiments.\\
+Good luck, let me know of any problems, and feel free to give some
+feedback.
+===== Old installation guide (still useful information, but irrelevant for users) =====
 + 
 +The following libraries with given versions are installed in the home of user htb003 on Juropa.\\ 
 +/lustre/jhome17/htb00/htb003/ [local and lofar]
  
 ^ Library ^ Version ^ ^ Library ^ Version ^
Line 34: Line 257:
 | argparse | 1.2.1 | | argparse | 1.2.1 |
 | libiberty | | | libiberty | |
-| LOFAR | 1.14 |
+| LOFAR | 1.16 |
 + 
+Additional software for post-processing requested by users:
 +^ Package ^ Version ^ 
 +| SIP | 4.15.1 | 
 +| PyQt4 | 4.10.3 | 
 +| iPython | 1.1.0 | 
 +| casapy | 41.0.24668|
  
-===== LTA Installation on Juropa ======
+==== LTA Installation on Juropa ====
  
 The operating system is:
Line 50: Line 280:
 The current working installation is in:
 <code>
-/lustre/jhome17/htb00/htb003/LOFAR-R14-P275
+/lustre/jhome17/htb00/htb003/lofar/release
 </code>
 Some things have to be changed in order to compile and run everything on Juropa.
Line 82: Line 312:
 TypeError: No registered converter was able to produce a C++ rvalue of type int from this Python object of type numpy.int32
 </code>
-This can be corrected by changing the order of the "pyrap.tables" import in the node script "imager_awimager.py"
+This can be corrected by changing the order of the "pyrap.tables" import in the node script "imager_prepare.py".
 
 Change from:
Line 169: Line 399:
 Logging into compute nodes via ssh is not permitted on the system. For now, subprocesses have to be started on the one rented compute node via a shell or the mpiexec command. Distribution to multiple nodes is in the works.
 
-lofarpipe/support/remotecommand.py has to be edited to circumvent ssh for locahost job spawning (svn diff useful?)
+lofarpipe/support/remotecommand.py has to be edited to circumvent ssh for localhost job spawning (svn diff, see extra section).
 
 ==== File copy ====
-Since the Juropa cluster uses a shared filesystem every data should be (read: HAS to be) present at job start to not waste computing time. The login nodes are supposed to be used for job preparation and analysis afterwards. So we do not need to copy data to the working directory (quota is limited!). The change for that is in lofarpipe/recipes/nodes/imager_prepare.py (again svn diff).
+Since the Juropa cluster uses a shared file system, all data should be (read: HAS to be) present at job start so as not to waste computing time. The login nodes are meant for job preparation and for analysis afterwards. So we do not need to copy data to the working directory (quota is limited!). The change for that is in lofarpipe/recipes/nodes/imager_prepare.py (svn diff, see extra section).
 
 === Imaging Pipeline ===
-Because the datacopy to the working directory will not be done automatically the data has to present in your working directory set in pipeline.cfg plus subfolder jobname. Somthing like working_dir/imaging_pipeline/subbands
+Because the data will not be copied to the working directory automatically, it has to be present in the working directory set in pipeline.cfg plus a subfolder named after the job, something like working_dir/imaging_pipeline/subbands.\\
+In lofarpipe/recipes/nodes/imager_prepare.py, the "-indirect-read" option has to be removed from the call to rficonsole because of insufficient write access on the target machine (it uses a folder you are not supposed to use as a normal user).
  
 ==== GSM Database ====
Line 187: Line 419:
 For how to install a local GSM database, take a look at [[http://www.lofar.org/wiki/doku.php?id=lta:software_stack_installation#gsm_database_installation]]
  
 +==== gsmutils.py ====
+The changes made during release 1.14 (after the initial release) break database access on Juropa. You can revert to the initial 1.14 release or use the fix below. The error is as follows:
 +<code>
 +ERROR:node.jj29l09.imager_create_dbs:expected_fluxes_in_fov raise exception: GDK reported error.
 +!BATfetchjoin(tmpr_2277,tmp_4347) does not hit always (|bn|=0 != 46216=|l|) => can't use fetchjoin.
  
-====== Notes on Processing at the Juropa Cluster ======
+ERROR:node.jj29l09.imager_create_dbs:failed creating skymodel
 +Traceback (most recent call last): 
 +  File "/lustre/jhome17/htb00/htb003/LOFAR-R14-P275/lofar/release/lib/python2.7/site-packages/lofarpipe/recipes/nodes/imager_create_dbs.py", line 470, in <module> 
 +    _jobid, _jobhost, _jobport).run_with_stored_arguments()) 
 +  File "/lustre/jhome17/htb00/htb003/LOFAR-R14-P275/lofar/release/lib/python2.7/site-packages/lofarpipe/support/lofarnode.py", line 85, in run_with_stored_arguments 
+    returnvalue = self.run_with_logging(*self.arguments)
 +  File "/lustre/jhome17/htb00/htb003/LOFAR-R14-P275/lofar/release/lib/python2.7/site-packages/lofarpipe/support/lofarnode.py", line 59, in run_with_logging 
 +    return self.run(*args) 
 +  File "/lustre/jhome17/htb00/htb003/LOFAR-R14-P275/lofar/release/lib/python2.7/site-packages/lofarpipe/recipes/nodes/imager_create_dbs.py", line 71, in run 
 +    monet_db_password, assoc_theta) 
 +TypeError: 'int' object is not iterable 
 +</code> 
+Bart Scheers provided a temporary fix for this issue. You need to change the configuration of your GSM database:
+<code>
+# Stop the gsm database, set its nthreads property to 1, and start it again:
 
-This is going to be the Wiki page for the Lofar Software installation at 
-the Juelich Supercomupting Centre.I will update this page as the installation progresses.
+monetdb stop gsm
+monetdb set nthreads=1 gsm
+monetdb start gsm
+</code>
  
-==== Acquiring Data ====
-Take a look at this site on how to get the data from the LTA\\
-[[http://www.lofar.org/operations/doku.php?id=public:lta_howto]]\\
-To download data from the web you need the full filename. You can look those up in the catalog\\
-[[http://lofar.target.rug.nl/Lofar]]\\
-The Juelich Http download server is here\\
-[[https://lofar-download.fz-juelich.de/]]\\
-For Sara\\
-[[https://lofar-download.grid.sara.nl/]]\\
-If you want to do a direct srm copy you need a Grid Certificate.
+==== imager_prepare.py ====
+Prevent the data copy when working on localhost only. "-indirect-read" is not supported on Juropa.\\
+SVN diff for lofarpipe/recipes/nodes/imager_prepare.py:
+<code>
+Index: CEP/Pipeline/recipes/sip/nodes/imager_prepare.py
+===================================================================
+--- CEP/Pipeline/recipes/sip/nodes/imager_prepare.py    (revision 25127)
++++ CEP/Pipeline/recipes/sip/nodes/imager_prepare.py    (working copy)
+@@ -10,6 +10,7 @@
+ import os
 + import subprocess 
 + import copy 
 ++import pyrap.tables as pt 
 + from lofarpipe.support.pipelinelogging import CatchLog4CPlus 
 + from lofarpipe.support.pipelinelogging import log_time 
 + from lofarpipe.support.utilities import patch_parset 
 +@@ -19,7 +20,7 @@ 
 + from lofarpipe.support.data_map import DataMap 
 + from lofarpipe.support.subprocessgroup import SubProcessGroup 
 +  
 +-import pyrap.tables as pt 
 ++#import pyrap.tables as pt 
 +  
 + # Some constant settings for the recipe 
 + _time_slice_dir_name = "time_slices" 
 +@@ -140,37 +141,44 @@ 
+            if input_item.skip == True:
 +                 exit_status = 1 #  
 +  
 +-            # construct copy command 
 +-            command = ["rsync", "-r", "{0}:{1}".format( 
 +-                            input_item.host, input_item.file), 
+-                               "{0}".format(processed_ms_dir)]
 +
 ++           self.logger.debug(input_item.host) 
 ++           self.logger.debug(self.host) 
 ++            # skip the copy if machine is the same (execution on localhost). 
++           # make sure data is in the correct directory. for now: working_dir/trunk_imager_regression/subbands
 ++           if input_item.host != "localhost": 
 ++             
 ++                       # construct copy command 
 ++               command = ["rsync", "-r", "{0}:{1}".format( 
 ++                           input_item.host, input_item.file), 
++                           "{0}".format(processed_ms_dir)]
 +  
 +-            self.logger.debug("executing: " + " ".join(command)) 
 ++               self.logger.debug("executing: " + " ".join(command)) 
 +  
 +-            # Spawn a subprocess and connect the pipes 
 +-            # The copy step is performed 720 at once in that case which might 
 +-            # saturate the cluster. 
 +-            copy_process = subprocess.Popen( 
 +-                        command, 
 +-                        stdin=subprocess.PIPE, 
 +-                        stdout=subprocess.PIPE, 
 +-                        stderr=subprocess.PIPE) 
 ++               # Spawn a subprocess and connect the pipes 
 ++               # The copy step is performed 720 at once in that case which might 
 ++               # saturate the cluster. 
 ++               copy_process = subprocess.Popen( 
 ++                               command, 
 ++                               stdin=subprocess.PIPE, 
 ++                               stdout=subprocess.PIPE, 
 ++                               stderr=subprocess.PIPE) 
 +  
+-            # Wait for finish of copy inside the loop: enforce single tread
+-            # copy
 +-            (stdoutdata, stderrdata) = copy_process.communicate() 
 ++               # Wait for finish of copy inside the loop: enforce single tread 
 ++               # copy 
 ++               (stdoutdata, stderrdata) = copy_process.communicate() 
 +  
+-            exit_status = copy_process.returncode
 ++               exit_status = copy_process.returncode 
 +  
 +             #if copy failed log the missing file and update the skip fields  
 +-            if  exit_status != 0: 
+-                input_item.skip = True
 +-                copied_item.skip = True 
 +-                self.logger.warning( 
 ++                if  exit_status != 0: 
 ++                       input_item.skip = True 
 ++                       copied_item.skip = True 
 ++                       self.logger.warning( 
 +                             "Failed loading file: {0}".format(input_item.file)) 
 +-                self.logger.warning(stderrdata) 
 ++                       self.logger.warning(stderrdata) 
 +  
 +-            self.logger.debug(stdoutdata) 
 ++                       self.logger.debug(stdoutdata) 
 +  
 +         return copied_ms_map 
 +  
 +@@ -298,7 +306,8 @@ 
 +  
 +                 # construct copy command 
 +                 self.logger.info(time_slice) 
 +-                command = [rficonsole_executable, "-indirect-read", 
 ++                command = [rficonsole_executable, 
 ++                           ## "-indirect-read", 
 +                             time_slice] 
 +                 self.logger.info("executing rficonsole command: {0}".format( 
 +                             " ".join(command))) 
 +</code>
  
-==== German Grid Certificate ====
-To get direct srm copy access to the LTA storage you need a Grid Certificate.\\
-[[http://dgi-2.d-grid.de/zertifikate.php]]
+==== remotecommand.py ====
+Extra PATH variable for remote systems where Python is not installed in the same place as on the master node.\\
+Prevent ssh commands entirely, as they are not supported on Juropa. This is just a switch for localhost. SVN diff:
- 
-==== SRM Copy from Juropa ====
-You need to load the ltools module and execute the given command to activate the environment to use "srmcp"
 <code>
-module load ltools
-. /usr/local/lroot/etc/env.sh
+Index: CEP/Pipeline/framework/lofarpipe/support/remotecommand.py
+===================================================================
 +--- CEP/Pipeline/framework/lofarpipe/support/remotecommand.py   (revision 25127) 
 ++++ CEP/Pipeline/framework/lofarpipe/support/remotecommand.py   (working copy) 
 +@@ -111,13 +111,29 @@ 
 +     process.kill = lambda : os.kill(process.pid, signal.SIGTERM) 
 +     return process 
 +  
 ++def run_via_local(logger, command, arguments): 
++    commandstring = ["/bin/sh","-c"]
 ++    for arg in arguments: 
 ++        command = command + " " + str(arg) 
 ++    commandstring.append(command) 
 ++    process = spawn_process(commandstring, logger) 
 ++    process.kill = lambda : os.kill(process.pid, signal.SIGKILL) 
 ++    return process 
 +
 + def run_via_ssh(logger, host, command, environment, arguments): 
 +     """ 
 +     Dispatch a remote command via SSH. 
 +  
 +     We return a Popen object pointing at the SSH session, to which we add a 
 +     kill method for shutting down the connection if required. 
 +
 ++    hack/ 
 ++    if host is localhost run without ssh 
 ++    /hack 
 +     """ 
 ++    if host == "localhost": 
++        logger.debug("Running command locally")
 ++        return run_via_local(logger, command, arguments) 
 +     logger.debug("Dispatching command to %s with ssh" % host) 
+     ssh_cmd = ["ssh", "-n", "-tt", "-x", host, "--", "/bin/sh", "-c"]
 +  
 +@@ -214,6 +230,7 @@ 
 +                 self.host, 
 +                 self.command, 
 +                 { 
 ++                   "PATH": os.environ.get('PATH'), 
 +                     "PYTHONPATH": os.environ.get('PYTHONPATH'), 
+                    "LD_LIBRARY_PATH": os.environ.get('LD_LIBRARY_PATH')
 +                 },
 </code>
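run_via_local in the patch above joins the arguments with plain spaces, so an argument containing a space or a shell metacharacter would be split or interpreted by /bin/sh. A quoting variant could look like the following; it is a sketch, not part of the LOFAR framework (local_command and run_locally are hypothetical names), and like the patch it leaves the command string itself unquoted.

```python
import subprocess

try:
    from shlex import quote  # Python 3.3+
except ImportError:
    from pipes import quote  # Python 2.7


def local_command(command, arguments):
    """Build the ['/bin/sh', '-c', ...] list used instead of ssh on localhost.

    command is passed through unchanged (it may already contain options);
    each argument is quoted so the shell sees it as a single word.
    """
    line = " ".join([command] + [quote(str(arg)) for arg in arguments])
    return ["/bin/sh", "-c", line]


def run_locally(command, arguments):
    """Start the command like run_via_local and return the Popen object."""
    return subprocess.Popen(local_command(command, arguments))
```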
-Follow this Walkthrough to generate a proxy for your srm download 
-[[http://www.lofar.org/operations/doku.php?id=public:srmclientinstallation#walkthrough]] 
- 
-==== Running the Software ==== 
-(already outdated, will get updates when Lofar v1.16 is installed (week of 2.9.13 maybe?!))\\ 
-Currently the software is beeing tested on the Juropa system. Everything 
-that is part of the Calibration and the Target Pipeline is working. The 
-awimager causes problems but might be working in an experimental build 
-(details at the end).\\  
-The software is available in the home directory of user zdv596. The root 
-path of the install is 
- 
-/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14 
- 
-You can find the lofar software in "lofar/release". 
-The environment you need is loaded with the script "variables_lofar.sh" 
- 
-In addition you might need a copy of the measurement data 
-/lustre/jhome9/lofar/zdv596/dataCEP in your home directory and point to 
-it in a file .casarc (just contains:"measures.directory: 
-[yourhome]/dataCEP") 
- 
-If you require access to the GlobalSkyModel database, there is a copy of 
-the database from the CEP Cluster (hopefully) running on the Juropa 
-login node jj28l02. Access the databse "gsm" on port 51000 with user 
-"gsm" and pass "msss" 
- 
-How to keep the measurement and gsm data up to date and distributed has 
-to be discussed 
- 
-You can now run and test the executables on the login node from 
-"lofar/release/bin" or run python scripts (your own or pipeline scripts 
-in "local/lib/python2.7/site-packages/lofarpipe/recipes"). 
- 
-To run your jobs on the compute nodes you first have to setup and submit 
-a job via the batch system. A detailed description can be found on the 
-Juropa homepage 
-'http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUROPA/UserInfo/QuickIntroduction.html' 
- 
-Here is a simple example of the procedure. 
-Basically you use two scripts. One to configure the job and one to setup 
-the environment for your program and run it.\\  
-Job configuration is pretty basic right now because we can only utilize 
-one node per job. Do not get confused by the use of comments '#'. The 
-'#' in front of MSUB commands is necessary for the command to be 
-recognized from the Moab batch system.\\  
-You submit the job with the command "msub [yourscript]". Check your 
-status with "showq -u 'username' ". To see the whole machine with a gui 
-try the "llview" program. 
- 
-Contents of 'lofarmsub.sh': 
-<file> 
-#!/bin/bash -x 
-#MSUB -N Lofar-test 
-# just the name 
-#MSUB -l nodes=1:ppn=8 
-#MSUB -l walltime=00:30:00 
-#MSUB -e error.txt 
-# if keyword omitted : default is submitting directory 
-#MSUB -o output.txt 
-# if keyword omitted : default is submitting directory 
-#MSUB -M your@mail.de 
-#Mailadress 
-#MSUB -m eab 
-#send mail on end, abort, begin 
-./lofarCalibratorPipelinePy2.7.sh 
-</file> 
- 
-The walltime is the time your job will be running on the machine. If it 
-is to low and the job is not finished it will be killed. Is it to high 
-your job might have to wait longer in queue but only the real computing 
-time will be booked.\\  
-The maximum walltime is 24h.\\  
-The number of nodes has to stay at 1 for the time being. You can 
-experiment with ppn (process per node) which is used for openmp enabled 
-programs.\\  
-It is best to name the log files error and output with some job specific 
-parameters and maybe the date.\\  
-You can choose to have mails send to you about the status of your job. 
- 
- 
-Contents of 'lofarCalibratorPipelinePy2.7.sh': 
-<file> 
-#/bin/sh! 
-#start of jobscript 
-export OMP_NUM_THREADS=8 
-# 
-# 
-export PYTHONPATH=/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/local/lib/python2.7/site-packages:$PYTHONPATH
-export PYTHONPATH=/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/lofar/release/lib/python2.7/site-packages:$PYTHONPATH
-#
-export PATH=/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/local/bin:$PATH
-export PATH=/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/lofar/release/bin:$PATH
-export PATH=/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/lofar/release/sbin:$PATH
-#
-export LD_LIBRARY_PATH=/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/lofar/release/lib:$LD_LIBRARY_PATH
-export LD_LIBRARY_PATH=/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/lofar/release/lib64:$LD_LIBRARY_PATH
-export LD_LIBRARY_PATH=/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/local/lib:$LD_LIBRARY_PATH
-export LD_LIBRARY_PATH=/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/local/lib64:$LD_LIBRARY_PATH
-#
-export LOFARROOT=/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14
-#
-/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/lofar/release/bin/msss_calibrator_pipeline.py /lustre/jwork/lofar/zdv596/Pipeline_testdata/calibrator_pipeline/Observation64405 -c /lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/lofar/release/share/pipeline/pipeline.cfg --job calibrator_branch_regression -d
-</file> 
- 
-Simply replace the pipeline call with the command you want to run in 
-your job. 
-Example of Alexanders bbs test: 
-/lustre/jhome9/lofar/zdv596/LOFAR-Release-1_14/lofar/release/bin/calibrate-stand-alone 
--v -n -f L104244_SB200_uv.dppp.MS BBS.parset skymodel.parset 
-One important remark for your working directory. Use the Filesystem 
-mounted under $WORK for your data and jobs.\\  
-From the Juropa home page:\\  
-$WORK\\  
-File system for large temporary files with high I/O bandwidth demands 
-(scratch file system). No backup of files residing here. Files not used 
-for more than 28 days will be automatically deleted! 
- 
-If you want to try the awimager try another build directly in the home 
-directory /lustre/jhome9/lofar/zdv596/lofar/release.\\  
-There are some weird problems with pyrap and this version uses a 
-different compiler. The awimager seems to work but there are other 
-problems which extend has not been analyzed yet. 
- 
- 
-I hope these information are sufficient for some first tests and 
-experiments.\\  
-Good luck and let me know of any problems and feel free to give some 
-feedback. 
- 
  
 +==== copier.py ====
+The copy process for the instrument files used in the target pipeline has to be changed because rsync is not supported between nodes. Change it to a simple copy command.\\
 +recipes/nodes/copier.py
 +<code>
 +53,56c53
 +<         if source_node=="localhost":
 +<             command = ["cp", "-r","{0}".format(source_path),"{0}".format(target_path)]
 +<         else:
 +<             command = ["rsync", "-r", 
 +---
 +>         command = ["rsync", "-r",
 +</code>
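The copier.py diff only shows the start of the rsync branch, so the exact form of the remote source argument is an assumption here, modelled on the "{0}:{1}" host:path format used in imager_prepare.py above. This Python sketch (copy_command and copy_instrument are hypothetical names) shows the same localhost switch.

```python
import subprocess


def copy_command(source_node, source_path, target_path):
    """Return the copy command for instrument files, as in the copier.py patch.

    Local copies use cp -r; the rsync host:path form for remote nodes is an
    assumption modelled on imager_prepare.py.
    """
    if source_node == "localhost":
        return ["cp", "-r", source_path, target_path]
    return ["rsync", "-r", "%s:%s" % (source_node, source_path), target_path]


def copy_instrument(source_node, source_path, target_path):
    """Run the copy and raise CalledProcessError if it fails."""
    subprocess.check_call(copy_command(source_node, source_path, target_path))
```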
  
 +==== parset.py ====
+Changed the "output_dir" in "patch_parset" to a directory visible from all nodes.\\
+Should this perhaps be changed to the working directory?
  
  
-on holidays for one week... after that more updates 
  • Last modified: 2013-08-30 13:45
  • by Stefan Froehlich