Differences
This shows you the differences between two versions of the page.
public:processing_at_juropa [2014-06-05 15:49] – [Account] Stefan Froehlich | public:processing_at_juropa [2017-03-08 15:27] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ===== Juropa decommissioned ===== | ||
+ | The information below is kept only as a backup; some of the installation notes might still be useful to other people. | ||
+ | The new system is Jureca and you can find the info here:\\ | ||
+ | [[http:// | ||
+ | |||
===== Installation and Processing on Juropa ====== | ===== Installation and Processing on Juropa ====== | ||
Here are some notes on the Juropa installation and how to use the software on Juropa at the Juelich Supercomputing Centre.\\ | Here are some notes on the Juropa installation and how to use the software on Juropa at the Juelich Supercomputing Centre.\\ | ||
Line 24: | Line 29: | ||
For Sara\\ | For Sara\\ | ||
[[https:// | [[https:// | ||
- | If you want to do a direct | + | \\ |
+ | The recommended way to copy data is via srm copy. For doing this you need a Grid Certificate | ||
+ | ==== Register with the Virtual Organization ==== | ||
+ | You can register with the Lofar VO here: | ||
+ | [[https:// | ||
==== Grid Certificate ==== | ==== Grid Certificate ==== | ||
To get direct srm copy access to the LTA storage you need a Grid Certificate.\\ | To get direct srm copy access to the LTA storage you need a Grid Certificate.\\ | ||
Line 31: | Line 41: | ||
[[http:// | [[http:// | ||
==== SRM Copy from Juropa ==== | ==== SRM Copy from Juropa ==== | ||
- | Here are only a few lines we need from Hanno Holties guide (Link is for reference. Whole guide not applicable | + | There are two possible ways to create proxies on Juropa. With the ltools and grid-proxy-init commands or with voms-proxy-init.\\ |
+ | The main difference | ||
+ | == grid-proxy-init == | ||
+ | To use grid proxies, simply follow these steps:\\ | ||
Store your private key in '' | Store your private key in '' | ||
Execute:\\ < | Execute:\\ < | ||
Line 40: | Line 53: | ||
< | < | ||
Test data retrieval: | Test data retrieval: | ||
- | When copying data with the --jobfile option, keep in mind that there is a 30 min CPU limit on the login node, so your srmcp should take less than 30 min. | + | When copying data with the --jobfile option, keep in mind that there is a 30 min CPU limit on the login node, so your srmcp should take less than 30 min.\\ |
+ | \\ | ||
+ | == voms-proxy-init == | ||
+ | To use voms-proxies handle your keys and certificate as above and source the script | ||
+ | < | ||
+ | Optional: Set proxy environment variable to custom location:\\ < | ||
+ | Generate a proxy:\\ < | ||
+ | Test data retrieval: | ||
==== LOFAR Software ==== | ==== LOFAR Software ==== | ||
The LOFAR Software Framework is installed in the home directory of user htb003. You load the environment with | The LOFAR Software Framework is installed in the home directory of user htb003. You load the environment with | ||
Line 201: | Line 221: | ||
Good luck and let me know of any problems and feel free to give some | Good luck and let me know of any problems and feel free to give some | ||
feedback. | feedback. | ||
- | ===== LTA Pipeline Environment Libraries | + | ===== Old installation guide (still useful information, |
The following libraries with given versions are installed in the home of user htb003 on Juropa.\\ | The following libraries with given versions are installed in the home of user htb003 on Juropa.\\ | ||
Line 246: | Line 266: | ||
| casapy | 41.0.24668| | | casapy | 41.0.24668| | ||
- | ===== LTA Installation on Juropa | + | ==== LTA Installation on Juropa ===== |
The operating system is: | The operating system is: | ||
Line 260: | Line 280: | ||
The current working installation is in: | The current working installation is in: | ||
< | < | ||
- | / | + | / |
</ | </ | ||
Some things have to be changed in order to compile and run everything on Juropa. | Some things have to be changed in order to compile and run everything on Juropa. | ||
Line 596: | Line 616: | ||
Should maybe be changed to the working directory?! | Should maybe be changed to the working directory?! | ||
- | ====== Notes on Processing at the Juropa Cluster ====== | ||
- | This is going to be the Wiki page for the Lofar Software installation at | ||
- | the Juelich Supercomputing Centre. I will update this page as the installation progresses. | ||
- | |||
- | ==== Acquiring Data ==== | ||
- | Take a look at this site on how to get the data from the LTA\\ | ||
- | [[http:// | ||
- | To download data from the web you need the full filename. You can look those up in the catalog\\ | ||
- | [[http:// | ||
- | The Juelich Http download server is here\\ | ||
- | [[https:// | ||
- | For Sara\\ | ||
- | [[https:// | ||
- | If you want to do a direct srm copy you need a Grid Certificate. | ||
- | |||
- | ==== German Grid Certificate ==== | ||
- | To get direct srm copy access to the LTA storage you need a Grid Certificate.\\ | ||
- | [[http:// | ||
- | |||
- | ==== SRM Copy from Juropa ==== | ||
- | There are two methods.\\ | ||
- | **One**:\\ | ||
- | Follow this Walkthrough to generate a proxy for your srm download\\ | ||
- | [[http:// | ||
- | If you get the message "No user credentials found" you might need to convert your userkey.pem to a different format with this command | ||
- | < | ||
- | openssl rsa -des3 -in userkey.pem -out userkey.pem | ||
- | </ | ||
- | Make a backup before you do this.\\ | ||
- | **Two**:\\ | ||
- | You need to load the ltools module and execute the given command to activate the environment to use " | ||
- | < | ||
- | module load ltools | ||
- | . / | ||
- | </ | ||
- | Then you can generate a proxy with the command | ||
- | < | ||
- | voms-proxy-init --voms lofar:/ | ||
- | </ | ||
- | Notice that " | ||
- | If you want more output add the " | ||
- | |||
- | ==== Running the Software ==== | ||
- | (already outdated, will get updates when Lofar v1.16 is installed (week of 2.9.13 maybe?!))\\ | ||
- | Currently the software is being tested on the Juropa system. The pipelines work in single-node mode, which means the subbands are computed in serial. One should therefore submit multiple jobs with fewer subbands each.\\ | ||
- | The software is available in the home directory of user htb003. The root | ||
- | path of the install is | ||
- | |||
- | / | ||
- | |||
- | You can find the lofar software in " | ||
- | The environment you need is loaded with the script " | ||
- | |||
- | In addition you might need a copy of the measurement data\\ | ||
- | / | ||
- | Put it in your home directory and point to it in a file .casarc (just contains:" | ||
- | [yourhome]/ | ||
- | |||
- | If you require access to the GlobalSkyModel database, there is a copy of | ||
- | the database from the CEP Cluster (hopefully) running on the Juropa | ||
- | login node jj28l02. Access the database " | ||
- | " | ||
- | |||
- | How to keep the measurement and gsm data up to date and distributed has | ||
- | to be discussed | ||
- | |||
- | You can now run and test the executables on the login node from | ||
- | " | ||
- | in " | ||
- | |||
- | To run your jobs on the compute nodes you first have to setup and submit | ||
- | a job via the batch system. A detailed description can be found on the | ||
- | Juropa homepage | ||
- | ' | ||
- | |||
- | Here is a simple example of the procedure. | ||
- | Basically you use two scripts. One to configure the job and one to setup | ||
- | the environment for your program and run it.\\ | ||
- | Job configuration is pretty basic right now because we can only utilize | ||
- | one node per job. Do not get confused by the use of comments '#' | ||
- | '#' | ||
- | recognized by the Moab batch system.\\ | ||
- | You submit the job with the command "msub [yourscript]" | ||
- | status with "showq -u ' | ||
- | try the " | ||
- | |||
- | Contents of ' | ||
- | < | ||
- | #!/bin/bash -x | ||
- | #MSUB -N Lofar-test | ||
- | # just the name | ||
- | #MSUB -l nodes=1: | ||
- | #MSUB -l walltime=00: | ||
- | #MSUB -e error.txt | ||
- | # if keyword omitted : default is submitting directory | ||
- | #MSUB -o output.txt | ||
- | # if keyword omitted : default is submitting directory | ||
- | #MSUB -M your@mail.de | ||
- | #Mailadress | ||
- | #MSUB -m eab | ||
- | #send mail on end, abort, begin | ||
- | ./ | ||
- | </ | ||
- | |||
- | The walltime is the time your job will be running on the machine. If it | ||
- | is too low and the job is not finished, the job will be killed. If it is too high, | ||
- | your job might have to wait longer in the queue, but only the real computing | ||
- | time will be booked.\\ | ||
- | The maximum walltime is 24h.\\ | ||
- | The number of nodes has to stay at 1 for the time being. You can | ||
- | experiment with ppn (processes per node), which is used for OpenMP-enabled | ||
- | programs.\\ | ||
- | It is best to name the log files error and output with some job specific | ||
- | parameters and maybe the date.\\ | ||
- | You can choose to have mails sent to you about the status of your job. | ||
- | |||
- | |||
- | Contents of ' | ||
- | < | ||
- | #!/bin/sh | ||
- | #start of jobscript | ||
- | export OMP_NUM_THREADS=16 | ||
- | # | ||
- | # | ||
- | export PYTHONPATH=/ | ||
- | export PYTHONPATH=/ | ||
- | # | ||
- | export PATH=/ | ||
- | export PATH=/ | ||
- | export PATH=/ | ||
- | # | ||
- | export LD_LIBRARY_PATH=/ | ||
- | export LD_LIBRARY_PATH=/ | ||
- | export LD_LIBRARY_PATH=/ | ||
- | export LD_LIBRARY_PATH=/ | ||
- | # | ||
- | export LOFARROOT=/ | ||
- | # | ||
- | module load gsl | ||
- | module load GCC/4.6.3 | ||
- | # | ||
- | / | ||
- | </ | ||
- | |||
- | Simply replace the pipeline call with the command you want to run in | ||
- | your job. | ||
- | Example of Alexander's BBS test: | ||
- | / | ||
- | -v -n -f L104244_SB200_uv.dppp.MS BBS.parset skymodel.parset | ||
- | One important remark for your working directory. Use the Filesystem | ||
- | mounted under $WORK for your data and jobs.\\ | ||
- | From the Juropa home page: | ||
- | $WORK\\ | ||
- | File system for large temporary files with high I/O bandwidth demands | ||
- | (scratch file system). No backup of files residing here. Files not used | ||
- | for more than 28 days will be automatically deleted! | ||
- | |||
- | ==== Jobs in parallel ==== | ||
- | You can start one job for every independent piece of data. You can use your old scripts and the pipeline scripts, but every subtask will be processed in serial on one node, so typically you allocate only one node for your jobs.\\ | ||
- | To circumvent this, start the subprocesses in the Python scripts in a different manner: use the mpiexec command to start each subprocess. The ParaStation MPI daemon will then allocate free resources to your subprocess when available. For this behavior the environment variable PSI_WAIT has to be set. This means you can allocate a partition with more than one node, run your script, and whenever you make a subprocess call, use mpiexec with the number of processes equal to one (np=1).\\ | ||
- | You can have up to 16 processes per node (eight CPUs + SMT mode). How many subprocesses fit on one node depends on the number of threads you want for OpenMP: with OMP_NUM_THREADS=4 you can run 4 subprocesses on one node, with OMP_NUM_THREADS=16 one subprocess per node, and with OMP_NUM_THREADS=1 you can run 16.\\ | ||
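The relation between OMP_NUM_THREADS and the number of subprocesses per node can be written down as a small helper. This is only a sketch of the arithmetic described above; the function name and the default of 16 slots per node are illustrative assumptions taken from the numbers in the text, not part of any Juropa tool.

```python
def subprocesses_per_node(omp_num_threads, slots_per_node=16):
    """Return how many np=1 subprocesses fit on one node.

    slots_per_node=16 is taken from the text (eight CPUs + SMT mode);
    the function itself is a hypothetical helper for illustration.
    """
    if not 1 <= omp_num_threads <= slots_per_node:
        raise ValueError("omp_num_threads must be between 1 and slots_per_node")
    # Each subprocess occupies omp_num_threads slots; partial slots are unusable.
    return slots_per_node // omp_num_threads
```

For thread counts that do not divide 16 evenly, the integer division leaves some slots idle, so values such as 1, 2, 4, 8, or 16 are probably the most sensible choices.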
- | As an example, let's look at part of a script from Andreas (run_NDPPPs.py): | ||
- | The snippet shows the subprocess call with subprocess.Popen and how the return values are handled. What has to be executed is in the list of tuples " | ||
- | < | ||
- | while True: | ||
- | while cmds and len(processes) < max_task: | ||
- | task = cmds.pop() | ||
- | print time.asctime()," | ||
- | processes.append([Popen(task, | ||
- | if waittime: | ||
- | break | ||
- | for p in processes: | ||
- | if done(p[0]): | ||
- | if success(p[0]): | ||
- | os.remove(p[1]) | ||
- | processes.remove(p) | ||
- | else: | ||
- | fail() | ||
- | if not processes and not cmds: | ||
- | break | ||
- | else: | ||
- | time.sleep(sleeptime) | ||
- | </ | ||
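For readers who want to try the pattern outside of run_NDPPPs.py, here is a minimal, self-contained Python 3 sketch of the same idea: start tasks up to a limit, poll them, and collect failures. It is not the original script; the function name `run_limited` and its parameters are hypothetical, and it rebuilds the list of running tasks on each pass instead of removing entries while iterating.

```python
import subprocess
import sys
import time


def run_limited(cmds, max_tasks=2, sleeptime=0.05):
    """Run each command in cmds with at most max_tasks running at once.

    Returns the list of commands that exited with a non-zero status.
    """
    pending = list(cmds)
    running = []  # (Popen object, command) pairs
    failed = []
    while pending or running:
        # Start new tasks while there is spare capacity.
        while pending and len(running) < max_tasks:
            cmd = pending.pop()
            running.append((subprocess.Popen(cmd), cmd))
        # poll() returns None while a task is still running.
        still_running = []
        for proc, cmd in running:
            returncode = proc.poll()
            if returncode is None:
                still_running.append((proc, cmd))
            elif returncode != 0:
                failed.append(cmd)
        running = still_running
        if running:
            time.sleep(sleeptime)
    return failed


if __name__ == "__main__":
    # Placeholder tasks; replace them with the real command lists.
    demo = [[sys.executable, "-c", "pass"] for _ in range(3)]
    print("failed tasks:", run_limited(demo))
```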
- | To use multiple nodes on Juropa the command that is passed to popen has to be changed in the following way. The first argument is the executable followed by the arguments. The argument for "/ | ||
- | < | ||
- | for task in cmds: | ||
- | command = [" | ||
- | print command | ||
- | processes.append([Popen(command, | ||
- | |||
- | while True: | ||
- | for p in processes: | ||
- | if done(p[0]): | ||
- | if success(p[0]): | ||
- | os.remove(p[1]) | ||
- | processes.remove(p) | ||
- | else: | ||
- | print "Error in: ",p[1] | ||
- | os.remove(p[1]) | ||
- | processes.remove(p) | ||
- | |||
- | if not processes: | ||
- | break | ||
- | </ | ||
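The command construction described above (mpiexec with np=1 in front of the executable and its arguments, with PSI_WAIT set) could look roughly like the following sketch. The helper names are hypothetical, the value "1" for PSI_WAIT is an assumption because the text only says the variable has to be set, and the task shown in the usage note is a placeholder.

```python
import os


def wrap_with_mpiexec(task, nprocs=1):
    """Prefix a command (executable followed by its arguments) with mpiexec."""
    return ["mpiexec", "-np", str(nprocs)] + list(task)


def job_environment(base_env=None):
    """Return a copy of the environment with PSI_WAIT set.

    The value "1" is an assumption; an existing PSI_WAIT value is kept.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("PSI_WAIT", "1")
    return env
```

A call such as `subprocess.Popen(wrap_with_mpiexec(["NDPPP", "my.parset"]), env=job_environment())` would then start one subprocess through mpiexec; `my.parset` is a placeholder file name.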
- | |||
- | |||
- | I hope this information is sufficient for some first tests and | ||
- | experiments.\\ | ||
- | Good luck and let me know of any problems and feel free to give some | ||
- | feedback. |