====== DRAGNET Cluster Usage ======
Some non-obvious, DRAGNET hardware and setup specific info on using DRAGNET wrt logins, finding applications, data copying, and SLURM job submission.

Feel free to extend / improve!
===== Access and Login =====
To get an account, ask the DRAGNET PI for permission.
Easiest is to ask him to send his permission to RO Admin, who can then create your account.
You can also provide RO Admin your (e.g. home) IP(s) to add to a LOFAR portal whitelist, so you can login directly.

Having an account, ssh to hostname ''dragnet'':
  ssh USERNAME@dragnet
==== Password-less Login ====
Within the cluster (or even to it), don't bother typing your password all the time. Passwords make cluster-wide commands a nightmare. Instead, use an ssh key pair:
  ssh-keygen -t rsa    # or copy an existing public key pair to .ssh/
  cat .ssh/id_rsa.pub >> .ssh/authorized_keys
  chmod 600 .ssh/authorized_keys
(For completeness: home directories are shared across the DRAGNET nodes, so a key pair set up this way works cluster-wide.)
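Alternatively, if your client machine has OpenSSH's ''ssh-copy-id'', it automates the public-key installation step shown above (a convenience sketch; substitute your own username):
  ssh-copy-id USERNAME@dragnet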
To make login between nodes more reliable, you can disable the ssh host identification verification within DRAGNET.
It is overkill within our cluster anyway.
To disable, add to (or create) your ''.ssh/config'' file:

  Host dragnet dragnet.control.lofar dragproc dragproc-10g dragproc.control.lofar dragproc-10g.online.lofar drg?? drg??-10g drg??-ib drg??.control.lofar drg??-10g.online.lofar drg??-ib.dragnet.infiniband.lofar
    StrictHostKeyChecking no
Now test if password-less login works by logging in and out to ''drg23'':
  ssh drg23 exit
===== Finding Applications =====
To use most applications conveniently, extend your environment using the environment modules described below.
==== Practical Summary ====
On DRAGNET add to your .bashrc e.g.:
  module add local-user-tools lofar
or a similar list (e.g. also ''casa'', ''aoflagger'', ''wsclean'').
Command to print the list to select from:
  module avail
Re-login (or enter the ''module add'' command manually in your current shell) and the tools are available on your PATH.

If you want to keep using the same tool version instead of auto-upgrading along when updates are installed, then specify an explicit version, e.g. ''module add lofar/2.17.5''.
==== Using the Environment Modules ====
The "environment modules" utility manages environment settings for (multiple versions of) installed application packages.
Your environment already starts out with a reasonable default set.
You can further adjust it in .bashrc (Note that there is also .bash_profile and .profile. What to change for different login types varies among Linux distros and shells and documentation is not always matching reality...)
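A common arrangement (a sketch, not DRAGNET-specific) is to keep all customization in ''.bashrc'' and let login shells pick it up from ''.bash_profile'':

  # in ~/.bash_profile: forward login-shell setup to ~/.bashrc
  [ -f ~/.bashrc ] && . ~/.bashrc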
The complete, sorted list (1000s of lines) of environment and (unexported) shell variables can be printed by typing ''set''.

Type ''module'' for an overview of its sub-commands.

List of available modules:
  $ module avail

  ------------------------------ /usr/share/Modules/modulefiles ------------------------------
  dot         module-git  module-info modules     null        use.own

  ------------------------------------ /etc/modulefiles ------------------------------------
  aoflagger/...   aoflagger/...   aoflagger/...   casa/...   (list abbreviated)
Add latest lofar module to your env:
  $ module add lofar    # or a specific one, e.g. module add lofar/2.17.5

Remove module from your env (e.g. if it conflicts with another version you want to use):
  $ module rm lofar
  $ module purge    # remove all added modules

To run the prefactor and factor imaging pipelines, you may want to only use the following command (do not add ''lofar'' as well, to avoid version conflicts):
  $ module add local-user-tools wsclean/2.4 aoflagger/<version>
If you login and want to use CASA instead, better run ''module purge'' first.
See what adding the ''local-user-tools'' module does:
  $ module show local-user-tools
  -------------------------------------------------------------------
  /etc/modulefiles/local-user-tools:

  module-whatis  Adds tools, libraries and Python modules under /usr/local to your environment.
                 Pulsar tools : dspsr, psrcat, psrdada, psrfits, psrchive, tempo, tempo2, dedisp, sigproc, ffasearch, ephem, see, clig, ...
                 Imaging tools: factor, losoto, ds9, Duchamp, sagecal, excon imager, rmsynthesis, ...
  prepend-path   PATH /usr/local/bin
  prepend-path   PYTHONPATH /usr/local/lib/python2.7/site-packages
  -------------------------------------------------------------------
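After adding a module, you can verify that it took effect (illustrative):
  $ module add local-user-tools
  $ module list
  Currently Loaded Modulefiles:
    1) local-user-tools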
Example:
  cexec drg:3-5 "df -h"
  cexec dragnet:23 ls    # run ls on dragproc
  cexec hostname
The hostname specifier (2nd optional argument) must contain a ':' to be recognized as such (otherwise it is treated as part of the command).
Examples of simple commands:
  ansible alldragnet -a 'df -h'
  ansible proc:workers -a 'uptime'
  ansible workers -f 25 -a 'ls -al /data1'
  ansible drg01:drg17 -a 'ls -l /data2'
Apart from hostnames, the following hostname groups are also recognized on DRAGNET: ''alldragnet'', ''proc'', ''workers'' (combine them with ':').
The command must be a simple command. It can be the name of an executable shell script if accessible to all hosts, but not a compound shell command with &, &&, pipes or other descriptor redirection (you can of course run the shell with some argument, but then, what's the point of using ansible like that?).
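If you really do need shell features, ansible's ''shell'' module runs the command through a shell on each host (a sketch; the pipe here is purely illustrative):
  ansible workers -m shell -a 'df -h /data1 | tail -n 1'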
==== Shell Loop and SSH ====
Examples:
  for ((i = 1; i <= 10; i++)); do host=$(printf "drg%02d" $i); ssh $host "df -h"; done
  for host in drg01 drg17; do ssh $host "df -h"; done
Be careful with complex commands!
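One common pitfall (an illustrative sketch): quoting decides where expansion happens. Single quotes defer variable and command substitution to the remote shell, while double quotes expand locally before ssh even runs:
  for host in drg01 drg17; do ssh $host 'echo $(hostname): $(uptime)'; done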
+ | |||
+ | ===== Data Copying ===== | ||
+ | Generic data copying info plus cluster specific subsections. | ||
+ | |||
+ | To copy large data sets between nodes or into / out of DRAGNET, you can use '' | ||
+ | $ scp -B -c arcfour < | ||
+ | |||
+ | The '' | ||
+ | $ bbcp -A -e -s 4 -B 4M -r -g -@ follow -v -y dd -- drg23-10g.online.lofar:/ | ||
+ | |||
+ | Notes: | ||
+ | * OpenSSH-6.7 no longer allows the '' | ||
+ | * The '' | ||
+ | * For '' | ||
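If ''bbcp'' is not available on both ends, ''rsync'' over ssh is a single-stream (hence slower) alternative that can resume interrupted transfers; a sketch with hypothetical paths:
  $ rsync -av --progress drg23-10g.online.lofar:/data1/mydir/ /data2/mydir/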
+ | |||
+ | |||

==== Hostname Hell and Routing Rampage ====
If you are just running some computations on DRAGNET, skip this section. But if you need fast networking, or are already deep in the slow data transfers and rapid-fire connection errors, here is some info that may save you time wrt the multiple networks and network interfaces. (Or just tell us your needs.)
=== Hostnames ===
Control network:
  * dragnet(.control.lofar)
  * dragproc(.control.lofar)
  * drg01(.control.lofar) - drg23(.control.lofar)

10G network:
  * dragproc-10g(.online.lofar)
  * drg01-10g(.online.lofar) - drg23-10g(.online.lofar)

Infiniband network (~54G):
  * drg01-ib(.dragnet.infiniband.lofar) - drg23-ib(.dragnet.infiniband.lofar)

(There is also a 1 Gb IPMI network.)
Note that for copying files between hard disks, there is some benefit to use the 10G network. If you have data to copy between nodes (e.g. under ''/data1'' or ''/data2''), use the ''-10g'' hostnames.
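For example (a sketch; the observation directory name is hypothetical):
  $ scp -r /data1/L654321 drg05-10g.online.lofar:/data2/
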
+ | |||
+ | |||
==== Cross-Cluster ====
When writing scripts that (also) have to work cross-cluster, prefer to use the fully-qualified domain names.

In most cases, you will use the network as deduced from the destination hostname or IP. Indicate a 10G name to use the 10G network. Idem for infiniband. (Exception: CEP 2, see below.)

//Note//: Copying large data sets at high bandwidth to/from other clusters (in particular CEP 2) may interfere with running observations as long as CEP 2 is still in use. If you are unsure, ask us. It is ok to use potentially oversubscribed links heavily, but please coordinate with Science Operations and Support!
=== CEP 2 ===
Initiate connections for e.g. data transfers from CEP 2 to HOSTNAME-10g.online.lofar to transfer via 10G.

The reverse, connecting from DRAGNET to CEP 2, by default will connect you via DRAGNET 1G (e.g. for login). To use 10G (e.g. to copy datasets), you need to bind to the local 10G interface name or IP. The program you are using has to support this via e.g. a command-line argument.
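For ssh-based tools this can be done with ssh's ''BindAddress'' option (a sketch; the CEP 2 node name and paths are illustrative, and it assumes you run this on drg23):
  $ scp -o BindAddress=drg23-10g.online.lofar locus001:/data/L123456/<file> /data1/
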
+ | |||
+ | |||
=== CEP 3 ===
Use the ''-10g'' names of the DRAGNET nodes to transfer data via the 10G network.
=== CEP 4 ===
CEP 4 has a Lustre global file system. Copying data to DRAGNET is supposed to happen via the 10G network (use the ''-10g'' names).

A Lustre mount has also been set up on DRAGNET, but the storage name is not mounted by default.
=== External Hosts (also LTA Staged Data) ===

To copy data sets from outside the LOFAR network (e.g. staged long-term archive data) into DRAGNET, there is unfortunately only 1 Gbit/s available that is shared with other LOFAR users. A 10G link may become available in the future.

There are 3 cases to distinguish:
== 1. Access external hosts (but not lofar-webdav.grid.sara.nl) from the LOFAR network ==
This all uses the LOFAR portal / public internet link (1 Gbit/s). Since the LOFAR portal is used by all users to login, it is important not to overload it. Load is monitored and too hungry copying processes may be killed if they harm other users.

So please rate-limit your download from outside into DRAGNET and CEPx! A reasonable chunk of 1 Gbit/s is 400 Mbit/s (= 50 MByte/s), such that if somebody else does this too, there is still a bit of bandwidth for dozens of login sessions from other users. (Yes, this is hardly a foolproof strategy.) Please use:
  $ scp -l 400000 ...    # value in kbit/s
or
  $ wget --limit-rate=50m ...    # value in MByte/s
or
  $ curl --limit-rate=50m ...    # value in MByte/s
or
  $ rsync --bwlimit=51200 ...    # value in kByte/s

For those interested, you can use a tool such as ''iftop'' to see actual network bandwidth usage.
== 2. Download via http(s) from lofar-webdav.grid.sara.nl to the LOFAR network ==

A http(s) ''proxy'' is available on ''lexar004'' for downloads from lofar-webdav.grid.sara.nl. Point your tool at it, e.g.:

  http_proxy=lexar004.control.lofar:<port>

//The proxy requires a username and password (ask us if you do not have them).//
Put that username and password in a ''~/.wgetrc'' file:
  proxy-user=yourusername
  proxy-password=yourpassword
then keep it reasonably private by making that file only accessible to you:
  chmod 600 ~/.wgetrc

If you use this only for lofar-webdav.grid.sara.nl, you do not need to rate-limit your transfer.

//Note:// This also works for other http(s) destinations than SurfSara servers, however, then you need to rate-limit your http(s) traffic as described above under **1.**. Do **not** use this for other LTA sites than SurfSara, as atm this interferes with data streams from some international stations!
== 3. Between ASTRON internal 10.xx.xx.xx nodes and the LOFAR network ==
Specifically for ASTRON hosts with an internal ''10.xx.xx.xx'' address: this traffic does not pass the LOFAR portal, so the rate limits above do not apply.
===== SLURM Job Submission =====
  * SLURM does not enforce accessing nodes through it; one can access any node via ssh. Depending on the intention and the current workload, that may be fine or less desirable.
  * SLURM has a ton of options that we haven't all set up or tried; this page only covers the basics.
- | |||
- | If you are having trouble using SLURM, please contact Alexander (amesfoort@astron.nl). | ||
==== Introduction: srun and sbatch ====
From any DRAGNET node (typically the ''dragnet'' head node), you can submit jobs to the SLURM workload manager.
Use ''srun'' to run a command interactively on one or more nodes, e.g.:
  $ srun --nodes=5 --nodelist=drg01,drg02 hostname

Use ''sbatch'' to submit a job script for batch execution, e.g.:
  $ sbatch --mail-type=END,FAIL --mail-user=USERNAME@astron.nl ./myjobscript.sh
  Submitted batch job <jobid>
The ''sbatch'' command returns immediately; stdout/stderr of the job go to a file (by default ''slurm-<jobid>.out'' in the submission directory).
\\
Tip: use absolute path names and $HOME.
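A minimal job script for ''sbatch'' could look like this (a sketch; the resource values and paths are placeholders, see the options discussed further below):
  #!/bin/bash
  #SBATCH --job-name=demo
  #SBATCH --nodes=1
  #SBATCH --gres=gpu:1
  #SBATCH --mem=8000            # in MB; see the memory remarks below
  srun $HOME/bin/my_program $HOME/data/input.dat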
\\
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  ...
\\
Show details of a specific job:
  $ scontrol show job <jobid>
  JobId=223058 JobName=wrap
  ...
\\
Show list and state of nodes. When submitting a job, you can indicate one of the partitions listed or a (not necessarily large enough) set of nodes that must be used. Please hesitate indefinitely when trying to submit insane loads to the ''head'' partition.
  $ sinfo --all
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  workers*     up   infinite     23   idle drg[01-23]
  head         up   infinite      1   idle dragnet
  lofarobs     up   infinite     23   idle drg[01-23]
If you get an error on job submission that there are no resources in the cluster to ever satisfy your job, and you know this is wrong (no typo), you can see with the ''sinfo'' command whether nodes are down or drained (also see the Troubleshooting section below).
\\
More detail:
  $ sinfo -o "%10N %8z %8m %40f %10G %C"
  NODELIST   S:C:T    MEMORY   AVAIL_FEATURES                           GRES       CPUS(A/I/O/T)
  dragnet,dr 1+:...   ...
  drg[01-23] 2:8:1    128500   ...
where in the last column A = Allocated, I = Idle, O = Other, T = Total
==== Hints on using more SLURM capabilities ====
The sbatch(1) command offers to:
  * either number of nodes or CPUs
  * number of GPUs, if any needed. If no GPUs are requested, any GPU program will fail. (Btw, this policy is not fully as intended, so if technically it can be improved, we can look into it.)
  * In general, but no longer needed on DRAGNET or CEP4: if you want to run >1 job on a node at the same time, also indicate memory. Just reserve per job: 128500 / NJOBS_PER_NODE (in MB). By default, SLURM reserves memory proportional to the number of CPUs requested.
Note that a ''%%--%%mem'' value is specified in MB (a ''drg'' node has 128500 MB available for jobs).
To indicate a scheduling resource constraint on 2 GPUs, use the %%--%%gres option (//gres// stands for //generic resource//):
  $ srun --gres=gpu:2 <program>
To indicate a list of nodes that must be used (list may be smaller than number of nodes requested). Some examples:
  $ srun --nodelist=drg23 ls
  $ srun --nodelist=drg[05-07],drg09 hostname
  $ srun --nodelist=./nodes.txt hostname    # a filename is assumed when the argument contains a '/'
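The node list file of the last example simply contains one node name per line (a sketch; the file name is arbitrary):
  $ cat ./nodes.txt
  drg05
  drg06
  $ srun --nodes=2 --nodelist=./nodes.txt hostname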
For the moment, see more explanation and examples at http://hpcf.umbc.edu/
Please see the manual pages on srun(1), sbatch(1), salloc(1) and the [[http://slurm.schedmd.com/|SLURM website]] for more information.
- | |||
==== SLURM Cluster Management ====
Bring fixed node back to partition from state DOWN to state IDLE (logged in as slurm):
  $ scontrol update NodeName=drg02 State=IDLE

Resume a (list of) job(s) after SLURM found it/they cannot be run (network errors or so) and set the status to something like 'launch failed requeued held'.
This can also be executed by users for their own jobs:
  $ scontrol resume 100
  $ scontrol resume 1000,1001    # a comma-separated list of job IDs also works
+ | |||
+ | ==== SLURM Troubleshooting ==== | ||
+ | == " | ||
+ | |||
+ | If you expect that there should be enough resources, but slurm submission fails because some nodes could be in " | ||
+ | something like this, where nodes drg06 and drg08 are in drain state: | ||
+ | |||
+ | $ sinfo | ||
+ | PARTITION AVAIL TIMELIMIT | ||
+ | workers* | ||
+ | workers* | ||
+ | workers* | ||
+ | head | ||
+ | |||
+ | To " | ||
+ | $ scontrol update NodeName=drg08 State=DOWN Reason=" | ||
+ | $ scontrol update NodeName=drg08 State=RESUME | ||