dragnet:cluster_usage, last changed 2019-01-07 15:06 by Reinoud Bokhorst.
===== Access and Login =====
To get an account, get permission from the Dragnet PI: Jason Hessels (''
Easiest is to ask him to send his permission
You can also provide RO Admin your (e.g. home) IP(s) to add to a LOFAR portal
Having an account, ssh to hostname ''
  $ ssh USERNAME@dragnet
  $ cat .ssh/
  $ chmod 600 .ssh/
(For completeness:
To make login between nodes more reliable, you can disable the ssh host identification verification within DRAGNET.
    StrictHostKeyChecking no
Now test if password-less login works by logging in and out to ''
  ssh drg23 exit
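The ''StrictHostKeyChecking'' setting goes into your ''~/.ssh/config''. A minimal sketch, assuming you only want to relax host key checking for the internal DRAGNET nodes (the ''Host'' patterns are illustrative, not prescribed by this page):

```
# ~/.ssh/config (sketch): relax host key checking only for internal cluster nodes
Host drg* dragproc
    StrictHostKeyChecking no
```

Keeping the pattern narrow preserves normal host key verification for external hosts.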
===== Finding Applications =====
Re-login (or enter the ''
If you want to keep using the same tool version instead of auto-upgrading along when updates are installed, then specify the versioned module name (when available), e.g. ''
Type ''
List of available modules (July 2017):
  $ module avail
  --------------------------------------------------------------------- /
  dot
  ---------------------------------------------------------------------------- /
  aoflagger/
  aoflagger/2.9.0
  aoflagger/
  casa/
Add latest lofar module to your env:
To run the prefactor and factor imaging pipelines, you may want to only use the following command (do not add ''
  $ module add local-user-tools wsclean/2.4 aoflagger/
If you login and want to use CASA instead, better run ''/
===== Cluster-wide Commands =====
To run a command over many cluster nodes, use ''
  * cexec (shell) runs any shell command in parallel. Output is sorted and only appears after all nodes finished. Indexed hostname specification.
  * ansible (Python) is easy with simple commands or with Ansible modules to support idempotent changes. Easy integration in Python programs. No sorted output, but node output appears when a node is done. No shell interpretation of commands, which may be a restriction or rather safe. Can run commands in parallel. Tailored for system administration,
  * shell loop around ssh is most basic and possibly powerful wrt UNIX tools, but tricky wrt escaping, which remote environment values are actually used, and for dealing correctly with filename corner cases. Scripts easily end up shell specific (e.g. bash vs tcsh).
NOTE: be careful with potentially destructive operations like ''
==== C3 Cexec ====
The [[http://

Example:
  $ cexec drg:3-5 "df -h"
  $ cexec dragnet:23 ls   # run ls on dragproc
  $ cexec hostname

The hostname specifier (2nd optional argument) must contain a ':'
The ''
Note that the hostname numbers here specify start and end index (starting at 0!).
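Because the C3 range indices are 0-based, ''drg:3-5'' does not address drg03-drg05. A small sketch of the mapping, assuming the cluster definition lists drg01 at index 0 (the helper name is illustrative; check your actual C3 configuration):

```shell
# Hypothetical helper: map a 0-based C3 index range to drgNN hostnames,
# assuming drg01 sits at index 0 in the cluster definition.
c3_range_to_hosts() {
    start=$1
    end=$2
    # index 0 -> drg01, so shift each index up by one
    seq -f "drg%02g" "$((start + 1))" "$((end + 1))"
}

c3_range_to_hosts 3 5   # the hosts 'cexec drg:3-5' would address
```

Under that assumption, ''cexec drg:3-5'' runs on drg04 through drg06, not drg03 through drg05.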
+ | |||
+ | |||
==== Ansible ====
[[http://

Examples of simple commands:
  $ ansible alldragnet -a 'df -h'
  $ ansible proc:
  $ ansible workers
  $ ansible drg01:
Apart from hostnames, the following hostname groups are also recognized on DRAGNET: ''
The command must be a simple command. It can be the name of an executable shell script if accessible to all hosts, but not a compound shell command with &, &&, pipes or other descriptor redirection (you can of course run the shell with some argument, but then, what's the point of using ansible like that?).

Background: Ansible heavily relies on the idea to specify what you want in terms of the desired situation rather than what to do to get there. Such //idempotent// commands work correctly regardless whether some nodes are already ok or different. To this end ansible has numerous modules to manipulate system settings in an easy way, but you can also write your own modules (e.g. to reinstall

For many common system admin related tasks, use an ansible module. Search
==== Shell Loop and SSH ====
Examples:
  $ for host in $(seq -f drg%02g 1 10); do ssh $host "df -h"; done   # disk usage on the drg01-drg10 nodes
  $ for host in drg01 drg17; do ssh $host "df -h"; done              # disk usage on drg01 and drg17

Be careful with complex commands!
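Quoting is the usual pitfall: the remote shell re-parses whatever ssh hands it. A sketch of keeping the whole remote command in one single-quoted string; here ''echo'' stands in for a real ssh invocation so the sketch is safe to run anywhere (drop it to actually execute remotely):

```shell
# Sketch: pass the remote command as ONE string so inner quotes survive;
# 'echo' is a stand-in for ssh, for illustration only.
run_on_hosts() {
    cmd=$1
    n=$2
    for host in $(seq -f "drg%02g" 1 "$n"); do
        echo ssh "$host" "$cmd"
    done
}

run_on_hosts 'df -h' 2
```

Because ''$cmd'' is expanded inside double quotes, embedded quoting reaches the remote shell intact instead of being consumed locally.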
+ | |||
+ | |||
===== Data Copying =====
Generic data copying info plus cluster specific subsections.

To copy large data sets between nodes or into / out of DRAGNET, you can use ''
  $ scp -B -c arcfour <

The ''
  $ bbcp -A -e -s 4 -B 4M -r -g -@ follow -v -y dd -- drg23-10g.online.lofar:/

Notes:
  * OpenSSH-6.7 no longer allows the ''
  * The ''
  * For ''
==== Hostname Hell and Routing Rampage ====
If you are just running some computations on DRAGNET, skip this section. But if you need fast networking, or are already deep in the slow data transfers and rapid-fire connection errors, here is some info that may save you time wrt the multiple networks and network interfaces. (Or just tell us your needs.)

=== Hostnames ===
Control network:
  * dragnet(.control.lofar)
  * drg01-10g(.online.lofar) - drg23-10g(.online.lofar)
Infiniband network:
  * drg01-ib(.dragnet.infiniband.lofar)
(There is also a 1 Gb IPMI network.)

Note that for copying files between hard disks, there is some benefit to use the 10G network. If you have data to copy on ''/
==== Cross-Cluster ====
When writing scripts that (also) have to work cross-cluster,
In most cases, you will use the network as deduced from the destination hostname or IP. Indicate a 10G name to use the 10G network. Idem for infiniband. (Exception: CEP 2, see below.)

//Note//: Copying large data sets at high bandwidth to/from other clusters (in particular CEP 2) may interfere with running observations as long as CEP 2 is still in use. If you are unsure, ask us. It is ok to use potentially oversubscribed links heavily, but please coordinate with Science Support!

=== CEP 2 ===
Initiate connections for e.g. data transfers from CEP 2 to HOSTNAME-10g.online.lofar
The reverse, connecting from DRAGNET to CEP 2, by default will connect you via DRAGNET 1G (e.g. for login). To use 10G (e.g. to copy datasets), you need to bind to the local 10G interface name or IP. The program you are using has to support this via e.g. a command-line argument.
=== CEP 3 ===
Use the ''
=== CEP 4 ===
CEP 4 has a Lustre global file system. Copying data to DRAGNET is supposed to happen via ''

A Lustre mount has also been set up on DRAGNET, but the storage name is not mounted by default.
=== External Hosts (also LTA Staged Data) ===
To copy data sets from outside the LOFAR network (e.g. staged long-term archive data) into DRAGNET, there is unfortunately only 1 Gbit/s available that is shared with other LOFAR users. A 10G link may become available in the future.

There are 3 cases to distinguish:
== 1. Access external hosts (but not lofar-webdav.grid.sara.nl) from the LOFAR network ==
This all uses the LOFAR portal / public internet link (1 Gbit/s). Since the LOFAR portal is used by all users to login, it is important
So please rate-limit your download from outside into DRAGNET and CEPx! A reasonable chunk of 1 Gbit/s is 400 Mbit/s (= 50 MByte/s), such that if somebody else does this too, there is still a bit of bandwidth for dozens of login sessions from other users. (Yes, this is hardly
  $ scp -l 400000 ...            # value in kbit/s
or
  $ wget --limit-rate=50m ...    # value in MByte/s
or
  $ curl --limit-rate=50m ...    # value in MByte/s
or
  $ rsync --bwlimit=51200 ...    # value in kByte/s
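The flag values above all express the same 50 MByte/s budget in each tool's unit. A quick sanity check of the arithmetic in plain shell:

```shell
# Same 50 MByte/s budget expressed in each tool's unit:
mbytes_per_s=50
scp_kbit=$((mbytes_per_s * 8 * 1000))   # scp -l takes kbit/s
rsync_kbyte=$((mbytes_per_s * 1024))    # rsync --bwlimit takes kByte/s
echo "$scp_kbit $rsync_kbyte"           # 400000 51200
```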
For those interested, you can use ''
== 2. Download via http(s) from lofar-webdav.grid.sara.nl to the LOFAR network ==

A http(s) ''
  http_proxy=lexar004.control.lofar:
+ | |||
+ | // | ||
+ | Put that username and password in a '' | ||
+ | proxy-user=yourusername | ||
+ | proxy-password=yourpassword | ||
+ | then keep it reasonably private by making that file only accessible to you: | ||
+ | chmod 600 ~/.wgetrc | ||
+ | |||
+ | If you use this only for lofar-webdav.grid.sara.nl, | ||
+ | |||
+ | //Note:// This also works for other http(s) destinations than SurfSara servers, however, then you need to rate-limit your http(s) traffic as described above under **1.**. Do **not** use this for other LTA sites than SurfSara, as atm this interferes | ||
+ | == 3. Between ASTRON internal 10.xx.xx.xx nodes and the LOFAR network == | ||
+ | Specifically for ASTRON hosts with an internal '' | ||
===== SLURM Job Submission =====
From any DRAGNET node (typically the ''
Use ''
  $ srun --nodes=5 --nodelist=drg01,
  dir1 dir2 file1 file2 [...]
Use ''
  $ sbatch --mail-type=END,
  Submitted batch job <
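An ''sbatch'' submission usually wraps the work in a small batch script. A minimal sketch (the ''#SBATCH'' values, filename and job body are illustrative, not taken from this cluster's policies):

```shell
#!/bin/sh
# myjob.sh (sketch): hypothetical minimal SLURM batch script.
# The #SBATCH lines are directives for sbatch; to a plain shell they are comments.
#SBATCH --nodes=1
#SBATCH --job-name=demo
#SBATCH --output=demo-%j.out

echo "running on $(hostname)"
```

Submit it with ''sbatch myjob.sh''; without SLURM the same file still runs as an ordinary shell script, which makes it easy to test locally first.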
\\
Show list and state of nodes. When submitting a job, you can indicate one of the partitions listed or a (not necessarily large enough) set of nodes that must be used. Please hesitate indefinitely when trying to submit insane loads to the ''
  $ sinfo --all
  PARTITION AVAIL TIMELIMIT
  workers*
  head
  lofarobs
If you get an error on job submission that there are no resources in the cluster to ever satisfy your job, and you know this is wrong (no typo), you can see with the ''
More detail:
  $ sinfo -o "%10N %8z %8m %40f %10G %C"
  NODELIST
  dragnet,dr 1+:4+:1+ 31800+
  drg[01-23] 2:8:1 128500
where in the last column A = Allocated, I = Idle, O = Other, T = Total
==== Hints on using more SLURM capabilities ====
  * either number of nodes or CPUs
  * number of GPUs, if any needed. If no GPUs are requested, any GPU program will fail. (Btw, this policy is not fully as intended, so if technically it can be improved, we can look into it.)
  * In general, but no longer on DRAGNET or CEP4: if you want to run >1 job on a node at the same time, memory. Just reserve per job: 128500 / NJOBS_PER_NODE. By default, SLURM reserves all the memory of a node, preventing other jobs from running on the same node(s). This may or may not be the intention. (If the intention, better use %%--%%exclusive.)
Note that a ''
To indicate a scheduling resource constraint on 2 GPUs, use the %%--%%gres option (//gres// stands for //generic resource//
  $ srun --gres=gpu:
To indicate a list of nodes that must be used (list may be smaller than number of nodes requested). Some examples:
  $ srun --nodelist=drg23 ls
  $ srun --nodelist=drg05-drg07,
  $ srun --nodelist=./
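The last form reads the node list from a file. A sketch of generating such a file (the filename ''nodes.txt'' and the drg05-drg07 range are illustrative):

```shell
# Hypothetical: write drg05..drg07, one per line, into a node list file
# that could then be passed as srun --nodelist=./nodes.txt
seq -f "drg%02g" 5 7 > nodes.txt
cat nodes.txt
```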
Bring fixed node back to partition from state DOWN to state IDLE (logged in as slurm):
  $ scontrol update NodeName=drg23 state=idle
Users can resume their (list of) job(s) after SLURM found it/they cannot be run (network errors or so) and sets the status to something like '
This can also be executed
  $ scontrol resume 100
  $ scontrol resume [1000,2000]
==== SLURM Troubleshooting ====
== "

If you expect that there should be enough resources, but slurm submission fails because some nodes could be in "
something like this, where nodes drg06 and drg08 are in drain state:

  $ sinfo
  PARTITION AVAIL TIMELIMIT
  workers*
  workers*
  workers*
  head

To "
  $ scontrol update NodeName=drg08 State=DOWN Reason="
  $ scontrol update NodeName=drg08 State=RESUME