===== Access and Login =====
To get an account, get permission from the Dragnet PI: Jason Hessels (''hessels[AT]astron[DOT]nl'').\\
The easiest way is to ask him to send his permission to the RO Sysadmins (''roadmin[AT]astron[DOT]nl'') for a LOFAR NIS account to access the LOFAR portal, and to Mike Sipior (''sipior[AT]astron[DOT]nl'') to add your account to DRAGNET.\\
If needed, you can also provide the RO Admins with your (e.g. home) IP(s) to add to the LOFAR portal white list.

Once you have an account, ssh to hostname ''dragnet.control.lofar'' or, easier, just **''dragnet''**, from the LOFAR portal (''portal.lofar.eu'') (or tunnel through it):
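A minimal sketch (assuming your username is the same on the portal and on DRAGNET; the one-step variant needs the ''-J'' jump-host option of OpenSSH 7.3+):
  $ ssh yourusername@portal.lofar.eu    # log in on the portal first
  $ ssh dragnet                         # then hop to the DRAGNET head node
  $ ssh -J yourusername@portal.lofar.eu yourusername@dragnet.control.lofar    # or tunnel through in one step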
Type ''module help'' for a list of ''module'' commands.

List of available modules (July 2017):
  $ module avail
  ---------------------------------------------------------------------------- /etc/modulefiles -----------------------------------------------------------------------------
  aoflagger/2.8.0    casa/4.7        casacore/2.0.3    casarest/current  cuda/current      lofar/2.20.0   lofardal/2.5.0      srm/2.6.28     wsclean/current
  aoflagger/2.9.0    casa/5.0        casacore/2.1.0    cuda/7.0          karma/1.7.25      lofar/2.21.1   lofardal/current    wsclean/1.12
  aoflagger/current  casa/current    casacore/current  cuda/7.5          local-user-tools  lofar/2.21.5   mpi/mpich-x86_64    wsclean/2.2.1
  casa/4.6           casacore/2.0.1  casarest/1.4.1    cuda/8.0          lofar/2.17.5      lofar/current  mpi/openmpi-x86_64  wsclean/2.4

Add the latest lofar module to your environment:
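A minimal sketch (assuming the ''lofar/current'' module tracks the latest release, as in the listing above):
  $ module add lofar/current    # or 'module add lofar' to load the default version
  $ module list                 # verify which modules are now loaded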
To download data (e.g. from the LTA webdav at SARA), go through the web proxy on ''lexar004'':
  http_proxy=lexar004.control.lofar:3128 https_proxy=lexar004.control.lofar:3128 wget --no-check-certificate https://lofar-webdav.grid.sara.nl/...

//However//, at the moment you need to authenticate to this proxy. Get an account via the ASTRON "Science Operations & Support" group <sos[AT]astron[DOT]nl> (sigh...)\\
Put that username and password in a ''.wgetrc'' file in your home directory:
  proxy-user=yourusername
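  # assumed companion setting: the matching password; wgetrc treats '-' and '_'
  # in command names the same (see the "Wgetrc Commands" section of the wget manual)
  proxy-password=yourpassword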
To resume suspended job(s) by job ID:
  $ scontrol resume 100
  $ scontrol resume [1000,2000]

==== SLURM Troubleshooting ====
== "Undraining" nodes ==

If you expect that there should be enough resources but SLURM job submission fails, some nodes may be in the "drain" state. You can check this by running ''sinfo''. You could see something like this, where nodes drg06 and drg08 are in the drain state:

  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE  NODELIST
  workers*     up   infinite      2  drain  drg[06,08]
  workers*     up   infinite      1    mix  drg01
  workers*     up   infinite     21   idle  dragproc,drg[02-05,07,09-23]
  head         up   infinite      1   idle  dragnet

To "undrain" e.g. drg08, you can do:
  $ scontrol update NodeName=drg08 State=DOWN Reason="undraining"
  $ scontrol update NodeName=drg08 State=RESUME
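Afterwards, a quick check that the node left the drain state (''sinfo -n'' limits the report to the named node(s)):
  $ sinfo -n drg08    # STATE should now read idle (or mix/alloc once jobs run on it)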