dragnet:cluster_usage

  
===== Access and Login =====
To get an account, get permission from the Dragnet PI: Jason Hessels (''hessels[AT]astron[DOT]nl'').\\
The easiest way is to ask him to send his permission to the RO Sysadmins (''roadmin[AT]astron[DOT]nl'') for a LOFAR NIS account to access the LOFAR portal, and to Mike Sipior (''sipior[AT]astron[DOT]nl'') to add your account to DRAGNET.\\
You can also provide the RO Sysadmins with your IP address(es) (e.g. your home IP) to add to the LOFAR portal white list if needed.
  
Once you have an account, ssh to hostname ''dragnet.control.lofar'', or simply **''dragnet''**, from the LOFAR portal (''portal.lofar.eu''), or tunnel through it:
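The command itself is not part of this excerpt; a minimal sketch, assuming your own usernames on the portal and on DRAGNET and a recent OpenSSH client:

  # log in to the portal, then hop to the DRAGNET head node
  $ ssh yourusername@portal.lofar.eu
  $ ssh dragnet
  # or tunnel through the portal in a single command (OpenSSH 7.3+ ProxyJump)
  $ ssh -J yourusername@portal.lofar.eu yourusername@dragnet.control.lofar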
Type ''module help'' for a list of ''module'' commands.
  
List of available modules (July 2017):
  $ module avail
      
      
  ---------------------------------------------------------------------------- /etc/modulefiles -----------------------------------------------------------------------------
  aoflagger/2.8.0    casa/4.7           casacore/2.0.3     casarest/current   cuda/current       lofar/2.20.0       lofardal/2.5.0     srm/2.6.28         wsclean/current
  aoflagger/2.9.0    casa/5.0           casacore/2.1.0     cuda/7.0           karma/1.7.25       lofar/2.21.        lofardal/current   wsclean/1.12
  aoflagger/current  casa/current       casacore/current   cuda/7.5           local-user-tools   lofar/2.21.        mpi/mpich-x86_64   wsclean/2.2.1
  casa/4.6           casacore/2.0.1     casarest/1.4.1     cuda/8.0           lofar/2.17.        lofar/current      mpi/openmpi-x86_64 wsclean/2.4
  
Add the latest lofar module to your env:
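The exact command falls outside this excerpt; a minimal sketch, assuming the usual ''module'' behaviour of loading the default version when none is given:

  # load the default (latest) lofar module, or name a specific version from the list above
  $ module load lofar
  # verify which modules are now active
  $ module list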
To download data through the web proxy on ''lexar004'', prefix the command with the proxy environment variables, e.g.:
  http_proxy=lexar004.control.lofar:3128 https_proxy=lexar004.control.lofar:3128 wget --no-check-certificate https://lofar-webdav.grid.sara.nl/...
  
//However//, at the moment you need to authenticate to this proxy. Get an account via the ASTRON "Science Operations & Support" group <sos[AT]astron[DOT]nl> (sigh...)\\
Put that username and password in a ''.wgetrc'' file in your home directory:
  proxy-user=yourusername
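The rest of the file is cut off in this excerpt; a minimal sketch of a complete ''.wgetrc'', assuming wget's standard ''proxy-password'' setting and placeholder credentials:

  proxy-user=yourusername
  proxy-password=yourpassword

Since this stores a password in plain text, it is wise to make the file readable only by you, e.g. ''chmod 600 ~/.wgetrc''.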
Jobs that were previously suspended can be resumed with ''scontrol resume'', e.g.:
  $ scontrol resume 100
  $ scontrol resume [1000,2000]

==== SLURM Troubleshooting ====
== "Undraining" nodes ==

If you expect that there should be enough resources but SLURM job submission fails, some nodes may be in the "drain" state. You can check this by running ''sinfo''. You could see something like this, where nodes drg06 and drg08 are in the drain state:

  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  workers*     up   infinite      2  drain drg[06,08]
  workers*     up   infinite      1    mix drg01
  workers*     up   infinite     21   idle dragproc,drg[02-05,07,09-23]
  head         up   infinite      1   idle dragnet

 +To "undrain" e.g. drg08, you can do:
 +  $ scontrol update NodeName=drg08 State=DOWN Reason="undraining"
 +  $ scontrol update NodeName=drg08 State=RESUME
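Before undraining a node, it can be worth checking why SLURM drained it in the first place; a small sketch using standard SLURM commands (node name as in the example above):

  # list drained/down nodes together with the recorded reason
  $ sinfo -R
  # or inspect a single node; look at the State and Reason fields
  $ scontrol show node drg08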
  