dragnet:cluster_usage

Differences

This shows you the differences between two versions of the page.

dragnet:cluster_usage [2017-07-19 11:25] – [Using the Environment Modules] add CASA 5.0 to module avail list – amesfoort
dragnet:cluster_usage [2019-01-07 15:06] (current) – [Access and Login] – Reinoud Bokhorst
Line 7:
  
 ===== Access and Login =====
-To get an account, get permission from the Dragnet PI: Jason Hessels (''hessels@astron.nl'').\\
-Easiest is to ask him to send his permission to Teun Grit (''grit@astron.nl'') for a LOFAR NIS account to access the LOFAR portal, and to Mike Sipior (''sipior@astron.nl'') to add your account to DRAGNET.\\
-You can also provide Teun Grit your (e.g. home) IP(s) to add to a LOFAR portal white list if needed.
+To get an account, get permission from the Dragnet PI: Jason Hessels (''hessels[AT]astron[DOT]nl'').\\
+Easiest is to ask him to send his permission to the RO Sysadmins (''roadmin[AT]astron[DOT]nl'') for a LOFAR NIS account to access the LOFAR portal, and to Mike Sipior (''sipior[AT]astron[DOT]nl'') to add your account to DRAGNET.\\
+You can also provide the RO Sysadmins your (e.g. home) IP(s) to add to a LOFAR portal white list if needed.
  
 Having an account, ssh to hostname ''dragnet.control.lofar'' or easier, just **''dragnet''**, from the LOFAR portal (''portal.lofar.eu'') (or tunnel through it):
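 For example, a minimal sketch of both routes (assuming your portal and DRAGNET usernames are both ''yourusername''; the ''-J''/ProxyJump option needs a reasonably recent OpenSSH client):
   $ ssh yourusername@portal.lofar.eu
   portal$ ssh dragnet
 or, tunnelling through the portal in one step:
   $ ssh -J yourusername@portal.lofar.eu yourusername@dragnet.control.lofar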
Line 225:
   http_proxy=lexar004.control.lofar:3128 https_proxy=lexar004.control.lofar:3128 wget --no-check-certificate https://lofar-webdav.grid.sara.nl/...
  
-//However//, atm you need to authenticate to this proxy. Get an account via the ASTRON "Science Operations & Support" group <sos@astron.nl> (sigh...)\\
+//However//, atm you need to authenticate to this proxy. Get an account via the ASTRON "Science Operations & Support" group <sos[AT]astron[DOT]nl> (sigh...)\\
 Put that username and password in a ''.wgetrc'' file in your home directory:
   proxy-user=yourusername
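 Since this file ends up holding a plain-text password, restricting its permissions is a sensible precaution (not from this page, just standard practice):
   $ chmod 600 ~/.wgetrc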
Line 332:
   $ scontrol resume 100
   $ scontrol resume [1000,2000]
 +  
 +==== SLURM Troubleshooting ====
 +== "Undraining" nodes ==
 +
 +If you expect there to be enough resources but SLURM job submission fails, some nodes may be in the "drain" state. You can check this by running ''sinfo''. The output could look like this, where nodes drg06 and drg08 are in drain state:
 +
 +  $ sinfo
 +  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
 +  workers*     up   infinite      2  drain drg[06,08]
 +  workers*     up   infinite      1    mix drg01
 +  workers*     up   infinite     21   idle dragproc,drg[02-05,07,09-23]
 +  head         up   infinite      1   idle dragnet
 +
 +To "undrain" e.g. drg08, you can do:
 +  $ scontrol update NodeName=drg08 State=DOWN Reason="undraining"
 +  $ scontrol update NodeName=drg08 State=RESUME
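 +
 +Before undraining, it can help to check why SLURM drained the node in the first place, and to verify afterwards that it is back in service. A minimal sketch using standard ''sinfo'' options (not specific to DRAGNET):
 +  $ sinfo -R            # list the recorded reason for drained/down nodes
 +  $ sinfo -n drg08      # after the resume, drg08 should show up as idle (or mix/alloc)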
  