dragnet:cluster_usage

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
dragnet:cluster_usage [2017-08-17 22:01] – [Access and Login] spacing amesfoortdragnet:cluster_usage [2019-01-07 15:06] (current) – [Access and Login] Reinoud Bokhorst
Line 8: Line 8:
 ===== Access and Login ===== ===== Access and Login =====
 To get an account, get permission from the Dragnet PI: Jason Hessels (''hessels[AT]astron[DOT]nl'').\\ To get an account, get permission from the Dragnet PI: Jason Hessels (''hessels[AT]astron[DOT]nl'').\\
-Easiest is to ask him to send his permission to Teun Grit (''grit[AT]astron[DOT]nl'') for a LOFAR NIS account to access the LOFAR portal, and to Mike Sipior (''sipior[AT]astron[DOT]nl'') to add your account to DRAGNET.\\ +Easiest is to ask him to send his permission to the RO Sysadmins (''roadmin[AT]astron[DOT]nl'') for a LOFAR NIS account to access the LOFAR portal, and to Mike Sipior (''sipior[AT]astron[DOT]nl'') to add your account to DRAGNET.\\ 
-You can also provide Teun Grit your (e.g. home) IP(s) to add to a LOFAR portal white list if needed.+You can also provide RO Admin your (e.g. home) IP(s) to add to a LOFAR portal white list if needed.
  
 Having an account, ssh to hostname ''dragnet.control.lofar'' or easier, just **''dragnet''**, from the LOFAR portal (''portal.lofar.eu'') (or tunnel through it): Having an account, ssh to hostname ''dragnet.control.lofar'' or easier, just **''dragnet''**, from the LOFAR portal (''portal.lofar.eu'') (or tunnel through it):
Line 332: Line 332:
   $ scontrol resume 100   $ scontrol resume 100
   $ scontrol resume [1000,2000]   $ scontrol resume [1000,2000]
 +  
 +==== SLURM Troubleshooting ====
 +== "Undraining" nodes ==
 +
 +If you expect that there should be enough resources, but slurm submission fails because some nodes could be in "drain" state, you can check that by running "sinfo". You could see 
 +something like this, where nodes drg06 and drg08 are in drain state:
 +
 +  $ sinfo
 +  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
 +  workers*     up   infinite      2  drain drg[06,08]
 +  workers*     up   infinite      1    mix drg01
 +  workers*     up   infinite     21   idle dragproc,drg[02-05,07,09-23]
 +  head         up   infinite      1   idle dragnet
 +
 +To "undrain" e.g. drg08, you can do:
 +  $ scontrol update NodeName=drg08 State=DOWN Reason="undraining"
 +  $ scontrol update NodeName=drg08 State=RESUME
  
  • Last modified: 2017-08-17 22:01
  • by amesfoort