Differences

This shows you the differences between two versions of the page.

--- dragnet:cluster_usage [2017-08-17 22:01] – [Access and Login] spacing amesfoort
+++ dragnet:cluster_usage [2019-01-07 15:06] (current) – [Access and Login] Reinoud Bokhorst
@@ Line 8: / Line 8: @@
 ===== Access and Login =====
 To get an account, get permission from the Dragnet PI: Jason Hessels (''hessels[AT]astron[DOT]nl'').\\
-Easiest is to ask him to send his permission to Teun Grit (''grit[AT]astron[DOT]nl'') for a LOFAR NIS account to access the LOFAR portal, and to Mike Sipior (''sipior[AT]astron[DOT]nl'') to add your account to DRAGNET.\\
+Easiest is to ask him to send his permission to the RO Sysadmins (''roadmin[AT]astron[DOT]nl'') for a LOFAR NIS account to access the LOFAR portal, and to Mike Sipior (''sipior[AT]astron[DOT]nl'') to add your account to DRAGNET.\\
-You can also provide Teun Grit your (e.g. home) IP(s) to add to a LOFAR portal white list if needed.
+You can also provide the RO Sysadmins with your (e.g. home) IP(s) to add to a LOFAR portal white list if needed.
 Having an account, ssh to hostname ''dragnet.control.lofar'' or easier, just **''dragnet''**, from the LOFAR portal (''portal.lofar.eu'') (or tunnel through it):
@@ Line 332: / Line 332: @@
   $ scontrol resume 100
   $ scontrol resume [1000,2000]
+==== SLURM Troubleshooting ====
+== "Undraining" nodes ==
+If you expect enough resources to be available but SLURM job submission still fails, some nodes may be in the "drain" state. You can check this by running ''sinfo''. The output may look something like this, where nodes drg06 and drg08 are in the drain state:
+  $ sinfo
+  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
+  workers*     up   infinite      2  drain drg[06,08]
+  workers*     up   infinite      1    mix drg01
+  workers*     up   infinite     21   idle dragproc,drg[02-05,07,09-23]
+  head         up   infinite      1   idle dragnet
+To "undrain" a node, e.g. drg08, run:
+  $ scontrol update NodeName=drg08 State=DOWN Reason="undraining"
+  $ scontrol update NodeName=drg08 State=RESUME
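As a rough sketch, the check above can be scripted so that only the drained nodes are listed. The helper name ''drained_nodes'' is hypothetical and not part of SLURM. It assumes the default ''sinfo'' column layout shown above (STATE in column 5, NODELIST in column 6) and reads the output on standard input, so it can be tried on saved output without access to the cluster.

```shell
# Hypothetical helper (not a SLURM command): print the NODELIST field of
# every sinfo row whose STATE column starts with "drain".
# Assumes the default sinfo layout:
#   PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
# Reads sinfo output on stdin; the header row (line 1) is skipped.
drained_nodes() {
  awk 'NR > 1 && $5 ~ /^drain/ { print $6 }'
}

# Typical use on the cluster, assuming sinfo is on the PATH:
#   sinfo | drained_nodes
```

Each printed entry can then be undrained with the two ''scontrol update'' commands above. Nodes that are still draining are reported by ''sinfo'' with a different short state and are intentionally not matched here.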