public:stopdayactivities_5jun2018

Differences

This shows you the differences between two versions of the page.

public:stopdayactivities_5jun2018 [2018-05-14 07:53] – [Stop-day activities June 5-6, 2018] Reinoud Bokhorst
public:stopdayactivities_5jun2018 [2018-06-05 11:51] – [Aartfaac] Reinoud Bokhorst
Line 4: Line 4:
  
 ^ Coordinator | Reinoud Bokhorst | roadmin@astron.nl |
-^ Software Support | | softwaresupport@astron.nl |
-^ Science, Operations and Support |  | sos@astron.nl |
-^ Observer | | observer@astron.nl |
+^ Software Support | Arno Schoenmakers | softwaresupport@astron.nl |
+^ Science, Operations and Support | Matthijs van der Wiel | sos@astron.nl |
+^ Observer | Henk Mulder | observer@astron.nl |
  
  
-[[engineering:stop_day_procedures|More stopday details]]
-
-[[https://docs.google.com/spreadsheets/d/e/2PACX-1vSEbxNss-nmOofKDXJRmgACwMDB9zeekBLRl39krswsVGIigfvzD_EdnlKJ_2TF-IGgoX2IXvc2YlXL/pubhtml|LOFAR Schedule cycle 10]]
+⇒ [[engineering:stop_day_procedures|Description of stopday procedures]]\\
+⇒ [[https://docs.google.com/spreadsheets/d/e/2PACX-1vSEbxNss-nmOofKDXJRmgACwMDB9zeekBLRl39krswsVGIigfvzD_EdnlKJ_2TF-IGgoX2IXvc2YlXL/pubhtml|LOFAR Schedule cycle 10]]\\
+⇒ The next stopday is scheduled for August 7.
 ===== Systems =====
  
Line 19: Line 18:
 ==== Cobalt ====
  
-  * ✔ Reboots and idrac reboots. (Hopko/Robin)
-  * ✔ CBM010 will be present before the stopday
+  * ✔ Reboots and idrac reboots. (Hopko)
+
 ==== CEP3 ====
  
   * ✔ Block access at 08:00 (Teun)
-  * ✔ Mount DAC cables to headnodes to support floating IP (Hopko/Robin)
-  * ✔ All nodes: file system check and reboot. (Hopko, Robin)
-  * ✔ Was broken: Check/debug persistence of Slurm reservations (Reinoud)  **For the review**: We should have a backup in place!
-  * ✔ NFS mounts for cep3, from all the lof-nodes are using control!
+  * ✔ All nodes: file system check and reboot. (Kees)
+
+
  
  
 ==== CEP4 ====
  
-  * ✔ Connect DAC cable (Hopko)
+  * ✔ Reboot (Hopko)
+  * ✔ Recreate Docker thinpools on CPU nodes
+  * ✔ Recabling of InfiniBand, details in Jira ticket
+  * Performance tests after recabling
  
 ==== LEXARS ====
  
-  *  ✔ lexar003 reboot using XCAT (148 days up) (Was done by Reinoud 9-4-2018)
+  *  ✔ Reboot (Hopko/Robin)
  
 ==== LCU ====
  
-  *  ✔ Reboot of all Dutch LCU's (teun) (ILT stations in local mode) **For the review**: We discovered that all LCU's have a fixed NFS mount on /home. This should be an automount.
+  *
  
 ==== Central Services ====
    
-  * ✔ Update portals to CentOS/KVM
-  * ✔ Update and reboot lcs020 .. lcs30
-  * ✔ Remove sas001 and sas099 (powered off, disconnect cables)
-  * ✔ Update & reboot NFS server lcs115
-  * ✔ Almost all nfs mounts on the lcs115 nfs server are over the control network. Only a few correctly use the offline network: only lexar003, lexar004 and lhd002. We should force CEP3 to use off-line!
-  * ✔ Mainly MAC/SAS, LCU's and Aartfaac machines use NFS over control VLAN. They don't have an off-line or on-line network connection.
-  * Check resolv.conf settings; see https://support.astron.nl/lofar_issuetracker/issues/10448
+  * ✔ Restart qpidd@ccu001 (ref. https://support.astron.nl/jira/browse/ROADMT-99)
+  * ✔ Test DMZ KVM failover (the DMZ KVM hypervisor hosts DMZ services: portal, DNS server, SMTP, proxy, etc.)
+  * ✔ OS upgrade and reboot
+
+
+
 ==== LTA ====
  
-  * ✔ Update and reboot when required (Reinoud)
+  * ✔ Update and reboot (Reinoud)
+  * ✔ Migration of Oracle DB to new hardware (Andrey Tsyganov)
 ==== Aartfaac ====
  
-  * ✘ Check for broken disks **Fail**: ais007 has a degraded RAID1  !!
-  * **For review**: Add Jasmin & Reinoud to all nodes as admin
+  * ✔ Check for broken disks **Fail**: ais007 had a degraded RAID1, but a controller firmware update helped.
+
+
 ==== Core switches ====
  
-  * none (probably June)
+  * ✔ Warm reset PD0, RD0 and RD1 (Arjen)
  
  
-==== Communication issues ==== 
  
-**For review**: At the end of the 1st day software support needs to report status to coordinator 
 ===== Software updates =====
  
 ==== MoM and related ====
  
-  * ?
+  * none
  
 ==== MAC/SAS ====
  
-  * ?
+  * none
  
 ==== CEP3 ====
  
-  * ✔ Reboot / fs checks
-  * ✔ Make AOFlagger 2.10 the default version (already installed)
-  * ✔ Make LOFAR-Release-3_0_14 the default version (linked against AOFlagger 2.10)
-  * ✔ Make WSClean 2.5 the default version (already installed)
+  * none
+
+
  
-==== CEP4 ====
+==== LCU ====
  
-  * ?
+  * Synchronize Python packages, see list in ticket
+  * ✔ umask change for foreign stations
  
+==== CEP4 ====
+
+  * Roll out Docker images
+  * ✘ SLURM upgrade (postponed)
 ==== Aartfaac ====
  
Line 102: Line 101:
 ===== In the field =====
  
-  * none
+  *
- +
- +
-===== External =====+
  
-  * ? 
  
-==== Next stopday ==== 
  
-The next stopday is TBD 
  • Last modified: 2018-06-05 11:52
  • by Reinoud Bokhorst