public:stopdayactivities_4dec2018

Differences

This shows you the differences between two versions of the page.

public:stopdayactivities_4dec2018 [2018-11-27 14:46] – [CEP4] grit public:stopdayactivities_4dec2018 [2018-12-05 10:35] – [Aartfaac] grit
Line 19: Line 19:
 ==== Cobalt ==== ==== Cobalt ====
  
-  * Reboots and idrac reboots. (Hopko)+  * √ Reboots and idrac reboots. (Hopko)
  
  
 ==== CEP3 ==== ==== CEP3 ====
  
-  * Block access at 08:00 (Teun) +  * √ Block access at 08:00 (Teun) 
-  * Repair memory bank of lof021 See [[https://support.astron.nl/jira/browse/CIT-25|ticket CIT-25]] +  * √ Repair memory bank of lof021 See [[https://support.astron.nl/jira/browse/CIT-25|ticket CIT-25]] 
-  * No further update (Slurm? Python3?) +  * √ Debug slurm lost reservations (Reinoud starts reboots, Robin waits for Reinoud)
- +
  
  
Line 33: Line 32:
 ==== CEP4 ==== ==== CEP4 ====
  
-  * Switch IPoIB to connected mode on head and gpu nodes, see https://support.astron.nl/jira/browse/ROADMT-186 +  * √ Switch IPoIB to connected mode on head and gpu nodes, see https://support.astron.nl/jira/browse/ROADMT-186 
-  * The Slurm disk performance tests are planned for after the stopday. (Volume now is 75%)+  * √ The Slurm disk performance tests are planned for after the stopday. (Volume now is 75%) 
 +  * mgmt01.cep4 to CentOS 7.5 
 +  * Robinhood tests
  
  
 ==== LEXARS ==== ==== LEXARS ====
  
-  * Powerdrain of iDracs (Hopko/Robin) +  * √ Disable Supervisor on both lexars at 07:45 (Teun) 
-  * Update iDrac firmware (Hopko/Robin) +  * √ Powerdrain of iDracs (Hopko/Robin) 
-  * The lexars stay on CentOS7.2 +  * √ Update iDrac firmware (Hopko/Robin) 
-  * Some investigation is needed on the iDrac's (Hopko)+  * √ The lexars stay on CentOS7.2 
 +  * √ Some investigation is needed on the iDrac's (Hopko)
  
 +:!: Beware of the famous [[https://www.astron.nl/lofarwiki/doku.php?id=engineering:software:ingest_services&#ssh_tunnel|ssh tunnel]]!
 ==== LCU ==== ==== LCU ====
  
-  * Test reboot script on 1 LCU +  * √ Test reboot script on 1 LCU 
-  * Some remote stations need reboot+  * √ Reboot cn001 (not announced) 
 +  * √  remote stations need reboot
   * Install WinCC 3.16 on 1 LCU. Jasmin: Can't be done on CentOS7.2. We need a 7.5 system. There is a spare LCU available in the Dwingeloo digital lab (RS511). In the end we need WinCC 3.16 on all LCU's someday.   * Install WinCC 3.16 on 1 LCU. Jasmin: Can't be done on CentOS7.2. We need a 7.5 system. There is a spare LCU available in the Dwingeloo digital lab (RS511). In the end we need WinCC 3.16 on all LCU's someday.
  
  
 ==== Portals ==== ==== Portals ====
-  * Update & reboot +  * √ Update & reboot 
-  * Check High Availability of portal2+  * √ Check High Availability of portal2
  
  
 ==== Central Services lcs020 .. lcs030 ==== ==== Central Services lcs020 .. lcs030 ====
    
-  * OS upgrade and reboot +  * √ OS upgrade and reboot (SLES11_SP4 update contains ~65 packages, incl new kernel)
-  * Start Postgres replication ldb003 -> lcs119 +
-  * Plug in new harddisks for ldb003 (leave unconfigured)+
  
  
 ==== Other Central Services ==== ==== Other Central Services ====
    
-  * OS upgrade and reboot +  * √ Stop and disable supervisor at 07:45 (Teun) 
-  * Remove Zabbix-agent version 2.2 from scu001+  * √ OS upgrade and reboot 
 +  * √ Remove Zabbix-agent version 2.2 from scu001 
 +  * √ Start Postgres replication ldb003 -> lcs119 and database split (Reinoud) 
 +  * √ Recabling network interfaces and p2p ldb003 / lcs119 (Arjen, please inform Reinoud)
  
  
  
-==== LTA ==== 
  
-  None+==== LTA - lofarlta01.target.rug.nl ==== 
 + 
 +  ✘ Update & reboot
  
    
 +==== Dragnet ====
 +
 +  * √ Connect IB switch to CEP4 spine switches, instead of the Cobalt switch. The cables are already in. (Hopko)
 +
 ==== Aartfaac ==== ==== Aartfaac ====
  
-  * Connect IB switch to CEP4 spine switches, instead of the Cobalt switch. The cables are already in. (Hopko)+  * √ Update & reboot ais001-007 
 +  * √ Update & reboot ads001 
 +  * Update & reboot aartfaac-lcu: No! OpenSuse 13.1 system too far behind!
  
  
 ==== Core switches ==== ==== Core switches ====
  
-  * Warm reset 11:00h (Arjen)+  * √ Warm reset 11:00h (Arjen)
  
  
Line 97: Line 109:
 ==== CEP3 ==== ==== CEP3 ====
  
-  * Debug slurm lost reservations (Reinoud)+  * 
  
 ==== LCU ==== ==== LCU ====
Line 105: Line 117:
 ==== CEP4 ==== ==== CEP4 ====
  
-  * Slurm update needs testing first. Is there a GPU node available?+  * Slurm update needs testing first. GPU04 is available for testing.
   * Jasmin: Is there a repo available? Hopko will check.   * Jasmin: Is there a repo available? Hopko will check.
  
  • Last modified: 2018-12-18 12:20
  • by grit