


Stop-day activities December 4-5, 2018


Coordinator: Teun Grit, roadmin@astron.nl
Software Support: Arno Schoenmakers, softwaresupport@astron.nl
Science, Operations and Support: Matthijs, Pietro, sos@astron.nl
Observer: Henk Mulder, observer@astron.nl

Description of stopday procedures
LOFAR Schedule cycle 11
Stop-day progress sheet (2 tabs!)

  • √ Reboots and iDRAC reboots. (Hopko)
  • √ Block access at 08:00 (Teun)
  • √ Repair memory bank of lof021 (see ticket CIT-25)
  • √ Debug lost Slurm reservations (Reinoud starts reboots, Robin waits for Reinoud)
  • √ Switch IPoIB to connected mode on head and gpu nodes, see https://support.astron.nl/jira/browse/ROADMT-186
  • √ The Slurm disk performance tests are planned for after the stop-day. (Volume is now at 75%)
  • mgmt01.cep4 to CentOS 7.5
  • Robinhood tests
  • √ Disable Supervisor on both lexars at 07:45 (Teun)
  • √ Power drain of iDRACs (Hopko/Robin)
  • √ Update iDRAC firmware (Hopko/Robin)
  • √ The lexars stay on CentOS 7.2
  • √ Some investigation of the iDRACs is needed (Hopko)

:!: Beware of the famous SSH tunnel!

  • √ Test reboot script on 1 LCU
  • √ Reboot cn001 (not announced)
  • √ Remote stations need a reboot
  • Install WinCC 3.16 on 1 LCU. Jasmin: Can't be done on CentOS 7.2. We need a 7.5 system. There is a spare LCU available in the Dwingeloo digital lab (RS511). Eventually we need WinCC 3.16 on all LCUs.
  • √ Update & reboot
  • √ Check High Availability of portal2
  • OS upgrade and reboot (SLES11_SP4 update contains ~65 packages, incl. a new kernel)
  • √ Stop and disable supervisor at 07:45 (Teun)
  • OS upgrade and reboot
  • √ Remove Zabbix-agent version 2.2 from scu001
  • √ Start Postgres replication ldb003 → lcs119 and database split (Reinoud)
  • √ Recabling network interfaces and p2p ldb003 / lcs119 (Arjen, please inform Reinoud)
  • ✘ Update & reboot
  • √ Connect IB switch to CEP4 spine switches, instead of the Cobalt switch. The cables are already in. (Hopko)
  • √ Update & reboot ais001-007
  • √ Update & reboot ads001
  • Update & reboot aartfaac-lcu: No! The openSUSE 13.1 system is too far behind!
  • √ Warm reset at 11:00 (Arjen)
  • none
  • none
  • None
  • Slurm update needs testing first. GPU04 is available for testing.
  • Jasmin: Is there a repo available? Hopko will check.
  • None
  • none
  • none

Last modified: 2018-12-05 10:35 by grit