
Stop-day activities December 4-5, 2018


Coordinator Teun Grit roadmin@astron.nl
Software Support Arno Schoenmakers softwaresupport@astron.nl
Science, Operations and Support Matthijs, Pietro sos@astron.nl
Observer Henk Mulder observer@astron.nl

Description of stop-day procedures
LOFAR Schedule cycle 11
Stop-day progress sheet (2 tabs!)

Systems

Cobalt

CEP3

CEP4

LEXARS

Warning: Beware of the famous ssh tunnel!

LCU

Portals

Central Services lcs020 .. lcs030

Other Central Services

LTA - lofarlta01.target.rug.nl

Dragnet

Aartfaac

Core switches

Software updates

MAC/SAS

CEP3

LCU

CEP4

Aartfaac

COBALT

LTA

In the field

DWG Lofar Systems

Review meeting

When: 13-12-2018 11:00, Muller
Present: Arno, Matthijs (Slack), Reinoud, Jasmin, Henk, Teun (coordinator)

  1. Central: Always start the stop-day with the NIS updates. NIS is needed by Slurm and by the Lexars (ssh tunnel).
  2. CEP3: Slurm needs working accounts when slurmctld starts; otherwise the reservations will be lost! Jasmin suggested replacing the current password hack with a solution based on SSH keys. To be implemented before the next stop-day.
  3. CEP4: The mgmt01 node is now running CentOS 7.5.
  4. Lexars: The Lexars came up while NIS was still down, which broke the ssh tunnel.
  5. LCU family: The replacement of the Rubidium in CN caused confusion, and it took quite some time to figure out what was happening.
  6. LCU ILT: The ILT stations were not available on the stop-day; they were rebooted the following Monday. In future, plan their availability many months in advance, if possible for every other stop-day. The ILT stations must always be available for the software roll-out.
  7. WinCC: We need to set up a test system first.
  8. Novell IDM: What is the timescale for its replacement? The systems can no longer be updated.
  9. Ldb003: The Intel NIC did not work; we used the internal NICs instead.
  10. Lofarlta01: Not done; there was no time left.
  11. Dragnet: Are the Dragnet users aware of the cable change? SOS skipped the tests, so the Dragnet team needs to run them. Matthijs will inform dragnet@astron.nl.
  12. Aartfaac-lcu: Upgraded and rebooted the following Wednesday. Over 500 packages had to be installed, but it went fine in the end.
  13. Network: The reload at 11:00 surprised us. Teun should have warned his colleagues a few minutes beforehand.
  14. Zabbix: The Zabbix server crashed during the upgrade. A reinstall was needed, which took some hours. Cause unknown.
  15. Dwingeloo: The Dwingeloo systems were also updated and rebooted using Spacewalk.
  16. Scu001: It has no NFS mount; remove it from the SDOS checks.
  17. Triggered observations: The test failed; there seems to be a bug in the script.
  18. CEP3 (Matthijs): Should SOS inform the users? Yes. The coordinator will report when the accounts are back and Slurm is up; SOS then needs to check and inform the users. The SOS checks are still being defined.
  19. Network overhaul: The validation run ahead of the stop-day was not done due to a miscommunication. Please wait for acknowledgement.
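Items 1 and 2 describe an ordering dependency: NIS must serve the user accounts before slurmctld starts, or the reservations are lost. A minimal pre-flight check could make that ordering explicit. The sketch below is in Python and assumes the accounts are resolved through the system name service (NSS, backed by NIS), so that `pwd.getpwnam` sees them. The function names and the account list are hypothetical placeholders, not part of the existing stop-day procedure.

```python
import pwd

# Hypothetical placeholder: the accounts that Slurm reservations on CEP3 refer to.
# Replace with the real account names; these are NOT taken from the stop-day notes.
REQUIRED_ACCOUNTS = ["example_user1", "example_user2"]


def missing_accounts(names):
    """Return the account names that the name service (e.g. NIS via NSS) cannot resolve."""
    missing = []
    for name in names:
        try:
            pwd.getpwnam(name)  # raises KeyError if the account is unknown
        except KeyError:
            missing.append(name)
    return missing


def ready_for_slurmctld(names=REQUIRED_ACCOUNTS):
    """Print a verdict and return True only if every required account resolves."""
    missing = missing_accounts(names)
    if missing:
        print("Do not start slurmctld yet; unresolved accounts: " + ", ".join(missing))
        return False
    print("All required accounts resolve; slurmctld can be started.")
    return True
```

Running `ready_for_slurmctld()` on the Slurm controller after the NIS updates, and starting slurmctld only when it returns True, would turn the "NIS first" rule into a check rather than something to remember on the day.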