Differences
This shows you the differences between two versions of the page.
| Both sides previous revision Previous revision Next revision | Previous revision | ||
| public:stopdayactivities_5feb2019 [2019-01-24 14:44] – [Stop-day activities Februari 5, 2019] grit | public:stopdayactivities_5feb2019 [2019-02-19 13:19] (current) – [Review meeting] grit | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== Stop-day activities | + | ====== Stop-day activities |
| \\ | \\ | ||
| ^ Coordinator | Teun Grit | roadmin@astron.nl | | ^ Coordinator | Teun Grit | roadmin@astron.nl | | ||
| - | ^ Software Support | Nico Vermaas | + | ^ Software Support | | softwaresupport@astron.nl | |
| - | ^ Science, Operations and Support | Sarvesh | + | ^ Science, Operations and Support | Sarrvesh Sridhar |
| - | ^ Observer | | observer@astron.nl | | + | ^ Observer | Henk Mulder |
| Line 19: | Line 19: | ||
| ==== Cobalt ==== | ==== Cobalt ==== | ||
| - | * Reboots and idrac reboots. (Hopko) | + | * ✔ Reboots |
| + | * ✔ COBALT2: Connect 10GbE ports to RD-0 and RD-1. (Arjen) | ||
| ==== CEP3 ==== | ==== CEP3 ==== | ||
| - | * Block access at 08:00 (?) | + | * ✔ Block access at 08:00 (Teun) |
| - | * ? | + | * ✔ Reboots (Kees) |
| + | * ✔ Update CentOS7.3 except Python3 | ||
| Line 31: | Line 33: | ||
| ==== CEP4 ==== | ==== CEP4 ==== | ||
| - | * | + | * ✔ Reboots (Hopko) |
| + | * ✔ [[Slurm upgrade to v17.02.2]] (Reinoud/ | ||
| ==== LEXARS ==== | ==== LEXARS ==== | ||
| - | * ? | + | * ✔ Reboots |
| ==== LCU ==== | ==== LCU ==== | ||
| - | * ? | + | * None |
| ==== Portals ==== | ==== Portals ==== | ||
| - | * Update & reboot | + | * ✔ Update & reboot |
| ==== Central Services lcs020 .. lcs030 ==== | ==== Central Services lcs020 .. lcs030 ==== | ||
| - | * OS upgrade and reboot | + | * ✔ OS upgrade and reboot |
| - | * Move memory DIMM's from lcs107/108 to lcs119 (4 x 8GB) | + | * ✔ Move memory DIMM's from lcs107/108 to lcs119 (4 x 8GB) |
| ==== Other Central Services ==== | ==== Other Central Services ==== | ||
| - | * OS upgrade and reboot | + | * ✔ OS upgrade and reboot |
| ==== LTA ==== | ==== LTA ==== | ||
| - | * None | + | * ✔ Update & reboot |
| ==== Aartfaac ==== | ==== Aartfaac ==== | ||
| - | * None | + | * No update & reboots on their request |
| ==== Core switches ==== | ==== Core switches ==== | ||
| - | * ? | + | * None |
| ===== Software updates ===== | ===== Software updates ===== | ||
| Line 82: | Line 84: | ||
| ==== CEP3 ==== | ==== CEP3 ==== | ||
| - | * ? | + | * ✔ Wsclean & IDG |
| + | * ✔ LoSoTo | ||
| + | * ✔ RMextract | ||
| ==== LCU ==== | ==== LCU ==== | ||
| - | * ? | + | * None |
| ==== CEP4 ==== | ==== CEP4 ==== | ||
| - | * ? | + | * None |
| ==== Aartfaac ==== | ==== Aartfaac ==== | ||
| Line 112: | Line 116: | ||
| ==== Review meeting ==== | ==== Review meeting ==== | ||
| - | * | + | Present: Reinoud Bokhorst, Henk Mulder, Hopko Meijering (Slack), Teun Grit. By email: Thomas Franzen on behalf of SOS |
| + | |||
| + | - Hopko: CEP4 cpu30 warning disappeared after reboot | ||
| + | - Reinoud: Cobalt1 test script complained about kis001. Cobalt1 checks not complete. Communication could be better. Solution: Coordinator needs to ask again in Slack. | ||
| + | - Reinoud: CEP4 Slurm update went okay. We discovered that the Docker containers also had an old version of the Slurm client. We solved it by mounting the client from outside. | ||
| + | - Teun: A new script is discovering all CEP3 users of the past 2 months and picks up the email addresses from NIS. This script must be used to warn CEP3 users a week ahead about the upcoming stop-days. It can also be used to inform those users when the stop-day finishes. The script '' | ||
| + | - Reinoud: Some supervisor daemons were not stopped by Software Support. | ||
| + | - Teun: It would be good to have an IB status check in Zabbix for Cobalt2 | ||
| + | - (Next from the email by Thomas) | ||
| + | - All items on the SOS checklist were completed except for 'Check data recorded on DRAGNET' | ||
| + | - In the future, the stop day coordinator should be formally informed when SOS has completed the checklist. An e-mail should be sent by SOS to the stop day coordinator, | ||
| + | - The SLURM version mismatch was not detected during the stop day because 'test pre-processing pipeline' was not on the SOS stop day checklist. This has now been added to the checklist. | ||
| + | - My understanding is that ROAdmin will keep a CEP3 users e-mail list up-to-date. They will inform the CEP3 users when the system will be unavailable during stop days and software roll-outs, and also when the system is back online again. Let me know if I have misunderstood this. Comment by Teun: It is the responsibility of the stop-day coordinator that the CEP3 users are well informed. He will create the list of email addresses and send it to the SOS person on duty for that stop-day and this SOS person will send the email out (cc to coordinator). The coordinator verifies this. | ||
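
The Slurm client mismatch noted in the review (containers still carrying an old client after the upgrade to v17.02.2) is the kind of issue a small post-upgrade check could flag. The following is a minimal sketch, not the procedure actually used: it assumes the version strings have already been collected, for example from `sinfo --version` on the host and inside each container, and the function names, the container name and the old version number are hypothetical.

```python
import re


def parse_slurm_version(text):
    """Extract (major, minor, patch) from output such as 'slurm 17.02.2'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no Slurm version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def find_mismatches(reference_output, client_outputs):
    """Return the sorted names of clients whose version differs from the reference."""
    reference = parse_slurm_version(reference_output)
    return sorted(
        name
        for name, output in client_outputs.items()
        if parse_slurm_version(output) != reference
    )


if __name__ == "__main__":
    # Hypothetical captured outputs; the old version number is made up.
    outputs = {
        "host": "slurm 17.02.2",
        "pipeline-container": "slurm 16.05.8",
    }
    print(find_mismatches("slurm 17.02.2", outputs))
```

A non-empty result would mean at least one client needs attention before the stop day is closed, which fits the 'test pre-processing pipeline' item added to the SOS checklist.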