Differences
This shows you the differences between two versions of the page.
public:stopdayactivities_5feb2019 [2019-02-05 14:23] – [Portals] Reinoud Bokhorst | public:stopdayactivities_5feb2019 [2019-02-19 13:19] (current) – [Review meeting] grit | ||
---|---|---|---|
Line 20: | Line 20: | ||
* ✔ Reboots (Hopko) | * ✔ Reboots (Hopko) | ||
- | * COBALT2: Connect 10GbE ports to RD-0 and RD-1. (Arjen) | + | * ✔ COBALT2: Connect 10GbE ports to RD-0 and RD-1. (Arjen) |
==== CEP3 ==== | ==== CEP3 ==== | ||
- | * Block access at 08:00 (Teun) | + | * ✔ Block access at 08:00 (Teun) |
- | * Reboots (Kees) | + | * ✔ Reboots (Kees) |
+ | * ✔ Update CentOS7.3 except Python3 | ||
Line 49: | Line 50: | ||
==== Central Services lcs020 .. lcs030 ==== | ==== Central Services lcs020 .. lcs030 ==== | ||
- | * OS upgrade and reboot | + | * ✔ OS upgrade and reboot |
- | * Move memory DIMMs from lcs107/108 to lcs119 (4 x 8GB) | + | * ✔ Move memory DIMMs from lcs107/108 to lcs119 (4 x 8GB) |
==== Other Central Services ==== | ==== Other Central Services ==== | ||
- | * OS upgrade and reboot | + | * ✔ OS upgrade and reboot |
Line 64: | Line 65: | ||
==== Aartfaac ==== | ==== Aartfaac ==== | ||
- | * None | + | * No updates or reboots, at their request |
Line 83: | Line 84: | ||
==== CEP3 ==== | ==== CEP3 ==== | ||
- | * Wsclean & IDG | + | * ✔ Wsclean & IDG |
- | * LoSoTo | + | * ✔ LoSoTo |
- | * RMextract | + | * ✔ RMextract |
==== LCU ==== | ==== LCU ==== | ||
Line 115: | Line 116: | ||
==== Review meeting ==== | ==== Review meeting ==== | ||
- | * | + | Present: Reinoud Bokhorst, Henk Mulder, Hopko Meijering (Slack), Teun Grit. By email: Thomas Franzen on behalf of SOS |
+ | |||
+ | - Hopko: The CEP4 cpu30 warning disappeared after the reboot. | ||
+ | - Reinoud: The Cobalt1 test script complained about kis001, and the Cobalt1 checks were not completed. Communication could be better. Solution: the coordinator needs to ask again in Slack. | ||
+ | - Reinoud: The CEP4 Slurm update went okay. We discovered that the Docker containers also had an old version of the Slurm client. We solved this by mounting the client from outside the containers. | ||
+ | - Teun: A new script finds all CEP3 users of the past 2 months and picks up their email addresses from NIS. This script must be used to warn CEP3 users a week ahead of upcoming stop-days. It can also be used to inform those users when the stop-day finishes. The script '' | ||
+ | - Reinoud: Some supervisor daemons were not stopped by Software Support. | ||
+ | - Teun: It would be good to have an IB status check in Zabbix for Cobalt2. | ||
+ | - (The following items are from the email by Thomas) | ||
+ | - All items on the SOS checklist were completed except for 'Check data recorded on DRAGNET' | ||
+ | - In the future, the stop day coordinator should be formally informed when SOS has completed the checklist. An e-mail should be sent by SOS to the stop day coordinator, | ||
+ | - The SLURM version mismatch was not detected during the stop day because 'test pre-processing pipeline' was not on the SOS stop day checklist. This has now been added to the checklist. | ||
+ | - My understanding is that ROAdmin will keep a CEP3 users e-mail list up-to-date. They will inform the CEP3 users when the system will be unavailable during stop days and software roll-outs, and also when the system is back online again. Let me know if I have misunderstood this. Comment by Teun: It is the responsibility of the stop-day coordinator that the CEP3 users are well informed. He will create the list of email addresses and send it to the SOS person on duty for that stop-day and this SOS person will send the email out (cc to coordinator). The coordinator verifies this. |
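
The review notes above describe a script that finds CEP3 users of the past 2 months and looks up their email addresses in NIS. The script itself is not shown on this page, so the following is only a minimal sketch of that selection logic under stated assumptions: the function names `recent_users` and `addresses_for`, the `(username, login time)` record shape, the 60-day window standing in for "2 months", and the username-to-address dictionary standing in for an NIS lookup are all hypothetical.

```python
from datetime import datetime, timedelta


def recent_users(logins, now, days=60):
    """Return the set of usernames with at least one login in the last `days` days.

    `logins` is an iterable of (username, datetime) pairs; this record shape is
    an assumption, not the format used by the actual CEP3 script.
    """
    cutoff = now - timedelta(days=days)
    return {user for user, when in logins if when >= cutoff}


def addresses_for(users, directory):
    """Map usernames to email addresses using `directory`.

    `directory` is a plain dict standing in for an NIS lookup (assumption).
    Returns (sorted addresses found, sorted usernames with no entry), so the
    coordinator can see who could not be reached.
    """
    found = sorted(directory[u] for u in users if u in directory)
    missing = sorted(u for u in users if u not in directory)
    return found, missing
```

A coordinator could run the first function a week before the stop-day, pass the result to the second, and hand the address list to the SOS person on duty. Reporting the usernames without a directory entry makes it visible when someone will not receive the warning.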