dragnet:cluster_support

Differences

This shows you the differences between two versions of the page.

dragnet:cluster_support [2017-08-24 20:38] – [Failing Components] answers to vendor questions on broken disk amesfoort
dragnet:cluster_support [2017-08-29 11:09] (current) – [Failing Components] add smartd restart ansible command amesfoort
Line 80: Line 80:
 You may continue working, send the same command to the next drive, or even reboot. The drive continues the self-test whenever it has no other requests to serve. An extended test may take many hours. Afterwards, you can show the S.M.A.R.T. information again to see the result; if a failure threshold was exceeded, our ''smartd'' service also sends an e-mail.
  
 +To run a command as root for all drives on all ''drgXX'' nodes, put it in a script (''ansible -a'' runs a single command without a shell, so chaining commands with '';'' does not work), for example:
 +  #!/bin/sh
 +  hostname; smartctl --test=long /dev/sda; smartctl --test=long /dev/sdb; smartctl --test=long /dev/sdc; smartctl --test=long /dev/sdd
 +mark it executable and run it via ansible as superuser:
 +  $ chmod 755 $HOME/yourscript.sh
 +  $ ansible workers -b -K -f 25 -a '$HOME/yourscript.sh'  # + possibly redirecting stdout and/or stderr
 +then type your password once.
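
As a variation on the script above, the drive list can be passed as arguments instead of being hard-coded. This is only a sketch: the function name ''start_long_tests'' and the ''SMARTCTL'' override variable are hypothetical helpers introduced here, not part of the existing setup.

```sh
#!/bin/sh
# Sketch: start a long S.M.A.R.T. self-test on every drive given as an argument.
# SMARTCTL is a hypothetical override; set it to "echo smartctl" for a dry run.
SMARTCTL="${SMARTCTL:-smartctl}"

start_long_tests() {
    uname -n                              # node name, same information as `hostname`
    for dev in "$@"; do
        # $SMARTCTL is left unquoted on purpose so "echo smartctl" splits into two words
        $SMARTCTL --test=long "$dev"
    done
}

# Only act when drives were given on the command line.
if [ "$#" -gt 0 ]; then
    start_long_tests "$@"
fi
```

With this variant the ansible call would become ''ansible workers -b -K -f 25 -a '$HOME/yourscript.sh /dev/sda /dev/sdb /dev/sdc /dev/sdd'''. Running ''SMARTCTL="echo smartctl" ./yourscript.sh /dev/sda'' locally only prints the command that would be executed.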
 +
 +To restart all smartd services, run:
 +  $ ansible alldragnet -b -K -f 25 -a 'systemctl restart smartd'
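
After a restart it can be useful to confirm that ''smartd'' is actually running on every node. The sketch below chains the restart with a ''systemctl is-active'' check; the function name ''restart_and_check_smartd'' and the ''ANSIBLE'' override variable are hypothetical. The check is assumed not to need root, so it is run without ''-b -K'' to avoid a second password prompt.

```sh
#!/bin/sh
# Sketch: restart smartd on an ansible host group, then check that it is active.
# ANSIBLE is a hypothetical override; set it to "echo ansible" for a dry run.
ANSIBLE="${ANSIBLE:-ansible}"

restart_and_check_smartd() {
    group="$1"
    # Restart as superuser (prompts once for the password because of -K).
    $ANSIBLE "$group" -b -K -f 25 -a 'systemctl restart smartd' &&
    # is-active exits non-zero when the service is not running,
    # which ansible reports as a failure for that host.
    $ANSIBLE "$group" -f 25 -a 'systemctl is-active smartd'
}

# Only act when a host group was given on the command line.
if [ "$#" -gt 0 ]; then
    restart_and_check_smartd "$1"
fi
```

Saved under a name of your choice, it would be called with the host group as its only argument, for example ''alldragnet''.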
  
 == Upon Drive Failure ==
Line 114: Line 124:
 For disk replacement, one person is enough. If the node has to be opened up, two people are needed (sliding-rail issue, node weight, and limited space in the data center).
  
 +The ''ledmon'' package (''sudo yum install ledmon'') provides ''ledctl'', which can turn the status lights on the disk caddies on and off. Use ''ledctl locate=/dev/sdd'' to turn on, or ''ledctl locate_off=/dev/sdd'' to turn off, the locate light of the drive you want to replace.
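
To avoid leaving a locate light on after the swap, the two ''ledctl'' calls can be wrapped in one small helper. This is a sketch under assumptions: the function name ''locate_drive'' and the ''LEDCTL'' override variable are hypothetical, and ''ledctl'' is assumed to be run with root privileges.

```sh
#!/bin/sh
# Sketch: turn on the locate light of one drive, wait for Enter, turn it off again.
# LEDCTL is a hypothetical override; set it to "echo ledctl" for a dry run.
LEDCTL="${LEDCTL:-ledctl}"

locate_drive() {
    dev="$1"
    $LEDCTL "locate=$dev" || return 1
    # The prompt goes to stderr so that stdout only carries ledctl output.
    printf 'Locate light is on for %s; press Enter to turn it off: ' "$dev" >&2
    read -r _reply
    $LEDCTL "locate_off=$dev"
}

# Only act when a device was given on the command line.
if [ "$#" -gt 0 ]; then
    locate_drive "$1"
fi
```

Saved under a name of your choice and run as root with the device as its only argument (for example ''/dev/sdd''), it keeps the light on until you confirm at the terminal.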
  
 ==== Overview Examples of mdadm and lsblk ====
  • Last modified: 2017-08-24 20:38
  • by amesfoort