drop_setup [2011/03/28 12:32] – leeuwen | drop_setup [2013/10/22 14:04] (current) – leeuwen |
---|
CMDS | CMDS |
* [root] export USER=<username>; useradd ${USER} ; passwd ${USER} ; /usr/sbin/setquota -u ${USER} 20000000 30000000 0 0 /export ; # usermod -G vault ${USER} | * [root] export USER=<username>; useradd ${USER} ; passwd ${USER} ; /usr/sbin/setquota -u ${USER} 20000000 30000000 0 0 /export ; # usermod -G vault ${USER} |
| * force update using the "411" line below, or "rocks sync users" (but that messes up home dirs) |
| |
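The account-creation one-liner under CMDS can be sketched as a small dry-run helper. This is only an illustration: the function name `new_user_cmds` and the user `alice` are hypothetical, while the quota values and the `/export` filesystem are taken from the note above. The helper prints the commands for review instead of running them.

```shell
#!/bin/sh
# Print the account-creation commands for one user instead of running them.
# setquota argument order: block-soft block-hard inode-soft inode-hard filesystem.
new_user_cmds() {
    user=$1
    soft=${2:-20000000}   # soft block limit, value from the notes
    hard=${3:-30000000}   # hard block limit, value from the notes
    echo "useradd $user"
    echo "passwd $user"
    echo "/usr/sbin/setquota -u $user $soft $hard 0 0 /export"
}

# "alice" is a placeholder user name.
new_user_cmds alice
```

After checking the output, the three commands can be run by hand as root, followed by the 411 update.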
TROUBLESHOOTING | TROUBLESHOOTING |
* Remove 'old' boot image on master, for this node: **/tftpboot/pxelinux/pxelinux.cfg/C0A800F?** where ? = EDCBA9 for nodes 012345 | * Remove 'old' boot image on master, for this node: **/tftpboot/pxelinux/pxelinux.cfg/C0A800F?** where ? = EDCBA9 for nodes 012345 |
* Manually reboot node (force PXE boot with F12 if needed) | * Manually reboot node (force PXE boot with F12 if needed) |
| * Prompts for "language"? Change master root dir permissions: ''chmod a+rx ~root'' |
* If network is slow (dmesg | grep eth says "Link is up at 10 Mbps, half duplex") force with ethtool: | * If network is slow (dmesg | grep eth says "Link is up at 10 Mbps, half duplex") force with ethtool: |
* ''ethtool -s eth1 speed 1000'' | * ''/sbin/ethtool -s eth1 speed 1000'' (if fails, set ''/sbin/ethtool -s eth1 autoneg'' off then on) |
* for eth0 ''ifdown eth0; ifup eth0'' | * for eth0 ''ifdown eth0; ifup eth0'' |
| * Keys: [ctrl-ctrl] = bring up display server screen ; [ctrl-alt-f2/f3/f4] = emergency sh, dmesg, etc. |
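The boot image file names above (C0A800F?) are the node's IPv4 address written as eight uppercase hex digits, which is how pxelinux names per-host config files. Below is a minimal sketch for computing them. The function name `ip_to_pxe_hex` is made up for illustration, and the mapping of compute-0-N to 192.168.0.(254 - N) is an assumption inferred from the hex suffixes E, D, C, B, A, 9 listed for nodes 0 to 5.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to the 8-digit uppercase hex name
# that pxelinux looks up under pxelinux.cfg/.
ip_to_pxe_hex() {
    # Split the address on dots into the function's positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# Assumption: compute-0-N has address 192.168.0.(254 - N).
for node in 0 1 2 3 4 5; do
    ip="192.168.0.$((254 - node))"
    echo "compute-0-$node -> /tftpboot/pxelinux/pxelinux.cfg/$(ip_to_pxe_hex "$ip")"
done
```

The loop only prints the candidate paths; removing the stale file is still done by hand on the master.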
DONE | DONE |
* ROCKS user guide: http://www.rocksclusters.org/roll-documentation/base/5.1/ | * ROCKS user guide: http://www.rocksclusters.org/roll-documentation/base/5.1/ |
| * BACKUP |
| * Run ''~leeuwen/bin/backup-vault-to-sara'' (tar + sftp to /archive/joerivl/drop/) |
* Changed aliases c0 .. c5 ; drop ; f for frontend | * Changed aliases c0 .. c5 ; drop ; f for frontend |
* ''[root@drop ~]# for i in `seq 0 5`; do rocks add host alias compute-0-$i c$i; done'' | * ''[root@drop ~]# for i in `seq 0 5`; do rocks add host alias compute-0-$i c$i; done'' |
* per http://www.mulder.franken.de/workstuff/ (changed to incorporate "Bay Numbers" vs HW drives) | * per http://www.mulder.franken.de/workstuff/ (changed to incorporate "Bay Numbers" vs HW drives) |
* ''/etc/cron.hourly/raidcheck'' + ''/root/bin/check-hp-raid-status.pl'' | * ''/etc/cron.hourly/raidcheck'' + ''/root/bin/check-hp-raid-status.pl'' |
* [TODO] add /root/bin/backup-vault-to-sara to cron.weekly (after initial tunneled scp to archive.sara.nl) | |
* cd /export/; scp -pr home vault joerivl@archive:/archive/joerivl/drop/ | |
* front-end | * front-end |
* e2label /dev/sda5 /scratch ..etc | * e2label /dev/sda5 /scratch ..etc |
* turned off automount ; /etc/auto.master now empty ; /etc/exports now /data/ and /exports/ | * turned off automount ; /etc/auto.master now empty ; /etc/exports now /data/ and /exports/ |
* using fdisk "type fd" and mdadm --create etc, added 2x750 software RAID0 | * using fdisk "type fd" and mdadm --create etc, added 2x750 software RAID0 |
| * /sbin/mdadm -Ac partitions -m 0 /dev/md0 (to bring up after reboot) |
| * /sbin/fsck.ext4 /dev/md0 |
| * mount -t ext4 /dev/md0 /scratch2 |
* RAID INITIAL BUILD with '' /usr/sbin/hpacucli'' ([[http://h10032.www1.hp.com/ctg/Manual/c00709035.pdf|Guide, p44+]]) | * RAID INITIAL BUILD with '' /usr/sbin/hpacucli'' ([[http://h10032.www1.hp.com/ctg/Manual/c00709035.pdf|Guide, p44+]]) |
* ''ctrl slot=1 pd all show'' | * ''ctrl slot=1 pd all show'' |
* ''mkpartfs primary ext3 0 -0'' | * ''mkpartfs primary ext3 0 -0'' |
* RAID0 REBUILD | * RAID0 REBUILD |
* Blink LED (''ctrl slot=1 pd 1E:1:10 modify led=on''), Replace disk, re-enable logical drive (''ctrl slot=1 ld 2 modify reenable forced'') (from [[http://www.datadisk.co.uk/html_docs/redhat/hpacucli.htm|Cheat sheet]]) | * Kill processes (''/sbin/fuser -m /dev/cciss/c0d1p1'', then kill with ''-k''), then unmount |
| * Blink LED (''ctrl slot=1 pd 1E:1:10 modify led=on''), Replace disk, |
| * Re-enable logical drive (''ctrl slot=1 ld 2 modify reenable forced'') (from [[http://www.datadisk.co.uk/html_docs/redhat/hpacucli.htm|Cheat sheet]]) |
| * Make new file system, ''/sbin/mkfs.ext3 -L /data /dev/cciss/c0d1p1'' |
* RAID PHYS CONFIG | * RAID PHYS CONFIG |
* Drive 1 = left, top; Count seems to be 1-3 = left column, top to bottom. RAID0=two lowest in second-to-right column + entire right column | * Drive 1 = left, top; Count seems to be 1-3 = left column, top to bottom. RAID0=two lowest in second-to-right column + entire right column |
* CROSSMOUNTS | * CROSSMOUNTS |
* removed automounter. changed ''/etc/exports'' and node script to hard mount. Homedirs are /export/home/ | * removed automounter. changed ''/etc/exports'' and node script to hard mount. Homedirs are /export/home/ (make sure this is correct in /etc/passwd) |
* ''make -C /var/411 clean; make -C /var/411; make -C /var/411 force; cluster-fork 411get -''''-all'' | * ''make -C /var/411 clean; make -C /var/411; make -C /var/411 force; cluster-fork 411get -''''-all'' |
* ''/etc/rc.d/init.d/nfs restart; /etc/rc.d/init.d/nfs restart; make -C /var/411; service autofs reload; exportfs -a'' | * ''/etc/rc.d/init.d/nfs restart; /etc/rc.d/init.d/nfs restart; make -C /var/411; service autofs reload; exportfs -a'' |
* Read speeds: RAID0 160MB/s, RAID5/local 140MB/s | * Read speeds: RAID0 160MB/s, RAID5/local 140MB/s |
* Increased number of NFS threads from 8 to 32 (link [[http://tldp.org/HOWTO/NFS-HOWTO/performance.html|1]][[https://lists.ubuntu.com/archives/edubuntu-users/2007-September/002213.html|2]]) | * Increased number of NFS threads from 8 to 32 (link [[http://tldp.org/HOWTO/NFS-HOWTO/performance.html|1]][[https://lists.ubuntu.com/archives/edubuntu-users/2007-September/002213.html|2]]) |
| * VNC/ETC |
| * RealVNC server |
| * FreeNX (yum install nx freenx ; http://wiki.centos.org/HowTos/FreeNX) |
* GRID ENGINE | * GRID ENGINE |
* Some HOWTO links: [[http://biowiki.org/HowToUseSunGridEngine|1]], [[http://gridengine.sunsource.net/howto/howto.htm|2]], [[http://gridengine.info/2008/01/20/understanding-queue-error-state-e| removing state 'E']] | * Some HOWTO links: [[http://biowiki.org/HowToUseSunGridEngine|1]], [[http://gridengine.sunsource.net/howto/howto.htm|2]], [[http://gridengine.info/2008/01/20/understanding-queue-error-state-e| removing state 'E']] |
* ppgplot: ''ppgplot_libraries = ["cpgplot", "pgplot", "X11", "png", "m", "g2c"]'' + ''ppgplot_library_dirs = ["/usr/X11R6/lib"]'' | * ppgplot: ''ppgplot_libraries = ["cpgplot", "pgplot", "X11", "png", "m", "g2c"]'' + ''ppgplot_library_dirs = ["/usr/X11R6/lib"]'' |
* Set UseDNS to NO in /etc/ssh/sshd_config for master+nodes, after very slow logins after IP changes to front node | * Set UseDNS to NO in /etc/ssh/sshd_config for master+nodes, after very slow logins after IP changes to front node |
* Which turned out to be caused by outdated DNS server in named.conf and resolv.conf: ''rocks set var Kickstart PublicDNSServers 195.169.63.49'' | * Which turned out to be caused by outdated DNS server in named.conf and resolv.conf: ''rocks set var Kickstart PublicDNSServers 195.169.63.49'' in python/setup.py |
in python/setup.py | |
* NODE PACKAGES | * NODE PACKAGES |
* cd /export/rocks/install/contrib/5.1/x86_64/RPMS | * cd /export/rocks/install/contrib/5.1/x86_64/RPMS |
* LIGHTPATH | * LIGHTPATH |
* installed eth2 on c4. edited ''/etc/sysconfig/network-scripts/ifcfg-eth2'' to static ''IPADDR=192.87.39.129'', ''NETMASK=255.255.255.248'', ''MTU=9000'' | * installed eth2 on c4. edited ''/etc/sysconfig/network-scripts/ifcfg-eth2'' to static ''IPADDR=192.87.39.129'', ''NETMASK=255.255.255.248'', ''MTU=9000'' |
* On command line, added ''route add 145.100.26.152 gw 192.87.39.130'' for Huygens | * On command line, added ''/sbin/route add 145.100.26.152 gw 192.87.39.130'' for Huygens |
| |
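The three commands noted under front-end for bringing the software RAID0 back after a reboot (assemble, fsck, mount) can be chained so that a failing step stops the sequence. This is a sketch only: the `run` wrapper, the `bring_up_md0` name and the `DRY_RUN` switch are illustrative additions, while the commands and paths are the ones from the notes. By default it prints the commands rather than executing them.

```shell
#!/bin/sh
# Bring the software RAID0 scratch array back after a reboot.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bring_up_md0() {
    # Each step runs only if the previous one succeeded. fsck also returns
    # non-zero when it corrected errors, so the chain stops for inspection.
    run /sbin/mdadm -Ac partitions -m 0 /dev/md0 &&
    run /sbin/fsck.ext4 /dev/md0 &&
    run mount -t ext4 /dev/md0 /scratch2
}

bring_up_md0
```

Running it as root with `DRY_RUN=0` executes the commands for real.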
| 2013 BOOT |
| Running CentOS5.9 now |
| Kernel 2.6.18-238.9.1.el5 got beyond RAID wait after 5 minutes. |
| 348.18. Bad IRQ, kernel panic (as was original problem) |
| .16. Wait=6min, Bad IRQ |
| .12. Wait=6min, Bad IRQ |
| .6. Wait=6min, Bad IRQ |
| (e)dit, remove quiet |
| |
TODO | TODO |
* Read OAK topics and redo | * Read OAK topics and redo |
| |
| |