| drop_setup [2011/03/29 08:02] – leeuwen | drop_setup [2013/10/22 14:04] (current) – leeuwen |
|---|
| CMDS | CMDS |
| * [root] export USER=<username>; useradd ${USER} ; passwd ${USER} ; /usr/sbin/setquota -u ${USER} 20000000 30000000 0 0 /export ; # usermod -G vault ${USER} | * [root] export USER=<username>; useradd ${USER} ; passwd ${USER} ; /usr/sbin/setquota -u ${USER} 20000000 30000000 0 0 /export ; # usermod -G vault ${USER} |
| | * force update using "411" line below / or "rocks sync users", but that messes up home dirs |
| |
| TROUBLESHOOTING | TROUBLESHOOTING |
| * Remove 'old' boot image on master, for this node: **/tftpboot/pxelinux/pxelinux.cfg/C0A800F?** where ? = EDCBA9 for nodes 012345 | * Remove 'old' boot image on master, for this node: **/tftpboot/pxelinux/pxelinux.cfg/C0A800F?** where ? = EDCBA9 for nodes 012345 |
| * Manually reboot node (force PXE boot with F12 if needed) | * Manually reboot node (force PXE boot with F12 if needed) |
| | * Prompts for "language"? Change master root dir permissions: ''chmod a+rx ~root'' |
| * If network is slow (dmesg | grep eth says "Link is up at 10 Mbps, half duplex") force with ethtool: | * If network is slow (dmesg | grep eth says "Link is up at 10 Mbps, half duplex") force with ethtool: |
| * ''ethtool -s eth1 speed 1000'' | * ''/sbin/ethtool -s eth1 speed 1000'' (if it fails, run ''/sbin/ethtool -s eth1 autoneg off'', then ''/sbin/ethtool -s eth1 autoneg on'') |
| * for eth0 ''ifdown eth0; ifup eth0'' | * for eth0 ''ifdown eth0; ifup eth0'' |
| | * Keys: [ctrl-ctrl] = bring up display server screen ; [ctrl-alt-f2/f3/f4] = emergency sh, dmesg, etc. |
| DONE | DONE |
| * ROCKS user guide: http://www.rocksclusters.org/roll-documentation/base/5.1/ | * ROCKS user guide: http://www.rocksclusters.org/roll-documentation/base/5.1/ |
| | * BACKUP |
| | * Run ''~leeuwen/bin/backup-vault-to-sara'' (tar + sftp to /archive/joerivl/drop/) |
| * Changed aliases c0 .. c5 ; drop ; f for frontend | * Changed aliases c0 .. c5 ; drop ; f for frontend |
| * ''[root@drop ~]# for i in `seq 0 5`; do rocks add host alias compute-0-$i c$i; done'' | * ''[root@drop ~]# for i in `seq 0 5`; do rocks add host alias compute-0-$i c$i; done'' |
| * per http://www.mulder.franken.de/workstuff/ (changed to incorporate "Bay Numbers" vs HW drives) | * per http://www.mulder.franken.de/workstuff/ (changed to incorporate "Bay Numbers" vs HW drives) |
| * ''/etc/cron.hourly/raidcheck'' + ''/root/bin/check-hp-raid-status.pl'' | * ''/etc/cron.hourly/raidcheck'' + ''/root/bin/check-hp-raid-status.pl'' |
| * [TODO] add /root/bin/backup-vault-to-sara to cron.weekly (after initial tunneled scp to archive.sara.nl) | |
| * cd /export/; scp -pr home vault joerivl@archive:/archive/joerivl/drop/ | |
| * front-end | * front-end |
| * e2label /dev/sda5 /scratch ..etc | * e2label /dev/sda5 /scratch ..etc |
| * turned off automount ; /etc/auto.master now empty ; /etc/exports now /data/ and /exports/ | * turned off automount ; /etc/auto.master now empty ; /etc/exports now /data/ and /exports/ |
| * using fdisk "type fd" and mdadm --create etc, added 2x750 software RAID0 | * using fdisk "type fd" and mdadm --create etc, added 2x750 software RAID0 |
| | * /sbin/mdadm -Ac partitions -m 0 /dev/md0 (to bring up after reboot) |
| | * /sbin/fsck.ext4 /dev/md0 |
| | * mount -t ext4 /dev/md0 /scratch2 |
| * RAID INITIAL BUILD with '' /usr/sbin/hpacucli'' ([[http://h10032.www1.hp.com/ctg/Manual/c00709035.pdf|Guide, p44+]]) | * RAID INITIAL BUILD with '' /usr/sbin/hpacucli'' ([[http://h10032.www1.hp.com/ctg/Manual/c00709035.pdf|Guide, p44+]]) |
| * ''ctrl slot=1 pd all show'' | * ''ctrl slot=1 pd all show'' |
| * Drive 1 = left, top; Count seems to be 1-3 = left column, top to bottom. RAID0=two lowest in second-to-right column + entire right column | * Drive 1 = left, top; Count seems to be 1-3 = left column, top to bottom. RAID0=two lowest in second-to-right column + entire right column |
| * CROSSMOUNTS | * CROSSMOUNTS |
| * removed automounter. changed ''/etc/exports'' and node script to hard mount. Homedirs are /export/home/ | * removed automounter. changed ''/etc/exports'' and node script to hard mount. Homedirs are /export/home/ (make sure this is correct in /etc/passwd) |
| * ''make -C /var/411 clean; make -C /var/411; make -C /var/411 force; cluster-fork 411get -''''-all'' | * ''make -C /var/411 clean; make -C /var/411; make -C /var/411 force; cluster-fork 411get -''''-all'' |
| * ''/etc/rc.d/init.d/nfs restart; /etc/rc.d/init.d/nfs restart; make -C /var/411; service autofs reload; exportfs -a'' | * ''/etc/rc.d/init.d/nfs restart; /etc/rc.d/init.d/nfs restart; make -C /var/411; service autofs reload; exportfs -a'' |
| * Read speeds: RAID0 160MB/s, RAID5/local 140MB/s | * Read speeds: RAID0 160MB/s, RAID5/local 140MB/s |
| * Increased number of NFS threads from 8 to 32 (link [[http://tldp.org/HOWTO/NFS-HOWTO/performance.html|1]][[https://lists.ubuntu.com/archives/edubuntu-users/2007-September/002213.html|2]]) | * Increased number of NFS threads from 8 to 32 (link [[http://tldp.org/HOWTO/NFS-HOWTO/performance.html|1]][[https://lists.ubuntu.com/archives/edubuntu-users/2007-September/002213.html|2]]) |
| | * VNC/ETC |
| | * RealVNC server |
| | * FreeNX (yum install nx freenx ; http://wiki.centos.org/HowTos/FreeNX) |
| * GRID ENGINE | * GRID ENGINE |
| * Some HOWTO links: [[http://biowiki.org/HowToUseSunGridEngine|1]], [[http://gridengine.sunsource.net/howto/howto.htm|2]], [[http://gridengine.info/2008/01/20/understanding-queue-error-state-e| removing state 'E']] | * Some HOWTO links: [[http://biowiki.org/HowToUseSunGridEngine|1]], [[http://gridengine.sunsource.net/howto/howto.htm|2]], [[http://gridengine.info/2008/01/20/understanding-queue-error-state-e| removing state 'E']] |
| * ppgplot: ''ppgplot_libraries = ["cpgplot", "pgplot", "X11", "png", "m", "g2c"]'' + ''ppgplot_library_dirs = ["/usr/X11R6/lib"]'' | * ppgplot: ''ppgplot_libraries = ["cpgplot", "pgplot", "X11", "png", "m", "g2c"]'' + ''ppgplot_library_dirs = ["/usr/X11R6/lib"]'' |
| * Set UseDNS to NO in /etc/ssh/sshd_config for master+nodes, after very slow logins after IP changes to front node | * Set UseDNS to NO in /etc/ssh/sshd_config for master+nodes, after very slow logins after IP changes to front node |
| * Which turned out to be caused by outdated DNS server in named.conf and resolv.conf: ''rocks set var Kickstart PublicDNSServers 195.169.63.49'' | * Which turned out to be caused by outdated DNS server in named.conf and resolv.conf: ''rocks set var Kickstart PublicDNSServers 195.169.63.49'' in python/setup.py |
| in python/setup.py | |
| * NODE PACKAGES | * NODE PACKAGES |
| * cd /export/rocks/install/contrib/5.1/x86_64/RPMS | * cd /export/rocks/install/contrib/5.1/x86_64/RPMS |
| * LIGHTPATH | * LIGHTPATH |
| * installed eth2 on c4. edited ''/etc/sysconfig/network-scripts/ifcfg-eth2'' to static ''IPADDR=192.87.39.129'', ''NETMASK=255.255.255.248'', ''MTU=9000'' | * installed eth2 on c4. edited ''/etc/sysconfig/network-scripts/ifcfg-eth2'' to static ''IPADDR=192.87.39.129'', ''NETMASK=255.255.255.248'', ''MTU=9000'' |
| * On command line, added ''route add 145.100.26.152 gw 192.87.39.130'' for Huygens | * On command line, added ''/sbin/route add 145.100.26.152 gw 192.87.39.130'' for Huygens |
| | |
| | 2013 BOOT |
| | Running CentOS5.9 now |
| | Kernel 2.6.18-238.9.1.el5 got beyond RAID wait after 5 minutes. |
| | 348.18. Bad IRQ, kernel panic (as was original problem) |
| | .16. Wait=6min, Bad IRQ |
| | .12. Wait=6min, Bad IRQ |
| | .6. Wait=6min, Bad IRQ |
| | (e)dit, remove quiet |
| | |
| TODO | TODO |
| * Read OAK topics and redo | * Read OAK topics and redo |
| |
| |
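
The TROUBLESHOOTING note about **/tftpboot/pxelinux/pxelinux.cfg/C0A800F?** relies on pxelinux naming per-host config files after the client's IPv4 address written as eight uppercase hex digits. The sketch below derives that name from a dotted-quad address. The helper name ''pxe_hex_name'' is hypothetical, and the node-to-address mapping (node N at 192.168.0.(254 - N)) is an assumption inferred from "? = EDCBA9 for nodes 012345"; check it against the actual host table before deleting anything.

```shell
#!/bin/sh
# Hypothetical helper: print the pxelinux.cfg file name (uppercase hex IPv4)
# for a dotted-quad address. Runs in a subshell so IFS changes do not leak.
pxe_hex_name() (
    IFS=.
    # Split the address on dots into the four positional parameters.
    set -- $1
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
)

# Assumption: compute-0-N has private address 192.168.0.(254 - N).
for node in 0 1 2 3 4 5; do
    name=$(pxe_hex_name "192.168.0.$((254 - node))")
    echo "compute-0-$node: /tftpboot/pxelinux/pxelinux.cfg/$name"
done
```

The loop only prints the candidate paths; removing a stale boot image remains a manual step on the master.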