drop_setup

Differences

This shows you the differences between two versions of the page.

drop_setup [2011/03/28 12:29] leeuwen
drop_setup [2013/10/22 13:49] leeuwen
Line 1: Line 1:
 CMDS CMDS
   * [root] export USER=<username>; useradd ${USER} ; passwd ${USER} ; /usr/sbin/setquota -u ${USER} 20000000 30000000 0 0 /export ; # usermod -G vault ${USER}   * [root] export USER=<username>; useradd ${USER} ; passwd ${USER} ; /usr/sbin/setquota -u ${USER} 20000000 30000000 0 0 /export ; # usermod -G vault ${USER}
 +  * Force an update using the "411" line below, or with "rocks sync users" (but that messes up the home dirs)
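
A minimal sketch of the account-creation one-liner above, written as a function that only prints the commands so they can be reviewed before running them as root. ''NEWUSER'' and the example name ''alice'' are hypothetical; the sketch uses its own variable rather than overriding the shell's ''$USER''. The quota figures are the ones from the note (setquota block limits are in 1 KiB units, so roughly 20 GB soft and 30 GB hard, with no inode limit).

```shell
#!/bin/sh
# Print (do not run) the account-creation commands for one user.
# NEWUSER is a hypothetical variable name chosen for this sketch.
new_user_cmds() {
    NEWUSER=$1
    # Reject empty names and names with characters outside a-z, 0-9, _ and -.
    case $NEWUSER in
        ''|*[!a-z0-9_-]*) echo "invalid username: '$NEWUSER'" >&2; return 1 ;;
    esac
    echo "useradd $NEWUSER"
    echo "passwd $NEWUSER"
    # setquota -u USER block-soft block-hard inode-soft inode-hard FILESYSTEM
    # Block limits are in 1 KiB units; 0 means no inode limit.
    echo "/usr/sbin/setquota -u $NEWUSER 20000000 30000000 0 0 /export"
}

# Example with a hypothetical username.
new_user_cmds alice
```

Piping the output to ''sh'' as root would run the commands; the 411 update mentioned above still has to follow.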
  
 TROUBLESHOOTING TROUBLESHOOTING
Line 6: Line 7:
       * Remove 'old' boot image on master, for this node: **/tftpboot/pxelinux/pxelinux.cfg/C0A800F?** where ? = EDCBA9 for nodes 012345       * Remove 'old' boot image on master, for this node: **/tftpboot/pxelinux/pxelinux.cfg/C0A800F?** where ? = EDCBA9 for nodes 012345
       * Manually reboot node (force PXE boot with F12 if needed)       * Manually reboot node (force PXE boot with F12 if needed)
 +      * Prompts for "language"? Change master root dir permissions: ''chmod a+rx ~root''
   * If network is slow (dmesg | grep eth says "Link is up at 10 Mbps, half duplex") force with ethtool:   * If network is slow (dmesg | grep eth says "Link is up at 10 Mbps, half duplex") force with ethtool:
-      * ''ethtool -s eth1 speed 1000''+      * ''/sbin/ethtool -s eth1 speed 1000'' (if that fails, toggle autonegotiation: ''/sbin/ethtool -s eth1 autoneg off'', then ''/sbin/ethtool -s eth1 autoneg on'')
       * for eth0 ''ifdown eth0; ifup eth0''       * for eth0 ''ifdown eth0; ifup eth0''
 +  * Keys: [ctrl-ctrl] = bring up display server screen ; [ctrl-alt-f2/f3/f4] = emergency sh, dmesg, etc.
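
The ''C0A800F?'' names in the boot-image step above are the node's IPv4 address written as eight uppercase hex digits, which is how pxelinux names per-host config files. A small sketch of that mapping follows. The rule that compute node N has address 192.168.0.(254 - N) is an assumption inferred from the E..9 suffixes for nodes 0..5, not something the notes state, and both function names are hypothetical.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to the 8-digit uppercase hex name
# that pxelinux looks up under pxelinux.cfg/.
ip_to_pxe_hex() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# ASSUMPTION: compute node N has private address 192.168.0.(254 - N),
# inferred from the C0A800FE..C0A800F9 names in the note above.
node_pxe_file() {
    echo "/tftpboot/pxelinux/pxelinux.cfg/$(ip_to_pxe_hex "192.168.0.$((254 - $1))")"
}

# List the boot-config file for each of the six compute nodes.
for n in 0 1 2 3 4 5; do
    echo "compute-0-$n -> $(node_pxe_file "$n")"
done
```

Checking the printed path against the node's real address before removing anything is the safer order.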
 DONE DONE
   * ROCKS user guide: http://www.rocksclusters.org/roll-documentation/base/5.1/   * ROCKS user guide: http://www.rocksclusters.org/roll-documentation/base/5.1/
 +  * BACKUP
 +    * Run ''~leeuwen/bin/backup-vault-to-sara'' (tar + sftp to /archive/joerivl/drop/)
   * Changed aliases c0 .. c5 ; drop ; f for frontend   * Changed aliases c0 .. c5 ; drop ; f for frontend
      * ''[root@drop ~]# for i in `seq 0 5`; do rocks add host alias compute-0-$i c$i; done''      * ''[root@drop ~]# for i in `seq 0 5`; do rocks add host alias compute-0-$i c$i; done''
Line 19: Line 23:
      * per http://www.mulder.franken.de/workstuff/ (changed to incorporate "Bay Numbers" vs HW drives)      * per http://www.mulder.franken.de/workstuff/ (changed to incorporate "Bay Numbers" vs HW drives)
      * ''/etc/cron.hourly/raidcheck'' + ''/root/bin/check-hp-raid-status.pl''       * ''/etc/cron.hourly/raidcheck'' + ''/root/bin/check-hp-raid-status.pl'' 
-     * [TODO] add /root/bin/backup-vault-to-sara to cron.weekly (after initial tunneled scp to archive.sara.nl) 
-       * cd /export/; scp -pr home vault joerivl@archive:/archive/joerivl/drop/ 
      * front-end       * front-end 
        * e2label /dev/sda5 /scratch  ..etc        * e2label /dev/sda5 /scratch  ..etc
Line 26: Line 28:
        * turned off automount ; /etc/auto.master now empty ; /etc/exports now /data/ and /exports/        * turned off automount ; /etc/auto.master now empty ; /etc/exports now /data/ and /exports/
        * using fdisk "type fd" and mdadm --create etc, added 2x750 software RAID0        * using fdisk "type fd" and mdadm --create etc, added 2x750 software RAID0
 +         * /sbin/mdadm -Ac partitions -m 0 /dev/md0 (to bring up after reboot)
 +         * /sbin/fsck.ext4 /dev/md0
 +          mount -t ext4 /dev/md0 /scratch2
    * RAID INITIAL BUILD with '' /usr/sbin/hpacucli'' ([[http://h10032.www1.hp.com/ctg/Manual/c00709035.pdf|Guide, p44+]])    * RAID INITIAL BUILD with '' /usr/sbin/hpacucli'' ([[http://h10032.www1.hp.com/ctg/Manual/c00709035.pdf|Guide, p44+]])
      * ''ctrl slot=1 pd all show''      * ''ctrl slot=1 pd all show''
Line 35: Line 40:
        * ''mkpartfs primary ext3 0 -0''        * ''mkpartfs primary ext3 0 -0''
    * RAID0 REBUILD    * RAID0 REBUILD
-     * Blink LED, Replace disk, re-enable logical drive ([[http://www.datadisk.co.uk/html_docs/redhat/hpacucli.htm|Cheat sheet]])+     * Find processes using the device (''/sbin/fuser -m /dev/cciss/c0d1p1''; add ''-k'' to kill them), then unmount
 +     * Blink LED (''ctrl slot=1 pd 1E:1:10 modify led=on''), replace disk
 +     * Re-enable logical drive (''ctrl slot=1 ld 2 modify reenable forced''), from the [[http://www.datadisk.co.uk/html_docs/redhat/hpacucli.htm|Cheat sheet]]
 +     * Make new file system, ''/sbin/mkfs.ext3 -L /data /dev/cciss/c0d1p1''
    * RAID PHYS CONFIG    * RAID PHYS CONFIG
      * Drive 1 = left, top; Count seems to be 1-3 = left column, top to bottom. RAID0=two lowest in second-to-right column + entire right column      * Drive 1 = left, top; Count seems to be 1-3 = left column, top to bottom. RAID0=two lowest in second-to-right column + entire right column
    * CROSSMOUNTS     * CROSSMOUNTS 
-     * removed automounter. changed ''/etc/exports'' and node script to hard mount. Homedirs are /export/home/+     * removed automounter. changed ''/etc/exports'' and node script to hard mount. Homedirs are /export/home/ (make sure this is correct in /etc/passwd)
      * ''make -C /var/411 clean; make -C /var/411; make -C /var/411 force; cluster-fork 411get -''''-all''      * ''make -C /var/411 clean; make -C /var/411; make -C /var/411 force; cluster-fork 411get -''''-all''
      * ''/etc/rc.d/init.d/nfs restart; /etc/rc.d/init.d/nfs restart; make -C /var/411;  service autofs reload;  exportfs -a''      * ''/etc/rc.d/init.d/nfs restart; /etc/rc.d/init.d/nfs restart; make -C /var/411;  service autofs reload;  exportfs -a''
Line 48: Line 56:
      * Read speeds: RAID0 160MB/s, RAID5/local 140MB/s      * Read speeds: RAID0 160MB/s, RAID5/local 140MB/s
      * Increased number of NFS threads from 8 to 32 (link [[http://tldp.org/HOWTO/NFS-HOWTO/performance.html|1]][[https://lists.ubuntu.com/archives/edubuntu-users/2007-September/002213.html|2]])      * Increased number of NFS threads from 8 to 32 (link [[http://tldp.org/HOWTO/NFS-HOWTO/performance.html|1]][[https://lists.ubuntu.com/archives/edubuntu-users/2007-September/002213.html|2]])
 +   * VNC/ETC
 +     * RealVNC server
 +     * FreeNX (yum install nx freenx ; http://wiki.centos.org/HowTos/FreeNX)
    * GRID ENGINE    * GRID ENGINE
      * Some HOWTO links: [[http://biowiki.org/HowToUseSunGridEngine|1]], [[http://gridengine.sunsource.net/howto/howto.htm|2]], [[http://gridengine.info/2008/01/20/understanding-queue-error-state-e| removing state 'E']]      * Some HOWTO links: [[http://biowiki.org/HowToUseSunGridEngine|1]], [[http://gridengine.sunsource.net/howto/howto.htm|2]], [[http://gridengine.info/2008/01/20/understanding-queue-error-state-e| removing state 'E']]
Line 75: Line 86:
        * ppgplot: ''ppgplot_libraries = ["cpgplot", "pgplot", "X11", "png", "m", "g2c"]'' +  ''ppgplot_library_dirs = ["/usr/X11R6/lib"]''        * ppgplot: ''ppgplot_libraries = ["cpgplot", "pgplot", "X11", "png", "m", "g2c"]'' +  ''ppgplot_library_dirs = ["/usr/X11R6/lib"]''
      * Set UseDNS to NO in /etc/ssh/sshd_config for master+nodes, after very slow logins after IP changes to front node      * Set UseDNS to NO in /etc/ssh/sshd_config for master+nodes, after very slow logins after IP changes to front node
-       * Which turned out to be caused by outdated DNS server in named.conf and resolv.conf: ''rocks set var Kickstart PublicDNSServers 195.169.63.49'' +       * Which turned out to be caused by outdated DNS server in named.conf and resolv.conf: ''rocks set var Kickstart PublicDNSServers 195.169.63.49'' in python/setup.py
- in python/setup.py+
    * NODE PACKAGES    * NODE PACKAGES
      * cd /export/rocks/install/contrib/5.1/x86_64/RPMS      * cd /export/rocks/install/contrib/5.1/x86_64/RPMS
Line 90: Line 100:
    * LIGHTPATH    * LIGHTPATH
      * installed eth2 on c4. edited ''/etc/sysconfig/network-scripts/ifcfg-eth2'' to static ''IPADDR=192.87.39.129'', ''NETMASK=255.255.255.248'', ''MTU=9000''      * installed eth2 on c4. edited ''/etc/sysconfig/network-scripts/ifcfg-eth2'' to static ''IPADDR=192.87.39.129'', ''NETMASK=255.255.255.248'', ''MTU=9000''
-     * On command line, added ''route add 145.100.26.152 gw 192.87.39.130'' for Huygens +     * On command line, added ''/sbin/route add 145.100.26.152 gw 192.87.39.130'' for Huygens 
-     + 
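
The LIGHTPATH settings above could be written out as files along these lines. This is a sketch, not the configuration actually in use on c4: ''IPADDR'', ''NETMASK'' and ''MTU'' come from the notes, while ''DEVICE'', ''BOOTPROTO'' and ''ONBOOT'' are assumed values for a static interface. The ''route-eth2'' file is likewise an assumption, as one way to make the Huygens route survive a reboot; the notes only record adding it by hand. The function name is hypothetical.

```shell
#!/bin/sh
# Write the eth2 settings into DIR (default: current directory) so they
# can be reviewed before copying to /etc/sysconfig/network-scripts/ on c4.
write_lightpath_cfg() {
    dir=${1:-.}
    # IPADDR, NETMASK and MTU are from the notes; DEVICE, BOOTPROTO and
    # ONBOOT are ASSUMED values for a static interface.
    cat > "$dir/ifcfg-eth2" <<EOF
DEVICE=eth2
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.87.39.129
NETMASK=255.255.255.248
MTU=9000
EOF
    # ASSUMPTION: a route-eth2 file for the host route to Huygens;
    # the notes only record a manual /sbin/route add.
    echo "145.100.26.152 via 192.87.39.130" > "$dir/route-eth2"
}
```

Calling ''write_lightpath_cfg /tmp/lightpath'' (any existing directory) produces both files for inspection. The gateway 192.87.39.130 is in the same /29 as the interface address, which is why a directly attached host route works.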
 +2013 BOOT 
 + Kernel 2.6.18-238.9.1.el5 got beyond RAID wait after 5 minutes. 
 +               348.18.    Bad IRQ, kernel panic (as was original problem) 
 +                  .16.    Wait=6min, Bad IRQ  
 +                  .12.    Wait=6min, Bad IRQ  
 +                  .6.     Wait=6min, Bad IRQ  
+At the boot menu: (e)dit the kernel line and remove ''quiet'' to see the boot messages
 + 
 TODO TODO
   * Read OAK topics and redo   * Read OAK topics and redo
  
  
drop_setup.txt · Last modified: 2013/10/22 14:04 by leeuwen