dragnet:system_software

All DRAGNET nodes were installed by Mike Sipior (ASTRON) with CentOS 7 using cobbler and ansible. The cobbler and ansible settings are available in a git repo on the dragnet headnode at ''/var/lib/git/dragnet.git/''.
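
For example, to browse those settings, the repo can be cloned locally on the head node (the target directory name below is arbitrary):
  git clone /var/lib/git/dragnet.git ~/dragnet-settings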

Most changes have been tracked here and should ideally go into the ansible/cobbler settings git repo. However, that is unlikely to happen (time is better spent on other tasks), so the rough notes are kept on this page in case we ever have to reinstall. (Obviously, the up-to-date and completeness guarantees of this list are low, but it goes a long way.)
  
Many system software packages have been installed and settings changed, CentOS was updated to 7.2, and software under /opt (plus some of /usr/local) was installed by Alexander, while Vlad and Cees installed all pulsar user tools under /usr/local (NFS).
  
===== LOFAR Builds =====
LOFAR software builds on DRAGNET can be built, deployed, and selected/activated using the scripts in the LOFAR source repository, viewable under https://svn.astron.nl/viewvc/LOFAR/trunk/SubSystems/Dragnet/scripts/
  * LOFAR-Dragnet-deploy.sh (takes ~15 mins)
  * LOFAR-Dragnet-activate.sh (takes 10 s)
Normally, these scripts are kicked off via [[https://support.astron.nl/jenkins/ | Jenkins]]. (See my slides ''DRAGNET-Observatory operations by Alexander (3 Jul 2017)'', available from the [[dragnet:start | DRAGNET wiki start page]], for which Jenkins buttons to press. If you don't have access to Jenkins, ask Arno (LOFAR software release manager).)\\
As described in the scripts, they can also be run from the command line //as user lofarbuild//. You then have to look up the release name to be used manually; see the sketch below.\\
Regardless of which branch or tag you build via Jenkins, the Jenkins jobs //always// svn export from the trunk!\\
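
A rough sketch of such a manual (non-Jenkins) run as user ''lofarbuild'' on the head node. The release name argument and the checkout path below are only illustrations; look up the real release name as described above and check the script headers for the exact usage:
  # as lofarbuild on the dragnet head node; paths and release name are example placeholders
  cd /path/to/SubSystems/Dragnet/scripts            # wherever the scripts live in your checkout
  ./LOFAR-Dragnet-deploy.sh LOFAR-Release-2_21_1    # build + deploy, ~15 mins
  ./LOFAR-Dragnet-activate.sh LOFAR-Release-2_21_1  # switch the active installation, ~10 s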

The LOFAR package built on DRAGNET is named ''Dragnet'', as can be seen from the ''cmake'' command in ''LOFAR-Dragnet-deploy.sh''. It is simply a meta-package, described in the package's [[https://svn.astron.nl/viewvc/LOFAR/trunk/SubSystems/Dragnet/CMakeLists.txt?view=markup | CMakeLists.txt]].
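
As an illustration only (the deploy script is authoritative), selecting that package boils down to a ''cmake'' invocation along these lines; the install prefix and source path below are placeholders, not the actual values used:
  # hedged sketch; see LOFAR-Dragnet-deploy.sh for the real cmake command line
  cmake -DBUILD_PACKAGES=Dragnet -DCMAKE_INSTALL_PREFIX=<install-prefix> <lofar-source-dir>
  make -j 8 install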

Any LOFAR build on DRAGNET has many dependencies; their paths are listed in hostname-matching files under https://svn.astron.nl/viewvc/LOFAR/trunk/CMake/variants/

We only have ''variants.dragnet'' (auto-selected on our headnode) and a ''variants.dragproc'' symlink. //This means that ''cmake'' runs on other nodes will fail, unless you manually add another symlink locally!// (The reason we did not add more is that such builds are slow anyway, unless done from/to local disks. Prefer building on the head node (or ''dragproc'').)
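
For example, to build on ''drg23'' anyway, something like this would be needed first in your local source tree (''drg23'' and the source path are just examples):
  cd <your-lofar-source>/CMake/variants    # directory corresponding to the URL above
  ln -s variants.dragnet variants.drg23    # let cmake on drg23 find a variants file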

Fixing LOFAR builds is thus often a matter of small commits to the config files and/or dependent software upgrades on DRAGNET, rather than fixing the deploy script. One deploy script caveat is that it assumes all DRAGNET nodes are working...

===== Other Packages installed by Alexander =====
Many packages installed by Alexander on DRAGNET have a ''/home/alexander/pkg/PKGNAME-install.txt'' with commands close to a shell script used to config/build/install the package on DRAGNET. If you need to upgrade/reinstall, just copy-paste each command line by line with your brain engaged.\\
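
To see which packages have such notes:
  ls /home/alexander/pkg/*-install.txt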

===== QPID Message Broker Config for Operations =====
To keep this rather complex config beast as low profile as possible, qpid on DRAGNET is only set up to facilitate observation feedback flowing back to Observatory systems (MoM). This is inevitable (COBALT expects the local qpid queues), although the failure impact is low: only status in MoM.\\

To use [[operator:resourcetool | resourcetool]], qpid is also needed, but by always specifying a broker host on the command line, we can avoid tracking RO qpid config just for that. It also makes operations vs test systems explicit (ccu001 vs ccu199).

QPID is going to be used more and more, e.g. also for user ingest.

Reinoud (and Jan David) are the people to debug qpid trouble with.

=== QPID Config for Feedback ===
On DRAGNET, I created 3 queues on each node (twice, once for operations and once for the test system), and routes from all nodes to the head node, and from the head node to ccu001 (operations) and ccu199 (test).\\
See **/home/amesfoort/build_qpid_queues-dragnet.sh**, although typically I use it as notes instead of running it willy-nilly... RO software also has scripts to which I added our queues and routes, in case everything would need to be reset.\\
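
A hedged sketch of the kind of commands involved (queue and host names are taken from the overviews below; check the actual script and the ''qpid-config''/''qpid-route'' help before running anything):
  # on a drg node: create one of the durable feedback queues locally
  qpid-config add queue lofar.task.feedback.dataproducts --durable
  # forwarding from node to head node is done with qpid-route queue routes, roughly:
  #   qpid-route queue add <head-node-broker> <this-node-broker> <exchange> lofar.task.feedback.dataproducts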

Overview on a node (1st queue with pseudo-random name is from the viewing operation itself):
  [amesfoort@dragnet ~]$ qpid-stat -q
  Queues
 +    queue                                     dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind 
 +    ========================================================================================================================= 
 +    a1fe3b70-1595-4e4d-9313-8d1706861ba0:0.0              Y        0          0            0        0             2 
 +    lofar.task.feedback.dataproducts          Y                      0  11.4k  11.4k      0   39.1m    39.1m        1     1 
 +    lofar.task.feedback.processing            Y                      0          0            0        0             1 
 +    lofar.task.feedback.state                                      0          0            0        0             1 
 +    test.lofar.task.feedback.dataproducts                          0    61     61          185k     185k        1     1 
 +    test.lofar.task.feedback.processing                            0          0            0        0             1 
 +    test.lofar.task.feedback.state            Y                      0          0            0        0             1 

Overview of all routes //to// the ''dragnet'' head node (6 per node):
  [amesfoort@dragnet ~]$ qpid-route route list
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg02.control.lofar:5672   
 +  [...] 
 +  dragnet:5672 drg22.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   

===== System Config Changes =====
On top of the git repo with the ansible/cobbler settings:

==== Crontab ====
=== casacore measures tables ===
On host ''dragnet'' (the script downloads once, then applies the update on all nodes), run the command below every Monday at 04:00.\\
This auto-updates the casacore measures tables with info on observatories, solar bodies, leap seconds, int'l earth rotation (IERS) coefficients, etc.
  [amesfoort@dragnet ~]$ sudo crontab -u lofarsys -l
  0 4 * * 1 /opt/IERS/cron-update-IERS-DRAGNET.sh 2> /home/lofarsys/lofar/var/log/IERS/cron-update-IERS-DRAGNET.log
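
If an update is needed outside the weekly schedule, the same script can be run by hand (as ''lofarsys'', like the cron job does), e.g.:
  [amesfoort@dragnet ~]$ sudo -u lofarsys /opt/IERS/cron-update-IERS-DRAGNET.sh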

=== resourcetool ===
On any host but ''dragnet'' (it has no RADB resources), run the [[operator:resourcetool|resourcetool]] command with the ''-E'' and possibly ''-U'' option(s) every 20 mins, starting 1 min past the hour.\\
This auto-updates storage claim end times in the Observatory's RADB; we manage disk space ourselves, so without this, Observatory systems will eventually think our disks are full and scheduling observations becomes impossible. (The tool also has some other useful capabilities.)
  [amesfoort@any_but_dragnet ~]$ sudo crontab -u lofarsys -l
  1,21,41 * * * * source /opt/lofar/lofarinit.sh; LOFARENV=PRODUCTION /opt/lofar/bin/resourcetool --broker=scu001.control.lofar --end-past-tasks-storage-claims > /home/lofarsys/lofar/var/log/resourcetool/cron-update-resourcetool-$HOSTNAME.log 2>&1
  
==== /etc ====
Apply ''/home/amesfoort/etc/*'' to ''/etc/''.
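
A cautious sketch of doing that; do a dry run first and review what would change (this assumes plain file copies are what is wanted):
  sudo rsync -av --dry-run /home/amesfoort/etc/ /etc/   # show what would be copied/changed
  sudo rsync -av /home/amesfoort/etc/ /etc/             # actually apply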
  
==== Other ====
<file>
newgrp dragnet
[...]
nethogs
erfa-devel
armadillo-devel
python-astropy
python-jinja2  # for the FACTOR imaging pipeline module
[...]
install pkgs from ~/pkg such as log4cplus, ...
  
add changed /etc/modulefiles/* to ansible
  
/etc/security/limits.conf:
[...]
sudo systemctl restart qpidd
(& check if systemctl enable qpidd (and start qpidd) are indeed in ansible)
  
added routing table entries for drg*, dragproc in ansible
  
-----
for lustre mount cep4 (drg nodes only (need ib atm), further install by hand atm (need rpm rebuild from src rpm)). On all drgXX nodes:
# create /etc/modprobe.d/lnet.conf with:
options lnet networks=o2ib(ib0)

# create/adjust /etc/modprobe.d/ko2iblnd.conf with:
# comment out any 'alias' and 'options' lines other than the next (which MUST match the settings on the Lustre MGS (and thus all other clients as well)):
options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=2048 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
# optional:
install ko2iblnd /usr/sbin/ko2iblnd-probe
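
# hedged check after (re)loading the modules, to verify the settings took effect:
#   lctl list_nids                                    # should show this node's <ib0-ip>@o2ib NID
#   cat /sys/module/ko2iblnd/parameters/peer_credits  # should print 128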

# create mount point as root:
mkdir -p /cep4data
  
# append to /etc/fstab
meta01.cep4.infiniband.lofar@o2ib:meta02.cep4.infiniband.lofar@o2ib:/cep4-fs /cep4data lustre defaults,ro,flock,noauto 0 0
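
# hedged: with 'noauto' in fstab, the (read-only) mount is then done by hand, roughly:
#   modprobe lustre     # loads the client modules (pulls in lnet/ko2iblnd with the options above)
#   mount /cep4data     # uses the fstab entry above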
  
-----
</file>
  