dragnet:system_software

Differences

This shows you the differences between two versions of the page.

Old revision: dragnet:system_software [2017-06-01 14:18] – created: heading + long item list of system software and settings changes not in ansible/cobbler git repo (amesfoort)
Current revision: dragnet:system_software [2017-08-18 01:04] – [QPID Message Broker Config for Operations] clarify (amesfoort)
Old line 3 / new line 3:
 All DRAGNET nodes were installed by Mike Sipior (ASTRON) with CentOS 7 using cobbler and ansible. The cobbler and ansible settings are available in a git repo on the dragnet headnode at ''/var/lib/git/dragnet.git/''.
  
-Many system software packages have been installed, settings changed, CentOS updated to 7.2, /opt installed (by Alexander), while Vlad and Cees installed most pulsar user tools under /usr/local.
+Most changes have been tracked here and should ideally go into the ansible/cobbler settings git repo. However, that is unlikely to happen (time is better spent on other tasks), so the rough notes are kept here in case we ever have to reinstall. (Obviously, this list is not guaranteed to be up to date or complete, but it goes a long way.)
  
-Apart from /usr/local, most changes have been tracked and should ideally go into the ansible/cobbler settings git repo. However, it is unlikely going to happen (time is better spent on other tasks), so the rough notes are tracked here in case we ever have to reinstall. (Up-to-date and completeness guarantees of the complete list is low.)
+Many system software packages have been installed, settings changed, CentOS updated to 7.2, and /opt (+ some /usr/local) installed (by Alexander), while Vlad and Cees installed all pulsar user tools under /usr/local (NFS).
  
  
-===== System Config Changes (on top of git repo with ansible/cobbler settings) =====
+===== LOFAR Builds =====
 +LOFAR software builds on DRAGNET can be built, deployed, and selected/activated using the scripts in the LOFAR svn repository, viewable under https://svn.astron.nl/viewvc/LOFAR/trunk/SubSystems/Dragnet/scripts/
 +  * LOFAR-Dragnet-deploy.sh (takes ~15 mins) 
 +  * LOFAR-Dragnet-activate.sh (takes 10 s) 
 +Normally, these scripts are kicked off via [[https://support.astron.nl/jenkins/ | Jenkins]]. (See my slides ''DRAGNET-Observatory operations by Alexander (3 Jul 2017)'' available from the [[dragnet:start | DRAGNET wiki start page]] for which Jenkins buttons to press. If you don't have access to Jenkins, ask Arno (LOFAR software release manager).)\\
 +As described in the scripts themselves, they can also be run from the command line //as user lofarbuild//. You then have to look up the release name to use yourself.\\
 +Regardless of which branch or tag you build via Jenkins, the Jenkins jobs //always// svn export from the trunk!\\ 
 + 
 +The LOFAR package built on DRAGNET is named ''Dragnet'', as can be seen from the ''cmake'' command in the ''LOFAR-Dragnet-deploy.sh''. This is simply a meta-package described in the package's [[https://svn.astron.nl/viewvc/LOFAR/trunk/SubSystems/Dragnet/CMakeLists.txt?view=markup | CMakeLists.txt]]. 
 + 
 +Any LOFAR build on DRAGNET has many dependencies, whose paths are listed in per-host variants files (matched by hostname) under https://svn.astron.nl/viewvc/LOFAR/trunk/CMake/variants/
 + 
 +We only have ''variants.dragnet'' (auto-selected on our head node) and a ''variants.dragproc'' symlink. //This means that ''cmake'' runs on other nodes will fail unless you manually add another symlink locally!// (The reason is that such builds are slow anyway unless done from/to local disks. Prefer building on the head node or on ''dragproc''.)
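A minimal sketch of adding such a local symlink in your own LOFAR source checkout. It assumes the variants file is selected by the node's short hostname, as the existing ''variants.dragproc'' symlink suggests. The function name ''add_variants_symlink'' and the checkout path in the usage line are hypothetical. The symlink is meant to stay local and should not be committed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create CMake/variants/variants.<host> -> variants.dragnet
# in a local LOFAR source checkout, so cmake on a non-head node finds a variants file.
# Assumption: the variants file is selected by the (short) hostname.
add_variants_symlink() {
  local srcdir=$1                   # top of the LOFAR source checkout
  local host=${2:-$(hostname -s)}   # defaults to this node's short hostname
  local vdir="$srcdir/CMake/variants"

  if [ ! -f "$vdir/variants.dragnet" ]; then
    echo "no variants.dragnet in $vdir" >&2
    return 1
  fi
  # Leave an existing file or symlink (even a dangling one) for this host alone.
  if [ -e "$vdir/variants.$host" ] || [ -L "$vdir/variants.$host" ]; then
    return 0
  fi
  # Relative link target, so the checkout can be moved without breaking the link.
  ln -s variants.dragnet "$vdir/variants.$host"
}
```

Usage (hypothetical path): run ''add_variants_symlink ~/src/LOFAR'' on e.g. ''drg05'', then run ''cmake'' as usual.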
 + 
 +Fixing LOFAR builds is thus often a matter of small commits to the config files and/or dependent software upgrades on DRAGNET, instead of fixing the deploy script. One deploy script caveat is that it assumes all DRAGNET nodes are working... 
 + 
 + 
 +===== Other Packages installed by Alexander ===== 
 +Many packages installed by Alexander on DRAGNET have a ''/home/alexander/pkg/PKGNAME-install.txt'' file listing the commands (close to a shell script) used to configure, build, and install the package on DRAGNET. If you need to upgrade or reinstall, copy-paste the commands one by one with your brain engaged.\\
 + 
 + 
 +===== QPID Message Broker Config for Operations =====
 +To keep this rather complex config beast as low profile as possible, QPID is set up on DRAGNET only to let observation feedback flow back to Observatory systems (MoM). This is unavoidable (COBALT expects the local qpid queues), although the impact of a failure is low: only the status shown in MoM is affected.\\
 + 
 +To use [[operator:resourcetool | resourcetool]], qpid is also needed, but by always specifying a broker host on the command line, we can avoid tracking RO qpid config just for that. It also makes operations vs test systems explicit (ccu001 vs ccu199). 
 + 
 +QPID is going to be used more and more, e.g. also for user ingest. 
 + 
 +Reinoud (and Jan David) are the people to debug qpid trouble with. 
 + 
 + 
 +=== QPID Config for Feedback === 
 +On DRAGNET, I created 3 queues on each node (twice, once for operations and once for the test system), and routes from all nodes to the head node, and from the head node to ccu001 (operations) and ccu199 (test).\\ 
 +See **/home/amesfoort/build_qpid_queues-dragnet.sh**, although I typically use it as notes instead of running it willy-nilly. RO software also has scripts to which I added our queues and routes, in case everything ever needs to be reset.\\
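To illustrate the "3 queues, twice" layout, here is a dry-run sketch that only prints ''qpid-config'' commands for the queue names visible in the ''qpid-stat'' overview on this page. It does not replace the script above. Durability options and the ''qpid-route'' routes are deliberately left out, and the function name ''print_feedback_queue_cmds'' is hypothetical.

```shell
#!/usr/bin/env bash
# Dry run: print (do not execute) the qpid-config commands that would create the
# three feedback queues, once for operations and once for the test system,
# on one broker. Durability options and routes are intentionally omitted.
print_feedback_queue_cmds() {
  local broker=${1:-localhost}
  local prefix kind
  for prefix in "" "test."; do          # "" = operations, "test." = test system
    for kind in dataproducts processing state; do
      echo "qpid-config -b $broker add queue ${prefix}lofar.task.feedback.$kind"
    done
  done
}
```

For example, ''print_feedback_queue_cmds drg01.control.lofar'' prints six lines, one per queue. Review them against the script before pasting anything.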
 + 
 +Overview on a node (1st queue with pseudo-random name is from the viewing operation itself): 
 +  [amesfoort@dragnet ~]$ qpid-stat -q
 +  Queues 
 +    queue                                     dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind 
 +    ========================================================================================================================= 
 +    a1fe3b70-1595-4e4d-9313-8d1706861ba0:0.0              Y        0          0            0        0             2 
 +    lofar.task.feedback.dataproducts          Y                      0  11.4k  11.4k      0   39.1m    39.1m        1     1 
 +    lofar.task.feedback.processing            Y                      0          0            0        0             1 
 +    lofar.task.feedback.state                                      0          0            0        0             1 
 +    test.lofar.task.feedback.dataproducts                          0    61     61          185k     185k        1     1 
 +    test.lofar.task.feedback.processing                            0          0            0        0             1 
 +    test.lofar.task.feedback.state            Y                      0          0            0        0             1 
 + 
 +Overview of all routes //to// the ''dragnet'' head node (6 per node): 
 +  [amesfoort@dragnet ~]$ qpid-route route list 
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 dragproc.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg01.control.lofar:5672   
 +  dragnet:5672 drg02.control.lofar:5672   
 +  [...] 
 +  dragnet:5672 drg22.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   
 +  dragnet:5672 drg23.control.lofar:5672   
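Because each node should have 6 routes, missing routes are easier to spot by tallying the second column of this listing than by reading through it. The following is a minimal sketch that assumes the output format shown above (local broker first, then remote broker). ''check_route_counts'' is a hypothetical name.

```shell
#!/usr/bin/env bash
# Read `qpid-route route list` output on stdin, count routes per remote broker
# (2nd column) and print every broker whose count differs from the expected
# number (default 6). Exit status is 1 if any broker is off, else 0.
# Lines with fewer than 2 fields (e.g. an elision marker) are ignored.
check_route_counts() {
  local expected=${1:-6}
  awk -v want="$expected" '
    NF >= 2 { n[$2]++ }
    END {
      bad = 0
      for (h in n)
        if (n[h] != want) { print h, n[h]; bad = 1 }
      exit bad
    }'
}
```

''qpid-route route list | check_route_counts 6'' prints nothing and exits 0 when every listed node has 6 routes. A node with no routes at all does not appear in the listing, so it is not reported. Compare against the node list separately if needed.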
 + 
 + 
 +===== System Config Changes ===== 
 +On top of the git repo with ansible/cobbler settings.
 + 
 +==== Crontab ==== 
 +=== casacore measures tables === 
 +On host ''dragnet'', run the command every Monday at 04:00 (the script downloads once, then applies the update on all nodes).\\
 +This auto-updates the casacore measures tables with info on observatories, solar bodies, leap seconds, international Earth rotation (IERS) coefficients, etc.
 +  [amesfoort@dragnet ~]$ sudo crontab -u lofarsys -l 
 +  0 4 * * 1 /opt/IERS/cron-update-IERS-DRAGNET.sh 2> /home/lofarsys/lofar/var/log/IERS/cron-update-IERS-DRAGNET.log 
 + 
 +=== resourcetool === 
 +On any host but ''dragnet'' (it has no RADB resources), run the [[operator:resourcetool|resourcetool]] command with the -E and possibly -U option(s) every 20 mins starting 1 min past the hour.\\ 
 +This auto-updates storage claim end times in the Observatory's RADB. Otherwise, Observatory systems will eventually think our disks are full and scheduling observations becomes impossible, even though we manage disk space ourselves. (The tool also has some other useful capabilities.)
 +  [amesfoort@any_but_dragnet ~]$ sudo crontab -u lofarsys -l 
 +  1,21,41 * * * * source /opt/lofar/lofarinit.sh; LOFARENV=PRODUCTION /opt/lofar/bin/resourcetool --broker=scu001.control.lofar --end-past-tasks-storage-claims > /home/lofarsys/lofar/var/log/resourcetool/cron-update-resourcetool-$HOSTNAME.log 2>&1
 + 
 +==== /etc ==== 
 +Apply ''/home/amesfoort/etc/*'' to /etc/ 
 + 
 +==== Other ====
 <file>
 newgrp dragnet
Old line 38 / new line 130:
 nethogs
 erfa-devel
 +armadillo-devel
 python-astropy
 python-jinja2  # for the FACTOR imaging pipeline module
 python-daemon
 python-matplotlib-qt4
 +python-psycopg2 mysql-connector-python PyGreSQL  # LOFAR mysql, postgresql DB python interface modules (used for self-tests only?)
 +python2-mock  # for python LOFAR self-tests under SAS/ and elsewhere
 qpid-cpp-server-linearstore  (add to qpid pkgs)
 patch
Old line 93 / new line 188:
 librdmacm-devel   # idem
 mstflint          # idem
 +
 +# Python packages N/A in CentOS package manager; use pip install
 +python-monetdb  # for LOFAR GSM (imaging);  on the head node we did: sudo pip install --target=/usr/local/lib/python2.7/site-packages python-monetdb
 +xmlrunner       # for LOFAR Pipeline tests; on the head node we did: sudo pip install --target=/usr/local/lib/python2.7/site-packages xmlrunner
 +
  
 /etc/yum/pluginconf.d/fastestmirror.conf
Old line 143 / new line 243:
 systemctl start NetworkManager-dispatcher.service
  
-Correct table example drg16:
+Correct table example drg16 (except that CEP2 routes and sub-tables can now be removed):
 [amesfoort@drg16 network-scripts]$ ip ru
 0: from all lookup local
Old line 194 / new line 294:
 install pkgs from ~/pkg such as log4cplus, ...
  
-add /etc/modulefiles/* to ansible
+add changed /etc/modulefiles/* to ansible
 
 /etc/security/limits.conf:
Old line 211 / new line 311:
 sudo systemctl restart qpidd
 (& check if systemctl enable qpidd (and start qpidd) are indeed in ansible)
- 
-add LofarObservationStartListener.service ? 
  
 added routing table entries for drg*, dragproc in ansible
- 
-add michilli and mariaarias to dragnet group 
  
 -----
-for lustre mount cep4 (drg nodes only (need ib atm), further install by hand atm (need rpm rebuild from src rpm)):
+for lustre mount cep4 (drg nodes only (need ib atm), further install by hand atm (need rpm rebuild from src rpm)). On all drgXX nodes:
 # create /etc/modprobe.d/lnet.conf with:
 options lnet networks=o2ib(ib0)
 +
 +# create/adjust /etc/modprobe.d/ko2iblnd.conf with:
 +#comment out any 'alias' and 'options' lines other than the next (which MUST match the settings on the Lustre MGS (and thus all other clients as well)):
 +options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=2048 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
 +# optional:
 +install ko2iblnd /usr/sbin/ko2iblnd-probe
 +
 +# create mount point as root:
 +mkdir -p /cep4data
  
 # append to /etc/fstab
 meta01.cep4.infiniband.lofar@o2ib:meta02.cep4.infiniband.lofar@o2ib:/cep4-fs /cep4data lustre defaults,ro,flock,noauto 0 0
  
-mkdir -p /cep4data 
 -----
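The ''ko2iblnd'' options on each drgXX node must match those on the Lustre MGS. A quick per-node check is to compare the ''options ko2iblnd'' line with what the loaded module reports. This minimal sketch assumes the module parameters are readable under ''/sys/module/ko2iblnd/parameters'' once the Lustre/LNet modules are loaded (e.g. after ''sudo mount /cep4data'', which uses the ''noauto'' fstab entry). ''check_ko2iblnd_params'' is a hypothetical name.

```shell
#!/usr/bin/env bash
# Compare the key=value pairs of the last uncommented 'options ko2iblnd ...' line
# in a modprobe config file with the values the loaded module reports.
# Assumption: module parameters are readable under /sys/module/ko2iblnd/parameters.
# Prints each mismatch as "key: want=<conf value> have=<module value>"; returns 1 if any.
check_ko2iblnd_params() {
  local conf=${1:-/etc/modprobe.d/ko2iblnd.conf}
  local pdir=${2:-/sys/module/ko2iblnd/parameters}
  local line kv key want have rc=0

  line=$(grep -E '^[[:space:]]*options[[:space:]]+ko2iblnd[[:space:]]' "$conf" | tail -n 1)
  if [ -z "$line" ]; then
    echo "no 'options ko2iblnd' line in $conf" >&2
    return 1
  fi
  # Strip "options ko2iblnd", then word-split the remaining key=value list.
  for kv in ${line#*ko2iblnd}; do
    key=${kv%%=*}
    want=${kv#*=}
    have=$(cat "$pdir/$key" 2>/dev/null) || have="<missing>"
    if [ "$have" != "$want" ]; then
      echo "$key: want=$want have=$have"
      rc=1
    fi
  done
  return "$rc"
}
```

Values are compared as plain strings. Sysfs reflects the values the module was loaded with, so after editing the file you must reload the modules (or reboot the node) before the check can pass.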
  
  • Last modified: 2017-06-01 14:18
  • by amesfoort