public:lta_howto

The LTA catalogue can be searched directly without needing any account. Access to all projects and search queries will return results of the entire catalogue because metadata are public for all LTA content.
  
Staging and subsequent downloading of __**public**__ data always requires an account with LTA user access privileges. You have these automatically if you were a member of the original project proposal in Northstar/MoM. If you do not have an account yet, you can register with "[[https://lofar.astron.nl/useradministration/public/setUpUserAccount.do|Create account]]". LTA access privileges need to be granted by Science Data Centre Operations (SDCO): __once you have created your account, submit a support request to the [[https://support.astron.nl/rohelpdesk|ASTRON helpdesk]] asking for LTA privileges, stating your username clearly__.
  
To stage and retrieve project-related data in the LTA which are __**proprietary**__ you need an account in [[https://lofar.astron.nl/mom3|MoM]] that is enabled for the archive and coupled with the projects of interest. To this end you can ask SDCO to be added to the list of co-authors of the project; when you send such a request, you must put the project's PI in cc. After SDCO adds you to the project, you might get an email asking you to set a new password in [[https://webportal.astron.nl/pwm/private/Login|ASTRON Web Applications Password Self Service]]. Please note that this sets a new password not just for the LTA //but for MoM (LOFAR/WSRT) and Northstar as well//.
  - On a 1 Gbit/s connection, as a general rule of thumb, you should be able to retrieve data at about 100-500 GB/hour, especially if you try to retrieve 4-8 files concurrently. If you see speeds much lower than this, you might have some kind of network problem and should in general contact your IT staff.
  - Staging the data from tape to disk might take quite a bit of time. In the large data centres that the LTA uses, the tape drives are shared with all users and requests are queued. This includes not just LOFAR users but also other large data projects like the LHC. This might mean that it takes anywhere from a few hours to a day or more to stage a copy of your data from tape to disk.
  - The amount of space available for staging data is limited, although quite large. This space is shared between all LOFAR LTA users, and it includes LTA operations for buffering data from CEP to the LTA before it gets moved to tape. If many users are staging data at the same time, and/or SDC operations is transferring large amounts of data, the system might temporarily run low on disk space. You might then get a message that your request was only partially successful. In general the request will still finish 1-2 days later; we monitor that requests do not get stuck and restart them if needed.
  - We strive to keep a copy of data that was staged on disk for 1-2 weeks so you have some time to download it. After that it might get removed to make space for more recent requests. The copy of the data on tape is only read and will still be available if you need to access the data again at a later stage, but you might need to stage a copy to disk again.
  - We are continuously trying to improve the reliability and speed of the available services. Please contact SDCO if you have any problems or suggestions for improvement.
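As a sanity check on the rule of thumb above, the theoretical ceiling of a 1 Gbit/s link can be computed; real transfers also lose throughput to protocol overhead and disk speed, which is why 100-500 GB/hour is a realistic range:

```python
# Back-of-the-envelope check of the 1 Gbit/s rule of thumb above:
# 1 Gbit/s is 125 MB/s, i.e. at most ~450 GB/hour before overheads.
link_bits_per_s = 1e9                   # nominal 1 Gbit/s link
bytes_per_s = link_bits_per_s / 8       # 125 MB/s
gb_per_hour = bytes_per_s * 3600 / 1e9  # seconds per hour, bytes per GB
print("Theoretical ceiling: %.0f GB/hour" % gb_per_hour)
```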
wget --no-check-certificate https://lofar-download.grid.surfsara.nl/lofigrid/SRMFifoGet.py?surl=<filename> .
  
The filename should start or be prepended with srm://
  
</file>
  
You will need a valid LTA account to access this data. If a downloaded file is very short, you can view it (e.g. with ''cat'') to see any error message it contains, and report it to the helpdesk.
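Since a failed download typically leaves a short text file in place of the data, such failures can be spotted by size. A minimal sketch; the ''*.tar'' pattern and the 1 MB threshold are illustrative assumptions, not fixed LTA conventions:

```python
import glob
import os

# Downloads much smaller than any real data product usually contain an
# error message instead of data. Threshold and pattern are assumptions.
SIZE_LIMIT = 1024 * 1024  # 1 MB

def suspicious_downloads(pattern="*.tar", limit=SIZE_LIMIT):
    """Return downloaded files small enough to be error messages."""
    return [f for f in glob.glob(pattern) if os.path.getsize(f) < limit]

for name in suspicious_downloads():
    print("possible failed download:", name, os.path.getsize(name), "bytes")
```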
  
===== Download data =====
  
You can download your requested data with the files from your e-mail notification. There are several possibilities and tools to do this. If you are unsure which one to use, please refer to the corresponding [[:public:lta_faq#there_are_different_ways_to_download_which_one_is_the_best|FAQ answer]].
  
==== HTTP download ====
  
If you open ''html.txt'', you will find a list of HTTP links that you can feed to a Unix command-line tool like ''wget'' or ''curl'', or even use in a browser.

For wget you can use the following command line:

<code>
wget -i html.txt
</code>

This will download the files in ''html.txt'' to the current directory (option '-i' reads the URLs from the specified file).

Preferably, especially when downloading large files, you should also use option '-c'. This continues an unfinished earlier download instead of starting a fresh download of the whole file. (If you use this option, make sure to first delete existing files that contain error messages instead of data):

<code>
wget -ci html.txt
</code>

Do not set the username and password on the wget command line, because other users on the system can then see them in the process list. Instead, create a file ~/.wgetrc with two lines according to the following example:

<code>
user=lofaruser
password=secret
</code>

Note: this is only an example; edit the file and enter your own personal user name and password!

Restrict access to the .wgetrc file to your user only, so that the credentials are not exposed to anybody else, e.g.:

<code>
chmod 600 .wgetrc
</code>
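The two steps above (creating ~/.wgetrc and restricting its permissions) can also be scripted. A sketch, where ''lofaruser''/''secret'' are placeholders for your own credentials:

```python
import os
import stat

# Write wget credentials to ~/.wgetrc and make the file readable and
# writable by the owner only (the equivalent of "chmod 600 ~/.wgetrc").
# "lofaruser" and "secret" are placeholders: fill in your own account.
rcfile = os.path.expanduser("~/.wgetrc")
with open(rcfile, "w") as fh:
    fh.write("user=lofaruser\n")
    fh.write("password=secret\n")
os.chmod(rcfile, stat.S_IRUSR | stat.S_IWUSR)  # 0o600
```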

There is no easy way to have wget rename the files as part of the command directly: it does not accept the -O flag inside a file it reads with -i. You can either rename files afterwards, e.g. using the following command:

<code>
find . -name "SRMFifoGet*" | awk -F %2F '{system("mv "$0" "$NF)}'
</code>

or add the -O option to each line in html.txt and then feed each line to wget separately, like this: ''cat html.txt | xargs wget''. By default the html.txt file does not contain such options.

The following Python script will take care of renaming and untarring the downloaded files:

<file>
# M.C. Toribio
# toribio@astron.nl
#
# Script to untar data retrieved from the LTA by using wget.
# It will DELETE the .tar file after extracting it.
#
# Notes:
# When using wget, the files are named, as an example:
# SRMFifoGet.py?surl=srm:%2F%2Fsrm.grid.sara.nl:8443%2Fpnfs%2Fgrid.sara.nl%2Fdata%2Flofar%2Fops%2Fprojects%2Flofarschool%2F246403%2FL246403_SAP000_SB000_uv.MS_7d4aa18f.tar
# This script will rename those files as the string after the last '%'.
# If you want to change that behaviour, modify the line
# outname = filename.split("%")[-1]
#
# Version:
# 2014/11/12: M.C. Toribio

import os
import glob

for filename in glob.glob("*SB*.tar*"):
    outname = filename.split("%")[-1]
    os.rename(filename, outname)
    os.system('tar -xvf ' + outname)
    os.system('rm -r ' + outname)
    print(outname + ' untarred.')
</file>

Another Python script for renaming the downloaded (and previously untarred) files; it removes the random part of the filename before the .tar extension:

<file>
import os
import glob

# AUTHOR: J.B.R. OONK  (ASTRON/LEIDEN UNIV. 2015)
# - changes LTA retrieval filename to standard filename
# - run in the directory where LTA files are located

# FILE DIRECTORY
path = "./"

filelist = glob.glob(path + '*.tar')
print('LIST:', filelist)

# FILE STRING SEPARATORS
sp1d = '%'
sp2d = '2F'
extn = '.MS'
extt = '.tar'

# LOOP
print('#####  STARTING THE LOOP  #####')
for infile_orig in filelist:

    # GET FILE
    infile = os.path.basename(infile_orig)
    print('doing file: ', infile)

    spl1 = infile.split(sp1d)[11]
    spl2 = spl1.split(sp2d)[1]
    spl3 = spl2.split(extn)[0]
    newname = spl3 + extn + extt

    # SPECIFY FILE MV COMMAND
    command = 'mv ' + infile + ' ' + newname
    print(command)

    # CARRY OUT FILENAME CHANGE !!!
    # - commented out for testing; uncomment to perform the mv command
    # os.system(command)

    print('finished rename of: ', newname)
</file>
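The splitting in the scripts above depends on the exact number of '%'-separated parts in the downloaded name. In Python 3 the same result can be obtained more robustly by percent-decoding the name first and taking the last path component. A sketch, using the example filename from the comments above:

```python
import os
import urllib.parse

def plain_name(downloaded):
    """Recover the original file name from a percent-encoded LTA name."""
    surl = downloaded.split("surl=")[-1]  # drop the SRMFifoGet.py? part
    decoded = urllib.parse.unquote(surl)  # turn %2F back into /
    return os.path.basename(decoded)

name = plain_name(
    "SRMFifoGet.py?surl=srm:%2F%2Fsrm.grid.sara.nl:8443%2Fpnfs"
    "%2Fgrid.sara.nl%2Fdata%2Flofar%2Fops%2Fprojects%2Flofarschool"
    "%2F246403%2FL246403_SAP000_SB000_uv.MS_7d4aa18f.tar")
print(name)  # L246403_SAP000_SB000_uv.MS_7d4aa18f.tar
```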

Note that wget does not overwrite existing files. If you use the continue option ('-c'), it appends any missing parts to the existing file. If you don't use the continue option and a file is already present (e.g. from a stopped earlier download), wget creates a new file by appending a number (e.g. '.1') to the filename.

There are some small example links at [[https://lofar-download.grid.sara.nl/|https://lofar-download.grid.sara.nl/]] where you can test, for example with the file1M (which is 1 MB), whether your setup is correct.

==== SRM download ====

The file ''srm.txt'' contains a list of srm locations which you would feed to ''srmcp''. SRM is a GRID-specific protocol that is currently supported for data at the SARA and Jülich locations. It is faster, especially if you have significantly more than 1 Gbit/s bandwidth. It requires a valid [[:public:grid_certificate|GRID certificate]] and installation of the [[:public:grid_srm_software_installation|GRID srm software]]. NB: there is an [[:public:srmclientinstallation|alternative installation that does not require root privileges]]. Contact SDC Operations via the __[[https://support.astron.nl/rohelpdesk|ASTRON helpdesk]]__ if you think you might need a GRID account but it cannot be provided by your own institute. An example command line would be:

<code>
srmcp -server_mode=passive -copyjobfile=srm.txt
</code>

to retrieve all requested files contained in srm.txt, or e.g.

<code>
srmcp -server_mode=passive srm://lofar-srm.juelich.de:8443/pnfs/fz-juelich.de/data/lofar/ops/projects/commissioning2012/file.tar file://///data/files/file.tar
</code>

to retrieve a single file. You need ''-server_mode=passive'' if you are behind a firewall or on an internal network. Omitting this option may improve transfer speed, as srmcp will then attempt to use multiple streams when retrieving a file. An alternative strategy to improve the overall transfer speed is to run multiple srmcp requests in parallel, e.g. by splitting the provided srm.txt file and feeding the partial lists to separate srmcp commands.
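The parallel strategy above can be sketched as follows. The chunk count, SURLs, and output file names are illustrative choices, and the srmcp commands are only shown in comments rather than executed:

```python
# Split the lines of srm.txt into n roughly equal parts, so that each
# part can be fed to its own srmcp process. n=4 is an arbitrary choice.
def split_copyjob(lines, n=4):
    return [lines[i::n] for i in range(n)]

surls = ["srm://example.org/file%d.tar" % i for i in range(10)]  # placeholders
for k, part in enumerate(split_copyjob(surls)):
    partfile = "srm_part%d.txt" % k
    # In real use: write "\n".join(part) to partfile, then start
    #   srmcp -server_mode=passive -copyjobfile=srm_part0.txt
    # (and likewise for the other parts) as separate background jobs.
    print(partfile, "->", len(part), "files")
```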

If you experience insufficient transfer speeds with srmcp, you may want to look into using srmcp with a [[:public:srmclientinstallation#fnal_dcache_client_tools|globus-url-copy]] copy script.
  
===== Troubleshooting =====
  • Last modified: 2020-10-30 16:32
  • by Sander ter Veen