Revisions: public:lta_howto 2013-06-26 09:01 ([Retrieving data], Adriaan Renting); public:lta_howto 2025-01-17 11:00 (current; [Change of account registration method], Hanno Holties)
====== Long Term Archive Howto ======
This is a short manual on how to search for and retrieve data from the LOFAR Long Term Archive.
To access the LTA, go to: [[https://
For background information and in case of problems, please refer to the [[:public:

====== Release Notes ======
^Release^Description^
|July 2018|1) All data is now searchable, not only released data or data of projects you're a member of. Note that for downloading data (staging) the proprietary restrictions still apply. \\ 2) All projects can now be selected by all users, not only by project members. This provides the means to filter data based on project. \\ 3) Cone search algorithm implemented based on the Haversine formula for angular distance calculation. The calculated angular distance to the reference coordinates is now displayed in the search results. \\ 4) Search keys: added "time resolution",
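The July 2018 notes say that the cone search uses the haversine formula for the angular distance. The sketch below illustrates what that formula computes for two (RA, Dec) positions in degrees and how a simple cone filter could use it. It is an illustration only, not the LTA's actual implementation; `angular_distance_deg`, `cone_search` and the source list format are hypothetical.

```python
import math


def angular_distance_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions given
    in degrees, using the haversine formula:
    hav(d) = hav(dDec) + cos(Dec1) * cos(Dec2) * hav(dRA), with hav(x) = sin^2(x/2)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp against rounding error before taking the square root and arcsine.
    h = min(1.0, max(0.0, h))
    return math.degrees(2 * math.asin(math.sqrt(h)))


def cone_search(sources, ra0, dec0, radius_deg):
    """Return (name, distance_deg) for every (name, ra, dec) source within
    radius_deg of (ra0, dec0), nearest first."""
    hits = []
    for name, ra, dec in sources:
        dist = angular_distance_deg(ra0, dec0, ra, dec)
        if dist <= radius_deg:
            hits.append((name, dist))
    return sorted(hits, key=lambda hit: hit[1])
```

The haversine form is used rather than the arccosine of a dot product because it stays numerically accurate for very small separations, which is exactly the regime of a typical cone search.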
====== User Access ======
==== Change of account registration method ====
:!: We are in the process of upgrading the LOFAR applications. The services that have been offered through [[https://
=== I forgot my password ===
Please visit [[https://
=== Searching

The LTA catalogue can be searched directly without needing any account. Access to all projects and search queries
Staging and subsequent downloading of __**public**__ data always requires an account with LTA user access privileges. This automatically happens if you were a member of the original project proposal in Northstar/
To stage and retrieve project-related data in the LTA which are __**proprietary**__
Please read the [[https://
=== Step-by-step guide to search and retrieve data ===

__Basic search__:

  * log in to [[https://
  * click SEARCH DATA in the top menu
  * specify the data product types of interest and a target name or coordinates
  * click on "
  * from the screen that follows, you should be able to stage the data products

__Advanced search__:

  * log in to [[https://
  * click SEARCH DATA in the top menu
  * open the Advanced Search drop-down list on the left side
  * specify the data product types of interest from the drop-down list
  * select product features and specify a target name or coordinates
  * click on "
  * from the screen that follows, you should be able to stage the data products

__Project search (to restrict all data searches to that project only)__:

  * log in to [[https://
  * click BROWSE PROJECTS in the top menu
  * at this level you can check your membership (the first column shows whether you are a member of the project) and find public projects. Available options are:
    * click on the project name to view the project details
    * use the '
    * use the 'show data' button to select the project and to show all data in it
  * from the screen that follows, you should be able to search, select and stage the data products


====== How to find data in the archive ======

Once your account is set up, or as an anonymous user, you can navigate the catalogue. If you have an account, you can log in by clicking the LOGIN button at the top right, shown below.

=== Page navigation ===

The LTA menu, as shown below, gives access to the main functionality.

{{:

A search in the LTA catalogue can be started by clicking the SEARCH DATA button in the menu. This sets up a default basic search, in which you can select the data product type of interest and perform a cone search. An advanced search mode, with more parameters per data type, can be selected from the drop-down menu on the left side.

A "

  * Click on the project name to view the project details and, if needed, select it.
  * Use the '
  * Use the 'show data' button to select the project and to show all data in it.

=== Finding Data ===

{{:

Depending on the search parameters, e.g., which data products were requested (observation,

  - select observations/
  - select observations and "show pipelines"
  - select observations/

Note that observations often have no raw data in the archive, but the metadata is visible because subsequent pipelines have processed the raw data further. To get to the pipelines related to observations,

To see whether observations or pipelines have data products in the LTA, look for the "

Once you have a list of dataproducts on your screen, the "

There is a separate page with **[[:

==== Unspecified Data/

Some data is affected by problems in the automation and control part of the LOFAR software during observation or processing. Sometimes a few subbands are affected, sometimes an entire observation. Science Data Centre Operations will check the data, (re)run things manually or fix things if needed, and then archive the data. This does mean that the automation and control software sometimes loses track of the files, and the archiving process has no information beyond the Observation ID and the filename itself. In such cases a few subbands or an entire observation might end up under "

If an observation, or some of its subbands, is missing, please check whether it ended up under Unspecified.

===== Staging
Once you have a list of dataproducts,
{{:public:lta_staging_1.png?900}}
The LOFAR Archive stores data on magnetic tape. This means that it cannot be downloaded right away, but has to be copied
When you have made your selection of files, click on //stage//. This shows the following message, which means that a request has been sent to the LTA staging service to start retrieving the requested files from tape and make them available on disk. You will get a confirmation e-mail acknowledging that your staging request was received and queued. When the files are staged, you will get a notification e-mail informing you that your data are ready for retrieval.
{{:public:
The e-mail that you get when staging to disk is complete lists the files and has several attachments. Amongst them are two files ''
{{:public:
There are two different ways to download your files with these attachments:
We also attach plain lists of the files/SURLs that were scheduled for staging (in the confirmation e-mail), those that were successfully staged, and, if any, those that could not be staged (in the success and partial-success notifications).
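If a notification reports only a partial success, the attached plain lists can tell you exactly which SURLs still need to be staged. A minimal sketch, assuming each list is a plain text file with one SURL per line; the file names `requested.txt` and `staged.txt` are placeholders, not the real attachment names:

```python
from pathlib import Path


def read_surls(path):
    """Return the non-empty lines of a text file, stripped, in file order."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def missing_surls(requested_path, staged_path):
    """SURLs listed in requested_path that do not appear in staged_path,
    keeping the order of the requested list."""
    staged = set(read_surls(staged_path))
    return [surl for surl in read_surls(requested_path) if surl not in staged]
```

For example, `missing_surls("requested.txt", "staged.txt")` returns the SURLs you may want to include in a new staging request.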

=== Please take note of the following ===

  - On a 1 Gbit/s connection, as a general rule of thumb, you should be able to retrieve data at about 100-500 GB/hour, especially if you retrieve 4-8 files concurrently. If you see speeds much lower than this, you might have a network problem and should contact your IT staff.
  - Staging the data from tape to disk might take quite a bit of time. In the large data centres that the LTA uses, the tape drives are shared with all users and requests are queued. These users include not just LOFAR but also other large-data projects such as the LHC. This means it can take anywhere from a few hours to a day or more to stage a copy of your data from tape to disk.
  - The amount of space available for staging data is limited, although quite large. This space is, however, shared between all LOFAR LTA users. This includes LTA operations for buffering data from CEP to the LTA before it gets moved to tape. If many users are staging data at the same time, and/
  - We strive to keep a copy of staged data on disk for 1-2 weeks so you have some time to download it. After that it might get removed to make space for more recent requests. The copy of the data on tape is only read, so it will still be available if you need to access the data again later, but you might need to stage a copy to disk again.
  - We are continuously trying to improve the reliability and speed of the available services. Please contact
  - The data centres the LTA uses also occasionally have maintenance or small outages.
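The rule of thumb above can be turned into a rough time budget, for example to check that a large request can be downloaded well within the 1-2 weeks that staged copies are normally kept. A 1 Gbit/s link carries at most 10^9 / 8 bytes per second, i.e. 125 MB/s or 450 GB per hour, so the upper end of the quoted range is essentially the line rate. A small sketch; the 2000 GB request size is a hypothetical example:

```python
def download_hours(total_gb, rate_gb_per_hour):
    """Estimated wall-clock hours to download total_gb at a sustained rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return total_gb / rate_gb_per_hour


def link_ceiling_gb_per_hour(gbit_per_s):
    """Theoretical maximum for a link: bits/s -> bytes/s -> GB (10**9 bytes) per hour."""
    return gbit_per_s * 1e9 / 8 * 3600 / 1e9


total_gb = 2000  # hypothetical size of one staging request
for rate in (100, 500):  # the rule-of-thumb range for a 1 Gbit/s connection
    print(f"{total_gb} GB at {rate} GB/hour: about {download_hours(total_gb, rate):.0f} hours")
print(f"1 Gbit/s ceiling: {link_ceiling_gb_per_hour(1):.0f} GB/hour")
```

With these numbers a 2000 GB request takes roughly 4 to 20 hours, while a few tens of terabytes at the lower rate would need more than a week.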

==== Staging Transient Buffer Board (TBB) data ====

TBB data needs to be staged by hand. Please send a request at [[https://
<code>

wget --no-check-certificate https://

The filename should start or be prepended with srm://

</code>

You will need a valid LTA account to access this data. If the downloaded file is very small, you can view it (e.g. with cat) to see any errors that occurred.
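When many files are downloaded at once, a small script can list the suspiciously small ones so you can inspect them. A sketch; the 10 kB threshold and the match-everything glob pattern are arbitrary assumptions, so adjust them to your data:

```python
from pathlib import Path


def suspicious_downloads(directory=".", pattern="*", max_bytes=10_000):
    """Return (path, size_in_bytes) for files matching pattern that are smaller
    than max_bytes, smallest first. The threshold is an arbitrary choice."""
    small = [(p, p.stat().st_size) for p in Path(directory).glob(pattern)
             if p.is_file() and p.stat().st_size < max_bytes]
    return sorted(small, key=lambda item: item[1])


def preview(path, n_bytes=500):
    """First n_bytes of a file as text, like a quick `cat` of an error message."""
    return Path(path).read_bytes()[:n_bytes].decode("utf-8", errors="replace")
```

For example, `for path, size in suspicious_downloads(): print(path, size, preview(path))` prints each small file with the start of its contents.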

===== Download data =====

You can download your requested data with the files from your e-mail notification. There are different possibilities and tools to do this. If you're unsure which one to use, please refer to the corresponding [[:
==== HTTP download ====
If you open ''
For wget you can use the following command line:
<code>

wget -i html.txt

</code>

This will download the files in ''

Preferably,

<code>
wget -ci html.txt

</code>

Do not set the username and password on the wget command line, because this allows other users on the system to see them in the process list. Instead, create a file ~/.wgetrc with two lines, as in the following example:

<code>
user=lofaruser
password=secret

</code>

Note: this is only an example; edit the file and enter your own user name and password.
Restrict the access permissions of the .wgetrc file to your user only, so that the credentials are not exposed to anybody else, e.g.:

<code>
chmod 600 ~/.wgetrc

</code>
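If you prefer to create the file from a script, the permissions can be set at creation time so the credentials are never readable by others, even briefly. A minimal Python sketch; `write_wgetrc` is a hypothetical helper, and it overwrites the target file, so merge by hand if you already keep other wget settings there:

```python
import os
from pathlib import Path


def write_wgetrc(user, password, path):
    """Write user= and password= lines to path with mode 0600 (owner only).
    Overwrites the file if it exists."""
    path = Path(path)
    # The mode given to os.open applies when the file is created, so a new
    # file is never readable by others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # An existing file keeps its old mode, so tighten it before writing secrets.
    os.chmod(path, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"user={user}\npassword={password}\n")
    return path
```

For example, `write_wgetrc("lofaruser", getpass.getpass(), Path.home() / ".wgetrc")` (with `import getpass` and your own user name) avoids leaving the password in your shell history.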

There is no easy way to have wget rename the files as part of the command directly: it does not accept the -O flag inside a file it reads with -i. You can either rename files afterward,

<code>
find . -name "

</code>

or add the -O option to each line in html.txt and feed each line to wget separately, like this: cat ''

The following Python 2 script renames and untars the downloaded files:
<code>

#M.C. Toribio
#
#
#Script to untar data retrieved from the LTA by using wget
#It will DELETE the .tar file after extracting it.
#
#Notes:
#When using wget, the files are named, as an example:
#
# This script will rename those files to the string after the last '
# If you want to change that behaviour, modify line
#   outname=filename.split("
#
# Version:
# 2014/11/12: M.C. Toribio

import os
import glob

for filename in glob.glob("
    outname=filename.split("
    os.rename(filename,
    os.system('
    os.system('

    print outname+'

</code>
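For Python 3 users, a sketch of the same rename-and-untar idea follows. The separator used by the original script is cut off here, so this version assumes that the wget-derived names contain the URL-encoded path separator `%2F` and that the real file name is the part after the last one; check a few of your own file names before using it. Unlike the script above, it only deletes the .tar files when asked to:

```python
import tarfile
from pathlib import Path

SEPARATOR = "%2F"  # assumption: URL-encoded "/" in the names wget gives the files


def real_name(downloaded_name, separator=SEPARATOR):
    """Part of the name after the last separator (the whole name if there is none)."""
    return downloaded_name.rsplit(separator, 1)[-1]


def rename_and_untar(directory=".", delete_tar=False):
    """Rename every *.tar in directory to its real name, extract it there and,
    if delete_tar is True, remove the tar file afterwards. Returns the tar paths."""
    done = []
    for path in sorted(Path(directory).glob("*.tar")):
        target = path.with_name(real_name(path.name))
        if target != path:
            path.rename(target)
        with tarfile.open(target) as tar:
            if hasattr(tarfile, "data_filter"):
                # Python versions with extraction filters: refuse unsafe members.
                tar.extractall(path=target.parent, filter="data")
            else:
                tar.extractall(path=target.parent)
        if delete_tar:
            target.unlink()
        done.append(target)
    return done
```

Run `rename_and_untar(".")` in the download directory first and pass `delete_tar=True` only once you are happy with the result.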

Another Python 2 script renames the downloaded (and previously untarred) files: it removes the random part of the filename before the .tar extension.

<code>
import os
import sys
import glob

# AUTHOR: J.B.R. OONK (ASTRON/
# - changes LTA retrieval filename to standard filename
# - run in the directory where LTA files are located

# FILE DIRECTORY
path = "

filelist = glob.glob(path+'
print '

#FILE STRING SEPARATORS
sp1d='
sp2d='
extn='
extt='

#LOOP
print '#####
for infile_orig in filelist:

    #GET FILE
    infiletar
    infile
    print 'doing file: ', infile

    spl1=infile.split(sp1d)[11]
    spl2=spl1.split(sp2d)[1]
    spl3=spl2.split(extn)[0]
    newname = spl3+extn+extt

    # SPECIFY FILE MV COMMAND
    command='
    print command

    # CARRY OUT FILENAME CHANGE !!!
    # - COMMENT FOR TESTING OUTPUT
    # - UNCOMMENT TO PERFORM FILE MV COMMAND
    #

print '

</code>
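A Python 3 sketch of the same renaming step. The separators of the original script are not shown here, so this version assumes a naming pattern: the random part is an underscore followed by hexadecimal characters directly before `.tar` (for example, the hypothetical `L229507_SB000_uv.dppp.MS_4a2b6b5e.tar` becomes `L229507_SB000_uv.dppp.MS.tar`). Like the original, it only prints the renames unless told otherwise:

```python
import re
from pathlib import Path

# Assumption: the random part is "_" plus hexadecimal characters right before ".tar".
RANDOM_PART = re.compile(r"_[0-9a-fA-F]+(?=\.tar$)")


def standard_name(name):
    """Name with the random part removed; unchanged if the pattern is absent."""
    return RANDOM_PART.sub("", name)


def rename_all(directory=".", dry_run=True):
    """Print the mv command for every *.tar whose name changes and, unless
    dry_run is True, perform the rename. Returns the (old, new) name pairs."""
    changes = []
    for path in sorted(Path(directory).glob("*.tar")):
        new = path.with_name(standard_name(path.name))
        if new == path:
            continue
        if new.exists():
            print(f"skipping {path.name}: {new.name} already exists")
            continue
        print(f"mv {path.name} {new.name}")
        if not dry_run:
            path.rename(new)
        changes.append((path.name, new.name))
    return changes
```

A name whose last underscore-separated part before .tar happens to be hexadecimal (e.g. ending in `_abc.tar`) would also be shortened, so review the dry-run output before calling `rename_all(".", dry_run=False)`.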

Note that wget does not overwrite existing files. If you use the continue option ('

There are some small example links if you browse to [[https://
==== SRM download ====
If you open the file ''
<code>

srmcp -server_mode=passive -copyjobfile=srm.txt

</code>
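The plain SURL lists attached to the notification e-mails can also be turned into a copy job file yourself, for example to fetch only the files that are not yet on your disk. The sketch assumes the usual `-copyjobfile` layout of one `<source> <destination>` pair per line and `file://` URLs with absolute paths as local destinations; compare the result with the `srm.txt` you received before relying on it. `write_copyjob` and `local_target` are hypothetical helpers:

```python
from pathlib import Path


def local_target(surl, dest_dir):
    """Absolute local path for a SURL: its last path component inside dest_dir."""
    return Path(dest_dir).resolve() / surl.rstrip("/").rsplit("/", 1)[-1]


def write_copyjob(surls, dest_dir, job_path, skip_existing=True):
    """Write one '<source> <destination>' line per SURL for srmcp -copyjobfile.
    Files already present in dest_dir are left out when skip_existing is True.
    Returns the number of lines written."""
    lines = []
    for surl in surls:
        target = local_target(surl, dest_dir)
        if skip_existing and target.exists():
            continue
        lines.append(f"{surl} file://{target}")
    Path(job_path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

The resulting file can then be passed to srmcp with `-copyjobfile` as shown above. Note that a partially downloaded file counts as present and is skipped, so delete incomplete files first.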
to retrieve all requested files contained in srm.txt, or, e.g.,

<code>
srmcp -server_mode=passive srm://

</code>

to retrieve a single file. You need ''

If you experience insufficient transfer speeds with srmcp, you may want to look into using srmcp with a [[:

===== Troubleshooting =====

  * There is a [[: