public:lta_tricks [2019-06-06 11:23] Joern Kuensemoeller [Python Module for Staging]
public:lta_tricks [2019-09-20 07:25] (current) Thomas Jürges: Updated link to LOFAR stager API Python module to point to the ASTRON gitlab repo
  
  * You can use colons in numeric queries to select ranges. For example, this gives all observations and pipelines that have a SAS/Observation ID in the range from 432000 to 432190:

{{ :public:lta_range_selection.png }}

In textual entries, wildcards can be used. {{ :public:lta_wildcard_selection.png }}
  
  * You can put a list of SAS/Observation IDs in the query:
  
{{ :public:lta_list_query.png }}
  
===== Viewing data =====
When you are looking at the results of a query, you might see something like this:
  
{{ :public:lta_observation_with_no_archived_data.png }}
  
This means that the observation is known in the LTA: the catalogue knows what data was produced, the raw data itself was not archived, but further processing happened on the raw data and the results of some of those pipelines were archived. If you click on the zero, you will see something like this:
  
{{ :public:lta_dataproduct_is_link.png }}
  
This allows you to navigate from a pipeline back to the original observation, or from the observation to any pipelines that have run on the raw data.
  * You can retrieve data on the Observation and Pipeline level; you don't have to select all files individually.
  
{{ :public:lta_observation_selection.png }}
  
  * If you have a query with more than 1000 results, you can open each of the result pages in a separate tab/window.
  
{{ :public:lta_page_selection.png }}
  
  * With the small triangle next to a list, you can fold or unfold the list to get a better overview.

== Folded entries ==

{{ :public:lta_folded.png }}

== Unfolded entries ==

{{ :public:lta_unfolded.png }}
  
===== DBView =====
  
There is a server that lets you run your own queries on the database: [[https://lta-dbview.lofar.eu/]]
  
A useful query might be this one, which gives you all files for a certain Obs Id (SAS VIC tree ID):

<code>
SELECT fo.URI, dp."dataProductType", dp."dataProductIdentifier",
 dp."processIdentifier"
FROM AWOPER."DataProduct+" dp,
     AWOPER.FileObject fo,
     AWOPER."Process+" pr
WHERE dp."processIdentifier" = pr."processIdentifier"
  AND pr."observationId" = '123456'
  AND fo.data_object = dp."object_id"
  AND dp."isValid" > 0
</code>
  
In this query, '123456' should be replaced with the Obs Id of the Observation/Pipeline you are looking for. Pipelines also have an "observationId" equal to the SAS Id, even though that is a bit confusing. To run this query, go to the link above, log in as the right user, select the right project, and put the query into the "Manual SQL" field.
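
If you keep the query in a script and paste it into "Manual SQL" for different observations, a small helper can substitute the Obs Id for you. This is only a convenience sketch (the `build_query` helper is hypothetical); the table and column names are copied from the query above:

<code python>
# Build the DBView query for a given SAS/Obs Id before pasting it
# into the "Manual SQL" field. Convenience sketch only.
QUERY_TEMPLATE = """SELECT fo.URI, dp."dataProductType", dp."dataProductIdentifier",
 dp."processIdentifier"
FROM AWOPER."DataProduct+" dp,
     AWOPER.FileObject fo,
     AWOPER."Process+" pr
WHERE dp."processIdentifier" = pr."processIdentifier"
  AND pr."observationId" = '{obs_id}'
  AND fo.data_object = dp."object_id"
  AND dp."isValid" > 0"""

def build_query(obs_id):
    """Return the query text with the given Obs Id filled in."""
    return QUERY_TEMPLATE.format(obs_id=obs_id)

print(build_query("123456"))
</code>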
  
**Example**

You can also modify these queries. For example, if you also want to know the MD5 checksum, you can run:

<code>
SELECT fo.URI, fo.hash_md5, dp."dataProductType", dp."dataProductIdentifier",
 dp."processIdentifier"
FROM AWOPER."DataProduct+" dp,
     AWOPER.FileObject fo,
     AWOPER."Process+" pr
WHERE dp."processIdentifier" = pr."processIdentifier"
  AND pr."observationId" = '123456'
  AND fo.data_object = dp."object_id"
  AND dp."isValid" > 0
</code>
  
===== AstroWise Python Interface =====
  
There is a Python client library for accessing the LTA. With this library, you can script your own queries. The installation description can be found here: [[:lta:client_installation|LTA Client installation]]. Be sure to have the latest version installed. Note that since January 2018 this library uses Python 3; Python 2 is no longer supported.
  
Once you have installed the client, set up your user name and password. These are the same as for MoM. Remember that this is just a different interface to the LTA catalogue: you will need the same credentials as for the web interface.
  
After installing the LTA client, the file .awe/Environment.cfg will appear in your home directory (if not, then create one). Make sure the file at least contains the following lines:

<file>
[global]
database_user       : <your username>
...
</file>
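
If the file was not created automatically, a minimal way to create it by hand is sketched below. The username "myuser" is a placeholder; use your own MoM/LTA credentials:

<code python>
# Create ~/.awe/Environment.cfg by hand if the client did not create it.
# "myuser" is a placeholder username, not a real account.
import os
from pathlib import Path

cfg_dir = Path.home() / ".awe"
cfg_dir.mkdir(exist_ok=True)
cfg_file = cfg_dir / "Environment.cfg"
cfg_file.write_text("[global]\ndatabase_user       : myuser\n")
os.chmod(cfg_file, 0o600)  # credentials file: keep it readable only by you
</code>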
  
The following script can be used to test your installation:
  
<code python>
# ...
</code>

You may need to kill the script, because it will print out all the observations in a certain patch of the sky archived in the LTA.
  
In case of errors, you may need to open a port on the firewall at your institution: specifically, port 1521 should be open. Also make sure that the LTA client library can be found in your PYTHONPATH (see [[:lta:client_installation|LTA Client installation]] for more details). In case of trouble, get in contact with Science Operations and Support.
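
To check whether the firewall lets you reach port 1521 at all, a plain TCP connection test can be run from Python. This is a generic sketch; the LTA database host name is site-specific and not given on this page, so the host in the comment is a placeholder:

<code python>
# Minimal TCP reachability check, e.g. for the LTA database port 1521.
import socket

def port_open(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (replace the placeholder with the actual LTA database host):
# print(port_open("<lta database host>", 1521))
</code>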
  
== Examples ==
  
Once you have tested that your connection to the catalogue is working, you are ready to browse the archive and stage the data you need. Here we will list a few examples of Python scripts that can be used to access the LTA. All of them will need to import some modules:
  
<code python>
# ...
</code>
  
The following script will find subbands 301 and 302 for all targets within two different projects.
  
Pay attention to the difference between the keys subband and stationSubband: the former is a sequential number assigned to each subband in an observation, while the latter is linked to the frequency at which the observation was performed. For example, suppose an observation was set up covering the range 30-77.3 MHz with two simultaneous beams using 244 subbands each. In this case, subband will range from 0 to 487, while stationSubband ranges from 153 to 396. The stationSubband information is stored in the observation, but not in the pipeline products (which instead contain the frequency). If you want to search on stationSubband, you must perform your search on observations first, then fetch the pipelines linked to those observations. If you use frequency, you can search directly on pipelines.
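
The example numbers above can be sanity-checked with the standard LOFAR subband width. The instrument constants (200 MHz sampling clock, 1024-channel filterbank, i.e. 195312.5 Hz per subband) are assumptions taken from general LOFAR documentation, not stated on this page:

<code python>
# Relation between stationSubband and sky frequency for the example above.
# Constants assumed from general LOFAR documentation (200 MHz clock mode).
SUBBAND_WIDTH_HZ = 200e6 / 1024  # 195312.5 Hz per station subband

def station_subband_to_mhz(station_subband):
    """Centre frequency in MHz of a station subband (first Nyquist zone)."""
    return station_subband * SUBBAND_WIDTH_HZ / 1e6

print(station_subband_to_mhz(153))  # ~29.9 MHz, lower edge of the example range
print(station_subband_to_mhz(396))  # ~77.3 MHz, upper edge of the example range
</code>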
  
As general advice: before performing a search, **understand thoroughly the meaning of the keywords that you are using and where their values are stored**, otherwise you may not find the data you are looking for.

<code python>
# ...
# Query for private data of the project, you must be member of the project
private_data = False

# All URIS to stage
uris = {
    project1: set(),
    project2: set(),
}

for project in (project1, project2) :
    print("Using project %s" % project)
# ...
            else :
                print("No URI found for %s with dataProductIdentifier %d" % (dataproduct.__class__.__name__, dataproduct.dataProductIdentifier))

for project in (project1, project2) :
    print("Total URI's found for project %s: %d" % (project, len(uris[project])))

stager = LtaStager()
if do_stage :
# ...
freq2 = 178.0
day1 = datetime(2014,8,26) # this could include time; i.e. hours, minutes, seconds
day2 = datetime(2014,8,29) # idem
# DataProduct class to query; CorrelatedDataProduct, SkyImageDataProduct, etc ...
cls = CorrelatedDataProduct
# Query for private data of the project, you must be member of the project
private_data = False

# To see private data of this project, you must be member of this project
if private_data :
    # ...
    if project != context.get_current_project().name:
        raise Exception("You are not member of project %s" % project)

query_observations = (
    (Observation.startTime >= day1) &
    # ...
        else :
            print("No URI found for %s with dataProductIdentifier %d" % (dataproduct.__class__.__name__, dataproduct.dataProductIdentifier))

print("Total URI's found %d" % len(uris))

if do_stage :
    stager = LtaStager()
    stager.stage_uris(uris)
</code>
===== Python Module for Staging =====
  
The Python interaction with the LTA catalogue can be complemented with a specific module developed to give users more control over their staging requests. The module is made available **[[https://git.astron.nl/ro/lofar_stager_api/-/tags|here]]** and its functions are mostly self-explanatory.

**Alternatively to the .awe/Environment.cfg described above, user credentials can also be provided via a file ~/.stagingrc with the credentials of your Lofar account**, similar to ~/.wgetrc:

<code>
user=XXX
password=YYY
</code>
For a description of what the user can do, we list here the functions that are available.
  
**stage(surls)** \\ It takes in a list of surls, queues a staging request for those urls, and outputs the ID of the request.

**get_status(stageid)** \\ It tells the user if a request is queued, in progress or finished. Possible statuses: "new", "scheduled", "in progress", "aborted", "failed", "partial success", "success", "on hold".

**abort(stageid)** \\ It allows users to end a staging request.

**get_surls_online(stageid)** \\ It gives a list of the surls that have been staged for the given request. The list is updated whenever a new surl comes online.

**get_srm_token(stageid)** \\ The srm token is useful to interact directly with the SRM site through GRID/SRM tools.

**reschedule(stageid)** \\ If a request failed, it can be rescheduled.

**get_progress()** \\ No input needed. It returns the statuses of all the requests owned by the user.
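
A simple way to combine stage() and get_status() is to poll until the request reaches a terminal status. The helper below is a hypothetical sketch, not part of the module; it relies only on the function names and status strings listed above:

<code python>
import time

# Terminal statuses, taken from the get_status() list above.
TERMINAL_STATUSES = {"aborted", "failed", "partial success", "success"}

def wait_for_request(get_status, stage_id, poll_seconds=60, timeout_seconds=24 * 3600):
    """Poll get_status(stage_id) until the request reaches a terminal status."""
    waited = 0
    while True:
        status = get_status(stage_id)
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError("request %s still '%s' after %d s" % (stage_id, status, waited))
        time.sleep(poll_seconds)
        waited += poll_seconds

# With the module this would be used as, e.g.:
#   stage_id = sa.stage(surls)
#   final_status = wait_for_request(sa.get_status, stage_id)
</code>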
  
Below is an example of how to use this:
<code>
> python
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import stager_access as sa
2016-11-24 16:39:55.865000 stager_access: Parsing user credentials from /Users/renting/.stagingrc
2016-11-24 16:39:55.865111 stager_access: Creating proxy
>>> sa.prettyprint(sa.get_progress())
+ 12227
  - File count     ->  100
  - Files done     ->  40
  - Flagged abort  ->  false
  - Location       ->  fz-juelich
  - Percent done   ->  40
  - Status         ->  on hold
  - User id        ->  1919
</code>
  
  
  • Last modified: 2019-06-06 11:23
  • by Joern Kuensemoeller