Daily Image


ESCAPE data lake dress rehearsal.

Submitter: Yan Grange
Description: The ESCAPE project brings together several partners from astronomy and particle physics. The DIOS (Data Infrastructure for Open Science) work package explores frameworks for the efficient management, including access and retrieval, of large-scale distributed data (of the order of hundreds of petabytes). In this project, we investigate whether and how the data management architecture currently used by experiments such as ATLAS and CMS, generally referred to as the “datalake”, could also serve a broader set of data, including radio astronomy data (e.g. for SKA). In one of our approaches, we use data from the International LOFAR Telescope in a realistic scenario to examine what adaptations the datalake would need.

On Tuesday the 17th of November 2020, the “First Dress Rehearsal” took place. On that day, each of the participating experiments (as listed in the above image) performed one or more tests that realistically mimicked data ingestion, retrieval, processing and management. For LOFAR, the tests consisted of two independent stages:

  • The first stage mimicked the ingestion of a full-day LOFAR observation (20 TB) into the datalake.

  • The second stage consisted of retrieving some previously ingested data (200 GB), processing it (flagging, calibration, imaging, etc.) and ingesting the results back into the datalake in an automated manner.

Both stages were a success. The resulting astronomical image (Stage 2, as shown above) of the 3C196 field at 135 MHz was made by processing 2 hours of LOFAR observations on a virtual machine running on the OpenNebula cloud at SURFsara.

The experiments and their results taught us valuable lessons, including about the practical issues that arise when handling Big Data on the datalake. These lessons will be used to refine and expand our use case, carry out optimizations and provide important feedback to the datalake storage partners. In 2021, the aim is to move from the current “demonstrator datalake” towards a fully functional “datalake prototype”, for which input from all the experiments is invaluable. We look forward to playing an active role in the efficient and successful realization of the datalake prototype, and to working with it further.
Copyright: CC-BY