ASTRON is developing a science data centre that will offer optimal access to the huge amounts of radio-astronomical data produced by radio telescopes such as the Low Frequency Array (LOFAR), the Westerbork Synthesis Radio Telescope (WSRT), and the Square Kilometre Array (SKA). The data centre will provide a portal to data archives as well as to international high-performance computing systems, enabling the data analysis that leads to groundbreaking science. In close collaboration with research institutes and data centres connected to the European Open Science Cloud (EOSC), ASTRON is unlocking the capabilities of existing and future research cloud infrastructure for its community, supporting the generation and sharing of high-quality scientific results in an accessible and open manner.
Open sharing of research data according to the FAIR principles (Findable, Accessible, Interoperable, Reusable) [1] is essential to achieving the highest-quality scientific results, by enabling collaboration and peer review, and to maximising the scientific output of research, by lowering the barriers to accessing data and data-analysis capabilities. These principles have recently received a steep increase in attention across research in general. In astronomy, they have been appreciated and applied for a long time, because investigating astronomical phenomena requires combining data from widely varying instruments around the world, covering long periods of time. This has resulted in a culture of developing and applying open standards and of openly sharing data and software repositories. As such, radio astronomy now provides an example to guide the development of open data services in other research domains.
A distinguishing characteristic of radio-astronomical research is the scale at which data is generated and processed. Developments in network and computing technology are opening up new opportunities for international and cross-domain exchange of data and expertise. At the same time, these developments are leading to an exponential increase in the data volumes produced by radio telescopes, and to the emergence of a generation of internationally distributed instruments such as the International LOFAR Telescope (ILT) and the SKA. This poses significant challenges, which ASTRON addresses through international partnerships with both public and private organisations. Since the start of LOFAR operations, ASTRON has worked with ICT infrastructure partners such as SURFsara in the Netherlands, Forschungszentrum Jülich in Germany, and the Poznan Supercomputing and Networking Center in Poland to develop a distributed data archive that is astronomical in both content and scale. A science data centre for the SKA will push radio-astronomical data archives further into the exabyte scale [2].
ASTRON is participating in the European Open Science Cloud (EOSC) both to apply the best practices developed within astronomy and to use services and infrastructure developed by other research organisations. The objective is to create a European-scale research data centre that builds on the extensive expertise and capabilities of the joint research community and supports cross-domain sharing of data and services.
EOSC provides access to a scalable computing infrastructure, supporting data storage and processing services for the international research community. Within EOSC, ASTRON develops and integrates services that enable the generation and use of scientific data products. The building blocks for a European-scale science data centre include data archive access services, scientific processing workflow services, and research data repositories. A user portal is being developed to provide low-threshold access to these underlying capabilities, offering, for example, data mining, Virtual Observatory functionality, and processing pipelines running on integrated compute clusters.
Given the scale of data from instruments like LOFAR and the SKA, and the complexity of the processing required to generate science-ready data products, many researchers will benefit from access to high-performance data infrastructure and high-throughput processing capabilities without having to organise resources or set up complex software installations. To enable easy and portable deployment, ASTRON is developing container-based software installations; these, together with application images for data analysis pipelines, are distributed across the connected infrastructure. For the most data-intensive workflows, a user workspace is being realised for temporary storage of data, for example to hold intermediate results until computational resources become available for the next step in the processing workflow, or to assess data quality before ingesting data into an archive or a science data repository.
Services are made accessible either openly or through a federated Authentication and Authorisation Infrastructure (AAI). The AAI will allow users to access any service with their home-institute login account or, where that is not an option, through a social or special-purpose single sign-on (SSO) account. The objective is to minimise the number of accounts users need to create and to provide a single authentication mechanism across internationally distributed services.
A workflow management system, combined with a standardised pipeline definition language such as the Common Workflow Language (CWL), will be used to develop standard data analysis pipelines. These pipelines will be made available both for local deployment and for data processing on connected compute clusters through the user portal.
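To illustrate what a standardised pipeline definition looks like, the sketch below builds a minimal CWL v1.2 `CommandLineTool` description in Python and serialises it as JSON, which CWL runners accept because CWL documents are written in YAML and JSON is, for practical purposes, valid YAML. This is a hypothetical example, not an ASTRON pipeline: the helper `make_command_line_tool`, the `echo` command, the input name `message` and the container image `debian:stable-slim` are placeholders chosen for illustration. The optional `DockerRequirement` hint shows how a CWL step can be tied to a container image, in line with the container-based deployment described above.

```python
import json


def make_command_line_tool(base_command, input_name, docker_image=None):
    """Return a minimal CWL v1.2 CommandLineTool description as a dict.

    Hypothetical helper: the tool runs `base_command` with one string input
    bound to the first positional argument and captures stdout as its output.
    """
    tool = {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "baseCommand": base_command,
        "inputs": {
            input_name: {
                "type": "string",
                # Pass the input value as the first command-line argument.
                "inputBinding": {"position": 1},
            }
        },
        # The "stdout" output type collects the tool's standard output
        # into the file named by the top-level "stdout" field.
        "outputs": {"result": {"type": "stdout"}},
        "stdout": "result.txt",
    }
    if docker_image is not None:
        # Ask the workflow runner to execute the tool inside this container.
        tool["hints"] = {"DockerRequirement": {"dockerPull": docker_image}}
    return tool


if __name__ == "__main__":
    # "echo" and the image tag are placeholders, not an ASTRON pipeline step.
    tool = make_command_line_tool("echo", "message", docker_image="debian:stable-slim")
    print(json.dumps(tool, indent=2))
```

Saving the printed output as `echo_tool.cwl` should allow it to be run with a CWL runner such as the reference implementation `cwltool`, e.g. `cwltool echo_tool.cwl --message hello`, which would write the echoed text to `result.txt`. Because the description is declarative, the same file could in principle be executed locally or on a connected compute cluster without modification.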
As part of our commitment to community-provided services and software, ASTRON hosts the KERN repository of astronomical software packages. Furthermore, ASTRON supports open and FAIR [1] data access and is working to integrate with open data repositories and to provide data registration services, promoting wider use of referenceable and traceable persistent identifiers. These efforts will help researchers use sustainable and reliable storage for sharing data and receive proper acknowledgement for their scientific work.