A consortium led by ASTRON is developing important software components for processing the vast amounts of data that the soon-to-be largest radio telescope in the world will produce.
Published by the editorial team, 30 March 2022
The SKA Observatory (SKAO) is going to be built in phases. Right now, SKAO is working on the construction of 197 dishes in South Africa, measuring radio waves in the frequency range from 350 MHz to 15.3 GHz (SKA-mid), and around 131,000 antennae in Australia, measuring in the frequency range from 50 MHz to 350 MHz (SKA-low). Completion is expected in 2027. However, long before the completion date, SKAO will start observing the skies, producing petabytes of data. After collection, these data must be filtered and processed with sophisticated software and algorithms. For this, SKAO has signed contracts with several countries, including the Netherlands, to develop software that will not only be able to manage these vast amounts of data, but will also be relatively easy for astronomers to use.
“ASTRON leads a consortium, which also includes several companies: CGI Space, S[&]T, and TriOpSys,” says Walter Jansen, who is responsible for the ASTRON team that is currently working on this software and these algorithms. This team is called ‘Schaap’ (the Dutch word for ‘sheep’). Another team, called Rapthor, is also involved in the development of the software and algorithms, albeit not directly.
What do teams Schaap and Rapthor do? Team Schaap is developing calibration and imaging software. Team leader Stefan Wijnholds explains: “Our team is busy developing the software that handles the data processing from SKA. The raw observation data are fed into the software, and science-ready data come out.” He makes it sound easy, but the tremendous amounts of data that SKA produces also contain artifacts, errors, noise, and other data that you cannot use (right away), so you need to develop some really sophisticated software. That software must also work very efficiently, to prevent it from taking up too much computing power.
However, since SKA is still being developed, it does not produce any real data to work with yet. So how do you develop software to handle data that do not exist yet? That is where the second team, team Rapthor, comes in. This team, led by André Offringa, does not directly work for SKAO, but it does play a pivotal role in the work that team Schaap does. Where team Schaap develops the software components, team Rapthor connects these components and tests them with real data coming from the LOFAR telescope. Offringa: “SKA-low will produce data that are very comparable to the data that LOFAR produces. So, we feed real LOFAR data into the software that team Schaap has developed and look at what comes out.” Real data often turn out to be somewhat different from simulated data, exposing any weaknesses and glitches that the software might have. By finding these weaknesses early on, team Rapthor saves both team Schaap and SKAO a lot of time in getting the pipeline to work.
So, what is this pipeline? Offringa: “A pipeline is a chain of algorithms and steps – a sort of recipe – to go from your raw ingredients, the raw data, to a dish: the science-ready data.”
His Schaap colleague Wijnholds compares the components that his team develops to LEGO: “The components that we build are like building blocks. Astronomers can assemble these blocks into a construction to their liking: they can use our components to easily develop the software that they need for their data analysis.” This is much more efficient than the usual approach, where astronomers need to start from scratch in developing software that converts the raw data into the data that they need for their science. Offringa: “An astronomer can easily spend four years developing software of which 80% has already been developed by someone else, who needed slightly different data.”
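For readers who like to see the idea in code, the recipe and building-block picture can be sketched in a few lines of Python. This is only an illustrative toy, not the software that team Schaap is building: the step names (`remove_invalid`, `apply_gain`, `average_pairs`), the helper `build_pipeline`, and the sample values are all hypothetical and chosen purely to show how interchangeable components can be chained.

```python
from typing import Callable, List

Samples = List[float]
# A component is any function that takes samples and returns processed samples.
Component = Callable[[Samples], Samples]


def remove_invalid(samples: Samples) -> Samples:
    """Hypothetical flagging step: drop unusable samples (negative values here)."""
    return [s for s in samples if s >= 0.0]


def apply_gain(samples: Samples) -> Samples:
    """Hypothetical calibration step: scale every sample by a fixed gain."""
    gain = 2.0  # assumed value, for illustration only
    return [s * gain for s in samples]


def average_pairs(samples: Samples) -> Samples:
    """Hypothetical averaging step: average adjacent pairs of samples."""
    return [(a + b) / 2.0 for a, b in zip(samples[0::2], samples[1::2])]


def build_pipeline(*components: Component) -> Component:
    """Chain components so that the output of each one feeds the next."""
    def run(samples: Samples) -> Samples:
        for component in components:
            samples = component(samples)
        return samples
    return run


# Assemble the blocks into one "recipe": raw data in, processed data out.
pipeline = build_pipeline(remove_invalid, apply_gain, average_pairs)
raw = [1.0, -5.0, 3.0, 2.0, 4.0]
science_ready = pipeline(raw)

# The same blocks can be rearranged or left out for a different analysis.
uncalibrated = build_pipeline(remove_invalid, average_pairs)(raw)
```

Because every block has the same input and output shape, an astronomer with slightly different needs can swap or drop a step instead of rewriting the whole chain.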
Learning from each other
Collaborating with companies is beneficial to and necessary for ASTRON, but it works both ways. For example, the teams work with the SAFe (Scaled Agile Framework) method. Wijnholds: “This is a development method that software companies often work with. In turn, the companies can depend on ASTRON to know how the scientific data need to be treated.” After all, the processed data ultimately have to hold up to scientific standards, not commercial ones.
Every two weeks, team Schaap has a meeting and talks about what each team member wants to achieve in the upcoming two weeks, and how each of them did on the goals that were set for the previous two weeks. The team is guided by a scrum master from CGI Space. Wijnholds: “It is a very different way of working from what we scientists are used to, but it is also highly effective in this project. We are learning a lot from it. For example, you get better and better at estimating your own development velocity.”
The SKAO contract aims to deliver a properly working pipeline before the telescope goes into service, and it is more product-oriented than the work at ASTRON normally is; companies like TriOpSys, S[&]T and CGI Space are more used to working with these kinds of schedules. In turn, these companies learn from the ASTRON employees, who are far more used to working with scientific data, how these data should be handled. They can apply that knowledge to any future contracts that they get.
Science meets business
Rob van den Bergh, from CGI Space, has been team Schaap's scrum master for over a year. “The most important part of my job was to steer the team in the right direction; to make sure everyone is headed the same way at the same speed.” One of the things that stood out to him is that scientists tend to really dive into the research, and they might need a nudge every now and then to move on.
CGI Space software engineer Chiara Salvoni agrees: “Usually, software developers from the academic world focus more on achieving the perfect result from a scientific point of view, whereas software engineers are also focused on getting the software ready for the public, in this case astronomers, on time and in an easy-to-maintain state.”
Salvoni is very enthusiastic about the collaboration with ASTRON. “We’re developing algorithms which are very specific to radio astronomy. It’s much more than just building a platform; it’s building the algorithms.”
Many of these algorithms, and with them large parts of the software, have been developed earlier, bit by bit, Maik Nijhuis explains. Nijhuis is a software developer working at TriOpSys. “The amounts of data that the software is going to work with are huge, but eventually many things are very comparable to how LOFAR handles its data.” Those algorithms have been largely developed by scientists, Nijhuis says. “They tend to think from their own research perspective and add algorithms to the software which they subsequently bypass or disable if they don't work. That means there’s a lot of redundant code, which we are removing from the software.” The goal is to develop a clean, professional piece of software, which can be easily maintained. In that regard, SKAO also behaves more like a company than like a research institute, Nijhuis says. “They are very performance-oriented: at the end of the line, they expect a piece of working software.”
Scientific software developer Jakob Maljaars, however, does notice a difference with a regular commercial company as a client: “You can really notice that our work is being funded in a research context; we do not experience the pressure that many commercial clients tend to put on you. And the timeframe is much longer than you normally experience with a commercial client: around ten years.” Maljaars works at S[&]T. ASTRON and S[&]T have worked closely together for years. For example, S[&]T is also involved in ASTRON’s DISTURB project. Maljaars: “I think S[&]T mainly takes care of getting the ideas from R&D, and translating them into properly tested, easy-to-maintain, and efficient code.”
More time for science
In the end, everything comes together: the astronomical and scientific knowledge from ASTRON, and the more commercially minded approach to software development (efficient, easy-to-use software) from CGI Space, S[&]T, and TriOpSys.
When team Schaap’s work finishes – with the help of team Rapthor – there will be a framework for a pipeline in which components are easily interchangeable: a flexible pipeline. With this flexible pipeline, astronomers and operators of SKA can quickly create the software that they need for their data analysis, without the need to start from scratch every time. This means that there will be more time for science, optimizing the scientific output of SKAO.