Can IT Support the Big Bang Theory?
An interview with Alberto Di Meglio - Head of CERN openlab, IT Department, CERN - European Organization for Nuclear Research.
IDC: How can you plan and build an IT environment to support future-looking cosmic concerns with unknown outcomes – in other words, how can you future-proof your technology environment?
ADM: Designing, building, deploying, and operating the infrastructure for a multi-decade program like the LHC (Large Hadron Collider) requires a lot of forward thinking and strong collaboration within our community and with industry. We try to define what the computing and data requirements will be five to ten years down the line and discuss these requirements openly. We have collaborations in place, like CERN openlab, which I’m responsible for, where technology trends can be assessed, requirements discussed, and prototypes built. In a way, by doing this, not only do we future-proof our environment, but we often help industry to understand what might hit them in more typical consumer or industrial markets in a few years’ time. For production systems, we also tend not to rely on the very latest, most performant, and therefore very expensive systems. We rather look for the best value and the most efficient solutions.
IDC: What is the key part of this infrastructure? Is it related to the huge data management of billions of particle collisions? Or is there more investment in the analytics side – to assess this data and the results of the collisions?
ADM: There is no single key part. Going from the physical effects of a proton-proton collision to Nobel Prize-worthy scientific evidence of the discovery of a new particle requires many steps. We need efficient data acquisition and filtering systems capable of working at rates of petabytes per second, reliable storage systems holding the many tens of petabytes of information produced per year, and an efficient worldwide distributed infrastructure to move and analyze data 365 days a year at sustained rates of 12 GB/s. The LHC program is continuously revised to identify priority areas for upgrades and improvements. Today, major investments are made in the data acquisition systems, in redesigning our software to exploit modern multi-core platforms, and in deploying or procuring cost-effective cloud infrastructures.
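To give a sense of the scale implied by the rates Di Meglio quotes, here is a rough back-of-the-envelope calculation (illustrative arithmetic only, not an official CERN figure) of what a sustained 12 GB/s transfer rate amounts to over a full year of round-the-clock operation:

```python
# Illustrative arithmetic: total data moved in one year at a sustained
# 12 GB/s across the worldwide distributed infrastructure.

SUSTAINED_RATE_GB_S = 12               # sustained transfer rate (GB/s)
SECONDS_PER_YEAR = 365 * 24 * 3600     # 365 days of continuous operation

total_gb = SUSTAINED_RATE_GB_S * SECONDS_PER_YEAR
total_pb = total_gb / 1_000_000        # 1 PB = 10^6 GB (decimal units)

print(f"{total_pb:.0f} PB moved per year at a sustained 12 GB/s")
```

That is hundreds of petabytes of data movement per year, which is why the acquisition, filtering, and distribution layers all have to be engineered together rather than treated as separate problems.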
IDC: Tell me about your major cloud implementations. Does CERN really use public cloud? What can businesses and other organizations learn from CERN’s use of cloud computing?
ADM: For the moment, we have our own private cloud infrastructure based on OpenStack. It spans two main datacentres in Geneva and Budapest. Together with other groups in the IT department, we are, however, very actively experimenting with the idea of using public and hybrid cloud infrastructures, for example, as part of the Helix Nebula initiative. Our main requirement is to be able to seamlessly use more than one cloud provider, share data across providers and infrastructures, implement reliable monitoring processes, and, above all, to allow our distributed community of scientists to carry out their work in the simplest way possible.
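The key requirement Di Meglio describes, using more than one cloud provider seamlessly, is essentially a question of putting a common interface in front of heterogeneous back ends. A minimal sketch of that idea (hypothetical class and method names; this is not CERN's actual implementation) might look like:

```python
# Minimal sketch of a provider-agnostic cloud interface: scientists call
# run_analysis() and never need to know which cloud executes the job.
# All class and method names here are illustrative, not a real CERN API.
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Common interface; concrete subclasses wrap each vendor's own API."""

    @abstractmethod
    def launch_job(self, image: str, cores: int) -> str: ...

    @abstractmethod
    def fetch_output(self, job_id: str) -> bytes: ...


class PrivateOpenStack(CloudProvider):
    # Hypothetical wrapper around an in-house OpenStack cloud.
    def launch_job(self, image: str, cores: int) -> str:
        return f"openstack-{image}-{cores}"

    def fetch_output(self, job_id: str) -> bytes:
        return b"results from private cloud"


class PublicCloud(CloudProvider):
    # Hypothetical wrapper around an external commercial provider.
    def launch_job(self, image: str, cores: int) -> str:
        return f"public-{image}-{cores}"

    def fetch_output(self, job_id: str) -> bytes:
        return b"results from public cloud"


def run_analysis(provider: CloudProvider, image: str, cores: int) -> bytes:
    """Submit an analysis job and retrieve its output, provider-agnostically."""
    job_id = provider.launch_job(image, cores)
    return provider.fetch_output(job_id)
```

The same `run_analysis` call works against the private OpenStack infrastructure or a public provider, which is the "simplest way possible" property the distributed community of scientists needs.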
IDC: Most of our readers are CIOs in the private sector, who get their marching orders from their CEO and board of directors. Who decides what your KPIs are? And what are your KPIs currently?
ADM: Well, for us this is indeed slightly different. We have hierarchically equivalent roles (the director general, the directorate, the CERN Council, and so on), but our main concern is to collectively satisfy the technical requirements of the LHC accelerator and the experiments, while fitting into the substantially fixed yearly budget coming from the Member States. Operationally, we have clear KPIs and metrics and all the means to measure them; however, the main drivers come from the LHC Experiments Computing Models, which describe the estimated requirements in terms of computing power, data storage, and network capacity. In a way, we have two main measures of success: the discovery of new physics, and how much of the technology produced to get there gets injected back into other scientific domains and industry.
IDC: Can you please share some of the exploratory tech projects you are currently working on? What kinds of projects do you have in machine learning? Cognitive computing? IoT? What use cases would you have for these technologies?
ADM: About two years ago, we produced a whitepaper on the major technology challenges in scientific research. This paper was used to drive the priorities for the current research projects in CERN openlab. They span data acquisition (redesign of the systems to use multi-core devices and very fast interconnects), code modernization (rewrite millions of lines of code to exploit vectorization, parallelization, and other modern techniques), cloud federations and containers, use of machine learning for physics events indexing and proactive maintenance of the LHC control systems, assessment of new storage technologies and in-memory applications, use of SDN for network transfers optimization and security, and many others.
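The "code modernization" effort Di Meglio mentions, rewriting code to exploit vectorization and parallelization, can be illustrated with a toy example (not actual LHC code; the simplistic one-dimensional invariant-mass formula here is purely for illustration). The scalar loop and the vectorized NumPy version compute the same quantity, but the vectorized form operates on whole arrays at once and maps onto SIMD hardware:

```python
# Toy illustration of vectorization: compute a (simplified, 1-D) invariant
# mass m = sqrt((E1 + E2)^2 - (p1 + p2)^2) for many candidate pairs.
import numpy as np


def invariant_mass_loop(e1, p1, e2, p2):
    # Scalar style: one candidate pair per loop iteration.
    out = []
    for a, b, c, d in zip(e1, p1, e2, p2):
        out.append(((a + c) ** 2 - (b + d) ** 2) ** 0.5)
    return out


def invariant_mass_vec(e1, p1, e2, p2):
    # Vectorized style: whole arrays at once, letting NumPy use SIMD
    # units and avoiding the Python interpreter in the inner loop.
    return np.sqrt((e1 + e2) ** 2 - (p1 + p2) ** 2)
```

Applied across millions of lines of analysis code, rewrites of this kind are what let the experiments exploit the wide vector registers and many cores of modern platforms.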
IDC: What keeps you awake at night?
ADM: I’m lucky enough to be able to sleep quite well knowing that there is no chance my job will ever get boring! My main concern is to find enough time to follow the many amazing technologies being developed today and to have the necessary vision to spot the ones that will really disrupt the way we do research and allow us to keep shedding light on the mysteries of the universe.
Alberto Di Meglio is the Head of CERN openlab, a unit in the CERN IT Department responsible for managing joint research collaborations between CERN and industrial companies and research institutes in the ICT sector to support the long-term research objectives of the LHC accelerator and other CERN initiatives. He is also responsible for the Large-Scale Computing and Data initiative within the CERN Medical Applications Project Forum (CMAPF). Alberto is an Aerospace Engineer (MEng) and Electronic Engineer (PhD) by education and has extensive experience in the design, development, and deployment of distributed computing and data infrastructures for both commercial and research applications. Before joining CERN openlab, first as CTO in 2013, Alberto was Project Director of the European Middleware Initiative (EMI), a project responsible for developing and maintaining most of the software services used today in the Worldwide LHC Computing Grid.