

HPC challenges for new extreme scale applications

Exascale machines are now available, based on several different arithmetics (from 64-bit down to 16- and 8-bit, including mixed-precision versions, some of which no longer follow the IEEE standard) and on different architectures (with network-on-chip processors and/or accelerators). Some execution and programming paradigms are being rehabilitated, such as data-flow models and data parallelism without shared memories. Brain-scale applications, from machine learning and AI for example, manipulate many huge graphs that lead to very sparse non-symmetric linear algebra problems, resulting in performance closer to the HPCG benchmark than to the LINPACK one.

End-users and scientists face many challenges associated with these evolutions and with the increasing size of the data. The convergence of data science (big data) and computational science to develop new applications generates important challenges.

This two-day workshop aims to bring together senior scientists in the field of HPC and some major applications associated with it, to brainstorm on those challenges and propose potential research collaborations. The number of invited participants is expected to be fewer than 35 (mainly from Asia, the USA, and Europe). We plan to organize panels, combined with some talks.



Christophe Calvin


Ewa Deelman


Kengo Nakajima


Serge Petiton


Christophe Calvin: Deputy to the Director of Fundamental Research at CEA, in charge of HPC
Ewa Deelman: Research Professor and Research Director, USC Information Sciences Institute
Kengo Nakajima: Professor, Supercomputing Division, Information Technology Center, The University of Tokyo
Serge Petiton: Professor, Université de Lille; Centre de Recherche en Informatique, Signal et Automatique de Lille, CNRS


From March 6 to March 7, 2023


9h30 - 16h00

Hôtel Pullman Paris Montparnasse

19 Rue du Commandant René Mouchotte, 75014 Paris

Ecole Polytechnique


Organizers and partners


8h30 – 8h45 Welcome  
8h45 – 10h15   Chairperson : Christophe Calvin / CEA
8h45 HPC challenges and new computing frontiers Serge Petiton (U. Lille)
9h15 Innovative Supercomputing by Integration of Simulation/Data/Learning Kengo Nakajima (U. Tokyo/RIKEN)
9h45 Intelligent Simulations Will Demand New Extreme-scale Computing Capabilities Ian Foster (U. Chicago)
10h15 – 10h45 Coffee and tea break  
10h45 – 12h45   Chairperson : Kengo Nakajima
10h45 Exascale challenge : An overview of the French NumPeX project and of some challenges in Astronomy, Earth and Environmental sciences Jean-Pierre Vilotte (CNRS-INSU, IPGP)
11h15 A challenge of exploiting low precision computing in iterative linear solvers Takeshi Fukaya (U. Hokkaido)
11h45 ML/AI Research Directions within the US Department of Energy Osni Marques (LBNL)
12h15 Large-Scale Graph Neural Networks for Real-World Industrial Applications Toyotaro Suzumura (U. Tokyo)
12h45 – 13h30 Lunch, on site  
13h30 – 15h30   Chairperson : France Boillod-Cerneux / CEA
13h30 Modeling a novel laser-driven electron accelerator concept : Particle-In-Cell simulations at the exascale Neïl Zaim (CEA)
14h00 Towards Integrated Hardware/Software Ecosystems for the Edge-Cloud-HPC Continuum : the Transcontinuum Initiative Gabriel Antoniu (INRIA)
14h30 Tiny, Tiny Tasks – Huge Impact Ivo Kabadshow (FZJ)
15h00 Heterogeneous system for exascale using h3-Open-SYS/WaitIO Shinji Sumimoto (U. Tokyo)
15h30 – 16h00 Coffee and tea break  
16h00 – 17h30   Chairperson: Nahid Emad / U. Paris-Saclay
16h00 How much can we really compress scientific data Franck Cappello (ANL)
16h30 Programming Systems for Heterogeneous Memory Architectures Christian Terboven (RWTH)
17h00 FasTensor : Efficient Tensor Computation for Large-Scale Data Analysis Kesheng  Wu (LBNL)
19h30 Dinner « Chez Françoise »


09h30 – 10h00 Discussions Moderator : Kengo Nakajima
10h00 – 12h00   Chair : France Boillod-Cerneux
10h00 Exascale challenges and opportunities for fundamental research Christophe Calvin (CEA)
10h30 Multi-Hybrid Device Programming and Application by Uniform Language Taisuke Boku (U. Tsukuba)
11h00 France within the international Exascale ecosystem France Boillod-Cerneux (CEA)
11h30 Development of a heterogeneous coupling library h3-Open-UTIL/MP Takashi Arakawa (U. Tokyo)
12h00 – 14h00 Lunch break (on site for in-person participants)
14h00 – 16h30   Chairperson : Osni Marques
14h00 Towards Next JCAHPC System Toshihiro Hanawa (U. Tokyo)
14h30 Hybrid AI/HPC Approaches and Linear Algebra Nahid Emad (UVSQ)
15h00 Extreme Scale, Tissue Analytics and AI Joel Saltz (U. Stony-Brook)
15h30 Living in a Heterogeneous World : How scientific workflows bridge diverse cyberinfrastructure Ewa Deelman (USC)
16h00 – 17h00 Discussions Moderator : Serge Petiton



HPC challenges and new computing frontiers | Serge G. Petiton, U. of Lille

Existing exascale supercomputers have been designed primarily for computational science, not for machine learning and AI. The increase in the number of nodes, on the one hand, and the network-on-chip, on the other, add two more levels of programming: the task graph at the top level and distributed on-chip computing at the bottom level. In addition, the recent evolution of new processors and arithmetics for new applications, maturing after the convergence of big data and HPC with machine learning, will generate post-exascale computing that will redefine some programming and application development paradigms.
In this talk, I review some results obtained for sparse linear algebra for iterative methods and also for machine learning methods. I also discuss the potential evolution we would face in being able to combine computational science, data science and machine learning on future faster supercomputers.


Innovative Supercomputing by Integration of Simulation/Data/Learning | Kengo Nakajima, The University of Tokyo/RIKEN R-CCS

Supercomputing is shifting from traditional simulation for computational science to integration with data science and machine learning/AI. Since 2015, the Information Technology Center of the University of Tokyo (ITC/U.Tokyo) has been working on the "Big Data & Extreme Computing (BDEC)" project, aimed at new supercomputing through the integration of "Simulation/Data/Learning (S+D+L)". In May 2021, Wisteria/BDEC-01, the first system of the BDEC project, began operation. Wisteria/BDEC-01 has a total peak performance of 33+ PF and consists of a simulation node group (Odyssey) of 7,680 A64FX nodes and a data/learning node group (Aquarius) equipped with 360 NVIDIA A100 GPUs. Some nodes of Aquarius are directly connected to the outside, enabling real-time acquisition of observational data via SINET. Since 2019, with the support of a Grant-in-Aid for Scientific Research (S), we have been developing an innovative software platform, h3-Open-BDEC, that realizes the integration of (S+D+L). This integration is now being realized on Wisteria/BDEC-01. These activities are described in the talk, together with future perspectives.



Intelligent Simulations Will Demand New Extreme-scale Computing Capabilities | Ian Foster, University of Chicago

The search for ever more accurate and detailed simulations of physical phenomena has driven decades of improvements in both supercomputer architecture and computational methods. The next several orders of magnitude of improvement are likely to come, at least in part, from the use of machine learning and artificial intelligence methods to learn approximations to complex functions and to assist in navigating complex search spaces. Without any aspiration for completeness, I will review some relevant activities in this space and suggest implications for post-exascale research.


Exascale challenge : An overview of the French NumPeX project and of some challenges in Astronomy, Earth and Environmental sciences | Jean-Pierre Vilotte, CNRS-INSU, IPGP

In this presentation we will provide a brief overview of the new French project NumPeX, which is developing a science-driven exascale software stack for capable exascale systems. This will be illustrated by some data-driven exascale challenges in the context of the Square Kilometre Array (SKA), the large observational infrastructure in radio astronomy, and of Earth system and environment modelling, emphasising new data-driven exascale needs in HPC/HPDA/ML.


A challenge of exploiting low precision computing in iterative linear solvers | Takeshi Fukaya, Hokkaido University

Recently, the use of low precision computing such as FP32 and FP16 has attracted much attention, and mixed precision methods that can exploit it efficiently have been actively investigated in the field of numerical linear algebra. In this talk, we present our attempt to develop a mixed precision iterative linear solver that exploits low precision computing while providing a numerical solution as accurate as that of conventional methods using only FP64. We focus on the restarted GMRES method, the so-called GMRES(m) method, and introduce low precision computing into it based on the underlying structure of the iterative refinement scheme. Through numerical experiments, we investigate the possibility of aggressively using low precision computing in the GMRES(m) method and discuss issues for further performance improvement.
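
The iterative refinement structure mentioned above can be sketched in a few lines. The following toy example is our own illustration, not the authors' GMRES(m) solver: a direct 2x2 "inner solve" stands in for GMRES, and rounding to three significant digits simulates a low-precision format, while residuals are accumulated in full FP64.

```python
def to_low_precision(x, digits=3):
    """Simulate a low-precision format by rounding to a few significant digits."""
    return float(f"{x:.{digits}g}")

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def low_precision_solve(A, b):
    """'Inner solver': a direct 2x2 solve carried out in rounded arithmetic.
    In a real mixed precision solver this role is played by GMRES(m) in FP32/FP16."""
    det = to_low_precision(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    x0 = to_low_precision((b[0] * A[1][1] - b[1] * A[0][1]) / det)
    x1 = to_low_precision((A[0][0] * b[1] - A[1][0] * b[0]) / det)
    return [x0, x1]

def mixed_precision_refine(A, b, iters=10):
    x = low_precision_solve(A, b)                         # low-precision initial solve
    for _ in range(iters):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]  # residual in full precision
        d = low_precision_solve(A, r)                     # low-precision correction
        x = [xi + di for xi, di in zip(x, d)]             # high-precision update
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = mixed_precision_refine(A, b)   # converges toward the exact solution (1/11, 7/11)
```

Each refinement step contracts the error by roughly the accuracy of the low-precision solve, so a handful of cheap inner solves recovers FP64-level accuracy.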



ML/AI Research Directions within the US Department of Energy | Osni Marques, LBNL

In this presentation, I will summarize some of the research directions within the US Department of Energy (DOE) specifically targeting ML and AI for science and energy applications. The presentation will be based on the report of the Basic Research Needs Workshop for Scientific Machine Learning – Core Technologies for Artificial Intelligence, held in 2018. The report discusses opportunities for research in and past the exascale era. I will also summarize efforts in DOE related to mixed precision computations.


Large-Scale Graph Neural Networks for Real-World Industrial Applications | Toyotaro Suzumura, The University of Tokyo

A graph or network is a powerful data structure that can represent relationships between any entities in both the digital world and the physical world. The way of analyzing graphs has been advancing from algorithm-based approaches to data-driven approaches with machine learning and neural networks, just as for other types of data such as text, images, and speech. In this talk, I will discuss how graph neural networks have emerged as a powerful learning paradigm that backs up conventional graph algorithm-based approaches, and also introduce our ongoing research projects and collaborations with industry around graph neural networks, such as recommendation systems. I will briefly introduce a nationwide cloud computing project called "mdx" as well as a nationwide materials informatics project named ARIM (Advanced Research Infrastructure for Materials and Nanotechnology).



Modeling a novel laser-driven electron accelerator concept: Particle-In-Cell simulations at the exascale | Neïl Zaim, CEA/DRF/IRAMIS/LYDL

Intense femtosecond lasers focused on low-density gas jets can accelerate ultra-short electron bunches up to very high energies (from hundreds of MeV to several GeV) over a few millimeters or a few centimeters. However, conventional laser-driven electron acceleration schemes do not provide enough charge for most of the foreseen applications. To address this issue, we have devised a novel scheme consisting of a gas jet coupled to a solid target to accelerate substantially more charge. In 2022 we validated this concept with proof-of-principle experiments at the LOA laser facility (France), and with a large-scale Particle-In-Cell simulation campaign, carried out with the open-source WarpX code. Performing such simulations requires the use of the most powerful supercomputers in the world, as well as advanced numerical techniques such as mesh refinement, which is very challenging to implement in an electromagnetic Particle-In-Cell code, and indeed unique to the WarpX code. A work describing the technical challenges that we addressed to make these simulations possible was awarded the Gordon Bell prize in 2022. In this contribution, we will also discuss the performance portability of the WarpX code by presenting scaling tests on Frontier, Fugaku, Summit, and Perlmutter supercomputers.


Towards Integrated Hardware/Software Ecosystems for the Edge-Cloud-HPC Continuum : the Transcontinuum Initiative | Gabriel Antoniu, INRIA

Modern use cases such as autonomous vehicles, digital twins, smart buildings and precision agriculture, greatly increase the complexity of application workflows. They typically combine physics-based simulations, analysis of large data volumes and machine learning and require a hybrid execution infrastructure: edge devices create streams of input data, which are processed by data analytics and machine learning applications in the Cloud, and simulations on large, specialised HPC systems provide insights into and prediction of future system state. All of these steps pose different requirements for the best suited execution platforms, and they need to be connected in an efficient and secure way. This assembly is called the Computing Continuum (CC). It raises challenges at multiple levels: at the application level, innovative algorithms are needed to bridge simulations, machine learning and data-driven analytics; at the middleware level, adequate tools must enable efficient deployment, scheduling and orchestration of the workflow components across the whole distributed infrastructure; and, finally, a capable resource management system must allocate a suitable set of components of the infrastructure to run the application workflow, preferably in a dynamic and adaptive way, taking into account the specific capabilities of each component of the underlying heterogeneous infrastructure. This talk discusses these challenges and introduces TCI – the Transcontinuum Initiative – a European multidisciplinary collaborative action aiming to identify the related gaps for both hardware and software infrastructures to build CC use cases, with the ultimate goal of accelerating scientific discovery, improving timeliness, quality and sustainability of engineering artefacts, and supporting decisions in complex and potentially urgent situations.


Tiny, Tiny Tasks - Huge Impact | Ivo Kabadshow, FZJ

Programming today's supercomputers and upcoming exascale hardware requires us to deal with hierarchical and heterogeneous parallelism. To harvest these FLOPs efficiently, a lot of fine-grained parallelism in the examined algorithm needs to be uncovered and exploited. This raises the question of whether we can utilize the hidden performance of high-level languages with greater abstraction possibilities, like C++.
In this talk, we present our current efforts towards a performance-portable C++ tasking layer for tiny, inter-dependent tasks. Especially algorithms that cannot rely on weak scaling, like MD, need to be carefully investigated when going to extreme scale.
Our goal is to reduce the per-task overhead to a minimum and allow execution of the task-dependency graph alongside the critical path of the algorithm. By utilizing ready-to-execute tasks and multiple typed task queues per core, work sharing and work stealing can be optimized on NUMA architectures with hierarchical memory.
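
The ready-to-execute task idea above can be illustrated with a minimal, single-threaded scheduler (a hypothetical sketch, not the FZJ tasking layer): a task enters the ready queue once all of its predecessors in the dependency graph have finished.

```python
from collections import deque

def run_task_graph(tasks, deps):
    """tasks: name -> callable; deps: name -> set of prerequisite task names."""
    remaining = {t: set(d) for t, d in deps.items()}   # unmet dependencies per task
    dependents = {t: [] for t in tasks}                # reverse edges
    for t, d in deps.items():
        for p in d:
            dependents[p].append(t)
    ready = deque(t for t, d in remaining.items() if not d)  # zero-dependency tasks
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()               # a real runtime would dispatch this to a worker queue
        order.append(t)
        for s in dependents[t]:  # mark this task done for its successors
            remaining[s].discard(t)
            if not remaining[s]:
                ready.append(s)  # successor becomes ready-to-execute
    return order

# A tiny diamond-shaped dependency graph: a -> (b, c) -> d
results = {}
tasks = {
    "a": lambda: results.setdefault("a", 1),
    "b": lambda: results.setdefault("b", results["a"] + 1),
    "c": lambda: results.setdefault("c", results["a"] * 2),
    "d": lambda: results.setdefault("d", results["b"] + results["c"]),
}
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
order = run_task_graph(tasks, deps)
```

A production tasking layer adds the parts this sketch omits: multiple typed queues per core, work stealing, and keeping per-task overhead far below the task's own cost.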


Heterogeneous system for exascale using h3-Open-SYS/WaitIO | Shinji Sumimoto, The University of Tokyo

This talk presents a system-wide communication library, h3-Open-SYS/WaitIO (WaitIO for short), that couples multiple MPI programs for heterogeneous coupling computing. WaitIO provides an inter-program communication environment among MPI programs and supports different MPI libraries with various interconnects and processor types. This talk discusses how complicated problems should be solved in such heterogeneous systems. We have developed the WaitIO communication library to realize these environments, and we present how WaitIO works and performs in such heterogeneous computing environments.



How much can we really compress scientific data | Franck Cappello, ANL

In 2016, the lossy compression of scientific data was in its infancy: few compressors, no rigorous evaluation methodology, and few users.
Exascale made it mandatory for many applications. We are observing a rapid adoption by many communities of modern compressors that are much faster, more effective, and more trustworthy than the initial ones. In this talk, we will address the questions that intrigued potential new users often ask: how do modern lossy compressors work? How do other scientists use them? What are the use cases and the current results of scientific data compression in different domains? How fast can we compress? And, more importantly: how can we trust lossy compressors? We will mainly focus on the SZ lossy compressor developed during the Exascale Computing Project in the USA. Finally, while the lossy compression of scientific data has made dramatic progress, we are still observing significant performance improvements, which raises the question: how much can we really compress scientific data?
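
The core mechanism behind error-bounded lossy compressors such as SZ can be illustrated, in heavily simplified form, by uniform scalar quantization. This sketch is ours and omits SZ's prediction and entropy-coding stages: each value becomes a small integer bin index, and the reconstruction is guaranteed to differ from the original by at most the requested absolute error bound.

```python
def compress(values, bound):
    """Map each value to an integer bin index; bin width 2*bound guarantees
    that reconstruction error never exceeds `bound`."""
    width = 2.0 * bound
    return [round(v / width) for v in values]

def decompress(codes, bound):
    """Reconstruct each value as the center of its bin."""
    width = 2.0 * bound
    return [c * width for c in codes]

data = [0.0, 0.013, 0.25, 0.499, 1.0, 3.14159]
bound = 0.01                                  # absolute error bound requested by the user
codes = compress(data, bound)                 # small integers, cheap to entropy-code
recon = decompress(codes, bound)
max_err = max(abs(a - b) for a, b in zip(data, recon))   # always <= bound
```

Real compressors first predict each value from its neighbors and quantize only the prediction error, which concentrates the codes near zero and makes the entropy-coding stage far more effective.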


Programming Systems for Heterogeneous Memory Architectures | Christian Terboven, RWTH

The memory subsystem is changing: the evolution of the cache hierarchy is being followed by new technologies with new kinds of memory, ranging from high-bandwidth (HBM) to large-capacity (NVM). But applications usually have to be heavily modified to use the different kinds of memory. A portable, vendor-neutral view of heterogeneous memory could come in the form of a hierarchy of abstractions to cope with the variety of existing hardware. Further, as the memory with the highest bandwidth is limited in capacity, decisions about data placement across the different kinds of memory are required, also considering changes in data access patterns over time.
A hierarchy of programming abstractions to expose and manage heterogeneous memory at different levels of detail and control allows the programmer to express hints (traits) for allocations that describe how data is used and accessed. Combined with characteristics of the platforms’ memory subsystem, these traits are exploited by strategies to decide where to place data items. This talk will first present a characterization of different kinds of memory, and then evaluate support for heterogeneous memory in OpenMP and higher-level middleware.


FasTensor: Efficient Tensor Computation for Large-Scale Data Analysis | Kesheng Wu, Bin Dong, and Suren Byna, LBNL

A key strategy for processing large-scale scientific data is to parallelize the analysis tasks to harness the power of the many cores on modern CPUs and GPUs. Before the advent of big data systems, this type of parallel processing required dedicated custom computer hardware and software, or very expensive parallel database systems. Fortunately, along with the growth of internet businesses, a data processing revolution has emerged, and big data technology has spread quickly. As exemplified by the MapReduce (MR) system, these systems enable complex data analyses without requiring users to prescribe the details of parallel execution, data management, or error handling. However, for large-scale data from scientific simulations, experiments, and observations, many common operations, such as convolution, are hard to describe and slow to execute with the current generation of big data systems.
This presentation describes a new design named FasTensor to address these challenges.


Exascale challenges and opportunities for fundamental research | Christophe Calvin, CEA
With exascale come new challenges: the processing of massive data coupled with numerical simulation becomes intrinsic to science. In addition, the constraints brought by exascale computing architectures require rethinking scientific applications as well. We are therefore faced with two major challenges. The first: how the new exascale supercomputers, embedded in a digital continuum, will be able to provide solutions for processing complex workflows combining data processing and simulation. The second: how to design or redesign applications so that they can exploit the architectures of exascale supercomputers. We will illustrate these two challenges through different use cases from the CEA's Fundamental Research Division.

Multi-Hybrid Device Programming and Application by Uniform Language | Taisuke Boku, Center for Computational Sciences, University of Tsukuba

While the GPU is the most powerful main player on advanced HPC platforms, its applicability is limited to applications or code sections whose dominant computation consists of highly parallel, regular computation. Especially when the computation capability is limited by memory capacity, we face a severe strong-scaling challenge to improve time to solution. We have been researching how to apply FPGAs together with GPUs so that devices with different performance behavior compensate for each other. I will introduce several programming approaches, including a single OpenACC notation to program both GPU and FPGA simultaneously, based on our original language system. In the best case, we achieved 10x performance in time to solution compared with the GPU-only case.


France within the international Exascale ecosystem | France Boillod-Cerneux, CEA
Europe has embarked on the international exascale race with the acquisition of two exascale-class machines in the next two to three years. France has applied to host one of these two computers through the Exascale-France project, coordinated by GENCI. One of the issues around exascale is certainly the support of application communities and the development of uses. One of the actions initiated within the framework of this project was to make as precise an inventory as possible of this national application community, evaluating the maturity of the codes vis-à-vis exascale and estimating the resources needed for scaling up. During this presentation we will also address the problem of application benchmarks, which is one of the important elements in this type of project: indeed, depending on the role of each actor (developer, machine administrator, vendor), the objective of a benchmark is not the same, yet it is a common object and certainly an important element in the famous co-design process.

Development of a heterogeneous coupling library h3-Open-UTIL/MP | Takashi Arakawa, CliMTech/The University of Tokyo

"Heterogeneity" is one of the keywords of recent years in high-performance computing. In fact, the majority of the systems at the top of the TOP500 list are heterogeneous systems composed of CPUs and GPUs. This heterogeneity can be classified into two categories. One is intra-node heterogeneity, such as GPU machines and VE machines. The other is inter-node heterogeneity, such as CPU node + GPU node, or CPU node + VE node. Our presentation will focus on the latter kind of system. The reason for developing such systems is that the role of HPC has expanded beyond simple simulation to large-scale data analysis and machine learning. Therefore, software that allows simulation programs to interact with data analysis/AI programs on heterogeneous systems is required. Against this background, we are developing a heterogeneous coupling library, h3-Open-UTIL/MP, as a part of the h3-Open-BDEC project. h3-Open-UTIL/MP is a general-purpose coupling library which can couple any simulation models and applications that meet the following two conditions: 1) they have uniquely numbered, time-invariant grid points, and 2) the time interval of data exchange does not change over time. In addition, it can couple applications in a heterogeneous environment in collaboration with the communication library h3-Open-SYS/WaitIO. In our presentation, we will describe the structure and functions of h3-Open-UTIL/MP and discuss the results of performance measurements. Furthermore, we introduce one of the applications: the coupling of the atmospheric model NICAM with the machine learning library PyTorch.



Towards Next JCAHPC System | Toshihiro Hanawa, The University of Tokyo

JCAHPC (Joint Center for Advanced HPC) is a virtual organization between the Center for Computational Sciences at the University of Tsukuba (CCS) and the Information Technology Center at the University of Tokyo (ITC) to design, operate, and manage next-generation supercomputer systems. We plan to introduce the "Oakforest-PACS II" system as the successor to the Oakforest-PACS system in FY2024, targeting a peak performance of 200 PFLOPS, mostly using GPUs as accelerators. In this talk, the efforts to design the new system and to port existing applications to the GPU system will be presented.



Hybrid AI/HPC Approaches and Linear Algebra | Nahid Emad, Paris-Saclay University/Versailles
We highlight the omnipresence of certain linear algebra methods such as the eigenvalue problem or more generally the singular value decomposition in machine learning techniques. We consider some examples of applications by highlighting the essential role of these methods. The ever-increasing production of data requires new methodological and technological approaches to meet the challenge of their effective analyses. A new machine learning approach based on the Unite and Conquer methods will be presented. This intrinsically parallel and scalable technique can be implemented with synchronous or asynchronous communications. Experimental results, demonstrating the interest of the approach for an effective analysis of data in the case of clustering and anomaly detection will be presented.
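
As a small illustration of why eigenvalue computations pervade machine learning (our own sketch, unrelated to the Unite and Conquer implementation), power iteration extracts the dominant eigenvector of a symmetric matrix, the kernel operation behind PCA and spectral clustering.

```python
import math

def power_iteration(A, iters=200):
    """Return (dominant eigenvalue, eigenvector) of a symmetric matrix A."""
    n = len(A)
    v = [1.0] * n
    for _ in range(iters):
        # Multiply by A and renormalize; components along the dominant
        # eigenvector grow fastest, so v converges to that direction.
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v recovers the dominant eigenvalue.
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Av[i] for i in range(n))
    return lam, v

A = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 3 and 1
lam, v = power_iteration(A)
```

In PCA, A would be the data covariance matrix and v the first principal component; spectral clustering applies the same machinery to a graph Laplacian.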

Extreme Scale, Tissue Analytics and AI | Joel Saltz, Stony Brook University

I will discuss the vision and challenge of creating models and fine-tuning pipelines capable of 1) carrying out complex pathology image classification tasks, 2) answering nuanced questions requiring examples from comparable patients and scientific literature citations, and 3) finding and/or generating examples of similar cases that differ in subtle but important details. The tremendous success of large language models such as GPT-3 and BERT, along with the demonstrated ability to tune such models to carry out discourse (e.g. ChatGPT), suggests that over the coming few years this ambitious goal may be realizable. I will discuss the prospects and challenges associated with this ambitious project.


Living in a Heterogeneous World: How scientific workflows bridge diverse cyberinfrastructure | Ewa Deelman, USC

Pegasus (http://pegasus.isi.edu) automates the process of executing science workflows on modern cyberinfrastructure. It takes high-level, resource-independent descriptions and maps them onto the available heterogeneous resources: campus clusters, high-performance computing resources, high-throughput resources, clouds, and the edge. This talk describes the challenges and opportunities of workflow management in this heterogeneous context.



Gabriel Antoniu FR INRIA
Takashi Arakawa JPN U. Tokyo
France Boillod-Cerneux FR CEA
Taisuke Boku JPN U. Tsukuba
Christophe Calvin FR CEA
Franck Cappello USA ANL
Eva Cruck FR DGA
Ewa Deelman USA USC
Nahid Emad FR U. Paris-Saclay
Ian Foster USA U. Chicago
Takeshi Fukaya JPN U. Hokkaido
Bernd Haag FR DGA
Toshihiro Hanawa JPN U. Tokyo
Ivo Kabadshow DE FZJ
Osni Marques USA LBNL
Jean-Patrick Mascomere FR TOTAL
Matthias Müller DE RWTH
Kengo Nakajima JPN U. Tokyo
Serge Petiton FR U. Lille
Joel Saltz USA U. Stony-Brook
Shinji Sumimoto JPN U. Tokyo
Toyotaro Suzumura JPN U. Tokyo
Christian Terboven DE RWTH
Jean-Pierre Vilotte FR IPGP
Kesheng Wu USA LBNL
Neïl Zaim FR CEA