

## A few hints for post-exascale architectures



#### Marc Duranton and Denis Dutoit

Commissariat à l'énergie atomique et aux énergies alternatives

May 23rd, 2019, Aristote seminar: "En route vers l'exascale!" ("On the road to exascale!")

#### This presentation is based on the work done in ETP4HPC and in HiPEAC

A blueprint for the new Strategic Research Agenda for High Performance Computing



https://www.etp4hpc.eu/pujades/files/Blueprint%20document\_20190429.pdf

Common document between ETP4HPC, BDVA & HiPEAC

#### Contributors

Gabriel Antoniu, INRIA (BDVA); Marc Asch, U-PICARDIE (BDEC-2); Peter Bauer, ECMWF; Costas Bekas, IBM; Pascale Bernier-Bruna, ETP4HPC; Francois Bodin, IRISA; Laurent Cargemel, Atos; Paul Carpenter, BSC; Marc Duranton, CEA (HiPEAC); Maike Gilliot, ETP4HPC; Hans-Christian Hoppe, INTEL; Jens Krueger, ITWM-FRAUNHOFER; Julian Kunkel, Univ. of Reading; Erwin Laure, KTH; Jean-Francois Lavignon, TECHNOLOGY-STRATEGY; Guy Lonsdale, SCAPOS; Michael Malms, ETP4HPC; Fabio Martinelli, CNR (ECSO); Sai Narasimhamurthy, SEAGATE; Marcin Ostasz, BSC; Maria Perez, UPM (BDVA); Dirk Pleiter, JSC; Andrea Reale, IBM (BDVA); Pascale Rosse-Laurent, Atos


https://www.hipeac.net/roadmap

## Outline

- Evolution of application scope: the continuum
  - From smart sensors to HPC
  - Artificial Intelligence (Deep Learning) loads
  - Implications for the architecture
- Hardware heterogeneity and orchestration
- Software?

#### FROM SMART SENSORS TO HPC

#### Mainstream "business" model





### ECONOMIC DRIVE OF CONNECTED THINGS: BETTER EFFICIENCY IN RESOURCES AND ENERGY



#### HPC in the loop



#### HIGH PERFORMANCE SYSTEMS IN THE LOOP



aka Cognitive CPS, aka Intelligent Embedded Systems, aka Autonomous CPS (ACPS)

> Enabling intelligent data processing at the edge: fog computing, edge computing, stream analytics, fast data...

True collaboration between edge devices and the HPC/cloud ⇒ creating a continuum

#### SRA-4: THE INCREASING INTERPLAY OF SIMULATION, AI, IOT AND ANALYTICS







### ARTIFICIAL INTELLIGENCE (DEEP LEARNING) LOADS



#### ONE ASPECT OF AI: PERSONAL ASSISTANTS...

- Google Assistant (1 billion devices)
- Apple Siri (500+ million devices)
- Amazon Alexa (100+ million devices)
- Baidu's DuerOS (100+ million devices)

#### **DEEP LEARNING AND VOICE RECOGNITION**



" The need for TPUs really emerged about six years ago, when we started using computationally expensive deep learning models in more and more places throughout our products. The computational expense of using these models had us worried. If we considered a scenario where **people use Google voice search for just three minutes a day** and we ran deep neural nets for our speech recognition system on the processing units we were using, we would have had to **double the number of Google data centers!**"

[https://cloudplatform.googleblog.com/2017/04/quantifying-the-performance-of-the-TPU-our-first-machine-learning-chip.html]

## GOOGLE'S CUSTOMIZED HARDWARE...

... required to increase energy efficiency, with precision adapted to the use (e.g. float16)



Google's TPU2: training and inference on a **180 teraflops**<sub>16</sub> (**180 × 10**<sup>12</sup> **Flops**<sub>16</sub>) board (over 200 W per TPU2 chip, judging by the size of the heat sink)

### GOOGLE'S CUSTOMIZED TPU (V2) HARDWARE...



Google's TPU2 pod: 11.5 petaflops<sub>16</sub> of machine-learning number crunching (and guessing about **400+ kW**..., 100+ GFlops<sub>16</sub>/W)

From Google

Peta = $10^{15}$ = a million billion
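The TPU2 figures on these slides can be cross-checked with simple arithmetic. The sketch below assumes that the 180 teraflops<sub>16</sub> board figure and the 45 TOPS per-chip figure from the TPU comparison table refer to the same 16-bit peak rate; under that assumption a board holds four chips and the 11.5 petaflops<sub>16</sub> figure corresponds to roughly 64 boards.

```python
# Peak 16-bit rates quoted on the slides (assumed to be comparable figures).
BOARD_TFLOPS16 = 180.0   # one TPU2 board
CHIP_TFLOPS16 = 45.0     # one TPUv2 chip ("TOPS" row of the TPU table)
SYSTEM_PFLOPS16 = 11.5   # the 11.5 petaflops16 TPU2 system

chips_per_board = BOARD_TFLOPS16 / CHIP_TFLOPS16
# 1 PFLOPS = 1000 TFLOPS
boards_per_system = SYSTEM_PFLOPS16 * 1_000 / BOARD_TFLOPS16

print(f"chips per board:   {chips_per_board:.0f}")
print(f"boards per system: {boards_per_system:.1f}")
```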

#### GOOGLE'S CUSTOMIZED TPU (V3) HARDWARE...



| Chip              | TPUv1         | TPUv2            | TPUv3              |
|-------------------|---------------|------------------|--------------------|
| Announced         | 2016          | May 2017         | May 2018           |
| Access            | Internal only | Service beta     | Undisclosed        |
| Introduction      | 2015          | Feb 2018         | Undisclosed        |
| Process           | 28 nm         | 20 nm (est.)     | 16/12 nm (est.)    |
| Die size          | ~300 mm²      | Undisclosed      | Undisclosed        |
| Peak TOPS         | 92 / 23       | 45               | 90                 |
| Matrix input      | INT8 / INT16  | bfloat16         | bfloat16           |
| Memory            | 8 GB DDR3     | 16 GB HBM        | 32 GB HBM          |
| CPU interface     | PCIe 3.0 x16  | PCIe 3.0 x8      | PCIe 3.0 x8 (est.) |
| Power consumption | 40 W          | 200-250 W (est.) | 200 W (est.)       |

#### A Brief Guide to Floating Point Formats





From https://www.nextplatform.com/2018/05/10/tearing-apart-googles-tpu-3-0-ai-coprocessor/
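To make the difference between these formats concrete, here is a minimal Python sketch (standard library only) that rounds a value to bfloat16 and to IEEE 754 half precision (float16). bfloat16 keeps the 8-bit exponent of float32 and only 7 explicit mantissa bits, so it keeps float32's range but loses precision; float16 has 10 mantissa bits but only a 5-bit exponent, so large values overflow. The helper names are illustrative, and the bfloat16 rounding deliberately ignores NaN and infinity handling.

```python
import struct

def _f32_bits(x: float) -> int:
    """IEEE 754 binary32 bit pattern of x, as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def _bits_f32(bits: int) -> float:
    """Inverse of _f32_bits."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def to_bfloat16(x: float) -> float:
    """Round x to bfloat16 (1 sign, 8 exponent, 7 mantissa bits):
    keep the top 16 bits of the float32 pattern, rounding to nearest even.
    NaN/infinity corner cases are not handled in this sketch."""
    bits = _f32_bits(x)
    lsb = (bits >> 16) & 1                  # last mantissa bit that is kept
    rounded = bits + 0x7FFF + lsb           # round-to-nearest, ties-to-even
    return _bits_f32(rounded & 0xFFFF0000)  # drop the 16 low bits

def to_float16(x: float) -> float:
    """Round x to IEEE 754 binary16 (1 sign, 5 exponent, 10 mantissa bits).
    struct raises OverflowError if x is outside the float16 range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_bfloat16(3.141592653589793), to_float16(3.141592653589793))
print(to_bfloat16(100_000.0))
try:
    to_float16(100_000.0)
except OverflowError:
    print("100000 does not fit in float16")
```

For π both formats give 3.140625, but 100,000 becomes 99,840 in bfloat16 and does not fit in float16 at all (its largest finite value is 65,504): same storage size, different trade-off between range and precision.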

#### ALPHAGO ZERO: SELF-PLAYING TO



From doi:10.1038/nature24270 (Received 07 April 2017)

#### **ALPHAZERO FROM DEEPMIND: COMPUTING RESOURCES**



#### EXPONENTIAL INCREASE OF COMPUTING POWER FOR AI TRAINING

"Since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.5 month-doubling time...

(by comparison, Moore's Law had an 18-month doubling period)\*"



AlexNet to AlphaGo Zero: A 300,000x Increase in Compute
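The two doubling times quoted above can be compared with a few lines of arithmetic. The sketch assumes a constant doubling time over the whole period (a simplification of the measured trend) and uses only the figures on the slide: a 300,000x increase, a 3.5-month doubling time for AI training compute and an 18-month period for Moore's law.

```python
import math

def months_for_growth(factor: float, doubling_months: float) -> float:
    """Months needed to grow by `factor` at a constant doubling time."""
    return math.log2(factor) * doubling_months

def growth_over(months: float, doubling_months: float) -> float:
    """Growth factor reached after `months` at a constant doubling time."""
    return 2.0 ** (months / doubling_months)

ai_months = months_for_growth(300_000, 3.5)    # AlexNet -> AlphaGo Zero span
moore_factor = growth_over(ai_months, 18.0)   # same span at Moore's-law pace
print(f"300,000x at a 3.5-month doubling time: {ai_months:.1f} months")
print(f"same span at an 18-month doubling time: {moore_factor:.1f}x")
```

With these inputs, 300,000x corresponds to about 18 doublings, i.e. roughly 64 months (a bit over five years); at an 18-month pace, the same period would yield only about a 12x increase.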

#### **ALWAYS MORE COMPUTING RESOURCES**



From Paul Messina, Argonne National Laboratory



**Traditional Machine Learning Workflow** 



AutoML Workflow

From Forbes

Auto-ML uses optimization approaches to select a "good" set of parameters "*automagically*":

- It is generally very computationally expensive (configuration space search)
- It uses clever algorithms to avoid exploring the whole configuration space

More details, for example, at http://automl.org
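A toy illustration of why the size of the configuration space matters, assuming plain random search, one of the simplest alternatives to exhaustive grid search (AutoML tools typically use more elaborate strategies, such as Bayesian optimization). The configuration space and the `validation_loss` function are hypothetical stand-ins: in a real setting each evaluation is a full, expensive training run.

```python
import itertools
import math
import random

# Hypothetical configuration space: names and values are illustrative only.
SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "layers": [2, 4, 8, 16],
    "width": [64, 128, 256, 512],
    "dropout": [0.0, 0.1, 0.3, 0.5],
}

def validation_loss(cfg):
    """Made-up stand-in for 'train a model with cfg and measure it'.
    In real AutoML each call would be a full (expensive) training run."""
    return (abs(math.log10(cfg["learning_rate"]) + 3)   # best near 1e-3
            + abs(cfg["layers"] - 8) / 8
            + abs(cfg["width"] - 256) / 256
            + abs(cfg["dropout"] - 0.1))

def grid_search(space):
    """Exhaustive search: evaluates every configuration."""
    keys = list(space)
    configs = (dict(zip(keys, values))
               for values in itertools.product(*space.values()))
    return min(configs, key=validation_loss)

def random_search(space, budget, seed=0):
    """Evaluates only `budget` randomly sampled configurations."""
    rng = random.Random(seed)
    samples = [{k: rng.choice(v) for k, v in space.items()}
               for _ in range(budget)]
    return min(samples, key=validation_loss)

grid_size = math.prod(len(v) for v in SPACE.values())  # 5*4*4*4 = 320 runs
best_grid = grid_search(SPACE)
best_random = random_search(SPACE, budget=32)            # 10% of the grid cost
print(grid_size, validation_loss(best_grid), validation_loss(best_random))
```

Grid search needs all 320 evaluations to guarantee the optimum; random search spends a fixed budget (here 32 evaluations) and returns the best configuration it saw, which may or may not be the optimum.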

#### CHARACTERIZATION OF DEEP LEARNING LOADS



#### MACHINE LEARNING (DEEP LEARNING) INFERENCE PHASE

- Low precision arithmetic
- Medium to low number of operations
- Co-location computing and storage ("computing in memory")
- Should satisfy the application's non-functional requirements (e.g. latency, energy)
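The first bullet, low-precision arithmetic, can be sketched with symmetric 8-bit quantization: each tensor is stored as small integers plus one floating-point scale factor, the multiply-accumulate runs on integers, and a single rescaling recovers an approximate real-valued result. This is one common scheme among several (per-channel scales, zero points, etc. are omitted); the function names and values are illustrative.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: v ~= scale * q with q in [-127, 127]."""
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def int8_dot(qa, scale_a, qb, scale_b):
    """Integer multiply-accumulate, then a single floating-point rescale.
    (Python ints are unbounded; hardware would use e.g. a 32-bit accumulator.)"""
    acc = sum(a * b for a, b in zip(qa, qb))
    return acc * scale_a * scale_b

weights = [0.42, -1.30, 0.07, 0.95]       # illustrative values
activations = [1.10, 0.20, -0.60, 0.35]

qw, sw = quantize_int8(weights)
qx, sx = quantize_int8(activations)
exact = sum(w * x for w, x in zip(weights, activations))
approx = int8_dot(qw, sw, qx, sx)
print(f"float: {exact:.4f}  int8: {approx:.4f}")
```

Here the 8-bit result differs from the floating-point one by less than 0.01, while every multiplication is done on small integers.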



Inference phase

But for a large number of inferences (users) -> a more cloud-like structure with high throughput







#### **COMPLEMENTARITY OF SIMULATION AND AI TECHNIQUES**



## Outline

- Evolution of application scope: the continuum
- Hardware heterogeneity and orchestration
  - End of Dennard's scaling
  - Heterogeneous accelerators
  - Heterogeneous integration
- Software?

### END OF DENNARD'S SCALING

### WHAT WILL BE THE NEXT TECHNOLOGY?



## Exponential increase of performance in 33 years



Production car of 1985: Lamborghini Countach 5000 QV, max speed 300 km/h

× 100,000,000 in 33 years ⇒ Star Trek Enterprise (year: about 2290): 27 times the speed of light?
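The analogy can be checked with a few lines of arithmetic, using only the slide's figures (300 km/h, a factor of 100,000,000, 33 years) and the exact value of the speed of light; the annual growth and doubling time assume steady exponential growth over the period.

```python
import math

C_KM_PER_S = 299_792.458        # speed of light in vacuum (exact by definition)
COUNTACH_KMH = 300.0            # max speed of the 1985 production car (slide)
GROWTH = 100_000_000            # performance increase in 33 years (slide)
YEARS = 33

scaled_kmh = COUNTACH_KMH * GROWTH
times_c = scaled_kmh / (C_KM_PER_S * 3600)        # km/s -> km/h
annual_factor = GROWTH ** (1 / YEARS)             # steady compound growth per year
doubling_months = 12 * YEARS / math.log2(GROWTH)

print(f"{times_c:.1f} x the speed of light")
print(f"{annual_factor - 1:.0%} per year, doubling every {doubling_months:.1f} months")
```

This gives about 27.8 times the speed of light, consistent with the slide, and corresponds to roughly 75% improvement per year, i.e. a doubling about every 15 months.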

### THE END OF MOORE'S LAW / DENNARD SCALING

| Parameter (scale factor = a) | Classic scaling  |
|------------------------------|------------------|
| Dimensions                   | 1/a              |
| Voltage                      | 1/a              |
| Current                      | 1/a              |
| Capacitance                  | 1/a              |
| Power/circuit                | 1/a<sup>2</sup>  |
| Power density                | 1                |
| Delay/circuit                | 1/a              |

Everything was easy:

- Wait for the next technology node:
  - Increase frequency
  - Decrease Vdd
- ⇒ Similar increase of sequential performance
- ⇒ No need to recompile (except for architectural improvements)

Source: Krisztián Flautner "From niche to mainstream: can critical systems make the transition?"
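The "Power density = 1" row follows from the usual dynamic-power model $P = C V^2 f$ (leakage ignored), with the frequency $f$ scaling as the inverse of the circuit delay and the circuit area $A$ as the square of the dimensions. Under classic scaling by a factor $a$:

$$
P = C\,V^{2} f \;\longrightarrow\; \frac{C}{a}\left(\frac{V}{a}\right)^{2}(a\,f) = \frac{C\,V^{2} f}{a^{2}},
\qquad
\frac{P}{A} \;\longrightarrow\; \frac{P/a^{2}}{A/a^{2}} = \frac{P}{A}
$$

If the voltage can no longer be lowered, the same substitution with $V$ fixed gives

$$
P \;\longrightarrow\; \frac{C}{a}\,V^{2}(a\,f) = C\,V^{2} f,
\qquad
\frac{P}{A} \;\longrightarrow\; a^{2}\,\frac{P}{A}
$$

so power density grows by $a^2$ per node unless the frequency stops increasing, which is why further performance has to come from parallelism and specialization rather than from frequency.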

## Technology evolution: 2D transistor



### **Technology evolution**



### HETEROGENEOUS ACCELERATORS

## The problem:

### **Expected case scenario**



From "Total Consumer Power Consumption Forecast", Anders S.G. Andrae, October 2017

## SPECIALIZATION LEADS TO MORE EFFICIENCY

| Type of device | Energy / operation | Reference chip  |
|----------------|--------------------|-----------------|
| CPU            | 1690 pJ/flop       | Westmere, 32 nm |
| GPU            | 140 pJ/flop        | Kepler, 28 nm   |
| Fixed function | 10 pJ/op           |                 |

FPGA with HLS: "software programming in space and not only in time"

Source from Bill Dally (nVidia) « Challenges for Future Computing Systems » HiPEAC conference 2015
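To relate the CPU and GPU figures to exascale, the sketch below multiplies the energy per operation by $10^{18}$ operations per second. It counts only the arithmetic itself (no data movement, memory, network or cooling), so it is a lower bound for a hypothetical machine built from a single device type, not a system estimate.

```python
# Energy per floating-point operation, from the slide (Bill Dally, HiPEAC 2015).
PJ_PER_FLOP = {
    "CPU (Westmere, 32 nm)": 1690.0,
    "GPU (Kepler, 28 nm)": 140.0,
}
EXAFLOPS = 1e18  # operations per second

def arithmetic_power_watts(pj_per_flop: float, flops: float = EXAFLOPS) -> float:
    """Power needed for the arithmetic alone: energy/op (J) x ops/s."""
    return pj_per_flop * 1e-12 * flops

for device, pj in PJ_PER_FLOP.items():
    print(f"{device}: {arithmetic_power_watts(pj) / 1e6:,.0f} MW for 1 exaflop/s")

gain = PJ_PER_FLOP["CPU (Westmere, 32 nm)"] / PJ_PER_FLOP["GPU (Kepler, 28 nm)"]
print(f"GPU vs CPU energy efficiency: {gain:.1f}x")
```

The arithmetic alone would need about 1.7 GW on such CPUs versus 140 MW on such GPUs: a 12x gap obtained from specialization, before any data movement is counted.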

#### **TODAY'S HPC SYSTEMS ARE HETEROGENEOUS**



- 3.3 exaops peak for emerging AI workloads
- 4,608 compute nodes, each containing two 22-core IBM Power9 processors and six Nvidia Tesla V100 GPUs
- Interconnected with dual-rail Mellanox EDR 100Gb/s InfiniBand.

### TOP500 #1 & #2: NVIDIA TESLA V100 GPU + IBM POWER9 CPU



Heterogeneous integration driven by compute energy efficiency

From Denis Dutoit

## AMD'S EPYC AND RADEON TO POWER EXASCALE SUPERCOMPUTER (May 2019)

https://www.amd.com/es/products/frontier



### 52<sup>ND</sup> EDITION OF THE TOP500 LIST (NOVEMBER 11<sup>TH</sup>, 2018)



| Accelerators | Rank | Site | System | Rmax (TFlop/s) | Rpeak (TFlop/s) |
|--------------|------|------|--------|----------------|-----------------|
| Yes | 1 | DOE/SC/Oak Ridge National Laboratory, United States | Summit - IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband (IBM) | 143,500.0 | 200,794.9 |
| Yes | 2 | DOE/NNSA/LLNL, United States | Sierra - IBM Power System S922LC, IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband (IBM / NVIDIA / Mellanox) | 94,640.0 | 125,712.0 |
| No | 3 | National Supercomputing Center in Wuxi, China | Sunway TaihuLight - Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway (NRCPC) | 93,014.6 | 125,435.9 |
| Yes | 4 | National Super Computer Center in Guangzhou, China | Tianhe-2A - TH-IVB-FEP Cluster, Intel Xeon E5-2692v2 12C 2.2GHz, TH Express-2, Matrix-2000 (NUDT) | 61,444.5 | 100,678.7 |
| Yes | 5 | Swiss National Supercomputing Centre (CSCS), Switzerland | Piz Daint - Cray XC50, Xeon E5-2690v3 12C 2.6GHz, Aries interconnect, NVIDIA Tesla P100 (Cray Inc.) | 21,230.0 | 27,154.3 |
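The gap between Rmax (measured with the HPL/LINPACK benchmark) and Rpeak (theoretical peak) is worth computing explicitly. A short sketch using the values from the TOP500 table:

```python
# (Rmax, Rpeak) in TFlop/s for the November 2018 top five (values from the table).
TOP5 = {
    "Summit": (143_500.0, 200_794.9),
    "Sierra": (94_640.0, 125_712.0),
    "Sunway TaihuLight": (93_014.6, 125_435.9),
    "Tianhe-2A": (61_444.5, 100_678.7),
    "Piz Daint": (21_230.0, 27_154.3),
}

def hpl_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak achieved on the HPL benchmark."""
    return rmax / rpeak

efficiency = {name: hpl_efficiency(*vals) for name, vals in TOP5.items()}
for name, eff in sorted(efficiency.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {eff:6.1%}")
```

With these numbers the efficiencies range from about 61% (Tianhe-2A) to about 78% (Piz Daint); Summit reaches about 71% of its peak.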



#### Deep Learning Chipset Shipments to Increase to 2.9 Billion Units Annually by 2025, According to Tractica

GPUs and CPUs Currently Lead in Market Share, but ASICs will Capture the Lead by 2022, with Expanded Opportunities for SoC Accelerators and FPGAs

May 06, 2019 07:20 AM Eastern Daylight Time



### GOING NEURO-INSPIRED: "SPIKING" NEURAL NETWORKS

Using another way of coding information... not with bits

|                                    | IBM<br>TrueNorth      | Intel Loihi         | DynapSEL             |
|------------------------------------|-----------------------|---------------------|----------------------|
| Technology                         | 28nm CMOS             | 14 nm CMOS          | 28 nm FDSOI          |
| Supply Voltage                     | 0.7-1.05 V            | 0.5-1.25 V          | 0.73-1 V             |
| Design Type                        | Digital               | Digital             | Mixed-signal         |
| Neurons per core                   | 256                   | Max 1k              | 256                  |
| Core Area                          | 0.094 mm <sup>2</sup> | 0.4 mm <sup>2</sup> | 0.36 mm <sup>2</sup> |
| Computation                        | Time multiplexing     | Time multiplexing   | Parallel processing  |
| Fan In/Out                         | 256/256               | 16/4k               | 2k/8k                |
| On-line Learning                   | No                    | Programmable        | STDP                 |
| Synaptic Operation / Second / Watt | 46 GSOPS/W            |                     | 300 GSOPS/W          |
| Energy per synaptic operation      | 26 pJ                 | 23.6 pJ             | <2 pJ                |
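One widely used abstract model of a spiking neuron is the leaky integrate-and-fire (LIF) neuron. The sketch below is a generic textbook version with forward-Euler integration, not the neuron model of any of the chips in the table, and all parameter values are illustrative. It shows the "other way of coding information": the input intensity is carried by how often the neuron spikes rather than by a binary number.

```python
def lif_spikes(input_current, dt=1e-3, tau=20e-3, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron, integrated with forward Euler:
        dv/dt = (-v + I(t)) / tau
    A spike is emitted (and v is reset) whenever v reaches v_thresh.
    Returns the indices of the time steps at which the neuron spiked."""
    v = v_reset
    spikes = []
    for step, i_in in enumerate(input_current):
        v += dt * (-v + i_in) / tau      # leak toward 0, integrate the input
        if v >= v_thresh:
            spikes.append(step)          # information is in when/how often
            v = v_reset
    return spikes

for level in (0.5, 2.0, 4.0):
    n = len(lif_spikes([level] * 200))   # 200 steps of 1 ms = 0.2 s
    print(f"input {level}: {n} spikes in 0.2 s")
```

With these parameters, a constant input of 0.5 never reaches the threshold, an input of 2.0 produces 14 spikes in 200 steps and an input of 4.0 produces 33: the spike rate encodes the input level.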





## **FUSING PARADIGMS AT HARDWARE LEVEL**

At the hardware level, the good old von Neumann/CMOS partnership can act as a computing substrate, or as an orchestrator of various accelerators and technologies:

- Acting as coordination / communication node
- Allowing Hardware / Software integration



#### NON VOLATILE MEMORIES



## **NEW ARCHITECTURE PARADIGMS WITH NVM**



From Denis Dutoit

### SOLVING THE ENERGY CHALLENGE: COST OF MOVING DATA



Source: Bill Dally, « To ExaScale and Beyond »

www.nvidia.com/content/PDF/sc\_2010/theater/Dally\_SC10.pdf

### **PROCESSOR ARCHITECTURE EVOLUTION**



### **HETEROGENEOUS INTEGRATION**



### FROM ADVANCED PACKAGING TECHNOLOGIES...

... TO CHIPLETS



From Denis Dutoit

### SOME RECENT ANNOUNCEMENTS... ON CHIPLETS & ACTIVE INTERPOSERS

### ... and INTEL



Foveros technology

https://www.engadget.com/2018/12/12/intel-foverus-3d-chip/?yptr=yahoo&guccounter=2

From Denis Dutoit

### From AMD...



https://spectrum.ieee.org/tech-talk/semiconductors/design/amd-tackles-comingchiplet-revolution-with-new-chip-network-scheme



[J. Yin et al., "Modular Routing Design for Chiplet-based Systems", ISCA'2018]

## EUROPEAN PROCESSOR INITIATIVE



www.european-processor-initiative.eu



Copyright © European Processor Initiative 2019.

### **COMMON PLATFORM FOR MULTI-LEVEL HETEROGENEOUS INTEGRATION**



## **ELECTRONS VERSUS PHOTONS**

- Electrons: easy to create and interface; attenuation with distance (Ohm's law)
- Photons: energy-demanding to create and interface; low attenuation with distance

## **OFF-CHIP PHOTONICS**

Photonics: the cost is in sending the information; nearly *nothing in transmission*
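The electrons-versus-photons trade-off can be written as a simple break-even model: an electrical link costs energy roughly proportional to distance, while an optical link pays a fixed conversion cost and almost nothing per unit of distance. The sketch below is a first-order model under that assumption; the parameter values in it are made up for illustration and are not measured figures for any technology.

```python
def electrical_energy(d_mm: float, k_pj_per_mm: float) -> float:
    """Electrical link: energy per bit grows roughly linearly with distance."""
    return k_pj_per_mm * d_mm

def optical_energy(d_mm: float, e0_pj: float, eps_pj_per_mm: float = 0.0) -> float:
    """Optical link: fixed conversion cost (emit, modulate, detect)
    plus a (near-zero) distance-dependent term."""
    return e0_pj + eps_pj_per_mm * d_mm

def breakeven_distance(k_pj_per_mm: float, e0_pj: float,
                       eps_pj_per_mm: float = 0.0) -> float:
    """Distance beyond which the optical link uses less energy per bit."""
    if k_pj_per_mm <= eps_pj_per_mm:
        return float("inf")  # optics never wins in this model
    return e0_pj / (k_pj_per_mm - eps_pj_per_mm)

# Hypothetical, illustrative parameters only (not measured values):
K = 0.1    # pJ/bit/mm for electrical wires
E0 = 1.0   # pJ/bit fixed electro-optical conversion cost
print(f"optics wins beyond ~{breakeven_distance(K, E0):.0f} mm")
```

With these made-up values optics becomes cheaper beyond 10 mm; the real crossover depends entirely on the technologies being compared, but the shape of the argument (fixed cost versus distance-proportional cost) is what the slide describes.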



## **IN-PACKAGE PHOTONICS**



### DEMONSTRATION OF A THERMALLY TUNED WDM ELECTRO-OPTICAL LINK



### LETI'S SI-PHOTONICS ROADMAP FOR POST-EXASCALE COMPUTING



### POTENTIAL SOLUTION FOR POST EXASCALE BOARD



From Denis Dutoit

## Outline

- Evolution of application scope: the continuum
- Hardware heterogeneity and orchestration
- Software?

#### PARALLELISM AND SPECIALIZATION ARE NOT FOR FREE...

- Frequency limit → parallelism
- Energy efficiency → heterogeneity
- Both come at the expense of ease of programming
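One classic way to quantify why parallelism is "not for free" is Amdahl's law: if only a fraction $p$ of a program's sequential run time can be parallelized, the speedup on $n$ processors is $1/((1-p) + p/n)$, which can never exceed $1/(1-p)$. A minimal sketch:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup of a fixed-size job on n processors when a
    fraction p (0 <= p <= 1) of its sequential run time is parallelizable."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)

for n in (10, 100, 1_000, 1_000_000):
    print(f"p = 0.95, n = {n:>9,}: speedup {amdahl_speedup(0.95, n):5.1f}")
```

Even with 95% of the work parallelized, a million processors give less than a 20x speedup: the serial remainder dominates, which is part of the programming cost the slide refers to.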

## MANAGING COMPLEXITY...

## "Nontrivial software written with threads, semaphores, and mutexes is incomprehensible by humans"



Edward A. Lee, *The Future of Embedded Software*, ARTEMIS 2006

Parallelism, multi-cores, heterogeneity, distributed computing: is it all becoming too complex for humans?

## MANAGING COMPLEXITY



"And that's why we need a computer."

Cognitive solutions for complex computing systems:

- Using AI and optimization techniques for computing systems
  - Creating new hardware
  - Generating code
  - Optimizing systems
- Similar to *Generative design* for mechanical engineering

### USING AI FOR MAKING COMPUTING SYSTEMS: "GENERATIVE DESIGN" APPROACH

The user only states desired goals and constraints

-> The *complexity wall* might *prevent explaining* the solution



Motorcycle swingarm: the piece that hinges the rear wheel to the bike's frame

## EXAMPLE: DESIGN SPACE EXPLORATION FOR DESIGNING MULTI-CORE PROCESSORS<sup>1</sup> (2010)

- Ne-XVP project: follow-up of the TriMedia VLIW (<u>https://en.wikipedia.org/wiki/Ne-XVP</u>)
- 1,105,747,200 heterogeneous multicores in the design space
- 2 million years to evaluate all design points
- "AI-inspired" techniques reduced the exploration time to only a few days

#### ⇒ ×16 performance increase



<sup>1</sup> M. Duranton et al., "Rapid Technology-Aware Design Space Exploration for Embedded Heterogeneous Multiprocessors", in *Processor and System-on-Chip Simulation*, Ed. R. Leupers, 2010
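The two numbers on this slide imply a cost per design point, which shows why exhaustive exploration was out of the question. The sketch assumes 365.25-day years and design points evaluated one after another; it only re-derives what the quoted figures imply.

```python
DESIGN_POINTS = 1_105_747_200        # heterogeneous multicores in the design space
EXHAUSTIVE_YEARS = 2_000_000         # time to evaluate all of them (slide)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

seconds_per_point = EXHAUSTIVE_YEARS * SECONDS_PER_YEAR / DESIGN_POINTS
print(f"implied evaluation time per design point: {seconds_per_point / 3600:.1f} h")
```

That is roughly 16 hours per design point: at that cost, exhaustive evaluation is out of reach and the search has to be guided.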

## THIS IS ALSO VALID FOR SOFTWARE: AUTOML AND OTHER PROGRAM GENERATORS

*Figure: first page of the contributed article "Programming by Optimization" by Holger H. Hoos, subtitled "Avoid premature commitment, seek design alternatives, and automatically generate performance-optimized software."*

"When creating software, developers usually explore different ways of achieving certain tasks. These alternatives are often eliminated or abandoned early in the process, based on the idea that the flexibility they afford would be difficult or impossible to exploit later. This article challenges this view, advocating an approach that encourages developers to not only avoid premature commitment to certain design choices but to actively develop promising alternatives for parts of the design. In this approach, dubbed Programming by Optimization, or PbO, developers specify a potentially large design space of programs that accomplish a given task, from which versions of the program optimized for various use contexts are generated automatically, including parallel versions derived from the same sequential sources. We outline a simple, generic programming language extension that supports the specification of such design spaces and discuss ways specific programs ..."

Communications of the ACM, 55(2), pp. 70-80, February 2012

www.prog-by-opt.net
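In PbO terms, a *choice point* keeps several alternatives alive and an automatic optimizer picks among them for a given use context. The sketch below imitates that idea in ordinary Python under stated assumptions: the two membership-test variants, `DESIGN_SPACE` and `tune` are hypothetical names, and a simple timing loop replaces the language extension and automatic optimization machinery the article describes.

```python
import random
import timeit


# Choice point "membership test": both alternatives are kept in the program
# instead of committing early to one of them.
def count_hits_list(haystack, needles):
    # Linear scan of the list for every lookup.
    return sum(1 for n in needles if n in haystack)


def count_hits_set(haystack, needles):
    # Pay once for hashing, then O(1) average time per lookup.
    lookup = set(haystack)
    return sum(1 for n in needles if n in lookup)


DESIGN_SPACE = {"membership": {"list": count_hits_list, "set": count_hits_set}}


def tune(design_space, workload, repeats=3):
    """For each choice point, keep the alternative that runs fastest on `workload`."""
    config = {}
    for point, alternatives in design_space.items():
        timings = {
            name: min(timeit.repeat(lambda: impl(*workload), number=1, repeat=repeats))
            for name, impl in alternatives.items()
        }
        config[point] = min(timings, key=timings.get)
    return config


# A "large" use context: many lookups into a sizeable collection.
rng = random.Random(42)
large = ([rng.randrange(10**6) for _ in range(3000)],
         [rng.randrange(10**6) for _ in range(1000)])

assert count_hits_list(*large) == count_hits_set(*large)  # same semantics
config = tune(DESIGN_SPACE, large)
print(config)
```

On a much smaller workload the winner may differ, which is the point of PbO: one design space can yield different optimized versions for different use contexts.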

### Microsoft's AI is learning to write code by itself, not steal it

Written by Dave Gershgorn

What if instead of searching through menus within programs like Microsoft Excel, our computers could understand the problem we're trying to solve and write the software to solve it? It's a hyper-futuristic idea, but one that has recently seen progress from Microsoft Research and the University of Cambridge.

In a November 2016 paper, which gained notoriety after being accepted into one of the year's largest artificial intelligence conferences, Microsoft and Cambridge built an algorithm capable of writing code that would solve simple math problems. The algorithm, named DeepCoder, would be able to augment its own ability by also looking at potential combinations of code for how a problem could be solved. However, this doesn't mean it steals code, copies and pastes it from existing software, or searches the internet for solutions, as some reports have claimed.
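The DeepCoder idea can be approximated as a search over compositions of primitives from a small domain-specific language (DSL), keeping the first program that reproduces every input/output example, with a learned model deciding which primitives to try first. In this sketch the DSL is a hypothetical toy, and the `scores` dictionary is a hand-written stand-in for the neural network's predictions; no learning takes place.

```python
from itertools import product

# A tiny, hypothetical DSL of list -> list primitives.
DSL = {
    "sort": sorted,
    "reverse": lambda xs: list(reversed(xs)),
    "double": lambda xs: [2 * x for x in xs],
    "drop_neg": lambda xs: [x for x in xs if x >= 0],
    "incr": lambda xs: [x + 1 for x in xs],
}


def run(program, xs):
    """Apply the primitives of `program` left to right."""
    for name in program:
        xs = DSL[name](xs)
    return list(xs)


def synthesize(examples, max_len=3, scores=None):
    """Enumerate programs by increasing length; return the first one that
    matches every (input, output) example, plus the number of candidates tried.

    `scores` stands in for a learned model: primitives with a higher score are
    tried first. Without scores, the DSL's own order is used.
    """
    scores = scores or {}
    order = sorted(DSL, key=lambda name: -scores.get(name, 0.0))
    tried = 0
    for length in range(1, max_len + 1):
        for program in product(order, repeat=length):
            tried += 1
            if all(run(program, inp) == out for inp, out in examples):
                return program, tried
    return None, tried


# Target behaviour (unknown to the search): keep non-negatives, sort, double.
examples = [([3, -1, 2], [4, 6]), ([-5, 0, 1], [0, 2])]

plain = synthesize(examples)
guided = synthesize(examples, scores={"drop_neg": 0.9, "double": 0.8, "sort": 0.7})
print(plain)
print(guided)
```

Both runs find a correct three-step program, but the guided run examines fewer candidates. With a realistic DSL and longer programs, the search space grows exponentially, and good guidance is what keeps the enumeration tractable.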

## **PROGRAMMING 2.0: LET THE COMPUTER DO THE JOB**

# Describing *what* the program should accomplish, rather than describing *how* to accomplish it

- For example, describe the *concurrency* of an application, not how to parallelize the code for it.
- (Good) compilers know the architecture better than humans do, so they are better at optimizing code...
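Describing concurrency rather than parallelization can be illustrated with Python's standard `concurrent.futures` module: the code only states that the same function applies independently to every item, and the executor decides how many workers to use and how to schedule them. `simulate_cell` is a hypothetical placeholder workload.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_cell(cell_id):
    # Hypothetical independent work item, standing in for any per-element kernel.
    return sum(i * i for i in range(cell_id * 1000))


cells = range(16)

# What: "apply simulate_cell to every cell; the items are independent".
# How (worker count, scheduling, execution order) is left to the executor;
# map() still returns results in input order.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(simulate_cell, cells))

assert results == [simulate_cell(c) for c in cells]
```

Because CPython threads do not execute Python bytecode in parallel, CPU-bound work like this would normally use `ProcessPoolExecutor` instead. That changes only the executor line (plus the usual `if __name__ == "__main__":` guard on platforms that spawn processes); the description of the work stays the same.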





## **CONCLUSION: WE ARE LIVING IN EXCITING TIMES!**

**"The best way to predict the future is to invent it."** Alan Kay







## Thank you for your attention

Special thanks to Denis Dutoit, Christian Gamrat, and Carlo Reita, whose slides I borrowed.

