OpenPOWER Summit 2021

09:00
09:00
5min
OpenPOWER Summit Welcome
James Kulina

OPF Summit Welcome and Keynotes

RoomA
09:05
09:05
15min
Antmicro: Software-driven Hardware in the Data Center: LibreBMC, Renode, RowHammer Test Platform and more
Michael Gielda

Antmicro Keynote by Michael Gielda

RoomA
09:20
09:20
15min
OpenCAPI Consortium: Strategies for CXL’s success in a world of proprietary Coherent Busses
Allan Cantle

As the excitement around the promise of CXL builds, industry incumbents have continued to rapidly innovate with their own proprietary coherent interconnect busses.
These developments only serve to confuse what our Industry Standard future landscape will look like.
This presentation will showcase a potential path for CXL to flourish by existing alongside these proprietary busses without the processor companies having to sacrifice their proprietary solutions.

The presentation will conclude that integrating CXL with an industry-standard near-memory bus will drive CXL's rapid organic growth in a shared-memory-centric world.

RoomA
09:35
09:35
15min
The Open Hardware Diversity Alliance: What it is, why it’s here, and how you can get involved!
Kim McMahon

The Open Hardware Diversity Alliance was formed in August 2021 as a partnership between RISC-V, CHIPS Alliance, OpenPOWER Foundation, Western Digital, and IBM. Its mission is to provide programs that encourage participation and support the professional advancement of women and underrepresented individuals in open source hardware.
We asked ourselves:
Why are there few women and underrepresented individuals in the open hardware community?
Is it because open hardware is hard to navigate?
Is career progression a mystery?
Is it a lack of visibility of the talent in open hardware to the community?
In this presentation, Kim will share information about the Alliance, what we have done, what has worked and what has not, and invite anyone interested to join us! This is an interactive presentation, where Kim will also ask the audience to participate in the conversation.

RoomA
09:50
09:50
15min
CHIPS Alliance: Building an Open Source Hardware Ecosystem: From Foundations to Rooftops
Rob Mains

This talk will explore the creation of an open source hardware ecosystem, which is composed of many ingredients. The talk will look at different parts of the hardware ecosystem, including baseline technology ingredients, EDA tooling, IP building blocks, and implementation, and at how organizations such as CHIPS Alliance, OpenPOWER, and RISC-V International are working together to help make the vision a reality. The talk will also highlight the need to build a diverse and inclusive talent pool from the bottom up.

RoomA
10:30
10:30
45min
AI acceleration in POWER10
Rajalakshmi S

The Matrix Multiply Assist (MMA) architecture introduced in POWER10 is an important feature for AI acceleration. This talk describes the implementation of MMA support in AI libraries and frameworks, and the resulting performance improvements in some workloads.
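As a rough illustration of the idea behind this kind of acceleration, the sketch below expresses a matrix multiply as a sequence of accumulated outer-product (rank-1) updates, which is the style of operation that matrix-multiply hardware typically performs on small accumulator tiles. It is plain Python for readability and does not use any POWER10 instruction or library; the function names are hypothetical and chosen only for this example.

```python
# Sketch only: matrix multiply as accumulated rank-1 (outer-product) updates.
# Function names are illustrative assumptions, not part of any MMA API.

def outer_product_accumulate(acc, x, y):
    """Add the outer product of vectors x and y into the accumulator tile acc."""
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            acc[i][j] += xi * yj


def matmul_outer(a, b):
    """Compute a @ b with one outer-product update per step of the shared dimension."""
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0.0] * n for _ in range(m)]
    for p in range(k):
        column_of_a = [a[i][p] for i in range(m)]  # p-th column of a
        row_of_b = b[p]                            # p-th row of b
        outer_product_accumulate(acc, column_of_a, row_of_b)
    return acc


def matmul_naive(a, b):
    """Reference row-by-column multiply used to check the outer-product version."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]
```

The accumulator is updated in place across the whole loop, so the result tile is written out only once, after all partial products have been added.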

RoomA
10:30
45min
Beyond Machine Boundaries: Experience with Memory Disaggregation
Felix Eberhardt, Andreas Grapentin

The disaggregation of system resources promises various benefits, such as an increased flexibility of provisioning, better consolidation of workloads, and higher limits for bursts of resource consumption.
Similarly to how storage in the datacenter tends to be combined into large pools to be shared across different machines, portions of main memory could be disaggregated and made available on demand as well.
This could make memory disaggregation a building block of the future datacenter.
Memory Inception, a disaggregation technology, has been announced for POWER10 systems, and an OpenCAPI-based prototype, ThymesisFlow, is already available.
In this talk, we will outline our experimental setup using two IC922 POWER9 machines connected with ThymesisFlow, and present a selection of the projects currently running in our lab that use this technology.
In particular, we will discuss what new challenges arise for scale-up workloads such as In-Memory Databases and show early measurements with Hyrise, an open source In-Memory Database developed for research.

RoomB
10:30
45min
OpenPOWER ISA curriculum
Abhinandan S P

Systems on Chip (SoCs) increasingly drive embedded and IoT devices because they tightly integrate microprocessors, microcontrollers, and peripherals. Moreover, hardware accelerators in the form of SoCs are widely used in machine learning to improve performance and reduce energy consumption. In this presentation, we will talk about a course we designed with multiple goals. First, we want to introduce the POWER ISA and build a community around it. Second, we want to bridge the gap between academia and industry that prevails in SoC design and verification. The course was designed in collaboration between NIE, SASTRA University, SRM University, JNTU Ananthapur, IIT Guwahati, Object Automation Solutions and IBM. It covers SoC design with the Libre-SoC toolchain and SystemVerilog, IP verification, SoC verification, and application development. Initially, the developed SoC design is implemented in an FPGA (Field Programmable Gate Array), and testing procedures are applied to it to familiarize the learner with the front-end design flow. The developed SoC is then taken through the back-end tool flow, which uses open source tools to convert the design into a GDS II file. The course provides hands-on experience, from understanding the OpenPOWER architecture to the GDS II generation required for chip tape-out, from both design and test perspectives.

Hardware
RoomC
10:30
240min
SoC in Hours - A Power Chat
MANIKANDAN NAGARAJAN, Vinod Bussa, Abhishek Sharma, Harinagarjun, Dr. Sumalatha

SoC in Hours - A Power Chat
1. OpenPOWER - A matured ISA
2. Art of System Building - Its Libre-SoC
3. Microwatt in FPGA - A Rapid Flow
4. Tapeout Microwatt in a click
5. Bug the SoC - A Fire test

Silicon
RoomD
11:15
11:15
45min
A Vision for Transforming 21st-Century Pedagogy via Open Standards: OpenPOWER
Wu Feng

The teaching of the inter-related areas of computer architecture, computer organization, and computer systems is at a crossroads, one that could lead to another pedagogical (r)evolution. The first revolution occurred in the early 1990s, spurred by research in the 1980s that re-visited RISC architectures and resulting in the seminal book entitled "Computer Architecture: A Quantitative Approach" in 1989 and its subsequent prequel book entitled "Computer Organization and Design: The Hardware/Software Interface" in 1993, both centered around the MIPS instruction set architecture (ISA). By the 2000s, many institutions in higher education transitioned their traditional course on operating systems concepts to a hands-on computer systems curriculum based on the CISC x86-64 ISA, as captured by the seminal book entitled "Computer Systems: A Programmer’s Perspective." While these books have served as exemplars for their respective areas, one might argue that the use of disparate ISAs – MIPS versus x86-64 – serves as an unnecessary learning impediment and source of confusion. A potential solution to this problem would be to align the teaching of all these inter-related areas with the MIPS ISA entirely or x86-64 ISA entirely; however, the former has limited real-world deployment while the latter is closed (and unnecessarily complex, i.e., CISC). In contrast, the POWER architecture is open source and enjoys widespread deployment, including two of the fastest supercomputers in the world (i.e., Sierra at Lawrence Livermore National Lab and Summit at Oak Ridge National Lab). For these reasons, we envision a vertically integrated curriculum from hardware to systems software based on the POWER instruction set architecture.

Enablement
RoomC
11:15
45min
Next-gen Dynamic UAV using Power9 Systems
Sri Kamani, Sashank, Deepthi, Barat. T, Adithya Gopan

ABSTRACT: Forest fires are on the increase worldwide. They are a threat to our environment because they spread quickly and can burn down acres of lush forest if they are not attended to. Forest fires occur for various reasons. As the climate continues to change and temperatures rise, forest fires will also increase. Trees that took many years to grow disappear in a very short time because of fires, leaving mountain areas barren and no longer able to provide protection from rains and the mudslides that follow them, or oxygen, clean air, shelter and food for birds and animals.
Forest fires usually start unnoticed, and rangers are not notified about a fire until it is too late. This is because fires occur in dense forests that humans cannot easily reach, which poses a challenge for the rangers. To overcome this problem, we will use drones to navigate into the thick part of the forest and apply computer vision, using a state-of-the-art Convolutional Neural Network (CNN). The entire process is treated as a classification task in which the deep neural network model classifies each image from the camera attached to the deployed drone as fire or non-fire. Training is performed on a dataset containing both fire and non-fire images, collected from various sources.

KEYWORDS: Enterprise AI, POWER9, AC922, Deep Learning(DL), Neural Network.

PROPOSED SYSTEM: We will develop a suitable neural network architecture for the problem based on the data and the goal set. Using a large set of data obtained from various sources and the department of forestry, we will train the neural network to provide the optimal strategy. We will then use the test data to test the neural network's ability to provide that strategy. We will use the forest fires of that year and have domain experts verify the optimality of the neural network's strategy. The proposed models are very complicated and require in-depth knowledge of specific programming languages and tools. Setting up a system for a DL model is difficult, and personal systems lack the computational power that DL models need, which restricts their capabilities. Hence, on-premises DL servers are needed to serve a large number of people. To meet this need, IBM introduced the POWER9 processor and the AC922 Power Systems server, designed for compute-heavy artificial intelligence workloads.
To provide up to 5.6 times the bandwidth for data-intensive workloads, the AC922 server incorporates next-generation I/O architectures such as PCIe Gen4, CAPI 2.0, OpenCAPI and NVIDIA NVLink. We use this server to train our model and optimize its performance efficiently.

RoomA
11:15
45min
The TAU Performance System
Allen D. Malony

The TAU Performance System is an open source toolkit for parallel performance measurement and analysis, targeted at high-performance computing (HPC) and enterprise systems. Developed at the University of Oregon, TAU has run on POWER systems for many years and is fully supported on the latest POWER processors. The talk will introduce the TAU technologies and showcase performance analysis and optimization outcomes for applications running on POWER platforms. It will cover the latest features of TAU and future directions, especially with respect to opportunities for enterprise use.

Software
RoomB
12:15
12:15
45min
A course on Accelerating Big Data Analytics Application with FPGA and OpenCAPI to Improve the Reach out
Arghya Kusum Das, Dr. Peter Hofstee

Large-scale streaming and big data applications that require large amounts of memory have made OpenCAPI and FPGAs an appealing and cost-effective solution. Organizations ranging from large research labs to data analytics startups are increasingly using the technology to accelerate their applications, creating new jobs and research opportunities. In this presentation we will discuss the course that we designed to teach high-performance analysis of big data using FPGA technology together with OpenCAPI.

Enablement
RoomC
12:15
45min
AI APPLICATIONS ON POWER9 SYSTEM
Sridhar Ramasubramanian, Vaibhav Raja

In recent years, machine learning and neural networks, often loosely referred to as artificial intelligence (AI), have grown so rapidly that they are no longer used only by university researchers; they are now deployed at enterprise level in production environments to analyse data and gather meaningful insights from it. Many organizations that have incorporated AI into their infrastructure have gained a competitive edge over their peers, and this is happening across industries. With AI helping organizations build solutions efficiently, employees can focus on communicating and strategizing to solve problems that would otherwise be side-lined. As technology advances, companies are continuously embedding more powerful resources in their chips so that they can process complex, resource-heavy Big Data, cloud and AI applications. The open-sourcing of the latest chips also paves the way for running enterprise AI applications. One such example is the IBM POWER9 system, which addresses complex workloads such as cognitive and enterprise AI applications. POWER9 systems, with their high-powered GPUs, help organizations manage their transactional and product-feature data so that it can be easily analyzed with machine learning (ML) and deep learning (DL) models to gain insights.

RoomA
12:15
45min
IBM Bayesian Optimization Accelerator
Xinghong He

IBM Bayesian Optimization Accelerator (BOA) is a global optimization toolkit that applies machine learning techniques to challenges that arise in many practical engineering and design problems: computational or experimental simulations of the sampling space are very expensive; the objective functions have multiple local optima; and the collected data are noisy and have no derivatives or analytic forms. Other features of BOA include batch sampling, parameter analysis, and extensive implementations of kernel functions, acquisition functions and optimization techniques. The solution is integrated as an appliance that can easily be connected to an existing High Performance Computing (HPC) or enterprise environment running different operating systems. In this talk, we will discuss how BOA works with an existing HPC environment to carry out the optimization, and how to write interface functions that connect BOA to the external workload being optimized. We will also present some use case studies that show performance gains over traditional methods such as grid search and random search.

Software
RoomB
13:00
13:00
45min
A Course on Machine Learning for Software Developers
Arghya Kusum Das

Understanding machine learning algorithms is essential, but development, acceleration, and production engineering capabilities are also required in industry. This machine learning course introduces students to data preprocessing, an algorithmic overview of different supervised and unsupervised learning techniques, their development strategies, and ways of accelerating those algorithms on different hardware, such as IBM Power hardware. We developed the course in collaboration with experts from different companies (e.g., Facebook and IBM). The course will help the community learn more about the capabilities of the IBM POWER processor while designing an ML production system end-to-end, including project scoping, data needs, modeling strategies, and deployment requirements.

Enablement
RoomC
13:00
45min
ANANTH Fabless SoC
A C Venkatesh, Unnamed user

ANANTH is a fabless SoC (System on Chip) designed and developed at the VLSI labs of the Electronics and Communication Engineering Department, JNTUA College of Engineering, Anantapur, under an academic collaboration with the OpenPOWER Foundation and International Business Machines (IBM). The SoC is built around the IBM POWER A2O core and also includes peripherals and interfaces such as AMBA AXI, SPI, I2C, Ethernet, NAND, NOR, DMA, PCIe and DDR3. It was developed indigenously for academic R&D purposes.

RoomA
13:00
45min
Draft SVP64 in-place Matrix Multiply and FFT / DCT for OpenPOWER
Luke Leighton

Advanced Cray-style Vectors are being developed for the Power ISA, as a
Draft Extension for submission to the new OpenPOWER ISA Working Group,
named SVP64. Whilst in-place Matrix Multiply was planned for a much
later advanced version of SVP64, an investigation into putting FFMPEG's
MP3 CODEC inner loop into Vectorised Assembler resulted in such a large
drop in code size (over 4x reduction) that it warranted priority
investigation.

Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT)
and Number-Theoretic Transform (NTT) form the basis of too many
high-priority algorithms to count. Normal SIMD Processors and even
normal Vector Processors have a hard time dealing with them: inspecting
FFMPEG's source code reveals that heavily optimised inline assembler (no
loops, just hundreds to thousands of lines of assembler) is not uncommon.

The focus of this NLnet-sponsored research is therefore to create enhancements
to SVP64 to be able to cover DFT, DCT, NTT and Matrix-Multiply entirely
in-place. In-place is crucially important for many applications (3D, Video)
to keep power consumption down by avoiding register spill as well as L1/L2
cache strip-mining. General-purpose RADIX-2 DCT and complex DFT will be
shown and explained, as well as the in-place Matrix Multiply which does
not require transposing or register spill for any sized (including non-power-of-two)
Matrices up to 128 FMACs. The basics of SVP64, covered in the Overview [1], will also
be briefly described.

[1] https://libre-soc.org/openpower/sv/overview/
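
To make the term "in-place" concrete, the following is a minimal sketch of a conventional iterative radix-2 FFT that overwrites its input buffer instead of allocating a second one. It is ordinary scalar Python, not SVP64 code, and it does not represent the technique presented in the talk; the function names are chosen for this example only.

```python
import cmath


def fft_in_place(x):
    """Radix-2 decimation-in-time FFT that overwrites the list x.

    The length of x must be a power of two. No second buffer is allocated:
    both the reordering and the butterflies write back into x.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    # Bit-reversal permutation, performed with pairwise swaps.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]

    # Butterfly passes over blocks of size 2, 4, 8, ... n.
    size = 2
    while size <= n:
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)  # twiddle-factor increment
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(start, start + half):
                u = x[k]
                v = x[k + half] * w
                x[k] = u + v
                x[k + half] = u - v
                w *= step
        size *= 2


def dft_naive(x):
    """O(n^2) reference DFT, used only to check the in-place version."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
```

Each butterfly reads two elements and writes the two results back to the same positions, which is what keeps the working set fixed at the size of the input.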

Hardware
RoomB
14:00
14:00
45min
E4S: Extreme-scale Scientific Software Stack
Sameer Shende

The DOE Exascale Computing Project (ECP) Software Technology focus area
is developing an HPC software ecosystem that will enable the efficient
and performant execution of exascale applications. Through the
Extreme-scale Scientific Software Stack (E4S), it is developing a
comprehensive and coherent software stack that will enable application
developers to productively write highly parallel applications that can
portably target diverse exascale architectures - including the IBM
OpenPOWER with NVIDIA GPU systems. E4S features a broad collection of
HPC software packages including the TAU Performance System(R) for
performance evaluation of HPC and AI/ML codes. E4S provides a curated
set of packages built using the Spack package manager, available both
as bare-metal installations and as containerized distributions. E4S
exists to accelerate the development, deployment, and use of HPC
software, lowering the barriers for HPC users.

Software
RoomB
14:45
14:45
45min
Differential Software Solutions on HPC and IBM Power Architecture
Ander Ochoa Gilo

Everything is changing, from healthcare to automotive, not forgetting the financial markets or any type of engineering. Products are no longer created by an individual on a single computer or, in the best case, by a small team on a couple of computers. They are now developed and perfected using HPC and AI, involving people with many different skills all around the world.
Even AI is no longer something that we run on a single computer, no matter how powerful it is. What drives everything today is HPC, or High Performance Computing. It can help develop better healthcare, better automobiles, better financial services and better versions of anything else that we run on it.
In this session we will introduce the IBM Power architecture, HPC supercomputers and their differential software solutions, and we will run a small demo in IBM Cloud Pak for Data.

Software
RoomB
14:45
45min
Geo-AI applications using H2O.ai on IBM open POWER 9 systems
Bagavathy Priya

Artificial Intelligence (AI) is a powerful science that uses a range of
methodologies, strategies, and systems to tackle previously unsolvable real-world issues.
A wide range of technological advancements and research is under way to
solve many real-time problems in all aspects of today's society.
Recent years have witnessed significant advances in geospatial artificial intelligence
(GeoAI), which is the integration of geospatial studies and AI, especially machine
learning and deep learning methods and the latest AI technologies in both academia
and industry. Setting up AI-based systems to solve geospatial problems
requires a large amount of computing power. With processors like IBM POWER9, we can
address complex workloads with huge amounts of data, performing data visualization,
statistical analysis, pattern recognition, and inference building in a very fast and
efficient manner. We therefore combine H2O Driverless AI and IBM Power systems to
enable geospatial applications to harness AI for competitive gain.

RoomC
14:45
45min
Introduction to the LibreBMC project
Todd Rosedahl

Baseboard management controllers (BMCs) sit at the heart of the boot and control infrastructure in the datacenter, so they require a unique functionality set. Based on work to improve open source FPGA tooling, it’s now possible to replace traditional BMC ASICs with soft processors running on low cost FPGA hardware while still running the familiar OpenBMC software stack. To demonstrate this, the LibreBMC project was created and is being run under the OpenPOWER Foundation. Taking advantage of OCP’s DC-SCM spec, Antmicro has created the first ever open source DC-SCM 1.0 modules with LibreBMC supported FPGAs. This allows drop in replacement on existing designs, fully verifiable hardware, and the unique opportunity to do software style deployment of new “hardware based” security features (like memory tagging) even after hardware has been deployed in the datacenter!

RoomD
14:45
45min
Prepare yourself to switch computing to Open Hardware Power Architecture
Roberto Innocenti

We expect to see three prototypes of the Open Hardware GNU/Linux PowerPC Laptop come to life before the end of 2021. The project started in late 2014. After a brief summary of the previous episodes and the latest update on the prototypes, which have come through the recent electronics shortage and rising costs, we will disclose how you can take part in the pre-production run. We will also show how this difficult project to design a Power Architecture notebook, carried out through the uncertain period from 2015 to 2021, fits into the wider constellation of a switch to Open Hardware Power Architecture computing. As this is a community-driven open hardware Power Architecture project, we will see how you can be a protagonist of this switch.

Hardware
RoomA
15:30
15:30
15min
OpenPOWER Working Groups Walkthrough
Toshaan Bharvani

Toshaan Bharvani, chair of the OpenPOWER Foundation Technical Steering Committee (TSC), will go through the working groups and their activities during the past year.

RoomA
16:00
16:00
45min
High-Performance and Scalable Middleware for HPC and Deep Learning on OpenPOWER Platforms
Dhabaleswar K (DK) Panda, Donglai Dai

This talk will focus on high-performance and scalable middleware for MPI and DL applications on the OpenPOWER platform. It will cover three products for which commercial support is available from X-ScaleSolutions. The first product focuses on the OSU MVAPICH2 MPI libraries and their capabilities for high-performance computing with both CPUs (OpenPOWER) and GPUs (NVIDIA). The second product focuses on tight integration between the OSU MVAPICH2-GDR MPI library and the Horovod stack to provide high-performance and scalable Deep Learning (DL) with deep introspection (DI) capabilities for DL frameworks like TensorFlow and PyTorch. The DI capabilities allow DL users and runtime developers to easily optimize their DL applications on modern systems. The third product focuses on a high-performance and scalable checkpointing library for HPC and DL applications. Performance results from the ORNL Summit system (#2) and Lassen (#20), with thousands of GPUs and POWER9 CPUs, will be presented.

RoomB
16:00
10min
Testing Framework to port and optimize SIMD library to OpenPOWER Systems
Daisuke Oka

To port Intel x86 intrinsics to OpenPOWER intrinsics, we need a testing framework. The port must be accurate and free of errors, so testing cannot be done by hand; it must be automated.
The ported code must produce the same results as the original, and its performance must be good, so the framework must be able to measure and compare latency and throughput. The testing framework will be released as open source.
The framework runs on both an Intel x86 machine and an OpenPOWER machine. The two are connected over a network because they are separate machines.
We cannot test every input pattern, so we generate random test data, feed the same input to both the Intel x86 and OpenPOWER systems, and check that the results are equal.
The network module may be written in Python, and the testing module may be written as a C extension for Python.
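
The random-input comparison described above could look roughly like the sketch below. It is an illustration under several assumptions rather than the framework presented in the talk: both implementations run in the same Python process instead of on two networked machines, the operation under test is a Python model of a lane-wise saturating signed 16-bit add (the behaviour of the x86 intrinsic `_mm_adds_epi16`), and all function names are hypothetical.

```python
import random

LANES = 8                      # eight 16-bit lanes in a 128-bit vector
INT16_MIN, INT16_MAX = -32768, 32767


def reference_adds_epi16(a, b):
    """Reference model: lane-wise saturating signed 16-bit addition."""
    return [max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)]


def ported_adds_epi16(a, b):
    """Stand-in for the ported implementation (hypothetical), written differently on purpose."""
    out = []
    for x, y in zip(a, b):
        s = x + y
        if s > INT16_MAX:
            s = INT16_MAX
        elif s < INT16_MIN:
            s = INT16_MIN
        out.append(s)
    return out


def random_vector(rng):
    """Random lanes, with boundary values mixed in to exercise saturation."""
    edges = [INT16_MIN, -1, 0, 1, INT16_MAX]
    return [
        rng.choice(edges) if rng.random() < 0.25 else rng.randint(INT16_MIN, INT16_MAX)
        for _ in range(LANES)
    ]


def run_differential_test(reference, candidate, trials=1000, seed=0):
    """Feed identical random inputs to both implementations and collect mismatches."""
    rng = random.Random(seed)  # fixed seed so failures can be reproduced
    mismatches = []
    for _ in range(trials):
        a, b = random_vector(rng), random_vector(rng)
        expected, actual = reference(a, b), candidate(a, b)
        if expected != actual:
            mismatches.append((a, b, expected, actual))
    return mismatches
```

Calling `run_differential_test(reference_adds_epi16, ported_adds_epi16)` returns a list of mismatching cases, so an empty list means no difference was found for the sampled inputs. In a two-machine setup, the `reference` and `candidate` callables would instead send the inputs to each system and return its results.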

Software
RoomA
16:00
45min
The Toy-SRAM Project
Robert Montoye

October 28 2021
Microprocessors use multiple high-speed multiport register files to improve their performance. In BOOM, the high-performance RISC-V core, the custom register file took as much design effort as the rest of the design. The TOY-SRAM open-source multi-port memory system is designed to replace custom circuit design with a simple set of choices for the fab and the system designer. For the chip fabrication facility, it offers a canonical specification that can be optimized for use in a wide variety of applications. The system designer can then select the desired fast, low-voltage-friendly multiport memory from a menu of choices. This talk will discuss plans for 130 nm 5LM silicon, which eases prototyping into future fabs while making the 10T SRAM the high-bandwidth, voltage-scalable memory cell and encouraging fabs on its density.

RoomD
16:00
45min
coreboot on POWER9
Piotr Król

coreboot is an open source firmware development framework that powers more than
10% of desktops, laptops and workstations. It supports multiple architectures:
x86, ARM, ARM64, RISC-V. Bringing it to OpenPOWER means introducing an
established and sizable community, a diverse economy of licensed service
providers, as well as less complexity during new hardware integration.

This talk will present the progress of porting coreboot to the POWER9
architecture and to the Talos II and Talos II Lite machines (FSF RYF certified).
This project became possible with the cooperation of 3mdeb Embedded Systems
Consulting and Insurgo Technologies Libres/Open Technologies. We will present
problems and hardships encountered during the work and other exciting stories
with the initial results of the first hostboot code audit conducted as part of
the work. Finally, we will present the current state of the booting process on
Talos II using coreboot.

Software
RoomC
16:10
16:10
10min
Open Hardware through Open Power SBC
Manuel Virgilio

We aim to bring together a group of people who are interested in the hardware they use to build the platform of the future. There are many SBCs that are very affordable and perform well, but none of them is based on the Power architecture.
In December 2020 I started designing the schematic of the Django0, an SBC based on the most recent (as of 2020) accessible low-power Power-based CPU, the NXP T1040. Unfortunately, it loses out to competitors in terms of cost/performance.
The current Django design is slowly turning into a proof of concept, but what we want to support is the build-up of the required know-how, which can be reused for upcoming Power architectures and made open and available to the whole community.
We want to be ready for the next Open Power step!

Hardware
RoomA
16:20
16:20
15min
OpenPOWER Collaboration Walkthrough
Toshaan Bharvani

Toshaan Bharvani, the OpenPOWER Technical Steering Committee chair, will present the new suite of tools that the OpenPOWER Foundation is using for collaboration and engagement with the OpenPOWER ecosystem. The foundation is adopting open source tools and collaboration platforms typical of other open source projects to expand community reach and drive engagement.

RoomA
16:45
16:45
45min
Panel: A Complete Re-Think for Memory Configurations
Jim Handy, Brian Allison, Tanj Bennett, Tom Coughlin

The industry has begun to reconfigure the standard memory topology, and these new configurations promise to make computing systems better than ever. Enormous pools of cache-coherent disaggregated “Far” memory will be made available to all processors, even coprocessors, and will be read and written using standard memory protocols no matter which processor accesses it. “Near” memory is set to move away from limiting DDR interfaces to support larger capacities at higher speeds while using less energy to move data around. What will this do to computing system architecture? How will it be supported? Which applications will benefit the most? Will other systems use this approach outside of the realm of supercomputers? In this session a panel of distinguished industry participants will share their sometimes-contradictory/sometimes-controversial views on these questions and more as the audience learns that there are many alternatives vying to win out in this space.

Hardware
RoomA
17:30
17:30
5min
OpenPOWER Summit 2021 Closing Remarks
James Kulina

OpenPOWER Closing

RoomA