OpenPOWER Summit 2021
OPF Summit Welcome and Keynotes
Antmicro Keynote by Michael Gielda
As the excitement around the promise of CXL builds, industry incumbents have continued to rapidly innovate with their own proprietary coherent interconnect busses.
These developments only serve to cloud the picture of what the future Industry Standard landscape will look like.
This presentation will showcase a potential path for CXL to flourish by existing alongside these proprietary busses without the processor companies having to sacrifice their proprietary solutions.
The presentation will conclude that the integration of CXL with an Industry Standard Near Memory bus will drive CXL's rapid organic growth in a shared, memory-centric world.
The Open Hardware Diversity Alliance was formed in August 2021 as a partnership between RISC-V, CHIPS Alliance, the OpenPOWER Foundation, Western Digital, and IBM, with a mission to provide programs that encourage participation and support the professional advancement of women and underrepresented individuals in open source hardware.
We asked ourselves:
Why are there few women and underrepresented individuals in the open hardware community?
Is it because open hardware is hard to navigate?
Is career progression a mystery?
Is it a lack of visibility of the talent in open hardware to the community?
In this presentation, Kim will share information about the Alliance, what we have done, what has worked and what hasn't, and invite anyone interested to join us! This is an interactive presentation, where Kim will also ask the audience to participate in the conversation.
This talk will explore the creation of an open source hardware ecosystem that is composed of many ingredients. The talk will look at the different parts of the hardware ecosystem, from baseline technology ingredients and EDA tooling to IP building blocks and implementation, and at how organizations such as CHIPS Alliance, OpenPOWER, and RISC-V International are working together to help make the vision a reality. The talk will also highlight the need to build a diverse and inclusive talent pool from the bottom up.
The Matrix Multiply Assist (MMA) architecture introduced in POWER10 is an important feature for AI acceleration. This talk describes the implementation of MMA support in AI libraries and frameworks and the performance improvements observed in some workloads.
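The accumulation pattern that MMA's rank-k update (`ger`) instructions implement in hardware can be sketched in NumPy as a sum of rank-1 outer-product updates. This is an illustrative model of the computation, not the POWER10 intrinsics themselves, and the function name is ours:

```python
import numpy as np

def gemm_outer_product(a, b):
    """Compute C = A @ B as a sum of rank-1 (outer-product) updates,
    the accumulation scheme that MMA's ger instructions perform in
    hardware on small accumulator tiles."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(k):
        # each step is one rank-1 update: a column of A times a row of B
        c += np.outer(a[:, i], b[i, :])
    return c
```

In the real hardware the accumulators live in dedicated registers and the updates are tiled, but the arithmetic is the same rank-1 accumulation shown here.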
The disaggregation of system resources promises various benefits, such as an increased flexibility of provisioning, better consolidation of workloads, and higher limits for bursts of resource consumption.
Similarly to how storage in the datacenter tends to be combined into large pools to be shared across different machines, portions of main memory could be disaggregated and made available on demand as well.
This has the potential to make memory disaggregation a building block of the future datacenter.
With Memory Inception, a disaggregation technology has been announced for POWER10 systems, and an OpenCAPI-based prototype – ThymesisFlow – is already available.
In this talk, we will outline our experimental setup using two IC922 POWER9 machines connected with ThymesisFlow, as well as present a selection of the projects currently running in our lab that use this technology.
In particular, we will discuss what new challenges arise for scale-up workloads such as In-Memory Databases and show early measurements with Hyrise, an open source In-Memory Database developed for research.
System on Chip (SoC) is increasingly driving embedded and IoT devices due to its
ability to tightly integrate microprocessors, microcontrollers, and peripherals. Moreover,
hardware accelerators are being used widely in machine learning in the form of SoCs to improve
performance and reduce energy consumption. In this presentation, we will talk about the course
we designed with multiple goals. First, we want to introduce the POWER ISA and build a community around it. Second, we want to bridge the gap between academia and industry that prevails in SoC design and verification. The course was designed in collaboration between NIE, SASTRA University, SRM University, JNTU Ananthapur, IIT Guwahati, Object Automation Solutions, and IBM. The course covers SoC design with the Libre-SoC toolchain
and SystemVerilog, IP verification, SoC verification, and application development. Initially, the developed SoC design is implemented in an FPGA (Field Programmable Gate Array), and testing procedures are applied to it to familiarize the learner with the front-end design flow. The developed SoC is then subjected to the back-end tool flow, which covers open source tools to convert the design into a GDSII file. This course includes the content needed for hands-on experience, from understanding the OpenPOWER architecture through to the GDSII generation required for chip tape-out, from both design and test perspectives.
SoC in Hours - A Power Chat
1. OpenPOWER - A matured ISA
2. Art of System Building - It's Libre-SoC
3. Microwatt in FPGA - A Rapid Flow
4. Tapeout Microwatt in a click
5. Bug the SoC - A Fire test
The teaching of the inter-related areas of computer architecture, computer organization, and computer systems is at a crossroads, one that could lead to another pedagogical (r)evolution. The first revolution occurred in the early 1990s, spurred by research in the 1980s that revisited RISC architectures and resulted in the seminal book "Computer Architecture: A Quantitative Approach" in 1989 and its companion book "Computer Organization and Design: The Hardware/Software Interface" in 1993, both centered around the MIPS instruction set architecture (ISA). By the 2000s, many institutions in higher education had transitioned their traditional course on operating systems concepts to a hands-on computer systems curriculum based on the CISC x86-64 ISA, as captured by the seminal book "Computer Systems: A Programmer's Perspective." While these books have served as exemplars for their respective areas, one might argue that the use of disparate ISAs – MIPS versus x86-64 – serves as an unnecessary learning impediment and source of confusion. A potential solution would be to align the teaching of all these inter-related areas entirely with the MIPS ISA or entirely with the x86-64 ISA; however, the former has limited real-world deployment while the latter is closed (and unnecessarily complex, i.e., CISC). In contrast, the POWER architecture is open source and enjoys widespread deployment, including in two of the fastest supercomputers in the world (i.e., Sierra at Lawrence Livermore National Lab and Summit at Oak Ridge National Lab). For these reasons, we envision a vertically integrated curriculum from hardware to systems software based on the POWER instruction set architecture.
ABSTRACT: Forest fires are on the increase worldwide. They are a threat to our environment because they spread quickly and can burn down acres of lush forest if they are not attended to. Forest fires occur for various reasons, and as climate change continues and temperatures increase by a few degrees every year, forest fires will increase as well. Trees that took many years to grow disappear in a very short time because of fires, leaving mountain areas barren, no longer providing protection from rains and the mudslides that follow them, and no longer providing oxygen, clean air, and shelter and food for birds and animals.
Usually forest fires originate very discreetly, and rangers are not notified about the fire until it is too late. This is because fires occur in dense forests that humans cannot easily reach, which poses a challenge for the rangers. To overcome this problem, we will use drones to navigate into the thick parts of the forest and integrate computer vision by utilizing a state-of-the-art Convolutional Neural Network (CNN) to achieve the task. The entire process is treated as a classification task, where the deep neural network model is responsible for classifying an image provided by the camera attached to the deployed drone as fire or non-fire. The training is performed over a dataset containing both fire and non-fire images, collected from various sources.
KEYWORDS: Enterprise AI, POWER9, AC922, Deep Learning (DL), Neural Network.
PROPOSED SYSTEM: We will develop a suitable neural network architecture for the problem based on the data and the goal set. Using a large set of data obtained from various sources and the department of forestry, we will train the neural network to provide the most optimal strategy. Using the test data, we will test the neural network's ability to provide the optimal strategy, and we will use the forest fires of that year and have domain experts verify the optimality of the neural network's strategy. The proposed models are very complicated and require in-depth knowledge of specific programming languages and tools. Setting up the system for a DL model is difficult, and personal systems lack the computational power that DL models demand, which restricts their capabilities. Hence, on-premise DL servers are needed to serve a large number of people. To meet this need, IBM has introduced the POWER9 processor and the AC922 Power Systems server, designed for compute-heavy artificial intelligence workloads.
To provide up to 5.6 times the bandwidth for data-intensive workloads, the AC922 Power server incorporates next-generation I/O architectures such as PCIe Gen4, CAPI 2.0, OpenCAPI, and NVIDIA NVLink. We use this server to train our model and optimize its performance in an efficient way.
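The classification pipeline described above (convolve, activate, pool, then score fire versus non-fire) can be illustrated with a toy NumPy forward pass. This is a sketch of the shape of a binary image classifier, not the actual trained model; the function names, single-channel input, and parameters are our own illustration:

```python
import numpy as np

def conv2d(img, kernel):
    """Naive 'valid' 2-D convolution over a single-channel image."""
    h, w = img.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def classify_fire(img, kernel, weight, bias):
    """Toy forward pass: conv -> ReLU -> global average pool -> sigmoid.
    Returns a probability for the 'fire' class."""
    feat = np.maximum(conv2d(img, kernel), 0.0)   # ReLU activation
    pooled = feat.mean()                          # global average pooling
    logit = weight * pooled + bias                # linear classifier head
    return 1.0 / (1.0 + np.exp(-logit))           # sigmoid -> probability
```

A real deployment would use a deep multi-layer CNN trained with backpropagation on the fire/non-fire dataset; the sketch only shows how a single image flows to a class probability.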
The TAU Performance System is an open source toolkit for parallel performance measurement and analysis targeted at high-performance computing (HPC) and enterprise systems. Developed at the University of Oregon, TAU has been ported to POWER systems for many years and is fully supported on the latest POWER processors. The talk will introduce the TAU technologies and showcase performance analysis and optimization outcomes for applications running on POWER platforms. It will cover the latest features of TAU and future directions, especially with respect to opportunities for enterprise use.
Large-scale streaming and big data applications requiring large amounts of memory have made OpenCAPI technology and FPGAs an appealing and cost-effective solution. Organizations from large research labs to data analytics startups are increasingly utilizing the technology to accelerate their applications, creating new jobs and research opportunities. In this presentation we will discuss the course that we designed to teach high-performance analysis of big data leveraging FPGA technology together with OpenCAPI.
In recent years, machine learning and neural networks, often loosely referred to as artificial intelligence, have grown so rapidly that they are no longer systems used only by researchers in universities; they have evolved so much that they are now actually deployed at the enterprise level, across organizations' production environments, to analyse data and gather meaningful insights from it. Many industries and organizations that have incorporated AI into their infrastructure have gained a competitive edge compared to their peers, and this is happening across industries. With AI helping organizations build solutions for the betterment of their industries in an efficient way, employees can focus on things such as communicating and strategizing to build solutions to problems that are otherwise side-lined. With advances in technology, companies are continuously embedding more and more powerful resources in their chips so they can process complex and resource-heavy big data, cloud, and AI applications. The latest chips being open-sourced also pave the way for running enterprise AI applications. One great example is the IBM POWER9 system, which addresses complex workloads such as cognitive and enterprise AI applications. POWER9 systems, with their high-powered GPUs, help organizations manage transactional and product-feature data that can be easily analysed for insights with machine learning (ML) and deep learning (DL) models.
IBM Bayesian Optimization Accelerator (BOA) is a global optimization toolkit which applies machine learning techniques to solve challenges arising in many practical engineering and design problems: computational or experimental simulations of the sampling space are very expensive; the objective functions have multiple local optima; and the collected data are noisy and have no derivatives or analytic forms. Other features of BOA include batch sampling, parameter analysis, and extensive implementations of kernel functions, acquisition functions, and optimization techniques. The solution is integrated as an appliance which can be easily hooked into an existing High Performance Computing (HPC) or enterprise environment running different operating systems. In this talk, we will discuss how BOA works with an existing HPC environment to get the optimization done, and how to write interface functions to connect BOA with the external workload to be optimized. We will also present some case studies which show performance gains over traditional methods such as grid search and random search.
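BOA itself is a proprietary appliance, but the core loop it implements, Bayesian optimization with a Gaussian-process surrogate and an expected-improvement acquisition, can be sketched in plain NumPy. All function names, parameters, and the toy 1-D objective below are our own illustration, not BOA's API:

```python
import math
import numpy as np

def rbf(x1, x2, ls=0.2):
    """Squared-exponential (RBF) kernel on 1-D inputs."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ls ** 2)

def gp_posterior(xs, ys, grid, noise=1e-4):
    """Gaussian-process posterior mean/variance at candidate points."""
    K = rbf(xs, xs) + noise * np.eye(len(xs))
    Ks = rbf(xs, grid)
    alpha = np.linalg.solve(K, ys)
    mu = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = 1.0 - np.sum(Ks * v, axis=0)       # prior variance 1 minus explained
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    """Expected improvement acquisition (for minimization)."""
    sigma = np.sqrt(var)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(objective, iters=10, seed=0):
    """Minimize an expensive black-box objective on [0, 1]."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 101)         # candidate pool
    xs = rng.uniform(0.0, 1.0, 5)             # initial random samples
    ys = np.array([objective(x) for x in xs])
    for _ in range(iters):
        mu, var = gp_posterior(xs, ys, grid)
        x_next = grid[np.argmax(expected_improvement(mu, var, ys.min()))]
        xs = np.append(xs, x_next)            # evaluate the most promising point
        ys = np.append(ys, objective(x_next))
    return xs[np.argmin(ys)], ys.min()
```

The point of the approach is that each expensive evaluation is chosen where the surrogate predicts the best trade-off between low mean and high uncertainty, which is why it typically beats grid search and random search on expensive objectives.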
Understanding machine learning algorithms is essential, but development, acceleration, and production engineering capabilities are also required in industry. This machine learning course introduces students to the concepts of data preprocessing, an algorithmic overview of different supervised and unsupervised learning techniques, their development strategies, and accelerating those algorithms using different hardware, such as IBM Power hardware. We developed the course in collaboration with experts from different industries (e.g., Facebook and IBM). The course will help the community learn more about the capabilities of the IBM POWER processor while designing an ML production system end-to-end, including project scoping, data needs, modeling strategies, and deployment requirements.
ANANTH is a fabless SoC (System on Chip) designed and developed at the VLSI labs of the Electronics and Communication Engineering Department, JNTUA College of Engineering, Anantapur, under academic collaboration with the OpenPOWER Foundation and International Business Machines (IBM). This fabless SoC is built around the IBM POWER A2O core and also has peripherals such as AMBA AXI, SPI, I2C, Ethernet, NAND, NOR, DMA, PCIe, and DDR3. It was indigenously developed for academic R&D purposes.
Advanced Cray-style Vectors are being developed for the Power ISA, as a
Draft Extension for submission to the new OpenPOWER ISA Working Group,
named SVP64. Whilst in-place Matrix Multiply was planned for a much
later advanced version of SVP64, an investigation into putting FFMPEG's
MP3 CODEC inner loop into Vectorised Assembler resulted in such a large
drop in code size (over 4x reduction) that it warranted priority
investigation.
Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT)
and Number-Theoretic Transform (NTT) form the basis of countless
high-priority algorithms. Normal SIMD Processors and even
normal Vector Processors have a hard time dealing with them: inspecting
FFMPEG's source code reveals that heavily optimised inline assembler (no
loops, just hundreds to thousands of lines of assembler) is not uncommon.
The focus of this NLnet-sponsored research is therefore to create enhancements
to SVP64 to be able to cover DFT, DCT, NTT and Matrix-Multiply entirely
in-place. In-place is crucially important for many applications (3D, Video)
to keep power consumption down by avoiding register spill as well as L1/L2
cache strip-mining. General-purpose RADIX-2 DCT and complex DFT will be
shown and explained, as well as the in-place Matrix Multiply which does
not require transposing or register spill for any size of Matrix (including
non-power-of-two) up to 128 FMACs. The basics of SVP64, covered in the Overview [1], will also
be briefly described.
[1] https://libre-soc.org/openpower/sv/overview/
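For readers unfamiliar with the structure being vectorised, the in-place radix-2 butterfly pattern that makes these transforms hard for conventional SIMD (strided, overlapping register accesses, no flat loops) can be sketched in Python. This is the textbook iterative decimation-in-time FFT, shown only to illustrate the in-place access pattern, not SVP64 assembler:

```python
import cmath

def fft_inplace(a):
    """Iterative radix-2 decimation-in-time FFT, performed fully in place.
    len(a) must be a power of two."""
    n = len(a)
    # bit-reversal permutation: reorder inputs so butterflies can run in place
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # butterfly passes: strides double each stage, all updates overwrite a[]
    length = 2
    while length <= n:
        w_m = cmath.exp(-2j * cmath.pi / length)   # stage twiddle factor
        for start in range(0, n, length):
            w = 1.0
            for k in range(start, start + length // 2):
                t = w * a[k + length // 2]
                a[k + length // 2] = a[k] - t
                a[k] = a[k] + t
                w *= w_m
        length <<= 1
    return a
```

The doubling strides and paired read-modify-write accesses in the inner loop are exactly what fixed-width SIMD struggles to express without the hand-unrolled assembler the abstract describes.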
Memory is an extremely important part of solutions, and the memory controller is the interface between external memory and your system – it can have consequences for everything from performance and reliability to even security!
This has been shown by the work Google has been funding to demonstrate new RowHammer exploits found using a fully open source, FPGA based memory platform and controller (paired with open source tooling).
Now, using our experience optimizing full-system performance through changes to the memory subsystem (through things like tcmalloc), we plan to drive changes in the memory controller space. As part of this effort we are now planning to build silicon to verify this work, including at advanced nodes, with partners like IBM, Antmicro and the CHIPS Alliance.
Sorbonne Université, in collaboration with Chips4Makers and LibreSOC, is
working to provide a complete FOSS toolchain for making ASICs in mature
technological nodes, that is, no smaller than 130nm. We take a circuit
description in an HDL and synthesize it with Yosys, but instead of targeting an FPGA,
we use an ASIC standard cell library to get the RTL description. From there, with
Coriolis2, we perform the classical steps of an RTL-to-GDSII flow, that is,
placement and routing, along with very basic timing closure. One key feature is
that all the tools of the chain cooperate together directly in memory, and even
share their underlying data structures. The toolchain has been successfully
used to build the first LibreSOC chip in TSMC 180nm, which is currently under
fabrication. The need for a low-cost or no-cost ASIC toolchain is increasing as
foundries, like SkyWater, start to lift the NDA on their mature
technological nodes (like Sky130).
HDL: Hardware Description Language, such as Verilog, VHDL, nMigen or Chisel.
RTL: Register Transfer Level, the design expressed in terms of 1-bit DFFs
and basic logic gates, like NOR2, NAND2, XOR2, ...
GDSII: Graphic Design System version 2. The de-facto standard for sending the
layout of an ASIC design to any foundry.
Chips4Makers: https://chips4makers.io/
LibreSOC: https://libre-soc.org/
Coriolis2: https://coriolis.lip6.fr/
The DOE Exascale Computing Project (ECP) Software Technology focus area
is developing an HPC software ecosystem that will enable the efficient
and performant execution of exascale applications. Through the
Extreme-scale Scientific Software Stack (E4S), it is developing a
comprehensive and coherent software stack that will enable application
developers to productively write highly parallel applications that can
portably target diverse exascale architectures - including the IBM
OpenPOWER with NVIDIA GPU systems. E4S features a broad collection of
HPC software packages, including the TAU Performance System(R) for
performance evaluation of HPC and AI/ML codes. E4S provides a curated set of packages
built using the Spack package manager, available both as bare-metal installations
and as containerized distributions. E4S exists to accelerate the development, deployment,
and use of HPC software, lowering the barriers for HPC users.
Everything is changing, from the healthcare and automotive markets to the financial markets and every type of engineering. Everything has stopped being created by an individual on a single computer, or at best by a small team on a couple of computers, and is now being developed and perfected through the use of HPC and AI, involving all kinds of people with different skills all around the world.
Even AI is something that we no longer run on a single computer, no matter how powerful it is. What drives everything today is HPC, or High Performance Computing. It can help develop better healthcare, better automobiles, better financials, and better anything that we run on it.
In this session we will introduce the IBM Power architecture, HPC supercomputers, and their differentiating SW solutions, and will run a small demo in IBM Cloud Pak for Data.
Artificial Intelligence (AI) is a powerful science that utilizes effective
methodologies, strategies, and systems to tackle otherwise unsolvable real-world issues.
There is a wide range of technological advancements and research is going on to
solve many real-time problems with regards to all different aspects of today’s society.
Recent years have witnessed significant advances in geospatial artificial intelligence
(GeoAI), which is the integration of geospatial studies and AI, especially machine
learning and deep learning methods and the latest AI technologies in both academia
and industry. Setting up AI-based machines for solving geospatial applications
requires a large amount of computing. With processors like IBM POWER9, we can
address complex workloads on huge amounts of data, performing data visualization,
statistical analysis, pattern recognition, and inference building in a very fast and
efficient manner. We thus combine H2O Driverless AI and IBM Power Systems to
enable geospatial applications to harness AI for competitive gain.
Baseboard management controllers (BMCs) sit at the heart of the boot and control infrastructure in the datacenter, so they require a unique functionality set. Based on work to improve open source FPGA tooling, it's now possible to replace traditional BMC ASICs with soft processors running on low cost FPGA hardware while still running the familiar OpenBMC software stack. To demonstrate this, the LibreBMC project was created and is being run under the OpenPOWER Foundation. Taking advantage of OCP's DC-SCM spec, Antmicro has created the first ever open source DC-SCM 1.0 modules with LibreBMC-supported FPGAs. This allows drop-in replacement on existing designs, fully verifiable hardware, and the unique opportunity to do software-style deployment of new "hardware based" security features (like memory tagging) even after hardware has been deployed in the datacenter!
We expect to see three prototypes of the Open Hardware GNU/Linux PowerPC laptop come to life before the end of 2021. The project started in late 2014; after a brief summary of the previous episodes, we will give the latest update on the prototypes through the recent electronics shortage and increase in costs, and disclose how you can take part in the pre-production run. We will also discuss how this difficult project, carried out during an uncertain period (2015-2021) to design a Power Architecture notebook, fits into the constellation of an open hardware Power Architecture computing shift. As this is a community-driven open hardware Power Architecture project, we will show how you can be a protagonist of this shift.
Toshaan Bharvani, as the Technical Steering Committee (TSC) chair of the OpenPOWER Foundation, will go through the working groups and their activities during the past year.
This talk will focus on high-performance and scalable middleware for MPI and DL applications on the OpenPOWER platform. The focus will be on three products with commercial support available from X-ScaleSolutions. The first product focuses on the OSU MVAPICH2 MPI libraries and their capabilities for high-performance computing with both CPUs (OpenPOWER) and GPUs (NVIDIA). The second product focuses on tight integration between the OSU MVAPICH2-GDR MPI library and the Horovod stack to provide high-performance and scalable Deep Learning (DL) with deep introspection (DI) capabilities for DL frameworks like TensorFlow and PyTorch. The DI capabilities allow DL users and runtime developers to easily optimize their DL applications on modern systems. The third product focuses on a high-performance and scalable checkpointing library for HPC and DL applications. Performance results from the ORNL Summit system (ranked #2) and Lassen (ranked #20) with thousands of GPUs and POWER9 CPUs will be presented.
To port Intel x86 intrinsics to OpenPOWER intrinsics, we have to build a testing framework. The port must be accurate, without errors or mistakes, and testing cannot be done by hand; it must be done automatically.
In the porting process, the results must be the same; this is indispensable. Performance must also be good, so we must be able to measure and compare latency and throughput. This testing framework will be built as open source.
The testing framework executes on both an Intel x86 machine and an OpenPOWER machine. The two are connected over a network, because the framework cannot be built on a single machine.
We cannot test every pattern, so we generate random data for testing: we feed the same input data to both the Intel x86 system and the OpenPOWER system and check that the result values are equal.
The network module may be made with Python, and the testing module may be made as a C-language extension of Python.
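The random-data equivalence check described above might be sketched like this in Python. The harness and the toy intrinsic pair are illustrative only, not the actual framework; in the real setup the two implementations would run on separate x86 and OpenPOWER machines connected over the network, with only inputs and results exchanged:

```python
import random

def check_equivalence(ref_impl, port_impl, gen_input, trials=1000, seed=42):
    """Differential test: feed identical random inputs to the reference
    implementation and the ported implementation and require identical
    results. Returns (True, None) on success, or (False, failing_input)
    so the counter-example can be reported."""
    rng = random.Random(seed)          # fixed seed -> reproducible runs
    for _ in range(trials):
        args = gen_input(rng)
        if ref_impl(*args) != port_impl(*args):
            return False, args
    return True, None

# Toy example: a 16-bit saturating add, standing in for a real intrinsic.
def sat_add_u16(a, b):
    return min(a + b, 0xFFFF)

def gen_pair(rng):
    """Random input generator: two unsigned 16-bit operands."""
    return rng.randrange(0, 0x10000), rng.randrange(0, 0x10000)
```

A wrapping (rather than saturating) port of `sat_add_u16` would be caught quickly, since random 16-bit pairs overflow often; for intrinsics with rare edge cases, the generator would be biased toward boundary values as well.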
TOY-SRAM
Robert Montoye
October 28 2021
Microprocessors use multiple high-speed multiport register files to improve their performance. In BOOM, the high-performance RISC-V core, the custom register file took as much design effort as the rest of the design. TOY-SRAM, an open-source multi-port memory system, is designed to replace custom circuit design with a simple set of choices for the fab and the system designer. For the chip fabrication facility, it offers a canonical specification that can be optimized for use in a wide variety of applications. The system designer can then select the desired fast, low-voltage-friendly multiport memory from a menu of choices. This talk will discuss plans for 130 nm 5LM silicon, which eases prototyping into future fabs, while making the 10T SRAM the high-bandwidth, voltage-scalable memory cell and encouraging fabs to optimize its density.
coreboot is an open source firmware development framework that powers more than
10% of desktops, laptops and workstations. It supports multiple architectures:
x86, ARM, ARM64 and RISC-V. Bringing it to OpenPOWER means introducing an
established and sizable community and a diverse economy of licensed service
providers, as well as less complexity during new hardware integration.
This talk will present the progress of porting the POWER9 architecture to
coreboot along with Talos II and Talos II Lite machines (FSF RYF certified).
This project became possible with the cooperation of 3mdeb Embedded Systems
Consulting and Insurgo Technologies Libres/Open Technologies. We will present
problems and hardships encountered during the work and other exciting stories
with the initial results of the first hostboot code audit conducted as part of
the work. Finally, we will present the current state of the booting process on
Talos II using coreboot.
We aim to bring together a group of people interested in the hardware they use to build the platform of the future. There are a lot of SBCs that are very affordable and performant, but none of them are based on the Power architecture.
In December 2020 I started designing the schematic of Django0, an SBC based on the latest (as of 2020) accessible low-power Power-based CPU, the NXP T1040, but unfortunately it falls short in cost/performance compared with competitors.
The actual Django design is slowly turning into a proof of concept, but what we want to preserve is the know-how required, which can be reused for upcoming Power architectures, and to make it open and available to the whole community.
We want to be ready for the next OpenPOWER step!
Toshaan Bharvani the OpenPOWER Technical Steering Committee chair will be presenting the new suite of tools that OpenPOWER Foundation is using for collaboration and engagement with the OpenPOWER ecosystem. The foundation is adopting open source tools and collaboration platforms typical of other open source projects to expand community reach and drive engagement.
The industry has begun to reconfigure the standard memory topology, and these new configurations promise to make computing systems better than ever. Enormous pools of cache-coherent disaggregated “Far” memory will be made available to all processors, even coprocessors, and will be read and written using standard memory protocols no matter which processor accesses it. “Near” memory is set to move away from limiting DDR interfaces to support larger capacities at higher speeds while using less energy to move data around. What will this do to computing system architecture? How will it be supported? Which applications will benefit the most? Will other systems use this approach outside of the realm of supercomputers? In this session a panel of distinguished industry participants will share their sometimes-contradictory/sometimes-controversial views on these questions and more as the audience learns that there are many alternatives vying to win out in this space.
OpenPOWER Closing