Habana Gaudi2 processor demonstrates two-times throughput over Nvidia’s A100 GPU.
What’s New: Today at Intel Vision, Intel announced that Habana Labs, its data center team focused on AI deep learning processor technologies, launched its second-generation deep learning processors for training and inference: Habana® Gaudi®2 and Habana® Greco™. These new processors address an industry gap by providing customers with high-performance, high-efficiency deep learning compute choices for both training workloads and inference deployments in the data center while lowering the AI barrier to entry for companies of all sizes.
“The launch of Habana’s new deep learning processors is a prime example of Intel executing on its AI strategy to give customers a wide array of solution choices – from cloud to edge – addressing the growing number and complex nature of AI workloads. Gaudi2 can help Intel customers train increasingly large and complex deep learning workloads with speed and efficiency, and we’re anticipating the inference efficiencies that Greco will bring.”
Why It Matters: The new Gaudi2 and Greco processors are purpose-built for AI deep learning applications, implemented in 7-nanometer technology and built on Habana’s high-efficiency architecture. At Intel Vision, Habana Labs revealed that Gaudi2 delivers twice the training throughput of the Nvidia A100-80GB GPU on the ResNet-50 computer vision model and the BERT natural language processing model.
“Compared with the A100 GPU, implemented in the same process node and roughly the same die size, Gaudi2 delivers clear leadership training performance as demonstrated with apples-to-apples comparison on key workloads,” said Eitan Medina, chief operating officer at Habana Labs. “This deep-learning acceleration architecture is fundamentally more efficient and backed with a strong roadmap.”
Gaudi2 deep learning processors deliver:
- Deep learning training efficiency: The Habana Gaudi2 processor significantly increases training performance, building on the same high-efficiency first-generation Gaudi architecture that delivers up to 40% better price performance in the AWS cloud with Amazon EC2 DL1 instances and on-premises with the Supermicro Gaudi Training Server. With a leap in process from 16 nm for first-generation Gaudi to 7 nm, Gaudi2 provides a significant boost to its compute, memory and networking capabilities. Gaudi2 also introduces an integrated media processing engine that handles compressed media and offloads the host subsystem. Gaudi2 triples the in-package memory capacity from 32GB to 96GB of HBM2E at 2.45TB/sec bandwidth, and integrates 24 x 100GbE RoCE RDMA NICs, on-chip, for scaling up and scaling out using standard Ethernet.
- Customer benefits: Gaudi2 provides customers a higher-performance deep learning training alternative to existing GPU-based acceleration, meaning they can train more and spend less, helping to lower total cost of ownership in the cloud and data center. Gaudi2 is built to address many model types and end-market applications, and customers can benefit from its faster time-to-train, which can result in faster time-to-insights and faster time-to-market. Gaudi2 is designed to significantly improve vision modeling of applications used in autonomous vehicles, medical imaging and defect detection in manufacturing, as well as natural language processing applications.
- Networking capacity, flexibility and efficiency: Habana has made it cost-effective and easy for customers to scale out training capacity by amplifying training bandwidth on second-generation Gaudi. With the integration of industry-standard RoCE on chip, customers can easily scale and configure Gaudi2 systems to suit their deep learning cluster requirements. With system implementation on widely used industry-standard Ethernet connectivity, Gaudi2 enables customers to choose from a wide array of Ethernet switching and related networking equipment, enabling cost savings. Avoiding proprietary interconnect technologies in the data center (such as those offered by competitors) is important for IT decision-makers who want to avoid single-vendor “lock-in.” The on-chip integration of the network interface controller (NIC) ports also lowers component costs.
- Simplified model build and migration: The Habana® SynapseAI® software suite is optimized for deep learning model development and for easing migration of existing GPU-based models to Gaudi platform hardware. SynapseAI software supports training models on Gaudi2 and running inference with them on any target, including Intel® Xeon® processors, Habana Greco or Gaudi2 itself. Developers are supported with documentation and tools, how-to content and a community support forum on the Habana Developer Site, with reference models and a model roadmap on the Habana GitHub. Getting started with model migration is as easy as adding two lines of code; for expert users who wish to program their own kernels, Habana offers the full tool suite.
About Availability of Gaudi2 Training Solutions: Gaudi2 processors are now available to Habana customers. Habana has partnered with Supermicro to bring the Supermicro Gaudi2 Training Server to market this year. Habana has also teamed up with DDN® to deliver turnkey rack-level solutions that pair the Supermicro server with the DDN AI400X2 storage solution for augmented AI storage capacity.
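For readers curious what the "two lines of code" migration mentioned above might look like in practice, here is a minimal sketch for a PyTorch model. It is an illustration under assumptions rather than Habana's official sample: the release does not name the two lines, so the `habana_frameworks.torch.core` import and the `"hpu"` device string are assumed from Habana's public PyTorch integration. The helper names `select_device_name` and `tiny_training_step` are hypothetical. The sketch falls back to CPU when the Habana software stack is not installed.

```python
import importlib.util


def select_device_name() -> str:
    """Return "hpu" if the Habana software stack appears installed, else "cpu"."""
    found = importlib.util.find_spec("habana_frameworks") is not None
    return "hpu" if found else "cpu"


def tiny_training_step(device_name: str) -> float:
    """Run one SGD step of a toy linear model on the given device."""
    import torch  # imported lazily so select_device_name works without PyTorch

    htcore = None
    if device_name == "hpu":
        # Assumed migration line 1: load the Habana PyTorch bridge.
        import habana_frameworks.torch.core as htcore
    # Assumed migration line 2: target "hpu" where GPU code would use "cuda".
    device = torch.device(device_name)

    model = torch.nn.Linear(4, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(8, 4, device=device)
    y = torch.randn(8, 1, device=device)

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    if htcore is not None:
        # Assumption: in lazy execution mode this flushes the queued operations.
        htcore.mark_step()
    return loss.item()


if __name__ == "__main__":
    name = select_device_name()
    if importlib.util.find_spec("torch") is None:
        print(f"PyTorch not installed; would have trained on '{name}'")
    else:
        print(f"device={name} loss={tiny_training_step(name):.4f}")
```

The rest of the training loop is unchanged from what a GPU user would already have, which is the point of the claim: only the bridge import and the device selection differ in this sketch.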
What Customers and Partners are Saying:
Mobileye: “As a world leader in automotive and driving assistance systems, training cutting-edge deep learning models for tasks such as object detection and segmentation that enable vehicles to sense and understand their surroundings is mission-critical to Mobileye business and vision,” said Gaby Hayon, executive vice president of R&D at Mobileye. “As training such models is time-consuming and costly, multiple teams across Mobileye have chosen to use Gaudi-accelerated training machines, either on Amazon EC2 DL1 instances or on-prem. Those teams consistently see significant cost savings relative to existing GPU-based instances across model types, enabling them to achieve much better time-to-market for existing models or training much larger and complex models aimed at exploiting the advantages of the Gaudi architecture. We’re excited to see Gaudi2’s leap in performance, as our industry depends on the ability to push the boundaries with large-scale high performance deep learning training accelerators.”
Leidos: “The rapid-pace R&D required to tame COVID demonstrates an urgent need our medical and health sciences customers have for fast, efficient deep learning training of medical imaging datasets – when hours and even minutes count – to unlock disease causes and cures,” said Chetan Paul, vice president for Technology Innovation, Government Health and Safety Solutions at Leidos. “We expect Gaudi2, building on the speed and cost-efficiency of first-gen Gaudi, to provide customers with dramatically accelerated model training, while preserving the DL efficiency we experienced with first-gen Gaudi.”
Supermicro: “We’re excited to bring our next-generation AI deep learning server to market featuring the high-performance 7 nm Gaudi2 processor that will enable our customers to achieve faster time-to-train advantages while preserving the efficiency and expanding on the scalability of first-generation Gaudi,” said Charles Liang, Supermicro CEO.
DDN: “We congratulate Habana on the launch of its new high-performance, 7 nm Gaudi2 accelerator. We look forward to collaborating on the turnkey AI solution consisting of our DDN AI400X2 storage appliance combined with Supermicro Gaudi2 Training Servers to help enterprises with large, complex deep learning workloads unlock meaningful business value with simple but powerful storage,” said Paul Bloch, president and co-founder of DataDirect Networks.
More Context: Habana Labs Launches Gaudi2 Deep Learning Training Processor (Fact Sheet) | Habana Gaudi2 (White Paper) | Intel Vision 2022 (Press Kit) | Intel Vision 2022 Keynote (Livestream) | Intel Vision 2022: Day 1 Keynote (Live Blog) | Intel Announces New Cloud-to-Edge Technologies to Solve Challenges of Today and Tomorrow (News) | 12th Gen Intel Core HX Processors Launch as World’s Best Mobile Workstation Platform (News)
The Small Print:
For workloads and configurations, visit the Vision section at www.intel.com/PerformanceIndex. Results may vary.
Intel (Nasdaq: INTC) is an industry leader, creating world-changing technology that enables global progress and enriches lives. Inspired by Moore’s Law, we continuously work to advance the design and manufacturing of semiconductors to help address our customers’ greatest challenges. By embedding intelligence in the cloud, network, edge and every kind of computing device, we unleash the potential of data to transform business and society for the better. To learn more about Intel’s innovations, go to newsroom.intel.com and intel.com.