4U AI Training Server with Habana Gaudi AI Processors and SynapseAI Software
Demand for high-performance AI/Deep Learning (DL) training compute has doubled every 3.5 months since 2013 (according to OpenAI) and is accelerating with the growing size of data sets and the growing number of applications and services based on computer vision, natural language processing, recommendation systems, and more. To meet this demand for greater training performance, throughput, and capacity, the industry needs training systems that offer higher efficiency, lower cost, ease of implementation, flexibility for customization, and the ability to scale. AI has become an essential technology in areas as diverse as virtual assistants, manufacturing operations, autonomous vehicle operations, and medical imaging. Supermicro has partnered with Habana Labs to address these growing requirements.
Supermicro X12 Gaudi AI Training System: SYS-420GH-TNGR
The Supermicro X12 Gaudi AI Training System prioritizes two key real-world considerations: training AI models as fast as possible while reducing the cost of training. It features eight Gaudi HL-205 mezzanine cards, dual 3rd Gen Intel® Xeon® Scalable processors, two PCIe Gen 4 switches, four hot-swappable NVMe/SATA hybrid drives, fully redundant power supplies, and 24 x 100GbE RDMA ports (exposed through 6 QSFP-DD connectors) for high scale-out system bandwidth. The system supports up to 8TB of DDR4-3200 memory to keep the Gaudi AI processors fed with data. The HL-205 is compliant with the OCP-OAM (Open Compute Project Accelerator Module) specification. Each card incorporates a Gaudi HL-2000 processor with 32GB of HBM2 memory and ten natively integrated 100GbE RoCE v2 RDMA ports.
The system enables high-efficiency AI model training for a wide array of applications:
Computer vision applications:
- Manufacturing defect detection, resulting in better products with fewer warranty issues
- Fraud detection, saving billions of dollars annually
- Inventory management, allowing enterprises to become more efficient
- Medical imaging to detect abnormalities
- Identification of photos and videos to enhance security
Natural language processing applications:
- Question answering
- Subject matter queries
- Chatbots and translations
- Sentiment analysis for recommendation systems
|Habana Gaudi AI Training System Specifications|
|Processor Support||Dual 3rd Gen Intel® Xeon® Scalable processors, Socket P+ (LGA-4189), up to 270W TDP|
|System Memory||32x DIMM slots, 3200/2933/2666MHz ECC DDR4 RDIMM/LRDIMM|
|AI Processors||8x Habana Gaudi AI processors on OAM mezzanine cards, 350W TDP, passive heatsinks|
|Expansion Slots||Dual x16 PCI-E AIOM (SFF OCP 3.0 superset) plus single x16 PCI-E 4.0 full height, half-length expansion slot|
|Connectivity||1x 10GbE dedicated IPMI LAN via RJ45, 6x 400Gb QSFP-DD ports, 2x USB 3.0|
|VGA/Audio||VGA via BMC|
|Drive Bays||4x internal 2.5" hot-swap NVMe/SATA/SAS drive bays|
|Storage||2x M.2 NVMe OR 2x M.2 SATA3|
|Power Supply||4x 3000W redundant power supplies, 80+ Titanium level; tested power draw: 4922W*|
|Cooling System||5x removable heavy-duty fans|
|Operating Temperature||10°C ~ 35°C (50°F ~ 95°F)|
|Form Factor||178 x 447 x 813mm (7" x 17.6" x 32")|
|Weight||Gross weight: 137lbs (62kg)|
|* Tested configuration:|
More hardware details are available on the product spec page.
Habana Gaudi AI Processor
The Habana® Gaudi® AI processor is designed to maximize price-performance, ease of use and scalability. Training on Gaudi AI processors provides:
Gaudi Training Efficiency
Architected to optimize AI performance, Gaudi delivers higher efficiency than traditional processor architectures:
- Heterogeneous compute architecture to maximize training efficiency
- Eight fully programmable, AI-customized Tensor Processor Cores
- Configurable centralized GEMM engine (matrix multiplication engine)
- Software-managed memory architecture with 32 GB of HBM2 memory
Gaudi Scaling Efficiency
Native integration of 10 x 100 Gigabit Ethernet RoCE ports onto every Gaudi AI processor:
- Eliminates network bottlenecks
- Standard Ethernet inside the server and across nodes can scale from one to thousands of Gaudi processors
- Lowers total system cost and power by reducing discrete components
Each Gaudi AI processor dedicates seven of its ten 100GbE RoCE ports to all-to-all connectivity within the system, leaving three ports per processor for scaling out, for a total of 24 x 100GbE RoCE scale-out ports per 8-card system. This allows end customers to scale their deployment using standard 100GbE switches, which lowers overall system cost. The high RoCE bandwidth inside and outside the box, together with the single standard protocol used for scale-out, makes the solution easily scalable and cost-effective.
Figure: a system with eight Gaudi HL-205 processors, showing the communication paths between the AI processors and the server CPUs.
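The port arithmetic above can be checked with a short sketch. It uses only the figures quoted in this document (8 cards, 10 ports per card, 7 internal ports, 6 x 400Gb QSFP-DD connectors). The function name `port_budget` and the reading of "all-to-all" as one dedicated port per peer card are illustrative assumptions, not a description of Habana's actual wiring.

```python
# Figures quoted in the text for one 8-card Gaudi system.
CARDS = 8
PORTS_PER_CARD = 10           # 100GbE RoCE ports integrated on each Gaudi
INTERNAL_PORTS_PER_CARD = 7   # ports reserved for links inside the server
PORT_GBPS = 100
QSFP_DD_CONNECTORS = 6        # external connectors listed in the spec table
QSFP_DD_GBPS = 400


def port_budget(cards=CARDS, ports_per_card=PORTS_PER_CARD,
                internal_per_card=INTERNAL_PORTS_PER_CARD, port_gbps=PORT_GBPS):
    """Return internal link and scale-out port counts for one system."""
    peers = cards - 1
    # Assumption: all-to-all means one dedicated port per peer card.
    if internal_per_card != peers:
        raise ValueError("one port per peer requires cards - 1 internal ports")
    scale_out_per_card = ports_per_card - internal_per_card
    scale_out_ports = cards * scale_out_per_card
    return {
        "internal_links": cards * peers // 2,  # each link joins two cards
        "scale_out_ports_per_card": scale_out_per_card,
        "scale_out_ports": scale_out_ports,
        "scale_out_gbps": scale_out_ports * port_gbps,
    }


budget = port_budget()
connector_gbps = QSFP_DD_CONNECTORS * QSFP_DD_GBPS

print(budget)
print("QSFP-DD capacity matches scale-out bandwidth:",
      connector_gbps == budget["scale_out_gbps"])
```

Under these assumptions, the 24 scale-out ports carry 2,400 Gb/s in aggregate, which equals the capacity of the six 400Gb QSFP-DD connectors in the specification table.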
Plug-and-Play AI Training Cluster Solution:
Gaudi’s integration of compute and networking functionality enables easy, near-linear scaling of Gaudi systems from one to thousands of processors. Supermicro supports full AI data center clusters, including AI inferencing (leveraging Habana Goya inference processors), CPU nodes and storage servers, networking systems, and complete rack solutions. As an early implementation example, the Supermicro X12 AI Training server is being deployed at the San Diego Supercomputer Center, on the University of California, San Diego campus, as part of the Voyager supercomputer build, with its 42-node scale-out.
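To give a sense of scale, the sketch below totals the resources of a cluster built from the eight-card system described in this document. The per-node figures (8 cards, 32GB of HBM2 per card, 24 scale-out ports) come from the text. The `GaudiNode` class and `cluster_totals` helper are hypothetical names, and applying them to a 42-node deployment assumes every node is this same eight-card configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GaudiNode:
    """Per-node figures quoted in this document for the 8-card system."""
    cards: int = 8
    hbm_gb_per_card: int = 32
    scale_out_ports: int = 24  # 100GbE RoCE ports available for scale-out


def cluster_totals(nodes, node=GaudiNode()):
    """Aggregate processor, HBM, and scale-out port counts for a cluster."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    processors = nodes * node.cards
    return {
        "gaudi_processors": processors,
        "hbm_gb": processors * node.hbm_gb_per_card,
        "scale_out_ports": nodes * node.scale_out_ports,
    }


# Assumption: all 42 nodes are identical 8-card systems.
totals = cluster_totals(42)
print(totals)
```

For 42 such nodes this gives 336 Gaudi processors, 10,752GB of HBM2, and 1,008 100GbE scale-out ports to connect through standard Ethernet switches.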
SynapseAI Software Stack for Gaudi Systems:
The SynapseAI® software stack is optimized for the Gaudi hardware architecture and designed for ease of use. It was created with developers and data scientists in mind, providing the versatility and ease of programming needed to address end users’ unique needs, while making it simple to build new models and port existing models to Gaudi. SynapseAI also lets developers customize Gaudi systems to meet their specific requirements and create their own innovations.
Features of the SynapseAI stack:
- Integrated TensorFlow and PyTorch frameworks
- Support for popular computer vision, NLP and recommendation models
- TPC programming tools: compiler, debugger and simulator
- Extensive Habana kernel library and library for Customer Kernel development
- Habana Communication Libraries (HCL and HCCL)