
Supermicro® Total Solution for Machine Learning

Supermicro and Canonical have partnered to deliver solutions that feature TensorFlow machine learning.

This solution is built and validated with Supermicro SuperServers, SuperStorage systems, and Supermicro Ethernet switches that are optimized for performance and designed to provide the highest levels of reliability, quality and scalability.

Canonical, the company behind Ubuntu, helps organizations make the most of Ubuntu. The Canonical Distribution of Kubernetes (CDK) is pure upstream Kubernetes tested across the widest range of clouds. Canonical also provides a rich ecosystem of tools, libraries, services, modern metrics, and monitoring tools to make CDK easy to consume so you can innovate faster.

Kubeflow is an open source project dedicated to providing easy-to-use Machine Learning (ML) resources on top of a Kubernetes cluster. Most prominently, Kubeflow eases the installation of TensorFlow and provides the mechanisms for leveraging GPUs attached to the underlying host in the execution of ML jobs submitted to it. TensorFlow is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.
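The Python below is a rough sketch of how a GPU-backed TensorFlow training job might be described to Kubeflow. It builds a TFJob manifest as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts. It assumes two things: the Kubeflow training operator's `kubeflow.org/v1` TFJob API, and GPUs exposed through the NVIDIA device plugin as the `nvidia.com/gpu` resource. The helper `build_tfjob`, the job name, and the container image are hypothetical.

```python
import json


def build_tfjob(name: str, image: str, workers: int, gpus_per_worker: int) -> dict:
    """Return a TFJob manifest as a dict (hypothetical helper, assumes kubeflow.org/v1)."""
    container = {
        # The TFJob operator looks for a container named "tensorflow" by default.
        "name": "tensorflow",
        "image": image,
        # GPUs are requested as an extended resource (assumes the NVIDIA device
        # plugin is installed); Kubernetes only schedules the pod onto a node
        # with that many free GPUs.
        "resources": {"limits": {"nvidia.com/gpu": gpus_per_worker}},
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }


if __name__ == "__main__":
    # One worker per cloud node, two V100 GPUs each (6 nodes x 2 GPUs in this solution).
    manifest = build_tfjob("mnist-train", "example.com/mnist-train:latest",
                           workers=6, gpus_per_worker=2)
    print(json.dumps(manifest, indent=2))
```

If the output is saved as `job.json`, it could be submitted with `kubectl apply -f job.json` on a cluster where the Kubeflow training operator is installed.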

Supermicro + Canonical Machine Learning Certified Platforms

Build your Machine Learning solution with Supermicro and Canonical.

[Figure: Supermicro & Canonical + TensorFlow rack diagram]

Highlights

  • Validated reference architectures
  • Certified components
  • Scale out – One rack to many racks
  • Greenest Servers for the Cloud – Save hundreds of dollars per server
  • Lowest Cost – Best Performance / Watt / $ / ft²
  • Start as a pro by leveraging expert support and services

Enterprise support for the Canonical Distribution of Kubernetes and Kubeflow is provided by Canonical in partnership with Supermicro, giving customers access to a global pool of knowledge and expertise. The partnership also offers a Discovery and Design Service: together, we design your infrastructure to the required size and specifications.

Supermicro Total Solutions for Canonical Machine Learning

Description Canonical Machine Learning Solution
# of Cores 216 Cores
Total Memory 3072 GB
Raw Storage 24 TB
Height 19U

SKU Details

Item | Machine Learning SKU | Qty
Components Used
Infrastructure Node | SYS-6019U-TN4RT | 3
Cloud Node | SYS-2029GP-TR | 6
Cloud Node Data Disks: U.2 NVMe Drives (2 TB) | HDS-IUN2-SSDPE2KX020T8 | 12 (2 per node x 6)
Cloud Node GPU: NVIDIA Tesla V100 16GB GPUs | GPU-NVTV100-16 | 12 (2 per node x 6)
Software Licenses
Ubuntu Advantage Advanced (3 Years) | SVC-CNC-SVR-AS | 9
Ubuntu Kubernetes Discoverer | SVC-CNCFC-FOB | 1
Services
DataCenter Design, Validation and Bootstrapping Services* | | 15
Supermicro Rack Integration Service** | | 1
Supermicro Onsite Support | | 12
* Consult Supermicro for pricing and quotation of this service.
** Racking & Cabling (with 3rd party switches); Racking & Cabling Engineering Drawing; Supermicro will not be responsible for 3rd party switch configuration.
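The summary figures at the top of this section can be cross-checked against the SKU table and the node descriptions. The sketch below multiplies per-node figures by quantities. It rests on some assumptions: rack heights of 1U for infrastructure nodes and 2U for cloud nodes are inferred from the model numbers, and all four switches are taken as 1U. Under those assumptions, the 216 cores, 3072 GB and 24 TB appear to count only the six cloud nodes and their data disks, while the 19U height covers every node plus the switches. `NodeType` and `totals` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    qty: int
    rack_units: int
    cores: int
    ram_gb: int


# Per-node figures from this document; rack heights are assumed from model numbers.
INFRA = NodeType(qty=3, rack_units=1, cores=28, ram_gb=384)   # SYS-6019U-TN4RT
CLOUD = NodeType(qty=6, rack_units=2, cores=36, ram_gb=512)   # SYS-2029GP-TR
SWITCH_UNITS = 2 + 2          # 2 management + 2 data switches, each 1U
DATA_DISKS_PER_CLOUD_NODE = 2
DATA_DISK_TB = 2
GPUS_PER_CLOUD_NODE = 2


def totals() -> dict:
    """Aggregate capacity figures for the reference rack."""
    return {
        "cloud_cores": CLOUD.qty * CLOUD.cores,
        "cloud_ram_gb": CLOUD.qty * CLOUD.ram_gb,
        "data_storage_tb": CLOUD.qty * DATA_DISKS_PER_CLOUD_NODE * DATA_DISK_TB,
        "gpus": CLOUD.qty * GPUS_PER_CLOUD_NODE,
        "all_node_cores": INFRA.qty * INFRA.cores + CLOUD.qty * CLOUD.cores,
        "rack_units": INFRA.qty * INFRA.rack_units
        + CLOUD.qty * CLOUD.rack_units
        + SWITCH_UNITS,
    }


if __name__ == "__main__":
    for key, value in totals().items():
        print(f"{key}: {value}")
```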

Networking Options (with 10, 25, or 40GbE Data Switches)

Reference configurations include two types of Ethernet switches: one that consolidates management/IPMI traffic and another that carries data traffic. The 1GbE management switch is common to all three networking options. The data switch can be a 10Gbps, 25Gbps, or 40Gbps switch.

Component | 10 GbE Data Network with Cumulus OS [3] | 25 GbE Data Network with SMIS OS | 40 GbE Data Network with Cumulus OS [3] | Qty
Management Switch | SSE-G3648B | SSE-G3648B | SSE-G3648B | 2
Data Switch | SSE-X3648S | SSE-F3548S | SSE-C3632S | 2
Infrastructure Node NICs | AOC-STGF-I2S-O | AOC-S25G-M2S | AOC-S40GI2Q | 6 (2 per node x 3)
Cloud Node NICs | AOC-STGF-I2S-O | AOC-S25G-M2S | AOC-S40GI2Q | 12 (2 per node x 6)
Software Licenses (for networking)
Management Switch Software perpetual license with 3-yr service/support | SFT-CLSPL1G-3Y [1] | SFT-SMCPL1G [2] | SFT-CLSPL1G-3Y [1] | 2
Data Switch Software perpetual license with 3-yr service/support | SFT-CLSPL10G-3Y [1] | Included with switch [2] | SFT-CLSPL100G-3Y [1] | 2

1 – The 10 GbE Data switch and 40 GbE Data switch options require a Cumulus OS for all switches in the solution. The Cumulus OS licenses for both the data and management switches are obtained through Supermicro using the provided SKUs.

2 – The 25 GbE Data switch option requires the Supermicro (SMIS) OS for all switches in the solution. The SMIS OS for the management switch is obtained using the provided SKU. The SMIS OS for the data switch is included with the switch.

3 – Cumulus Linux is a powerful open network operating system that lets you automate, customize, and scale your network. More information:
www.cumulusnetworks.com/products/cumulus-linux/
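The networking options table and notes 1 and 2 can be encoded as data, for example when scripting a quote. The Python below is a minimal sketch. `network_bom` and the option keys are hypothetical names, and the SKUs are transcribed from the table. The license SKUs are written without the trailing 1 or 2 digits that appear fused onto them in the table, on the assumption that those digits are note markers.

```python
# Networking line items per data-network option, transcribed from the table above.
# data_license None: the SMIS OS ships with the data switch (note 2).
OPTIONS = {
    "10GbE": {"data_switch": "SSE-X3648S", "nic": "AOC-STGF-I2S-O",
              "mgmt_license": "SFT-CLSPL1G-3Y", "data_license": "SFT-CLSPL10G-3Y"},
    "25GbE": {"data_switch": "SSE-F3548S", "nic": "AOC-S25G-M2S",
              "mgmt_license": "SFT-SMCPL1G", "data_license": None},
    "40GbE": {"data_switch": "SSE-C3632S", "nic": "AOC-S40GI2Q",
              "mgmt_license": "SFT-CLSPL1G-3Y", "data_license": "SFT-CLSPL100G-3Y"},
}
MGMT_SWITCH = "SSE-G3648B"  # 1GbE management switch, common to all options
SWITCH_QTY = 2              # two of each switch type
INFRA_NODES, CLOUD_NODES, NICS_PER_NODE = 3, 6, 2


def network_bom(option: str) -> list[tuple[str, int]]:
    """Return (SKU, quantity) line items for one data-network option."""
    spec = OPTIONS[option]
    bom = [
        (MGMT_SWITCH, SWITCH_QTY),
        (spec["data_switch"], SWITCH_QTY),
        (spec["nic"], INFRA_NODES * NICS_PER_NODE),  # infrastructure node NICs
        (spec["nic"], CLOUD_NODES * NICS_PER_NODE),  # cloud node NICs
        (spec["mgmt_license"], SWITCH_QTY),
    ]
    # Note 1: the Cumulus options also need an OS license for the data switches.
    if spec["data_license"] is not None:
        bom.append((spec["data_license"], SWITCH_QTY))
    return bom


if __name__ == "__main__":
    for sku, qty in network_bom("25GbE"):
        print(f"{sku}: {qty}")
```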

Additional Services that can be purchased

Additional Services
Supermicro Onsite Integration Service**
Ubuntu Advantage Server (Essential / Standard / Advanced)
Ubuntu Advantage Professional Service
Ubuntu BootStack Program
Ubuntu BootStack Professional Services
Ubuntu Advantage Travel & Expenses
Ubuntu BootStack Ceph / Swift Storage Add-on

** Requires SOW Onsite Survey; Onsite logistics; Racking & Cabling (with 3rd party switches); Racking & Cabling Engineering Drawing

Network Component Details

SSE-X3648S (Cluster Role: Data Switch)
  • 48x 10Gb Ethernet ports - SFP+
  • 6x 40 Gb Ethernet ports - QSFP+
  • RJ-45 (for console cable)
  • RJ-45 1G Ethernet Management Port
  • USB
  • Switching Capacity: 1440 Gbps
  • Wire-speed Layer 3 Routing
  • 1:1 Non-blocking connectivity
  • Reverse Airflow option available
  • Dual redundant hot-swappable power supplies
  • 1U form factor
SSE-F3548S (Cluster Role: Data Switch)
  • 48x 25Gb Ethernet ports - SFP28
  • 6x 100Gb Ethernet ports - QSFP28
  • RJ-45 (for console cable)
  • RJ-45 1G Ethernet Management Port
  • USB
  • Switching Capacity: 3.6 Tbps
  • 1:1 Non-blocking connectivity
  • Reverse Airflow option available
  • Dual redundant hot-swappable power supplies
  • 1U form factor
SSE-C3632S (Cluster Role: Data Switch)
  • 32x Ethernet QSFP28 ports – either 40Gbps or 100Gbps
  • 1x 10Gb Ethernet SFP+ port
  • RJ-45 Gb Ethernet management port
  • RJ-45 serial console
  • Type A USB 2.0 port
  • Full Duplex 3.2Tbps Switching Capacity
  • Reverse Airflow option available
  • Dual redundant hot-swappable power supplies
  • 1U form factor
SSE-G3648B (Cluster Role: Management Switch)
  • 48x 1Gbps Ethernet RJ45 ports
  • 4x 10Gbps Ethernet SFP+ ports
  • RJ-45 Gb Ethernet management port (for console cable)
  • Type A USB 2.0 port
  • Aggregate Switching Capacity: 176 Gbps
  • Non-blocking, wire-speed Layer 3 Routing
  • Reverse Airflow option available
  • Second (redundant) hot-swappable power supply - optional
  • 1U form factor
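The quoted switching capacities follow from the port counts. Sum each port's line rate, then double the total because traffic flows in both directions (full duplex). The sketch below checks this for three of the switches. Model names are matched to each port layout through the networking options table. The 32-port QSFP28 switch is left out because its 3.2 Tbps figure appears to count 32 x 100 Gbps in one direction only. `switching_capacity_gbps` is a hypothetical helper.

```python
def switching_capacity_gbps(ports: dict[int, int]) -> int:
    """Full-duplex switching capacity in Gbps.

    `ports` maps a port speed in Gbps to the number of ports at that speed;
    the one-way total is doubled to count both directions.
    """
    one_way = sum(speed * count for speed, count in ports.items())
    return 2 * one_way


# Port layouts from the switch descriptions above.
SWITCH_PORTS = {
    "SSE-X3648S": {10: 48, 40: 6},   # 10GbE data switch
    "SSE-F3548S": {25: 48, 100: 6},  # 25GbE data switch
    "SSE-G3648B": {1: 48, 10: 4},    # 1GbE management switch
}

if __name__ == "__main__":
    for model, ports in SWITCH_PORTS.items():
        print(model, switching_capacity_gbps(ports), "Gbps")
```

The computed values (1440, 3600 and 176 Gbps) match the figures listed for those three switches.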

Certified Nodes for Canonical Machine Learning Solution

(Cluster Role: Infrastructure Node)
  • 28 (2 * 14) Cores
  • 384 (12 * 32) GB RAM
  • 12 TB (2x 6TB) SATA HDD
  • Intel DC P4510 500GB, NVMe PCIe 3.1 (Cache)
  • 2 RJ45 10GBase-T Ethernet ports
(Cluster Role: Cloud Node)
  • 36 (2 * 18) Cores
  • 512 (16 * 32) GB RAM
  • 8TB (2x 4TB) SATA HDD (OS)
  • Intel DC P4610 1.6TB NVMe PCIe 3.1 (Cache)
  • 2x NVIDIA Tesla V100 GPUs
  • Data disks, GPUs, and "NICs for data" must be added to the order separately
(Cluster Role: Cloud Node)
  • 36 (2 * 18) Cores
  • 512 (16 * 32) GB RAM
  • 8TB (2x 4TB) SATA HDD (OS)
  • Intel DC P4610 1.6TB NVMe PCIe 3.1 (Cache)
  • 2x NVIDIA Tesla V100 GPUs
  • 4 RJ45 10GBase-T Ethernet ports
  • Data disks, GPUs, and "NICs for data" must be added to the order separately