HPE Cray EX Supercomputer
The Next Era of Supercomputing
Do you need a powerful solution to meet today's supercomputing challenges?
HPE Cray supercomputers enable you to tackle infrastructure challenges that require the fusion of modeling and simulation workloads with analytics, AI, and the Internet of Things to create a single business-critical workflow. Today's high-performance computing systems must be able to handle these massive and converged workloads, leading to a supercomputing sea-change.
With the imperative to navigate increasingly diverse and complex workloads, the next generation of supercomputers will be differentiated by exascale performance, data-centric workloads, and diversification of processor architectures.
HPE Cray supercomputers deliver HPC and AI application performance at scale, provide a flexible solution for tens to hundreds to thousands of nodes, and deliver consistent, predictable, and reliable performance, facilitating high productivity on large-scale workflows.
Flexible Hardware Infrastructure
HPE Cray supercomputers support multiple processor architectures and accelerator options. Additionally, they are architected for forward compatibility with next-generation blades and servers. HPE Cray supercomputers are available in two configurations.
For increased density and efficiency, the HPE Cray EX liquid-cooled cabinetry supports direct liquid cooling of all components in a highly dense bladed configuration. These cabinets can support processors of up to 500W and highly dense configurations of up to 512 processors per cabinet.
HPE Cray supercomputers are also available in a standard 19-inch rack configuration with HPE Cray software and HPE Slingshot networking, including a 19-inch Top of Rack HPE Slingshot switch. The current compute platform for the standard rack solution is the HPE Apollo 2000 Gen10 Plus System.
HPE Cray supercomputers' revolutionary design features the HPE Slingshot interconnect, a high-performing interconnect solution built on high-radix, 64-port switches that enable scaling to hundreds of thousands of nodes with only three hops in a Dragonfly topology.
The 64-port switch provides 12.8 Tb/s of bandwidth. Each port operates at 200 Gb/s per direction and can provide Ethernet edge or HPC fabric functionality. Edge ports connect to supported Ethernet NICs or external routers at 100GbE or 200GbE.
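The switch bandwidth figure follows directly from the port count and the per-port rate quoted above. The short Python sketch below only reproduces that arithmetic; the function and variable names are illustrative and do not come from any HPE tool.

```python
# Figures quoted in the text above.
PORTS_PER_SWITCH = 64
PORT_RATE_GBPS = 200  # per direction


def aggregate_bandwidth_tbps(ports: int, rate_gbps: float) -> float:
    """Sum the per-direction port rates and convert Gb/s to Tb/s."""
    return ports * rate_gbps / 1000


switch_tbps = aggregate_bandwidth_tbps(PORTS_PER_SWITCH, PORT_RATE_GBPS)
print(f"{PORTS_PER_SWITCH} ports x {PORT_RATE_GBPS} Gb/s = {switch_tbps} Tb/s")
```

The result matches the 12.8 Tb/s stated for the 64-port switch.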
The HPE Slingshot switch is available in a liquid-cooled blade form factor for the HPE Cray EX infrastructure and in a 2U air-cooled form factor for standard 19-inch rack deployments. The internal switch logic is the same for both environments.
HPE Slingshot contains several innovative features to consistently deliver reliable high performance under heavy usage, including adaptive routing that sends packets dynamically based on real-time, global information on load inside the network, and advanced congestion control mechanisms.
With a growing focus on data-centric computing and the convergence of AI and HPC workloads, interoperability has become an increasingly important consideration. HPE Slingshot is based on industry-standard Ethernet, which enables straightforward connectivity with standard datacenter environments.
Redesigned Software Stack
The HPE Cray supercomputer is designed to handle the convergence of HPC, AI, and data analytics workloads, coupled with explosive data growth. Today’s supercomputers must handle exabytes of data so that modern workloads can run in a productive, reliable, and expedient manner.
Built on decades of supercomputing expertise, the HPE Cray software stack adds the productivity of cloud and data center interoperability to the power of supercomputing to bring you a new standard in manageability, reliability, availability, and resiliency.
The stack provides a comprehensive HPE Cray System Management suite for administrators, a hardened low-jitter HPE Cray OS, as well as the HPE Cray Programming Environment software development toolchain for developers.
Integrated Storage Solution
Integrated with the HPE Cray supercomputers, the Cray ClusterStor E1000 Storage System is purpose-engineered to meet the demanding input/output requirements of supercomputers and HPC clusters in a very efficient way.
The parallel storage solution typically achieves the given HPC storage requirements with significantly fewer storage drives than alternative storage offerings, allowing HPC users with a fixed budget to spend more of that budget on CPU/GPU compute nodes, accelerating time-to-insight.
The HPE Cray EX supercomputer is a liquid-cooled, blade-based, high-density clustered computer system designed from the ground up to deliver the utmost in performance, scale, and density. The basic building block of the HPE Cray EX Supercomputer is the liquid-cooled cabinet. The cabinet is a sealed unit, uses closed-loop cooling technology, and does not exhaust heated air into the data center. Direct-attached, liquid-cooled cold plates provide efficient heat removal from high-power devices, including processors, GPUs, and switches, via an auxiliary cooling distribution unit (CDU).
HPE Cray EX Supercomputer - Closed and With Trim
HPE Cray EX3000 Detail
A single cabinet can accommodate up to 64 compute blade slots within 8 compute chassis. The cabinet is not configured with any cooling fans. All cooling needs for the cabinet are provided by direct liquid cooling and the CDU. This approach provides greater efficiency for rack-level cooling, decreases the power costs associated with cooling (no blowers), and uses a single water source per CDU.
One cabinet supports the following:
- 8 compute chassis
- 4 power shelves with a maximum of 6 rectifiers per shelf (24 total 12.5 kW or 15 kW rectifiers per cabinet)
- 4 PDUs (1 per power shelf)
- 3 power input whips (3-phase)
- Maximum of 64 quad-node compute blades
- Maximum of 64 Slingshot switch blades
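As a rough cross-check of the power figures in the list above, the following Python sketch multiplies out the rectifier counts. It computes a simple nameplate sum and assumes no rectifiers are held in reserve for redundancy and no conversion losses, so the results are an upper bound for illustration rather than a stated cabinet power rating.

```python
# Figures quoted in the list above.
POWER_SHELVES_PER_CABINET = 4
MAX_RECTIFIERS_PER_SHELF = 6
RECTIFIER_OPTIONS_KW = (12.5, 15.0)


def rectifiers_per_cabinet() -> int:
    """Maximum number of rectifiers in one fully populated cabinet."""
    return POWER_SHELVES_PER_CABINET * MAX_RECTIFIERS_PER_SHELF


def nameplate_capacity_kw(rectifier_kw: float) -> float:
    """Nameplate sum only: assumes every rectifier carries full load."""
    return rectifiers_per_cabinet() * rectifier_kw


for kw in RECTIFIER_OPTIONS_KW:
    print(f"{rectifiers_per_cabinet()} x {kw} kW = {nameplate_capacity_kw(kw)} kW")
```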
The compute chassis is a mechanical assembly that provides power, cooling, system control, and network fabric for up to 8 compute blade slots. 8 chassis are installed in the 48U cabinet.
The features of the compute chassis are as follows:
- 8 compute blade slots
- 8 Slingshot switch blade slots
- One power/signal midplane
Blades have three basic sections (computation, memory, and I/O) and consume one blade slot in the compute chassis. The following blade is designed for the HPE Cray EX Supercomputer:
HPE Cray EX425
The features of this compute blade are as follows:
- 2 boards per blade. Each board contains two 2-socket nodes (total of 4 nodes per blade).
- Support for the full 2nd Gen AMD EPYC™ 7002 Series processor stack
- 8 DIMMs per socket (1DPC)
- Up to 64 GB DIMMs at up to 3200 MT/s
- Up to 8 Slingshot injection ports per blade
- 2 Board Management Controllers (BMC) per blade
- Cooled with cold plate technology
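The blade features above, combined with the cabinet layout of 8 chassis with 8 blade slots each, account for the 512 processors per cabinet quoted earlier. The Python sketch below works through that arithmetic for a cabinet fully populated with HPE Cray EX425 blades. The memory totals assume every DIMM slot holds the largest 64 GB DIMM, which is an illustrative maximum and not a recommended configuration.

```python
# Figures quoted in this document.
CHASSIS_PER_CABINET = 8
BLADE_SLOTS_PER_CHASSIS = 8
NODES_PER_BLADE = 4          # 2 boards x 2 nodes
SOCKETS_PER_NODE = 2
DIMMS_PER_SOCKET = 8
MAX_DIMM_GB = 64

blades_per_cabinet = CHASSIS_PER_CABINET * BLADE_SLOTS_PER_CHASSIS
nodes_per_cabinet = blades_per_cabinet * NODES_PER_BLADE
processors_per_cabinet = nodes_per_cabinet * SOCKETS_PER_NODE

# Assumption: all DIMM slots populated with 64 GB DIMMs.
memory_per_node_gb = SOCKETS_PER_NODE * DIMMS_PER_SOCKET * MAX_DIMM_GB
memory_per_cabinet_gb = nodes_per_cabinet * memory_per_node_gb

print(f"blades per cabinet:     {blades_per_cabinet}")
print(f"nodes per cabinet:      {nodes_per_cabinet}")
print(f"processors per cabinet: {processors_per_cabinet}")
print(f"max memory per node:    {memory_per_node_gb} GB")
print(f"max memory per cabinet: {memory_per_cabinet_gb} GB")
```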
The switches are in the switch chassis and mounted to the rear of the compute chassis. The purpose of the switch chassis is to provide a structure for orthogonally mounting the switch blades to the compute chassis. There is no backplane connecting the switches to the compute blades. Each compute blade directly connects to one or more switch blades in the switch chassis enabling a cableless connection. The switch chassis supports a maximum of eight switch blades.
The all-to-all Dragonfly Ethernet topology is supported in the HPE Cray EX Supercomputer. Dragonfly provides a lower-cost and highly scalable alternative to traditional Fat Tree topologies. It makes use of high-speed copper cables and reduces the number of more expensive optical connections by up to 50%. In a Dragonfly topology, every switch in a group is connected to every other switch in that group, with a typical group size of 16 switches.
The following list provides a high-level description of the Dragonfly topology, including a summary of its capabilities.
- A 16-switch group can scale up to 37,120 nodes or 145 cabinets.
- A 32-switch group can scale up to 131,584 nodes or 257 cabinets.
- Low-diameter network with no more than 3 switch hops between any two nodes in the network, even at scale.
- Most switch links are in the same cabinet and use low-cost QSFP-DD copper cabling. Group-to-group links typically use QSFP-DD fiber.
- Configurable global bandwidth ranging from ~25% to 100%.
- Built-in congestion management and adaptive routing at scale.
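The scaling figures in the list above can be reproduced with the usual Dragonfly sizing rule, in which a system of a-switch groups with h global links per switch supports at most a × h + 1 fully connected groups. The Python sketch below is a plausibility check under assumptions that are not stated in this document: 16 node ports per switch, 9 global links per switch for 16-switch groups, 8 global links per switch for 32-switch groups, and a single local link between each pair of switches in a group. These values were chosen because they reproduce the quoted numbers. The sketch counts groups, and the resulting group counts equal the 145 and 257 figures quoted above.

```python
SWITCH_PORTS = 64  # from the text


def max_groups(switches_per_group: int, global_links_per_switch: int) -> int:
    """Largest number of groups that can still be fully connected to each other."""
    return switches_per_group * global_links_per_switch + 1


def max_nodes(switches_per_group: int, nodes_per_switch: int,
              global_links_per_switch: int) -> int:
    """Total nodes across all groups at maximum scale."""
    groups = max_groups(switches_per_group, global_links_per_switch)
    return groups * switches_per_group * nodes_per_switch


def ports_used(switches_per_group: int, nodes_per_switch: int,
               global_links_per_switch: int) -> int:
    """Node ports + one local link to each peer switch + global links."""
    return nodes_per_switch + (switches_per_group - 1) + global_links_per_switch


# (switches per group, assumed nodes per switch, assumed global links per switch)
for a, p, h in [(16, 16, 9), (32, 16, 8)]:
    assert ports_used(a, p, h) <= SWITCH_PORTS
    print(f"{a}-switch groups: {max_groups(a, h)} groups, {max_nodes(a, p, h):,} nodes")
```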
CDU (Cooling Distribution Unit)
The cooling distribution unit (CDU) is a liquid-to-liquid heat exchanger that removes heat from the HPE Cray EX Supercomputer. The CDU uses a secondary loop to circulate a heat transfer liquid to the cold sinks. The heat captured in the secondary loop is transferred to the facility's primary loop via the liquid-to-liquid heat exchanger.
The CDU is designed to circulate and control the heat transfer fluid to the manifolds that are in each chassis in the cabinet. The CDU is rated for 1.2MW of cooling. One CDU supports a maximum of four cabinets.
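The CDU rating and the four-cabinet limit imply an average cooling budget per cabinet. The Python sketch below works that out and estimates how many cabinets one CDU could serve for a given heat load. It assumes all cabinet heat is rejected to the liquid loop, and the example heat loads are hypothetical inputs, not measured or published values.

```python
import math

# Figures quoted in the text above.
CDU_RATING_KW = 1200.0       # 1.2 MW
MAX_CABINETS_PER_CDU = 4


def cooling_budget_per_cabinet_kw() -> float:
    """Average cooling available per cabinet when a CDU serves four cabinets."""
    return CDU_RATING_KW / MAX_CABINETS_PER_CDU


def cabinets_supported(heat_load_kw_per_cabinet: float) -> int:
    """Cabinets one CDU can serve, limited by rating and by the 4-cabinet maximum."""
    if heat_load_kw_per_cabinet <= 0:
        raise ValueError("heat load must be positive")
    by_capacity = math.floor(CDU_RATING_KW / heat_load_kw_per_cabinet)
    return min(MAX_CABINETS_PER_CDU, by_capacity)


print(f"budget per cabinet: {cooling_budget_per_cabinet_kw()} kW")
for load_kw in (250.0, 300.0, 360.0):  # hypothetical per-cabinet heat loads
    print(f"{load_kw} kW per cabinet -> {cabinets_supported(load_kw)} cabinets")
```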
The CDU consists of a cabinet that includes a heat exchanger, circulating pump(s), control valve, sensors, controller, valves, and piping. The CDU monitors room conditions and prevents condensation by maintaining the secondary loop at a temperature above the room’s dew point.
All functions, such as switching pumps (if applicable), controlling water temperature, etc., are managed by the controller using user defined settings.
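To illustrate the condensation rule described above, the Python sketch below estimates the room dew point with the Magnus approximation and derives a minimum secondary-loop temperature from it. This is not the CDU controller's algorithm, which this document does not describe. The Magnus coefficients are a commonly used set for water vapor over liquid water, and the 2 °C safety margin is a hypothetical value chosen for the example.

```python
import math

# Commonly used Magnus coefficients (temperature in degrees Celsius).
MAGNUS_B = 17.62
MAGNUS_C = 243.12


def dew_point_c(air_temp_c: float, relative_humidity_pct: float) -> float:
    """Approximate dew point from room temperature and relative humidity."""
    if not 0 < relative_humidity_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = (math.log(relative_humidity_pct / 100.0)
             + MAGNUS_B * air_temp_c / (MAGNUS_C + air_temp_c))
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def min_secondary_loop_temp_c(air_temp_c: float, relative_humidity_pct: float,
                              margin_c: float = 2.0) -> float:
    """Lowest loop temperature that stays above the dew point.

    margin_c is a hypothetical safety margin, not an HPE specification.
    """
    return dew_point_c(air_temp_c, relative_humidity_pct) + margin_c


room_temp_c, room_rh_pct = 25.0, 50.0  # example room conditions
print(f"dew point: {dew_point_c(room_temp_c, room_rh_pct):.1f} C")
print(f"minimum loop temperature: "
      f"{min_secondary_loop_temp_c(room_temp_c, room_rh_pct):.1f} C")
```

Higher room humidity raises the dew point, which in turn raises the lowest temperature at which the secondary loop can safely run.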
HPE Cray EX supercomputers are complete solutions with software and hardware that are tightly integrated and performance-tuned to offer the best system performance while bringing a new standard in flexibility, manageability, and resiliency to supercomputing.
The HPE Cray supercomputer software stack addresses the needs of system administrators, developers, and end users.
HPE Cray System Management - a built-for-scale system management solution offering administrators all the functionality they need to keep the HPE Cray EX system healthy, utilized to the maximum, and able to accommodate a wide range of workload requirements via an as-a-service (aaS) experience. The software is built to manage systems that can scale to exascale deployments, featuring:
- Comprehensive monitoring and management of all aspects of the system: CPU/GPU, network (integrated Cray Slingshot Fabric Manager), storage as well as power management and monitoring combined with provisioning for operational efficiency.
- Multi-tenancy, partitioning, and batch or container orchestration enable customers to run a variety of HPC/AI/HPDA workloads in the way that makes the best use of their system without logistical constraints.
- REST APIs & standard protocols enable full interoperability with existing monitoring, management, and automation toolsets.
HPE Cray Programming Environment - a fully integrated software development suite offering programmers a comprehensive set of tools for developing, porting, debugging, and tuning their applications so they can shorten application development time and accelerate application performance.
The programming environment is designed to make porting existing applications easier, with minimal recoding and minimal changes to existing programming models, to simplify the transition to new hardware architectures and configurations such as HPE Cray EX systems.
End User Software
HPE Cray OS is based on SLES with enhancements. The enhancements provide customers with capabilities specific to supercomputing and high-performance computing, fully supported by HPE Pointnext. These modifications don't alter the ability to run standard Linux applications, but rather enhance it for performance, scale, and reliability. We integrate and test these components together and package them as releases.
While HPE Cray System Management and HPE Cray Operating System are designed to support HPE Cray EX systems with HPE Slingshot, HPE Cray Programming Environment is a self-standing product that also supports other HPE and HPE Cray HPC systems (using InfiniBand interconnect). The whole software stack is supported by HPE Pointnext Services.
|HPE Cray EX Supercomputer Features|
|Operating system||HPE Cray Operating System|
|System Management and Fabric software||
|Workload Management and Orchestration||
|Software and Application Development Tools:||HPE Cray Programming Environment|
|DL/AI Tools:||Deep learning plugin|
HPE Pointnext Services leverages our strength in infrastructure, partner ecosystems, and the end-to-end lifecycle experience to deliver powerful, scalable IT solutions and the assistance you need for faster time to value. HPE Pointnext Services provides a comprehensive portfolio, including Advisory and Transformation, Professional, and Operational Services, to help accelerate your digital transformation.
- HPE Datacenter Care: HPE’s most comprehensive support solution tailored to meet your specific data center support requirements. It offers a wide choice of proactive and reactive service levels to cover requirements ranging from the most basic to the most business-critical environments. HPE Datacenter Care Service is designed to scale to any size and type of data center environment while providing a single point of contact for all your support needs for HPE as well as selected multivendor products.
- HPE Critical Service: High-performance reactive and proactive support designed to minimize downtime. It offers an assigned support team, which includes an account support manager (ASM). This service offers access to the HPE Global NonStop Solution Center, 24x7 hardware and software support, six-hour call-to-repair commitment, enhanced parts inventory, and accelerated escalation management.
- HPE Proactive Care: Provides proactive and reactive support delivered under the direction of an ASM. It offers 24x7 hardware support with four-hour on-site response, 24x7 software support with a two-hour response, and flexible call submittal.
- HPE Foundation Care: Support for HPE servers, storage, networking hardware, and software to meet your availability requirements with a variety of coverage levels and response times.
Advisory and Transformation Services
Advisory and Transformation Services—HPE Pointnext Services designs the transformation and builds a road map tuned to your unique challenges including hybrid cloud, Workload and Application Migration, Big Data, and the edge. Hewlett Packard Enterprise leverages proven architectures and blueprints, as well as integrates with partner products and solutions. We also engage the Professional and Operational Services teams as needed.
Professional Services—HPE Pointnext Services creates and integrates configurations that get the most out of software and hardware, and works with your preferred technologies to deliver the optimal solution. Services provided by the HPE Pointnext Services team, certified channel partners, or specialist delivery partners include installation and deployment services, mission-critical and technical services, and education services.
- Pricing and product availability subject to change without notice.