Meta’s new NVIDIA based Grand Teton AI platform to sit in hyperscale data centres soon

By Ken Wong - on 19 Oct 2022, 6:15pm

Grand Teton sits in a single chassis. Image source: Meta.

Announced at the 2022 Open Compute Project conference, Grand Teton is Meta’s next-generation platform for AI at scale.

In a blog post, Meta said that AI at scale is one of the greatest challenges facing the tech industry and that as the company moves into the metaverse, the need for new open innovations to power AI becomes even clearer.

According to Meta, Grand Teton is an NVIDIA GPU-based hardware platform and a follow-up to its previous Zion-EX platform. It uses NVIDIA H100 Tensor Core GPUs to train and run AI models that are rapidly growing in size and capability, requiring ever greater computing power.

The NVIDIA Hopper architecture, on which the H100 is based, includes a Transformer Engine that accelerates work on these neural networks by up to six times over what was achievable previously. Such networks are often called foundation models because they can address an expanding set of applications, from natural language processing to healthcare, robotics and more.

The NVIDIA H100 is designed for performance as well as energy efficiency. H100-accelerated servers, when connected with NVIDIA networking across thousands of servers in hyperscale data centres, can be 300x more energy efficient than CPU-only servers.

Meta's timeline of AI platforms to date. Image source: Meta.

Alexis Bjorlin, Meta’s Vice President for Engineering said in the blog post:

As AI models become increasingly sophisticated, so will their associated workloads. Grand Teton has been designed with greater compute capacity to better support memory-bandwidth-bound workloads at Meta, such as our open-source DLRMs. Grand Teton’s expanded operational compute power envelope also optimises it for compute-bound workloads, such as content understanding.

Compared with the company’s previous-generation Zion-EX platform, the Grand Teton system packs in more memory, network bandwidth and compute capacity. This gives it multiple performance enhancements over its predecessor: four times the host-to-GPU bandwidth, twice the compute and data network bandwidth, and twice the power envelope. It also comes as an integrated chassis, in contrast to Zion-EX, which comprised multiple independent subsystems.
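The relative gains above can be summarised in a short sketch. Only the multipliers (4x, 2x, 2x) come from Meta's announcement; the baseline values below are arbitrary placeholder units, not published Zion-EX specifications.

```python
# Relative improvements Meta cites for Grand Teton over Zion-EX.
# The factors are from the announcement; absolute baselines are not
# published here, so we work in arbitrary relative units only.
improvements = {
    "host_to_gpu_bandwidth": 4.0,      # 4x host-to-GPU bandwidth
    "compute_network_bandwidth": 2.0,  # 2x compute and data network bandwidth
    "power_envelope": 2.0,             # 2x power envelope
}

def grand_teton_capacity(zion_ex_baseline: float, metric: str) -> float:
    """Scale a hypothetical Zion-EX baseline to Grand Teton's stated level."""
    return zion_ex_baseline * improvements[metric]

# A workload limited by host-to-GPU bandwidth gains the largest headroom:
print(grand_teton_capacity(1.0, "host_to_gpu_bandwidth"))  # -> 4.0
```

This framing mirrors the article's point that memory-bandwidth-bound workloads (like Meta's DLRMs) benefit most from the host-to-GPU improvement, while the larger power envelope helps compute-bound ones.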

The added network bandwidth and memory enable Meta to create larger clusters of systems for training and running larger AI models, Bjorlin said.
