Microsoft unveils custom AI chips for its data centres
Written by Rebecca Uffindell, Mon 20 Nov 2023
Microsoft has unveiled two custom-designed silicon artificial intelligence (AI) chips, created to address the growing demand for sustainable computing power.
The first chip, the Maia 100 AI Accelerator, will power some of the largest internal AI workloads running on Microsoft Azure. It was designed specifically for the Azure hardware stack.
The second chip, the Microsoft Azure Cobalt CPU, is an Arm-based processor tailored to run general-purpose compute workloads on the Microsoft Cloud.
Microsoft’s Corporate Vice President, Wes McCullough, said Microsoft’s new chip architecture and implementation were designed with power efficiency in mind.
The company aims to optimise ‘performance per watt’ across its data centres, which means acquiring more computing power for each unit of energy consumed.
“We are making the most efficient use of the transistors on the silicon. Multiply those efficiency gains in servers across all our data centres, it adds up to a pretty big number,” said McCullough.
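As a rough illustration of the arithmetic behind McCullough's remark, the minimal sketch below computes performance per watt for a single server and scales the resulting power saving across a fleet. Every figure in it (throughput, power draw, server count) is hypothetical and chosen only for illustration; none comes from Microsoft, and the function names are assumptions made for this example.

```python
def performance_per_watt(throughput_ops_per_s: float, power_watts: float) -> float:
    """Useful work delivered per unit of power (operations per second per watt)."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return throughput_ops_per_s / power_watts


def fleet_power_saving_watts(throughput_ops_per_s: float, old_ppw: float,
                             new_ppw: float, server_count: int) -> float:
    """Power saved across a fleet when each server delivers the same
    throughput at a higher performance-per-watt figure."""
    old_power = throughput_ops_per_s / old_ppw
    new_power = throughput_ops_per_s / new_ppw
    return (old_power - new_power) * server_count


# Hypothetical figures, not Microsoft data.
baseline = performance_per_watt(1_000_000, 500)  # same work at 500 W
improved = performance_per_watt(1_000_000, 400)  # same work at 400 W
saving = fleet_power_saving_watts(1_000_000, baseline, improved, 10_000)

print(f"baseline: {baseline} ops/s per watt")
print(f"improved: {improved} ops/s per watt")
print(f"fleet saving: {saving / 1_000_000} MW across 10,000 servers")
```

Under these assumed numbers, a 100-watt saving per server becomes one megawatt across 10,000 servers, which is the kind of multiplication the quote describes.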
Microsoft’s Transition to Silicon
Before 2016, Microsoft bought mass-produced, uncustomised components for its data centres.
Following 2016, the company adopted custom-built servers and racks to cut costs, enhance user experience, and keep up with the industry shift to cloud computing.
Custom silicon enabled targeted performance for critical workloads, with a testing process replicating real-world conditions in Microsoft data centres.
Microsoft said the new architecture improves cooling efficiency while making better use of current data centre assets, maximising server capacity within the existing data centre footprint.
The ability to build its own custom silicon allows Microsoft to target certain qualities and ensure that the chips perform optimally on its most important workloads.
The company’s testing process included determining how every chip performs under different frequency, temperature, and power conditions to achieve peak performance. Testing also took place in the same conditions and configurations that the chips would experience in a real-world Microsoft data centre.
“Microsoft innovation is going further down in the stack with this silicon work to ensure the future of our customers’ workloads on Azure … We chose this innovation intentionally so that our customers are going to get the best experience they can have with Azure today and in the future,” said McCullough.
Plans for Microsoft’s New Chips
The new chips will debut in Microsoft’s data centres early next year, initially powering services such as the Microsoft Copilot AI assistant and the Azure OpenAI Service. The chips will be installed on custom server boards and placed in tailor-made racks inside existing Microsoft data centres.
“Microsoft is building the infrastructure to support AI innovation, and we are reimagining every aspect of our data centres to meet the needs of our customers,” said Scott Guthrie, Executive Vice President of Microsoft’s Cloud and AI Group.
Microsoft plans to expand its chip offerings in the future. The company is already designing second-generation versions of the Azure Maia AI Accelerator series and the Azure Cobalt CPU series.
Microsoft said its partner OpenAI, the creator of AI chatbot ChatGPT, has given feedback on Azure Maia, and Microsoft is gaining insights into how OpenAI’s workloads operate on infrastructure designed for large language models. This information is guiding the development of future designs by Microsoft.
Sidekick for New Microsoft Maia Chip Announced
Microsoft had to custom-build its own servers and racks, both to house its new Maia 100 server boards and to drive down costs.
These racks are wider than those typically found in Microsoft’s data centres, providing space for the power and networking cables that AI workloads demand.
Microsoft’s current data centres were not designed for large liquid chillers, so Microsoft developed a ‘sidekick’ to sit next to the Maia 100 rack. The sidekick works much like a car radiator: cold liquid flows from it to cold plates attached to the surface of the Maia 100 chips.
Each plate has channels through which liquid is circulated to absorb and transport heat. The liquid then moves to the sidekick, which extracts the heat from it. The cooled liquid is then sent back to the rack to absorb more heat, creating a continuous cycle of heat absorption and removal.
The interplay between the data centre rack and the sidekick underscores the necessity of a systems approach to infrastructure. Microsoft has shared its design learnings from its custom rack with industry partners.
“All the things we build, whether infrastructure or software or firmware, we can leverage whether we deploy our chips or those from our industry partners,” said McCullough.
Microsoft Expands Infrastructure Options
Microsoft also announced it will expand its industry partnerships to provide more infrastructure options for users.
Microsoft launched a preview of the new NC H100 v5 Virtual Machine Series built for NVIDIA H100 Tensor Core GPUs. This offering provides greater performance, reliability, and efficiency for mid-range AI training and generative AI inferencing.
Microsoft will also add the latest NVIDIA H200 Tensor Core GPU to its fleet next year to support larger model inferencing with no increase in latency.
The company also announced it will add AMD MI300X accelerated virtual machines to Azure. The ND MI300 VMs are designed to accelerate the processing of AI workloads for high-range AI model training and generative inferencing. They will feature AMD’s latest GPU, the AMD Instinct MI300X.
By adding first-party silicon to a growing ecosystem of chips and hardware from industry partners, Microsoft said it will be able to offer more choices in price and performance for its customers.
“Customer obsession means we provide whatever is best for our customers, and that means taking what is available in the ecosystem as well as what we have developed,” said Rani Borkar, Corporate Vice President for Azure Hardware Systems and Infrastructure (AHSI).
Borkar added that Microsoft will continue to work with all of its partners to address user needs.
“This is a choice the customer gets to make, and we are trying to provide the best set of options for them, whether it’s for performance or cost or any other dimension they care about,” said Pat Stemen, Partner Program Manager on the Azure Hardware Systems and Infrastructure (AHSI) team.