A deep dive into liquid cooling
Thu 9 Jul 2020 | Robert Bunger
Liquid cooling is becoming increasingly viable for mainstream applications, but there are a number of options out there
Liquid cooling was once the preserve of legacy mainframe computers and, until recently, was considered by many to be applicable only to high-performance computing (HPC) requirements.
However, liquid cooling is today becoming a serious contender for mainstream applications, especially those emerging at the edge of the network.
Deployed in unmanned, remote sites where high levels of reliability and low maintenance are key considerations, edge computing environments must remain as secure and resilient as their larger counterparts.
Furthermore, energy consumption at the edge is fast becoming another key consideration, so edge data centres must also be powered and cooled as cost-effectively and efficiently as possible.
Many data centre operators strive to build facilities that are energy-efficient and increasingly “green”, as concern for the environment, combined with ever more stringent regulations, encourages operators to minimise their carbon footprint.
However, at the rack and server level, many conventionally designed data centres will rely heavily on airflow for cooling. Fans blow cool air over the components that generate the most heat—typically the processing units—before the heated exhaust air is then removed from the white space.
This presents a new point of discussion in the industry: fans are themselves great consumers of power, and their excessive use will not only incur costs but may also negatively affect a data centre’s PUE (Power Usage Effectiveness).
Like any electro-mechanical device, fans can fail, impacting the overall effectiveness of the cooling system and increasing both maintenance and service costs.
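To make the PUE point concrete: the metric is the ratio of total facility power to the power delivered to IT equipment, so any extra cooling overhead pushes it further above the ideal value of 1.0. The Python sketch below illustrates the arithmetic only. The function name and every kilowatt figure are hypothetical values chosen for illustration, not measurements from a real site.

```python
def pue(it_kw: float, cooling_kw: float, other_overhead_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    total_facility_kw = it_kw + cooling_kw + other_overhead_kw
    return total_facility_kw / it_kw


# Hypothetical site: 100 kW of IT load plus 10 kW of lighting and power losses.
air_cooled = pue(it_kw=100.0, cooling_kw=40.0, other_overhead_kw=10.0)
reduced_cooling = pue(it_kw=100.0, cooling_kw=15.0, other_overhead_kw=10.0)

print(f"PUE with 40 kW of cooling overhead: {air_cooled:.2f}")
print(f"PUE with 15 kW of cooling overhead: {reduced_cooling:.2f}")
```

In this assumed scenario, cutting the cooling overhead from 40 kW to 15 kW moves the PUE from 1.50 to 1.25.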
As processing units become more powerful, and the use of high-powered chips such as graphics processing units (GPUs) becomes more popular, the challenge of cooling data centres in a cost-effective, electrically efficient and environmentally sustainable way is leading many in the industry to consider the options of liquid cooling.
In a typical server, between 70 and 80 percent of the heat is generated by the CPU, with devices such as memory, power supplies and hard drives accounting for most of the rest.
Some GPUs can generate over 400W, but the increasing popularity of multi-core processors means that many “ordinary” CPUs are approaching similar power dissipation levels.
For these types of applications, there are a number of liquid-cooling options to consider. They are essentially divided between direct-to-chip (DTC) cooling, which cools just the hottest parts of a server, such as the CPU, and immersive liquid cooling, in which either a rack-level chassis or the equivalent of an entire rack is submerged in a dielectric coolant.
Types of liquid cooling
Direct-to-chip (DTC) cooling, also known as “Cold Plate” cooling, uses a piped circuit to bring liquid coolant — which may be water — to a heat exchanger directly on top of the CPU or memory modules.
Server fans are still required to remove the heat that the cold plates do not capture, so although the airflow infrastructure is reduced, it is not eliminated completely.
Manifolds installed at the back of the rack are used to distribute the fluid, in a manner analogous to a power distribution unit (PDU).
Here the electronic components never come into direct contact with the liquid, and care must be taken to ensure that the cooling medium is not spilled or dripped onto the circuit boards, unless, of course, the coolant is a dielectric fluid.
In the simplest “single-phase” direct-to-chip systems, the liquid remains liquid throughout the cycle. A variant on this theme is “two-phase” direct-to-chip cooling in which the coolant changes phase from liquid to gas during the cooling process.
Such systems must use a properly engineered dielectric fluid instead of water, which is unsuitable because steam could escape and later condense on the server boards, potentially causing short circuits or component corrosion.
Overall, two-phase systems provide better heat rejection than single-phase, but require additional controls to ensure proper operation.
Chassis-based immersion cooling
For immersive cooling there are also several options to consider.
A single-phase chassis-cooling system will see a server card encapsulated in a sealed chassis and fully, or partially, immersed in a dielectric liquid coolant.
All fans inside the server can be removed, thereby reducing the associated energy costs and producing a far quieter environment, ideal for edge computing applications deployed in densely populated areas.
The sealed, fan-less nature of the system also makes it perfect for harsh, ruggedised or dirty environments, meaning service call-outs are reduced.
The chassis can also be configured as a normal rack-mounted system or as a standalone unit.
Here cooling via the dielectric fluid occurs either passively via conduction and natural convection, or via forced convection, in which the liquid is actively pumped within the servers. Heat exchangers outside the server will also reject the heat trapped by the liquid.
Another route to adopting the technology is “tub cooling”, in which an entire rack, rather than a single chassis within a rack, is immersed in a cooling fluid.
This type of liquid cooling can be delivered in single or two-phase configurations and sees a rack laid on its side, immersed in a tub of dielectric coolant. Once again, all components are covered by the liquid from which the heat captured is transferred to a water loop via a heat exchanger for cooling.
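The water loop described above carries heat away according to the standard sensible-heat relation Q = m × c_p × ΔT (heat flow equals mass flow times specific heat times temperature rise). As a rough sketch, the Python snippet below estimates the water flow needed to carry a given heat load. The 20 kW load and the 10 K temperature rise are assumed example values, and the water properties are textbook approximations; none of these figures is taken from a specific product or site.

```python
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approximate value for liquid water
WATER_DENSITY_KG_PER_L = 1.0             # approximation; varies slightly with temperature


def water_flow_l_per_min(heat_kw: float, delta_t_k: float) -> float:
    """Estimate the water flow needed to absorb heat_kw with a rise of delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("Temperature rise must be positive")
    heat_w = heat_kw * 1000.0
    # Rearranged from Q = m * c_p * dT to solve for mass flow in kg/s.
    mass_flow_kg_per_s = heat_w / (WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_per_s / WATER_DENSITY_KG_PER_L * 60.0


# Assumed example: a 20 kW tub with a 10 K rise across the water side.
flow = water_flow_l_per_min(heat_kw=20.0, delta_t_k=10.0)
print(f"Approximate water flow required: {flow:.1f} L/min")
```

Under these assumptions the loop needs a little under 30 litres per minute. Doubling the allowed temperature rise halves the required flow.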
Considering the trade-offs of liquid cooling
When choosing a liquid cooling architecture several trade-offs must be considered. Deployment in a green-field site designed from the outset to incorporate liquid cooling will be far less costly in terms of capital expenditure (CapEx) than retrofitting an existing air-cooled facility.
Direct-to-chip cooling can capture between 50 and 80 percent of IT heat; however, there is an increased cost to bring water to each rack. This is offset by a reduction in traditional cooling gear such as chillers and computer room air handlers (CRAHs).
For immersive cooling systems, more than 95 percent of heat can be removed by the liquid, resulting in a major reduction in the need for traditional cooling systems. This saving is partly offset by the cost of the fluids themselves, which can be considerable in some cases.
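To put these capture percentages in perspective, the following Python sketch estimates how much heat is left for the room's air-cooling plant under each approach. The 20 kW rack load and the function name are assumptions made purely for illustration; the capture fractions are the figures quoted above.

```python
def residual_air_heat_kw(rack_kw: float, liquid_capture_fraction: float) -> float:
    """Heat (kW) that the liquid does not capture and that air must still remove."""
    if not 0.0 <= liquid_capture_fraction <= 1.0:
        raise ValueError("Capture fraction must be between 0 and 1")
    return rack_kw - rack_kw * liquid_capture_fraction


RACK_KW = 20.0  # hypothetical rack load

scenarios = {
    "Direct-to-chip, 50 percent capture": 0.50,
    "Direct-to-chip, 80 percent capture": 0.80,
    "Immersion, 95 percent capture": 0.95,
}

for name, fraction in scenarios.items():
    remaining = residual_air_heat_kw(RACK_KW, fraction)
    print(f"{name}: {remaining:.1f} kW left for air cooling")
```

For this assumed rack, direct-to-chip cooling leaves between 4 and 10 kW for the air system, while immersion leaves about 1 kW.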
Operating costs can also be greatly reduced owing to the reduction in fan energy needed. In chassis-based immersion cooling, no fans are needed, or indeed possible, in the racks, and less ambient cooling is needed from CRAH units.
Servicing liquid-cooled systems will present certified data centre service partners or Managed Service Providers (MSPs) with a new set of challenges.
Handling procedures will become more intricate, as care must be taken not to contaminate IT equipment with water, if it is being used as the cooling medium, or to lose more of the expensive dielectric fluid than is absolutely necessary through careless handling.
Additionally, IT equipment immersed in fluid is inherently better protected from contamination, vibration and noise than air-cooled equipment, so overall system reliability is likely to be increased, especially in harsh environments.
This also applies to unmanned edge computing facilities deployed in densely populated or urbanised environments, as the requirement for data, digital services and technology increases.
In smart cities, applications including 5G, real-time public transport information, service announcement kiosks and in-time, autonomous driving systems will require a large number of resilient, ultra-low-latency and unobtrusive IT deployments that must continue to work reliably without being seen or heard. For these environments, liquid cooling systems present a near-perfect solution.
Although servicing liquid cooling systems can be expensive, one should remember that their inherent reliability due to immersion will result in longer times between service calls and fewer overall unplanned maintenance events. This will have a significant impact on long-term operating expenses (OpEx).
As the demand for digital services continues to increase, liquid cooling, it seems, is becoming a highly valuable, efficient and cost-effective way of meeting the demand for critical applications hosted in even the most remote edge computing environments.