Optimising data centre performance
Wed 28 Oct 2020 | Marc Garner
Want to optimise data centre performance? Start with resilient power, efficient cooling and real-time visibility, says Schneider Electric’s Marc Garner
Optimising data centre performance is a complex process and requires achieving the best possible balance between speed and capacity, availability, resilience, efficiency and cost.
To achieve this, a number of key factors must be considered. They include the design, density and infrastructure; the cooling architecture, including containment, aisle layout or the use of liquid cooling; the choice of UPS to ensure uptime, availability and resilience; and the integration with management software. Each of these factors is interlinked, and poor attention to one area can impact the others. To begin, it’s important to define the main objectives of the data centre and the applications it supports.
For any data centre, whether the largest hyperscale provider, a colocation facility or an edge computing installation, uptime remains crucial. Recent research published in the 2020 Uptime Institute Data Centre Survey found that data centre outages continue to occur with disturbing frequency, and the bigger outages are becoming more damaging and expensive. Of the organisations surveyed, 75% stated that downtime was preventable with better management, processes or configuration, so it’s paramount to ensure that systems are designed to optimise power and cooling and to offer visibility to the end user.
For business- or mission-critical applications, remote monitoring and data centre infrastructure management (DCIM) software are key to maintaining reliability and optimising performance. DCIM software offers the user essential insight into the operating environment and helps to streamline maintenance, balance loads according to changing needs and manage operations efficiently to maximise energy efficiency and reduce costs.
Power and cooling considerations
With greater demands now placed on data centres through increased data consumption and requirements for more compute-intensive applications, higher processor power, greater chip densities and GPUs are becoming commonplace. This, in many cases, requires a rethink of the design, the components used and the considerations for how best to power and cool such systems. Power ratings of 10kW to 14kW per rack, and much higher, are becoming the norm in larger data centres as more computing power is concentrated on fewer servers. The performance demands of CPU chips are driving power consumption, bringing with them the inevitable need to cool such racks efficiently.
One way to drive efficiency with traditional air cooling is a multi-tiered approach, comprising a lowering of ambient temperatures using chillers and a careful arrangement of the racks so that they are aligned in hot and cold aisles. This allows air to flow in a streamlined and cyclical manner, conveying heat away from critical areas. With greater automation, fans can operate variably according to the load, providing pinpoint cooling where it is most needed.
Greater rack densities also produce more heat from IT equipment, which places a greater burden on the cooling effort. Simply increasing the speed of fans and boosting the air-cooling efforts can provide cooling but at the expense of much greater electrical power consumption, lowering efficiency. For many operators, especially those accommodating Nvidia GPUs and Intel processors, air cooling no longer offers the required performance capabilities.
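The cost of simply spinning fans faster can be illustrated with the fan affinity laws, under which fan power rises roughly with the cube of fan speed. The snippet below is a minimal sketch under that idealised assumption; the function name and the speed ratios are illustrative only, and real fans and airflow paths will deviate from the ideal curve.

```python
def relative_fan_power(speed_ratio: float) -> float:
    """Return fan power relative to baseline for a given speed ratio.

    Assumes the idealised fan affinity law: power scales with the
    cube of rotational speed. Real installations will differ.
    """
    if speed_ratio < 0:
        raise ValueError("speed_ratio must be non-negative")
    return speed_ratio ** 3


# Illustrative speed ratios (hypothetical, not measured data).
for ratio in (0.8, 1.0, 1.2, 1.5):
    power = relative_fan_power(ratio)
    print(f"{ratio:.0%} of baseline speed -> {power:.2f}x baseline fan power")
```

Under this assumption, a modest increase in fan speed carries a disproportionate power penalty, which is why variable-speed operation matched to the load tends to be more efficient than running fans flat out.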
PUE and cost savings
In an era when the environmental effects of inefficient power consumption are attracting greater scrutiny, and inevitable regulation, it is essential for data centres to operate as efficiently as possible. Metrics such as PUE (Power Usage Effectiveness), the ratio of total facility energy to the energy used by the IT equipment, guide operators towards greater efficiency. Because cooling accounts for the second largest share of electrical power after the IT equipment itself, any approach that reduces its electricity consumption is to be welcomed. According to the recent Uptime Institute report, global PUEs are averaging 1.59, but in Europe the average is 1.46, the lowest of any region.
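To put those averages in context, PUE is total facility energy divided by IT equipment energy, so a PUE of 1.59 means 0.59 units of overhead (cooling, power conversion, lighting and so on) for every unit delivered to IT. The sketch below applies that definition to the two averages quoted above; the annual IT energy figure is a hypothetical value chosen purely for illustration.

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT energy."""
    if it_kwh <= 0:
        raise ValueError("it_kwh must be positive")
    return total_facility_kwh / it_kwh


def overhead_kwh(it_kwh: float, pue_value: float) -> float:
    """Non-IT energy implied by a given PUE for a given IT energy."""
    return it_kwh * (pue_value - 1.0)


# Hypothetical annual IT energy, for illustration only.
IT_ENERGY_KWH = 1_000_000

# Averages quoted from the Uptime Institute report in the text.
for label, value in (("Global average", 1.59), ("European average", 1.46)):
    extra = overhead_kwh(IT_ENERGY_KWH, value)
    print(f"{label}: PUE {value:.2f} -> {extra:,.0f} kWh of overhead")
```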
An option that is re-emerging and becoming more commonplace for power-intensive computing is liquid cooling. Once thought of as a solution for niche applications, liquid cooling is now being used in edge computing systems and in colocation data centres, with organisations such as EcoDataCenter recently announcing plans to adopt the technology.
Studies in Schneider Electric White Paper #282, “Capital Cost Analysis of Immersive Liquid-Cooled vs. Air-Cooled Large Data Centres”, found that the capital expenditure (CapEx) of air cooling and liquid cooling is comparable at a similar power density of 10kW/rack. However, applications utilising greater compaction and higher power densities of up to 40kW per rack, possible only with liquid cooling, resulted in far greater cost savings. At 20kW/rack the space savings result in an overall CapEx reduction of 10%, whereas at 40kW/rack the CapEx savings were 14%.
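The compaction effect behind those figures can be sketched with simple arithmetic: for a fixed IT load, doubling rack density halves the number of racks and the floor space they occupy. The example below is a rough illustration only. The 2MW IT load and the CapEx index of 100 are hypothetical, and it assumes the 10% and 14% savings are measured against the 10kW/rack baseline, with "comparable" at 10kW/rack treated as zero saving.

```python
import math


def racks_needed(it_load_kw: float, density_kw_per_rack: float) -> int:
    """Number of racks required to host a given IT load at a given density."""
    if density_kw_per_rack <= 0:
        raise ValueError("density_kw_per_rack must be positive")
    return math.ceil(it_load_kw / density_kw_per_rack)


# Savings quoted in the text; 0% at 10kW/rack is an assumption
# standing in for "comparable" CapEx at that density.
CAPEX_SAVING_BY_DENSITY = {10: 0.00, 20: 0.10, 40: 0.14}

IT_LOAD_KW = 2000        # hypothetical facility IT load
BASELINE_CAPEX = 100.0   # hypothetical index, not a currency amount

for density, saving in CAPEX_SAVING_BY_DENSITY.items():
    racks = racks_needed(IT_LOAD_KW, density)
    capex_index = BASELINE_CAPEX * (1.0 - saving)
    print(f"{density}kW/rack: {racks} racks, CapEx index {capex_index:.0f}")
```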
To ensure availability and resilience, proper care must also be taken in the optimisation of the powertrain. Uninterruptible power supply (UPS) systems ensure that, on any disruption to mains power, the load switches immediately to a battery backup so that power is maintained. In critical situations where no downtime can be tolerated, a backup generator may also be deployed as a further line of defence.
As for UPS performance itself, lithium-ion batteries offer many advantages in total cost of ownership (TCO) and energy efficiency. They have a much smaller footprint, offer greater power density and have longer lifecycles that incur lower servicing costs. To ensure uptime, larger facilities may even choose to deploy three-phase UPS in parallel or in an N+1 configuration, so that the load remains supported if a single unit fails.
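As a rough illustration of N+1 sizing, N is the number of UPS modules needed to carry the load, and one further module is added so that a single failure does not drop the load. The sketch below assumes identical modules and ignores derating, load growth and battery runtime; the load and module ratings are hypothetical.

```python
import math


def ups_modules_n_plus_1(load_kw: float, module_kw: float) -> int:
    """Modules needed for N+1: enough to carry the load, plus one spare."""
    if load_kw <= 0 or module_kw <= 0:
        raise ValueError("load_kw and module_kw must be positive")
    n = math.ceil(load_kw / module_kw)
    return n + 1


def survives_single_failure(load_kw: float, module_kw: float, modules: int) -> bool:
    """True if the remaining modules can still carry the load after one fails."""
    return (modules - 1) * module_kw >= load_kw


# Hypothetical example: 450kW critical load on 100kW modules.
LOAD_KW = 450
MODULE_KW = 100

modules = ups_modules_n_plus_1(LOAD_KW, MODULE_KW)
print(f"Modules installed: {modules}")
print(f"Load held after one failure: {survives_single_failure(LOAD_KW, MODULE_KW, modules)}")
```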
Visibility remains key
Underpinning all efforts to optimise the efficiency and performance of a data centre, no matter the type of hardware or assets deployed, is the Data Centre Infrastructure Management (DCIM) system, which offers the user visibility into performance and availability. Recent technological developments in the Internet of Things (IoT), Artificial Intelligence (AI), data analytics and machine learning help to produce deeper insights into how a data centre behaves under a variety of conditions. Increasingly, racks, power distribution, UPS, cooling systems and IT equipment are manufactured with these capabilities built in.
DCIM is essential to optimising performance. It offers operators insight into the health status of critical components and highlights potential issues so that proactive maintenance or servicing can occur quickly and efficiently. Furthermore, it facilitates continuous improvement, so that efficiency can be maintained and operating costs potentially lowered.
Outside of the traditional white space, an intelligent building management system (BMS) offers many performance benefits to facility operators. A BMS, or building automation system (BAS), acts like a central nervous system for the building in which the data centre resides, and is a critical tool for operating a building safely, efficiently and reliably. With greater focus on energy efficiency and sustainability, combined with fundamental changes in user needs, BMS platforms are evolving in a similar way to DCIM, utilising IoT, AI and cloud computing to optimise building control, lighting and power management. This improves performance and efficiency while optimising resources and lowering energy usage and cost.
Overall, there are many key considerations when looking to optimise data centre performance, but ensuring resilient power, efficient cooling and real-time visibility for mission-critical environments is crucial for any operator.