Cardiff University Completes Energy Efficiency Upgrade of High Performance Computing (HPC) Data Centre
Mon 25 Jan 2016
Comtec Enterprises provides Schneider Electric hardware and software data centre solution for major PUE improvements at leading UK University
Cardiff University’s HPC data centre fulfils a number of disparate roles, from housing the servers that provide applications and storage for the university’s general IT needs through to hosting a high-performance computing cluster, called Raven. The cluster is operated by the Advanced Research Computing at Cardiff (ARCCA) division and supports computationally intensive research projects across several Welsh universities. In addition, it houses the Cardiff Hub of the distributed High Performance Computing Wales service (HPC Wales).
The differing computing needs of the Cardiff data centre impose challenges on its support infrastructure including the power supplies and their backup UPS systems, the necessary cooling equipment and the racks containing IT and networking equipment. To help keep the significant energy costs associated with running such a state-of-the-art data centre to a minimum, Cardiff University has installed advanced StruxureWare Data Center Operation: Energy Efficiency management software from Schneider Electric, so that it can tightly monitor all the elements of its infrastructure to ensure maximum efficiency.
Founded in 1883, Cardiff University is independently recognised as one of the UK’s leading teaching and research universities, coming 5th in the UK for Research in the UK-wide 2014 REF assessment. Its breadth of expertise encompasses the humanities, sciences, engineering, medicine and technology. The University is also home to major new Research Institutes, which provide radical new approaches to, amongst others, neuroscience and mental health.
ARCCA was originally established to help the University maintain and build upon its position as a global centre for research. It is also a participant in a collaborative venture called High Performance Computing (HPC) Wales, a project funded by WEFO and UK BIS and supported by several Welsh universities, which supports major research projects in collaboration with Small to Medium Enterprises (SMEs).
The University has worked from the outset with Comtec, an Elite Partner to Schneider Electric, to design and populate its data centre. To maximise the efficiency of the system the strategy has been to co-ordinate rack density and layout with a ‘close-coupled’ chilled-water cooling solution and hot aisle containment (HACS). This has entailed using components from Schneider Electric’s InfraStruxure (ISX) data-centre physical infrastructure solution in conjunction with high efficiency chillers.
A critical element in managing the efficiency of the Cardiff data centre is Schneider Electric’s StruxureWare for Data Centers software suite (DCIM). In use, DCIM gives insight into the power use by the data centre and the cooling capacity utilisation, allowing management to respond to changes and also to calculate metrics such as Power Usage Effectiveness (PUE) in real time. This metric is a ratio of the total power consumed by the data centre to that consumed by the IT equipment alone. The closer a PUE ratio is to 1.0 the better, from an efficiency point of view.
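As a rough illustration of the PUE ratio described above, the calculation can be sketched in a few lines of Python. The function name and the power readings are hypothetical and are not taken from Cardiff's data centre or from the StruxureWare software.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    if total_facility_kw < it_load_kw:
        raise ValueError("total facility power cannot be less than the IT load")
    return total_facility_kw / it_load_kw


# Hypothetical readings, chosen only to illustrate the ratio.
it_kw = 200.0        # power drawn by the IT equipment
overhead_kw = 80.0   # cooling, UPS losses, lighting and so on

print(round(pue(it_kw + overhead_kw, it_kw), 2))  # prints 1.4
```

A value of exactly 1.0 would mean that every kilowatt entering the facility reaches the IT equipment, with nothing spent on cooling or power distribution.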
High performance requirements
Hugh Beedie is Chief Technology Officer for both the ARCCA and the general IT Services department at Cardiff University. He works closely with the ARCCA team – notably Christine Kitchen (ARCCA Associate Director) and Wayne Lawrence (ARCCA System Administrator), and Keith Sims in the Estates Department – in ensuring the HPC Datacentre infrastructure for Cardiff University and Welsh researchers in general is truly fit for purpose.
Hugh has a long-standing interest in green computing and has spoken at conferences on the subject. He was part of the team that ensured the performance specification for the HPC infrastructure required it to be designed from the outset to be as energy efficient as possible, while also being functionally advanced from a computing viewpoint.
This approach has paid off quickly as, quite soon after its opening, the data centre became part of the HPC Wales initiative. This meant ARCCA was required to take on additional computing equipment which saw the utilisation of the two contained server racks in the data centre increase from 40 per cent to 80 per cent of their capacity. The cooling systems were upgraded at the same time.
“We originally had three identical 120kW chillers outside which provided us with a well-balanced cooling system,” said Hugh Beedie. “With the first power and cooling upgrade, we replaced one of the original chillers with a high efficiency 300kW cooling unit. While this increased the overall cooling capacity to the data centre, it seemed to cause an operational imbalance in the system.”
After the upgrade, the operators noted from their initial energy monitoring that the PUE of the data centre was deteriorating, although they could only surmise the cause. Compounding the problem, there was not enough insight into how each element of the system was performing to pinpoint the reasons for the decline in efficiency.
“We didn’t have sufficient instrumentation to tell us whether the component parts of the cooling system were operating well or poorly,” said Hugh Beedie. “The ISX’s instrumentation inside the room monitored the power feeds to the main pumps, but we had very little instrumentation outside the room. So we didn’t know what was happening in the chillers, or about coolant flow rates or water temperatures. The instruments monitoring these were part of an entirely separate Building Management System (BMS) and there was no link between that and what we could see with the DCIM.”
Further multi-million pound research projects requiring ARCCA’s high-performance computing were in prospect for the University, including one attempt to verify Einstein’s theory of gravitational waves and another project on genomics. To meet these additional compute requirements Beedie knew that the infrastructure would have to be improved: “We could see that with the new power demands we would rapidly get to a point where we didn’t have any resilience in our cooling,” he said.
When making the business case for a second upgrade to the cooling system, Beedie realised that improved power efficiency, as evidenced by a better data centre PUE, could also result in energy savings that would offset the additional investment cost over time. However, essential to proving the business case would be an improvement to the monitoring and analysis of all elements of the infrastructure.
“The data centre had an estimated annual PUE between 1.7 and 1.8 at that time but they weren’t precise numbers and they certainly weren’t being generated in real time. We were just making calculations based on performance over selected periods,” said Beedie. Assuming an annual PUE of 1.7, which was very much a best-case scenario, Hugh Beedie calculated that reducing the PUE to 1.4 would see the cost of the cooling upgrade pay for itself easily over the working lifetime of any new equipment.
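The reasoning behind this business case can be sketched as a simple calculation. Only the PUE figures of 1.7 and 1.4 come from the article. The IT load, electricity tariff and capital cost below are invented for illustration, and the sketch assumes a constant IT load and a constant PUE throughout the year, which a mixed-use data centre like Cardiff's would not have in practice.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def annual_energy_saving_kwh(it_load_kw: float, pue_before: float, pue_after: float) -> float:
    """Energy saved per year when PUE falls, for a constant IT load."""
    return it_load_kw * (pue_before - pue_after) * HOURS_PER_YEAR


def simple_payback_years(capital_cost: float, annual_saving: float) -> float:
    """Years for the annual saving to repay the capital cost (no discounting)."""
    if annual_saving <= 0:
        raise ValueError("annual saving must be positive")
    return capital_cost / annual_saving


# Hypothetical inputs, not figures from the University.
it_load_kw = 200.0         # assumed average IT load
tariff_per_kwh = 0.10      # assumed electricity price in GBP per kWh
capital_cost = 250_000.0   # assumed cost of the cooling upgrade in GBP

saved_kwh = annual_energy_saving_kwh(it_load_kw, pue_before=1.7, pue_after=1.4)
saved_gbp = saved_kwh * tariff_per_kwh

print(f"{saved_kwh:,.0f} kWh saved per year")   # 525,600 kWh saved per year
print(f"GBP {saved_gbp:,.0f} saved per year")   # GBP 52,560 saved per year
print(f"{simple_payback_years(capital_cost, saved_gbp):.1f} years to pay back")  # 4.8 years to pay back
```

Because the saving scales with the IT load, each additional HPC system added to the room strengthens the case for a lower PUE.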
Calculating PUE accurately for a data centre with such mixed functions as Cardiff’s presents its own challenges. Cooling provision for the systems supporting general IT needs remains reasonably consistent, whereas the systems used for ARCCA’s high-performance work tend to run at peak power whenever they are in use.
“It’s quite a complicated picture,” said Beedie, “but we could only ever see the big-number totals. We couldn’t see down to the rack level so we had to make it part of the business case to demand more monitoring so that we could fine tune operations to get a better PUE rating. This would also give us a much better understanding of how everything was performing and that would inform all our future designs.”
Energy Efficiency upgrade
As part of the power and cooling upgrade managed by the University’s Estates division, Comtec deployed Schneider Electric’s Data Center Operation: Energy Efficiency module as an additional component to the previously installed StruxureWare for Data Centers. Working with data inputs from extensive instrumentation that Comtec had previously installed, the new software module provided a much more comprehensive picture of power and cooling consumption throughout the data centre infrastructure and presented it on a centralised console where it could be easily viewed and analysed by Hugh and his team.
It allowed them to get much deeper, more granular insights into energy usage, not just at overall site level but also at subsystem level and, critically, it did so in real time. This enabled Cardiff to, for example, monitor the effects on energy consumption of changing fan speeds, or of CPU utilisation on a server rack, or of raising the temperature of the chilled water supply.
The new cooling services design had some specific elements aimed at improving energy efficiency. For example, the replacement of all three existing chillers with new high efficiency 300kW models to provide a symmetrical system also saw the introduction of a secondary cooling circuit. High efficiency Variable Speed Drive (VSD) pumps were also fitted individually to each chiller to give better “turn-down” ratios.
“Originally we had a primary circuit which pumped cold water from the chillers directly into Schneider Electric’s InRow RC units,” said Beedie. “With this upgrade, a primary circuit connected the chillers to a large heat exchanger and another set of pumps drove water in the secondary circuit from there into the room. The new pipework and pumps allowed the extra degree of control needed to make the system more efficient in practice.”
The new cooling equipment, together with the new monitoring software, has delivered major improvements in the energy efficiency of the data centre, despite the additional HPC servers. Depending on ambient heating conditions, the real time PUE rating has been as low as 1.2, according to Beedie. “The additional information about energy consumption has enabled me to see the real-time effect of warm weather as the day heats up. As the day starts, our PUE figure is running at about 1.20 but by the end of a hot afternoon it’s up to about 1.25, 1.27 and then drops down again as the evening cools. This degree of detail enables me to be confident that we are adopting an appropriate operating regime for the cooling system.”
Monitoring the effects of small operational changes in real time clarifies how the control of the chillers interacts with the operation of the HPC servers. For example, if the chillers are running below maximum power, it can be seen immediately whether it is more efficient to run all three at reduced load or to turn one off and run the remaining two at part load. “The only way you can clarify this is by having this instrumentation and trying it under varying operational conditions,” said Beedie.
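One way to picture this kind of comparison is to group measured readings by chiller configuration and compare the average PUE of each. The sketch below is not how the Energy Efficiency module works internally. The function name, the configuration labels and every reading are hypothetical, so the result says nothing about which regime is actually better at Cardiff.

```python
from collections import defaultdict
from statistics import mean


def best_configuration(readings):
    """Return the configuration with the lowest average PUE.

    readings: iterable of (config_label, total_facility_kw, it_load_kw).
    """
    pue_by_config = defaultdict(list)
    for label, total_kw, it_kw in readings:
        pue_by_config[label].append(total_kw / it_kw)
    if not pue_by_config:
        raise ValueError("no readings supplied")
    averages = {label: mean(values) for label, values in pue_by_config.items()}
    best = min(averages, key=averages.get)
    return best, averages


# Invented sample readings taken under each operating regime.
sample = [
    ("three chillers, reduced load", 252.0, 200.0),
    ("three chillers, reduced load", 250.0, 200.0),
    ("two chillers, part load", 244.0, 200.0),
    ("two chillers, part load", 246.0, 200.0),
]

best, averages = best_configuration(sample)
print(best)  # two chillers, part load
```

In practice the comparison would need readings taken under similar ambient temperatures and IT loads, since both affect PUE independently of the chiller regime.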
Dr Beedie has long contended that “going green saves you money,” so he’s pleased that the Comtec software solution is helping to verify this assertion. “The additional investment for the Energy Efficiency module enabled me to prove firstly that the business case for the cooling upgrade was vindicated, and secondly, to fine tune the services operation to achieve the best PUE possible for the mix of climate, HPC server demand and services efficiency. The key point is that I can now make a change and see whether it has had a positive or negative impact on the energy use of the supporting services.”
“The University has a longstanding relationship with Comtec,” continued Beedie. “Having completed our original data centre design and build, plus the previous upgrade requirements, they were able to deploy the Energy Efficiency DCIM module quickly and without causing disruption to ARCCA. This latest project has provided immediate energy efficiency improvements, and has repaid our trust in both them and Schneider Electric.”