Intelligent infrastructure driving IT facility costs down
Mon 12 Mar 2018 | Michael Akinla
The monitoring and analysis route to efficient energy usage within the data centre white space
The needs of the data centre operator and IT facilities controller are changing. Previously, the approach was ‘up at any cost’, with hardware over-capacity and high power use as the default position. High energy use was a necessary weakness of this strategy, costing operators and end-users large sums of money in wasted energy, system downtime and maintenance. Today, however, advances in server, storage, switch and infrastructure technology, together with intelligent monitoring systems and millions of hours of facility analysis, allow energy-efficient systems to dramatically reduce operational expenditure, benefiting the operator and user with lower energy costs and higher-performance solutions.
Optimising the White Space
Understanding the utilisation of the cabinets within a white space is essential to maximising the effectiveness of the layout and the efficiencies that can be achieved in power, cooling and connectivity. By data centre white space, we mean the server, switch and data storage area, which also houses the air handling equipment serving those cabinets.
What is evident from recent industry research is that the cost of outages, and their frequency, are rising. Recent outages at a data centre used by Wikipedia shut down the site for hours, whilst Microsoft’s Outlook email service was shut down for 16 hours due to an unplanned outage. Both these incidents were caused by data centre servers overheating and the servers’ automated systems shutting them down.
Graphic 1. Cost of Partial and Total Shutdown
As data has become an increasingly valuable corporate asset, the requirement to develop systems that guarantee data availability and delivery has led more board-level IT decisions toward standards-based solutions. In parallel, most major IT hardware systems providers have increased their hardware operating temperatures, guaranteeing that maximum performance is maintained at higher equipment temperatures. This is one way in which the industry has reacted to reduce the numerous cases of overheating and to provide superior performance characteristics within the system.
International standards such as ASHRAE TC 9.9, ETSI EN 300 and EN 50600-2-3 are driving acceptance of best practice in white space and data centre environments. For example, ASHRAE TC 9.9 provides a framework for compliance and for determining suitable Information Technology Environments (ITE), with policies and guidelines for data processing environments.
Airflow Management & Cooling-System Control: ASHRAE TC 9.9 Guidelines
- Cabinet/Rack level – Instrument and monitor the inlet temperature and relative humidity (RH) at the bottom, middle and top of each rack and cabinet, maintaining the recommended (18-27°C) and allowable (15-32°C) thermal ranges
- Containment level – in addition to the cabinet/rack level measures – With a cold aisle containment system, the hot aisle temperature can range up to 50°C; instrument and monitor the outlet temperature at the top of the rack and cabinet. With a hot aisle containment system, temperatures across the room must be monitored
- Data Hall level – in addition to the cabinet/rack and/or containment level measures – Humidity and temperature need to be monitored near each CRAC/CRAH at the supply and return. Relative humidity is recommended at 60% RH and allowable from 20% to 80% RH
- Airflow Management & Cooling-System Control – An airflow management and cooling-system control strategy should be implemented. With good airflow management, the server temperature rise can be up to 20°C; with an inlet temperature of 40°C, the hot aisle could reach 60°C
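The cabinet-level thresholds above lend themselves to a simple automated check. The following is a minimal sketch in Python, assuming three inlet sensors per cabinet (bottom, middle, top) as the guidelines describe. The `CabinetReading` record, the function names and the status labels are hypothetical and are not taken from any DCIM product; only the temperature ranges and the 20°C rise come from the guidelines quoted above.

```python
from dataclasses import dataclass

# Thermal ranges quoted in the guidelines above (degrees Celsius).
RECOMMENDED_C = (18.0, 27.0)
ALLOWABLE_C = (15.0, 32.0)

# Status labels ordered from best to worst (hypothetical naming).
STATUS_ORDER = ["recommended", "allowable", "out_of_range"]


def classify_inlet(temp_c: float) -> str:
    """Classify one inlet reading against the recommended and allowable ranges."""
    if RECOMMENDED_C[0] <= temp_c <= RECOMMENDED_C[1]:
        return "recommended"
    if ALLOWABLE_C[0] <= temp_c <= ALLOWABLE_C[1]:
        return "allowable"
    return "out_of_range"


@dataclass
class CabinetReading:
    """Hypothetical record: inlet temperatures at three heights of one cabinet."""
    cabinet_id: str
    bottom_c: float
    middle_c: float
    top_c: float

    def worst_status(self) -> str:
        """Return the worst status across the three sensor positions."""
        statuses = [classify_inlet(t) for t in (self.bottom_c, self.middle_c, self.top_c)]
        return max(statuses, key=STATUS_ORDER.index)


def estimated_hot_aisle_c(inlet_c: float, rise_c: float = 20.0) -> float:
    """Estimate hot aisle temperature as inlet temperature plus server temperature rise."""
    return inlet_c + rise_c
```

For instance, a cabinet reading 21°C, 24°C and 29°C from bottom to top would be reported as "allowable" because of the top sensor, and `estimated_hot_aisle_c(40.0)` reproduces the 60°C hot aisle figure given above.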
Air pressure management is also an essential component in a robust and effective airflow management and cooling system within the data centre white space.
Environmental monitoring across the data centre provides data, analysis and actionable intelligence, allowing operators to drive up efficiency and reduce energy consumption. Modular thermal efficiency DCIM solutions, which use temperature and pressure to control the white space environment, provide clear targeting criteria to achieve operator goals and can offer potentially cost-neutral systems together with ongoing savings.
Industry and customer requirements are driving data centre operators towards implementing environmental monitoring capabilities. Currently there are significant differences in implementation between ‘strategic’ and ‘basic’ data centres. However, the cost and environmental benefits of employing an intelligent monitoring system, especially within modular asset and connectivity management solutions, provide a compelling argument.
Graphic 2. Reasons Data Centre Owners Invest in Environmental Monitoring
Source: IBM Estimates
Data centres are energy-intensive undertakings. Operating a white space with hundreds or thousands of servers uses vast amounts of energy and generates a great deal of heat, which in turn must be dealt with. It is not unusual for the cooling system of a facility to use as much energy as the white space it supports, or possibly more. Today, a well-designed white space with a monitored and controllable cooling system may use far less energy. In many cases, the latest developments in thermal planning, monitoring and cooling optimisation are saving hundreds of thousands of pounds in energy costs, as well as pre-empting problems and providing a more resilient and reliable data centre.
The previous widespread concept was to create a cool environment in which hot equipment (servers, storage and switches) would have cold air passed across the live surface and the hot exhaust drawn away. This HVAC solution required a great deal of energy to reduce the ‘Air Inlet’ temperature to the level needed to cool the hot equipment, whilst the hot exhaust was then often expelled and wasted.
Today’s white space processing equipment has higher operating temperatures, which has allowed the data centre industry to develop alternative cooling methods that take advantage of intelligent environments. The warmer the white space operational temperature, the less energy is needed to equalise the ‘Air Inlet’ temperature. Device inlet temperatures of 18-27°C and relative humidity (RH) of 20-80% will usually meet the manufacturers’ operational criteria. What becomes increasingly important is the capability to monitor and control the Recommended Environmental Range, including temperature and relative humidity, and to maintain an Allowable Environmental Envelope in which the systems operate at optimum performance.
Operating within a higher temperature environment means that HPC (high performance computing) servers are working closer to their maximum operational characteristics. If, for example, a massive spike in processor activity takes place and many more servers are brought online to cope with demand, while at the same time a generator fails and the UPS backup is not 100 percent efficient, cooling fans may not come online quickly enough, and the servers may overheat and shut down. As discussed above, an unplanned outage can cost the data centre revenue in terms of customer compensation, damaged reputation and lost future customer contracts.
What is required to safeguard optimal performance is the capability to intelligently monitor the white space thermal environment and analyse the data generated in real time, providing actionable intelligence to maintain effective white space operations.
There are three distinct levels that data centre operators need to consider and implement, in order to deploy an environmental monitoring and cooling optimisation solution.
- Monitoring – alarms and notification – ASHRAE provides guidelines for sensor distribution within the ITE white space. The latest wireless sensors offer thermal, thermal with humidity, and pressure nodes. These are easily configured on a wireless mesh network and allow for simple, fast and secure device deployment, offering a highly resilient, self-healing, scalable and efficient sensor network.
Image – SynapSense wireless monitoring sensors
- Cooling Optimisation – airflow remediation and floor balancing – Designing the airflow metrics with CFD allows environmental scenarios to be modelled in line with the operator’s goals. Measures include fitting blanking plates, evaluating containment options, installing perforated floor tiles to ensure optimised air pressure across the critical pathway, and using real-time cooling mapping software to provide heat maps of the white space.
- HVAC Control – dynamically matching cooling to IT load – Real-time data analysis across the system allows for actionable intelligence. The system constantly analyses data to improve airflow management and reduce energy use. Real-time control allows the system to dynamically maintain the optimised state: airflow is managed by adjusting fan speed based on readings from pressure nodes, and air temperature is managed by adjusting the temperature set point based on temperature monitoring.
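The HVAC control level can be pictured as a pair of small feedback rules, one driven by pressure and one by temperature. The sketch below, in Python, is an illustration under stated assumptions and not the control algorithm of any particular DCIM or cooling product. The proportional gain, the 20-100% fan speed limits, the 0.5°C step, the 2°C dead band and the set point limits are all hypothetical values chosen for the example.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Keep a value within [low, high]."""
    return max(low, min(high, value))


def next_fan_speed(current_pct: float, pressure_pa: float, target_pa: float,
                   gain_pct_per_pa: float = 2.0) -> float:
    """Proportional rule: raise fan speed when the measured pressure is below
    target, lower it when above. Gain and speed limits are assumptions."""
    error_pa = target_pa - pressure_pa
    return clamp(current_pct + gain_pct_per_pa * error_pa, 20.0, 100.0)


def next_setpoint(current_c: float, hottest_inlet_c: float,
                  inlet_limit_c: float = 27.0, step_c: float = 0.5,
                  dead_band_c: float = 2.0,
                  low_c: float = 18.0, high_c: float = 27.0) -> float:
    """Step rule: lower the cooling set point when the hottest inlet exceeds
    the limit, raise it when there is comfortable headroom, else hold."""
    if hottest_inlet_c > inlet_limit_c:
        return clamp(current_c - step_c, low_c, high_c)
    if hottest_inlet_c < inlet_limit_c - dead_band_c:
        return clamp(current_c + step_c, low_c, high_c)
    return current_c
```

Called once per monitoring interval, `next_fan_speed(60.0, 8.0, 10.0)` would raise fan speed from 60% to 64%, while `next_setpoint(22.0, 28.0)` would lower the set point from 22°C to 21.5°C. Raising the set point when headroom exists reflects the article's point that a warmer white space needs less cooling energy; a production system would add rate limits, alarms and fail-safe behaviour.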
A customer found itself reaching its power and cooling constraint limits and was faced with the prospect of building a new data centre to meet growing demand. When the white space utilisation was analysed with a DCIM system, it revealed that cooling capacity was being wasted. By reorganising perforated floor tiles, the customer reduced the cooling capacity used across a number of pods and freed enough cooling capacity to extend the operational life of the data centre by over two years.
Appropriate Changes, Large Savings
The energy needed to cool the world’s data centres will triple over the next decade, with consequences for both environmental impact and direct cost. The industry is being driven to develop more efficient ways of using energy. Operators are investigating and implementing intelligent systems to start on the path to continuous analysis, data assessment and dynamic optimisation. In pursuing specific energy reduction and efficiency goals, the latest environmental and cooling management systems, often incorporated within DCIM systems, can provide actionable intelligence that greatly reduces the ROI timescale through simple and efficient asset utilisation. Considering the customer example above, the cost of a new data centre is £10 million per MW of electrical capacity. Given the DCIM system’s findings, the customer achieved major CapEx savings and large OpEx savings, and continues to gain real-time operational information to maintain energy efficiency and equipment optimisation across the site.
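To give a feel for the scale of the CapEx point, the short Python sketch below applies the £10 million per MW figure quoted above. The customer's actual electrical capacity is not stated, so any capacity passed in is a hypothetical input, and the function name is illustrative only.

```python
# Build cost per MW of electrical capacity, as quoted in the article (GBP).
COST_PER_MW_GBP = 10_000_000


def new_build_capex_gbp(capacity_mw: float) -> float:
    """Estimate the capital cost of a new data centre of the given capacity."""
    if capacity_mw < 0:
        raise ValueError("capacity_mw must be non-negative")
    return capacity_mw * COST_PER_MW_GBP
```

For a hypothetical 2 MW facility, `new_build_capex_gbp(2.0)` gives £20 million of build cost that is deferred for as long as the existing white space can absorb the demand.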