Maintaining uptime with environment monitoring
Thu 15 Oct 2020 | Richard Grundy
Richard Grundy discusses improving data centre performance and uptime with environment monitoring
Data centre uptime is one of the most critical concerns for your organisation and for your customers. One of the first metrics users look at when choosing a data centre is its uptime record.
When you proactively monitor the environment in your data centre, you show your commitment to reliability while also improving performance and reducing operational costs.
Data centre operating temperatures
Most data centres look to follow ASHRAE guidelines to provide the optimal environment for their servers and other sensitive equipment. Keeping your environment within the recommended range of 18° to 27°C (64° to 81°F) helps you maintain your equipment and extend hardware lifespans, while also helping you to optimise your cooling costs.
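As a minimal sketch of how readings could be checked against that range, assume each sensor reports a Celsius value under a name of your choosing. The sensor names and function names below are hypothetical and are not taken from any particular monitoring product.

```python
from typing import Dict

# Recommended range quoted above; adjust to suit your own equipment.
RECOMMENDED_MIN_C = 18.0
RECOMMENDED_MAX_C = 27.0


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a Celsius reading to Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def classify_temperature(temp_c: float) -> str:
    """Return 'low', 'ok' or 'high' relative to the recommended range."""
    if temp_c < RECOMMENDED_MIN_C:
        return "low"
    if temp_c > RECOMMENDED_MAX_C:
        return "high"
    return "ok"


def out_of_range(readings: Dict[str, float]) -> Dict[str, str]:
    """Map each sensor that is outside the range to 'low' or 'high'."""
    flagged = {}
    for sensor, temp_c in readings.items():
        status = classify_temperature(temp_c)
        if status != "ok":
            flagged[sensor] = status
    return flagged


if __name__ == "__main__":
    # Hypothetical sensor names and readings, for illustration only.
    print(out_of_range({"cold-aisle-1": 21.0, "hot-aisle-1": 31.0}))
```

Readings that sit exactly on 18°C or 27°C are treated as within range here; whether the limits should be inclusive is a choice for your own alerting policy.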
Monitoring the overall temperature in your data centre, along with the temperatures in your hot and cold aisles, is important to maintaining your reliability and uptime. Temperatures that exceed recommended limits can quickly cause downtime, and many of those instances are caused by HVAC failure. Unexpected power loss is a particular concern, because it can take out your cooling and let heat build up: backup generators often power the servers and connectivity but cannot handle the additional load of your HVAC system.
When this happens, servers and appliances continue to run on backup power and generate heat with no HVAC cooling to offset it. Installing temperature monitors and sensors at various locations within your data centre allows you to monitor temperature throughout your facility.
If temperatures begin to rise to unacceptable levels, alerts will be sent to let you know that you are potentially on your way to an outage. By constantly monitoring environmental conditions, you will be able to see temperatures gradually rising, which could indicate an HVAC system in need of maintenance. When you can proactively address a potentially failing HVAC system, you help prevent downtime, which can save you thousands of pounds in damages and lost productivity.
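One way to spot a gradual rise before any single reading breaches a limit is to fit a straight line through recent readings and look at its slope. The sketch below uses an ordinary least-squares fit. The function names and the 0.5°C-per-hour default are illustrative assumptions, not recommended values.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (hours since first reading, temperature in °C)


def slope_per_hour(samples: Sequence[Sample]) -> float:
    """Least-squares slope of the samples, in degrees C per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    covariance = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must cover more than one point in time")
    return covariance / variance


def is_creeping_up(samples: Sequence[Sample],
                   max_rise_per_hour: float = 0.5) -> bool:
    """True when the fitted trend rises faster than the assumed limit."""
    return slope_per_hour(samples) > max_rise_per_hour


if __name__ == "__main__":
    # Hypothetical hourly readings from one sensor.
    readings = [(0.0, 22.0), (1.0, 23.0), (2.0, 24.0), (3.0, 25.0)]
    print(slope_per_hour(readings), is_creeping_up(readings))
```

A fitted slope is less sensitive to a single noisy reading than comparing the first and last values, which is why it is used here.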
Many data centres focus on the temperature within their facility, and humidity is sometimes an afterthought. Humidity can be just as damaging as high temperatures, and it is vital to manage it just as closely. Too much humidity in your data centre will cause condensation to form on sensitive electronic devices. Too little humidity brings the risk of static discharge, which is just as damaging to the hard drives and circuit boards within your servers.
Monitoring your facility for humidity fluctuations will alert you when levels move outside acceptable thresholds and could cause costly damage and downtime. Excess moisture within your data centre due to weather conditions, geography, or even something like a faulty pipe or unseen water leak can ultimately lead to equipment failure.
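The two-sided nature of the humidity risk can be sketched as a simple band check. The article does not state numeric humidity limits, so the limits are left as parameters you must supply. The class and function names are hypothetical, and the 40% and 60% values in the usage example are placeholders, not recommendations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HumidityLimits:
    low_rh: float   # below this relative humidity (%), static is the concern
    high_rh: float  # above this relative humidity (%), condensation is the concern


def humidity_alert(rh_percent: float, limits: HumidityLimits) -> Optional[str]:
    """Return an alert message, or None when the reading is within the band."""
    if rh_percent < limits.low_rh:
        return "too dry: risk of static discharge"
    if rh_percent > limits.high_rh:
        return "too humid: risk of condensation"
    return None


if __name__ == "__main__":
    # Placeholder limits for illustration; take real ones from your own policy.
    limits = HumidityLimits(low_rh=40.0, high_rh=60.0)
    for reading in (25.0, 50.0, 75.0):
        print(reading, humidity_alert(reading, limits))
```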
Data centres are usually built away from large bodies of water, which gives them some natural protection against flooding. However, there are still many causes of unexpected leaks that can create immediate downtime and performance drop-offs. Extreme weather, clogged pipes, a leaking roof, construction accidents, or any number of other unexpected issues could introduce water into your data centre when you least expect it.
Many organisations assume they do not need to monitor for ‘flood’ conditions within their data centre, thinking that their distance from bodies of water and their building construction will keep them safe from water damage.
What many data centres don’t recognise is that, because of the unexpected causes noted above, they are almost ten times more likely to suffer water damage than fire damage. Smoke detectors and fire alarms are nearly universal, yet water leak detection is often given little or no thought when it comes to causes of data centre downtime.
Water leak sensors make it easy to monitor multiple locations within your data centre and will alert you to the presence of water as soon as it’s detected. The small expense of installing leak detection will more than pay for itself if a pipe leaks near your servers, an HVAC condensate pump fails, or an unexpected accident near your data centre causes water or sewer backup to enter your facility.
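Alerting "as soon as it’s detected" across several locations can be sketched as tracking which locations are currently wet and raising an alert only when a location changes from dry to wet. The class name, method names and location labels below are hypothetical, and the sketch assumes each sensor simply reports wet or dry.

```python
from typing import List, Set


class LeakMonitor:
    """Tracks wet/dry state per location and flags new leaks once."""

    def __init__(self) -> None:
        self._wet: Set[str] = set()

    def update(self, location: str, wet: bool) -> bool:
        """Record a reading; return True only when a location newly turns wet."""
        if wet and location not in self._wet:
            self._wet.add(location)
            return True
        if not wet:
            # Clearing the state lets a later leak at this location alert again.
            self._wet.discard(location)
        return False

    def wet_locations(self) -> List[str]:
        """Locations currently reporting water, in alphabetical order."""
        return sorted(self._wet)


if __name__ == "__main__":
    monitor = LeakMonitor()
    # Hypothetical location labels.
    for location, wet in [("under-floor-row-3", False),
                          ("condensate-pump", True),
                          ("condensate-pump", True)]:
        if monitor.update(location, wet):
            print(f"ALERT: water detected at {location}")
```

Alerting on the change rather than on every wet reading avoids flooding your team with repeat notifications while the leak is being dealt with.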
Unexpected power loss to either your building or parts of your data centre can quickly cause downtime and performance degradation. Even if you have UPS units installed to protect your equipment against power loss, fluctuations or spikes, they are often only in place to provide temporary power while your back-up generator starts up.
Smaller data centres may not be able to install their own back-up generator due to budget or facility location, or they may need to engage the generator manually when power is lost. It’s beneficial to install power monitoring within your racks and cabinets, as well as on individual servers, to notify you when power is lost at the appliance level. The more levels of monitoring you install, the better your performance and uptime will be.
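To illustrate why monitoring at several levels helps, the sketch below takes per-server power status grouped by rack and reports the widest scope that explains what has failed. The data layout, function name and message wording are assumptions made for illustration, and the "facility" conclusion is only an inference from every monitored server being down.

```python
from typing import Dict, List


def summarise_power_loss(racks: Dict[str, Dict[str, bool]]) -> List[str]:
    """racks maps rack name -> {server name: True if the server has power}."""
    all_servers = [ok for servers in racks.values() for ok in servers.values()]
    if all_servers and not any(all_servers):
        return ["facility: every monitored server has lost power"]

    findings: List[str] = []
    for rack, servers in sorted(racks.items()):
        down = sorted(name for name, ok in servers.items() if not ok)
        if servers and len(down) == len(servers):
            # Whole rack is dark: report once instead of once per server.
            findings.append(f"rack {rack}: all servers have lost power")
        else:
            findings.extend(f"server {rack}/{name}: lost power" for name in down)
    return findings


if __name__ == "__main__":
    # Hypothetical rack and server names.
    status = {
        "A": {"a1": True, "a2": False},
        "B": {"b1": False, "b2": False},
    }
    for line in summarise_power_loss(status):
        print(line)
```

Grouping the result this way points you towards the likely fault, such as a single power supply, a rack feed or the building supply, rather than leaving you with a flat list of failed devices.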
Sometimes a few additional monitors and sensors will mean the difference between a minor blip that’s easily recovered from, and a catastrophic outage that causes irretrievable data loss.
Surveys from the Uptime Institute noted that nearly 50% of data centres reported having downtime over the past two years. Nearly 30% of downtime is caused by environmental factors, which makes them critical to monitor if you want to improve your reliability and uptime.
When you proactively monitor your environment, you gain the advantage of spotting issues before they become problems, and of responding faster in the unfortunate event that they do occur. The time to install environment monitoring is now, not after your data centre has suffered catastrophic downtime caused by high heat, humidity, power loss, or a water leak you could have prevented.