What constitutes a well-run data centre?
Thu 15 Aug 2019 | Steve Bowes-Phipps
Steve Bowes-Phipps, senior consultant, PTS Consulting, looks at what makes a high-quality data centre stand out
The Uptime Institute promotes its Tier classification system to describe a data centre’s physical resilience; however, this is a rather narrow view of ‘availability’.
Building a resilient, highly available data centre is only the first phase of its lifetime. For the following 15 to 20 years, how it is operated and maintained matters far more than how it was built (that said, it does need to be built right if it is to be operated to its full potential).
So, what is the difference between a well-run data centre and one that is not? Let’s take a look at what makes a high-quality data centre stand out.
Start by determining your strengths and weaknesses, then concentrate on mitigating or removing the weaknesses. Here are two areas to consider:
Site risk register – At least once a year, a full risk assessment should be undertaken, both by walking through the data centre and by reviewing the incident and problem management system to identify adverse trends.
Physical security – The best data centres undertake a full, wide-ranging threat and vulnerability risk assessment (TVRA) annually. This may identify a number of issues, which are then held on the risk register either for action and closure or, if the cost of elimination is too high or too disruptive to operations, for formal acceptance.
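The choice between actioning a risk and accepting it can be made explicit. The sketch below is illustrative only: it assumes a hypothetical 1 to 5 likelihood and impact scale, scores risks with the widely used likelihood × impact convention (not something the article prescribes), and assumes a hypothetical rule that the spend you can justify on elimination grows with the risk score. All names and figures are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    ref: str
    description: str
    likelihood: int          # 1 (rare) to 5 (almost certain): hypothetical scale
    impact: int              # 1 (negligible) to 5 (severe): hypothetical scale
    mitigation_cost: float   # estimated cost to eliminate the risk
    disrupts_operations: bool = False  # would elimination be too impactful on operations?

    @property
    def score(self) -> int:
        # Common likelihood x impact scoring, giving a value from 1 to 25
        return self.likelihood * self.impact


def treatment(risk: Risk, spend_per_point: float) -> str:
    """Return 'action' (work to closure) or 'accept' (record and formally accept).

    Assumption: the justifiable spend on elimination grows with the risk score.
    """
    justifiable_spend = risk.score * spend_per_point
    if risk.disrupts_operations or risk.mitigation_cost > justifiable_spend:
        return "accept"
    return "action"


def prioritise(risks: List[Risk]) -> List[Risk]:
    """Highest-scoring risks first, for review at the annual assessment."""
    return sorted(risks, key=lambda r: r.score, reverse=True)
```

Accepted risks stay on the register rather than disappearing, so they are revisited at the next annual assessment.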
A fully compliant ISO/IEC 20000 incident, change and problem management system should be available and in active use, so that tickets are escalated automatically according to business rules aligned with the SLAs agreed with your clients. Integrating this system with the support systems of your facilities management (FM) supplier or OEM suppliers can speed up the response to service-affecting incidents.
Energy efficiency best practice
Adhering to the best practices in the EU Code of Conduct (EUCoC) for data centre energy efficiency has a double benefit: it reduces power consumption, which can translate into lower costs for your customers, and it bolsters your sustainability credentials.
If this work is managed through ISO 14001 (environmental management) and ISO 50001 (energy management), the benefits can be realised in a formal, process-oriented way.
DCIM, BMS and EMS
Data centre infrastructure management (DCIM), building management systems (BMS) and environmental management systems (EMS) can all be used to monitor how your data centre is performing.
Used well, these can provide critical information on issues before they become service-affecting and can also provide customers (whether internal or external) with valuable operational data about the performance of their installed critical environment.
The top data centres can also use these to provide virtual feedback and control loops that constantly monitor and adjust the critical environment to minimise power usage whilst maximising availability.
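As a rough illustration of such a feedback loop, the sketch below adjusts cooling fan speed from an inlet temperature reading on each polling cycle. It is a simple incremental (integral-style) controller: each cycle nudges the speed in proportion to the temperature error and clamps the result. The setpoint, gain and speed limits are hypothetical; real BMS and DCIM platforms typically use tuned PID loops and many more inputs.

```python
def next_fan_speed(current_speed_pct: float,
                   inlet_temp_c: float,
                   setpoint_c: float = 24.0,        # hypothetical inlet setpoint
                   gain_pct_per_deg: float = 5.0,   # hypothetical gain
                   min_speed_pct: float = 30.0,     # hypothetical minimum airflow floor
                   max_speed_pct: float = 100.0) -> float:
    """One iteration of a simple feedback loop for cooling fan speed."""
    # Positive error means the room is warmer than the setpoint.
    error = inlet_temp_c - setpoint_c
    # Incremental adjustment: speed up when too warm, slow down (saving power) when too cool.
    proposed = current_speed_pct + gain_pct_per_deg * error
    # Clamp between the minimum airflow floor and the fan's maximum speed.
    return max(min_speed_pct, min(max_speed_pct, proposed))
```

Called once per polling interval with a fresh reading, the loop lowers fan speed (and power draw) whenever there is thermal headroom, while the floor and ceiling keep it within safe operating limits.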
Whether a data centre employs a single FM supplier for all proactive maintenance and reactive support, or contracts out each type of plant and equipment to its OEM, managing the suppliers’ behaviours and responsiveness is critical to keeping the lights on – literally!
Ideally, SLAs should be used to encourage positive behaviours rather than to penalise, and they should be built into the ISO/IEC 20000 system so that escalation happens automatically whenever a response falls outside contractual limits.
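The escalation logic itself is straightforward. The sketch below assumes hypothetical response targets per ticket priority and a hypothetical escalation ladder expressed as fractions of the SLA target; in practice the service management tool’s own rule engine would hold these rules. Escalating at 75% of the target, before any breach, is one way to make the SLA encourage timely responses rather than simply penalise late ones.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical response-time targets per ticket priority; real values
# would come from the SLAs agreed with each client or supplier.
SLA_RESPONSE = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
}

# Hypothetical escalation ladder: (fraction of the SLA target elapsed, contact).
ESCALATION_RULES = [
    (0.75, "duty manager"),              # early warning before the limit is breached
    (1.00, "supplier account manager"),  # SLA breached
    (1.50, "operations director"),       # breach well beyond the limit
]


@dataclass
class Ticket:
    ref: str
    priority: str
    opened: datetime
    responded: Optional[datetime] = None


def escalation_contact(ticket: Ticket, now: datetime) -> Optional[str]:
    """Return the most senior contact whose threshold has passed, or None."""
    if ticket.responded is not None:
        return None  # the supplier has responded, so nothing to escalate
    elapsed = now - ticket.opened
    target = SLA_RESPONSE[ticket.priority]
    contact = None
    for fraction, who in ESCALATION_RULES:  # rules are ordered by threshold
        if elapsed >= target * fraction:
            contact = who
    return contact
```

A scheduler would evaluate every open ticket against these rules at a regular interval and notify the returned contact when it changes.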
Assets should be maintained at the intervals defined by their manufacturers, and a condition report should be recorded every time an asset is inspected as part of the planned preventive maintenance (PPM) schedule.
Remember that maintenance checks differ depending on the frequency of the visit (monthly, quarterly or yearly). Many FM suppliers are now integrating permits to work (PtW), risk assessments and method statements, and engineer reports into digital solutions that reduce or even eliminate the need to shuffle paperwork around.
Assets approaching the end of their life should be included in a replacement and disposal plan at least one year before that date.
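The one-year rule lends itself to a simple check against the asset register. The sketch below assumes a hypothetical register in which each asset records its end-of-life date and whether it is already in the replacement plan; it lists the assets that are within a year of end of life (or already past it) but not yet planned for.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

# "At least one year in advance" of end of life, expressed as a planning horizon.
PLANNING_HORIZON = timedelta(days=365)


@dataclass
class Asset:
    tag: str                           # hypothetical asset identifier, e.g. "UPS-01"
    end_of_life: date                  # manufacturer's end-of-life or end-of-support date
    in_replacement_plan: bool = False  # already scheduled for replacement and disposal?


def assets_needing_plan(assets: List[Asset], today: date) -> List[str]:
    """Tags of assets within a year of (or past) end of life with no replacement plan."""
    return [
        a.tag
        for a in assets
        if a.end_of_life - today <= PLANNING_HORIZON and not a.in_replacement_plan
    ]
```

Run as part of a monthly report, a check like this flags gaps in the replacement plan well before they become a service risk.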
Black building tests
The data centre will have an in-built level of resilience and redundancy, but that is worthless unless it is tested and proven regularly. Too many data centre owners believe that load bank testing is sufficient but, in my opinion, unless the infrastructure is stressed against a live load, your testing counts for nothing.
Based on your risk assessment, black building tests, in which the mains supply is deliberately cut so that the backup power systems must carry the live load, should be undertaken at intervals that support your business needs. Make sure your customers understand why this is done and how important it is, and make the case that an outage during a planned exercise is safer than one during a power brownout in the middle of a thunderstorm at 2am.
Of course, if you’ve been doing everything else right, then the chances of an outage are always going to be very low.
Processes and procedures
At the end of the day, your data centre will run well only if it is well understood by everyone responsible for its operation and upkeep. This implies that the processes and procedures of the data centre are documented, maintained and complied with by everyone. Regular audits (plan, do, check, act) will keep processes alive, adapted to current business needs and controlled.
This article appeared in the Summer 2019 edition of DCM Magazine.