Taking a proactive approach to data centre maintenance
Mon 16 Oct 2017 | Cliff Ebsworth
Cliff Ebsworth, Maintenance Director at Prism Power, discusses the business value of establishing a robust maintenance and lifecycle strategy in the data centre.
The critical power infrastructure in a data centre is the most vital system for ensuring continuity of business operations. It is the backbone of everything that takes place within the facility, from security and mechanical systems to cooling and powering the IT systems themselves.
When building a new data centre, critical infrastructure must be considered at the early design stages so that maintainability, physical and redundant capability, as well as continuity of service and optimum reliability and performance of equipment are all taken into account.
Infrastructure needs to be robust, resilient, and work right the way through from Day one to Day ‘n’. When companies roll out expansion projects, for example, it is vital that there is minimal interruption to the day-to-day operations of the data centre.
Companies that do not plan where their facility is headed are painting themselves into a corner. They will struggle to maintain equipment adequately without major disruption and will be unable to meet the future demands of their own business, or those of third-party organisations in the case of colocation providers.
The role of ongoing training
The people who have designed, built and installed the data centre infrastructure are very rarely the people who are responsible for operating it on a daily basis.
At the point of installation, basic onsite training is conducted for anyone who is going to operate the equipment. However, a day or two of training is often not enough to prepare staff to maintain such a complex system. Staff changes and engineering shifts can mean that the right people are not available when an emergency arises. What’s more, even those who have been trained can forget, or may not be confident about their decisions, as they are not required to carry out switching activities on a regular basis.
When a failure occurs, operators onsite commonly do not know what to do, or have forgotten. Ongoing training, especially on systems that are seldom operated, is therefore very important. This can be as simple as a few hours of training every year when the equipment is maintained.
Annual training coupled with simple ‘step-through’ instructions, such as Standard Operating Procedures (SOPs), can help all operatives use equipment correctly and with confidence. Having this straightforward guide is a great way of reducing the scope for human error and making the next steps as close to foolproof as possible.
Planned and intelligent maintenance
Beyond training, data centres should have a high-level plan for the lifetime of the equipment, not just the next 12 months. This provides clear and visible guidance for ensuring all assets are operating as effectively, efficiently and reliably as possible.
An annual contract where the equipment is checked every 12 months should be supplemented by maintenance guides so that if there is a problem, operatives can follow the expected actions step-by-step.
If you have sub-standard strategies, equipment is going to fail, and it will almost certainly fail far sooner than with a planned approach. This is particularly pressing in a colocation data centre, where many different clients rely on your data centre’s performance.
All data centres have their own internal Building Management System (BMS), but in addition to this many operators are installing energy monitoring equipment. These systems produce valuable data and the industry has to educate data centre owners on how they can best use this data to understand what is happening with their critical infrastructure.
Many operators have all the data but are not using it in a proactive way. They are missing a trick in terms of what they can learn from how their equipment is performing. Slight changes in room temperature, or a burst in power consumption, for example, can indicate potential failures and help identify equipment in need of maintenance.
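As a rough illustration of using monitoring data proactively, the sketch below flags readings that deviate sharply from their recent baseline. It is a minimal example only: the function name, the window size, the threshold and the sample temperatures are all assumptions made for illustration, not values taken from any particular BMS or energy monitoring product.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=12, threshold=3.0, min_sigma=0.05):
    """Return the indices of readings that deviate sharply from recent history.

    Each reading is compared with the mean of the `window` readings before it.
    A reading is flagged when it sits more than `threshold` standard deviations
    from that mean. `min_sigma` is a floor on the standard deviation so that a
    perfectly flat baseline does not cause a division by zero.
    All parameter values here are illustrative assumptions.
    """
    flagged = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu = mean(baseline)
        sigma = max(stdev(baseline), min_sigma)
        if abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical room temperatures in degrees Celsius, sampled at regular intervals.
room_temps = [22.0, 22.1, 21.9, 22.0, 22.1, 21.9, 22.0, 22.1,
              21.9, 22.0, 22.1, 21.9, 22.0, 24.5, 22.0]

for index in flag_anomalies(room_temps):
    print(f"Reading {index} ({room_temps[index]} C) is outside the normal range")
```

The same approach could be applied to power consumption figures. In practice the window and threshold would need tuning to each site, and a flagged reading is a prompt for investigation rather than proof of an impending failure.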
As there is such a huge mass of data available, the important thing is to focus on the areas where your business is going to see the most benefit. Pinpointing the critical systems and weak areas in your operations will provide a strong base before you go on to expand your data capture and data evaluation. Traditionally, mechanical components are the most vulnerable and prone to failure, even more so than electrical systems.
The use of big data in this way is still in the reasonably early stages of adoption. Early adopters have caught on to the technology and are already benefiting from the derived trend analysis. Every customer has slightly different drivers; we have seen successful examples of data applied to mechanical systems, but also to power, with operators looking to maximise what they can achieve without running the risk of overload.
Inventory of spare parts
Locally kept spare parts, especially items such as fuses, circuit breakers and meters, can further support ‘first-time fix’ maintenance and reduce MTTR (Mean Time To Repair) when a piece of equipment within the critical infrastructure path fails.
As with all mechanical parts, critical power equipment does fail from time to time. If the data centre is constructed correctly, it has resilience built into it, but once there is a failure that resilience is lost until the fault is repaired. The ideal situation is that a spare piece of equipment stored onsite is prepared and installed, with services back online and resilience restored in a matter of hours.
Having the right spare parts immediately available is a critical maintenance consideration. If a service engineer attends site, identifies the problem, and then has to leave and order the part, the repair takes far longer than if the spare is already onsite and can be fitted straight away.
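To make the effect on MTTR concrete, the short sketch below computes it as total repair time divided by the number of repairs, then compares two scenarios. The repair durations are hypothetical figures chosen purely for illustration, and the function name is an assumption rather than part of any standard tool.

```python
def mean_time_to_repair(repair_hours):
    """Return MTTR: total repair time divided by the number of repairs."""
    if not repair_hours:
        raise ValueError("at least one repair duration is required")
    return sum(repair_hours) / len(repair_hours)


# Hypothetical repair durations in hours, for illustration only.
spares_onsite = [3.0, 4.5, 2.5]       # part fitted during the first visit
spares_ordered = [30.0, 52.0, 27.5]   # engineer leaves, orders the part, returns

print(f"MTTR with onsite spares:  {mean_time_to_repair(spares_onsite):.1f} hours")
print(f"MTTR with ordered spares: {mean_time_to_repair(spares_ordered):.1f} hours")
```

Tracking this figure per equipment type is one way to judge which spares are worth holding locally.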
Maintenance planning should be flexible to a company’s requirements. Effective strategies take account of a data centre’s appetite and also its exposure to risk – i.e. what equipment is involved and what clients are relying on it. Investing in quality, tailored processes can eliminate unnecessary maintenance complications and simplify lifecycle management for today’s complex data centre environments.
Prism Power Group is a global power solutions provider specialising in switchgear, power distribution, power management, critical systems and power maintenance. With a highly experienced collaborative team, Prism Power Group has the expertise to deliver reliable, efficient and high-integrity solutions that can satisfy the most complex of specifications and give lasting value. Prism Power Group is a comprehensive one-stop-shop supplier for any given critical power infrastructure requirement.