An NHS review of a data centre outage caused by the July 2022 heatwave found that the incident cost London hospitals £1.4 million.
Record temperatures in July caused the cooling systems at two data centres to malfunction, leading to an interruption of digital services. The data centre that supports St. Thomas’ Hospital overheated because of a broken hose connector, while the facility supporting Guy’s Hospital overheated when the staff found it difficult to locate the water supply.
The problem was compounded because the two data centres were designated as backups for each other, so that an outage at one facility would be covered by the other. Unfortunately, the HVAC systems at both sites malfunctioned simultaneously, rendering the backup plan useless.
The NHS review determined that both outages could have been avoided had the data centres been adequately prepared to manage their cooling systems during the record-setting heatwave.
Concerns with the cooling systems were first flagged in August 2018, when a vendor reviewed the systems and recommended that condensers be moved to improve air flow. The same vendor noted that the air handling units at the data centre serving Guy’s Hospital would need to be replaced in 2021 or 2022.
The review found that no single point of failure was the root cause of the incident. Rather, a combination of factors was at play, including legacy infrastructure and systems, overly complicated architecture, and suboptimal cooling systems.
Following the vendor's assessment in 2018, the NHS submitted a request for £195,000 to replace the cooling systems for the data centres. At the time of the outage, this request had not been approved. After the outage, the funding request was increased to £360,000 and was subsequently approved.
In the six weeks it took for services to be fully reinstated, hundreds of appointments were delayed or cancelled, and patients were unable to access critical care.
The outage forced staff to use paper-based tracking and record-keeping systems, leading to additional stress, fatigue, and lower morale for caregivers.
The £1.4 million cost assessment includes unbudgeted IT expenses for retaining a third-party data recovery service that helped pull information from servers that were damaged in the outage. Additionally, the hospitals needed to institute a cloud-based data backup system to replace the legacy solutions that were in operation during the outage.