The Optus Outage: A lesson in crisis management
Wed 20 Dec 2023
The 12-hour outage Optus suffered on November 8 disrupted the lives of about 10 million Australians and 400,000 businesses. It also exposed crucial lessons in crisis management and contingency planning.
The outage at Australia’s second-largest internet service provider left customers unable to make calls, process payments, or access the internet.
The outage was caused by changes to routing information that followed a software upgrade. These changes triggered safety protocols in key routers, which disconnected them from the Optus IP Core network to protect against a perceived threat.
In 2022, Optus suffered a significant cyber attack in which the details of 10 million customers were stolen. When the network went down this time, many customers feared the company had been attacked again.
How did Optus respond to the outage?
Following the outage, Optus’ CEO, Kelly Bayer Rosmarin, stepped down, underscoring the severity of the situation.
Optus responded to the crisis with a public apology and compensation for affected customers. However, it published details about the outage only in an FAQ section.
On social media, Optus kept customers informed but disabled comments on its posts.
James Watts, Managing Director of Databarracks, said: “Keeping customers informed over social media like this is a critical part of crisis communications. However, despite its salient advice to those trying to reach emergency services to use another carrier, it does not exactly build trust for the future.”
Watts also noted that customers inherently expect telecom and cloud providers to be resilient, so that a problem in one area does not spread across regions or the entire network.
“Though the company had outlined different scenarios following a cyber attack from the previous year, not having a plan for what to do if all its routers fail at once is an oversight that cost Optus its reputation,” he added.
Watts stressed the need to have manual workarounds ready in case of a systems outage, and said such a plan should account for incidents that affect all routers simultaneously.
“The lesson from this Optus case is to consider and plan for these low-likelihood and high-impact risks,” he advised.
The case highlights the importance of comprehensive planning for worst-case scenarios. According to Watts, testing plans against software updates or cyber incidents severe enough to shut down every router at once is vital for limiting the impact and maintaining customer trust.