How much of a data centre can be controlled remotely?
Wed 15 Apr 2020
We talk to leading providers about the options available for operators
The coronavirus pandemic is testing the theory that data centres are completely manageable remotely. Although a large number of governments, including the UK’s, have recognised data centre workers as key workers, operators are ensuring they minimise risk to employees by keeping non-essential staff at home, establishing rotas, and firing up remote systems where possible.
Those who built remote management capabilities before the pandemic are now putting them to the test at a time when businesses and society as a whole need mission-critical infrastructure more than ever. But what are the limits of remote management, and what are the best practices to guarantee uptime?
What can be controlled remotely?
“Pretty much anything that does not require a screwdriver can be controlled remotely,” James Giblette, director of Legrand Digital Infrastructures, UK&I, told me via email, adding that in some best-in-class data centres, servers can even be replaced by robots that are, in turn, remote-controlled. Legrand itself offers “intelligent” power distribution units (PDUs) that allow data centre managers to remotely manage PDUs at the outlet level and perform maintenance check-ups.
“Today, almost every IT device is capable of being controlled remotely,” said Michael Akinla, senior business manager, North Europe, at Panduit EMEA. “There is a range of automated and semi-automated devices and systems available that provide access to a new level of data for analysis and predictive responses.” The only exceptions, he says, are large power equipment units such as automatic transfer switches, and legacy power equipment.
Sensor-equipped, networked data centre components are typically integrated into data centre infrastructure management (DCIM) systems, which can provide “at location” physical responses to specific sensor readings and action commands from external operators – or even a data centre’s customers, if necessary.
Panduit’s own DCIM, SmartZone, adds physical security and verification tools for unlocking cabinet doors. Security functionality is more useful than ever when workforces are squeezed. A data centre manager can remotely grant personnel on a strict rota access to areas of the data centre they could not previously enter.
It’s not just data centre managers that require remote access to servers, but IT administrators too. With IT admins working remotely, many more tasks need to be conducted “out-of-band” via KVM-over-IP remote server access switches. Giblette said Legrand has experienced “a surge in demand” for its KVM-over-IP switches in recent weeks. “The increased burden on (additional) mission-critical applications means that out-of-band access is more alive than ever,” he said. “IT admin tasks such as server rebooting in case of failure, firmware upgrades, software installations, device analysis, also now need to be done remotely.”
Because remote access depends on networked equipment, a raft of potential network issues can derail remote management efforts. Akinla identifies three: latency, bandwidth and reliability. If a wireless network is used inside the data centre, for instance, operators need to assess the risk of the network dropping or shutting down altogether.
“As more IoT devices are rolled out, bandwidth could become saturated, which might increase the risk of failure,” he says. He adds that the bandwidth being used by home workers for data centre management was never intended for mission-critical traffic. If this new and untested link in the chain fails, it would require an emergency “truck roll” to the data centre.
It’s not just about having the gear, but the means of leveraging it – trained staff, established processes and thorough integration with IT assets. Now that stay-at-home measures are largely in place, the latter part of that challenge is particularly complex for companies wanting to expand remote management capabilities.
“The biggest challenge nowadays will be the ability to install infrastructure components at sites that are difficult to physically access,” said Giblette. “Having said that, what we’re seeing, like in distribution centres and warehouses, mission-critical data centres increasingly work in staff rotas to ensure business continuity throughout.” Because limited time and resources are available to oversee fresh installations, plug-and-play solutions are a popular choice, although they are often more costly.
The challenges aren’t uniform across the data centre industry. Akinla notes that the likes of Google, Amazon and Facebook have been moving towards “lights out” data centres for a number of years – driven by the fact that their massive data centres are simply too big for hands-on management. Colocation customers, on the other hand, often operate legacy infrastructure, which complicates efforts to integrate a connected remote system. Your average enterprise operator often has a “blend of these issues”.
Where should operators begin given time constraints? “In respect of short-term efficiency gains a good place to start is power and temperature monitoring with an automated alert system,” says Akinla.
“Determine the daily key operating points of your data centre and start there,” he adds. “Ensure you have the tools in place to manage your key resources remotely before they are needed. Second, plan to be in the data centre for maintenance activities. Arranging them all together gives maximum effect with minimum time on site. Finally, reach out to your vendors to see what is available.”
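As a rough illustration of the power and temperature monitoring with automated alerts that Akinla suggests as a starting point, the Python sketch below checks rack readings against fixed limits. Everything in it is an assumption made for the example: the `Reading` structure, the rack names and the threshold values are hypothetical, and a real deployment would take its readings and limits from the operator's own DCIM or sensor platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    """One sample from a rack sensor (hypothetical structure)."""
    rack: str
    temperature_c: float
    power_kw: float


# Hypothetical limits chosen for illustration only.
MAX_TEMP_C = 27.0
MAX_POWER_KW = 5.0


def check(reading: Reading) -> List[str]:
    """Return one alert message per limit the reading exceeds."""
    alerts = []
    if reading.temperature_c > MAX_TEMP_C:
        alerts.append(
            f"{reading.rack}: temperature {reading.temperature_c:.1f} C "
            f"is above the {MAX_TEMP_C:.1f} C limit"
        )
    if reading.power_kw > MAX_POWER_KW:
        alerts.append(
            f"{reading.rack}: power draw {reading.power_kw:.1f} kW "
            f"is above the {MAX_POWER_KW:.1f} kW limit"
        )
    return alerts


def collect_alerts(readings: List[Reading]) -> List[str]:
    """Check every reading and gather all alerts into one list."""
    all_alerts = []
    for reading in readings:
        all_alerts.extend(check(reading))
    return all_alerts


if __name__ == "__main__":
    sample = [
        Reading("rack-a1", temperature_c=24.0, power_kw=3.2),
        Reading("rack-b7", temperature_c=31.5, power_kw=6.0),
    ]
    for message in collect_alerts(sample):
        print(message)
```

In practice the alerts would be sent by email, SMS or a paging service rather than printed, so that staff working from home hear about a problem before it forces a visit to the site.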
By most accounts, stay-at-home measures won’t be fully lifted in Europe and the US for many weeks. In theory, data centres could be controlled remotely indefinitely, or at least until a physical asset requires replacement. DCIM can make these replacements less painful by alerting operators in advance when components are nearing the end of their life. “All devices will fail, and at that point intervention is required. However, with sensors and monitoring across systems, and new levels of data analysis providing heightened capability to predict possible incidents, remedial action can be taken,” says Akinla.