IT disaster recovery provision – don’t make a crisis out of a disaster
Tue 13 May 2014
Jules Taplin has long experience of dealing with users over their IT disaster recovery provision. What’s his answer to the common question: should I do my own, or give it to my IT service provider?
In a word: ‘Neither’. On both counts. If you’re serious about a disaster recovery (DR) provision that will actually save your arse when it’s in need of saving, then the only sensible option is to give it to a specialist. Now, before you go off in a big cloud of ‘he would say that, wouldn’t he?’ (my day job is at a specialist DR provider, after all), here are a few things to think about:
If your existing IT service provider or internal IT department manages your DR, then they’re unlikely to give enough priority to the work required to support the DR provision for it to be reliable. For now, though, I’ll assume you did all that perfectly, and that you’ve got a well-exercised plan, the pieces are all in place and regularly tested, you’ve got the right guys for the job (which means that you will only have a disaster when all your best staff aren’t on holiday), and that you’re feeling pretty chipper about it all when it goes wrong.
Take it from me – when flames are leaping from the top of your building, your internal guys are exactly the WRONG guys to be trying to bring servers back. Firstly, flames leaping out of a building are not good for one’s state of mind – jobs that seemed easy when you were sat down with all your gear, tools, and documentation around you suddenly seem difficult.
Secondly, are you seriously trying to tell me that under those sorts of circumstances, there aren’t other more useful things that your teams could be doing than trying to restore servers? Far better for a specialist team to be 100% focussed on recovering your IT systems; inevitably it will get you quicker results. They will be USED to helping people in these circumstances, will have all the documentation and systems to hand, and will have none of the stress. And with practice comes perfection.
If you outsource your IT to a partner who is looking after your equipment on-premises, then you’re probably slightly better off, but even then… your partner will be more use to you right now helping with the other legion of problems you have (where are people going, how do we get that networked in, what about phones, etc, and endless new desktop issues). They’re often not the best person to handle the one call that I can guarantee they never want to receive.
Finally, there’s only one scenario that’s even worse than that, and that’s if you’ve outsourced the hosting of your infrastructure to a data centre provider. That’s normally a good move – better power, more reliable comms, and much better operational procedures, as well as a potentially deeper talent pool and better engineering expertise. But (and it is a BIG but), that means that your most likely need for IT DR now is if there is a total data centre facility failure (normally involving power provision, in my experience).
And on THAT day, your data centre provider isn’t your friend. Their teams aren’t trying to deal with the failure of your systems, they’re trying to deal with the failure of HUNDREDS of customers’ systems. It takes a LONG time to power back on cabinets and deal with what got fried or just plain isn’t working. I’ve lived through unplanned data centre power-downs in my past life, and the sad fact is that somebody is first to be brought back up (probably the largest customer that the organisation gets on well with), and somebody is last (probably somebody who’s either got a difficult architecture, or the difficult customer who punishes every transgression). If your DR provider is independent, then you’re in clover at this point.
All of that was true years ago, but the spectacular 2e2 failure of last year has focussed the mind even further. The nature of the failure meant that many organisations hosting their production environments with them were faced with a deeply unpalatable choice – either pay extortionate amounts of money to continue service on a week-by-week basis, or simply lose service, with no opportunity to gain access to that facility to move equipment or even bulk quantities of data. An independent DR provision would have got customers out of that lock in a way that no other option could have done.
Jules Taplin is Technical Director with Plan B Disaster Recovery