Why data centre growth demands an operational overhaul
Mon 14 Mar 2016
There are huge opportunities for the data centre industry to become more focused on operational efficiency and best practice, rather than on engineering, product or technology expertise, as is currently the case. The data centre industry is engineering-driven – its entire supply chain is focused on getting infrastructure built on spec, on time and on budget. Very little thought is given to the full life-cycle cost and performance of infrastructure. Traditionally the goal has been to get the infrastructure built, not to worry about what the next 15 years might look like.
There are currently two groups of people operating within the industry: one that builds the infrastructure, and whose job ends when the infrastructure is delivered, and another that runs the data centre. In other words, the department providing the infrastructure is not the department responsible for operations. If the cost to run a data centre doesn’t come out of your budget and you only have to focus on the capital cost to build it, your internal customer might end up quite disappointed.
Building fast – but to last
But this is just a symptom. The real reason for the problem is speed. People want infrastructure built as quickly as possible because the demand for it is usually immediate, so the priority is to get the project up and running. This is where prefab and micro data centres begin to solve problems like speed of deployment.
A conversation needs to be established with end users to say there is a better way and potentially a very clear way of building infrastructure in which both the cost of the build and the cost of operations are known, understood and therefore predictable.
The industry has a couple of choices: Should the suppliers in the industry simply produce the cheapest possible product or should they continue to innovate so that the product life-cycle is the overriding concern either from a reliability or from a cost point of view? If you ask 100 people that question you’ll get the same answer, which is of course that we should think about the long term and the life-cycle.
The industry should be engaged in this conversation and understand that if we’re going to build infrastructure, and if it becomes possible to do so on spec, on time and on budget while at the same time delivering a very predictable outcome as far as life-cycle performance is concerned, this must be a good thing for everyone. If we can give customers a blueprint of how to achieve predictability from an engineering and a project perspective while providing predictable performance over the life-cycle then that must be a conversation worth having.
Cheap, fast and durable – pick two
If companies want to build infrastructure at the lowest possible cost, they’re almost certainly only looking at one part of the cost – CAPEX. Isn’t it reasonable then to say to them that they shouldn’t expect miracles to happen throughout the life-cycle? Actually what customers are really saying is that they want to buy infrastructure cheaply, but they still expect it to be extremely reliable, very efficient and easy to maintain and operate.
That’s a little bit like saying, “Well I want to buy a paperclip for the price of a paperclip; but I’m really expecting the paperclip to automatically arrange all of my papers, file everything nice and neatly and take care of a lot of admin for me.” Well I’m sorry. You just bought a paperclip. It doesn’t do those things.
I think it’s the service providers who have felt the pain of this sort of CAPEX model most. We can’t blame them for this; they have simply reacted to market pressures in which people are expecting highly reliable data centres in terms of power and cooling but which could then be filled with low-cost product. Part of that is down to a lack of appreciation of the value of IT applications.
In other words, many parts of the market are being commoditised. If the marketplace is commoditised, it is even more important to understand life-cycle performance and its effect on total cost of ownership (TCO). If things are going to get cheaper then maybe you should start to expect that stuff is going to require more maintenance. You buy it more cheaply but it’s more expensive to run.
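To make that trade-off concrete, here is a minimal sketch in Python that compares a cheap-to-buy option with a dearer one that costs less to run. Every name and figure in it is hypothetical and chosen only for illustration, and the model deliberately ignores inflation and discounting.

```python
def tco(capex, annual_opex, years):
    """Undiscounted total cost of ownership: purchase price plus running costs."""
    return capex + annual_opex * years


def first_year_premium_wins(cheap, premium, horizon):
    """Return the first year in which the premium option's cumulative cost
    is no higher than the cheap option's, or None if that never happens
    within the horizon. Each option is a (capex, annual_opex) pair."""
    for year in range(1, horizon + 1):
        if tco(*premium, year) <= tco(*cheap, year):
            return year
    return None


# Hypothetical figures: (capital cost, yearly cost to run and maintain).
cheap = (1_000_000, 400_000)
premium = (1_600_000, 300_000)

print(first_year_premium_wins(cheap, premium, horizon=15))
print(tco(*cheap, 15), tco(*premium, 15))
```

With these made-up numbers the cheaper purchase stops being cheaper part-way through a 15-year life-cycle, which is the point of looking beyond CAPEX.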
What is even more of a problem for a service provider is that an enterprise might have a real-estate facility with a shared-services function that looks after all the buildings and infrastructure, and then an IT organisation that looks after all the applications which the various business units pay to use. There is a slight conundrum in this, in that until fairly recently a lot of data centres, particularly in the financial sector, were highly over-engineered and expensive to build, with no account taken of the life-cycle cost.
The price of failure
Of course, in a financial institution cost is not the overriding concern so much as the loss of reputation in the event of a failure. A data centre does not typically cost $500m, but if you know that the cost of a day’s downtime will be more than that, then it becomes a very simple decision to spend whatever you think is necessary. Today, though, banks are becoming much more cost conscious.
In the case of a service provider, all of that cost sits in one place and the Finance Director can see it all, but he or she has neither the engineering background to understand the impact of the capital decisions nor a view of what can be done to improve them. They don’t realise that this market is no different from other infrastructure markets in which the full cost-of-ownership model is understood.
For example, in the case of an oil or gas refinery the issues are well understood. People say: “I’m going to build an oil and gas refinery that is capable of refining a barrel of oil at say £7.00 per barrel throughout the life-cycle.” This means that the people who build the refinery need to figure out the right design, the right product and technology choices and the right selection of partners to build the infrastructure with a view of how it will perform for the next 10 years. They know they have to take into account the inflationary rises of energy costs, labour costs and so on. In short, they’re looking at the whole cost model, not just the line-by-line project delivery cost model.
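The refinery reasoning can be sketched as a small calculation: add the build cost to each year’s energy and labour bills, let those bills rise with inflation, and divide by the output over the same period. The function name, its parameters and all of the figures below are illustrative assumptions, not numbers from any real refinery or data centre.

```python
def life_cycle_unit_cost(capex, annual_energy, annual_labour,
                         energy_inflation, labour_inflation,
                         annual_output, years):
    """Cost per unit of output over the whole life-cycle.

    Energy and labour costs start at their year-one values and grow by
    their own inflation rate each following year. No discounting is applied.
    """
    total = capex
    for year in range(years):
        total += annual_energy * (1 + energy_inflation) ** year
        total += annual_labour * (1 + labour_inflation) ** year
    return total / (annual_output * years)


# Hypothetical plant: all values are made up for illustration.
unit_cost = life_cycle_unit_cost(
    capex=50_000_000,
    annual_energy=4_000_000,
    annual_labour=2_000_000,
    energy_inflation=0.04,
    labour_inflation=0.03,
    annual_output=2_000_000,  # units produced per year
    years=10,
)
print(f"Life-cycle cost per unit: {unit_cost:.2f}")
```

Setting a target for this single figure at the design stage, rather than a target for the build cost alone, is what ties the design, product and partner choices to the running costs that follow.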
So what we’re ultimately going to say to the market is: “We can design infrastructure, build it and also service it in such a way that your entire life-cycle becomes predictable.” So the conversation we want to have is really with the people in the industry for whom this matters the most, namely the owners and operators of infrastructure.
Sometimes local is better
The key is that if someone knows how much it costs for them to run their infrastructure they might decide that they don’t actually need to be in a co-lo facility or to move stuff to the Cloud. Perhaps they would realise there are some things they could do better themselves. But they would also know for sure that there are other things they can’t do better themselves.
So this conversation helps everyone in the industry. It helps people make choices about what they should and shouldn’t outsource, it helps the outsourcers decide how best to build and run infrastructure, and it helps the telcos understand how better to modernise their infrastructure.
Infrastructure needs not only to be built but to be run for a considerable amount of time, and for that time we should have a very predictable view of what running it looks like, so there are no surprises.
As a result of not having predictable data centre performance, what tends to happen is that infrastructure is over-engineered to be resilient in every conceivable scenario. It is easier to over-engineer and potentially over-spend when you don’t have to give too much thought to what the operational efficiency looks like. So you mitigate the risk of unknown performance by over-engineering the infrastructure. If the performance were known you probably wouldn’t over-engineer, or certainly not to the same extent. The excuse for that is that uptime trumps operational cost every time.
But again, drawing an analogy with the other industries we mentioned like oil and gas, they have figured out TCO and they’ve figured out the life-cycle costs. However, none of them have said, “Oh and by the way we’re going to build significantly less resilient infrastructure now, because cost is something we’re worried about!” All of these gas refineries still have to be super mission-critical, because you can’t afford for one to blow up.
So they’ve solved both of those problems. And that’s really what we’re saying: we can do the same as an industry. We can solve resilience, we can solve availability, we can solve reliability – and we can deliver life-cycle predictability. It’s absolutely not an “either or”; it’s an “as well as”.
Return for part two next Monday to read Arun’s opinion on the dawn of Edge networking…