Study argues for re-think of server consolidation in data centres
Tue 5 Sep 2017
A newly released study by an international group of researchers argues that server consolidation may be costing data centres money rather than saving it.
The study notes that demand for servers fluctuates and is hard to predict. As an energy-saving measure, all of the virtual machines (VMs) are often packed onto as few physical machines (PMs) as possible, in a process known as consolidation, and the idle servers are then turned off.
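The consolidation step described above can be sketched as a simple first-fit packing of VM loads onto PMs. This is only an illustration of the general idea, not the placement algorithm used in the study, and the VM loads and PM capacity below are invented:

```python
def consolidate(vm_loads, pm_capacity):
    """First-fit packing: place each VM on the first PM with spare
    capacity, opening a new PM only when none has room. PMs left
    empty can then be switched off."""
    pms = []  # each entry is the total load placed on that PM
    for load in vm_loads:
        for i, used in enumerate(pms):
            if used + load <= pm_capacity:
                pms[i] = used + load
                break
        else:
            pms.append(load)  # no PM had room: open a new one

    return pms

# Hypothetical VM loads as percentages of one PM's capacity:
active = consolidate([30, 50, 20, 40, 10], pm_capacity=100)
print(len(active))  # → 2 PMs stay on; the rest can be powered down
```

First-fit is a classic bin-packing heuristic; real consolidation schemes must also weigh the costs of switching servers off, which is exactly the trade-off the study examines.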
It notes that ‘the average utilisation of physical resources in cloud data centres is between 10% and 50%.’ When demand spikes and the idle servers are required again, the ‘on-off’ cycles affect hardware reliability and cause wear and tear.
This in turn increases maintenance costs or potentially causes downtime, which is extremely costly. The paper proposes a mathematical model to measure the efficiency of having servers idle, in order to minimise total data centre costs.
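The trade-off the model captures can be illustrated with a toy break-even calculation: powering a PM down saves idle energy, but each on-off cycle carries a wear-and-tear cost. This is a minimal sketch of the reasoning, not the paper's actual model, and all parameter values are made up:

```python
def net_saving(hours_idle, idle_power_cost_per_hour, cycle_wear_cost):
    """Net saving from switching one idle PM off for hours_idle hours:
    energy saved minus the wear cost of one on-off cycle.
    All values are illustrative, not taken from the paper."""
    return hours_idle * idle_power_cost_per_hour - cycle_wear_cost

# With these made-up costs, powering off only pays for idle
# stretches longer than the break-even point of 4 hours:
break_even = 2.0 / 0.5  # cycle_wear_cost / idle_power_cost_per_hour
print(net_saving(1, 0.5, 2.0))  # → -1.5: short idle stretch, a net loss
print(net_saving(8, 0.5, 2.0))  # → 2.0: long idle stretch, a net saving
```

Under this toy model, switching servers off during brief demand troughs costs more in wear than it saves in energy, which is the study's central point.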
On reliability, the study notes that 70% of server failures result from disk faults – and that start-stop cycles are recognised as the most important factor in the reliability degradation of disks.
The study also notes that on-off cycles cause greater temperature variation, which adds a further cost to switching servers off. Set against these two costs, the paper acknowledges that turning physical machines off does bring some reliability and cost savings.
The researchers tested the theory using different ratios of PMs to VMs. They found that different proportions of VMs to PMs have different effects on energy costs and reliability costs. In effect, the study illustrates that consolidation is not necessarily the best route to cost-efficiency, and that a more nuanced assessment of consolidation and idle servers can help save money in the data centre.
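The effect of the ratio can be sketched with a toy per-period cost that charges energy for every PM kept on and one wear cycle for every PM powered down. The unit costs below are invented purely to show how the cheapest level of consolidation flips as the relative costs change; this is not the researchers' experimental setup:

```python
def period_cost(active_pms, total_pms, energy_per_on, wear_per_cycle):
    """Toy cost of one demand trough: energy for PMs kept on, plus
    one on-off wear cycle per PM powered down. Values are invented."""
    return (active_pms * energy_per_on
            + (total_pms - active_pms) * wear_per_cycle)

# When wear is cheap relative to energy, full consolidation wins:
cheap_wear = min(range(1, 11), key=lambda a: period_cost(a, 10, 3.0, 1.0))
# When wear dominates, leaving servers idling is cheaper:
costly_wear = min(range(1, 11), key=lambda a: period_cost(a, 10, 1.0, 3.0))
print(cheap_wear, costly_wear)  # → 1 10
```

The optimum number of active PMs jumps from one extreme to the other depending on the cost ratio, echoing the study's finding that aggressive consolidation is not always cheapest.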
The study argues that there are important considerations to take into account when consolidating: ‘performance, network traffic, rack inlet temperature, and more recently, hardware reliability.’ Its conclusions show that hardware reliability is an equally important factor to consider.