If you’re an IT or data professional, it’s likely you’ve heard the term DataOps. But it’s also a term that’s been met with a fair amount of confusion and misunderstanding. One reason for this is that it’s not a tangible technology solution per se, but a way of working.
Put simply, DataOps is an approach that helps improve the speed and accuracy of analytics by using agile working practices and methodologies – with a lot more collaboration in cross-functional teams. In this sense it’s also heavily intertwined with the concept of DevOps – itself a discipline focused on bringing agility to the software development and deployment process.
DataOps helps get more bang from your analytics buck
DataOps was born out of the challenge of getting data analytics projects into operational production environments, where they can deliver value to the business. In a fast-paced environment characterised by more data, more IT complexity and rapid business change, it’s perhaps no surprise that the professionals who build and deploy data pipelines often find it difficult to keep up.
The promise of DataOps is that it gives them a way of working that can facilitate organisational change, and a better, faster way of managing data pipelines (from development through testing to deployment) – so they can surface reliable, relevant data to the business while also complying with data protection and privacy regulations.
So it promises a lot, but is it really delivering results?
Data science teams are at the forefront of the early DataOps push
While DataOps adoption levels are still low, the discipline is gaining traction within the data science community, which sees it as a way of addressing some of the inefficiencies in the data lifecycle. According to one data analytics leader, these inefficiencies become apparent when data science teams face a lengthy IT process for requesting access to data, negotiating compute resources and then waiting for those resources to be provisioned – all of which can elongate the model development process.
By adopting DataOps’ agile and collaborative working principles, data science teams are better equipped to instrument data pipelines, produce code rapidly and automate model testing – for example, by pulling down a ready-made working environment and starting work straight away, without having to configure everything from scratch.
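To make that concrete, here’s a minimal sketch of what automated model testing might look like, assuming a scikit-learn-style model and a pytest test runner; the dataset is synthetic and the accuracy threshold is purely illustrative.

```python
# test_model.py – run with `pytest test_model.py`.
# A minimal sketch: train a toy model, then apply the kind of
# automated quality gate a DataOps pipeline might run on every change.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_FLOOR = 0.80  # illustrative release threshold, not a standard


def test_model_meets_accuracy_floor():
    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # Fail fast if a retrained model regresses below the agreed floor.
    assert accuracy_score(y_test, model.predict(X_test)) >= ACCURACY_FLOOR
```

Wired into a continuous integration pipeline, a test like this runs on every code or data change, so a regressing model is caught before it reaches production.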
It’s not only in the development process that DataOps aids agility – moving trained models from lab environments into production is also a target area. Examples of DataOps in practice include formalising the detection of changing data streams entering analytics workflows before they trigger downstream errors, or A/B testing different data sources against analytical models.
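To illustrate the second of those examples, the sketch below A/B tests two hypothetical data sources against the same model specification; the features are synthetic and the source names are invented for the purpose.

```python
# A/B testing two candidate data sources on the same model specification.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)

# Hypothetical sources: B carries an extra, more informative feature.
source_a = rng.normal(size=(400, 5))
source_b = np.column_stack([source_a, y + rng.normal(scale=0.5, size=400)])

for name, X in [("source A", source_a), ("source B", source_b)]:
    score = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")
```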
Similarly, tools and technologies can help advance DataOps practices. Software version control solutions like GitHub can be used to track code changes, while container technologies like Docker and Kubernetes can be used to create environments for analysing data and deploying models into production, improving parity between development and operations.
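As an illustrative sketch of the container side, the snippet below uses the Docker SDK for Python (assuming it and a local Docker daemon are available) to run a pipeline step inside a version-pinned image, so that development and operations execute against the same environment.

```python
import docker  # Docker SDK for Python; assumes a local Docker daemon

client = docker.from_env()

# Run a pipeline step inside a pinned image, so the environment data
# scientists develop against is the same one operations deploys.
logs = client.containers.run(
    image="python:3.11-slim",   # version-pinned for reproducibility
    command="python -V",        # stand-in for a real pipeline step
    remove=True,                # the container is discarded afterwards
)
print(logs.decode())
```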
Although DataOps is not a purely technical competency, these early audiences are likely to be the first to see the benefits, given that data pipeline development is a highly technical endeavour. But delivering benefits at organisational scale requires a bigger effort, and as yet very few organisations appear to be taking on this challenge. Instead, most prefer to start small and build from there.
Scaling depends a lot on cultural fit
As Peter Drucker famously said, ‘Culture eats strategy for breakfast’, and this sentiment also speaks to DataOps and the ability to disseminate its practices throughout the organisation. More than anything else, DataOps requires a culture of collaboration, rapid and lean improvement, and a willingness to attempt what’s sometimes hard – faster data pipeline delivery.
Ultimately, DataOps needs to foster the right collaborative culture, one that extends across IT and the business. At the same time, it should be remembered that DataOps change doesn’t happen overnight; it needs to be smooth, iterative and gradual, so that everyone can embrace the DataOps culture in a coordinated fashion.
Getting DataOps off the ground – and making it work in your organisation
Despite all these challenges, there are some quick and pragmatic things you can do to get a DataOps initiative off the ground. Top of the list is identifying places in the development and deployment lifecycle where things are already breaking down, or where bottlenecks exist. If, for example, data format changes in operational systems aren’t communicated to downstream teams, pipelines can break.
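As a minimal sketch of the kind of guard that catches this, the snippet below validates an incoming dataset against an expected schema and fails fast with a clear error; the column names and types are illustrative.

```python
# Fail fast on upstream format changes, before they break the pipeline.
import pandas as pd

EXPECTED_SCHEMA = {"order_id": "int64", "amount": "float64", "region": "object"}


def validate(df: pd.DataFrame) -> pd.DataFrame:
    missing = set(EXPECTED_SCHEMA) - set(df.columns)
    if missing:
        raise ValueError(f"upstream dropped columns: {sorted(missing)}")
    for col, dtype in EXPECTED_SCHEMA.items():
        if str(df[col].dtype) != dtype:
            raise TypeError(f"{col}: expected {dtype}, got {df[col].dtype}")
    return df


df = pd.DataFrame({"order_id": [1], "amount": [9.99], "region": ["EMEA"]})
validate(df)  # passes; a renamed or retyped column would raise instead
```

Failing at the boundary like this turns a silent downstream breakage into an immediate, attributable error.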
Equally, identifying stages in the lifecycle that can benefit from automation – through the use of self-service data preparation tools, for example – can be a useful way to kick-start DataOps efforts. The right automation and orchestration platform can also facilitate a DataOps approach by enabling data scientists to gain access to data, stand up an environment and then discard it when it’s no longer needed.
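A rough sketch of that stand-up-and-discard pattern, again assuming the Docker SDK for Python and an illustrative image name:

```python
import contextlib

import docker  # Docker SDK for Python; assumes a local Docker daemon


@contextlib.contextmanager
def ephemeral_workspace(image="python:3.11-slim"):
    """Stand up a throwaway analysis container, then discard it."""
    client = docker.from_env()
    container = client.containers.run(image, detach=True, tty=True)
    try:
        yield container
    finally:
        container.stop()
        container.remove()  # nothing lingers once the work is done


# Usage: run a quick check inside the workspace, then tear it down.
with ephemeral_workspace() as ws:
    exit_code, output = ws.exec_run("python -c 'print(1 + 1)'")
    print(output.decode())
```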
Setting standards will also play a critical role in getting DataOps started. As a new discipline, it has no standards or frameworks that organisations can align to, which may put companies off investing in DataOps in the short term. That said, DataOps is about improving the use and value of data through better communication and collaboration, so building a team could provide a valuable starting point. Even if your organisation is still finding its feet in bringing the concept to life, there are plenty of ways to start small and deliver value incrementally.