From data mess to data mesh: a shift in data architecture
Tue 11 May 2021 | Mathias Golombek
Mathias Golombek, CTO of Exasol, breaks down the data mesh and what it means for organisations and enterprises alike
If you speak to anyone working in the data industry about what’s “hot” at the moment, the chances are “data mesh” will be on their list. It has even made its way onto lists of the trends predicted to disrupt and dominate the data market in 2021.
But what exactly is data mesh and why are more and more companies looking to implement this contemporary data concept?
Getting to grips with data mesh
The cloud is one of the most disruptive drivers, if not the most disruptive driver, of radically new approaches to data architecture. But to fully understand what’s driving the need for data mesh, we need to appreciate the mess many organisations find themselves in when they try to scale their data.
Data Engineering Weekly provides a great analogy for the sad state of data infrastructure in many organisations today. It likens the modern data generation process to writing a dictionary without any definitions, shuffling the words randomly and then hiring expensive analysts to try to make sense of it all. While this analogy certainly doesn’t apply to every organisation, it resonates with a lot of businesses and is at the core of why the data mesh principle has gained such a following over the last few years.
Solving scalability challenges
The concept of data mesh was first put forward by Zhamak Dehghani of ThoughtWorks in response to seeing large customers spend ever more on big data platforms while failing to get value from the investment.
Dehghani’s data mesh theory argues that data platforms based on traditional data warehouse or data lake models share common failure modes that prevent them from scaling well. Instead of centralised data lakes or warehouses, data mesh advocates a shift to a more decentralised, distributed architecture that provides self-serve data infrastructure and treats data as a self-contained product.
As your data lakes grow, so too does the complexity of managing them. In a traditional data lake architecture, producers typically generate data and send it into the lake. However, consumers further down the line don’t necessarily share the producer’s domain knowledge, so they struggle to interpret the data. They then have to go back to the producer for an explanation, and even then may lack the expertise needed to make sense of it.
By treating data as a product, data mesh pushes responsibility for data ownership to the team with the domain understanding to create, catalogue and store the data. The theory is that doing this at the point of creation gives the data more visibility and makes it easier to digest and consume. As well as preventing human knowledge silos from forming, it helps to truly democratise the data, because consumers no longer have to worry about data discovery and can focus on experimentation, innovation and extracting more value from the data.
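To make the data-as-a-product idea more concrete, here is a minimal Python sketch, assuming a product is simply a set of records bundled with the context a consumer needs: an owning team, a description and documented fields. The names `DataProduct`, `Field`, `publish` and `describe`, along with the example `orders` product and `checkout-team` owner, are hypothetical and not part of any data mesh framework. Records are checked against the declared schema when they are created, and a consumer can read the product's own description instead of going back to the producer.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Field:
    """One column of a data product, documented by the producing domain team."""
    name: str
    dtype: type
    description: str


@dataclass
class DataProduct:
    """Hypothetical data product: the data plus the context consumers need."""
    name: str
    owner: str  # the domain team accountable for this data
    description: str
    schema: list[Field]
    records: list[dict[str, Any]] = field(default_factory=list)

    def publish(self, record: dict[str, Any]) -> None:
        """Validate a record against the declared schema at creation time."""
        expected = {f.name: f.dtype for f in self.schema}
        if set(record) != set(expected):
            raise ValueError(
                f"{self.name}: fields {sorted(record)} do not match schema {sorted(expected)}"
            )
        for key, dtype in expected.items():
            if not isinstance(record[key], dtype):
                raise TypeError(f"{self.name}.{key}: expected {dtype.__name__}")
        self.records.append(record)

    def describe(self) -> str:
        """Answer a consumer's 'what does this mean?' without asking the producer."""
        lines = [f"{self.name} (owner: {self.owner}): {self.description}"]
        lines += [f"  {f.name} [{f.dtype.__name__}]: {f.description}" for f in self.schema]
        return "\n".join(lines)


# Usage: the (hypothetical) checkout team owns and documents its orders data.
orders = DataProduct(
    name="orders",
    owner="checkout-team",
    description="One row per completed customer order.",
    schema=[
        Field("order_id", str, "Unique order identifier"),
        Field("amount_eur", float, "Order total in euros"),
    ],
)
orders.publish({"order_id": "A-1001", "amount_eur": 42.5})
print(orders.describe())
```

In practice that context would more likely live in a catalogue or schema registry than in an in-memory object; the point of the sketch is only that the producing team supplies it at creation time.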
Data mesh in action
Netflix processes trillions of events and petabytes of data a day. As it has scaled up original productions, integrating data across the streaming service and the studio has become a priority. Doing so across hundreds of different data stores, while holistically optimising cost, performance and operational concerns, had presented a significant challenge, so Netflix turned to data mesh to tackle it.
Zalando, meanwhile, moved from a centralised data lake to a distributed data mesh architecture and is working towards making it possible to create true data products, with guaranteed quality and acknowledged data ownership, in a matter of minutes.
The retailer realised early on that data accessibility and availability at scale can only be guaranteed if primary responsibility lies with those who generate the data and have the appropriate domain knowledge, with only data governance and metadata kept centralised.
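The split described here, with data owned by the domains and only governance and metadata held centrally, can be sketched in Python as well. The `Catalog` class, its `register`, `search` and `read` methods, the example `returns` product and the governance rule it enforces (every product must name an owning domain and document every field) are all illustrative assumptions, not Zalando's actual implementation. The catalogue stores only metadata; reads are delegated to a function supplied by the owning domain, so the data itself never moves into a central store.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductMetadata:
    """What the central catalogue stores: metadata only, never the data itself."""
    name: str
    domain: str                 # owning domain team
    field_docs: dict[str, str]  # column name -> description


class Catalog:
    """Hypothetical central catalogue: governance and metadata are centralised,
    while each domain keeps serving its own data."""

    def __init__(self) -> None:
        self._metadata: dict[str, ProductMetadata] = {}
        self._readers: dict[str, Callable[[], Iterable[dict]]] = {}

    def register(self, meta: ProductMetadata, reader: Callable[[], Iterable[dict]]) -> None:
        # Example governance rule, applied centrally: a product must name its
        # owning domain and document every field before it can be discovered.
        if not meta.domain:
            raise ValueError(f"{meta.name}: an owning domain is required")
        undocumented = [col for col, doc in meta.field_docs.items() if not doc.strip()]
        if undocumented:
            raise ValueError(f"{meta.name}: undocumented fields {undocumented}")
        self._metadata[meta.name] = meta
        self._readers[meta.name] = reader

    def search(self, term: str) -> list[str]:
        """Consumers discover products by name or by field description."""
        term = term.lower()
        return sorted(
            name
            for name, meta in self._metadata.items()
            if term in name.lower() or any(term in d.lower() for d in meta.field_docs.values())
        )

    def read(self, name: str) -> list[dict]:
        """Delegate the actual read to the owning domain's own reader."""
        return list(self._readers[name]())


# Usage: the returns domain keeps its data in its own store (here, a plain list).
returns_store = [{"order_id": "A-1001", "reason": "damaged"}]
catalog = Catalog()
catalog.register(
    ProductMetadata(
        "returns",
        "returns-team",
        {"order_id": "Order the return belongs to", "reason": "Customer-stated return reason"},
    ),
    reader=lambda: returns_store,
)
print(catalog.search("reason"))  # ['returns']
print(catalog.read("returns"))
```

Keeping the reader on the domain side is the design choice that matters here: the central team can refuse to list a product that breaks the rules, but it never has to take custody of, or explain, the data.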
Approach with caution
Despite data mesh architecture gaining a lot of traction recently, there are concerns in the industry about how it is applied. And, of course, there are plenty of strong advocates for the benefits of centralised data warehouses and lakes.
To be clear, data mesh isn’t a silver bullet for all the issues people experience with data lakes. It’s a concept that works for some, but not for everyone. If organisations do go down this route, getting the tech stack right, or as right as possible, will be crucial to their data mesh efforts. A powerful, high-performance, tuning-free analytics database that can scale to meet diverse access from many different data consumers will be key.