Photo by Ricardo Gomez Angel on Unsplash
Many companies have adopted the centralised data architecture, typified by large multi-domain centralised monolithic data stores and a central data team. In contrast the Data Mesh, described by Zhanak Dehgani is decentralised, domain specific and used by decentralised teams. It builds on the already established world of microservice architectures and domain-driven design.
Domain-Driven design is a topic in it’s own right but encompasses the idea that the domain and the business logic within that domain should be the main driving force behind software design. The same can be applied to data design and the data mesh is one approach.
The data mesh architecture, proposes modelling the data architecture using the same principles, by domain. The data processes are all managed within the relevant business domain. Teams are also domain focused and often multifunctional in their skillset.
In the blog post How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh by Zhamak Dehghani, she mentions a shift in focus towards:
- ‘serving over ingesting‘
- ‘discovering and using over extracting and loading‘
- ‘Publishing events as streams over flowing data around via centralized pipelines’
- ‘Ecosystem of data products over centralized data platform‘
From a microservice-architecture approach, rather than being specifically about decentralised infrastructure, this is actually more about decentralisation of the knowledge, skills and ownership of the data products that make up the business. While it could be that both the infrastructure and the teams are decentralised, the emphasis is decentralised business domains.
Monolithic Team, Monolithic Solution
The way we have designed ourselves traditionally (especially around data) encourages monolithic solutions. One team to rule all the data. This team manages all of the data in the business. These are often painful to set up as there is a lack of understanding between the domain SMEs (subject matter expert) and the data SMEs. The business need new datasets, systems, insights, models. As the business grows, so do the demands on the central data team.
Eventually, the central data team can become a bottleneck as more and more sources and users of data are scheduled to be added. This can make things move very slowly. It is an anti-pattern to development; both for delivering products and for personal up-skilling. It leads to neither side feeling very satisfied and is impossible to scale based on demand. It’s kind of okay for everyone but great for noone.
The data mesh nudges us towards teams that are leaner, multi-functional, agile and empowered. It also focuses on a business structure that is domain-driven rather than function driven. To ensure fresh ideas and continuous learning, technical roles like engineers and developers could regularly switch domain in order to keep circulation of knowledge and learning. For a small company, this concept is still possible. You might have a pool of technical resource that can move around the domain teams. The Data Science team, for example, could work as a kind of consultancy, moving around the domains helping them with data science projects. I.T specialists could also operate as in a consultancy type manner.
Data as a Domain Product
In the data mesh architecture, data is a product of the domain. It can be seen as something that is owned by the domain and can be subscribed to by other parts of the business (access permitting). Domain-Driven Design should be at the forefront of this approach, looking at the different domains that you have. There are a number of methods that a business can adopt to figure out the domain events that exist and to which domain an event belongs. One such method is event-storming, which we can perhaps discuss in more detail another time.
The idea is that teams have the data they want when they want it. The data could be served either via APIs or as events. API security and role access could be defined by the governance and security layer, which sits across all of the domains.
Data as events
Using an event-driven approach would enable data to be produced and consumed by decoupled domains. The teams wouldn’t need to worry about how other business domains want the data shaped or transformed, they are merely the producers. Domains can consume the data available and perform operations as is necessary for their own specific needs. If you want to learn more about event-driven architectures, I’ve written about them before here and here.
Advantages of the Data Mesh
There are some great advantages that I see when it comes to adopting a data mesh approach. Here are three:
Teams can choose their own technology and skills – In the microservice approach, each team is responsible for their own datastore (including the design, technology and operations). They are like their own little company, making their own decisions and delivering their ‘products’ to the rest of the business. This is in a way more realistic, each ‘domain’ has different needs (both technically and non-technically). One might use a SQL database, the other wants to use a NoSQL setup or a document store. One team is great at Python and the other one wants to use something else.
Decoupling services (or domain data) is empowering for the teams – They are now free to make their own decisions on how to collect, organise, store and use data (not only their own but any other sources to which they are subscribed.
Innovation happens faster – The decoupling that happens as a result of the data mesh is good for each of the domains. They can simply subscribe to events from a domain of their chosing. This means a reduction in time requesting and waiting for data (perhaps due to another teams current workload). Being able to just plug in to an API or event stream and start exploring will give rise to new opportunities for innvoation.
Overall, I think the data mesh is a really interesting concept that aligns well with the current shift towards cloud-native ecosystems. However, there are a lot of considerations when moving towards this kind of approach. This is just an introduction and there are a number of elements that I have not discussed such as security, data quality and governance. I hope to address some of these in a future post and talk a lot more about decentralised data.