
3 Advantages of the Data Mesh: Distributed Data for the Microservice World


Many companies have adopted the centralised data architecture, typified by large multi-domain monolithic data stores and a central data team. In contrast, the Data Mesh, described by Zhamak Dehghani, is decentralised, domain-specific and used by decentralised teams. It builds on the already established world of microservice architectures and domain-driven design.

Domain-Driven Design is a topic in its own right, but it encompasses the idea that the domain, and the business logic within that domain, should be the main driving force behind software design. The same can be applied to data design, and the data mesh is one such approach.

The data mesh architecture proposes modelling the data architecture using the same principles: by domain. The data processes are all managed within the relevant business domain. Teams are also domain-focused and often multifunctional in their skillset.

In her blog post How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, Zhamak Dehghani mentions a shift in focus towards:

  • serving over ingesting
  • discovering and using over extracting and loading
  • publishing events as streams over flowing data around via centralized pipelines
  • ecosystem of data products over centralized data platform

From a microservice-architecture perspective, rather than being specifically about decentralised infrastructure, this is really about decentralising the knowledge, skills and ownership of the data products that make up the business. While both the infrastructure and the teams may be decentralised, the emphasis is on decentralised business domains.

Monolithic Team, Monolithic Solution

The way we have traditionally organised ourselves (especially around data) encourages monolithic solutions: one team to rule all the data. This team manages all of the data in the business. Such setups are often painful to establish, as there is a lack of shared understanding between the domain SMEs (subject matter experts) and the data SMEs. The business needs new datasets, systems, insights and models, and as the business grows, so do the demands on the central data team.

Eventually, the central data team can become a bottleneck as more and more sources and consumers of data are scheduled to be added. This can make things move very slowly. It is an anti-pattern for development, both for delivering products and for personal up-skilling. It leaves neither side feeling very satisfied and is impossible to scale with demand. It’s kind of okay for everyone but great for no one.

The data mesh nudges us towards teams that are leaner, multifunctional, agile and empowered. It also favours a business structure that is domain-driven rather than function-driven. To ensure fresh ideas and continuous learning, technical roles like engineers and developers could regularly switch domains, keeping knowledge and learning in circulation. For a small company, this concept is still possible: you might have a pool of technical resource that moves around the domain teams. The Data Science team, for example, could work as a kind of internal consultancy, moving around the domains to help with data science projects, and IT specialists could operate in a similar consultancy-style manner.

Data as a Domain Product

In the data mesh architecture, data is a product of the domain. It can be seen as something that is owned by the domain and can be subscribed to by other parts of the business (access permitting). Domain-Driven Design should be at the forefront of this approach, looking at the different domains that you have. There are a number of methods a business can adopt to figure out which domain events exist and to which domain each event belongs. One such method is event storming, which we can perhaps discuss in more detail another time.

The idea is that teams have the data they want when they want it. The data could be served either via APIs or as events. API security and role access could be defined by the governance and security layer, which sits across all of the domains.
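As a rough sketch of what serving data as a domain product might look like, here is a minimal Python illustration with the access check delegated to a shared governance layer. All of the names here (GovernancePolicy, OrdersDataProduct and so on) are invented for illustration, not taken from any particular framework.

```python
# A minimal sketch of a domain data product with a shared governance layer.
# All names here are hypothetical, not from any specific framework.

class GovernancePolicy:
    """Cross-domain security layer: decides who may read which data product."""
    def __init__(self, grants):
        self._grants = grants  # e.g. {"marketing": {"orders.completed"}}

    def allows(self, consumer, product_name):
        return product_name in self._grants.get(consumer, set())

class OrdersDataProduct:
    """Owned by the 'orders' domain; serves data to subscribed consumers."""
    name = "orders.completed"

    def __init__(self, policy):
        self._policy = policy
        self._records = [{"order_id": 1, "total": 42.0}]  # stand-in for real storage

    def read(self, consumer):
        if not self._policy.allows(consumer, self.name):
            raise PermissionError(f"{consumer} is not subscribed to {self.name}")
        return list(self._records)

policy = GovernancePolicy(grants={"marketing": {"orders.completed"}})
orders = OrdersDataProduct(policy)
print(orders.read("marketing"))   # allowed by the governance layer
# orders.read("finance")          # would raise PermissionError
```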

Data as Events

Using an event-driven approach would enable data to be produced and consumed by decoupled domains. The producing teams wouldn’t need to worry about how other business domains want the data shaped or transformed; they are merely the producers. Domains can consume the data available and perform whatever operations are necessary for their own specific needs. If you want to learn more about event-driven architectures, I’ve written about them before here and here.
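As a concrete (and hedged) sketch of that producer side, here is what publishing a domain event might look like with the kafka-python client; the topic name, event shape and broker address are assumptions for illustration, and any broker/client combination would serve the same purpose.

```python
# Sketch: the 'orders' domain publishes an event without knowing its consumers.
# Assumes a Kafka broker at localhost:9092 and the kafka-python package;
# the topic name and event shape are invented for illustration.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The producer only states the fact; consumers shape it however they need.
producer.send("orders.completed", {"order_id": 123, "total": 42.0})
producer.flush()
```

The key point is that nothing in this code knows who the consumers are: the producer simply states a fact and moves on.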

Advantages of the Data Mesh

There are some great advantages that I see when it comes to adopting a data mesh approach. Here are three:

Teams can choose their own technology and skills – In the microservice approach, each team is responsible for their own datastore (including the design, technology and operations). They are like their own little company, making their own decisions and delivering their ‘products’ to the rest of the business. This is, in a way, more realistic: each ‘domain’ has different needs (both technical and non-technical). One might use a SQL database, another wants a NoSQL setup or a document store. One team is great at Python and another wants to use something else.

Decoupling services (or domain data) is empowering for the teams – They are now free to make their own decisions on how to collect, organise, store and use data (not only their own, but any other sources to which they are subscribed).

Innovation happens faster – The decoupling that happens as a result of the data mesh is good for each of the domains. They can simply subscribe to events from a domain of their choosing. This means less time spent requesting and waiting for data (perhaps due to another team’s current workload). Being able to just plug into an API or event stream and start exploring will give rise to new opportunities for innovation.

Summary

Overall, I think the data mesh is a really interesting concept that aligns well with the current shift towards cloud-native ecosystems. However, there are a lot of considerations when moving towards this kind of approach. This is just an introduction and there are a number of elements that I have not discussed such as security, data quality and governance. I hope to address some of these in a future post and talk a lot more about decentralised data.


3 Advantages of Event-Driven Architecture

My latest posts have put a lot of focus on cloud-native technologies. The last few have mentioned things like CloudEvents and Knative Eventing, and it got me thinking… why might people want to implement event-driven ecosystems in the first place?

I’ve decided to put together three advantages that I think make the Event-Driven Architecture pattern a pretty attractive prospect.

True Decoupling of Producers and Consumers

The nature of an Event-Driven Architecture lends itself to microservices, and in this type of system there is (hopefully) loose coupling between the services. Depending on how the microservices communicate, there may still be dependencies between them (e.g. an HTTP request/response approach).

In the excellent book Designing Event-Driven Systems, Ben Stopford tells us that the core mantra of event-driven services is: “Centralize an immutable stream of facts. Decentralize the freedom to act, adapt and change”.

Because the ownership of data is separated by domain, there is a nice logical separation between the production and consumption of events. As a producer, I do not need to concern myself with how the events I produce are going to be consumed, and vice versa for the team consuming them: they are free to figure out for themselves what to do with the events; they do not need to be instructed. Even the message format is unimportant. It can be JSON, XML, Avro, etc. It doesn’t matter.

The broker, together with some kind of trigger between it and the services, enables messages to be ingested into the event-driven ecosystem and then broadcast out to whichever services are interested in receiving them.
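As a toy, in-memory sketch of that broker-and-trigger idea (a real system would use something like Kafka or Knative Eventing, so treat all names here as invented):

```python
# Toy in-memory broker: ingests messages and broadcasts them to every
# service that registered interest. Purely illustrative.
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """A 'trigger': route events of this type to the given service."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Broadcast to every interested service; the producer knows none of them.
        for handler in self._subscribers[event_type]:
            handler(payload)

broker = Broker()
broker.subscribe("order.created", lambda e: print("billing saw", e))
broker.subscribe("order.created", lambda e: print("shipping saw", e))
broker.publish("order.created", {"order_id": 7})
```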

Business narrative of what has happened that can’t be changed

We have all heard the term ‘single source of truth’ and this is usually just a rumor (like the treasure chest hidden at the end of the rainbow). Well, in an event-driven ecosystem it really exists!

As mentioned above, an event stream should be an immutable stream of facts. This is very representative of how our daily lives unfold: as a series of events. These events happened, and it’s not possible to go back and change them unless you own a time machine (and remember, terrible things can happen to those who meddle with time)…

This is an advantage for business data governance, as you can always look back through the log for auditing or to see what happened.

It is becoming more and more common for companies to need to explain their ‘data-derived’ decisions, e.g. why a customer’s application for finance or insurance has been rejected. The log of immutable events that EDA provides can be a key component of this auditing.
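As a toy sketch of that audit trail, here is an append-only log that can be replayed later to explain how a decision was reached; the event names and fields are invented for illustration.

```python
# Toy append-only event log: events are only ever appended, never edited,
# so the full history can be replayed for auditing. Illustrative only.
from datetime import datetime, timezone

log = []  # in a real system this would be a durable, partitioned log

def record(event_type, payload):
    log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "type": event_type,
        "payload": payload,
    })

record("application.received", {"customer": "c-42", "amount": 5000})
record("application.scored",   {"customer": "c-42", "score": 0.31})
record("application.rejected", {"customer": "c-42", "reason": "low score"})

# Auditing: replay the log to see exactly what happened, and in what order.
for event in log:
    print(event["at"], event["type"], event["payload"])
```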

Real-Time Event Streams for Data Science

One of the reasons I am enthusiastic about EDA is that it is particularly well suited to in-stream processing. It lends itself to fast decision-making: situations where milliseconds count.

Business logic can be applied while the data is in motion, rather than waiting for the data to land somewhere before doing the analysis. This is good for things like fraud detection and predictive analytics. Oftentimes, we need to know whether a transaction is fraudulent before it completes.
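As a minimal sketch of that idea, here is a fraud rule applied to each transaction as it streams past, rather than after the data lands in a store; the rule and the event shape are invented for illustration.

```python
# Sketch: apply business logic to each transaction as it flows past,
# instead of batch-analysing it after it lands in a data store.
def transactions():
    """Stand-in for an incoming event stream."""
    yield {"id": 1, "amount": 25.0,   "country": "GB"}
    yield {"id": 2, "amount": 9800.0, "country": "??"}

def looks_fraudulent(tx):
    # Invented rule for illustration; real detectors would be far richer.
    return tx["amount"] > 5000 or tx["country"] == "??"

for tx in transactions():
    if looks_fraudulent(tx):
        print("flag before completion:", tx["id"])  # act while in motion
    else:
        print("approve:", tx["id"])
```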

Further Reading

There are many reasons you might want to use eventing as the backbone of your system, and if you want to find out more about Event-Driven Architecture then I recommend the following resources as a start:

  • Designing Event-Driven Systems by Ben Stopford
  • Building Event-Driven Microservices by Adam Bellemare (pre-release)
  • Cloud Native Patterns by Cornelia Davis
  • I wrote a longer follow-up article on this topic here on IBM Developer.