And the first thing on my mind is kind of where we left off…
Over the last year, I’ve been to a few talks on data mesh, and the area I have found them lacking in is putting specifics against the benefits. A few have mentioned the value being subjective, but when you are making a financial case for allocating resources and releasing cash from the top cats, they don’t warm easily to that logic.
There are a lot of words used: outcome-driven, scalable, control, agility, efficiency, data potential, data-driven value.
I agree with all of the above but wanted to explore some specifics on where the value can/tends to drop out and how some of it may materialise.
Hold on. What is data mesh?
For those who may be new to this concept: data mesh is a socio-technical approach to managing and sharing data. It’s particularly suited to:
- Analytical workflows – training models, building dashboards, predictive analytics, historic analysis
- Big Data – lots of data, lots of pipelines
- Scaling complexity – data coming from various sources, varying uses
Zhamak Dehghani has described the objectives of data mesh as:
- Responding gracefully to change
- Sustaining agility during growth
- Getting value from data
If you want to know more about it then check out this article or this website.
This will be a series of (at least) four parts and, for the purposes of keeping it organised, I’ve split it up based on the four principles of data mesh.
Let’s just recap them:
- Domain Ownership
- Data as a product
- Self-serve data platform
- Data governed wherever it is
For this article let’s start with the first – domain ownership; individual domains owning the data they produce and leveraging their SME (or domain) knowledge to improve data quality.
This means that rather than data being owned centrally, it is owned by the product domains. These domains decide how their data is managed and drive the product roadmap.
The benefits that I will discuss in relation to this are:
- Removing bottlenecks that come from the inability to scale out people
- Improved data quality
- Clarifying data owners and empowering teams to curate and innovate at pace
- Empowering organisational structures
Removal of bottlenecks
When people talk about data mesh, the main bottleneck commonly illustrated is that of a central data team. Most companies with a centralised data model tend to hold their data and technical resource in one place. These teams work on a backlog that encompasses work for every (or most) domains within the business, commonly dealing with data requests, changes to data, data monitoring and enabling new data sources.
I’d just like to say that there are some great relationships that are built between central data teams and the business and I think it’s unfair to dismiss it as a completely dysfunctional set-up. However, it’s unreasonable to expect a technical data expert to understand the inner workings, data suitability, strategic and product roadmaps of every domain within a company. Rather than offering a service, they should offer their skills to the domain teams. A data engineer, for example, could be working across a few domain groups, allowing more space for understanding the context and offering bespoke solutions.
The volume of business data is growing, as is the number of digital initiatives, as companies constantly adapt to changing conditions. The central data team are encompassing more than ever.
The central data team fall into the murky middle of what this article from ThoughtWorks defines as data producers (the people who produce the data and are experts on it) and data consumers (the people who want to use the data). This has implications for how invested people are in bringing together the facets that need to exist for data-led initiatives to succeed.
When a domain team’s remit is entirely operational, they do not need to concern themselves with a data product roadmap or with how their process affects downstream value possibilities. This can cause bottlenecks in terms of scalability and a real lack of connection and alignment with the business purpose.
Let’s imagine a logistics team within an FMCG (fast-moving consumer goods) company, who interact with geospatial data on a day-to-day basis. They are tracking vehicles in their fleet to optimise routing and troubleshoot any issues that arise en route. The company have a central data team who provide the infrastructure for the collection, transformation and availability of this data for use.
Now a new technology like H3 (a hexagonal, hierarchical geospatial indexing system) comes along. It offers faster geospatial analysis and allows them to speed up optimisation queries, meaning they could make more deliveries and reduce costs.
In a traditional, centralised approach they may not be able to migrate to this new indexing system – perhaps the central team can’t support it or simply don’t have time, and they have a limited understanding of geospatial data and the methodologies for working with it. This becomes a bottleneck to opening up new cost-saving opportunities for the business.
However, when the logistics team own the data as a domain they can decide for themselves to use h3. Provided they do not break any functionality for other users around the business, they could swap technology (a lot faster than waiting for the central data team). They already have the inside knowledge because they deal with geospatial data day-in day-out. The deep knowledge of the domain team is leveraged much faster. They can optimise more routes and get their deliveries done quicker and more efficiently.
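To make the speed-up concrete, here is a minimal sketch of the idea behind a spatial index like H3: bucket vehicle positions into grid cells so that a “what’s near me?” query only scans a handful of cells instead of the whole fleet. (H3 itself uses hexagons at multiple resolutions; the square grid, cell size and vehicle names here are illustrative assumptions, not the H3 API.)

```python
from collections import defaultdict
import math

CELL_SIZE_DEG = 0.01  # roughly 1 km at mid latitudes; an assumption for this sketch

def cell_id(lat: float, lon: float) -> tuple:
    """Bucket a coordinate into a fixed-size grid cell."""
    return (math.floor(lat / CELL_SIZE_DEG), math.floor(lon / CELL_SIZE_DEG))

def build_index(points):
    """Map each cell id to the vehicle positions that fall inside it."""
    index = defaultdict(list)
    for vehicle, lat, lon in points:
        index[cell_id(lat, lon)].append((vehicle, lat, lon))
    return index

def nearby(index, lat: float, lon: float):
    """Scan only the query cell and its 8 neighbours rather than the
    whole fleet -- the core win of any spatial index."""
    ci, cj = cell_id(lat, lon)
    hits = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            hits.extend(index.get((ci + di, cj + dj), []))
    return hits

fleet = [("van-1", 51.501, -0.141), ("van-2", 51.509, -0.118), ("van-3", 53.483, -2.244)]
index = build_index(fleet)
print([v for v, _, _ in nearby(index, 51.5005, -0.1405)])
```

The domain team can tune the cell resolution (or swap in a library like H3) without waiting on anyone, because they understand what “nearby” means for their routing problem.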
This data, being made available to the rest of the business, can now also be used by other teams like fleet management. They use it to understand changing routes and how this impacts the configuration of fleet they need.
Improved Data Quality
A Gartner survey revealed that poor data quality costs businesses an average of $15m per year. Central technology teams often lack the domain knowledge needed to assess the quality or suitability of the data.
Machine learning operations (MLOps) is probably going to be a big driver towards data mesh-style architectures. Without good data it’s really hard to make it work well – as they say, garbage in, garbage out. Understanding whether biases may be present in the data and having human-in-the-loop workflows both lend themselves to a different organisational structure for ensuring data quality. KPMG have previously published research showing that up to sixty-five percent of senior execs do not trust the data being used to drive analytics and AI within their organisation.
Research from McKinsey suggests that, on average, 30% of an employee’s time is spent on non-value-add tasks because of poor data quality and availability.
Let’s take a company that makes industrial machinery fitted with sensors. The sensors collect data on usage patterns, time on, time off, temperatures, vibrations and so on.
The quality control team use the data to assess the health of the machines and try to detect when breakages might occur. One of the analysts finds a really neat way to filter out outliers in the data caused by faulty sensors.
However, the central data team can’t easily implement the outlier-filtering logic – because they don’t have time, lack the skills, or their tools don’t allow it.
So the quality team are able to do it, but it can’t be made available to the rest of the business (not without significant work from the central data team). As a result they become a silo whose data, because of their filtering method, is more accurate but crucially different to what the rest of the business has.
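An outlier filter like the analyst’s could look something like this sketch, which uses the median absolute deviation (MAD) – a statistic that is itself robust to the faulty readings being removed. The function name, threshold and sample readings are all illustrative assumptions, not the analyst’s actual method.

```python
import statistics

def filter_outliers(readings, threshold=3.5):
    """Drop readings whose modified z-score exceeds the threshold.
    MAD-based scores stay stable even when extreme values are present,
    which plain mean/std-dev filters do not."""
    median = statistics.median(readings)
    mad = statistics.median(abs(r - median) for r in readings)
    if mad == 0:  # all readings (near-)identical: nothing to filter
        return list(readings)
    # 0.6745 scales the MAD so scores are comparable with standard deviations
    return [r for r in readings if abs(0.6745 * (r - median) / mad) <= threshold]

temps = [71.2, 70.8, 71.5, 70.9, 250.0, 71.1]  # 250.0: a faulty sensor spike
print(filter_outliers(temps))  # → [71.2, 70.8, 71.5, 70.9, 71.1]
```

The point of the silo problem is that logic this small, if it lives only in one team’s spreadsheet or notebook, quietly forks the company’s view of the data.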
The service team use this data to schedule maintenance, and a service engineer gets called out. Twenty percent of these call-outs are due to incorrect data. The company have a team of twelve service engineers, each logging around 1,000 call-out hours a year – 12,000 hours in total, of which 2,400 are caused by bad data. If the service team could also benefit quickly from the new outlier logic, they could halve those call-outs, saving around 1,200 call-out hours annually and a large amount of money. Money could also be saved on replacing parts unnecessarily.
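The arithmetic behind that saving, spelled out (the team size and hours are the worked example’s assumptions, not real figures):

```python
engineers = 12
hours_per_engineer = 1000   # assumed annual call-out hours per engineer
bad_data_share = 0.20       # share of call-outs caused by incorrect data
reduction = 0.50            # the outlier logic halves those call-outs

total_hours = engineers * hours_per_engineer   # 12,000 hours across the team
wasted_hours = total_hours * bad_data_share    # 2,400 hours driven by bad data
saved_hours = wasted_hours * reduction         # 1,200 hours recovered per year
print(saved_hours)  # → 1200.0
```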
Clarity on data ownership
When it’s clear who owns the data and where it lives, access to that data should be quicker – and if that’s the case, it should also speed up a business’s ability to bring new products and features to market. Finding out who owns data in a business can be an adventure on its own.
Giving different teams quicker access to data from different parts of the business can reduce complex ETL (Extract, Transform, Load) processes and the time normally required from the central data team.
Someone in the business wanting to do some sentiment analysis on social media can get up and running with data from somewhere like Twitter almost immediately. Getting internal data on customer feedback about our services could take a whole lot longer: find the owner, make the business case, make a request to data services, wait for data services. These activities are most often initiated by the data science or analytics teams rather than the domains themselves.
This brings us neatly on to the subject of empowerment.
An empowering organisational structure for data
Conway’s law says that the structure of your organisation imposes an inherent structure on what you produce. Perhaps even more so with data nowadays.
“Organisations which design systems are constrained to produce designs which are copies of the communication structures of these organisations”
Casey Muratori talks about Conway’s law in this video (not about data mesh!) and it’s pretty eye-opening when you can spot organisational structure in software architectures. Just imagine the possibilities when the organisational structure is enabling rather than disabling in regards to what is possible with data.
Data products are currently being created by data teams, not product teams. By moving the data owners into domains, we will create very different products.
Technology doesn’t solve problems. Let’s just put that out there. Zhamak said something really poignant in her talk at Big Data London which was:
“Technology should disappear into the background”
It’s one of the reasons why I like that it’s referred to as a ‘socio-technical’ approach; it hints at what research suggests – that technology accounts for only around twenty percent of a digital initiative’s success. The current structure of organisations does not always lend itself to collaborating with data across domains. With many companies still trying to centralise their data, many initiatives simply lack the people and process side of a strategic approach to getting value from data.
We are all in recovery from long and unfruitful campaigns towards a single source of truth and, truth is, most master data management initiatives tend to fail (around ninety percent, according to Gartner!), while forty-five percent of businesses are unable to locate their master data effectively. These initiatives also tend to need huge investment and simply don’t deliver on the promised value.
I think the business case here is really compelling when we know that between seventy-five and ninety percent of digital initiatives fail, and that most of the failure comes from lack of alignment, people and change. Data mesh directly addresses a number of these, and even a small percentage of improvement could improve ROI by a significant amount. In this Deloitte report, they note that data mastery – composed of democratisation of data, self-serve insights and data products – has the strongest impact on business outcomes.
Data mesh can be implemented stepwise and doesn’t require a single architecture that satisfies every data use case across the business. It doesn’t cling to the elusive single source of truth but rather empowers domains to care about their data and to understand and innovate in the ways it can be used to achieve business outcomes.
Domains demand data mesh
The possibilities for innovation with data increase hugely when domain data teams are invested in, excited by the potential of, and subject-matter experts on their data assets, products and roadmap.
Do you think we can get to a point when domains are asking the business to implement a data mesh rather than it being a technology initiative first?
In the next instalment of this series, I will be discussing the value of having ‘data as a product’. I’m hoping to post some other cool stuff, not data-mesh related, in between.