cloud native, event driven

3 Advantages of the Data Mesh: Distributed Data for the Microservice World


Many companies have adopted a centralised data architecture, typified by large, multi-domain, monolithic data stores and a central data team. In contrast, the Data Mesh, described by Zhamak Dehghani, is decentralised, domain-specific and used by decentralised teams. It builds on the already established world of microservice architectures and domain-driven design.

Domain-Driven Design is a topic in its own right, but it encompasses the idea that the domain and the business logic within that domain should be the main driving force behind software design. The same can be applied to data design, and the data mesh is one such approach.

The data mesh architecture proposes modelling the data architecture using the same principles: by domain. The data processes are all managed within the relevant business domain. Teams are also domain-focused and often multifunctional in their skill set.

In the blog post How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, Zhamak Dehghani mentions a shift in focus towards:

  • serving over ingesting
  • discovering and using over extracting and loading
  • publishing events as streams over flowing data around via centralized pipelines
  • ecosystem of data products over centralized data platform

From a microservice-architecture perspective, rather than being specifically about decentralised infrastructure, this is actually more about decentralising the knowledge, skills and ownership of the data products that make up the business. While both the infrastructure and the teams may be decentralised, the emphasis is on decentralised business domains.

Monolithic Team, Monolithic Solution

The way we have traditionally organised ourselves (especially around data) encourages monolithic solutions: one team to rule all the data. This team manages all of the data in the business. Such setups are often painful, as there is a lack of shared understanding between the domain SMEs (subject matter experts) and the data SMEs. The business needs new datasets, systems, insights and models, and as the business grows, so do the demands on the central data team.

Eventually, the central data team can become a bottleneck as more and more sources and users of data are scheduled to be added. This can make things move very slowly. It is an anti-pattern for development, both for delivering products and for personal up-skilling. It leaves neither side feeling very satisfied and is impossible to scale based on demand. It's kind of okay for everyone but great for no one.

The data mesh nudges us towards teams that are leaner, multi-functional, agile and empowered. It also favours a business structure that is domain-driven rather than function-driven. To ensure fresh ideas and continuous learning, technical roles like engineers and developers could regularly switch domains, keeping knowledge and learning in circulation. For a small company, this concept is still possible: you might have a pool of technical resources that can move around the domain teams. The Data Science team, for example, could work as a kind of internal consultancy, moving around the domains to help with data science projects. IT specialists could operate in a similar manner.

Data as a Domain Product

In the data mesh architecture, data is a product of the domain. It can be seen as something that is owned by the domain and can be subscribed to by other parts of the business (access permitting). Domain-Driven Design should be at the forefront of this approach, looking at the different domains that you have. There are a number of methods that a business can adopt to figure out the domain events that exist and to which domain an event belongs. One such method is event-storming, which we can perhaps discuss in more detail another time.

The idea is that teams have the data they want when they want it. The data could be served either via APIs or as events. API security and role access could be defined by the governance and security layer, which sits across all of the domains.

Data as events

Using an event-driven approach would enable data to be produced and consumed by decoupled domains. The teams wouldn't need to worry about how other business domains want the data shaped or transformed; they are merely the producers. Domains can consume the data available and perform whatever operations are necessary for their own specific needs. If you want to learn more about event-driven architectures, I've written about them before here and here.

Advantages of the Data Mesh

There are some great advantages that I see when it comes to adopting a data mesh approach. Here are three:

Teams can choose their own technology and skills – In the microservice approach, each team is responsible for its own datastore (including the design, technology and operations). Each team is like its own little company, making its own decisions and delivering its 'products' to the rest of the business. This is in a way more realistic: each domain has different needs, both technical and non-technical. One might use a SQL database, while another wants a NoSQL setup or a document store. One team is great at Python and another wants to use something else.

Decoupling services (or domain data) is empowering for the teams – They are now free to make their own decisions on how to collect, organise, store and use data (not only their own, but any other sources to which they are subscribed).

Innovation happens faster – The decoupling that results from the data mesh is good for each of the domains. They can simply subscribe to events from a domain of their choosing. This means less time spent requesting and waiting for data (perhaps due to another team's current workload). Being able to just plug in to an API or event stream and start exploring gives rise to new opportunities for innovation.

Summary

Overall, I think the data mesh is a really interesting concept that aligns well with the current shift towards cloud-native ecosystems. However, there are a lot of considerations when moving towards this kind of approach. This is just an introduction and there are a number of elements that I have not discussed such as security, data quality and governance. I hope to address some of these in a future post and talk a lot more about decentralised data.

cloud native, knative

Knative Eventing: Part 3 – Replying to broker

In part two of my Knative eventing tutorials, we streamed websocket data to a broker and then subscribed to the events with an event display service that displayed the events in real-time via a web UI.

In this post, I am going to show how to create reply events to a broker and then subscribe to them from a different service. Once we are done, we will have something that should look like the following:

In this scenario, we are receiving events from a streaming source (in this case a websocket) and each time we receive an event, we want to send another event as a reply back to the broker. In this example, we keep it very simple: every time we get an event, we send another.

In the next tutorial, we will receive the events and perform some analysis on the fly, for example analysing the value of a transaction and assigning a size variable. We could then implement some logic such as: if size = XL, send a reply back to the broker, which could then be listened for by an alerting application.

Setup

I am running Kubernetes on Docker Desktop for Mac. You will also need Istio and Knative Eventing installed.

Deploy a namespace:

kubectl create namespace knative-eventing-websocket-source

Apply the knative-eventing label:

kubectl label namespace knative-eventing-websocket-source knative-eventing-injection=enabled
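The label should cause Knative to inject a default broker into the namespace. Assuming the injection controller is running, you can confirm the broker is ready with:

```shell
kubectl get broker default -n knative-eventing-websocket-source
```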

Ensure you have the cluster local gateway set up.

Adding newEvent logic:

In our application code, we are adding some code for sending a new reply event every time an event is received. The code is here:

// Build a new CloudEvent to send back to the broker as a reply
newEvent := cloudevents.NewEvent()
newEvent.SetSource("https://knative.dev/jsaladas/transactionClassified")
newEvent.SetType("dev.knative.eventing.jsaladas.transaction.classify")
newEvent.SetID("1234")
newEvent.SetData("Hi from Knative!")
// Send the new event as the HTTP response; the broker picks it up
// and routes it to any matching triggers
response.RespondWith(200, &newEvent)

This code creates a new event, with the following information:

source = "https://knative.dev/jsaladas/transactionClassified"

type = "dev.knative.eventing.jsaladas.transaction.classify"

data = "Hi from Knative!"

I can use whichever values I want for the above; these are just the values I decided on, so feel free to change them. We can then use these fields later to filter for the reply events only. Our code will now, aside from receiving and displaying an event, generate a new event that will enter the Knative eventing ecosystem.

Initial components to run the example

The main code for this tutorial is already in the GitHub repo for part 2. If you already followed that part and have it running, you will need to redeploy the event-display service with a new image. For those who didn't join for part 2, this section shows you how to deploy the components.

Deploy the websocket source application:

kubectl apply -f 010-deployment.yaml

Here is the yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wseventsource
  namespace: knative-eventing-websocket-source
spec:
  replicas: 1
  selector:
    matchLabels: &labels
      app: wseventsource
  template:
    metadata:
      labels: *labels
    spec:
      containers:
        - name: wseventsource
          image: docker.io/josiemundi/wssourcecloudevents:latest
          env:
          - name: SINK
            value: "http://default-broker.knative-eventing-websocket-source.svc.cluster.local"

Next we will apply the trigger that sets up a subscription for the event-display service, so it receives events from the broker that have a source equal to "wss://ws.blockchain.info/inv". We could also filter on type or any other CloudEvent attribute; if we left both empty, the trigger would match all events.

kubectl apply -f 040-trigger.yaml

Here is the trigger yaml:

apiVersion: eventing.knative.dev/v1alpha1
kind: Trigger
metadata:
  name: wsevent-trigger
  namespace: knative-eventing-websocket-source
spec:
  broker: default
  filter:
    sourceAndType:
      type: ""
      source: "wss://ws.blockchain.info/inv"
  subscriber:    
    ref:
      apiVersion: v1
      kind: Service
      name: event-display

Next we will deploy the event-display service, which is the specified subscriber of the blockchain events in our trigger.yaml. This application is where we create our reply events.

These are standard Kubernetes resources (a Deployment and a Service), so we need to apply the following yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-display
  namespace: knative-eventing-websocket-source
spec:
  replicas: 1
  selector:
    matchLabels: &labels
      app: event-display
  template:
    metadata:
      labels: *labels
    spec:
      containers:
        - name: event-display
          image: docker.io/josiemundi/test-reply-broker
---
apiVersion: v1
kind: Service
metadata:
  name: event-display
  namespace: knative-eventing-websocket-source
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
    name: consumer
  - port: 9080
    protocol: TCP
    targetPort: 9080
    nodePort: 31234
    name: dashboard
  selector:
    app: event-display

If you head to localhost:31234, you should see the stream of events.

Subscribe to the reply events

Now we need to add another trigger, this time subscribing only to the reply events (that's the newEvent we set up in the Go code). You can see that, in this case, we specify the source as "https://knative.dev/jsaladas/transactionClassified".

Here is the trigger yaml we apply:

apiVersion: eventing.knative.dev/v1alpha1
kind: Trigger
metadata:
  name: reply-trigger-test
  namespace: knative-eventing-websocket-source
spec:
  broker: default
  filter:
    sourceAndType:
      type: ""
      source: "https://knative.dev/jsaladas/transactionClassified"
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1alpha1
      kind: Service
      name: test-display

This time, our subscriber is a Knative service called test-display, which we still need to deploy.

Run the following to deploy the Knative service that subscribes to the reply events:

kubectl --namespace knative-eventing-websocket-source apply --filename - << END
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: test-display
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-releases/github.com/knative/eventing-contrib/cmd/event_display@sha256:1d6ddc00ab3e43634cd16b342f9663f9739ba09bc037d5dea175dc425a1bb955
END

We can now get the logs of the test-display service and you should only see the reply messages:

kubectl logs -l serving.knative.dev/service=test-display -c user-container --tail=100 -n knative-eventing-websocket-source

Next time, we will look at classifying the events and using the transaction size in a reply to the broker.

cloud native

What are CloudEvents?

CloudEvents is a design specification for sending events in a common and uniform way. It is an interesting proposal for standardising the way we send events in an event-driven ecosystem. The specification is an open and versatile approach for producing and consuming events.

CloudEvents is currently an 'incubating' project with the CNCF. On the CloudEvents website, they specify that the advantages of using CloudEvents are:

  • Consistency
  • Accessibility
  • Portability

Metadata about an event is contained within a CloudEvent through a number of required (and optional) attributes, including:

  • id
  • source
  • specversion
  • type

For more information about the attributes, you can take a look at the cloudevents spec.

Here is an example of a CloudEvent from my previous eventing example:

You can see the required attributes are:

  • id: 8e3cf8fb-88bb-4a00-a3fe-0635e221ce92
  • source: wss://ws.blockchain.info/inv
  • specversion: 0.3
  • type: websocket-event

There are also some extension attributes such as knativearrivaltime, knativehistory and traceparent. We then also have the body of the message in Data.
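Put together, the event from the example above would look roughly like this. This is a hand-reconstructed sketch from the attributes listed; the extension values and the Data payload are elided because they vary per event:

```json
{
  "specversion": "0.3",
  "type": "websocket-event",
  "source": "wss://ws.blockchain.info/inv",
  "id": "8e3cf8fb-88bb-4a00-a3fe-0635e221ce92",
  "knativearrivaltime": "...",
  "knativehistory": "...",
  "traceparent": "...",
  "data": "..."
}
```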

Having these set attributes means they can be used for filtering (e.g. through a Knative eventing trigger) and also for capturing key information that can be used by other services that subscribe to the events. I can, for example, filter for events that come only from a certain source or are of a certain type.

CloudEvents are currently supported by Knative, Azure Event Grid and OpenFaaS.

There are a number of libraries for CloudEvents, including for Python, Go and Java. I've used the Go SDK for CloudEvents a lot lately and will be running through some of it in future posts.