cloud native, event driven

3 Advantages of the Data Mesh: Distributed Data for the Microservice World.

Photo by Ricardo Gomez Angel on Unsplash

Many companies have adopted the centralised data architecture, typified by large multi-domain centralised monolithic data stores and a central data team. In contrast the Data Mesh, described by Zhanak Dehgani is decentralised, domain specific and used by decentralised teams. It builds on the already established world of microservice architectures and domain-driven design.

Domain-Driven design is a topic in it’s own right but encompasses the idea that the domain and the business logic within that domain should be the main driving force behind software design. The same can be applied to data design and the data mesh is one approach.

The data mesh architecture, proposes modelling the data architecture using the same principles, by domain. The data processes are all managed within the relevant business domain. Teams are also domain focused and often multifunctional in their skillset.

In the blog post How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh by Zhamak Dehghani, she mentions a shift in focus towards:

  • serving over ingesting
  • discovering and using over extracting and loading
  • Publishing events as streams over flowing data around via centralized pipelines’
  • Ecosystem of data products over centralized data platform

From a microservice-architecture approach, rather than being specifically about decentralised infrastructure, this is actually more about decentralisation of the knowledge, skills and ownership of the data products that make up the business. While it could be that both the infrastructure and the teams are decentralised, the emphasis is decentralised business domains.

Monolithic Team, Monolithic Solution

The way we have designed ourselves traditionally (especially around data) encourages monolithic solutions. One team to rule all the data. This team manages all of the data in the business. These are often painful to set up as there is a lack of understanding between the domain SMEs (subject matter expert) and the data SMEs. The business need new datasets, systems, insights, models. As the business grows, so do the demands on the central data team.

Eventually, the central data team can become a bottleneck as more and more sources and users of data are scheduled to be added. This can make things move very slowly. It is an anti-pattern to development; both for delivering products and for personal up-skilling. It leads to neither side feeling very satisfied and is impossible to scale based on demand. It’s kind of okay for everyone but great for noone.

The data mesh nudges us towards teams that are leaner, multi-functional, agile and empowered. It also focuses on a business structure that is domain-driven rather than function driven. To ensure fresh ideas and continuous learning, technical roles like engineers and developers could regularly switch domain in order to keep circulation of knowledge and learning. For a small company, this concept is still possible. You might have a pool of technical resource that can move around the domain teams. The Data Science team, for example, could work as a kind of consultancy, moving around the domains helping them with data science projects. I.T specialists could also operate as in a consultancy type manner.

Data as a Domain Product

In the data mesh architecture, data is a product of the domain. It can be seen as something that is owned by the domain and can be subscribed to by other parts of the business (access permitting). Domain-Driven Design should be at the forefront of this approach, looking at the different domains that you have. There are a number of methods that a business can adopt to figure out the domain events that exist and to which domain an event belongs. One such method is event-storming, which we can perhaps discuss in more detail another time.

The idea is that teams have the data they want when they want it. The data could be served either via APIs or as events. API security and role access could be defined by the governance and security layer, which sits across all of the domains.

Data as events

Using an event-driven approach would enable data to be produced and consumed by decoupled domains. The teams wouldn’t need to worry about how other business domains want the data shaped or transformed, they are merely the producers. Domains can consume the data available and perform operations as is necessary for their own specific needs. If you want to learn more about event-driven architectures, I’ve written about them before here and here.

Advantages of the Data Mesh

There are some great advantages that I see when it comes to adopting a data mesh approach. Here are three:

Teams can choose their own technology and skills – In the microservice approach, each team is responsible for their own datastore (including the design, technology and operations). They are like their own little company, making their own decisions and delivering their ‘products’ to the rest of the business. This is in a way more realistic, each ‘domain’ has different needs (both technically and non-technically). One might use a SQL database, the other wants to use a NoSQL setup or a document store. One team is great at Python and the other one wants to use something else.

Decoupling services (or domain data) is empowering for the teams – They are now free to make their own decisions on how to collect, organise, store and use data (not only their own but any other sources to which they are subscribed.

Innovation happens faster – The decoupling that happens as a result of the data mesh is good for each of the domains. They can simply subscribe to events from a domain of their chosing. This means a reduction in time requesting and waiting for data (perhaps due to another teams current workload). Being able to just plug in to an API or event stream and start exploring will give rise to new opportunities for innvoation.


Overall, I think the data mesh is a really interesting concept that aligns well with the current shift towards cloud-native ecosystems. However, there are a lot of considerations when moving towards this kind of approach. This is just an introduction and there are a number of elements that I have not discussed such as security, data quality and governance. I hope to address some of these in a future post and talk a lot more about decentralised data.

knative, kubernetes

Knative Eventing: Part 2 – streaming CloudEvents to a UI

I’ve been looking at Knative eventing a fair bit lately and one of the things I have been doing is building an eventing demo (the first part of which can be found here). As part of this demo, I wanted to understand how I could get CloudEvents that were being sent by my producer to display in real time via a web UI (event display service UI).

Here is a bit of info and an overview of the approach I took. The code to run through this tutorial can be found here.

Prerequisites and set-up

First, you will need to have Knative and your chosen Gateway provider installed (I tried this with both Istio and Gloo, which both worked fine). You can follow the instructions here.

Initially deploy the 001-namespace.yaml by running:

kubectl apply -f 001-namespace.yaml

Verify you have a broker:

kubectl -n knative-eventing-websocket-source get broker default

You will see that the broker has a URL, this is what we will use as our SINK in the next step.

Deploy the Blockchain Events Sender Application

The application that sends the events was discussed in my Knative Eventing: Part 1 post and you can find the repo with all the code for this application here.

To get up and running you can simply run the 010-deployment.yaml file. Here is a reminder of what it looks like:

apiVersion: apps/v1
kind: Deployment
  name: wseventsource
  namespace: knative-eventing-websocket-source
  replicas: 1
    matchLabels: &labels
      app: wseventsource
      labels: *labels
        - name: wseventsource
          - name: SINK
            value: "http://default-broker.knative-eventing-websocket-source.svc.cluster.local"

This is a Kubernetes app deployment. The name of the deployment is wseventsource and the namespace is knative-eventing-websocket-source. We have defined an environmental variable of SINK, for which we set the value as the address of our broker.

Verify events are being sent by running:

kubectl --namespace knative-eventing-websocket-source logs -l app=wseventsource --tail=100 

This is what we currently have deployed:

Add a trigger – Send CloudEvents to Event-Display

Now we can deploy our trigger, which will set our event-display service as the subscriber.

# Knative Eventing Trigger to trigger the helloworld-go service
kind: Trigger
  name: wsevent-trigger
  namespace: knative-eventing-websocket-source
  broker: default
      type: ""
      source: ""
      apiVersion: v1
      kind: Service
      name: event-display

In the file above, we define our trigger name as wsevent-trigger and the namespace. In spec > filter I am basically specifying for the broker to send all events to the subscriber. The subscriber in this case is a Kubernetes services rather than a Knative Service.

kubectl apply -f 030-trigger.yaml

Now we have the following:

A trigger can exist before the service and vice versa. Let’s set up our event display.

Stream CloudEvents to Event Display service

I used the following packages to build the Event Display service:

Originally I deployed my event-display application as a Knative Service and this was fine but I could only access the events through the logs or by using curl.

Ideally, I wanted to build a stream of events that was push all the way to the UI. However, I discovered that for this use case it wasn’t possible to deploy this way. This is because Knative serving does not allow multiple ports in a service deployment.

I asked the question about it in the Knative Slack channel and the response was mainly to use mux and specify a path (I saw something similar in the sockeye GitHub project).

In the end, I chose to deploy as a native Kubernetes service instead. The reason is that it seemed like the most applicable way to do this, both in terms of functionality and also security. I was a little unsure about the feasibility of using mux in production as you may not want to expose an internal port externally.

For the kncloudevents project, I struggled to find detailed info or examples but the code is built on top of the Go sdk for CloudEvents and there are some detailed docs for the Python version.

We can use it to listen for HTTP cloudevents requests. By default it will listen on port 8080. When we use the StartReceiver function, this is essentially telling our code to start listening. Because this happens on one port, we need another to ListenAndServe.

So here are the two yaml files that we deploy for the event-display.

App Deployment:

apiVersion: apps/v1
kind: Deployment
  name: event-display
  namespace: knative-eventing-websocket-source
  replicas: 1
    matchLabels: &labels
      app: event-display
      labels: *labels
        - name: event-display

Service Deployment:

apiVersion: v1
kind: Service
  name: event-display
  namespace: knative-eventing-websocket-source
  type: NodePort
  - port: 80
    protocol: TCP
    targetPort: 8080
    name: consumer
  - port: 9080
    protocol: TCP
    targetPort: 9080
    nodePort: 31234
    name: dashboard
    app: event-display

With everything deployed we now have the following:

Now if you head to the nodeport specified in the yaml:


Next time, we will look at how to send a reply event back into the Knative eventing space.

cloud native

What are CloudEvents?

CloudEvents is a design specification for sending events in a common and uniform way. They are an interesting proposal for standardising the way we send events in an event-driven ecosystem. The specification is an open and a versatile approach for sending and consuming.

CloudEvents is currently an ‘incubating’ project with the CNCF. On the cloudevents website, they specify that the advantages of using cloud events are:

  • Consistency
  • Accessibility
  • Portability

Metadata about an event is contained within a CloudEvent, through a number of required (and optional) attributes including:

  • id
  • source
  • specversion
  • type

For more information about the attributes, you can take a look at the cloudevents spec.

Here is an example of a CloudEvent from my previous eventing example:

You can see the required attributes are:

  • id: 8e3cf8fb-88bb-4a00-a3fe-0635e221ce92
  • source: wss://
  • specversion: 0.3
  • type: websocket-event

There are also some extension attributes such as knativearrivaltime, knativehistory and traceparent. We then also have the body of the message in Data.

Having these set attributes means they can be used for filtering (e.g through a Knative eventing trigger) and also for capturing key information that can be used by other services that subscribe to the events. I can, for example, filter for events that are only from a certain source or of a certain type.

CloudEvents are currently supported by Knative, Azure Event Grid and Open FaaS.

There are number of libraries for CloudEvents inlcuding for Python, Go and Java. I’ve used the go-sdk for CloudEvents a lot lately and will be running through some of this in some future posts.

knative, kubernetes

Step by Step: Deploy and interact with a Knative Service

In this post, I will show how to deploy a Knative service and interact with it through curl and via the browser. I’ll go over some of the useful stuff to know as I found this kind of confusing at first.

I’m running this on a mac using the Kubernetes that’s built in to Docker Desktop, so things will be a bit different if you are running another flavor of Kubernetes. You will need Istio and the Knative serving components installed to follow along with this.

For the service, we are deploying a simple web app example from, which by default prints out “Hi there, I love (word of your choice)”. The code is at the link above, or I have a simple test image on Docker hub, which just prints out “Hi there, I love test” (oh the lack of creativity!)

Deploying a Knative Service

First we need to create a namespace, in which our Knative service will be deployed. For example:

kubectl create namespace web-service

Here is the Knative service deployment, which is a file called service.yaml.

kind: Service
  name: event-display
        - image:

Deploy the service yaml by running the following command:

kubectl apply -f service.yaml -n web-service

Now run the following in order to view the Knative service and some details we will need:

kubectl get ksvc -n web-service

There are a few fields, including:

NAME: The name of the service

URL: The url of the service, which we will need to interact with it. By default the URL will be “<your-service-name>.<namespace>” however you can also have a custom domain.

READY: This should say “True”, if not it will say “False” and there will be a reason in the REASON field.

After a little while, you might notice the service will disappear as it scales down to zero. More on that in a while.


To interact with the service we just deployed, we need to understand a bit about the IngressGateway. By default, Knative uses the istio-ingressgateway as its gateway service. We need to understand this in order to expose our service outside of the local cluster.

We can look at the istio-ingressgateway using the following command:

kubectl get service istio-ingressgateway --namespace istio-system

This will return the following:

Within the gateway configuration, there are a number of ports and NodePorts specified as default including the one we will use to communicate with our service:

port: number: 80, name: http2 protocol: HTTP

To find the port for accessing the service you can run the following:

kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?("http2")].port}'   

You can customise the Gateway configuration. Details and the different ports can be found here in the Istio documentation. I’d also recommend running through the Istio httpbin example to understand a bit more about istio and ingressgateway.

To interact with our service we will need to combine both the URL ( and the EXTERNAL-IP (localhost) which we saw for the istio-ingressgateway. Depending on your set up, these will not be the same as mine.

It will be something like the following:

curl -H "Host:"

Scaling our Service

Your initial pod has probably disappeared right now because when a service is idle, it will scale down to zero after around 90 seconds. You should see the pod start ‘Terminating’ and then disappear.

Knative uses the KPA (Knative Pod Autoscaler), which runs as a Kubernetes deployment. The KPA scales based on requests (concurrency), however it is also possible to use the HPA (Horizontal Pod Autoscaler), which allows scaling based on CPU.

You can find out more detailed information about autoscaling here but for now just note that you can change the parameters in the ConfigMap.

To see the autoscaler config you can run the following command:

kubectl describe configmap config-autoscaler -n knative-serving

To edit the ConfigMap:

kubectl edit configmap config-autoscaler -n knative-serving 

In the result you will see some fields including:

scale-to-zero-grace-period: 30s
stable-window: 60s

The scale-to-zero-grace-period specifies how long it will wait until it scales an inactive service down to zero. The autoscaler takes a 60 second window to assess activity. If it is determined that within that 60 seconds stable-window, there are no events, it will then wait another 30 seconds before scaling to zero. This is why it takes around 90 seconds to terminate an inactive service.

If desired, these can be amended so that your service will scale down faster or slower. There is also a field called enable-scale-to-zero, which (if you want to be able to scale to zero) must be set to “true”.

Test using curl

Once you curl the service again you should see the pod spin up again.

curl -H "Host:"

Should return:

Hi there, I love test!

Access Knative Service through browser

If you are using Docker Desktop on a mac, to access through a browser you could add the host to the hostfile on your mac.

sudo vi /etc/hosts

Add to the file and save it.

Alternatively, if you don’t want to (or can’t) change the host file, I used the “Simple Modify Headers” browser plugin. Then click on the icon once installed and select ‘configure’. Input the parameters as follows and then click the start button.

Now open http://localhost/test and you should see:

knative, kubernetes

Knative Eventing Example: Part 1

In my last post I shared some methods for getting up and running with Knative eventing. In this post, we are going to step back a little to try and understand how the eventing component works. It gives a high level overview of a demo you can follow along with on GitHub.

This will be the first of a series of posts, which walk through an example, that we will build out in complexity over the coming weeks.

First, lets go over a few reasons why we might look to use eventing.

What capabilities does Knative Eventing bring?

  • Ability to decouple producers and consumers (this means, for example, that consumers can be subscribed to an event type before any of those event types have been produced).
  • Events are published as CloudEvents (this is a topic I would like to cover separately in more detail).
  • Push-based messaging

There are a number of key components that I will describe below, which together will make up the initial example. Channels and subscriptions will not be included in this post, we’ll discuss those another time.

What are we building?

Let’s first take a look at the diagram below to get a picture of how these components fit and interact together. This diagram shows the type of demo scenario we are looking to recreate over the next few posts.

Each of these components are deployed using yaml files, except for the broker, which is automatically created once the knative injection is enabled within the namespace. You can deploy a custom broker if you wish but I won’t include that in this post.

In this simple example, we use a Kubernetes app deployment as the source and a Knative Service as the consumer, which will subscribe to the events.

The code that I use to stream the events to the broker is available here on GitHub and gives more detailed instructions if you want to build it yourself. It also contains the yaml files used in this tutorial.


Our source is the producer of the events. It could be an application, a web socket, a process etc. It produces events that other services may or may not be interested in subscribing to.

There are a number of different types of sources, each one is a custom resource. The range of sources available can be seen in the Knative documentation here. You can create your own event source if you need to.

The following yaml shows a simple Kubernetes app deployment which is our source:

In the above example, the source is a go application, which streams messages via a web socket connection. It sends them as CloudEvents, which our service will consume. It is available to view here.

Broker and Trigger are CRDs, which will manage the delivery of events and abstract away the details of these from the related services.


The broker is where events get received. It is like a holding area, from where they can be consumed by those interested. As mentioned above, a default broker is automatically created when you label your namespace with kubectl label namespace my-event-namespace knative-eventing-injection=enabled


Our trigger provides a filter, by which it is determines which events should be delivered to a given consumer.

Here below is an example trigger, which just defines that the event-display service subscribes to all events from the default broker.

Under spec, I can also add filter: > attributes: and then include some CloudEvent attributes to filter by. We can then filter the CloudEvent fields, such as type or source etc in order to determine those to which a service subscribes. Here is another example, which filters on a specific event type:

You can also filter on an expression such as:

expression: ce.type == "com.github.pull.create"


We can have one or multiple consumers. These are the services that are interested (or not) in the events. If you have a single consumer, you can send straight from the source to consumer.

Here is the Knative Service deployment yaml:

Because I want to send events to a Knative Service, I need to have the cluster-local visibility label (this is explained in more detail below). For the rest, I am using a pre-built image from knative for a simple event display, the code for which can be found here.

Once you have all of these initial components, it looks like this:


I had some issues with getting the events to the consumer at first when trying this initial demo out. In the end I found out that in order to sink to a Knative Service, you need to add the cluster local gateway to the Istio installation. This is somewhat vaguely mentioned in the docs around installing Istio but could probably have been a bit clearer. Luckily, I found this great post, which helped me massively!

When you install Istio you will need to ensure that you see (at least) the following:

Handy tips and stuff I wish I had known before…

If you want to add a sink URL in your source file, see the example below for a broker sink:


default is the name of the broker and knative-eventing-websocket-source is the namespace.

In order to verify events are being sent/received/consumed then you can use the following examples as a reference:

//Getting the logs of the source app
kubectl --namespace knative-eventing-websocket-source logs -l app=wseventsource --tail=100 

//Getting the logs of the broker
kubectl --namespace knative-eventing-websocket-source logs -l --tail=100 

//Getting the logs of a Knative service
kubectl logs -l -c user-container --since=10m -n knative-eventing-websocket-source

You should see something like this when you get the service logs:

Next steps

Next time we will be looking at building out our service and embellishing it a little so we can transform, visualise and send new events back to a broker.