cloud native, event driven

3 Advantages of the Data Mesh: Distributed Data for the Microservice World.


Many companies have adopted a centralised data architecture, typified by large multi-domain monolithic data stores and a central data team. In contrast, the Data Mesh, described by Zhamak Dehghani, is decentralised and domain-specific, and is used by decentralised teams. It builds on the already established worlds of microservice architectures and domain-driven design.

Domain-Driven Design is a topic in its own right, but it encompasses the idea that the domain, and the business logic within that domain, should be the main driving force behind software design. The same can be applied to data design, and the data mesh is one such approach.

The data mesh architecture proposes modelling the data architecture using the same principle: by domain. Data processes are all managed within the relevant business domain. Teams are also domain-focused and often multifunctional in their skillset.

In the blog post How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, Zhamak Dehghani mentions a shift in focus towards:

  • serving over ingesting
  • discovering and using over extracting and loading
  • publishing events as streams over flowing data around via centralised pipelines
  • ecosystems of data products over a centralised data platform

From a microservice-architecture perspective, rather than being specifically about decentralised infrastructure, this is actually more about decentralising the knowledge, skills and ownership of the data products that make up the business. While both the infrastructure and the teams may be decentralised, the emphasis is on decentralised business domains.

Monolithic Team, Monolithic Solution

The way we have traditionally organised ourselves (especially around data) encourages monolithic solutions: one team to rule all the data. This team manages all of the data in the business. These setups are often painful, as there is a gap in understanding between the domain SMEs (subject matter experts) and the data SMEs. The business needs new datasets, systems, insights and models, and as the business grows, so do the demands on the central data team.

Eventually, the central data team can become a bottleneck as more and more sources and users of data are scheduled to be added. This can make things move very slowly. It is an anti-pattern for development, both for delivering products and for personal up-skilling. It leaves neither side feeling very satisfied and is impossible to scale based on demand. It's kind of okay for everyone but great for no one.

The data mesh nudges us towards teams that are leaner, multi-functional, agile and empowered. It also favours a business structure that is domain-driven rather than function-driven. To ensure fresh ideas and continuous learning, technical roles like engineers and developers could regularly switch domains to keep knowledge and learning in circulation. For a small company, this concept is still possible: you might have a pool of technical resources that moves around the domain teams. The Data Science team, for example, could work as a kind of internal consultancy, moving around the domains to help with data science projects. IT specialists could operate in a similar consultancy-style manner.

Data as a Domain Product

In the data mesh architecture, data is a product of the domain. It can be seen as something that is owned by the domain and can be subscribed to by other parts of the business (access permitting). Domain-Driven Design should be at the forefront of this approach, looking at the different domains that you have. There are a number of methods that a business can adopt to figure out the domain events that exist and to which domain an event belongs. One such method is event-storming, which we can perhaps discuss in more detail another time.

The idea is that teams have the data they want when they want it. The data could be served either via APIs or as events. API security and role access could be defined by the governance and security layer, which sits across all of the domains.

Data as events

Using an event-driven approach would enable data to be produced and consumed by decoupled domains. The teams wouldn't need to worry about how other business domains want the data shaped or transformed; they are merely the producers. Domains can consume the data available and perform whatever operations their own specific needs require. If you want to learn more about event-driven architectures, I've written about them before here and here.
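
To make this concrete, here is a minimal sketch of a producing domain in Go, using the CloudEvents SDK (v2). The domain, event type, payload and broker URL are all invented for illustration; the point is that the producer knows nothing about its consumers.

package main

import (
	"context"
	"log"

	cloudevents "github.com/cloudevents/sdk-go/v2"
)

func main() {
	// The producing domain only decides what the event looks like;
	// it does not know or care who consumes it.
	event := cloudevents.NewEvent()
	event.SetSource("https://example.com/sales/orders") // hypothetical domain source
	event.SetType("com.example.sales.order.created")    // hypothetical event type
	_ = event.SetData(cloudevents.ApplicationJSON, map[string]interface{}{
		"orderId": "1234",
		"amount":  99.95,
	})

	c, err := cloudevents.NewClientHTTP()
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}

	// Send to a broker; other domains subscribe independently.
	ctx := cloudevents.ContextWithTarget(context.Background(), "http://broker.example.svc.cluster.local")
	if result := c.Send(ctx, event); cloudevents.IsUndelivered(result) {
		log.Fatalf("failed to send: %v", result)
	}
}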

Advantages of the Data Mesh

There are some great advantages that I see when it comes to adopting a data mesh approach. Here are three:

Teams can choose their own technology and skills – In the microservice approach, each team is responsible for its own datastore (including the design, technology and operations). They are like their own little company, making their own decisions and delivering their ‘products’ to the rest of the business. This is, in a way, more realistic: each ‘domain’ has different needs (both technical and non-technical). One might use a SQL database, while another wants a NoSQL setup or a document store. One team is great at Python; another wants to use something else.

Decoupling services (or domain data) is empowering for the teams – They are now free to make their own decisions on how to collect, organise, store and use data (not only their own, but any other sources to which they are subscribed).

Innovation happens faster – The decoupling that happens as a result of the data mesh is good for each of the domains. They can simply subscribe to events from a domain of their choosing. This means a reduction in time spent requesting and waiting for data (perhaps due to another team's current workload). Being able to just plug in to an API or event stream and start exploring will give rise to new opportunities for innovation.

Summary

Overall, I think the data mesh is a really interesting concept that aligns well with the current shift towards cloud-native ecosystems. However, there are a lot of considerations when moving towards this kind of approach. This is just an introduction and there are a number of elements that I have not discussed such as security, data quality and governance. I hope to address some of these in a future post and talk a lot more about decentralised data.

event driven

3 advantages of Event-Driven Architecture

My latest posts have put a lot of focus on Cloud native technologies. The last few have mentioned things like CloudEvents and Knative Eventing and it got me thinking… why might people want to implement event driven ecosystems in the first place?

I’ve decided to put together three advantages that I think offer pretty attractive prospects for implementing an Event Driven Architecture pattern.

True Decoupling of Producers and Consumers

The nature of an Event Driven Architecture ecosystem lends itself to microservices and, in this type of system, there is (hopefully) a loose coupling between the services. Depending on how the microservices communicate, there may still be dependencies between them (e.g. an HTTP request/response approach).

In the excellent book ‘Designing Event-Driven Systems‘, Ben Stopford tells us that the core mantra of event-driven services is to “Centralize an immutable stream of facts. Decentralize the freedom to act, adapt and change”.

Because the ownership of data is separated by domain, there is a clean logical separation between the production and consumption of events. As a producer, I do not need to concern myself with how the events I produce are going to be consumed, and vice versa for the teams consuming them: they are free to figure out for themselves what to do with the events; they do not need to be instructed. The message structure is also not important. It can be JSON, XML, Avro and so on; it doesn't matter.

The broker, together with some kind of trigger between it and the services, enables messages to be ingested into the event-driven ecosystem and then broadcast out to whichever services are interested in receiving them.

Business narrative of what has happened that can’t be changed

We have all heard the term ‘single source of truth’ and this is usually just a rumor (like the treasure chest hidden at the end of the rainbow). Well, in an event-driven ecosystem it really exists!

As mentioned above, an event stream should be an immutable stream of facts. This is very representative of how our daily lives unfold: as a series of events. These events happened, and it's not possible to go back and change them unless you own a time machine (remember, terrible things can happen to those who meddle with time)…

This is an advantage for business data governance as you can always look back in the log for auditing or to see what happened.

It is becoming more and more common for companies to need to explain their ‘data-derived’ decisions, e.g why a customer’s application for finance or insurance has been rejected. The log of immutable events that EDA provides us can provide a key component of this auditing.

Real-time event streams for Data Science

One of the reasons I am enthusiastic about EDA is that it is particularly well suited to in-stream processing. It lends itself to fast decision-making, where milliseconds count.

Business logic can be applied while the data is in motion, rather than waiting for the data to land somewhere before doing the analysis. This is good for use cases like fraud detection and predictive analytics; oftentimes we need to know whether a transaction is fraudulent before it completes.
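
As a toy illustration, a consumer built with the CloudEvents Go SDK (v2) could score each transaction the moment it arrives, with no batch step. The payload shape and threshold below are invented for the sketch:

package main

import (
	"context"
	"encoding/json"
	"log"

	cloudevents "github.com/cloudevents/sdk-go/v2"
)

// transaction is a hypothetical payload shape for this sketch.
type transaction struct {
	ID     string  `json:"id"`
	Amount float64 `json:"amount"`
}

// receive applies business logic while the data is in motion:
// each event is scored as it arrives.
func receive(ctx context.Context, event cloudevents.Event) {
	var tx transaction
	if err := json.Unmarshal(event.Data(), &tx); err != nil {
		log.Printf("skipping malformed event: %v", err)
		return
	}
	if tx.Amount > 10000 { // naive stand-in for a real fraud model
		log.Printf("transaction %s flagged for review", tx.ID)
	}
}

func main() {
	c, err := cloudevents.NewClientHTTP()
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}
	// Blocks, handling each incoming CloudEvent as it streams in.
	log.Fatal(c.StartReceiver(context.Background(), receive))
}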

Further Reading

There are many reasons you might want to use eventing as the backbone of your system and if you want to find out more about Event-Driven Architecture then I recommend the following resources as a start:

  • Designing Event-Driven Systems by Ben Stopford
  • Building Event-Driven Microservices by Adam Bellemare (pre-release)
  • Cloud Native Patterns by Cornelia Davis
  • I wrote a follow up, longer article on this topic here on IBM Developer.

cloud native, knative

Knative Eventing: Part 3 – Replying to broker

In part two of my Knative eventing tutorials, we streamed websocket data to a broker and then subscribed to the events with an event display service that displayed the events in real-time via a web UI.

In this post, I am going to show how to send reply events back to a broker and then subscribe to them from a different service. Once we are done, we will have the setup described below.

In this scenario, we are receiving events from a streaming source (in this case a websocket) and, each time we receive an event, we want to send another event as a reply back to the broker. We'll keep it very simple in this example: every time we get an event, we send one reply.

In the next tutorial, we will look at receiving the events and then performing some analysis on the fly, for example analysing the value of a transaction and assigning a size variable. We could then implement logic such as ‘if size == XL, send a reply back to the broker’, which an alerting application could listen for.

Setup

I am running Kubernetes on Docker Desktop for Mac. You will also need Istio and Knative Eventing installed.

Create a namespace:

kubectl create namespace knative-eventing-websocket-source

Apply the knative-eventing-injection label:

kubectl label namespace knative-eventing-websocket-source knative-eventing-injection=enabled
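
After a few moments, the injection should create a default broker in the namespace, which you can verify with:

kubectl -n knative-eventing-websocket-source get broker default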

Ensure you have the cluster local gateway set up.

Adding newEvent logic:

In our application code, we are adding some code for sending a new reply event every time an event is received. The code is here:

newEvent := cloudevents.NewEvent()
// Set the attributes that identify this reply; our reply trigger
// later filters on the source attribute.
newEvent.SetSource("https://knative.dev/jsaladas/transactionClassified")
newEvent.SetType("dev.knative.eventing.jsaladas.transaction.classify")
newEvent.SetID("1234")
newEvent.SetData("Hi from Knative!")
// Respond to the broker's delivery with the new event, sending it
// back into the eventing mesh as a reply.
response.RespondWith(200, &newEvent)

This code creates a new event, with the following information:

source = "https://knative.dev/jsaladas/transactionClassified"

type = "dev.knative.eventing.jsaladas.transaction.classify"

data = "Hi from Knative!"

I can use whichever values I want for the above; these are just the values I decided on, so feel free to change them. We can then use these fields later to filter for the reply events only. Our code will now, aside from receiving an event and displaying it, generate a new event that enters the Knative eventing ecosystem.

Initial components to run the example

The main code for this tutorial is already in the GitHub repo for part 2. If you already followed that and have it running, you will need to redeploy the event-display service with a new image. For those who didn't join for part 2, this section shows you how to deploy the components.

Deploy the websocket source application:

kubectl apply -f 010-deployment.yaml

Here is the yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wseventsource
  namespace: knative-eventing-websocket-source
spec:
  replicas: 1
  selector:
    matchLabels: &labels
      app: wseventsource
  template:
    metadata:
      labels: *labels
    spec:
      containers:
        - name: wseventsource
          image: docker.io/josiemundi/wssourcecloudevents:latest
          env:
          - name: SINK
            value: "http://default-broker.knative-eventing-websocket-source.svc.cluster.local"

Next we will apply the trigger that sets up a subscription for the event display service, so it receives events from the broker that have a source equal to "wss://ws.blockchain.info/inv". We could also filter on type, or on another CloudEvent attribute; if we left both fields empty, the trigger would match all events.

kubectl apply -f 040-trigger.yaml

Here is the trigger yaml:

apiVersion: eventing.knative.dev/v1alpha1
kind: Trigger
metadata:
  name: wsevent-trigger
  namespace: knative-eventing-websocket-source
spec:
  broker: default
  filter:
    sourceAndType:
      type: ""
      source: "wss://ws.blockchain.info/inv"
  subscriber:    
    ref:
      apiVersion: v1
      kind: Service
      name: event-display

Next we will deploy the event-display service, which is the specified subscriber of the blockchain events in our trigger.yaml. This application is where we create our reply events.

This is deployed as standard Kubernetes resources, so we need to apply the following yaml (a Deployment and a Service):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-display
  namespace: knative-eventing-websocket-source
spec:
  replicas: 1
  selector:
    matchLabels: &labels
      app: event-display
  template:
    metadata:
      labels: *labels
    spec:
      containers:
        - name: event-display
          image: docker.io/josiemundi/test-reply-broker
---
apiVersion: v1
kind: Service
metadata:
  name: event-display
  namespace: knative-eventing-websocket-source
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
    name: consumer
  - port: 9080
    protocol: TCP
    targetPort: 9080
    nodePort: 31234
    name: dashboard
  selector:
    app: event-display

If you head to localhost:31234, you should see the stream of events.

Subscribe to the reply events

Now we need to add another trigger, this time subscribing only to the reply events (that's the newEvent we set up in the Go code). You can see that, in this case, we specify the source as "https://knative.dev/jsaladas/transactionClassified".

Here is the trigger yaml we apply:

apiVersion: eventing.knative.dev/v1alpha1
kind: Trigger
metadata:
  name: reply-trigger-test
  namespace: knative-eventing-websocket-source
spec:
  broker: default
  filter:
    sourceAndType:
      type: ""
      source: "https://knative.dev/jsaladas/transactionClassified"
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1alpha1
      kind: Service
      name: test-display

This time, our subscriber is a Knative service called test-display, which we still need to deploy.

Run the following to deploy the Knative Service that subscribes to the reply events:

kubectl --namespace knative-eventing-websocket-source apply --filename - << END
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: test-display
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-releases/github.com/knative/eventing-contrib/cmd/event_display@sha256:1d6ddc00ab3e43634cd16b342f9663f9739ba09bc037d5dea175dc425a1bb955
END

We can now get the logs of the test-display service and you should only see the reply messages:

kubectl logs -l serving.knative.dev/service=test-display -c user-container --tail=100 -n knative-eventing-websocket-source

Next time, we will look at classifying the events and using transaction size as the reply to the broker.

knative, kubernetes

Knative Eventing: Part 2 – streaming CloudEvents to a UI

I’ve been looking at Knative eventing a fair bit lately and one of the things I have been doing is building an eventing demo (the first part of which can be found here). As part of this demo, I wanted to understand how I could get CloudEvents that were being sent by my producer to display in real time via a web UI (event display service UI).

Here is a bit of info and an overview of the approach I took. The code to run through this tutorial can be found here.

Prerequisites and set-up

First, you will need to have Knative and your chosen Gateway provider installed (I tried this with both Istio and Gloo, which both worked fine). You can follow the instructions here.

Initially deploy the 001-namespace.yaml by running:

kubectl apply -f 001-namespace.yaml

Verify you have a broker:

kubectl -n knative-eventing-websocket-source get broker default

You will see that the broker has a URL; this is what we will use as our SINK in the next step.
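
To print just the URL, you can query the broker's status (assuming your version exposes it under status.address.url, as the releases used here do):

kubectl -n knative-eventing-websocket-source get broker default -o jsonpath='{.status.address.url}'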

Deploy the Blockchain Events Sender Application

The application that sends the events was discussed in my Knative Eventing: Part 1 post and you can find the repo with all the code for this application here.

To get up and running you can simply run the 010-deployment.yaml file. Here is a reminder of what it looks like:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wseventsource
  namespace: knative-eventing-websocket-source
spec:
  replicas: 1
  selector:
    matchLabels: &labels
      app: wseventsource
  template:
    metadata:
      labels: *labels
    spec:
      containers:
        - name: wseventsource
          image: docker.io/josiemundi/wssourcecloudevents:latest
          env:
          - name: SINK
            value: "http://default-broker.knative-eventing-websocket-source.svc.cluster.local"

This is a Kubernetes app deployment. The name of the deployment is wseventsource and the namespace is knative-eventing-websocket-source. We have defined an environment variable called SINK, whose value is the address of our broker.

Verify events are being sent by running:

kubectl --namespace knative-eventing-websocket-source logs -l app=wseventsource --tail=100 

At this point, the source application is deployed and streaming events to the broker.

Add a trigger – Send CloudEvents to Event-Display

Now we can deploy our trigger, which will set our event-display service as the subscriber.

# Knative Eventing Trigger to send events to the event-display service
apiVersion: eventing.knative.dev/v1alpha1
kind: Trigger
metadata:
  name: wsevent-trigger
  namespace: knative-eventing-websocket-source
spec:
  broker: default
  filter:
    sourceAndType:
      type: ""
      source: ""
  subscriber:    
    ref:
      apiVersion: v1
      kind: Service
      name: event-display

In the file above, we define our trigger name as wsevent-trigger and the namespace. In spec > filter, I am specifying that the broker should send all events to the subscriber. The subscriber in this case is a Kubernetes Service rather than a Knative Service.

kubectl apply -f 030-trigger.yaml

Now we have the source, the broker and the trigger in place.

A trigger can exist before the service and vice versa. Let’s set up our event display.

Stream CloudEvents to Event Display service

To build the Event Display service, I used the kncloudevents package, which is built on top of the Go SDK for CloudEvents (more on this below).

Originally I deployed my event-display application as a Knative Service and this was fine but I could only access the events through the logs or by using curl.

Ideally, I wanted a stream of events pushed all the way to the UI. However, I discovered that it wasn't possible to deploy this way for this use case, because Knative Serving does not allow multiple ports in a service deployment.

I asked the question about it in the Knative Slack channel and the response was mainly to use mux and specify a path (I saw something similar in the sockeye GitHub project).

In the end, I chose to deploy it as a plain Kubernetes Service instead. This seemed the most appropriate way, both in terms of functionality and security; I was a little unsure about the feasibility of using mux in production, as you may not want to expose an internal port externally.

For the kncloudevents project, I struggled to find detailed information or examples, but the code is built on top of the Go SDK for CloudEvents, and there are some detailed docs for the Python version.

We can use it to listen for CloudEvents over HTTP; by default it listens on port 8080. The StartReceiver function essentially tells our code to start listening. Because this occupies one port, we need a second port, served via ListenAndServe, for the dashboard.
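
Here is a rough sketch of that pattern. The kncloudevents import path and accepted handler signatures varied across releases at the time, so treat this as indicative rather than exact:

package main

import (
	"context"
	"log"
	"net/http"

	cloudevents "github.com/cloudevents/sdk-go"
	"knative.dev/eventing-contrib/pkg/kncloudevents"
)

// gotEvent handles each CloudEvent delivered by the broker; in the
// demo, this is where the event gets pushed on to the UI.
func gotEvent(ctx context.Context, event cloudevents.Event) {
	log.Printf("received event: %v", event)
}

func main() {
	// Serve the dashboard on a second port (9080), since StartReceiver
	// occupies the default CloudEvents port (8080) on its own.
	go func() {
		http.Handle("/", http.FileServer(http.Dir("./static")))
		log.Fatal(http.ListenAndServe(":9080", nil))
	}()

	// kncloudevents listens for CloudEvents over HTTP on 8080 by default.
	c, err := kncloudevents.NewDefaultClient()
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}
	log.Fatal(c.StartReceiver(context.Background(), gotEvent))
}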

So here are the two yaml files that we deploy for the event-display.

App Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-display
  namespace: knative-eventing-websocket-source
spec:
  replicas: 1
  selector:
    matchLabels: &labels
      app: event-display
  template:
    metadata:
      labels: *labels
    spec:
      containers:
        - name: event-display
          image: docker.io/josiemundi/bitcoinfrontendnew

Service Deployment:

apiVersion: v1
kind: Service
metadata:
  name: event-display
  namespace: knative-eventing-websocket-source
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
    name: consumer
  - port: 9080
    protocol: TCP
    targetPort: 9080
    nodePort: 31234
    name: dashboard
  selector:
    app: event-display

With everything deployed, the full pipeline is now in place.

Now if you head to the nodeport specified in the yaml:

http://localhost:31234

Next time, we will look at how to send a reply event back into the Knative eventing space.

knative, kubernetes

Knative Eventing Example: Part 1

In my last post, I shared some methods for getting up and running with Knative Eventing. In this post, we are going to step back a little to understand how the eventing component works. It gives a high-level overview of a demo you can follow along with on GitHub.

This will be the first of a series of posts, which walk through an example, that we will build out in complexity over the coming weeks.

First, let’s go over a few reasons why we might look to use eventing.

What capabilities does Knative Eventing bring?

  • Ability to decouple producers and consumers (this means, for example, that consumers can be subscribed to an event type before any of those event types have been produced).
  • Events are published as CloudEvents (this is a topic I would like to cover separately in more detail).
  • Push-based messaging

There are a number of key components, described below, which together make up the initial example. Channels and subscriptions will not be included in this post; we’ll discuss those another time.

What are we building?

Let’s first take a look at how these components fit and interact together; this is the type of demo scenario we are looking to recreate over the next few posts.

Each of these components is deployed using yaml files, except for the broker, which is automatically created once Knative injection is enabled within the namespace. You can deploy a custom broker if you wish, but I won’t include that in this post.

In this simple example, we use a Kubernetes app deployment as the source and a Knative Service as the consumer, which will subscribe to the events.

The code that I use to stream the events to the broker is available here on GitHub and gives more detailed instructions if you want to build it yourself. It also contains the yaml files used in this tutorial.

Source

Our source is the producer of the events. It could be an application, a websocket, a process, etc. It produces events that other services may or may not be interested in subscribing to.

There are a number of different types of sources, each one is a custom resource. The range of sources available can be seen in the Knative documentation here. You can create your own event source if you need to.

The following yaml shows a simple Kubernetes app deployment which is our source:
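
(Reproduced from the demo repo; it is the same wseventsource deployment used in the later parts of this series.)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wseventsource
  namespace: knative-eventing-websocket-source
spec:
  replicas: 1
  selector:
    matchLabels: &labels
      app: wseventsource
  template:
    metadata:
      labels: *labels
    spec:
      containers:
        - name: wseventsource
          image: docker.io/josiemundi/wssourcecloudevents:latest
          env:
          - name: SINK
            value: "http://default-broker.knative-eventing-websocket-source.svc.cluster.local"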

In the above example, the source is a Go application which streams messages via a websocket connection. It sends them as CloudEvents, which our service will consume. It is available to view here.

Broker and Trigger are CRDs that manage the delivery of events and abstract these details away from the related services.

Broker

The broker is where events get received. It is like a holding area, from where they can be consumed by those interested. As mentioned above, a default broker is automatically created when you label your namespace with kubectl label namespace my-event-namespace knative-eventing-injection=enabled

Trigger

Our trigger provides a filter, by which it determines which events should be delivered to a given consumer.

Here below is an example trigger, which just defines that the event-display service subscribes to all events from the default broker.
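
(This is the same wsevent-trigger used in Part 2; with both filter fields left empty, every event matches.)

apiVersion: eventing.knative.dev/v1alpha1
kind: Trigger
metadata:
  name: wsevent-trigger
  namespace: knative-eventing-websocket-source
spec:
  broker: default
  filter:
    sourceAndType:
      type: ""
      source: ""
  subscriber:
    ref:
      apiVersion: v1
      kind: Service
      name: event-display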

Under spec, I can also add filter: > attributes: and then include some CloudEvent attributes to filter by. We can then filter on CloudEvent fields, such as type or source, in order to determine which events a service subscribes to. Here is another example, which filters on a specific event type:
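
A sketch of what that might look like, assuming a Knative Eventing version that supports the attributes filter; the trigger name is invented, and the type value is borrowed from the reply event used in Part 3:

apiVersion: eventing.knative.dev/v1alpha1
kind: Trigger
metadata:
  name: transaction-classified-trigger
  namespace: knative-eventing-websocket-source
spec:
  broker: default
  filter:
    attributes:
      type: dev.knative.eventing.jsaladas.transaction.classify
  subscriber:
    ref:
      apiVersion: v1
      kind: Service
      name: event-display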

You can also filter on an expression such as:

expression: ce.type == "com.github.pull.create"

Consumer

We can have one or multiple consumers. These are the services that are interested (or not) in the events. If you have a single consumer, you can send straight from the source to the consumer.

Here is the Knative Service deployment yaml:
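
Something like the following (a sketch using the Knative event_display sample image from later in the series; the visibility label is the part explained below):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: event-display
  namespace: knative-eventing-websocket-source
  labels:
    serving.knative.dev/visibility: cluster-local
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-releases/github.com/knative/eventing-contrib/cmd/event_display@sha256:1d6ddc00ab3e43634cd16b342f9663f9739ba09bc037d5dea175dc425a1bb955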

Because I want to send events to a Knative Service, I need to add the cluster-local visibility label (this is explained in more detail below). For the rest, I am using a pre-built image from Knative for a simple event display, the code for which can be found here.

Once you have all of these initial components, events flow from the source, through the broker, to the subscribed consumer.

Issues

I had some issues with getting the events to the consumer at first when trying this initial demo out. In the end I found out that in order to sink to a Knative Service, you need to add the cluster local gateway to the Istio installation. This is somewhat vaguely mentioned in the docs around installing Istio but could probably have been a bit clearer. Luckily, I found this great post, which helped me massively!

When you install Istio, you will need to ensure that you see (at least) the istio-ingressgateway and cluster-local-gateway services running in the istio-system namespace.

Handy tips and stuff I wish I had known before…

If you want to add a sink URL in your source file, see the example below for a broker sink:

http://default-broker.knative-eventing-websocket-source.svc.cluster.local

default is the name of the broker and knative-eventing-websocket-source is the namespace; in general, the pattern is http://<broker-name>-broker.<namespace>.svc.cluster.local.

In order to verify events are being sent/received/consumed then you can use the following examples as a reference:

# Get the logs of the source app
kubectl --namespace knative-eventing-websocket-source logs -l app=wseventsource --tail=100

# Get the logs of the broker
kubectl --namespace knative-eventing-websocket-source logs -l eventing.knative.dev/broker=default --tail=100

# Get the logs of a Knative service
kubectl logs -l serving.knative.dev/service=event-display -c user-container --since=10m -n knative-eventing-websocket-source

When you get the service logs, you should see each consumed CloudEvent printed, along with its attributes and data.

Next steps

Next time we will be looking at building out our service and embellishing it a little so we can transform, visualise and send new events back to a broker.