knative, kubernetes

Step by Step: Deploy and interact with a Knative Service

In this post, I will show how to deploy a Knative service and interact with it through curl and via the browser. I’ll go over some of the useful stuff to know as I found this kind of confusing at first.

I’m running this on a Mac using the Kubernetes that’s built into Docker Desktop, so things will be a bit different if you are running another flavor of Kubernetes. You will need Istio and the Knative Serving components installed to follow along.

For the service, we are deploying a simple web app example, which by default prints out “Hi there, I love (word of your choice)”. The code is at the link above, or I have a simple test image on Docker Hub, which just prints out “Hi there, I love test” (oh the lack of creativity!)

Deploying a Knative Service

First we need to create a namespace, in which our Knative service will be deployed. For example:

kubectl create namespace web-service

Here is the Knative service deployment, which is a file called service.yaml.

kind: Service
  name: event-display
        - image:
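
For reference, a complete service.yaml generally has the shape sketched below. This is only a sketch under assumptions: the serving.knative.dev/v1 apiVersion assumes a reasonably current Knative Serving install (older installs used alpha or beta versions), and <your-image> is a placeholder for whichever image you want to run, not the image used in this post.

```shell
# Write a minimal Knative Service manifest to service.yaml.
# Assumptions: the serving.knative.dev/v1 API is available in your cluster,
# and <your-image> is a placeholder you must replace with a real image.
cat > service.yaml <<'EOF'
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: event-display
spec:
  template:
    spec:
      containers:
        - image: <your-image>
EOF
```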

Deploy the service yaml by running the following command:

kubectl apply -f service.yaml -n web-service

Now run the following in order to view the Knative service and some details we will need:

kubectl get ksvc -n web-service

There are a few fields, including:

NAME: The name of the service

URL: The URL of the service, which we will need in order to interact with it. By default the URL will be “<your-service-name>.<namespace>” followed by the cluster’s domain; you can also configure a custom domain.

READY: This should say “True”, if not it will say “False” and there will be a reason in the REASON field.

After a little while, you might notice the service will disappear as it scales down to zero. More on that in a while.


To interact with the service we just deployed, we need to understand a bit about the IngressGateway. By default, Knative uses the istio-ingressgateway as its gateway service. We need to understand this in order to expose our service outside of the local cluster.

We can look at the istio-ingressgateway using the following command:

kubectl get service istio-ingressgateway --namespace istio-system

This will return the following:

Within the gateway configuration, there are a number of ports and NodePorts specified as default including the one we will use to communicate with our service:

port:
  number: 80
  name: http2
  protocol: HTTP

To find the port for accessing the service you can run the following:

kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}'

You can customise the Gateway configuration. Details and the different ports can be found here in the Istio documentation. I’d also recommend running through the Istio httpbin example to understand a bit more about istio and ingressgateway.

To interact with our service we will need to combine both the URL of the service and the EXTERNAL-IP (localhost) which we saw for the istio-ingressgateway. Depending on your setup, these will not be the same as mine.

It will be something like the following:

curl -H "Host:"
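
As a rough sketch of how the Host header is put together, the helper below joins the service name, the namespace and a domain suffix. The function name knative_host and the example.com suffix are assumptions for illustration only; use whatever appears in the URL column of kubectl get ksvc for your own service.

```shell
# Build the Host header value for a Knative service.
# Assumption: the domain suffix (example.com here) is hypothetical;
# check the URL column of `kubectl get ksvc` for the real value.
knative_host() {
  name="$1"; namespace="$2"; domain="$3"
  printf '%s.%s.%s\n' "$name" "$namespace" "$domain"
}

# Example (not executed here): send a request through the local ingress gateway.
# curl -H "Host: $(knative_host event-display web-service example.com)" http://localhost
```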

Scaling our Service

Your initial pod has probably disappeared right now because when a service is idle, it will scale down to zero after around 90 seconds. You should see the pod start ‘Terminating’ and then disappear.

Knative uses the KPA (Knative Pod Autoscaler), which runs as a Kubernetes deployment. The KPA scales based on requests (concurrency), however it is also possible to use the HPA (Horizontal Pod Autoscaler), which allows scaling based on CPU.

You can find out more detailed information about autoscaling here but for now just note that you can change the parameters in the ConfigMap.

To see the autoscaler config you can run the following command:

kubectl describe configmap config-autoscaler -n knative-serving

To edit the ConfigMap:

kubectl edit configmap config-autoscaler -n knative-serving 

In the result you will see some fields including:

scale-to-zero-grace-period: 30s
stable-window: 60s

The scale-to-zero-grace-period specifies how long the autoscaler waits before it scales an inactive service down to zero. The autoscaler uses a 60 second window (the stable-window) to assess activity. If there are no requests within that window, it waits another 30 seconds before scaling to zero. This is why it takes around 90 seconds to terminate an inactive service.
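
The arithmetic can be sketched as a tiny helper. The function names are made up for this example, and it assumes both values are written in whole seconds with a trailing "s", as in the ConfigMap above.

```shell
# Rough time before an idle service scales to zero:
# stable-window + scale-to-zero-grace-period (both in seconds, e.g. "60s").
seconds() { printf '%s\n' "${1%s}"; }   # strip the trailing "s"

idle_shutdown_seconds() {
  echo $(( $(seconds "$1") + $(seconds "$2") ))
}

idle_shutdown_seconds 60s 30s   # prints 90
```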

If desired, these can be amended so that your service will scale down faster or slower. There is also a field called enable-scale-to-zero, which (if you want to be able to scale to zero) must be set to “true”.
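
If you would rather not open an editor, one possible approach is a merge patch with kubectl patch. The helper name autoscaler_patch and the 10s value are assumptions for illustration; the ConfigMap name and namespace are the ones used above.

```shell
# Build a JSON merge patch for one key in the config-autoscaler ConfigMap.
# autoscaler_patch is a hypothetical helper name used only in this sketch.
autoscaler_patch() {
  printf '{"data":{"%s":"%s"}}\n' "$1" "$2"
}

# Example (not executed here): shorten the grace period to 10 seconds.
# kubectl patch configmap config-autoscaler -n knative-serving \
#   --type merge -p "$(autoscaler_patch scale-to-zero-grace-period 10s)"
```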

Test using curl

Once you curl the service again you should see the pod spin up again.

curl -H "Host:"

Should return:

Hi there, I love test!

Access Knative Service through browser

If you are using Docker Desktop on a Mac, to access the service through a browser you could add the host to the hosts file on your Mac.

sudo vi /etc/hosts

Add a line that maps 127.0.0.1 to your service’s host name (the URL from the kubectl get ksvc output), then save the file.

Alternatively, if you don’t want to (or can’t) change the hosts file, you can use the “Simple Modify Headers” browser plugin, as I did. Once it is installed, click on the icon and select ‘configure’. Input the parameters as follows and then click the start button.

Now open http://localhost/test and you should see:

azure, data science, machine learning in production

Building a Data Science Environment in Azure: Part 1

For the last few months, I have been looking into how to create a Data Science environment within Azure. There are multiple ways to approach this and it depends on the size and needs of your team. This is just one way in a space where there are many others (e.g using Databricks).

Over the next few months, I will be running a few posts about how to get this kind of environment up and running.

First off, let’s mention some reasons why you might be looking to set up a Data Science sandbox in Azure rather than on premise.

Reasons why:

  • On-prem machines too slow.
  • Inappropriate (or no) tooling on-prem (and not fast enough deployment of relevant tools to local machines).
  • Slow IT process to request increased compute.
  • On-prem machines are under utilised.
  • Different needs per user in the team. One person may be running some heavy calculations, whilst someone else just runs some small weekly reports.
  • Lack of collaboration within company (perhaps cross department or even regional).
  • No clear process for getting models into production.

Once you have clarified the why, you can start to shape the high level requirements of your environment. Key requirements of our Data Science Sandbox could be:

  • Flexibility – Enable both IT to have control but also the data scientists to have choice.
  • Freedom – Enable data scientists by giving them the freedom to work with the tools they feel most confident.
  • Collaboration – Encourage collaborating, sharing of methods and also the ability to re-use and improve models across a business.

You will want to think about who your users are, what tools they are currently using and also what they want to use going forward.

At this point, you might do a little scribble on a piece of paper to define what this might look like in principle. Here is my very simple overview of what we are going to be building over the next few posts. I’ve taken inspiration from a number of Microsoft’s own process diagrams.

Let’s take a look in more detail at the above process.

  1. We have our Data Science sandbox, which is where the model build takes place. The Lab has access to production data but may also need to make API calls or users may want to access their own personal files located in blob etc. This component is composed of a number of labs (via Dev Test Labs). These labs could be split by team/subject area etc.
  2. Once we have a model we would like to move to production, the model is version controlled, containerised and deployed via Kubernetes. This falls under the Data Ops activity.
  3. The model is served in a production environment and we take the inputs and then monitor the performance of our model. For now, I have this as ML Service but you could also use ML Flow or KubeFlow.
  4. This feeds back into the model, which can be retrained if necessary and the process starts again.

The main technology components proposed are:

  • Dev Test Labs
  • Docker
  • Kubernetes
  • ML Service

In the next post, we will start setting up our Data Science environment. We will start by looking at setting up Dev Test Labs in Azure.


How to: Docker Swarm

This tutorial will show you how to get your first Docker swarm up and running. In my example, I am using two Ubuntu machines: one will be the manager and one will be the worker.

Install Docker Community Edition on the machines.

Follow the instructions on the website in order to install docker ce for Ubuntu.

Check Docker is installed by running:

docker --version

Install Docker machine:

Docker machine will allow us to install a docker engine on our machines and manage them using docker-machine commands. You can find out more about Docker machine on their website.

curl -L`uname -s`-`uname -m` >/tmp/docker-machine &&
chmod +x /tmp/docker-machine &&
sudo cp /tmp/docker-machine /usr/local/bin/docker-machine

I have installed docker machine on both of the hosts.

On both servers, we need to change the hosts file via command line. You will need to add two lines to the end of the file. The easiest way to do this is using vim text editor, which will allow you to edit the file in command line. We need to add the ip address of each machine and let it know which is the manager and which is the worker.

If you have additional workers then you can add worker02, worker03 etc.

sudo vim /etc/hosts

<manager-ip>    manager
<worker-ip>    worker01

We now need to run the following from the manager machine:

sudo docker swarm init --advertise-addr <manager-ip>

We will get a response back, which will include a docker swarm join command that we then need to run on the worker machine. It will look something like this (example from Docker’s own swarm tutorial):

To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c \

Once you have done this your swarm will be up and running. If you run:

docker info

you will be able to see the details of your swarm, including how many nodes, managers, containers etc.
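
To check which machines have actually joined, docker node ls lists the nodes in the swarm. A minimal sketch, assuming it is run on the manager (the wrapper function name is just for illustration):

```shell
# List the nodes in the swarm with their status and role.
# Assumption: this is run on the manager; worker nodes cannot list nodes.
list_swarm_nodes() {
  docker node ls
}

# Example (not executed here):
# list_swarm_nodes
```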

Here are some useful links on Docker swarm that I used to get mine up and running:

Next time we will look at getting something up and running in docker swarm mode.

docker, kubernetes, machine learning in production

Deploying an ML model in Kubernetes

A while back I started looking into how to deploy and scale Machine Learning models. I was recommended the book Machine Learning Logistics by Ted Dunning and Ellen Friedman and started to look into their proposed method of deployment. So far, I have only got to the containerisation and orchestration, however there is still a whole lot more to do 🙂 

I thought I would offer an easy tutorial to get started if you want to try this out. I’m not going to talk about a production-ready solution, as this would need a fair bit of refinement.

There are various options for doing this (feel free to let me know what you might be implementing) and this is just one possible approach. I guess the key is to do what fits best with your workflow process. 

All of the code is on GitHub, so if you want to follow along then head there for a more detailed run through (including all code and commands to run etc). I’m not going to put it all in this post as it would be very long 🙂 

For this project I decided to run everything from the Azure DSVM (Data Science Virtual Machine). However, you can run it locally from your own machine. I ran it on a machine with the following spec:

Standard B2ms (2 vcpus, 8 GB memory) (Linux)

You will need:

  • Jupyter Notebooks (already on the DSVM)
  • Docker (already on the DSVM)
  • A Docker hub account
  • An Azure account with AKS

Building the model

I won’t go much into the model code but basically I built a simple deep learning model using Keras and the open source wine dataset. The model was created by following this awesome tutorial from DataCamp!

I followed the tutorial step by step and then saved the model. Keras has its own save function, which is recommended over using pickle. You need to save both the model and the scaler, because we will need the scaler to normalise the data later in the Flask app.

Building a Web app using Flask and Containerising it

If you are using the DSVM then under the ‘Networking’ options we need to add another option for the ‘Inbound Port Rules’. Add port 5000. This is the port where our flask app will run. 

For building the docker container, I used this easy to follow ‘Hello Whale’ tutorial by Codefresh as a reference. 

I built a simple flask app, which predicts red or white wine by using some sliders to allocate values to the attributes available in the dataset. As I mentioned, the code for the app is on GitHub. It’s not the prettiest app, feel free to beautify it 🙂 

You will also need to create a Dockerfile and a requirements.txt file (both in the GitHub repo linked above). The Dockerfile contains the commands needed to build the image and the requirements.txt file contains all of the components that your app needs in order to run. 

You will need to make a folder called flask-app and place your app file, your Dockerfile and your requirements.txt file inside it.

Navigate via the cli to the flask-app folder and then run the following command:

docker build -t flask-app:latest .

Now to run the container you need to do:

docker run -d -p 5000:5000 flask-app

If you want to stop a docker container then you can use the command:

docker stop <container_name>

Be sure to use the name of the container and not the image name, otherwise it won’t stop. Docker assigns its own weird and wonderful names unless you specify otherwise using the --name flag.
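
As a small sketch of naming the container yourself, the helpers below start and stop the container by an explicit name. The function names and the container name wine-app are hypothetical choices for this example; the image and port are the ones used above.

```shell
# Start the container with an explicit name, so it is easy to stop later.
# "wine-app" below is a hypothetical container name chosen for this example.
start_app() {
  docker run -d -p 5000:5000 --name "$1" flask-app
}

# Stop a container by the name it was started with.
stop_app() {
  docker stop "$1"
}

# Example (not executed here):
# start_app wine-app
# stop_app wine-app
```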

Upload the image to Docker hub

You will need a Docker account to do this. Log in to docker hub using the following command:

docker login --username username

You will then be prompted to enter your password. Then run the following commands to tag and push the image into the repo.

docker tag <your image id> <your docker hub username>/<repo name>

docker push <your docker hub name>/<repo name>

We now have our image available in the Docker hub repo.

Deploying on Kubernetes

For this part I used Azure’s AKS service. It simplifies the Kubernetes process and (for the spec I had) costs a few pounds a day. It has a dashboard UI that is launched in the browser, which lets you easily see your deployments and from there you can do most of the stuff you can do from the cli. 

I set a low spec cluster for Kubernetes:

Standard B2s (2 vcpus, 4 GB memory) and with only 1 node (you can scale it down from the default 3). 

The next steps deploy the app from the Docker Hub image.

Log in to your AKS cluster with the following command:

az aks get-credentials --resource-group <your resource group> --name <your aks cluster>

Pull the image and create a container:

kubectl run wine-app --image=josiemundi/flask-app:latest --port 5000

If you type:

kubectl get pods

You can see the status of your pod. Pods are the smallest deployable unit in Kubernetes and are what Kubernetes groups containers into. In this case our container is alone in its pod. It can take a couple of minutes for a pod to get up and running.

Expose the app so we get an external ip address for it:

kubectl expose deployment wine-app --type=LoadBalancer --port 80 --target-port 5000
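
One thing to watch out for: in recent kubectl versions, kubectl run creates a standalone pod rather than a deployment, so the expose deployment command may not find anything to expose. A sketch of one way around this, assuming such a kubectl version, is to create the deployment explicitly (the function name is just for illustration):

```shell
# Create a Deployment (rather than a bare pod) and expose it.
# Assumption: a kubectl version where `kubectl run` only creates a pod.
deploy_wine_app() {
  kubectl create deployment wine-app --image=josiemundi/flask-app:latest --port=5000
  kubectl expose deployment wine-app --type=LoadBalancer --port 80 --target-port 5000
}

# Example (not executed here):
# deploy_wine_app
```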

You can check the status of the external ip by using the command:

kubectl get service

This can also take a couple of minutes. Once you have an external ip you can head on over to it and see your app running! 

To delete your deployment use:

kubectl delete deployment <name of deployment>