Ops … I did it again – MLOps with Kubeflow, MLflow

gears

Machine Learning is one of the hottest areas nowadays. New algorithms and models are widely used in commercial solutions, thus the whole ML process, like any software development and deployment process, needs to be optimized.

Kubeflow is an open source platform which lets you build a complete multi-user analytical environment. It runs on Kubernetes, thus it can be installed on a public cloud, an on-premise Kubernetes cluster or your workstation.

On the other hand, MLflow is a platform which can be run as a standalone application. It doesn't require Kubernetes, thus the setup is much simpler than Kubeflow's, but it doesn't support multi-user/multi-team separation.
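For instance, a standalone tracking server with a local SQLite backend and a local artifact directory can be started with a single command (a minimal sketch; the paths and port below are just example defaults):

mlflow server -h 0.0.0.0 -p 5000 \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlruns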

In this article we will use Kubeflow and MLflow to build isolated workspaces and MLOps pipelines for analytical teams.

Currently we use the Kubeflow platform at @BankMillennium to build AI solutions and run the MLOps process, and this article is inspired by the experience gained while launching and using the platform.

Before you continue reading, please watch this short introduction:

AI Platform

The core of the platform will be set up using Kubeflow (version 1.0.1) on Kubernetes (v1.17.0). The Kubernetes cluster was set up using Rancher RKE, which simplifies the installation.

kubeflow main

Kubeflow provides a complete multi-user/multi-team analytical environment with: authentication (Dex), Jupyter notebook workspaces, pipelines, a metadata store, an artifact store and model deployment engines (KFServing, Seldon).

kubeflow notebooks

Namespace isolation

By default the user namespaces are isolated in the Kubeflow UI, but in fact they are not isolated at all.

The ServiceRoleBinding configuration is very naive and only inspects the kubeflow-userid header to grant RBAC access.

apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRoleBinding
metadata:
  annotations:
    role: admin
    user: admin@kubeflow.org
  namespace: qooba
  ownerReferences:
  - apiVersion: kubeflow.org/v1
    blockOwnerDeletion: true
    controller: true
    kind: Profile
    name: qooba
    uid: 400b5e7b-4b58-40e7-8613-7b0ef01a55ba
spec:
  roleRef:
    kind: ServiceRole
    name: ns-access-istio
  subjects:
  - properties:
      request.headers[kubeflow-userid]: admin@kubeflow.org

Thus we can simply access a notebook in another namespace from a notebook in a different namespace by setting the kubeflow-userid header:

import requests
url='http://{{notebook_name}}.{{namespace}}.svc.cluster.local'

headers={    
        'kubeflow-userid': "admin@kubeflow.org"
}

requests.get(url,headers=headers).text

To fix this we can set up appropriate Kubernetes NetworkPolicies, e.g.:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-default
  namespace: {{namespace}}
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchExpressions:
          - {key: namespace, operator: In, values: [{{namespace}}, kubeflow, istio-system, kube-system]}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-egress-all
  namespace: {{namespace}}
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-dns
  namespace: {{namespace}}
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          namespace: kube-system
    ports:
    - protocol: UDP
      port: 53
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-istio
  namespace: {{namespace}}
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          namespace: istio-system
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-kubeflow
  namespace: {{namespace}}
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          namespace: kubeflow
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-internal
  namespace: {{namespace}}
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          namespace: {{namespace}}

Isolated model registry

By default Kubeflow is equipped with a metadata store and an artifact store shared between namespaces, which makes it difficult to secure and organize spaces for teams. To fix this we will set up a separate MLflow Tracking Server and Model Registry for each team namespace.

MLflow docker image qooba/mlflow:

FROM continuumio/miniconda3
RUN apt update && apt install python3-mysqldb default-libmysqlclient-dev  -yq
RUN pip install mlflow sklearn jupyterlab watchdog[watchmedo] boto3
RUN conda install pymysql
ENV NB_PREFIX /
CMD ["sh","-c", "jupyter notebook --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]

mlflow.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mlflow-pv-claim
  namespace: qooba
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: managed-nfs-storage
  volumeMode: Filesystem
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mlflow
  namespace: qooba
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow
  namespace: qooba
  labels:
    app: mlflow
spec:
  ports:
  - name: http
    port: 5000
    targetPort: 5000
  selector:
    app: mlflow
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow
  namespace: qooba
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow
      version: v1
  template:
    metadata:
      labels:
        app: mlflow
        version: v1
    spec:
      serviceAccountName: mlflow
      containers:
      - image: qooba/mlflow
        imagePullPolicy: IfNotPresent
        name: mlflow
        command: ["mlflow","server","-h","0.0.0.0","--backend-store-uri","sqlite:///mlflow/mlflow.db","--default-artifact-root","/mlflow/mlruns"]]
        ports:
        - containerPort: 5000
        volumeMounts:
          - mountPath: /mlflow
            name: mlflow
          - mountPath: /dev/shm
            name: dshm
      volumes:
        - name: mlflow
          persistentVolumeClaim:
            claimName: mlflow-pv-claim
        - emptyDir:
            medium: Memory
          name: dshm
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: mlflow
  namespace: qooba
spec:
  hosts:
  - "*"
  gateways:
  - qooba/mlflow-gateway
  http:
  - match:
    - uri:
        prefix: /
    rewrite:
        uri: /
    route:
    - destination:
        port:
          number: 5000
        host: mlflow
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: mlflow-gateway
  namespace: qooba
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 5000
      protocol: HTTP
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: mlflow-filter
  namespace: istio-system
spec:
  filters:
  - filterConfig:
      httpService:
        authorizationRequest:
          allowedHeaders:
            patterns:
            - exact: cookie
            - exact: X-Auth-Token
        authorizationResponse:
          allowedUpstreamHeaders:
            patterns:
            - exact: kubeflow-userid
        serverUri:
          cluster: outbound|8080||authservice.istio-system.svc.cluster.local
          failureModeAllow: false
          timeout: 10s
          uri: http://authservice.istio-system.svc.cluster.local
      statusOnError:
        code: GatewayTimeout
    filterName: envoy.ext_authz
    filterType: HTTP
    insertPosition:
      index: FIRST
    listenerMatch:
      listenerProtocol: HTTP
      listenerType: GATEWAY
      portNumber: 5000
  workloadLabels:
    istio: ingressgateway
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: dex-mlflow
  namespace: auth
spec:
  gateways:
  - qooba/mlflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /dex/
    route:
    - destination:
        host: dex.auth.svc.cluster.local
        port:
          number: 5556

Additionally we have to edit the Istio ingress gateway service and expose the mlflow port to access the MLflow UI:

kubectl edit svc istio-ingressgateway -n istio-system

and add:

spec:
  ports:
  ...
  - name: mlflow
    nodePort: 31382
    port: 5000
    protocol: TCP
    targetPort: 5000

The MLflow repository can now be accessed from a web browser:
mlflow repository

Additionally we have to mount the mlflow-pv-claim PersistentVolume into the user notebook, where we will store the training artifacts:

kubectl edit Notebook -n qooba sklearn
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  labels:
    app: sklearn
  name: sklearn
  namespace: qooba
spec:
  template:
    spec:
      containers:
      - env: []
        image: qooba/mlflow
        name: sklearn
        resources:
          requests:
            cpu: "0.5"
            memory: 1.0Gi
        volumeMounts:
        - mountPath: /home/jovyan
          name: workspace-sklearn
        - mountPath: /mlflow
          name: mlflow
        - mountPath: /dev/shm
          name: dshm
      serviceAccountName: default-editor
      ttlSecondsAfterFinished: 300
      volumes:
      - name: workspace-sklearn
        persistentVolumeClaim:
          claimName: workspace-sklearn
      - name: mlflow
        persistentVolumeClaim:
          claimName: mlflow-pv-claim
      - emptyDir:
          medium: Memory
        name: dshm  

Now analysts can log models and metrics from the Jupyter notebook workspace
(code example from https://www.mlflow.org/docs/latest/tutorials-and-examples/tutorial.html):

import os
import warnings
import sys
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse
import mlflow
import mlflow.sklearn

import logging

remote_server_uri='http://mlflow:5000'
mlflow.set_tracking_uri(remote_server_uri)

mlflow.set_experiment("/my-experiment2")


logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2

warnings.filterwarnings("ignore")
np.random.seed(40)

# Read the wine-quality csv file (here loaded from a local copy)
csv_url = (
    "./winequality-red.csv"
)
try:
    data = pd.read_csv(csv_url, sep=";")
except Exception as e:
    logger.exception(
        "Unable to download training & test CSV, check your internet connection. Error: %s", e
    )

train, test = train_test_split(data)

train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]

alpha = 0.5
l1_ratio = 0.5


with mlflow.start_run():
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)

    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

    if tracking_url_type_store != "file":
        mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel2")
    else:
        mlflow.sklearn.log_model(lr, "model")

I definitely recommend using Git-versioned MLflow Projects instead of running code directly from Jupyter, because
the MLflow model registry keeps the git commit hash used for the run, which helps to reproduce the results.
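A minimal MLflow Project sketch could look like the following (the file layout, entry point and parameters are illustrative, not the exact project used here):

# MLproject file committed to the repository root
name: wine-quality
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.5}
    command: "python train.py {alpha} {l1_ratio}"

Such a project can then be run straight from a Git repository, so the commit hash is recorded with the run:

mlflow run https://github.com/{{user}}/{{repo}}.git -P alpha=0.4 -P l1_ratio=0.3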

MLOps

mlops diagram

Now I’d like to propose the process of building and deploying ML models.

Training

As described before, the model is prepared and trained by an analyst who works in the Jupyter workspace and logs metrics and the model to the MLflow tracking server and model registry.

MLflow UI

A senior analyst (currently MLflow doesn't support role assignment) checks the model metrics and decides whether to promote the model to the Staging/Production stage in the MLflow UI.

Model promotion
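The stage transition can also be done programmatically with the MLflow client API (the model name and version below are only examples):

from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://mlflow:5000")

# promote an example model version to the Production stage
client.transition_model_version_stage(
    name="ElasticnetWineModel2",
    version=2,
    stage="Production"
)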

We will create an additional application which will track the changes in the MLflow registry and initialize the deployment process.

On each MLflow registry change, the Python application will check the database, prepare and commit k8s deployments and upload the model artifacts to Minio.

Because the application commits the deployments to a git repository, we need to generate SSH keys:

ssh-keygen

and store them as a secret:

kubectl create secret generic ssh-key-secret --from-file=id_rsa=./id_rsa --from-file=id_rsa.pub=./id_rsa.pub -n qooba

Now we can deploy the application:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflowwatch
  namespace: qooba
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflowwatch
      version: v1
  template:
    metadata:
      labels:
        app: mlflowwatch
        version: v1
    spec:
      containers:
      - image: qooba/mlflow:watchdog
        imagePullPolicy: IfNotPresent
        name: mlflowwatch
        command: ["/mlflow/start-watch.sh"]
        env:
        - name: GIT_REPO_URL
          value: ...
        - name: GIT_REPO_IP
          value: ...
        - name: BUCKET_NAME
          value: qooba
        - name: AWS_ACCESS_KEY_ID
          value: minio
        - name: AWS_SECRET_ACCESS_KEY
          value: minio123
        - name: MLFLOW_S3_ENDPOINT_URL
          value: http://minio.qooba.svc.cluster.local:9000
        ports:
        - containerPort: 5000
        volumeMounts:
          - mountPath: /mlflow
            name: mlflow
          - mountPath: /dev/shm
            name: dshm
          - mountPath: /etc/ssh-key
            name: ssh-key-secret-volume
            readOnly: true
      volumes:
        - name: mlflow
          persistentVolumeClaim:
            claimName: mlflow-pv-claim
        - emptyDir:
            medium: Memory
          name: dshm
        - name: ssh-key-secret-volume
          secret:
            defaultMode: 256
            secretName: ssh-key-secret

start-watch.sh:

#!/bin/bash
watchmedo shell-command --patterns='*.db' --recursive --wait --command='/mlflow/watch.sh' /mlflow

watch.sh:

#!/bin/bash
cd /mlflow

if [ ! -d "/root/.ssh" ] 
then
  cp -r /etc/ssh-key /root/.ssh
  chmod -R 700 /root/.ssh

  ssh-keygen -R $GIT_REPO_URL
  ssh-keygen -R $GIT_REPO_IP
  ssh-keygen -R $GIT_REPO_URL,$GIT_REPO_IP
  ssh-keyscan -H $GIT_REPO_URL,$GIT_REPO_IP >> ~/.ssh/known_hosts
  ssh-keyscan -H $GIT_REPO_IP >> ~/.ssh/known_hosts
  ssh-keyscan -H $GIT_REPO_URL >> ~/.ssh/known_hosts

  git config --global user.name "mlflowwatch"
  git config --global user.email "mlflowwatch@qooba.net"
  git branch --set-upstream-to=origin/master master

fi

python3 /mlflow/watch.py

git add .
git commit -a -m "mlflow autocommit"
git push origin master

watch.py:

import os
import jinja2
import sqlite3
from collections import defaultdict
import boto3
import botocore

class Watcher:

    def __init__(self):
        self._model_deployment=ModelDeployment()
        self._model_registry=ModelRegistry()
        self._model_store=ModelStore()


    def process(self):
        model_groups = self._model_registry.models_info()
        for model_name, models_data in model_groups.items():
            print(f'{model_name}:')
            for model_data in models_data:
                print(f"- stage: {model_data['stage']}")
                print(f"  path: {model_data['path']}")
                self._model_deployment.generate_deployment(model_name, model_data)
                self._model_store.upload_model(model_data)


class ModelDeployment:

    def __init__(self):
        self._create_dir('deployments')
        self._template=self._prepare_template()

    def generate_deployment(self, model_name, model_data):
        self._create_dir(f'deployments/{model_name}')
        stage = model_data['stage'].lower()
        path = model_data['path'].replace('/mlflow/mlruns','s3://qooba/mlflow')
        self._create_dir(f'deployments/{model_name}/{stage}')
        outputText = self._template.render(model=path)
        with open(f'deployments/{model_name}/{stage}/deployment.yaml','w') as f:
            f.write(outputText)

    def _create_dir(self, directory):    
        if not os.path.exists(directory):
            os.makedirs(directory)

    def _prepare_template(self):
        templateLoader = jinja2.FileSystemLoader(searchpath="./")
        templateEnv = jinja2.Environment(loader=templateLoader)
        return templateEnv.get_template("deployment.yaml")

class ModelRegistry:

    def __init__(self):
        self._conn = sqlite3.connect('/mlflow/mlflow.db')

    def models_info(self):
        models=self._conn.execute("SELECT distinct name, version, current_stage, source FROM model_versions where current_stage in ('Staging','Production') order by version desc;").fetchall()
        res=defaultdict(list)

        for s in models:
            res[s[0].lower()].append({"tag": str(s[1]), "stage": s[2], "path": s[3]})

        return dict(res)


class ModelStore:

    def __init__(self):
        self._bucket_name=os.environ['BUCKET_NAME']
        self._s3=self._create_s3_client()
        self._create_bucket(self._bucket_name)

    def upload_model(self, model_data):  
        path = model_data['path']
        s3_path = path.replace('/mlflow/mlruns','mlflow')
        try:
            self._s3.head_object(Bucket=self._bucket_name, Key=f'{s3_path}/MLmodel')
        except botocore.errorfactory.ClientError as e:
            files = [(f'{path}/{f}',f'{s3_path}/{f}') for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]
            for file in files:
                self._s3.upload_file(file[0], self._bucket_name, file[1])

    def _create_s3_client(self):
        return boto3.client('s3',
                  aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
                  aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
                  endpoint_url=os.environ["MLFLOW_S3_ENDPOINT_URL"])

    def _create_bucket(self, bucket_name):
        try:
            self._s3.head_bucket(Bucket=bucket_name)
        except botocore.client.ClientError as e:
            self._s3.create_bucket(Bucket=bucket_name)


if __name__ == "__main__":
    Watcher().process()

The model deployments will be prepared using the template:
deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-t1
  namespace: qooba
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow-t1
      version: v1
  template:
    metadata:
      labels:
        app: mlflow-t1
        version: v1
    spec:
      containers:
      - image: qooba/mlflow:serving
        imagePullPolicy: IfNotPresent
        name: mlflow-t1
        env:
        - name: AWS_ACCESS_KEY_ID
          value: minio
        - name: AWS_SECRET_ACCESS_KEY
          value: minio123
        - name: MLFLOW_S3_ENDPOINT_URL
          value: http://minio.qooba.svc.cluster.local:9000
        - name: MODEL
          value: {{model}}
        ports:
        - containerPort: 5000
        volumeMounts:
          - mountPath: /dev/shm
            name: dshm
      volumes:
        - emptyDir:
            medium: Memory
          name: dshm

If the model is promoted to Staging/Production, the process prepares the deployment YAML and uploads the model to the S3 store.

We will use Minio as the S3 model store.

minio.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: minio
  namespace: qooba
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pv-claim
  namespace: qooba
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: managed-nfs-storage
  volumeMode: Filesystem
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: minio
    namespace: qooba
  name: minio
  namespace: qooba
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: minio
      namespace: qooba
  template:
    metadata:
      labels:
        app: minio
        namespace: qooba
    spec:
      serviceAccountName: minio
      containers:
      - args:
        - server
        - /data
        env:
        - name: MINIO_ACCESS_KEY
          value: minio
        - name: MINIO_SECRET_KEY
          value: minio123
        image: minio/minio:RELEASE.2018-02-09T22-40-05Z
        imagePullPolicy: IfNotPresent
        name: minio
        ports:
        - containerPort: 9000
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: data
          subPath: minio
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: minio-pv-claim
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: minio
    namespace: qooba
  name: minio
  namespace: qooba
spec:
  ports:
  - port: 9000
    protocol: TCP
    targetPort: 9000
  selector:
    app: minio
    namespace: qooba

ArgoCD

Now it is time to set up ArgoCD, which will sync the Git deployment changes with the Kubernetes configuration and automatically deploy newly promoted models.

argocd
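A minimal ArgoCD Application pointing at the deployments repository could look like this (the repository URL, paths and namespaces are placeholders, not the exact configuration used here):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mlflow-deployments
  namespace: argocd
spec:
  project: default
  source:
    repoURL: git@{{git_repo_url}}:mlflow/deployments.git
    targetRevision: master
    path: deployments
  destination:
    server: https://kubernetes.default.svc
    namespace: qooba
  syncPolicy:
    automated:
      prune: true
      selfHeal: true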

To deploy MLflow models we will use the docker image

qooba/mlflow:serving

FROM continuumio/miniconda3
RUN pip install mlflow==1.11.0 cloudpickle==1.6.0 scikit-learn==0.23.2 gevent boto3
ENV GUNICORN_CMD_ARGS="--timeout 60 -k gevent"
WORKDIR /opt/mlflow
ENV PORT=5000
ENV WORKER_NUMBER=4
CMD mlflow models serve -m $MODEL -h 0.0.0.0 -p $PORT -w $WORKER_NUMBER --no-conda

and configuration:
mlflow.serving.yaml:

apiVersion: v1
kind: Service
metadata:
  name: mlflow-t1
  namespace: qooba
  labels:
    app: mlflow-t1
spec:
  ports:
  - name: http
    port: 5000
    targetPort: 5000
  selector:
    app: mlflow-t1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-t1
  namespace: qooba
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow-t1
      version: v1
  template:
    metadata:
      labels:
        app: mlflow-t1
        version: v1
    spec:
      containers:
      - image: qooba/mlflow:serving
        imagePullPolicy: IfNotPresent
        name: mlflow-t1
        env:
        - name: AWS_ACCESS_KEY_ID
          value: minio
        - name: AWS_SECRET_ACCESS_KEY
          value: minio123
        - name: MLFLOW_S3_ENDPOINT_URL
          value: http://minio.qooba.svc.cluster.local:9000
        - name: MODEL
          value: s3://qooba/mlflow/1/e0167f65abf4429b8c58f56b547fe514/artifacts/model
        ports:
        - containerPort: 5000
        volumeMounts:
          - mountPath: /dev/shm
            name: dshm
      volumes:
        - emptyDir:
            medium: Memory
          name: dshm
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: mlflow-t1
  namespace: qooba
spec:
  hosts:
  - "*"
  gateways:
  - qooba/mlflow-serving-gateway
  http:
  - match:
    - uri:
        prefix: /serving/qooba/t1
    rewrite:
        uri: /
    route:
    - destination:
        port:
          number: 5000
        host: mlflow-t1
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: mlflow-serving-gateway
  namespace: qooba
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 5000
      protocol: HTTP

Each time a new model is promoted, ArgoCD applies a new deployment with the new model's S3 path:

- name: MODEL
  value: s3://qooba/mlflow/1/e0167f65abf4429b8c58f56b547fe514/artifacts/model

Inference services

Finally we can access the model externally and generate predictions. Please note that in this article the model is deployed in the same k8s namespace (in a real solution the model would be deployed on a separate k8s cluster), thus to access the model I have to send the authservice_session cookie, otherwise the request will be redirected to the Dex login page.

import json
import requests
import getpass

authservice_session = getpass.getpass()

headers={
    'Cookie': f'authservice_session={authservice_session}',
    'Content-Type': 'application/json'
}

data={
    "columns": ["fixed acidity","volatile acidity","citric acid","residual sugar","chlorides","free sulfur dioxide","total sulfur dioxide","density","pH","sulphates","alcohol"],
    "data": [[7.4,0.7,0,1.9,0.076,11,34,0.9978,3.51,0.56,9.4],
[7.8,0.88,0,2.6,0.098,25,67,0.9968,3.2,0.68,9.8]]
}

url='http://qooba-ai:31382/serving/qooba/t1/invocations'
requests.post(url, headers=headers,data=json.dumps(data)).text

# Response: [5.576883967129615, 5.50664776916154]

AI Scissors – sharp cut with neural networks

scissors

Cutting out a photo's background is one of the most tedious graphical tasks. In this article I will show how to simplify it using neural networks.

I will use the U^2-Net network, which is described in detail in the arXiv article, and the Python library rembg to create a ready-to-use drag-and-drop web application which you can run as a docker image.

The project code is available on my GitHub: https://github.com/qooba/aiscissors
You can also use the ready docker image: https://hub.docker.com/repository/docker/qooba/aiscissors

Before you continue reading, please watch this quick introduction:

Neural network

To correctly remove the image background we need to select the most visually salient objects in an image, a task covered by Salient Object Detection (SOD). To combine low memory and computation cost with results competitive with state-of-the-art methods, the novel U^2-Net architecture will be used.

U-Net convolutional networks have a characteristic U shape with a symmetric encoder-decoder structure. At each encoding stage the feature maps are downsampled (torch.nn.MaxPool2d) and then upsampled at each decoding
stage (torch.nn.functional.upsample). The downsampled features are transferred and concatenated with the upsampled features using skip (residual) connections.
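A toy PyTorch sketch of this idea (not the actual U^2-Net code) with a single downsampling/upsampling pair and a skip connection:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    # simple convolution + batch norm + ReLU block
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)))

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = ConvBlock(3, 16)
        self.pool = nn.MaxPool2d(2)      # encoder: downsampling
        self.bottom = ConvBlock(16, 16)
        self.dec = ConvBlock(32, 16)     # 32 = upsampled features + skip features

    def forward(self, x):
        e = self.enc(x)
        b = self.bottom(self.pool(e))
        u = F.interpolate(b, scale_factor=2, mode="bilinear", align_corners=False)  # decoder: upsampling
        return self.dec(torch.cat([u, e], dim=1))  # skip connection by concatenation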

The U^2-Net network uses a two-level nested U-structure, where the main architecture is a U-Net-like encoder-decoder and each stage contains a residual U-block. Each residual U-block repeats the downsampling/upsampling procedure, again connected with residual connections.

neural network architecture

The nested U-structure extracts and aggregates features at each level and makes it possible to capture local and global information from shallow and deep layers.

The U^2-Net architecture is precisely described in the arXiv article. Moreover, we can go through the PyTorch model definitions of U2NET and U2NETP.

Additionally the authors also shared the pretrained models: U2NET (176.3MB) and U2NETP (4.7 MB).

The lighter U2NETP version is only 4.7 MB thus it can be used in mobile applications.

Web application

The neural network is wrapped with the rembg library, which automatically downloads the pretrained networks and exposes a simple Python API. To simplify the usage I have decided to create a drag-and-drop web application (https://github.com/qooba/aiscissors).
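A minimal sketch of the rembg API (the file names are examples and the import path may differ slightly between rembg versions):

from rembg.bg import remove

# read the input image, remove the background and save the result
with open("input.jpg", "rb") as i:
    result = remove(i.read())

with open("output.png", "wb") as o:
    o.write(result)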

In the application you can drag and drop the image and then compare the image with and without the background side by side.

web application

You can simply run the application using docker image:

docker run --name aiscissors -d -p 8000:8000 --rm -v $(pwd)/u2net_models:/root/.u2net qooba/aiscissors 

If you have a GPU card you can use it:

docker run --gpus all  --name aiscissors -d -p 8000:8000 --rm -v $(pwd)/u2net_models:/root/.u2net qooba/aiscissors 

To use the GPU, additional NVIDIA drivers (included in the NVIDIA CUDA Toolkit) are needed.

When you run the container the pretrained models are downloaded, thus I have mounted the local directory u2net_models to /root/.u2net to avoid downloading them each time I run the container.

References

https://arxiv.org/pdf/2005.09007.pdf

https://github.com/NathanUA/U-2-Net

https://github.com/danielgatis/rembg

Qin, Xuebin; Zhang, Zichen; Huang, Chenyang; Dehghan, Masood; Zaiane, Osmar; Jagersand, Martin: "U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection", Pattern Recognition 106, 107404 (2020)

“Hey Google” with Rasa – complete CI/CD solution for multilingual chatbots

old phone

In this article I will show how to build a complete CI/CD solution for building, training and deploying multilingual chatbots.
I will use the Rasa core framework, GitLab pipelines, Minio and Redis to build a simple two-language Google Assistant action.

The project code is available on my GitHub: https://github.com/qooba/heygoogle-with-rasa

Before you continue reading, please watch this quick introduction:

Architecture

architecture diagram

The solution contains several components, so I will describe each of them.

Google actions

To build a Google Assistant action we need to create and configure the Google Actions project.
google actions create

We will build our own NLU engine, thus we will start with a blank project.
Then we need to install the gactions CLI to manage the project from the command line.
To access your projects you need to authenticate using the command:

gactions login

If you want, you can create the project using templates:

gactions init

To simplify the tutorial I have included the configuration in the repository. You will need to set your project id in settings.yaml and the webhook configuration using your ngrok address.
The configuration can be deployed using the command:

gactions push

Ngrok

As mentioned before, for development purposes I have used ngrok to proxy the traffic from a public endpoint (used as the webhook destination) to localhost:8081:

ngrok http 8081

NGINX with LuaJIT

Currently in a Google Actions project it is not possible to set different webhook addresses for different languages, thus I have used NGINX with LuaJIT to route the traffic to the proper language container.
The information about the language context is included in the request body, which can be handled using a Lua script:

server {
        listen 80;
        resolver 127.0.0.11 ipv6=off;
        location / {
            set $target '';
            access_by_lua '
                local cjson = require("cjson")
                ngx.req.read_body()
                local text = ngx.var.request_body
                local value = cjson.new().decode(text)
                local lang = string.sub(value["user"]["locale"],1,2)
                ngx.var.target = "http://heygoogle-" .. lang
            ';
            proxy_pass $target;
        }
    }

Rasa application

Rasa core is one of the best known frameworks for building chatbots. I have decided to create a separate docker container for each language, which gives flexibility in terms of scalability and deployment.
The Dockerfile (development version with watchdog) for the Rasa application (qooba/rasa:1.10.10_app):

FROM rasa/rasa:1.10.10
USER root
RUN pip3 install python-jose watchdog[watchmedo]
ENTRYPOINT watchmedo auto-restart -d . -p '*.py' --recursive -- python3 app.py

Using the default Rasa engine you have to restart the container whenever you want to deploy a newly retrained model, thus I have decided to wrap it with a simple Python application which additionally listens on a Redis PubSub topic and waits for an event that reloads the model without restarting the whole application. Additionally, there are separate topics for different languages, thus we can deploy and reload the model for a specific language.
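A rough sketch of such a wrapper (the topic name, model path and error handling are simplified; the actual app.py lives in the repository):

import redis
from rasa.core.agent import Agent

MODEL_PATH = "models/model.tar.gz"   # example path, the real app fetches the package from Minio
TOPIC = "heygoogle-en"               # one topic per language

agent = Agent.load(MODEL_PATH)

def listen_for_model_updates():
    # in the real application this runs in a background thread
    client = redis.Redis(host="redis", port=6379, db=0)
    pubsub = client.pubsub()
    pubsub.subscribe(TOPIC)
    for message in pubsub.listen():
        if message["type"] == "message":
            # the pipeline worker published a new model name - reload without restarting
            global agent
            agent = Agent.load(MODEL_PATH)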

Redis

In this solution Redis has two responsibilities:
* EventBus – as mentioned above, the chatbot app listens for events sent from the GitLab pipeline worker.
* Session Store – it keeps the conversation state, thus we can simply scale the chatbots (see the configuration sketch below).
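For the session store part, Rasa can be pointed at Redis in endpoints.yml, roughly like this (a sketch; the host, port and db values are examples):

tracker_store:
  type: redis
  url: redis
  port: 6379
  db: 1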

We can simply run Redis using command:

docker run --name redis -d --rm --network gitlab redis

Minio

Minio is used as the Rasa Model Store (Rasa supports the S3 protocol). After model training the GitLab pipeline worker uploads the model package to Minio. Each language has a separate bucket:

model store

To run Minio we will use the command below (for the whole solution setup use run.sh, where the environment variables are set):

docker run -d --rm -p 9000:9000 --network gitlab --name minio \
  -e "MINIO_ACCESS_KEY=$MINIO_ACCESS_KEY" \
  -e "MINIO_SECRET_KEY=$MINIO_SECRET_KEY" \
  -v $(pwd)/minio/data:/data \
  minio/minio server /data

Gitlab pipelines

In this solution I have used GitLab as the git repository and CI/CD engine.
You can simply run GitLab locally using the gitlab docker image:

docker run -d --rm -p 80:80 -p 8022:22 -p 443:443 --name gitlab --network gitlab \
  --hostname gitlab \
  -v $(pwd)/gitlab/config:/etc/gitlab:Z \
  -v $(pwd)/gitlab/logs:/var/log/gitlab:Z \
  -v $(pwd)/gitlab/data:/var/opt/gitlab:Z \
  gitlab/gitlab-ce:latest

Notice that I have used the gitlab hostname (without this the pipelines do not work correctly on localhost), thus you will need to add an appropriate entry to /etc/hosts:

127.0.1.1   gitlab

Now you can create a new project (in my case I called it heygoogle).
Most likely port 22 is already in use, thus for SSH I used 8022.
You can clone the project using the command (remember to set up SSH keys):

git clone ssh://git@localhost:8022/root/heygoogle.git

Before you can use the GitLab runner you have to configure at least one worker.
First you get the registration token (Settings -> CI/CD -> Runners):

gitlab runners

and run once:

docker run --rm --network gitlab -v /srv/gitlab-runner/config:/etc/gitlab-runner gitlab/gitlab-runner register \
  --non-interactive \
  --docker-network-mode gitlab \
  --executor "docker" \
  --docker-image ubuntu:latest \
  --url "http://gitlab/" \
  --registration-token "TWJABbyzkVWVAbJc9bSx" \
  --description "docker-runner" \
  --tag-list "docker,aws" \
  --run-untagged="true" \
  --locked="false" \
  --access-level="not_protecte

Now you can run the gitlab-runner container:

docker run -d --rm --name gitlab-runner --network gitlab \
     -v /srv/gitlab-runner/config:/etc/gitlab-runner \
     -v /var/run/docker.sock:/var/run/docker.sock \
     gitlab/gitlab-runner:latest

To create a pipeline you simply commit the .gitlab-ci.yml file into your repository.
In our case it contains two steps (one for each language):

variables:
  MINIO_ACCESS_KEY: AKIAIOSFODNN7EXAMPLE
  MINIO_SECRET_KEY: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  PROJECT_NAME: heygoogle

stages:
  - process_en
  - process_pl

step-1:
  image: qooba/rasa:1.10.10
  stage: process_en
  script:
    - ./pipeline.sh en
  interruptible: true

step-2:
  image: qooba/rasa:1.10.10
  stage: process_pl
  script:
    - ./pipeline.sh pl
  interruptible: true

The GitLab pipeline steps use the qooba/rasa:1.10.10 docker image:

FROM rasa/rasa:1.10.10
USER root
RUN apt update && apt install git -yq
ENTRYPOINT /bin/bash

thus they have a complete Rasa environment.

The pipeline.sh script:

#!/bin/bash

lang=$1
echo "Processing $lang"

if (($(git diff-tree --no-commit-id --name-only -r $CI_COMMIT_SHA | grep ^$lang/ | wc -l) > 0)); then
   echo "Training $lang"
   cd $lang
   rasa train
   rasa test
   cd ..
   python3 pipeline.py --language $lang
else
   echo "Nothing changed for $lang"
fi

The script checks whether something has changed in the chosen language directory, trains and tests the model,
and finally uploads the trained model to Minio and publishes an event to Redis using
pipeline.py:

import os
import boto3
import redis
from botocore.client import Config

def upload_model(project_name: str, language: str, model: str):
    s3=boto3.resource("s3",endpoint_url="http://minio:9000",
        aws_access_key_id=os.environ["MINIO_ACCESS_KEY"],
        aws_secret_access_key=os.environ["MINIO_SECRET_KEY"],
        config=Config(signature_version="s3v4"),region_name="us-east-1")
    bucket_name=f'{project_name}-{language}'
    print(f"BUCKET NAME: {bucket_name}") 
    bucket_exists=s3.Bucket(bucket_name) in s3.buckets.all() or s3.create_bucket(Bucket=bucket_name)
    s3.Bucket(bucket_name).upload_file(f"/builds/root/{project_name}/{language}/models/{model}",model)


def publish_event(project_name: str, language: str, model: str):
    topic_name=f'{project_name}-{language}'
    print(f"TOPIC NAME: {topic_name}") 
    client=redis.Redis(host="redis", port=6379, db=0);
    client.publish(topic_name, model)

if __name__ == '__main__':
    import argparse

    project_name=os.environ["PROJECT_NAME"]

    parser = argparse.ArgumentParser(description='Upload the trained model and publish an event')
    parser.add_argument('--language', metavar='path', required=True,
                        help='the model language')

    args = parser.parse_args()
    model=os.listdir(f"/builds/root/{project_name}/{args.language}/models/")[0]
    print("Uploading model")
    upload_model(project_name=project_name, language=args.language, model=model)

    print("Publishing event")
    publish_event(project_name=project_name, language=args.language, model=model)

Now after each change in the repository GitLab starts the pipeline run:
gitlab pipeline

gitlab step

Summary

We have built a complete solution for creating, training, testing and deploying chatbots.
Additionally, the solution supports multilingual chatbots while keeping scalability and deployment flexibility.
Moreover, trained models can be continuously deployed without chatbot downtime (for Kubernetes environments
a canary deployment could be another option).
Finally, we have integrated the solution with Google Actions and created a simple chatbot.

DeepMicroscopy – my portable ML laboratory

DIV

Today I’m very happy to finally release my open source project DeepMicroscopy.
In this project I have created a platform where you can capture images from a microscope, annotate them, train a TensorFlow model and finally observe real-time object detection.
The project is configured on a Jetson Nano device, thus it can work as a compact and portable solution.

The project code is available on my GitHub: https://github.com/qooba/deepmicroscopy

Before you continue reading, please watch this quick introduction:

1. Architecture

The solution requires three devices:
* Microscope with usb camera – e.g. Velleman CAMCOLMS3 2Mpx
* Inference server – Jetson Nano
* Training server – PC equipped with GPU card e.g. NVIDIA GTX 1050 Ti

The whole solution was built using docker images, so I will now describe the components installed on each device.

Jetson

The Jetson device contains three components:
* Frontend – Vue application running on Nginx
* Backend – Python application which is the core of the solution
* Storage – Minio storage where projects, images and annotations are stored

Training Server

The training server contains two components:
* Frontend – Vue application running on Nginx
* Backend – Python application which handles the training logic

2. Platform functionalities

Most of the platform's functionality is installed on the Jetson Nano. Because the Jetson Nano's compute capabilities are insufficient for model training purposes, I have decided to split this part into three stages, which I will describe in the training paragraph.

Projects management

In the Deep Microscopy you can create multiple projects where you annotate and recognize different objects.

You can create and switch projects in the top left menu. Each project's data is kept in a separate bucket in the Minio storage.

Images Capture

When you open the Capture panel in the web application and click the Play ▶ button, a WebRTC connection between the browser and the backend is created (I have used the aiortc Python library). To make it work in the Chrome browser we need two things:
* use TLS for the web application – a self-signed certificate is already configured in nginx
* allow the camera to be used by the application – you have to set it in the browser

Now we can stream the image from the camera to the browser (I have used the OpenCV library to fetch the image from the microscope over USB).
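A rough sketch of how such a camera track can be exposed with aiortc and OpenCV (the class name and device index are illustrative, not the project's exact code):

import cv2
from aiortc import VideoStreamTrack
from av import VideoFrame

class MicroscopeTrack(VideoStreamTrack):
    # wraps an OpenCV USB capture as a WebRTC video track
    def __init__(self, device_index=0):
        super().__init__()
        self.capture = cv2.VideoCapture(device_index)

    async def recv(self):
        pts, time_base = await self.next_timestamp()
        _, image = self.capture.read()
        frame = VideoFrame.from_ndarray(image, format="bgr24")
        frame.pts = pts
        frame.time_base = time_base
        return frame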

When we decide to capture a specific frame and click the Plus ✚ button, the backend saves the current frame into the project bucket of the Minio storage.

Annotation

The annotation engine is based on the VIA Image Annotator. Here you can see all images you have captured for a specific project. There are a lot of features, e.g. switching between images (left/right arrow), zooming in/out (+/-) and of course annotation tools with different shapes (currently the training algorithm expects rectangles) and attributes (by default the class attribute is added, which is also expected by the training algorithm).

This is a rather painstaking and manual task, so when you finish remember to save the annotations by clicking the save button (currently there is no auto-save). When you save the project, the project file (with the VIA schema) is saved in the project bucket.

Training

When we finish image annotation we can start model training. As mentioned before it is split into three stages.

Data package

At the beginning we have to prepare data package (which contains captured images and our annotations) by clicking the DATA button.

Training server

Then we drag and drop the data package to the application placed on machine with higher compute capabilities.

After the upload, the training server automatically extracts the data package, splits it into train/test data and starts training.
Currently I use the MobileNet V2 model architecture and start from a pretrained TensorFlow model.

When the training is finished, the model is exported using TensorRT, which optimizes model inference performance, especially on NVIDIA devices like the Jetson Nano.
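The conversion can be done with the TensorFlow TensorRT integration, roughly like this (a sketch assuming TensorFlow 2.x and an exported SavedModel; the directory names are examples):

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# convert the exported SavedModel into a TensorRT-optimized SavedModel
converter = trt.TrtGraphConverterV2(input_saved_model_dir="exported_model")
converter.convert()
converter.save("exported_model_trt")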

During and after training you can inspect all models using the built-in TensorBoard.

The web application periodically checks the training state, and when the training is finished we can download the model.

Uploading model

Finally we upload the TensorRT model back to the Jetson Nano device. The model is saved into the selected project bucket, thus you can use multiple models for each project.

Object detection

On the Execute panel we can choose a model from the drop-down list (which lists the models uploaded for the selected project) and load it by clicking RUN (typically it takes some time to load the model). When we click the Play ▶ button, the application shows real-time object detection. If we want to change the model, we can click CLEAR and then choose and RUN another model.

Additionally we can fetch detection statistics, which are sent over a WebSocket. Currently the number of detected items and the average width, height and score are returned.

3. Setup

To start working with the Jetson Nano we have to install Jetson Nano Developer Kit.

The whole platform works with Docker and all Dockerfiles are included in the GitHub repository.

Because the Jetson Nano has the aarch64/arm64 architecture, we need separate images for the Jetson components.

Jetson dockers:
* front – frontend web app
* app – backend web app
* minio – minio storage for aarch64 / arm64 architecture

Training Server dockers:
* serverfront – frontend app
* server – backend app

If you want, you can build the images yourself or you can use the prebuilt images from DockerHub.

The simplest option is to run run.app.sh on the Jetson Nano and run.server.sh on the Training Server, which will set up the whole platform.

Thanks for reading 🙂

Online IDE ? – do it yourself

DIV

Jupyter Notebook is one of the most useful tools for data exploration, machine learning and fast prototyping. There are many plugins and projects which make it even more powerful:
* jupyterlab-git
* nbdev
* jupyter debugger

But sometimes you simply need IDE …

One of my favorite text editors is Vim. It is lightweight, fast and, with appropriate plugins, it can be used as an IDE.
Using the Dockerfile below you can build a Jupyter environment with a fully equipped Vim:

FROM continuumio/miniconda3
RUN apt update && apt install curl git cmake ack g++ python3-dev vim-youcompleteme tmux -yq
RUN sh -c "$(curl -fsSL https://raw.githubusercontent.com/qooba/vim-python-ide/master/setup.sh)"
RUN conda install xeus-python jupyterlab jupyterlab-git -c conda-forge
RUN jupyter labextension install @jupyterlab/debugger @jupyterlab/git
RUN pip install nbdev
RUN echo "alias ls='ls --color=auto'" >> /root/.bashrc
CMD bin/bash

Now you can run the image:

docker run --name jupyter -d --rm -p 8888:8888 -v $(pwd)/jupyter:/root/.jupyter -v $(pwd)/notebooks:/opt/notebooks qooba/miniconda3 /bin/bash -c "jupyter lab --notebook-dir=/opt/notebooks --ip='0.0.0.0' --port=8888 --no-browser --allow-root --NotebookApp.password='' --NotebookApp.token=''"

In JupyterLab start a terminal session, run bash (it works better in bash) and then vim.
The online IDE is ready:

DIV

References

[1] Top image Boskampi from Pixabay

FastAI with TensorRT on Jetson Nano

DIV

IoT and AI are the hottest topics nowadays, and they can meet on the Jetson Nano device.
In this article I'd like to show how to use the FastAI library, which is built on top of PyTorch, on the Jetson Nano. Additionally I will show how to optimize the FastAI model for use with TensorRT.

You can find the code at https://github.com/qooba/fastai-tensorrt-jetson.git.

1. Training

Although the Jetson Nano is equipped with a GPU, it should be used as an inference device rather than for training purposes. Thus I will use another PC with a GTX 1050 Ti for the training.

Docker gives flexibility when you want to try different libraries, thus I will use an image which contains the complete environment.

Training environment Dockerfile:

FROM nvcr.io/nvidia/tensorrt:20.01-py3
WORKDIR /
RUN apt-get update && apt-get -yq install python3-pil
RUN pip3 install jupyterlab torch torchvision
RUN pip3 install fastai
RUN DEBIAN_FRONTEND=noninteractive && apt update && apt install curl git cmake ack g++ tmux -yq
RUN pip3 install ipywidgets && jupyter nbextension enable --py widgetsnbextension
CMD ["sh","-c", "jupyter lab --notebook-dir=/opt/notebooks --ip='0.0.0.0' --port=8888 --no-browser --allow-root --NotebookApp.password='' --NotebookApp.token=''"]

To use the GPU, additional NVIDIA drivers (included in the NVIDIA CUDA Toolkit) are needed.

If you don’t want to build the image yourself, simply run:

docker run --gpus all  --name jupyter -d --rm -p 8888:8888 -v $(pwd)/docker/gpu/notebooks:/opt/notebooks qooba/fastai:1.0.60-gpu

Now you can use the pets.ipynb notebook (the code is taken from lesson 1 of the FastAI course) to train and export the pets classification model.

from fastai.vision import *
from fastai.metrics import error_rate

# download dataset
path = untar_data(URLs.PETS)
path_anno = path/'annotations'
path_img = path/'images'
fnames = get_image_files(path_img)

# prepare data 
np.random.seed(2)
pat = r'/([^/]+)_\d+.jpg$'
bs = 16
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)

# prepare model learner
learn = cnn_learner(data, models.resnet34, metrics=error_rate)

# train 
learn.fit_one_cycle(4)

# export
learn.export('/opt/notebooks/export.pkl')

Finally you get pickled pets model (export.pkl).

2. Inference (Jetson Nano)

The Jetson Nano device with the Jetson Nano Developer Kit already comes with Docker, thus I will use it to set up the inference environment.

I have used the base image nvcr.io/nvidia/l4t-base:r32.2.1 and installed PyTorch and torchvision.
If you have the JetPack 4.4 Developer Preview you can skip these steps and start with the base image nvcr.io/nvidia/l4t-pytorch:r32.4.2-pth1.5-py3.

The FastAI installation on the Jetson is more problematic because of the blis package. Finally I found the solution here.

Additionally I have installed the torch2trt package, which converts PyTorch models to TensorRT.

Finally I have used the tensorrt package from the JetPack, which can be found in
/usr/lib/python3.6/dist-packages/tensorrt.

The final Dockerfile is:

FROM nvcr.io/nvidia/l4t-base:r32.2.1
WORKDIR /
# install pytorch 
RUN apt update && apt install -y --fix-missing make g++ python3-pip libopenblas-base
RUN wget https://nvidia.box.com/shared/static/ncgzus5o23uck9i5oth2n8n06k340l6k.whl -O torch-1.4.0-cp36-cp36m-linux_aarch64.whl
RUN pip3 install Cython
RUN pip3 install numpy torch-1.4.0-cp36-cp36m-linux_aarch64.whl
# install torchvision
RUN apt update && apt install libjpeg-dev zlib1g-dev git libopenmpi-dev openmpi-bin -yq
RUN git clone --branch v0.5.0 https://github.com/pytorch/vision torchvision
RUN cd torchvision && python3 setup.py install
# install fastai
RUN pip3 install jupyterlab
ENV TZ=Europe/Warsaw
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone && apt update && apt -yq install npm nodejs python3-pil python3-opencv
RUN apt update && apt -yq install python3-matplotlib
RUN git clone https://github.com/NVIDIA-AI-IOT/torch2trt.git /torch2trt && mv /torch2trt/torch2trt /usr/local/lib/python3.6/dist-packages && rm -r /torch2trt
COPY tensorrt /usr/lib/python3.6/dist-packages/tensorrt
RUN pip3 install --no-deps fastai
RUN git clone https://github.com/fastai/fastai /fastai
RUN apt update && apt install libblas3 liblapack3 liblapack-dev libblas-dev gfortran -yq
RUN curl -LO https://github.com/explosion/cython-blis/files/3566013/blis-0.4.0-cp36-cp36m-linux_aarch64.whl.zip && unzip blis-0.4.0-cp36-cp36m-linux_aarch64.whl.zip && rm blis-0.4.0-cp36-cp36m-linux_aarch64.whl.zip
COPY blis-0.4.0-cp36-cp36m-linux_aarch64.whl .
RUN pip3 install scipy pandas blis-0.4.0-cp36-cp36m-linux_aarch64.whl spacy fastai scikit-learn
CMD ["sh","-c", "jupyter lab --notebook-dir=/opt/notebooks --ip='0.0.0.0' --port=8888 --no-browser --allow-root --NotebookApp.password='' --NotebookApp.token=''"]

As before, you can skip the docker image build and use the ready image:

docker run --runtime nvidia --network app_default --name jupyter -d --rm -p 8888:8888 -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix -v $(pwd)/docker/jetson/notebooks:/opt/notebooks qooba/fastai:1.0.60-jetson

Now we can open a Jupyter notebook on the Jetson and copy the pickled model file export.pkl from the PC.
The notebook jetson_pets.ipynb shows how to load the model.

import torch
from torch2trt import torch2trt
from fastai.vision import *
from fastai.metrics import error_rate

learn = load_learner('/opt/notebooks/')
learn.model.eval()
model=learn.model

if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

Additionally we can optimize the model using torch2trt package:

x = torch.ones((1, 3, 224, 224)).cuda()
model_trt = torch2trt(learn.model, [x])

Let’s prepare example input data:

import urllib
url, filename = ("https://github.com/pytorch/hub/raw/master/dog.jpg", "dog.jpg")
try: urllib.URLopener().retrieve(url, filename)
except: urllib.request.urlretrieve(url, filename)

from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)

Finally we can run prediction for PyTorch and TensorRT model:

x=input_batch
y = model(x)
y_trt = model_trt(x)

and compare PyTorch and TensorRT performance:

def prediction_time(model, x):
    import time
    times = []
    for i in range(20):
        start_time = time.time()
        y_trt = model(x)

        delta = (time.time() - start_time)
        times.append(delta)
    mean_delta = np.array(times).mean()
    fps = 1/mean_delta
    print('average(sec):{},fps:{}'.format(mean_delta,fps))

prediction_time(model,x)
prediction_time(model_trt,x)

where for:
* PyTorch – average(sec):0.0446, fps:22.401
* TensorRT – average(sec):0.0094, fps:106.780

The TensorRT model is almost 5 times faster, thus it is worth using torch2trt.

References

[1] Top image DrZoltan from Pixabay

Live and let her speak – congratulations for the Milla chatbot

CHATBOT

I am pleased to hear that the first Polish banking chatbot with which you can make a transfer was awarded in a competition organized by Gazeta Bankowa. You can talk with Milla in the Bank Millennium mobile application. Currently, Milla can speak (text to speech), listen (automatic speech recognition) and understand what you write to her (intent detection with slot filling).

This is not a sponsored post 🙂 but I’ve been developing Milla for the past few months and I’m really happy that I had the opportunity to do this.

Have a nice talk with Milla.

Quantum teleportation do it yourself with Q#

DIV

Quantum computing is nowadays one of the hottest topics in the computer science world.
Recently IBM unveiled the IBM Q System One: a 20-qubit quantum computer which is touted as “the world’s first fully integrated universal quantum computing system designed for scientific and commercial use”.

In this article I’d like to show the quantum teleportation phenomenon. I will use the Q# language designed by Microsoft to simplify creating quantum algorithms.

In this example I have used the quantum simulator, which I have wrapped with a REST API and put into a docker image.

Quantum teleportation allows moving a quantum state from one location to another. Shared quantum entanglement between two particles in the sending and receiving locations is used to do this without having to move physical particles along with it.

1. Theory

Let’s assume that we want to send a message, a specific quantum state described using Dirac notation:

|\psi\rangle=\alpha|0\rangle+\beta|1\rangle

Additionally we have two entangled qubits, the first in Laboratory 1 and the second in Laboratory 2:

|\phi^+\rangle=\frac{1}{\sqrt{2}}(|00\rangle+|11\rangle)

thus we start with the input state:

|\psi\rangle|\phi^+\rangle=(\alpha|0\rangle+\beta|1\rangle)\frac{1}{\sqrt{2}}(|00\rangle+|11\rangle)

|\psi\rangle|\phi^+\rangle=\frac{\alpha}{\sqrt{2}}|000\rangle + \frac{\alpha}{\sqrt{2}}|011\rangle + \frac{\beta}{\sqrt{2}}|100\rangle + \frac{\beta}{\sqrt{2}}|111\rangle

To send the message we start with two operations: applying a CNOT gate and then a Hadamard gate.

The CNOT gate flips the second qubit only if the first qubit is 1.
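In the computational basis (|00\rangle, |01\rangle, |10\rangle, |11\rangle) it can be written as the matrix:

\mathrm{CNOT}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix}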

Applying the CNOT gate (with the message qubit as control and the Laboratory 1 qubit as target) to the input state results in:

\frac{\alpha}{\sqrt{2}}|000\rangle + \frac{\alpha}{\sqrt{2}}|011\rangle + \frac{\beta}{\sqrt{2}}|110\rangle + \frac{\beta}{\sqrt{2}}|101\rangle

The Hadamard gate changes the basis states as follows:

|0\rangle \rightarrow \frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)

and

|1\rangle \rightarrow \frac{1}{\sqrt{2}}(|0\rangle-|1\rangle)
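Equivalently, the Hadamard gate can be written as the matrix:

H=\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}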

Applying the Hadamard gate to the message qubit results in:

\frac{\alpha}{\sqrt{2}}(\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle))|00\rangle + \frac{\alpha}{\sqrt{2}}(\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle))|11\rangle + \frac{\beta}{\sqrt{2}}(\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle))|10\rangle + \frac{\beta}{\sqrt{2}}(\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle))|01\rangle

and:

\frac{1}{2}(\alpha|000\rangle+\alpha|100\rangle+\alpha|011\rangle+\alpha|111\rangle+\beta|010\rangle-\beta|110\rangle+\beta|001\rangle-\beta|101\rangle)

which we can write as:

\frac{1}{2}(|00\rangle(\alpha|0\rangle+\beta|1\rangle)+|01\rangle(\alpha|1\rangle+\beta|0\rangle)+|10\rangle(\alpha|0\rangle-\beta|1\rangle)+|11\rangle(\alpha|1\rangle-\beta|0\rangle))

Then we measure the first two qubits (the message qubit and the Laboratory 1 qubit), where we can get four results:

  • |00\rangle which simplifies equation to: |00\rangle(\alpha|0\rangle+\beta|1\rangle) and indicates that the qubit in the Laboratory 2 is \alpha|0\rangle+\beta|1\rangle

  • |01\rangle which simplifies equation to: |01\rangle(\alpha|1\rangle+\beta|0\rangle) and indicates that the qubit in the Laboratory 2 is \alpha|1\rangle+\beta|0\rangle

  • |10\rangle which simplifies equation to: |10\rangle(\alpha|0\rangle-\beta|1\rangle) and indicates that the qubit in the Laboratory 2 is \alpha|0\rangle-\beta|1\rangle

  • |11\rangle which simplifies equation to: |11\rangle(\alpha|1\rangle-\beta|0\rangle) and indicates that the qubit in the Laboratory 2 is \alpha|1\rangle-\beta|0\rangle

Now we have to send the result in a classical way from Laboratory 1 to Laboratory 2.

Finally we know which transformation we need to apply to the qubit in Laboratory 2
to make its state equal to the message qubit:

|\psi\rangle=\alpha|0\rangle+\beta|1\rangle

If the Laboratory 2 qubit is in the state:

  • \alpha|0\rangle+\beta|1\rangle we don’t need to do anything.

  • \alpha|1\rangle+\beta|0\rangle we need to apply NOT gate.

  • \alpha|0\rangle-\beta|1\rangle we need to apply Z gate.

  • \alpha|1\rangle-\beta|0\rangle we need to apply NOT gate followed by Z gate

These operations transform the Laboratory 2 qubit state into the initial message qubit state; thus we have moved the quantum state from Laboratory 1 to Laboratory 2 without moving the particle itself.

2. Code

Now it’s time to show quantum teleportation using the Q# language. I have used the Microsoft Quantum Development Kit to run the Q# code inside a .NET Core application. Additionally I have added an nginx proxy with an Angular GUI which helps to show the results.
Everything was put inside docker to simplify the setup.

Before you start you will need git, docker and docker-compose installed on your machine (https://docs.docker.com/get-started/).

To run the project we have to clone the repository and run it using docker compose:

git clone https://github.com/qooba/quantum-teleportation-qsharp.git
cd quantum-teleportation-qsharp
docker-compose -f app/docker-compose.yml up

Now we can open http://localhost:8020/ in the browser:
Q#

Then we can type the message in Laboratory 1 and click the Teleport button, which triggers the teleportation process and sends the message to Laboratory 2.

The text is converted into an array of bits and each bit is sent to Laboratory 2 using quantum teleportation.
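As a rough illustration of the classical side, the text ↔ bits conversion could look like the minimal Python sketch below (an assumption for clarity, not the exact code from the repository, which performs this conversion in its own GUI/backend layer):

# Hypothetical helper functions for the classical text <-> bits conversion.
def text_to_bits(text: str) -> list:
    """Convert a text message into a flat list of bits (MSB first for each byte)."""
    return [(byte >> i) & 1 for byte in text.encode("utf-8") for i in range(7, -1, -1)]

def bits_to_text(bits: list) -> str:
    """Reassemble the bits received in Laboratory 2 back into text."""
    data = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8")

print(bits_to_text(text_to_bits("hello")))  # -> hello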

In the first step we encode the incoming message bit using the X gate.

if (message) {
    X(msg);
}

Then we prepare the entanglement between the qubits in the Laboratory 1 and Laboratory 2.

H(here);
CNOT(here, there);

In the second step we apply the CNOT and Hadamard gates to send the message:

CNOT(msg, here);
H(msg);

Finally we measure the message qubit and the Laboratory 1 qubit:

if (M(msg) == One) {
    Z(there);
}

if (M(here) == One) {
    X(there);
}

If the message qubit is measured as |1\rangle then we need to apply the Z gate to the Laboratory 2 qubit.
If the Laboratory 1 qubit is measured as |1\rangle then we need to apply the X gate to the Laboratory 2 qubit. This information must be sent over a classical channel to Laboratory 2.

Now the Laboratory 2 qubit state is equal to the initial message qubit state and we can check it:

if (M(there) == One) {
    set measurement = true;
}

This kind of communication is secure because even if someone intercepts the information sent over the classical channel, it is still impossible to decode the message without the entangled qubit in Laboratory 2.

Boosting Elasticsearch with machine learning – Elasticsearch, RankLib, Docker

Telescope

Elasticsearch is a powerful search engine. Its distributed architecture gives the ability to build scalable full-text search solutions. Additionally it provides a comprehensive query language.

Despite this, sometimes the engine and its default search results are not enough to meet users’ expectations. In such situations it is possible to boost search quality using machine learning algorithms.

In this article I will show how to do this using the RankLib library and the LambdaMART algorithm. Moreover I have created a ready-to-use platform which:

  1. Indexes the data
  2. Helps to label the search results in a user-friendly way
  3. Trains the model
  4. Deploys the model to Elasticsearch
  5. Helps to test the model

The whole project is set up on docker using docker-compose, thus you can set it up very easily.
The platform is based on the Elasticsearch Learning to Rank plugin. I have also used the python example described in this project.

Before you start you will need docker and docker-compose installed on your machine (https://docs.docker.com/get-started/).

To run the project you have to clone it:

git clone https://github.com/qooba/elasticsearch-learning-to-rank.git

Then, to make Elasticsearch work, you need to create a data folder with the appropriate access rights:

cd elasticsearch-learning-to-rank/
mkdir docker/elasticsearch/esdata1
chmod g+rwx docker/elasticsearch/esdata1
chgrp 1000 docker/elasticsearch/esdata1

Finally you can run the project:

docker-compose -f app/docker-compose.yml up

Now you can open http://localhost:8020/.

1. Architecture

There are three main components:

A. The nginx reverse proxy with the Angular app
B. The flask python app which orchestrates the whole ML solution
C. The Elasticsearch instance with the Learning to Rank plugin installed

A. Nginx

I have used the nginx reverse proxy to expose the flask API and the Angular GUI which helps with going through the whole process.

nginx.config

server {
    listen 80;
    server_name localhost;
    root /www/data;

    location / {
        autoindex on;
    }

    location /images/ {
        autoindex on;
    }

    location /js/ {
        autoindex on;
    }

    location /css/ {
        autoindex on;
    }

    location /training/ {
        proxy_set_header   Host                 $host;
        proxy_set_header   X-Real-IP            $remote_addr;
        proxy_set_header   X-Forwarded-For      $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto    $scheme;
        proxy_set_header Host $http_host;

        proxy_pass http://training-app:5090;
    }
}

B. Flask python app

This is the core of the project. It exposes an API for:

  • Indexing
  • Labeling
  • Training
  • Testing

It calls Elasticsearch directly to get the data and apply the modifications.
Because training with RankLib requires Java, the Dockerfile for this part contains a default-jre installation. Additionally it downloads RankLib-2.8.jar and tmdb.json (which is used as the default data source) from http://es-learn-to-rank.labs.o19s.com/.

Dockerfile

FROM python:3

RUN \
    apt update && \
    apt-get -yq install default-jre
RUN mkdir -p /opt/services/flaskapp/src
COPY . /opt/services/flaskapp/src
WORKDIR /opt/services/flaskapp/src
RUN pip install -r requirements.txt
RUN python /opt/services/flaskapp/src/prepare.py
EXPOSE 5090
CMD ["python", "-u", "app.py"]

C. Elasticsearch

As mentioned before, it is an Elasticsearch instance with the Learning to Rank plugin installed.

Dockerfile

FROM docker.elastic.co/elasticsearch/elasticsearch:6.2.4
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install \
    -b http://es-learn-to-rank.labs.o19s.com/ltr-1.1.0-es6.2.4.zip

All layers are composed with docker-compose.yml:

version: '2.2'
services:
  elasticsearch:
    build: ../docker/elasticsearch
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - ../docker/elasticsearch/esdata1:/usr/share/elasticsearch/data
    networks:
      - esnet

  training-app:
    build: ../docker/training-app
    networks:
      - esnet
    depends_on:
      - elasticsearch
    environment:
      - ES_HOST=http://elasticsearch:9200
      - ES_INDEX=tmdb
      - ES_TYPE=movie
    volumes:
      - ../docker/training-app:/opt/services/flaskapp/src

  nginx:
    image: "nginx:1.13.5"
    ports:
      - "8020:80"
    volumes:
      - ../docker/frontend-reverse-proxy/conf:/etc/nginx/conf.d
      - ../docker/frontend-reverse-proxy/www/data:/www/data
    depends_on:
      - elasticsearch
      - training-app
    networks:
      - esnet


volumes:
  esdata1:
    driver: local

networks:
  esnet:

2. Platform

The platform helps to run and understand the whole process through four steps:

A. Indexing the data
B. Labeling the search results
C. Training the model
D. Testing trained model

A. Indexing

The first step is obvious thus I will summarize it only briefly. As mentioned before, the default data source is the tmdb.json file but it can be simply changed using the ES_DATA environment variable in docker-compose.yml:

training-app:
    environment:
      - ES_HOST=http://elasticsearch:9200
      - ES_DATA=/opt/services/flaskapp/tmdb.json
      - ES_INDEX=tmdb
      - ES_TYPE=movie
      - ES_FEATURE_SET_NAME=movie_features
      - ES_MODEL_NAME=test_6
      - ES_MODEL_TYPE=6
      - ES_METRIC_TYPE=ERR@10

After clicking Prepare Index, the data is taken from the ES_DATA file and indexed in Elasticsearch.

prepare index
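Under the hood this step boils down to a bulk index request. A minimal Python sketch of it is shown below (an illustration only, assuming tmdb.json maps movie ids to documents; the real app reads its settings from the ES_* environment variables and would batch the requests):

import json
import requests

ES_HOST = "http://elasticsearch:9200"

# Load the source documents (assumption: a dict of movie id -> movie document).
with open("tmdb.json") as f:
    movies = json.load(f)

# Build the newline-delimited payload expected by the Elasticsearch _bulk API.
lines = []
for movie_id, movie in movies.items():
    lines.append(json.dumps({"index": {"_index": "tmdb", "_type": "movie", "_id": movie_id}}))
    lines.append(json.dumps(movie))

requests.post(
    f"{ES_HOST}/_bulk",
    data="\n".join(lines) + "\n",
    headers={"Content-Type": "application/x-ndjson"},
)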

Additionally you can define:
ES_HOST – the Elasticsearch URL
ES_USER/ES_PASSWORD – Elasticsearch credentials; by default authentication is turned off
ES_INDEX/ES_TYPE – index/type name for the data from the ES_DATA file
ES_FEATURE_SET_NAME – the name of the container for the defined features (described later)
ES_MODEL_NAME – the name of the trained model kept in Elasticsearch (described later)
ES_MODEL_TYPE – the algorithm used to train the model (described later)
ES_METRIC_TYPE – the optimization metric (described later)

We can train and keep multiple models in Elasticsearch, which can be used for A/B testing.

B. Labeling

Supervised learning algorithms like learning to rank need labeled data, thus in this step I will focus on this area.
First of all I have to prepare the label_list.json file which contains the list of queries to label, e.g.:

[
    "rambo",
    "terminator",
    "babe",
    "die hard",
    "goonies"
]

When the file is ready I can go to the second tab (Step 2 Label).

label

For each query item the platform prepares the result candidates, which have to be graded from 0 to 4.
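The candidates are simply the top hits returned by Elasticsearch for the labeled query; a rough sketch of how they could be fetched (the query shape and field names are assumptions, not the exact platform code):

import requests

ES_HOST = "http://elasticsearch:9200"

def fetch_candidates(keywords, size=15):
    # Ask Elasticsearch for the top candidates to be graded by the user.
    body = {
        "size": size,
        "query": {"multi_match": {"query": keywords, "fields": ["title", "overview"]}},
    }
    hits = requests.get(f"{ES_HOST}/tmdb/_search", json=body).json()["hits"]["hits"]
    return [(hit["_id"], hit["_source"]["title"]) for hit in hits]

print(fetch_candidates("rambo"))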

You have to go through the whole list and at the last step
the labeled movies are saved to a file:

# grade (0-4)   queryid docId   title
# 
# Add your keyword strings below, the feature script will 
# Use them to populate your query templates 
# 
# qid:1: rambo
# qid:2: terminator
# qid:3: babe
# qid:4: die hard
# 
# https://sourceforge.net/p/lemur/wiki/RankLib%20File%20Format/
# 
# 
4 qid:1 # 7555 Rambo
4 qid:1 # 1370 Rambo III
4 qid:1 # 1368 First Blood
4 qid:1 # 1369 Rambo: First Blood Part II
0 qid:1 # 31362 In the Line of Duty: The F.B.I. Murders
0 qid:1 # 13258 Son of Rambow
0 qid:1 # 61410 Spud
4 qid:2 # 218 The Terminator
4 qid:2 # 534 Terminator Salvation
4 qid:2 # 87101 Terminator Genisys
4 qid:2 # 61904 Lady Terminator
...

Each labeling cycle is saved to a separate file: timestamp_judgments.txt

C. Training

Now it is time to use the labeled data to make Elasticsearch much smarter. To do this we have to define the candidate features.
The feature list is defined in the files 1.json–4.json in the training-app directory.
Each feature file is an Elasticsearch query, e.g. the {{keywords}} parameter
(which is the searched text) matching the title property:

{
    "query": {
        "match": {
            "title": "{{keywords}}"
        }
    }
}

In this example I have used 4 features:
– title matches the keyword
– overview matches the keyword
– keyword is a prefix of the title
– keyword is a prefix of the overview

I can add more features without code modification; the list of features is defined and read using the naming pattern (1.json – n.json).

Now I can go to the Step 3 Train tab and simply click the train button.

train

In the first stage the training app takes all feature files and builds the feature set, which is saved in Elasticsearch (the ES_FEATURE_SET_NAME environment variable defines the name of this set).
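A minimal sketch of how this registration could look (the endpoint paths follow the Learning to Rank plugin documentation; the keywords parameter name and the file loop are assumptions based on the feature files above):

import json
import requests

ES_HOST = "http://elasticsearch:9200"
FEATURE_SET_NAME = "movie_features"  # ES_FEATURE_SET_NAME

# Collect the feature templates from 1.json ... 4.json.
features = []
for i in range(1, 5):
    with open(f"{i}.json") as f:
        features.append({
            "name": str(i),
            "params": ["keywords"],
            "template": json.load(f),
        })

# Initialize the LTR feature store (returns an error if it already exists) and create the feature set.
requests.put(f"{ES_HOST}/_ltr")
requests.post(
    f"{ES_HOST}/_ltr/_featureset/{FEATURE_SET_NAME}",
    json={"featureset": {"name": FEATURE_SET_NAME, "features": features}},
)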

In the next step the latest labeling file (ordered by timestamp) is processed: for each labeled item the feature values are loaded, e.g.

4 qid:1 # 7555 Rambo

The app takes the document with id=7555 and gets the Elasticsearch score for each defined feature.
The Rambo example is translated into:

4   qid:1   1:12.318446 2:10.573845 3:1.0   4:1.0 # 7555    rambo

This means that the score of feature one is 12.318446 (and respectively 10.573845, 1.0, 1.0 for features 2, 3, 4).
This format is readable by the RankLib library and the training can be performed.
The full list of parameters is available at https://sourceforge.net/p/lemur/wiki/RankLib/.

The ranker type is chosen using ES_MODEL_TYPE parameter:
– 0: MART (gradient boosted regression tree)
– 1: RankNet
– 2: RankBoost
– 3: AdaRank
– 4: Coordinate Ascent
– 6: LambdaMART
– 7: ListNet
– 8: Random Forests

The default value is LambdaMART.

Additionally, by setting ES_METRIC_TYPE, we can choose the optimization metric.
Possible values:
– MAP
– NDCG@k
– DCG@k
– P@k
– RR@k
– ERR@k

The default value is ERR@10.

train
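Under the hood the training app most likely shells out to RankLib; a hedged sketch of that call is shown below (the flag names come from the RankLib documentation, the file names are assumptions):

import subprocess

# Train a LambdaMART model (ranker 6) optimizing ERR@10 on the judgments
# enriched with feature values, and save the model definition to a file.
subprocess.check_call([
    "java", "-jar", "RankLib-2.8.jar",
    "-train", "latest_judgments_with_features.txt",
    "-ranker", "6",          # ES_MODEL_TYPE (6 = LambdaMART)
    "-metric2t", "ERR@10",   # ES_METRIC_TYPE
    "-save", "model.txt",
])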

Finally we obtain the trained model, which is deployed to Elasticsearch.
The project can deploy multiple trained models and the deployed model name is defined by ES_MODEL_NAME.
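Deployment means uploading the RankLib model definition to the plugin; a minimal sketch (the endpoint follows the Learning to Rank plugin documentation, the model file name is an assumption):

import requests

ES_HOST = "http://elasticsearch:9200"
FEATURE_SET_NAME = "movie_features"  # ES_FEATURE_SET_NAME
MODEL_NAME = "test_6"                # ES_MODEL_NAME

# Read the model produced by RankLib and register it under the feature set.
with open("model.txt") as f:
    definition = f.read()

requests.post(
    f"{ES_HOST}/_ltr/_featureset/{FEATURE_SET_NAME}/_createmodel",
    json={"model": {"name": MODEL_NAME, "model": {"type": "model/ranklib", "definition": definition}}},
)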

D. Testing

In the last step we can test the trained and deployed model.

test

We can choose the model using the ES_MODEL_NAME parameter.

It is used in the search query and can be different in each request, which is useful when we need to perform A/B testing.
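For reference, a test query could rescore the baseline results with the trained model roughly like this (a sketch based on the Learning to Rank plugin documentation; the window size and fields are assumptions):

import requests

ES_HOST = "http://elasticsearch:9200"

query = {
    "query": {"multi_match": {"query": "rambo", "fields": ["title", "overview"]}},
    "rescore": {
        "window_size": 100,
        "query": {
            "rescore_query": {
                "sltr": {
                    "params": {"keywords": "rambo"},
                    "model": "test_6",  # ES_MODEL_NAME, switch it per request for A/B tests
                }
            }
        },
    },
}

print(requests.get(f"{ES_HOST}/tmdb/_search", json=query).json())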

Happy searching 🙂