Distributed Feature Store with Feast and Dask


In this article I will show how to combine Feast and the Dask library to create a distributed feature store.

Before you continue reading, please watch the short introduction:

The feature store is a very important component of the MLOps process; it helps manage historical and online features. With Feast we can, for example, read historical features from Parquet files and then materialize them to Redis as an online store.

But what if the historical data size exceeds our machine's capabilities? The Dask library can help solve this problem. Using Dask we can distribute the data and calculations across multiple machines. Dask can run on a single machine or on a cluster (k8s, YARN, cloud, HPC, SSH, manual setup), so we can start with a single machine and then smoothly move to a cluster if needed. Moreover, thanks to Dask we can read a bunch of Parquet files using a path pattern and run distributed training with libraries like scikit-learn or XGBoost.
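Conceptually, Dask splits a large dataset into partitions, processes each partition independently (possibly on different workers) and then combines the partial results. The sketch below illustrates that map/reduce idea with the standard library only. It is not the Dask API. The helper names (`partition`, `partial_stats`, `distributed_mean`) and the partition size are hypothetical, and threads stand in for Dask workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partition(values: Sequence[float], size: int) -> List[Sequence[float]]:
    """Split the data into fixed-size partitions (Dask calls them chunks/partitions)."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_stats(chunk: Sequence[float]) -> Tuple[float, int]:
    """Map step: each worker computes a partial sum and count for its own partition."""
    return sum(chunk), len(chunk)


def distributed_mean(values: Sequence[float], size: int, workers: int = 4) -> float:
    """Reduce step: combine the per-partition results into a global mean."""
    chunks = partition(values, size)
    # Threads stand in for Dask workers; Dask would ship these tasks to real workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in results)
    count = sum(c for _, c in results)
    return total / count


data = [float(i) for i in range(1_000)]
print(distributed_mean(data, size=128))  # 499.5
```

Only the small (sum, count) pairs travel back to the caller, not the data itself. This is why the approach scales beyond one machine's memory.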

Feast with Dask

I have prepared a ready-to-use docker image, thus you can simply reproduce all the steps.

docker run --name feast -d --rm -p 8888:8888 -p 8787:8787 qooba/feast:dask

Then check the Jupyter notebook token, which you will need to log in:

docker logs -f feast

Then open the notebook (use the token to log in):


The notebook is also available on https://github.com/qooba/feast-dask/blob/main/docker/feast-dask.ipynb.

But with the docker image you will have the whole environment ready.

In the notebook you can find all the steps:

Random data generation

I have used numpy and scikit-learn to generate 1M entities and their historical data (10 features generated with the make_hastie_10_2 function) for 14 days, which I save as a Parquet file (1.34GB).
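The make_hastie_10_2 rule is simple: ten standard normal features, and a label of 1 when the sum of the squared features exceeds 9.34, otherwise -1. The stdlib sketch below regenerates a small dataset of that shape without numpy or scikit-learn. Several details are assumptions about the layout, not the author's exact script: the integer entity ids, the one-row-per-day timestamps, and the column names f0..f9 and y (taken from the training code later in this article).

```python
import random
from datetime import datetime, timedelta


def hastie_row(rng: random.Random) -> dict:
    """One make_hastie_10_2-style sample: 10 N(0, 1) features and a +/-1 label."""
    features = [rng.gauss(0.0, 1.0) for _ in range(10)]
    label = 1.0 if sum(x * x for x in features) > 9.34 else -1.0
    row = {f"f{i}": x for i, x in enumerate(features)}
    row["y"] = label
    return row


def generate_history(n_entities: int, days: int, start: datetime, seed: int = 0) -> list:
    """One row per entity per day, each stamped with an event timestamp (hypothetical layout)."""
    rng = random.Random(seed)
    rows = []
    for day in range(days):
        ts = start + timedelta(days=day)
        for entity_id in range(n_entities):
            row = hastie_row(rng)
            row["entity_id"] = entity_id
            row["event_timestamp"] = ts
            rows.append(row)
    return rows


rows = generate_history(n_entities=100, days=14, start=datetime(2021, 1, 1))
print(len(rows))  # 1400
```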

Feast configuration and registry

feature_store.yaml – where I use a local registry and an SQLite database as the online store.

features.py – with one file source (the generated Parquet file) and the feature definitions.

To create the Feast registry we have to run:

feast apply

Additionally, I have created a simple library which helps to inspect the Feast schema directly in the Jupyter notebook:

pip install feast-schema
from feast_schema import FeastSchema


Feast schema

Dask cluster setup

Then I set up a simple Dask cluster with a scheduler and 4 workers.

dask-scheduler --host --port 8786 --bokeh-port 8787 &

dask-worker --host --worker-port 8701 &
dask-worker --host --worker-port 8702 &
dask-worker --host --worker-port 8703 &
dask-worker --host --worker-port 8704 &

The Dask dashboard is exposed on port 8787, thus you can follow the Dask metrics on:


Dask dashboard

Fetching historical features

In the next step I have fetched the historical features using Feast with Dask:

from feast import FeatureStore

store = FeatureStore(repo_path='.')
training_df = store.get_historical_features(

This takes about 14 seconds and is much faster than Feast without Dask.

Without Dask:
CPU times: user 2min 51s, sys: 6.64 s, total: 2min 57s
Wall time: 2min 52s

With Dask:
CPU times: user 458 ms, sys: 65.3 ms, total: 524 ms
Wall time: 14.7 s
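The measured wall times translate into a speedup that is easy to compute. 2 min 52 s is 172 s without Dask, versus 14.7 s with Dask, which gives roughly 11.7x on this single-machine, 4-worker setup:

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert an 'Xmin Ys' wall time into seconds."""
    return minutes * 60 + seconds


without_dask = to_seconds(2, 52)  # Wall time: 2min 52s
with_dask = to_seconds(0, 14.7)   # Wall time: 14.7 s
speedup = without_dask / with_dask
print(f"{speedup:.1f}x")  # 11.7x
```

The low CPU time in the Dask run (524 ms) only counts the notebook process. The actual work is done by the worker processes, so the two CPU-time lines are not directly comparable.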

Distributed training with Sklearn

After fetching the data we can start the training. We can use the fetched Pandas dataframe, but we can also fetch a Dask dataframe instead:

from feast import FeatureStore
training_dd = store.get_historical_features(

Using a Dask dataframe we can continue distributed training on the distributed data.
On the other hand, if we use a Pandas dataframe, the data will be collected on a single node.
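Whether the result is a Pandas or a Dask dataframe, get_historical_features performs a point-in-time join. For every entity row it picks the latest feature values whose timestamp is not later than the entity row's event timestamp, so no future data leaks into training. Below is a simplified, single-process sketch of that rule using only the standard library. It ignores Feast details such as TTL and created timestamps, and the function and field names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows, key="entity_id", ts="event_timestamp"):
    """Attach to each entity row the latest feature row with feature_ts <= entity_ts."""
    # Group the feature rows per entity and sort each group by timestamp.
    history = defaultdict(list)
    for row in feature_rows:
        history[row[key]].append(row)
    for group in history.values():
        group.sort(key=lambda r: r[ts])

    joined = []
    for entity in entity_rows:
        group = history.get(entity[key], [])
        timestamps = [r[ts] for r in group]
        # Index of the last feature row that is not later than the entity timestamp.
        idx = bisect_right(timestamps, entity[ts]) - 1
        values = {k: v for k, v in group[idx].items() if k not in (key, ts)} if idx >= 0 else {}
        joined.append({**entity, **values})
    return joined


features = [
    {"entity_id": 1, "event_timestamp": datetime(2021, 1, 1), "f0": 0.1},
    {"entity_id": 1, "event_timestamp": datetime(2021, 1, 3), "f0": 0.3},
]
entities = [
    {"entity_id": 1, "event_timestamp": datetime(2021, 1, 2)},
    {"entity_id": 1, "event_timestamp": datetime(2021, 1, 4)},
]
print([r.get("f0") for r in point_in_time_join(entities, features)])  # [0.1, 0.3]
```

An entity row that is older than every feature row gets no feature values. Feast would return nulls for it.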

To start distributed training with scikit-learn we can use the Joblib library with the Dask backend:

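import joblib
from dask.distributed import Client
from sklearn.ensemble import GradientBoostingClassifier
from dask_ml.model_selection import train_test_split

# The 'dask' joblib backend needs a connected client; the scheduler address is an
# assumption (the scheduler above listens on port 8786)
client = Client('localhost:8786')

predictors = training_dd[["f0","f1","f2","f3","f4","f5","f6","f7","f8","f9"]]
targets = training_dd["y"]  # a 1-D target avoids scikit-learn's column-vector warning

X_train, X_test, y_train, y_test = train_test_split(predictors, targets, test_size=.3)

with joblib.parallel_backend('dask'):
    # scikit-learn converts the Dask collections to NumPy arrays here. Only joblib-based
    # parallelism is sent to the workers, and GradientBoostingClassifier has no n_jobs
    # parameter, so estimators such as RandomForestClassifier(n_jobs=-1) benefit more.
    clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1,
                                     random_state=0, verbose=1).fit(X_train, y_train)

    score = clf.score(X_test, y_test)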


Online features materialization

Finally, I have materialized the data to the local SQLite database:

feast materialize 2021-01-01T01:00:00 2021-01-31T23:59:00

In this case the data for materialization is also prepared using Dask.
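Materialization copies, for each entity, the most recent feature values within the requested time window from the offline store into the online store. Below is a minimal stdlib sketch of that logic. A plain dict stands in for the SQLite online store, the helper name and row layout are hypothetical, and Feast's real implementation handles much more (batching, TTL, multiple feature views).

```python
from datetime import datetime


def materialize(feature_rows, start, end, key="entity_id", ts="event_timestamp"):
    """Return {entity: latest feature values} for rows with start <= ts <= end."""
    online_store = {}
    latest_ts = {}
    for row in feature_rows:
        if not (start <= row[ts] <= end):
            continue  # outside the materialization window
        entity = row[key]
        if entity not in latest_ts or row[ts] > latest_ts[entity]:
            latest_ts[entity] = row[ts]
            online_store[entity] = {k: v for k, v in row.items() if k not in (key, ts)}
    return online_store


rows = [
    {"entity_id": 1, "event_timestamp": datetime(2021, 1, 2), "f0": 0.2},
    {"entity_id": 1, "event_timestamp": datetime(2021, 1, 5), "f0": 0.5},
    {"entity_id": 2, "event_timestamp": datetime(2021, 2, 1), "f0": 9.9},  # after the window
]
# Same window as the `feast materialize 2021-01-01T01:00:00 2021-01-31T23:59:00` call
print(materialize(rows, datetime(2021, 1, 1, 1), datetime(2021, 1, 31, 23, 59)))
# {1: {'f0': 0.5}}
```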

How to extract music sources: bass, drums, vocals and other? – music separation with AI


In this article I will show how we can extract music sources: bass, drums, vocals and other accompaniments using neural networks.

Before you continue reading, please watch the short introduction:

Separating individual instruments from arranged music is another area where machine learning
algorithms can help. Demucs solves this problem using neural networks.

The trained model (https://arxiv.org/pdf/1909.01174v1.pdf) uses a U-Net architecture which contains two parts: an encoder and a decoder.
At the encoder input we feed the original track, and after processing we get bass, drums, vocals and other accompaniments at the decoder output.
The encoder is connected to the decoder through an additional LSTM layer,
as well as through skip connections between corresponding encoder and decoder layers.

neural network architecture

OK, we have the neural network architecture, but what about the training data?
This is another difficulty, which can be handled by the unlabeled data remixing pipeline.
We start with another classifier, which can find the parts of music that do not contain a specific instrument, for example drums.
Then we mix them with a well-known drums signal and separate the tracks using the model.

Now we can compare the separation results with the known drums track and with the mixture of the other instruments.

Based on this comparison we can calculate the loss (L1 loss) and use it during training.

Additionally, we set different loss weights for the known track and for the other instruments.
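The remixing loss described above can be sketched in a few lines. The signals are plain lists of samples, the mixture is simply their sum, and the weights (`w_known`, `w_other`) are hypothetical values. This is an illustration of the idea, not Demucs's training code, which works on tensors and differs in details.

```python
def l1(a, b):
    """Mean absolute error between two equally long signals."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def remix(known, other):
    """Mix a known instrument track (e.g. drums) into a track that lacks it."""
    return [k + o for k, o in zip(known, other)]


def remix_loss(pred_known, pred_other, known, other, w_known=1.0, w_other=0.5):
    """Weighted L1 loss with separate weights for the known track and for the rest."""
    return w_known * l1(pred_known, known) + w_other * l1(pred_other, other)


drums = [0.5, -0.5, 0.25, 0.0]
no_drums = [0.1, 0.2, -0.1, 0.0]
mixture = remix(drums, no_drums)  # this is what would be fed to the model

# Hypothetical model output: slightly wrong drums, perfect remaining mix.
pred_drums = [0.5, -0.5, 0.25, 0.25]
print(remix_loss(pred_drums, no_drums, drums, no_drums))  # 0.0625
```

Giving the known track a higher weight reflects that its ground truth is exact, while the "other" part is only known as a mixture.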

training data

The whole UI is packed into a docker image, thus you can simply try it:

#for CPU
docker run --name aiaudioseparation -it -p 8000:8000 -v $(pwd)/checkpoints:/root/.cache/torch/hub/checkpoints --rm qooba/aimusicseparation

#for GPU
docker run --name aiaudioseparation --gpus all -it -p 8000:8000 -v $(pwd)/checkpoints:/root/.cache/torch/hub/checkpoints --rm qooba/aimusicseparation

web UI

TinyML with Arduino


In this article I will show how to build a TensorFlow Lite based jelly bear classifier using the Arduino Nano 33 BLE Sense.

Before you continue reading, please watch the short introduction:

Currently, machine learning solutions can be deployed not only on very powerful machines with GPU cards but also on really small devices. Of course, such devices have some limitations, e.g. memory. To deploy an ML model on them we need to prepare it. The TensorFlow framework allows you to convert neural networks to TensorFlow Lite, which can be installed on edge devices, e.g. the Arduino Nano.

The Arduino Nano 33 BLE Sense is equipped with many sensors that allow the implementation of many projects, e.g.:
* Digital microphone
* Digital proximity, ambient light, RGB and gesture sensor
* 3D magnetometer, 3D accelerometer, 3D gyroscope
* Capacitive digital sensor for relative humidity and temperature

Examples which I have used in this project can be found here.

Arduino sensors

To simplify device usage I have built the Arduino Lab project, where you can test and investigate the listed sensors directly in the web browser.

The project dependencies are packed into a docker image to simplify usage.

Before you start the project, you need to connect the Arduino through USB (the Arduino communicates with the docker container through /dev/ttyACM0).

git clone https://github.com/qooba/tinyml-arduino.git
cd tinyml-arduino
# in another terminal tab
# go inside server container 
docker exec -it arduino /bin/bash

For each sensor type you can click the Prepare button, which builds and deploys the appropriate Arduino code.

Sometimes you will have to deploy to the Arduino manually. To do this, go to the arduino container:

docker exec -it arduino /bin/bash
cd /arduino
make rgb

Here you have the complete Makefile with all types of implemented sensors.

You can start observations using the Watch button.
Arduino pdm
Arduino temperature
Arduino rgb

Now we will build the TinyML solution.
In the first step we capture the training data:
Arduino capture

The training data is saved in CSV format. You need to repeat the process for each class you want to detect.

The captured data is uploaded to the Colab notebook.
Here I fully base my work on the project Fruit identification using Arduino and TensorFlow.
In the notebook we train the model using TensorFlow, then convert it to TensorFlow Lite and finally encode it in hex format (the model.h header file), which is readable by the Arduino.
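The hex-encoding step is essentially what `xxd -i` does: it dumps the bytes of the .tflite file as a C array that the Arduino sketch can include. Below is a stdlib Python sketch of the same conversion. The array name `model` and the 12-bytes-per-line layout are hypothetical choices, not necessarily what the original notebook produces.

```python
def to_c_header(data: bytes, name: str = "model", per_line: int = 12) -> str:
    """Encode raw bytes (e.g. a .tflite file) as a C array for a model.h header."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )


print(to_c_header(b"TFL3"))  # four example bytes
```

Applied to a real model this would be, for example, `open("model.h", "w").write(to_c_header(open("model.tflite", "rb").read()))`. The trailing comma after the last element is valid in a C initializer list.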

Now we compile and upload the model.h header file using a drag-and-drop mechanism.

Arduino upload

Finally, we can classify the jelly bears by color:

Arduino classify

Feast with AI – feed your MLflow models with a feature store


In this article I will show how to prepare a complete MLOps solution based on the Feast feature store and the MLflow platform.

Before you continue reading, please watch the short introduction:

The whole solution will be deployed on Kubernetes (mlflow_feast.yaml).


We will use:
* Feast – as a feature store
* MLflow – as a model repository
* MinIO – as S3 storage
* Jupyter Notebook – as a workspace
* Redis – as the online feature store

propensity to buy

To better visualize the whole process we will use the propensity to buy example, which is based on Kaggle examples and data.


We start in Jupyter Notebook, where we prepare the Feast feature store schema, which is kept in S3.

We can simply inspect the Feast schema in Jupyter Notebook:

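from feast import FeatureStore
from google.protobuf.json_format import MessageToDict
from IPython.display import display, HTML
from json2html import json2html


class FeastSchema:
    """Render the Feast registry as HTML tables in Jupyter.

    Reconstructed sketch: assumes Feast >= 0.10 (entities and feature views).
    """

    def __init__(self, repo_path: str):
        self.store = FeatureStore(repo_path=repo_path)

    def show_schema(self, skip_meta: bool = False):
        feast_schema = self.__project_show_schema(skip_meta)
        display(HTML(json2html.convert(json=feast_schema)))

    def show_table_schema(self, table: str, skip_meta: bool = False):
        feature_tables_dictionary = self.__project_show_schema(skip_meta)
        display(HTML(json2html.convert(json={table: feature_tables_dictionary[table]})))

    def __project_show_schema(self, skip_meta: bool = False):
        # Index the entity specs by name so they can be embedded in each feature view
        entities_dictionary = {}
        for entity in self.store.list_entities():
            entities_dictionary[entity.name] = MessageToDict(entity.to_proto())["spec"]

        feature_tables_dictionary = {}
        for feature_table in self.store.list_feature_views():
            feature_table_dict = MessageToDict(feature_table.to_proto())
            feature_table_spec = feature_table_dict["spec"]
            if 'entities' in feature_table_spec:
                feature_table_spec['entities'] = [
                    entities_dictionary.get(entity, entity) for entity in feature_table_spec['entities']
                ]

            if not skip_meta:
                feature_table_spec['meta'] = feature_table_dict.get('meta', {})

            feature_tables_dictionary[feature_table.name] = feature_table_spec

        return feature_tables_dictionary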


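In our case we store the data in Apache Parquet files in an S3 bucket.
Using Feast we can fetch the historical features and train the model using the scikit-learn library:

import os

import joblib
import pandas as pd
import sklearn.metrics
from feast import FeatureStore
from pyarrow import fs
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

store = FeatureStore(repo_path=".")

# bucket_name and filename identify the entity file in MinIO (values not shown in the original)
s3 = fs.S3FileSystem(endpoint_override=os.environ.get("FEAST_S3_ENDPOINT_URL"))
entity_df = pd.read_parquet(f'{bucket_name}/{filename}_entities.parquet', filesystem=s3)

training_df = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=[
        # "propensity_data:<feature>" references (list elided in the original)
    ],
).to_df()

predictors = training_df.drop(['propensity_data__ordered', 'UserID', 'event_timestamp'], axis=1)
targets = training_df['propensity_data__ordered']

X_train, X_test, y_train, y_test = train_test_split(predictors, targets, test_size=.3)

# The training lines are missing in the original; GaussianNB is an assumption
# based on the var_smoothing parameter passed to the MLflow run later on
classifier = GaussianNB(var_smoothing=1e-9)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

ac_score = sklearn.metrics.accuracy_score(y_test, predictions)

propensity_model_path = 'propensity.joblib'
joblib.dump(classifier, propensity_model_path)

artifacts = {
    "propensity_model": propensity_model_path,
    "feature_store": "feature_store.yaml"
}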
The model will use the online Feast features from Redis as well as additional features from the request, thus we need to wrap the MLflow model and define it:

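import os

import joblib
import mlflow.pyfunc
import pandas as pd
from feast import FeatureStore


class PropensityWrapper(mlflow.pyfunc.PythonModel):

    def load_context(self, context):
        # Load the trained classifier and connect to the Feast repository
        self.model = joblib.load(context.artifacts["propensity_model"])
        self.store = FeatureStore(repo_path=os.environ.get("FEAST_REPO_PATH"))

    def predict(self, context, model_input):
        # Fetch the online (Redis) features for every user present in the request
        users = model_input["UserID"].unique().tolist()
        feature_vector = self.store.get_online_features(
            feature_refs=[...],  # online feature references (elided in the original)
            entity_rows=[{"UserID": uid} for uid in users]
        ).to_dict()
        data = pd.DataFrame.from_dict(feature_vector)

        # Combine the request features with the online features and drop the entity key
        merged_data = pd.merge(model_input, data, how="inner", on=["UserID"], suffixes=('_x', '')).drop(['UserID'], axis=1)
        return self.model.predict(merged_data)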

Now we can log the MLflow model to the repository:

from urllib.parse import urlparse

import mlflow
import mlflow.pyfunc


with mlflow.start_run():

    #mlflow.log_param("var_smoothing", input_params['var_smoothing'])
    mlflow.log_metric("accuracy_score", ac_score)

    tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

    # The model registry does not work with a plain file store
    if tracking_url_type_store != "file":
        mlflow.pyfunc.log_model("model", python_model=PropensityWrapper(), artifacts=artifacts,
                                registered_model_name="propensity_model")
    else:
        mlflow.pyfunc.log_model("model", python_model=PropensityWrapper(), artifacts=artifacts)

We can export the code and run it using the MLflow CLI:

mlflow run . --no-conda --experiment-name="propensity" -P var_smoothing=1e-9

Now we need to materialize the features to Redis:

feast materialize 2021-03-22T23:42:00 2021-06-22T23:42:00

Using MLflow we can simply deploy the model as a microservice in k8s.
In our case we want to deploy the model models:/propensity_model/Production,
which is currently assigned to the Production stage. On start, MLflow automatically fetches the proper model from S3:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-serving
  namespace: qooba
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow-serving
      version: v1
  template:
    metadata:
      labels:
        app: mlflow-serving
        version: v1
    spec:
      containers:
      - image: qooba/mlflow:serving
        imagePullPolicy: IfNotPresent
        name: mlflow-serving
        env:
        - name: MLFLOW_TRACKING_URI
          value: http://mlflow.qooba.svc.cluster.local:5000
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: minio-auth
              key: username
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: minio-auth
              key: password
        - name: MLFLOW_S3_ENDPOINT_URL
          value: http://minio.qooba.svc.cluster.local:9000
        - name: FEAST_S3_ENDPOINT_URL
          value: http://minio.qooba.svc.cluster.local:9000
        - name: REDIS_TYPE
          value: REDIS
        - name: ...  # variable name missing in the original
          value: redis.qooba.svc.cluster.local:6379,db=0
        - name: FEAST_TELEMETRY
          value: "false"
        - name: FEAST_REPO_PATH
          value: /feast_repository
        - name: PORT
          value: "5000"
        - name: MODEL
          value: models:/propensity_model/Production
        ports:
        - containerPort: 5000
        volumeMounts:
          - mountPath: /feast_repository
            name: config
      volumes:
        - name: config
          configMap:
            name: mlflow-serving
            items:
            - key: feature_store
              path: feature_store.yaml

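Then we can call the deployed service with an HTTP request:

import requests
import json

# Service host elided in the original; /invocations is the MLflow scoring endpoint
url = "http://<mlflow-serving-host>:5000/invocations"
headers = {
    'Content-Type': 'application/json; format=pandas-records'
}
data = [
    {"UserID": "a720-6b732349-a720-4862-bd21-644732",
     'propensity_data:device_mobile': 1.0,
     'propensity_data:device_computer': 0.0,
     'propensity_data:device_tablet': 0.0
    }
]

response = requests.post(url, data=json.dumps(data), headers=headers)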

On each request, the model fetches the client features (based on UserID) from Redis, combines them with the features sent in the HTTP request and generates a prediction.
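The merge performed inside PropensityWrapper.predict can be illustrated without Feast or Redis. In the stdlib sketch below, a dict stands in for the online store, and the request rows are inner-joined with the stored features by UserID before the key is dropped. The helper name and the stored feature `propensity_data__basket_icon_click` are hypothetical.

```python
def merge_request_with_online(request_rows, online_store, key="UserID"):
    """Inner-join request rows with online features by key, then drop the key."""
    merged = []
    for row in request_rows:
        stored = online_store.get(row[key])
        if stored is None:
            continue  # inner join: users without online features are skipped
        combined = {**row, **stored}
        combined.pop(key)
        merged.append(combined)
    return merged


online = {"u1": {"propensity_data__basket_icon_click": 1.0}}  # hypothetical feature
request = [
    {"UserID": "u1", "propensity_data:device_mobile": 1.0},
    {"UserID": "u2", "propensity_data:device_mobile": 0.0},
]
print(merge_request_with_online(request, online))
# [{'propensity_data:device_mobile': 1.0, 'propensity_data__basket_icon_click': 1.0}]
```

As with the `how="inner"` merge in the wrapper, a user that has not been materialized to the online store produces no prediction row at all. You may want to handle that case explicitly in production.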

Flink with AI – how to use Flink with MLflow model in Jupyter Notebook


In this article I will show how to process streams with Apache Flink and an MLflow model.

Before you continue reading, please watch the short introduction:

Apache Flink provides an efficient and scalable way of processing streams. It is a distributed processing engine which supports multiple sources such as Kafka, NiFi and many others
(if we need a custom source, we can create it ourselves).

Apache Flink also provides the framework for defining streams operations in languages like:
Java, Scala, Python and SQL.

To simplify such definitions we can use Jupyter Notebook as an interface. Of course, we can write the definitions in Python using the PyFlink library, but we can make it even easier by writing a Jupyter Notebook extension ("magic words").

Using the Flink extension (magic.ipynb) we can use Flink SQL syntax directly in Jupyter Notebook.

To use the extension we need to load it:

%reload_ext flinkmagic

Then we need to initialize the Flink StreamEnvironment:


Now we can use the SQL code for example:

FileSystem connector:

    word varchar,
    cnt bigint) WITH (
        'connector.type' = 'filesystem',
        'format.type' = 'csv',
        'connector.path' = '/opt/flink/notebooks/data/word_count_output1')

MySQL connector:

    smstext varchar,
    smstype varchar) WITH (
        'connector.type' = 'jdbc',
        'connector.url' = 'jdbc:mysql://mysql:3306/test',
        'connector.table' = 'sms',
        'connector.driver' = 'com.mysql.jdbc.Driver',
        'connector.write.flush.interval' = '10',
        'connector.username' = 'root',
        'connector.password' = 'my-secret-pw')

Kafka connector:

CREATE TABLE MySourceKafkaTable (word varchar) WITH (
    'connector.type' = 'kafka',
    'connector.version' = 'universal',
    'connector.topic' = 'test',
    'connector.startup-mode' = 'latest-offset',
    'connector.properties.bootstrap.servers' = 'kafka:9092',
    'connector.properties.group.id' = 'test',
    'format.type' = 'csv'
)

The magic keyword will automatically execute the SQL in the existing StreamingEnvironment.

Now we can apply the machine learning model. In plain Flink we can use a UDF defined
in Python, but we will use an MLflow model, which wraps ML frameworks (like PyTorch, TensorFlow, scikit-learn etc.). Because MLflow exposes a homogeneous interface, we can
create another "Jupyter magic" which automatically loads an MLflow model as a Flink function.

%flink_mlflow "SPAM_CLASSIFIER" "/mlflow/mlruns/2/64a89b0a6b7346498316bfae4c298535/artifacts/model" "[DataTypes.STRING()]" "DataTypes.STRING()"
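The idea behind such a magic is that every MLflow pyfunc model exposes the same batch-oriented predict method, so one generic adapter can turn any of them into a row-level SQL function. Below is a framework-free sketch of that adapter. A dummy keyword model stands in for a model loaded with mlflow.pyfunc.load_model, and the `register_model_function` helper and `FUNCTIONS` registry are hypothetical. This is neither the author's extension code nor the PyFlink UDF API.

```python
from typing import Callable, Dict, List


class KeywordSpamModel:
    """Dummy stand-in for a loaded MLflow pyfunc model: predict() takes a batch."""

    def predict(self, batch: List[str]) -> List[str]:
        return ["spam" if "free" in text.lower() else "ham" for text in batch]


FUNCTIONS: Dict[str, Callable[[str], str]] = {}


def register_model_function(name: str, model) -> Callable[[str], str]:
    """Wrap the batch-oriented predict() as a scalar, one-row-at-a-time function."""
    def scalar_fn(value: str) -> str:
        return model.predict([value])[0]

    FUNCTIONS[name] = scalar_fn
    return scalar_fn


register_model_function("SPAM_CLASSIFIER", KeywordSpamModel())
print([FUNCTIONS["SPAM_CLASSIFIER"](w) for w in ["Win a FREE phone", "See you at 5"]])
# ['spam', 'ham']
```

In the real extension, the wrapped function would additionally be registered with Flink using the input and result types passed to the magic (here [DataTypes.STRING()] and DataTypes.STRING()).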

Now we can simply write Flink SQL query:

SELECT word as smstext, SPAM_CLASSIFIER(word) as smstype FROM MySourceKafkaTable

In our case, this query fetches Kafka events and classifies them using the MLflow spam classifier. The
results are displayed in real time in the Jupyter Notebook as an events DataFrame.

If we want, we can use other Python libraries (like matplotlib) to create a
graphical representation of the results, e.g. a pie chart.
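A pie chart of the classification results only needs the share of each class. Below is a minimal sketch that computes those shares with the standard library. The input list stands in for the smstype column of the events DataFrame.

```python
from collections import Counter


def class_shares(labels):
    """Fraction of events per class, e.g. the slices of a spam/ham pie chart."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


events = ["ham", "spam", "ham", "ham"]  # e.g. the smstype column of the results
print(class_shares(events))  # {'ham': 0.75, 'spam': 0.25}
```

With matplotlib the shares could then be drawn with `plt.pie(list(shares.values()), labels=list(shares.keys()))`. Converting to lists avoids passing dict views to NumPy.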

You can find the whole code, including the Flink examples, the extension and the Dockerfiles, here:

You can also use the docker image qooba/flink:dev to test and run the notebooks inside.
Please check run.sh,
where you have all the components (Kafka, MySQL, Jupyter with Flink, MLflow repository).

Animated Art with AI – face reenactment in action


In this article I will show how to use artificial intelligence to add motion to the images and photos.

Before you continue reading, please watch the short introduction:

Face reenactment

To bring photos to life we can use the face reenactment algorithm designed to transfer the facial movements in the video to another image.

face reenactment diagram

In this project I have used the GitHub implementation https://github.com/AliaksandrSiarohin/first-order-model; an extensive description of the neural network architecture can be found in this paper. The solution consists of two parts: a motion module and a generation module.
The motion module first extracts keypoints from the source and target images. The solution assumes that an abstract reference frame R exists, and at the first stage the transformations from the reference frame to the source image, $T_{S \leftarrow R}(p_k)$, and to the target image, $T_{T \leftarrow R}(p_k)$, are calculated. Then the first-order Taylor expansions $\left.\frac{d}{dp}T_{S \leftarrow R}(p)\right|_{p=p_k}$ and $\left.\frac{d}{dp}T_{T \leftarrow R}(p)\right|_{p=p_k}$ are used to calculate the dense motion field.
The generation module uses the calculated dense motion field and the source image to generate a new image that resembles the target image.
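Combining the two expansions gives the local affine approximation used by the First Order Motion Model. It maps a point $z$ of the target (driving) frame back to the source image near keypoint $p_k$: $T_{S \leftarrow T}(z) \approx T_{S \leftarrow R}(p_k) + J_k\,(z - T_{T \leftarrow R}(p_k))$, with $J_k = \left(\left.\frac{d}{dp}T_{S \leftarrow R}(p)\right|_{p=p_k}\right)\left(\left.\frac{d}{dp}T_{T \leftarrow R}(p)\right|_{p=p_k}\right)^{-1}$. The stdlib sketch below evaluates this formula for a single keypoint with 2x2 Jacobians. The variable names are illustrative. In the real model, the keypoints and Jacobians are predicted by a neural network, and many such local motions are blended into the dense field.

```python
from typing import Tuple

Vec = Tuple[float, float]
Mat = Tuple[Vec, Vec]


def matmul(a: Mat, b: Mat) -> Mat:
    """2x2 matrix product a @ b."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def inverse(a: Mat) -> Mat:
    """Inverse of a 2x2 matrix (raises if it is singular)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("Jacobian is not invertible")
    return ((a[1][1] / det, -a[0][1] / det), (-a[1][0] / det, a[0][0] / det))


def apply(m: Mat, v: Vec) -> Vec:
    """Matrix-vector product m @ v."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def local_motion(z: Vec, kp_source: Vec, kp_target: Vec, jac_source: Mat, jac_target: Mat) -> Vec:
    """First-order approximation of T_{S<-T}(z) around keypoint p_k.

    kp_source = T_{S<-R}(p_k), kp_target = T_{T<-R}(p_k); jac_* are the 2x2 derivatives at p_k.
    """
    j_k = matmul(jac_source, inverse(jac_target))
    dz = (z[0] - kp_target[0], z[1] - kp_target[1])
    shift = apply(j_k, dz)
    return (kp_source[0] + shift[0], kp_source[1] + shift[1])


identity = ((1.0, 0.0), (0.0, 1.0))
scale2 = ((2.0, 0.0), (0.0, 2.0))
# Keypoint (3, 3) in the target frame corresponds to (1, 2) in the source image,
# and the source neighbourhood is twice as large as the target one.
print(local_motion((4.0, 3.0), (1.0, 2.0), (3.0, 3.0), scale2, identity))  # (3.0, 2.0)
```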

face reenactment diagram

The whole solution is packed into a docker image, thus we can reproduce the results using the command:

docker run -it --rm --gpus all -v $(pwd)/torch_models:/root/.torch/models -v $(pwd)/checkpoints:/ai/checkpoints -v $(pwd)/test:/ai/test qooba/aifacereeanactment python3 ./prepare.py --source_image /ai/test/test.jpg --driving_video /ai/test/test.mp4 --output /ai/test/test_generated.mp4

NOTE: the additional volumes (torch_models and checkpoints) are mounted because the trained neural networks are downloaded during the first run.

To reproduce the results we need to provide two files: a driving (motion) video and a source image. In the example above I put them into the test directory and mounted it into the docker container (-v $(pwd)/test:/ai/test) so that they can be used inside it.

Below you have all command line options:

usage: prepare.py [-h] [--config CONFIG] [--checkpoint CHECKPOINT]
                  [--source_image SOURCE_IMAGE]
                  [--driving_video DRIVING_VIDEO] [--crop_image]
                  [--crop_image_padding CROP_IMAGE_PADDING [CROP_IMAGE_PADDING ...]]
                  [--crop_video] [--output OUTPUT] [--relative]
                  [--no-relative] [--adapt_scale] [--no-adapt_scale]
                  [--find_best_frame] [--best_frame BEST_FRAME] [--cpu]


optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       path to config
  --checkpoint CHECKPOINT
                        path to checkpoint to restore
  --source_image SOURCE_IMAGE
                        source image
  --driving_video DRIVING_VIDEO
                        driving video
  --crop_image, -ci     autocrop image
  --crop_image_padding CROP_IMAGE_PADDING [CROP_IMAGE_PADDING ...]
                        autocrop image paddings left, upper, right, lower
  --crop_video, -cv     autocrop video
  --output OUTPUT       output video
  --relative            use relative or absolute keypoint coordinates
  --no-relative         don't use relative or absolute keypoint coordinates
  --adapt_scale         adapt movement scale based on convex hull of keypoints
  --no-adapt_scale      no adapt movement scale based on convex hull of
                        keypoints
  --find_best_frame     Generate from the frame that is the most alligned with
                        source. (Only for faces, requires face_aligment lib)
  --best_frame BEST_FRAME
                        Set frame to start from.
  --cpu                 cpu mode.

New Face with AI


In this article I will show how to use artificial intelligence to generate human faces.

Before you continue reading, please watch the short introduction:

Generative adversarial network

To generate realistic human faces, we can use neural networks with GAN (Generative adversarial network) architecture.

neural network architecture

The GAN network consists of two parts: the Generator, whose task is to generate an image from random input, and the Discriminator, which checks whether the generated image is realistic.

training progress

During training the networks compete with each other: the Generator tries to generate better and better images
and thereby deceive the Discriminator, while the Discriminator learns to distinguish between real and generated photos.

To train the discriminator, we use both real photos and those generated by the generator.
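This competition is usually written as the GAN objective $\min_G \max_D \; \mathbb{E}_{x}[\log D(x)] + \mathbb{E}_{z}[\log(1 - D(G(z)))]$. In practice it is split into two binary cross-entropy losses, one per network. The small stdlib sketch below computes them from discriminator outputs, i.e. the probabilities that an image is real. The generator loss uses the common non-saturating form $-\log D(G(z))$. That choice is an assumption about the training setup, not something stated in this article.

```python
import math


def discriminator_loss(d_real, d_fake):
    """BCE: real photos should get D(x) -> 1, generated images D(G(z)) -> 0."""
    real = -sum(math.log(p) for p in d_real) / len(d_real)
    fake = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real + fake


def generator_loss(d_fake):
    """Non-saturating generator loss: push D(G(z)) towards 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


d_real = [0.9, 0.8]  # discriminator scores for real photos
d_fake = [0.1, 0.3]  # scores for generated images
print(round(discriminator_loss(d_real, d_fake), 3), round(generator_loss(d_fake), 3))
```

A confident, correct discriminator gives a small discriminator loss and a large generator loss. The two losses pull in opposite directions, which is the competition described above.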

Finally, we can achieve the following results using the DCGAN network.
As you can see, some faces look realistic while others are distorted; additionally, the network can only generate low-resolution images.

training results

We can achieve much better results using the StyleGAN network (arxiv article), which, among other things, differs in that new layers of the network are progressively added during training.

I generated the images using pretrained networks and the effect is really amazing.

results stylegan

Unblur low quality face images with AI


In this article I will show how to improve the quality of blurred face images using
artificial intelligence. For this purpose I will use neural networks and the FastAI library (ver. 1).

The project code is available on my github: https://github.com/qooba/aiunblur
You can also use ready docker image: https://hub.docker.com/repository/docker/qooba/aiunblur

Before you continue reading, please watch the short introduction:

I have relied a lot on the fastai course, thus I definitely recommend going through it.


To teach the neural network how to rebuild face images, we need to provide a
faces dataset which shows how low-quality, blurred images should be reconstructed.
Thus we need pairs of low- and high-quality images.

To prepare the dataset we can use available face datasets, e.g. FFHQ, Tufts Face Database, CelebA.

We will treat the original images as a high resolution data and rescale them
to prepare low resolution input:

import fastai
from fastai.vision import *
from fastai.callbacks import *
from fastai.utils.mem import *
from torchvision.models import vgg16_bn
from pathlib import Path

path = Path('/opt/notebooks/faces')
path_hr = path/'high_resolution'
path_lr = path/'small-96'

il = ImageList.from_folder(path_hr)

def resize_one(fn, i, path, size):
    dest = path/fn.relative_to(path_hr)
    dest.parent.mkdir(parents=True, exist_ok=True)
    img = PIL.Image.open(fn)
    targ_sz = resize_to(img, size, use_min=True)
    img = img.resize(targ_sz, resample=PIL.Image.BILINEAR).convert('RGB')
    img.save(dest, quality=60)

sets = [(path_lr, 96)]
for p,size in sets:
    if not p.exists(): 
        print(f"resizing to {size} into {p}")
        parallel(partial(resize_one, path=p, size=size), il.items)

Now we can create the data bunch for training:

arch = models.resnet34
src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.1, seed=42)

def get_data(bs,size):
    data = (src.label_from_func(lambda x: path_hr/x.name)
           .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True)
           .databunch(bs=bs,num_workers=0).normalize(imagenet_stats, do_y=True))

    data.c = 3
    return data

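bs, size = 32, 128  # batch size and image size; example values (assumption, not given in the original)
data = get_data(bs,size)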


In this solution we will use a neural network with UNET architecture.

neural network architecture

The UNET neural network contains two parts, an Encoder and a Decoder, which are used to reconstruct the face image.
In the first stage the Encoder takes the input and extracts and aggregates the image features. At each stage the feature maps are downsampled.
Then the Decoder uses the extracted features and tries to rebuild the image, upsampling it at each decoding stage. Finally, we get the regenerated images.

Additionally, we need to define the loss function, which tells the model whether the image was rebuilt correctly and allows us to train the model.

To do this we will use an additional neural network, VGG-16. We feed the generated image and the original image (our target) to the network input. Then we compare the features extracted for both images at selected layers and calculate the loss from these differences.

Finally, we will use the Adam optimizer to minimize the loss and achieve better results.

def gram_matrix(x):
    n,c,h,w = x.size()
    x = x.view(n, c, -1)
    return (x @ x.transpose(1,2))/(c*h*w)

base_loss = F.l1_loss

vgg_m = vgg16_bn(True).features.cuda().eval()
requires_grad(vgg_m, False)

blocks = [i-1 for i,o in enumerate(children(vgg_m)) if isinstance(o,nn.MaxPool2d)]

class FeatureLoss(nn.Module):
    def __init__(self, m_feat, layer_ids, layer_wgts):
        super().__init__()  # required before assigning sub-modules to an nn.Module
        self.m_feat = m_feat
        self.loss_features = [self.m_feat[i] for i in layer_ids]
        self.hooks = hook_outputs(self.loss_features, detach=False)
        self.wgts = layer_wgts
        self.metric_names = ['pixel',] + [f'feat_{i}' for i in range(len(layer_ids))
              ] + [f'gram_{i}' for i in range(len(layer_ids))]

    def make_features(self, x, clone=False):
        # Run VGG on x so that the hooks store the activations of the selected layers
        self.m_feat(x)
        return [(o.clone() if clone else o) for o in self.hooks.stored]

    def forward(self, input, target):
        out_feat = self.make_features(target, clone=True)
        in_feat = self.make_features(input)
        self.feat_losses = [base_loss(input,target)]
        self.feat_losses += [base_loss(f_in, f_out)*w
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out))*w**2 * 5e3
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.metrics = dict(zip(self.metric_names, self.feat_losses))
        return sum(self.feat_losses)

    def __del__(self): self.hooks.remove()

feat_loss = FeatureLoss(vgg_m, blocks[2:5], [5,15,2])

wd = 1e-3  # weight decay; example value (assumption, not given in the original)
learn = unet_learner(data, arch, wd=wd, loss_func=feat_loss, callback_fns=LossMetrics,
                     blur=True, norm_type=NormType.Weight)
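The gram_matrix helper used in the loss captures style-like statistics. For each image it multiplies the flattened channel maps by their transpose, which gives a c×c matrix of channel correlations normalized by c·h·w. The dependency-free version below works on a single image, with lists instead of tensors, purely to make the arithmetic visible.

```python
def gram_matrix_single(features, h, w):
    """features: c channels, each a flat list of h*w activations (one image)."""
    c = len(features)
    norm = c * h * w
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) / norm for j in range(c)]
        for i in range(c)
    ]


# 2 channels on a 1x2 feature map: dot products 5, 11, 25 divided by c*h*w = 4
print(gram_matrix_single([[1.0, 2.0], [3.0, 4.0]], h=1, w=2))
# [[1.25, 2.75], [2.75, 6.25]]
```

Because the matrix discards spatial positions, matching Gram matrices encourages similar textures rather than pixel-exact reconstructions. The pixel and feature terms of the loss take care of the latter.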


After training we can use the model to regenerate the images:



Finally, we can export the model and create a drag-and-drop web application which fixes face images.


The whole solution is packed into docker images thus you can simply start it using commands:

# with GPU
docker run -d --gpus all --rm -p 8000:8000 --name aiunblur qooba/aiunblur

# without GPU
docker run -d --rm -p 8000:8000 --name aiunblur qooba/aiunblur

To use the GPU, additional NVIDIA drivers (included in the NVIDIA CUDA Toolkit) are needed.

Fly AI with Tello drone


The popularity of drones and the range of their applications grow every year.

In this article I will show how to programmatically control the Ryze Tello drone, capture the camera video and detect objects using TensorFlow. I have packed the whole solution into docker images (the backend and the web app UI are in separate images), thus you can simply run it.

The project code is available on my github https://github.com/qooba/aidrone
You can also use ready docker image: https://hub.docker.com/repository/docker/qooba/aidrone

Before you continue reading, please watch the short introduction:


architecture diagram

The application uses two network interfaces.
The first is used by the Python backend to connect to the Tello Wi-Fi, send commands and capture the video stream. In the backend layer I have used the DJITelloPy library, which covers all required Tello move commands and video stream capture.
To efficiently show the video stream in the browser I have used the WebRTC protocol and the aiortc library. Finally, I have used TensorFlow 2.0 object detection with a pretrained SSD ResNet50 model.

The second network interface is used to expose the Vue web application.
I have used nginx to serve the frontend application.


drone controls

Using the web interface you can control the Tello movement; you can:
* start video stream
* stop video stream
* takeoff – which starts Tello flight
* land
* up
* down
* rotate left
* rotate right
* forward
* backward
* left
* right
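Each of these controls corresponds to a plain-text command of the Tello SDK, which DJITelloPy sends over UDP to the drone (by default 192.168.10.1, port 8889). The sketch below maps the UI buttons to those SDK commands using only the standard library. Several parts are assumptions: the button names, the default distances (cm) and angles (degrees), and the `send` helper. For real flights, use DJITelloPy as the project does, since it also handles the responses and the video stream.

```python
import socket
from typing import Optional

TELLO_ADDRESS = ("192.168.10.1", 8889)  # default Tello SDK command address

# UI button -> (SDK command, default argument or None); defaults are assumptions
BUTTONS = {
    "start video stream": ("streamon", None),
    "stop video stream": ("streamoff", None),
    "takeoff": ("takeoff", None),
    "land": ("land", None),
    "up": ("up", 30),
    "down": ("down", 30),
    "rotate left": ("ccw", 45),
    "rotate right": ("cw", 45),
    "forward": ("forward", 30),
    "backward": ("back", 30),
    "left": ("left", 30),
    "right": ("right", 30),
}


def build_command(button: str, value: Optional[int] = None) -> str:
    """Translate a UI button into a Tello SDK text command."""
    command, default = BUTTONS[button]
    arg = value if value is not None else default
    return command if arg is None else f"{command} {arg}"


def send(command: str, sock: socket.socket) -> None:
    """Send one command; 'command' must be sent first to enter SDK mode."""
    sock.sendto(command.encode("ascii"), TELLO_ADDRESS)


print(build_command("rotate right"))  # cw 45
print(build_command("up", 50))        # up 50
```

Against a real drone (not run here), the sequence would be `sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`, then `send("command", sock)`, and then `send(build_command("takeoff"), sock)`.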

In addition, using the draw detection switch you can turn the detection boxes on the captured video stream on or off (however, this introduces a delay in the video, thus it is turned off by default). Additionally, I send the list of detected classes through web sockets, and it is displayed as well.
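Before drawing boxes or sending the class list over the web socket, the detections are typically filtered by a score threshold. The sketch below assumes that the detector returns parallel lists of class ids and scores, as TensorFlow object detection models do through detection_classes and detection_scores. The label map subset, the 0.5 threshold and the JSON message shape are hypothetical.

```python
import json

LABELS = {1: "person", 3: "car", 18: "dog"}  # hypothetical subset of a COCO label map


def detected_classes(classes, scores, threshold=0.5):
    """Unique class names with score >= threshold, in order of first appearance."""
    names = []
    for class_id, score in zip(classes, scores):
        name = LABELS.get(int(class_id), f"class_{int(class_id)}")
        if score >= threshold and name not in names:
            names.append(name)
    return names


message = json.dumps({"classes": detected_classes([1, 18, 1, 3], [0.92, 0.61, 0.88, 0.2])})
print(message)  # {"classes": ["person", "dog"]}
```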

drone detection

As mentioned before, I have used a pretrained model, thus it is a good idea to train your own model to get better results for a narrower, more specific class of objects.

Finally the whole solution is packed into docker images thus you can simply start it using commands:

docker network create -d bridge app_default
docker run --name tello --network app_default --gpus all -d --rm -p 8890:8890 -p 8080:8080 -p 8888:8888 -p 11111:11111/udp  qooba/aidrone /bin/bash -c "python3 drone.py"
docker run -d --rm --network app_default --name nginx -p 80:80 -p 443:443 qooba/aidrone:front

To use the GPU, additional NVIDIA drivers (included in the NVIDIA CUDA Toolkit) are needed.

AI Scissors – sharp cut with neural networks


Cutting out photo backgrounds is one of the most tedious graphical tasks. In this article I will show how to simplify it using neural networks.

I will use U^2-Net networks, which are described in detail in the arxiv article, together with the Python library rembg to create a ready-to-use drag-and-drop web application, which you can run from a docker image.

The project code is available on my github https://github.com/qooba/aiscissors
You can also use ready docker image: https://hub.docker.com/repository/docker/qooba/aiscissors

Before you continue reading, please watch the quick introduction:

Neural network

To correctly remove the image background we need to select the most visually attractive objects in an image, which is the task of Salient Object Detection (SOD). To combine low memory and computation cost with results competitive against state-of-the-art methods, the novel U^2-Net architecture will be used.

U-Net convolutional networks have a characteristic U shape with a symmetric encoder-decoder structure. At each encoding stage the feature maps are downsampled (torch.nn.MaxPool2d) and then upsampled at each decoding
stage (torch.nn.functional.upsample). The downsampled features are transferred and concatenated with the upsampled features using skip connections.

The U^2-Net network uses a two-level nested U-structure, where the main architecture is a U-Net-like encoder-decoder and each stage contains a residual U-block. Each residual U-block repeats the downsampling/upsampling procedure, whose stages are also connected using residual connections.

neural network architecture

The nested U-structure extracts and aggregates features at each level and makes it possible to capture both local and global information from shallow and deep layers.

The U^2-Net architecture is precisely described in arxiv article. Moreover we can go through the pytorch model definition of U2NET and U2NETP.

Additionally the authors also shared the pretrained models: U2NET (176.3MB) and U2NETP (4.7 MB).

The lighter U2NETP version is only 4.7 MB thus it can be used in mobile applications.

Web application

The neural network is wrapped by the rembg library, which automatically downloads the pretrained networks and provides a simple Python API. To simplify usage I have decided to create a drag-and-drop web application (https://github.com/qooba/aiscissors).

In the application you can drag and drop an image and then compare the image with and without the background side by side.
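The "without background" version is the original image with an alpha channel taken from the predicted saliency map. Pixels that the network considers foreground stay opaque, and the rest become transparent. Below is a stdlib sketch of that final step. It assumes the mask is a list of probabilities in [0, 1] aligned with the RGB pixels. rembg performs this step internally, with its own post-processing.

```python
def apply_mask(pixels, mask):
    """Turn RGB pixels plus a saliency mask in [0, 1] into RGBA pixels."""
    rgba = []
    for (r, g, b), p in zip(pixels, mask):
        alpha = round(max(0.0, min(1.0, p)) * 255)  # clamp, then scale to 0..255
        rgba.append((r, g, b, alpha))
    return rgba


pixels = [(200, 10, 10), (0, 0, 255)]
print(apply_mask(pixels, [1.0, 0.0]))  # [(200, 10, 10, 255), (0, 0, 255, 0)]
```

Using the soft probabilities instead of a hard 0/1 threshold keeps smoother edges around hair and other fine details.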

web application

You can simply run the application using docker image:

docker run --name aiscissors -d -p 8000:8000 --rm -v $(pwd)/u2net_models:/root/.u2net qooba/aiscissors 

If you have a GPU card, you can use it:

docker run --gpus all  --name aiscissors -d -p 8000:8000 --rm -v $(pwd)/u2net_models:/root/.u2net qooba/aiscissors 

To use the GPU, additional NVIDIA drivers (included in the NVIDIA CUDA Toolkit) are needed.

When you run the container, the pretrained models are downloaded, thus I mount the local directory u2net_models to /root/.u2net to avoid downloading them each time I run the container.





Qin, X., Zhang, Z., Huang, C., Dehghan, M., Zaiane, O., Jagersand, M. U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection. Pattern Recognition 106, 107404 (2020).