AI Scissors – sharp cut with neural networks


Cutting out the background of a photo is one of the most tedious graphics tasks. In this article I will show how to simplify it using neural networks.

I will use the U^2-Net network, which is described in detail in the arXiv article, together with the Python library rembg to build a ready-to-use drag and drop web application that you can run from a Docker image.

The project code is available on my GitHub: https://github.com/qooba/aiscissors
You can also use the ready-made Docker image: https://hub.docker.com/repository/docker/qooba/aiscissors

Before you continue reading, please watch the quick introduction:

Neural network

To remove the image background correctly we need to select the most visually salient objects in the image, a task known as Salient Object Detection (SOD). To combine low memory and computation cost with results that are competitive with state-of-the-art methods, I use the novel U^2-Net architecture.

U-Net convolutional networks have a characteristic U shape with a symmetric encoder-decoder structure. The feature maps are downsampled at each encoding stage (torch.nn.MaxPool2d) and upsampled at each decoding stage (torch.nn.functional.upsample). The encoder features are passed across and concatenated with the upsampled decoder features using skip connections.
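To make the downsample / upsample / concatenate flow concrete, here is a small sketch that only tracks tensor shapes through a toy U-Net. It does not build a real network: the depth, the channel counts and the function name unet_shapes are illustrative assumptions, not the values used by U^2-Net.

```python
def unet_shapes(height, width, in_channels=3, base_channels=16, depth=3):
    """Trace feature-map shapes through a toy U-Net (illustrative sizes only).

    Returns a list of (stage, channels_in, channels_out, height, width).
    """
    trace = []
    skips = []
    channels = in_channels

    # Encoder: a convolution sets the channel count, then 2x2 max pooling
    # (torch.nn.MaxPool2d(2) in PyTorch) halves the height and width.
    for level in range(depth):
        out_channels = base_channels * 2 ** level
        trace.append((f"enc{level + 1}", channels, out_channels, height, width))
        skips.append((out_channels, height, width))
        channels = out_channels
        height, width = height // 2, width // 2

    # Bottleneck at the lowest resolution.
    out_channels = base_channels * 2 ** depth
    trace.append(("bottleneck", channels, out_channels, height, width))
    channels = out_channels

    # Decoder: upsample back to the size of the matching encoder stage,
    # concatenate its features along the channel axis, then convolve.
    for level in reversed(range(depth)):
        skip_channels, height, width = skips[level]
        concatenated = channels + skip_channels
        trace.append((f"dec{level + 1}", concatenated, skip_channels, height, width))
        channels = skip_channels

    return trace


if __name__ == "__main__":
    for stage in unet_shapes(64, 64):
        print(stage)
```

The channels_in value of each decoder stage is the sum of the upsampled channels and the channels carried over from the encoder, which is exactly what the concatenation does.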

The U^2-Net network uses a two-level nested U-structure: the main architecture is a U-Net-like encoder-decoder, and each of its stages contains a residual U-block. Each residual U-block repeats the downsampling/upsampling procedure, and its stages are also connected using residual connections.


The nested U-structure extracts and aggregates features at each level, which makes it possible to capture both local and global information from shallow and deep layers.

The U^2-Net architecture is described precisely in the arXiv article. We can also go through the PyTorch model definitions of U2NET and U2NETP.

Additionally, the authors have shared the pretrained models: U2NET (176.3 MB) and U2NETP (4.7 MB).

The lighter U2NETP version is only 4.7 MB, so it can be used in mobile applications.

Web application

The neural network is wrapped by the rembg library, which automatically downloads the pretrained networks and provides a simple Python API. To simplify the usage I decided to create a drag and drop web application (https://github.com/qooba/aiscissors).
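For readers who prefer to call rembg directly, a minimal sketch could look like the one below. It assumes a recent rembg release in which remove is importable from the top-level package and returns PNG bytes when given bytes (older releases exposed it as rembg.bg.remove). The helper names default_output_path and remove_background are my own illustration, not part of rembg or of the aiscissors code.

```python
from pathlib import Path


def default_output_path(input_path):
    """Return '<name>_nobg.png' next to the input file.

    PNG is used because the result needs an alpha channel.
    """
    path = Path(input_path)
    return path.with_name(path.stem + "_nobg.png")


def remove_background(input_path, output_path=None):
    """Cut the background out of one image file and save the result."""
    # Imported lazily so that this module loads even without rembg installed.
    # Assumption: a rembg version that exposes `remove` at package level.
    from rembg import remove

    target = Path(output_path) if output_path else default_output_path(input_path)
    image_bytes = Path(input_path).read_bytes()
    # With bytes as input, rembg returns the encoded image as bytes.
    target.write_bytes(remove(image_bytes))
    return target


# Example usage (the pretrained model is downloaded on the first call):
# remove_background("photos/cat.jpg")
```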

In the application you can drag and drop an image and then compare the versions with and without the background side by side.


You can simply run the application using the Docker image:

docker run --name aiscissors -d -p 8000:8000 --rm -v $(pwd)/u2net_models:/root/.u2net qooba/aiscissors 

If you have a GPU card you can use it:

docker run --gpus all  --name aiscissors -d -p 8000:8000 --rm -v $(pwd)/u2net_models:/root/.u2net qooba/aiscissors 

To use the GPU, additional NVIDIA drivers (included in the NVIDIA CUDA Toolkit) are needed.

When the container starts, the pretrained models are downloaded, so I mount the local directory u2net_models to /root/.u2net to avoid downloading them each time I run the container.
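If you start the container from a script, the two commands above differ only in the --gpus flag, so they can be assembled from one place. The sketch below only builds the argument list; the function name build_run_command is hypothetical and every flag is copied from the commands in this article.

```python
import shlex
from pathlib import Path


def build_run_command(models_dir="u2net_models", use_gpu=False):
    """Assemble the `docker run` arguments from the article as a list."""
    command = ["docker", "run"]
    if use_gpu:
        # Exposes all GPUs to the container.
        command += ["--gpus", "all"]
    command += [
        "--name", "aiscissors", "-d", "--rm",
        "-p", "8000:8000",
        # The mounted directory keeps the downloaded models between runs.
        "-v", f"{Path(models_dir).resolve()}:/root/.u2net",
        "qooba/aiscissors",
    ]
    return command


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) to actually start the container.
    print(shlex.join(build_run_command(use_gpu=False)))
```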

References

https://arxiv.org/pdf/2005.09007.pdf

https://github.com/NathanUA/U-2-Net

https://github.com/danielgatis/rembg

Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar Zaiane, Martin Jagersand. U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection. Pattern Recognition 106, 107404 (2020).

DeepMicroscopy – my portable ML laboratory


Today I’m very happy to finally release my open source project DeepMicroscopy.
In this project I have created a platform where you can capture images from a microscope, annotate them, train a TensorFlow model and finally observe real-time object detection.
The project is configured for the Jetson Nano device, so it can be used in compact and portable setups.

The project code is available on my github https://github.com/qooba/deepmicroscopy

Before you continue reading, please watch the quick introduction:

1. Architecture

The solution requires three devices:
* Microscope with USB camera – e.g. Velleman CAMCOLMS3 2Mpx
* Inference server – Jetson Nano
* Training server – PC equipped with a GPU card, e.g. NVIDIA GTX 1050 Ti

The whole solution is built from Docker images, so below I describe the components installed on each device.

Jetson

The Jetson device contains three components:
* Frontend – Vue application running on Nginx
* Backend – Python application which is the core of the solution
* Storage – Minio storage where projects, images and annotations are stored

Training Server

The training server contains two components:
* Frontend – Vue application running on Nginx
* Backend – Python application which handles the training logic

2. Platform functionalities

Most of the platform’s functionality is installed on the Jetson Nano. Because the Jetson Nano’s compute capabilities are insufficient for model training, I decided to split training into three stages, which I describe in the Training section.

Projects management

In DeepMicroscopy you can create multiple projects in which you annotate and recognize different objects.

You can create and switch projects in the top left menu. Each project’s data is kept in a separate bucket in the MinIO storage.

Images Capture

When you open the Capture panel in the web application and click the Play ▶ button, a WebRTC connection between the browser and the backend is created (I have used the aiortc Python library). To make it work in the Chrome browser we need two things:
* use TLS for the web application – a self-signed certificate is already configured in Nginx
* allow the application to use the camera – you have to set this in the browser

Now we can stream the image from the camera to the browser (I have used the OpenCV library to fetch the image from the microscope over USB).

When we decide to capture a specific frame and click the Plus ✚ button, the backend saves the current frame into the project bucket in the MinIO storage.
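The capture-and-save step can be sketched roughly as follows. This is not the DeepMicroscopy backend code: the camera index, the object naming scheme and the function names are assumptions made for illustration. The only library calls used are cv2.VideoCapture, cv2.imencode and the put_object method of the MinIO Python client, which is passed in as client.

```python
import io
from datetime import datetime, timezone


def frame_object_name(now=None):
    """Build a timestamped object name (naming scheme is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("images/%Y%m%dT%H%M%S%f.jpg")


def grab_jpeg(camera_index=0):
    """Read one frame from a USB camera and return it as JPEG bytes."""
    import cv2  # imported lazily; requires opencv-python

    capture = cv2.VideoCapture(camera_index)
    try:
        ok, frame = capture.read()
        if not ok:
            raise RuntimeError("could not read a frame from the camera")
        ok, encoded = cv2.imencode(".jpg", frame)
        if not ok:
            raise RuntimeError("JPEG encoding failed")
        return encoded.tobytes()
    finally:
        capture.release()


def save_frame(client, bucket, jpeg_bytes, object_name=None):
    """Store the frame in the project's bucket.

    `client` is expected to behave like minio.Minio (only put_object is used).
    """
    object_name = object_name or frame_object_name()
    client.put_object(
        bucket,
        object_name,
        io.BytesIO(jpeg_bytes),
        length=len(jpeg_bytes),
        content_type="image/jpeg",
    )
    return object_name
```

Passing the client in as a parameter keeps the sketch easy to try without a running MinIO server, because any object with a compatible put_object method will do.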

Annotation

The annotation engine is based on the VIA image annotator (VGG Image Annotator). Here you can see all the images you have captured for a specific project. There are a lot of features, e.g. switching between images (left/right arrow), zooming in/out (+/-) and of course annotation tools with different shapes (currently the training algorithm expects rectangles) and attributes (by default the class attribute is added, which is also expected by the training algorithm).

This is a rather painstaking, manual task, so when you finish remember to save the annotations by clicking the save button (currently there is no auto save). When you save, the project file (with the VIA schema) is stored in the project bucket.
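To give an idea of what the training side has to read, here is a minimal sketch that pulls the rectangles and their class attribute out of a VIA project file. It assumes the VIA 2.x project layout (a top-level _via_img_metadata object whose entries hold filename and regions); the exact file produced by DeepMicroscopy may differ, and the function name via_rectangles is hypothetical.

```python
import json


def via_rectangles(project_json):
    """Yield (filename, class_name, x, y, width, height) for each rectangle.

    Assumes the VIA 2.x project layout; other shapes are skipped.
    """
    project = json.loads(project_json)
    for image in project.get("_via_img_metadata", {}).values():
        for region in image.get("regions", []):
            shape = region.get("shape_attributes", {})
            if shape.get("name") != "rect":
                continue  # the training step expects rectangles only
            class_name = region.get("region_attributes", {}).get("class")
            yield (
                image["filename"],
                class_name,
                shape["x"],
                shape["y"],
                shape["width"],
                shape["height"],
            )


if __name__ == "__main__":
    sample = json.dumps({
        "_via_img_metadata": {
            "cell.jpg1234": {
                "filename": "cell.jpg",
                "size": 1234,
                "regions": [
                    {
                        "shape_attributes": {"name": "rect", "x": 10, "y": 20, "width": 30, "height": 40},
                        "region_attributes": {"class": "cell"},
                    }
                ],
            }
        }
    })
    for row in via_rectangles(sample):
        print(row)
```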

Training

When we finish annotating the images we can start model training. As mentioned before, it is split into three stages.

Data package

First we have to prepare a data package (which contains the captured images and our annotations) by clicking the DATA button.

Training server

Then we drag and drop the data package into the application running on the machine with higher compute capabilities.

After the upload the training server automatically extracts the data package, splits it into train/test data and starts training.
Currently I use the MobileNet V2 model architecture and start from a pretrained TensorFlow model.
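The train/test split mentioned above can be illustrated with a few lines of plain Python. This is only a sketch under my own assumptions: the 20% test fraction, the fixed seed and the function name split_train_test are illustrative, and the actual training server may split the data differently.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=42):
    """Shuffle the items reproducibly and split them into (train, test)."""
    shuffled = sorted(items)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    if len(shuffled) < 2:
        return shuffled, []
    # Keep at least one test item and at least one training item.
    test_size = max(1, round(len(shuffled) * test_fraction))
    test_size = min(test_size, len(shuffled) - 1)
    return shuffled[test_size:], shuffled[:test_size]


if __name__ == "__main__":
    names = [f"img_{i:03d}.jpg" for i in range(10)]
    train, test = split_train_test(names)
    print(len(train), "training images,", len(test), "test images")
```

Splitting by image file rather than by individual annotation keeps all rectangles of one image on the same side of the split.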

When the training is finished the model is exported using TensorRT, which optimizes inference performance, especially on NVIDIA devices like the Jetson Nano.

During and after training you can inspect all models using the built-in TensorBoard.

The web application periodically checks the training state, and when the training is finished we can download the model.

Uploading model

Finally we upload the TensorRT model back to the Jetson Nano device. The model is saved into the selected project’s bucket, so you can use multiple models for each project.

Object detection

On the Execute panel we can choose a model from the drop-down list (which shows the models uploaded for the selected project) and load it by clicking RUN (it typically takes some time to load the model). When we click the Play ▶ button the application shows real-time object detection. If we want to change the model we can click CLEAR and then choose and RUN another model.

We can also fetch detection statistics, which are sent over a WebSocket. Currently the number of detected items and the average width, height and score are returned.
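The statistics themselves are simple to compute. The sketch below assumes each detection is a dictionary with pixel coordinates xmin, ymin, xmax, ymax and a score; this format, the 0.5 score threshold and the function name detection_stats are assumptions for illustration, not the message format used by DeepMicroscopy.

```python
from statistics import mean


def detection_stats(detections, min_score=0.5):
    """Summarize detections: count plus average width, height and score.

    Each detection is assumed to be a dict with xmin, ymin, xmax, ymax, score.
    """
    kept = [d for d in detections if d["score"] >= min_score]
    if not kept:
        return {"count": 0, "avg_width": 0.0, "avg_height": 0.0, "avg_score": 0.0}
    return {
        "count": len(kept),
        "avg_width": mean(d["xmax"] - d["xmin"] for d in kept),
        "avg_height": mean(d["ymax"] - d["ymin"] for d in kept),
        "avg_score": mean(d["score"] for d in kept),
    }


if __name__ == "__main__":
    frame_detections = [
        {"xmin": 0, "ymin": 0, "xmax": 10, "ymax": 20, "score": 0.75},
        {"xmin": 10, "ymin": 10, "xmax": 30, "ymax": 20, "score": 0.5},
        {"xmin": 5, "ymin": 5, "xmax": 6, "ymax": 6, "score": 0.25},
    ]
    print(detection_stats(frame_detections))
```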

3. Setup

To start working with the Jetson Nano we have to install the Jetson Nano Developer Kit.

The whole platform runs on Docker and all Dockerfiles are included in the GitHub repository.

Because the Jetson Nano has an aarch64 / arm64 architecture, we need separate images for the Jetson components.

Jetson dockers:
* front – frontend web app
* app – backend web app
* minio – minio storage for aarch64 / arm64 architecture

Training Server dockers:
* serverfront – frontend app
* server – backend app

If you want you can build the images yourself, or you can use the prebuilt images from DockerHub.

The simplest option is to run run.app.sh on the Jetson Nano and run.server.sh on the Training Server, which will set up the whole platform.

Thanks for reading 🙂