How to extract music sources: bass, drums, vocals and other ? – music separation with AI

calculator

In this article I will show how we can extract music sources: bass, drums, vocals and other accompaniments using neural networks.

Before you will continue reading please watch short introduction:

Separation of individual instruments from arranged music is another area where machine learning
algorithms could help. Demucs solves this problem using neural networks.

The trained model (https://arxiv.org/pdf/1909.01174v1.pdf) use U-NET architecture which contains two parts encoder and decoder.
On the encoder input we put the original track and after processing we get bass, drums, vocals and other accompaniments at the decoder output.
The encoder, is connected to the decoder, through additional LSTM layer,
as well as residual connections between subsequent layers.

neural network architecture

Ok, we have neural network architecture but what about the training data ?
This is another difficulty which can be handled by the unlabeled data remixing pipeline.
We start with another classifier, which can find the parts of music, which do not contain the specific instruments, for example drums.
Then, we mix it with well known drums signal, and separate the tracks using the model. 

Now we can compare, the separation results, with known drums track and mixture of other instruments. 

According to this, we can calculate the loss (L1 loss), and use it during the training. 

Additionally, we set different loss weights, for known track and the other. 

training data

The whole UI is kept in the docker image thus you can simply try it:

#for CPU
docker run --name aiaudioseparation -it -p 8000:8000 -v $(pwd)/checkpoints:/root/.cache/torch/hub/checkpoints --rm qooba/aimusicseparation

#for GPU
docker run --name aiaudioseparation --gpus all -it -p 8000:8000 -v $(pwd)/checkpoints:/root/.cache/torch/hub/checkpoints --rm qooba/aimusicseparation

web UI

Unblur low quality face images with AI

clay

In this article I will show how to improve the quality of blurred face images using
artificial intelligence. For this purpose I will use neural networks and FastAI library (ver. 1)

The project code is available on my github: https://github.com/qooba/aiunblur
You can also use ready docker image: https://hub.docker.com/repository/docker/qooba/aiunblur

Before you will continue reading please watch short introduction:

I have based o lot on the fastai course thus I definitely recommend to go through it.

Data

To train neural network how to rebuild the face images we need to provide the
faces dataset which will show how low quality and blurred images should be reconstructed.
Thus we need pairs of low and high quality images.

To prepare the data set we can use available fases dataset eg. FFHQ, Tufts Face Database, CelebA

We will treat the original images as a high resolution data and rescale them
to prepare low resolution input:

import fastai
from fastai.vision import *
from fastai.callbacks import *
from fastai.utils.mem import *
from torchvision.models import vgg16_bn
from pathlib import Path

path = Path('/opt/notebooks/faces')
path_hr = path/'high_resolution'
path_lr = path/'small-96'

il = ImageList.from_folder(path_hr)

def resize_one(fn, i, path, size):
    dest = path/fn.relative_to(path_hr)
    dest.parent.mkdir(parents=True, exist_ok=True)
    img = PIL.Image.open(fn)
    targ_sz = resize_to(img, size, use_min=True)
    img = img.resize(targ_sz, resample=PIL.Image.BILINEAR).convert('RGB')
    img.save(dest, quality=60)

sets = [(path_lr, 96)]
for p,size in sets:
    if not p.exists(): 
        print(f"resizing to {size} into {p}")
        parallel(partial(resize_one, path=p, size=size), il.items)

Now we can create data bunch for training:

bs,size=32,128
arch = models.resnet34
src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.1, seed=42)

def get_data(bs,size):
    data = (src.label_from_func(lambda x: path_hr/x.name)
           .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True)
           .databunch(bs=bs,num_workers=0).normalize(imagenet_stats, do_y=True))

    data.c = 3
    return data

data = get_data(bs,size)

Training

In this solution we will use a neural network with UNET architecture.

neural network architecture

The UNET neural network contains two parts Encoder and Decoder which are used to reconstruct the face image.
During the first stage Encoder fetch the input, extracts and aggregates the image features. At each stage the features maps are donwsampled.
Then Decoder uses extracted features and tries to rebuild the image upsampling it at each decoding stage. Finally we get regenerated images.

Additionally we need to define the Loss Function which will tell the model if the image was rebuilt correctly and allow to train the model.

To do this we will use additional neural network VGG-16. We will put Generated image and Original image (which is our target) to the network input. Then will compare the features extracted for both images at selected layers and according to this calculated the loss.

Finally we will use Adam optmizer to minimize the loss and achieve better result.

def gram_matrix(x):
    n,c,h,w = x.size()
    x = x.view(n, c, -1)
    return (x @ x.transpose(1,2))/(c*h*w)

base_loss = F.l1_loss

vgg_m = vgg16_bn(True).features.cuda().eval()
requires_grad(vgg_m, False)

blocks = [i-1 for i,o in enumerate(children(vgg_m)) if isinstance(o,nn.MaxPool2d)]

class FeatureLoss(nn.Module):
    def __init__(self, m_feat, layer_ids, layer_wgts):
        super().__init__()
        self.m_feat = m_feat
        self.loss_features = [self.m_feat[i] for i in layer_ids]
        self.hooks = hook_outputs(self.loss_features, detach=False)
        self.wgts = layer_wgts
        self.metric_names = ['pixel',] + [f'feat_{i}' for i in range(len(layer_ids))
              ] + [f'gram_{i}' for i in range(len(layer_ids))]

    def make_features(self, x, clone=False):
        self.m_feat(x)
        return [(o.clone() if clone else o) for o in self.hooks.stored]

    def forward(self, input, target):
        out_feat = self.make_features(target, clone=True)
        in_feat = self.make_features(input)
        self.feat_losses = [base_loss(input,target)]
        self.feat_losses += [base_loss(f_in, f_out)*w
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out))*w**2 * 5e3
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.metrics = dict(zip(self.metric_names, self.feat_losses))
        return sum(self.feat_losses)

    def __del__(self): self.hooks.remove()

feat_loss = FeatureLoss(vgg_m, blocks[2:5], [5,15,2])

learn = unet_learner(data, arch, wd=wd, loss_func=feat_loss, callback_fns=LossMetrics,
                     blur=True, norm_type=NormType.Weight)

Results

After training we can use the model to regenerate the images:

results

Application

Finally we can export the model and create the drag and drop application which fix the face images in web application.

results

The whole solution is packed into docker images thus you can simply start it using commands:

# with GPU
docker run -d --gpus all --rm -p 8000:8000 --name aiunblur qooba/aiunblur

# without GPU
docker run -d --rm -p 8000:8000 --name aiunblur qooba/aiunblur

To use GPU additional nvidia drivers (included in the NVIDIA CUDA Toolkit) are needed.

AI Scissors – sharp cut with neural networks

scissors

Cutting photos background is one of the most tedious graphical task. In this article will show how to simplify it using neural networks.

I will use U^2-Net networks which are described in detail in the arxiv article and python library rembg to create ready to use drag and drop web application which you can use running docker image.

The project code is available on my github https://github.com/qooba/aiscissors
You can also use ready docker image: https://hub.docker.com/repository/docker/qooba/aiscissors

Before you will continue reading please watch quick introduction:

Neural network

To correctly remove the image background we need to select the most visually attractive objects in an image which is covered by Salient Object Detection (SOD). To connect a low memory and computation cost with competitive results against state of art methods the novel U^2-Net architecture will be used.

U-Net convolutional networks have characteristic U shape with symmetric encoder-decoder structure. At each encoding stage the feature maps are downsampled (torch.nn.MaxPool2d) and then upsampled at each decoding
stage (torch.nn.functional.upsample). Downsample features are transferred and concatenated with upsample features using residual connections.

U^2-Net network uses two-level nested U-structure where the main architecture is a U-Net like encoder-decoder and each stage contains residual U-block. Each residual U-block repeats donwsampling/upsampling procedures which are also connected using residual connections.

neural network architecture

Nested U-structure extracts and aggregates the features at each level and enables to capture local and global information from shallow and deep layers.

The U^2-Net architecture is precisely described in arxiv article. Moreover we can go through the pytorch model definition of U2NET and U2NETP.

Additionally the authors also shared the pretrained models: U2NET (176.3MB) and U2NETP (4.7 MB).

The lighter U2NETP version is only 4.7 MB thus it can be used in mobile applications.

Web application

The neural network is wrapped with rembg library which automatically download pretrained networks and gives simple python api. To simplify the usage I have decided to create drag and drop web application (https://github.com/qooba/aiscissors)

In the application you can drag and the drop the image and then compare image with and without background side by side.

web application

You can simply run the application using docker image:

docker run --name aiscissors -d -p 8000:8000 --rm -v $(pwd)/u2net_models:/root/.u2net qooba/aiscissors 

if you have GPU card you can use it:

docker run --gpus all  --name aiscissors -d -p 8000:8000 --rm -v $(pwd)/u2net_models:/root/.u2net qooba/aiscissors 

To use GPU additional nvidia drivers (included in the NVIDIA CUDA Toolkit) are needed.

When you run the container the pretrained models are downloaded thus I have mount local directory u2net_models to /root/.u2net to avoid download each time I run the container.

References

https://arxiv.org/pdf/2005.09007.pdf

https://github.com/NathanUA/U-2-Net

https://github.com/danielgatis/rembg

U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection, Qin, Xuebin and Zhang, Zichen and Huang, Chenyang and Dehghan, Masood and Zaiane, Osmar and Jagersand, Martin Pattern Recognition 106 107404 (2020)

FastAI with TensorRT on Jetson Nano

DIV

IoT and AI are the hottest topics nowadays which can meet on Jetson Nano device.
In this article I’d like to show how to use FastAI library, which is built on the top of the PyTorch on Jetson Nano. Additionally I will show how to optimize the FastAI model for the usage with TensorRT.

You can find the code on https://github.com/qooba/fastai-tensorrt-jetson.git.

1. Training

Although the Jetson Nano is equipped with the GPU it should be used as a inference device rather than for training purposes. Thus I will use another PC with the GTX 1050 Ti for the training.

Docker gives flexibility when you want to try different libraries thus I will use the image which contains the complete environment.

Training environment Dockerfile:

FROM nvcr.io/nvidia/tensorrt:20.01-py3
WORKDIR /
RUN apt-get update && apt-get -yq install python3-pil
RUN pip3 install jupyterlab torch torchvision
RUN pip3 install fastai
RUN DEBIAN_FRONTEND=noninteractive && apt update && apt install curl git cmake ack g++ tmux -yq
RUN pip3 install ipywidgets && jupyter nbextension enable --py widgetsnbextension
CMD ["sh","-c", "jupyter lab --notebook-dir=/opt/notebooks --ip='0.0.0.0' --port=8888 --no-browser --allow-root --NotebookApp.password='' --NotebookApp.token=''"]

To use GPU additional nvidia drivers (included in the NVIDIA CUDA Toolkit) are needed.

If you don’t want to build your image simply run:

docker run --gpus all  --name jupyter -d --rm -p 8888:8888 -v $(pwd)/docker/gpu/notebooks:/opt/notebooks qooba/fastai:1.0.60-gpu

Now you can use pets.ipynb notebook (the code is taken from lesson 1 FastAI course) to train and export pets classification model.

from fastai.vision import *
from fastai.metrics import error_rate

# download dataset
path = untar_data(URLs.PETS)
path_anno = path/'annotations'
path_img = path/'images'
fnames = get_image_files(path_img)

# prepare data 
np.random.seed(2)
pat = r'/([^/]+)_\d+.jpg$'
bs = 16
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)

# prepare model learner
learn = cnn_learner(data, models.resnet34, metrics=error_rate)

# train 
learn.fit_one_cycle(4)

# export
learn.export('/opt/notebooks/export.pkl')

Finally you get pickled pets model (export.pkl).

2. Inference (Jetson Nano)

The Jetson Nano device with Jetson Nano Developer Kit already comes with the docker thus I will use it to setup the inference environment.

I have used the base image nvcr.io/nvidia/l4t-base:r32.2.1 and installed the pytorch and torchvision.
If you have JetPack 4.4 Developer Preview you can skip this steps and start with the base image nvcr.io/nvidia/l4t-pytorch:r32.4.2-pth1.5-py3.

The FastAI installation on Jetson is more problematic because of the blis package. Finally I have found the solution here.

Additionally I have installed torch2trt package which converts PyTorch model to TensorRT.

Finally I have used the tensorrt from the JetPack which can be found in
/usr/lib/python3.6/dist-packages/tensorrt .

The final Dockerfile is:

FROM nvcr.io/nvidia/l4t-base:r32.2.1
WORKDIR /
# install pytorch 
RUN apt update && apt install -y --fix-missing make g++ python3-pip libopenblas-base
RUN wget https://nvidia.box.com/shared/static/ncgzus5o23uck9i5oth2n8n06k340l6k.whl -O torch-1.4.0-cp36-cp36m-linux_aarch64.whl
RUN pip3 install Cython
RUN pip3 install numpy torch-1.4.0-cp36-cp36m-linux_aarch64.whl
# install torchvision
RUN apt update && apt install libjpeg-dev zlib1g-dev git libopenmpi-dev openmpi-bin -yq
RUN git clone --branch v0.5.0 https://github.com/pytorch/vision torchvision
RUN cd torchvision && python3 setup.py install
# install fastai
RUN pip3 install jupyterlab
ENV TZ=Europe/Warsaw
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone && apt update && apt -yq install npm nodejs python3-pil python3-opencv
RUN apt update && apt -yq install python3-matplotlib
RUN git clone https://github.com/NVIDIA-AI-IOT/torch2trt.git /torch2trt && mv /torch2trt/torch2trt /usr/local/lib/python3.6/dist-packages && rm -r /torch2trt
COPY tensorrt /usr/lib/python3.6/dist-packages/tensorrt
RUN pip3 install --no-deps fastai
RUN git clone https://github.com/fastai/fastai /fastai
RUN apt update && apt install libblas3 liblapack3 liblapack-dev libblas-dev gfortran -yq
RUN curl -LO https://github.com/explosion/cython-blis/files/3566013/blis-0.4.0-cp36-cp36m-linux_aarch64.whl.zip && unzip blis-0.4.0-cp36-cp36m-linux_aarch64.whl.zip && rm blis-0.4.0-cp36-cp36m-linux_aarch64.whl.zip
COPY blis-0.4.0-cp36-cp36m-linux_aarch64.whl .
RUN pip3 install scipy pandas blis-0.4.0-cp36-cp36m-linux_aarch64.whl spacy fastai scikit-learn
CMD ["sh","-c", "jupyter lab --notebook-dir=/opt/notebooks --ip='0.0.0.0' --port=8888 --no-browser --allow-root --NotebookApp.password='' --NotebookApp.token=''"]

As before you can skip the docker image build and use ready image:

docker run --runtime nvidia --network app_default --name jupyter -d --rm -p 8888:8888 -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix -v $(pwd)/docker/jetson/notebooks:/opt/notebooks qooba/fastai:1.0.60-jetson

Now we can open jupyter notebook on jetson and move pickled model file export.pkl from PC.
The notebook jetson_pets.ipynb show how to load the model.

import torch
from torch2trt import torch2trt
from fastai.vision import *
from fastai.metrics import error_rate

learn = load_learner('/opt/notebooks/')
learn.model.eval()
model=learn.model

if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

Additionally we can optimize the model using torch2trt package:

x = torch.ones((1, 3, 224, 224)).cuda()
model_trt = torch2trt(learn.model, [x])

Let’s prepare example input data:

import urllib
url, filename = ("https://github.com/pytorch/hub/raw/master/dog.jpg", "dog.jpg")
try: urllib.URLopener().retrieve(url, filename)
except: urllib.request.urlretrieve(url, filename)

from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)

Finally we can run prediction for PyTorch and TensorRT model:

x=input_batch
y = model(x)
y_trt = model_trt(x)

and compare PyTorch and TensorRT performance:

def prediction_time(model, x):
    import time
    times = []
    for i in range(20):
        start_time = time.time()
        y_trt = model(x)

        delta = (time.time() - start_time)
        times.append(delta)
    mean_delta = np.array(times).mean()
    fps = 1/mean_delta
    print('average(sec):{},fps:{}'.format(mean_delta,fps))

prediction_time(model,x)
prediction_time(model_trt,x)

where for:
* PyTorch – average(sec):0.0446, fps:22.401
* TensorRT – average(sec):0.0094, fps:106.780

The TensorRT model is almost 5 times faster thus it is worth to use torch2trt.

References

[1] Top image DrZoltan from Pixabay