Toxicless texts with AI – how to measure text toxicity in the browser


In this article I will show how to measure comment toxicity using machine learning models.

Before you continue reading, please watch this short introduction:

https://www.youtube.com/watch?v=AECV2qa0Kaw

Hateful, rude and toxic comments are a common problem on the internet that affects many people. Today we will prepare a neural network that detects comment toxicity directly in the browser. The goal is to create a solution that detects toxicity in real time and warns the user while they are typing, which can discourage them from writing toxic comments.

To do this, we will train a TensorFlow Lite model that will run in the browser using the WebAssembly backend. WebAssembly (WASM) allows running C, C++ or Rust code at near-native speed. Thanks to this, prediction performance will be better than with the plain JavaScript TensorFlow.js backend.
Moreover, we can serve the model from a static page, with no additional backend servers required.

[image: web assembly]

To train the model, we will use the Kaggle Toxic Comment Classification Challenge training data,
which contains comments labeled with the following toxicity types:
* toxic
* severe_toxic
* obscene
* threat
* insult
* identity_hate

[image: data set]

Our model will only classify whether the text is toxic or not, so we need to start by preprocessing the training data. Then we will use the TensorFlow Lite Model Maker library.
We will use the Averaging Word Embedding specification, which builds the word embeddings and the dictionary mapping from the training data itself, so we can train the model for different languages.
A model based on the Averaging Word Embedding specification is small (<1 MB).
If we have a small dataset, we can instead use pretrained embeddings by choosing the MobileBERT or BERT-Base specification.
In that case the models are much bigger: about 25 MB with quantization or 100 MB without quantization for
MobileBERT, and 300 MB for BERT-Base (based on: https://www.tensorflow.org/lite/tutorials/model_maker_text_classification#choose_a_model_architecture_for_text_classifier).
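
Below is a minimal sketch of these steps, assuming the Kaggle train.csv file is available locally; the intermediate file names and the epochs value are illustrative, not the exact settings used in the linked notebook.

import pandas as pd
from tflite_model_maker import model_spec, text_classifier
from tflite_model_maker.text_classifier import DataLoader

# Collapse the six toxicity types into a single binary label: 1 = toxic, 0 = non-toxic.
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
df = pd.read_csv("train.csv")
df["label"] = (df[labels].sum(axis=1) > 0).astype(int)
df[["comment_text", "label"]].to_csv("train_binary.csv", index=False)

# Averaging Word Embedding specification: the embeddings and the vocabulary
# are built from the training data itself.
spec = model_spec.get("average_word_vec")

train_data = DataLoader.from_csv(
    filename="train_binary.csv",
    text_column="comment_text",
    label_column="label",
    model_spec=spec,
    is_training=True)

model = text_classifier.create(train_data, model_spec=spec, epochs=10)
model.export(export_dir="export/")  # produces the model.tflite file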

[image: training]

Using a simple model architecture (Averaging Word Embedding), we can achieve about 95% accuracy and a small model size,
appropriate for the web browser and WebAssembly.
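
Continuing the sketch above, this can be checked on a held-out split prepared the same way as train_binary.csv (the test_binary.csv file name is an assumption):

test_data = DataLoader.from_csv(
    filename="test_binary.csv",
    text_column="comment_text",
    label_column="label",
    model_spec=spec,
    is_training=False)

loss, accuracy = model.evaluate(test_data)  # accuracy should land around 0.95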

[image: tensorflow lite]

Now let's prepare the non-toxic forum web application, where we can write comments.
Non-toxic comments won't be blocked by the model.
On the other hand, toxic comments will be blocked and the user warned.

Of course, this is only client-side validation, which can discourage users from writing toxic comments.

[image: web application]

To run the example, simply clone the git repository and start a simple server to serve the static page:

git clone https://github.com/qooba/ai-toxicless-texts.git
cd ai-toxicless-texts
python3 -m http.server

The code for preparing the data, training and exporting the model is here:
https://github.com/qooba/ai-toxicless-texts/blob/master/Model_Maker_Toxicity.ipynb

TinyMLOps with Arduino


In this article I will show how to build an MLOps process for TinyML on the Arduino Nano 33 BLE Sense.

Before you continue reading, please watch this short introduction:

In the last article (TinyML with Arduino) I showed an example TinyML model which classifies
jelly bears using the RGB sensor.
The next step is to build a process that simplifies model version management and deployment.

[image: mlops]

The MLflow project is prepared in a Jupyter Notebook. Then we can convert the notebook to Python code using the nbdev library and version it in a Git repository.

Now we are ready to run the MLflow project using the command:

mlflow run https://git_repository.git#path --no-conda --experiment-name="arduino"

The model is saved in the MLflow registry and the model version is associated with
the git commit version.
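
As a rough sketch of what happens later on the deployment side, the artifacts logged for a given run can be fetched by run id from the tracking server; the URL and the run id placeholder below are examples, not values from this project:

import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://mlflow:5000")
client = MlflowClient()
# Download all artifacts (artifacts.ino, model.h, requirements.ino.txt) logged for the run.
local_dir = client.download_artifacts(run_id="<RUN_ID>", path="")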

[image: mlops git]

The MLflow model contains additional artifacts:
* artifacts.ino – the Arduino code which loads and uses the model
* model.h – the TensorFlow Lite model encoded to hex
* requirements.ino.txt – the list of Arduino dependencies required by the Arduino code

Example requirements.ino.txt file:

Arduino_TensorFlowLite@2.4.0-ALPHA
Arduino_APDS9960@1.0.3
Arduino_HTS221@1.0.0
Arduino_LPS22HB@1.0.1
Arduino_LSM9DS1@1.1.0
arduinoFFT@1.5.6
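
The model.h artifact can be produced by hex-encoding the exported TensorFlow Lite model, similar to xxd -i. A minimal sketch follows; the file and variable names are assumptions, not the exact script used in this project:

def tflite_to_header(tflite_path, header_path, var_name="model"):
    # Read the binary .tflite flatbuffer and emit it as a C byte array.
    with open(tflite_path, "rb") as f:
        data = f.read()
    lines = [f"const unsigned char {var_name}[] = {{"]
    for i in range(0, len(data), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        lines.append(f"  {chunk},")
    lines.append("};")
    lines.append(f"const int {var_name}_len = {len(data)};")
    with open(header_path, "w") as f:
        f.write("\n".join(lines) + "\n")

tflite_to_header("model.tflite", "model.h")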

[image: mlops arduino]

Finally we can run the command:

docker run -it --network app_default --device=/dev/ttyACM0:/dev/ttyACM0 -e AWS_ACCESS_KEY_ID=minio -e AWS_SECRET_ACCESS_KEY=minio123 -e MLFLOW_S3_ENDPOINT_URL=http://minio:9000 -e MLFLOW_TRACKING_URI=http://mlflow:5000 qooba/tinyml-arduino:mlops ./mlops.sh -r ${RUN_ID}

where:
* --device=/dev/ttyACM0 – the Arduino device connected over USB
* AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY – the MinIO credentials
* MLFLOW_S3_ENDPOINT_URL – the MinIO URL
* MLFLOW_TRACKING_URI – the MLflow URL
* ${RUN_ID} – the run id of the model saved in the MLflow registry

Additionally we have several command line options:

ARDUINO MLOPS

Syntax: docker run -it qooba/tinyml-arduino:mlops -h [-r MLFLOW_RUN_ID] [-s ARDUINO_SERIAL] [-c ARDUINO_CORE] [-m ARDUINO_MODEL]
options:
-h|--help     Print help
-r|--run      MLflow run id
-s|--serial   Arduino device serial (default: /dev/ttyACM0)
-c|--core     Arduino core (default: arduino:mbed_nano)
-m|--model    Arduino model (default: arduino:mbed_nano:nano33ble)

[image: arduino docker]

After running this command, the docker image qooba/tinyml-arduino:mlops
will fetch the model for the indicated RUN_ID from MLflow.
Then it will install the required dependencies listed in requirements.ino.txt.

It will compile the model together with the Arduino code,
and finally upload it to the device.

Thanks to this, we can more easily manage subsequent versions of models, and automate the deployment process.

TinyML with Arduino


In this article I will show how to build a TensorFlow Lite based jelly bear classifier using the Arduino Nano 33 BLE Sense.

Before you continue reading, please watch this short introduction:

Currently, machine learning solutions can be deployed not only on very powerful machines with GPU cards but also on really small devices. Of course, such devices have limitations, e.g. memory. To deploy an ML model we need to prepare it: the TensorFlow framework allows you to convert neural networks to TensorFlow Lite, which can be installed on edge devices such as the Arduino Nano.

The Arduino Nano 33 BLE Sense is equipped with many sensors that enable a wide range of projects, e.g.:
* Digital microphone
* Digital proximity, ambient light, RGB and gesture sensor
* 3D magnetometer, 3D accelerometer, 3D gyroscope
* Capacitive digital sensor for relative humidity and temperature

The examples I have used in this project can be found here.

[image: Arduino sensors]

To simplify device usage, I have built the Arduino Lab project, where you can test and investigate the listed sensors directly in the web browser.

The project dependencies are packed into a docker image to simplify usage.

Before you start the project, you will need to connect the Arduino through USB (the Arduino will communicate with the docker container through /dev/ttyACM0):

git clone https://github.com/qooba/tinyml-arduino.git
cd tinyml-arduino
./run.server.sh
# in another terminal tab
./run.nginx.sh
# go inside server container 
docker exec -it arduino /bin/bash
./start.sh

For each sensor type you can click the Prepare button, which will build and deploy the appropriate Arduino code.


NOTE:
Sometimes you will have to deploy to the Arduino manually; to do this you will need to
go into the arduino container:

docker exec -it arduino /bin/bash
cd /arduino
make rgb

Here you have the complete Makefile with all implemented sensor types.


You can start observations using the Watch button.
[images: Arduino PDM, temperature and RGB readings]

Now we will build the TinyML solution.
In the first step we will capture the training data:
[image: Arduino capture]

The training data will be saved in CSV format. You will need to repeat the process for each class you want to detect.
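
A minimal capture sketch in Python, assuming the Arduino code prints comma-separated red, green and blue ratios over serial; the device path, baud rate, sample count and class name below are assumptions, not the exact setup used in the project:

import csv
import serial  # pyserial

CLASS_NAME = "red_bear"  # repeat the capture for each class you want to detect
SAMPLES = 200

with serial.Serial("/dev/ttyACM0", 9600, timeout=1) as port:
    with open(f"{CLASS_NAME}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["red", "green", "blue", "class"])
        count = 0
        while count < SAMPLES:
            line = port.readline().decode("utf-8", errors="ignore").strip()
            values = line.split(",")
            if len(values) == 3:
                writer.writerow(values + [CLASS_NAME])
                count += 1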

The captured data will be uploaded to a Colab notebook.
Here I fully rely on the project Fruit identification using Arduino and TensorFlow.
In the notebook we train the model using TensorFlow, then convert it to TensorFlow Lite and finally encode it in hex format (the model.h header file), which is readable by the Arduino.
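
A simplified stand-in for the conversion step, assuming a small dense network over the three RGB ratios; the actual architecture and training are in the linked notebook:

import tensorflow as tf

# Illustrative tiny classifier: 3 RGB ratios in, 3 jelly bear classes out.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
# (In the notebook the model is trained on the captured CSV data before this step.)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# The .tflite bytes are then hex-encoded into the model.h header,
# e.g. with: xxd -i model.tflite > model.h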

Now we compile and upload the model.h header file using the drag-and-drop mechanism.

[image: Arduino upload]

Finally, we can classify the jelly bears by color:

[image: Arduino classify]