This project aims to segment the skin in images depicting human hands. The solution is based on the analysis of the RGB and HSV color spaces in 4 (four) images (i.e., 1.jpg, 2.jpg, 3.jpg, and 4.jpg) available in the folder `dataset`.

I have decided to create a Python script to define a binary threshold manually based on 2 (two) color spaces. This script converts each pixel of an input image into a row with the RGB values (3 numbers), the HSV values (3 numbers), and the label (0: non-skin or 1: skin). You can use any of the images available in the folder `dataset` to generate the dataset as a text file.
This Python script contains a user interface (see the figure below) to help you segment the human skin. It has 6 (six) trackbars, namely: (1) `RGB <=> HSV`, to choose the RGB or HSV color space; (2) `R <=> H`, to select the Red or Hue values; (3) `G <=> S`, to select the Green or Saturation values; (4) `B <=> V`, to select the Blue or Value values; and (5) `Lower Range` and (6) `Upper Range`, to define the lower and upper boundary parameters used by the function `cv2.inRange()`.
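As a rough illustration, the sketch below shows how such a control panel can be built with OpenCV trackbars. The window name, slider ranges, and the exact semantics of each slider are assumptions here, not the script's actual code.

```python
import cv2

def nothing(_):
    pass  # trackbars require a callback, but we read the positions manually

# Hypothetical control window with the six trackbars described above
cv2.namedWindow("controls")
cv2.createTrackbar("RGB <=> HSV", "controls", 0, 1, nothing)     # 0: RGB, 1: HSV (assumed)
cv2.createTrackbar("R <=> H",     "controls", 0, 1, nothing)     # channel toggle (assumed)
cv2.createTrackbar("G <=> S",     "controls", 0, 1, nothing)     # channel toggle (assumed)
cv2.createTrackbar("B <=> V",     "controls", 0, 1, nothing)     # channel toggle (assumed)
cv2.createTrackbar("Lower Range", "controls", 0, 255, nothing)
cv2.createTrackbar("Upper Range", "controls", 255, 255, nothing)

# The current positions are read back inside the main loop:
use_hsv = cv2.getTrackbarPos("RGB <=> HSV", "controls")
lower   = cv2.getTrackbarPos("Lower Range", "controls")
upper   = cv2.getTrackbarPos("Upper Range", "controls")
```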
Execute the following command to run this script:
```
python 00_manual_segmentation.py --image "datasets/1.jpg"
```
Then, move the trackbars until you define a good mask to segment the user's hand (see the OpenCV windows `binary` and `mask`). When you are satisfied with the final result, press the `s` key on your keyboard to save the text file with the RGB and HSV values. The script saves the text file in the same folder as the input image. Each line of the dataset contains a sequence of numbers representing: R, G, B, H, S, V, label.
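The sketch below illustrates the core of this step: building a binary mask with `cv2.inRange()` and exporting one "R,G,B,H,S,V,label" row per pixel. The boundary values, output delimiter, and file names are assumptions; in the real script the boundaries come from the trackbars.

```python
import cv2
import numpy as np

image = cv2.imread("datasets/1.jpg")             # OpenCV loads images as BGR
hsv   = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Example HSV boundaries (assumed); the script takes them from the trackbars
lower = np.array([0, 48, 80], dtype=np.uint8)
upper = np.array([20, 255, 255], dtype=np.uint8)
mask  = cv2.inRange(hsv, lower, upper)           # 255 where skin, 0 elsewhere

cv2.imshow("mask", mask)
cv2.imshow("binary", cv2.bitwise_and(image, image, mask=mask))

if cv2.waitKey(0) & 0xFF == ord("s"):
    # One row per pixel: R, G, B, H, S, V, label (0: non-skin, 1: skin)
    rgb   = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).reshape(-1, 3)
    label = (mask.reshape(-1) > 0).astype(np.uint8)
    rows  = np.column_stack([rgb, hsv.reshape(-1, 3), label])
    np.savetxt("datasets/1.txt", rows, fmt="%d", delimiter=",")
```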
Info: In my experiments, the best option was to use the HSV color space because I could segment the user's hand and remove the entire background.
I have evaluated whether it is feasible to use the RGB and HSV color spaces to develop a skin color detector. This Python script plots 3 (three) graphs based on: (1) the RGB color space [3D]; (2) the HSV color space [3D]; and (3) RGB+HSV [6D]. Each plot shows the non-skin (red) and skin (blue) classes, and the decision boundary (plane) based on Logistic Regression.
As each dataset has information about 7,990,272 pixels, I have decided to use Pandas to speed up the file reading process. I have also decided to use a random sample of 1,000 pixels to generate and render the graphs using Matplotlib.

Hint: You can change the sample size on line 58 [`sample = dataset.sample(1000, random_state=1)`].
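A minimal sketch of this loading and sampling step is shown below; the column names and the comma delimiter are assumptions about the text file layout.

```python
import pandas as pd

# Assumed column order, matching the "R,G,B,H,S,V,label" rows described above
columns = ["R", "G", "B", "H", "S", "V", "label"]
dataset = pd.read_csv("datasets/1.txt", names=columns)

# Plotting ~8 million points is too slow for Matplotlib, so draw a random
# sample of 1,000 rows (the line quoted in the hint above)
sample = dataset.sample(1000, random_state=1)
```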
I have used dimensionality reduction to plot the RGB+HSV data in 3D. The figure below shows an example of one plot generated with this Python script. It shows that the dataset contains two distinct clusters representing the non-skin and skin pixels. The grey plane represents the decision boundary used to classify the pixels into these two classes. According to my initial evaluation, it is possible to use any of the 3 (three) approaches mentioned above as features to segment the skin color in images depicting human hands.
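The sketch below shows one way to produce such a plot for the RGB+HSV [6D] case: reduce the six features to three dimensions (PCA is assumed here as the dimensionality reduction method), fit a Logistic Regression, and draw the points and the decision plane. File names and column names are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

columns = ["R", "G", "B", "H", "S", "V", "label"]
sample = pd.read_csv("datasets/1.txt", names=columns).sample(1000, random_state=1)

X = sample[["R", "G", "B", "H", "S", "V"]].values
y = sample["label"].values

X3 = PCA(n_components=3).fit_transform(X)        # 6D -> 3D for plotting
clf = LogisticRegression().fit(X3, y)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(X3[y == 0, 0], X3[y == 0, 1], X3[y == 0, 2], c="red",  label="non-skin")
ax.scatter(X3[y == 1, 0], X3[y == 1, 1], X3[y == 1, 2], c="blue", label="skin")

# Decision boundary of the logistic regression drawn as a grey plane
w, b = clf.coef_[0], clf.intercept_[0]
xx, yy = np.meshgrid(np.linspace(X3[:, 0].min(), X3[:, 0].max(), 10),
                     np.linspace(X3[:, 1].min(), X3[:, 1].max(), 10))
zz = -(w[0] * xx + w[1] * yy + b) / w[2]
ax.plot_surface(xx, yy, zz, alpha=0.3, color="grey")

ax.legend()
plt.show()
```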
Execute the following command to run this script:
```
python 01_plot_dataset.py --dataset "datasets/1.txt"
```
I have decided to use 5 (five) different supervised Machine Learning (ML) methods for binary classification, namely: (1) Logistic Regression; (2) K-Nearest Neighbors (KNN); (3) Decision Tree; (4) Support Vector Machine (SVM); and (5) Neural Network (NN) based on a Multi-Layer Perceptron (MLP). I have used the scikit-learn library to create, train, and evaluate the skin color classifiers.
This Python script uses one of the datasets generated with the Python script `00_manual_segmentation.py`. Some ML methods can process all 7,990,272 pixels in a short time. However, other methods (i.e., KNN, SVM) are very slow to process the entire dataset. Therefore, you can change the sample size on line 208.
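As a rough outline of what such a training and evaluation loop can look like with scikit-learn, the sketch below fits the five classifiers on a sample and prints the accuracy and confusion matrix for each. Column names, hyperparameters, and the choice of HSV features are assumptions, not a copy of `02_training_classifiers.py`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

columns = ["R", "G", "B", "H", "S", "V", "label"]
dataset = pd.read_csv("datasets/1.txt", names=columns).sample(100_000, random_state=1)

X = dataset[["H", "S", "V"]].values   # or ["R","G","B"], or all six features
y = dataset["label"].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN":                 KNeighborsClassifier(),
    "Decision Tree":       DecisionTreeClassifier(),
    "SVM":                 SVC(),          # slow on large samples
    "MLP":                 MLPClassifier(max_iter=500),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(name)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
```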
I have evaluated each ML method with the RGB, HSV, and RGB+HSV features. I calculated the accuracy and the confusion matrix for each method, using 70% of the data for training and 30% for testing. The sample in this analysis had 100,000 pixels from the dataset `./datasets/1.txt`. All methods achieved an accuracy higher than 99%. The best one was Decision Tree using the HSV and RGB+HSV color spaces, which achieved 100% accuracy. See below the results of my evaluation:
Accuracy: 0.9991666666666666

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21138 | 13 |
| Class 2 Actual | 12 | 8837 |
Accuracy: 0.9999666666666667

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21150 | 1 |
| Class 2 Actual | 0 | 8849 |
Accuracy: 0.9999

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21148 | 3 |
| Class 2 Actual | 0 | 8849 |
Accuracy: 0.9990666666666667

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21139 | 12 |
| Class 2 Actual | 16 | 8833 |
Accuracy: 0.9997333333333334

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21148 | 3 |
| Class 2 Actual | 5 | 8844 |
Accuracy: 0.9996

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21147 | 4 |
| Class 2 Actual | 8 | 8841 |
Accuracy: 0.9992

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21145 | 6 |
| Class 2 Actual | 18 | 8831 |
Accuracy: 1.0

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21151 | 3 |
| Class 2 Actual | 0 | 8849 |
Accuracy: 1.0

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21151 | 0 |
| Class 2 Actual | 0 | 8849 |
Accuracy: 0.9973666666666666

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21075 | 76 |
| Class 2 Actual | 3 | 8846 |
Accuracy: 0.9947

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 20992 | 159 |
| Class 2 Actual | 0 | 8849 |
Accuracy: 0.9939

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 20969 | 182 |
| Class 2 Actual | 1 | 8848 |
Accuracy: 0.9993

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21138 | 13 |
| Class 2 Actual | 8 | 8841 |
Accuracy: 0.9987333333333334

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21120 | 31 |
| Class 2 Actual | 7 | 8842 |
Accuracy: 0.9992666666666666

Confusion Matrix:

| | Class 1 Predicted | Class 2 Predicted |
|---|---|---|
| Class 1 Actual | 21133 | 18 |
| Class 2 Actual | 4 | 8845 |
Execute the following command to run this script:
```
python 02_training_classifiers.py --dataset "datasets/1.txt"
```
Finally, this Python script detects and segments the user's hand based on the HSV color space. It uses the models trained with the script `02_training_classifiers.py`. Execute the following command to run this script:
```
python 03_human_hand_detector.py --path "models"
```
First, it uses the 4 (four) images from the dataset (i.e., 1.jpg, 2.jpg, 3.jpg, and 4.jpg). The figure below shows the results of Logistic Regression, Decision Tree, SVM, and Multi-Layer Perceptron using the image `3.jpg`. As you can see, the detector is able to segment the user's hand and remove the entire background from the image.
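In essence, the detector classifies every pixel of the image with one of the trained models. The sketch below shows the idea; the use of joblib for model persistence and the model file name are assumptions for illustration.

```python
import cv2
import numpy as np
import joblib  # assumption: models persisted with joblib

model = joblib.load("models/decision_tree.pkl")  # hypothetical file name

image = cv2.imread("datasets/3.jpg")
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Classify each pixel by its HSV values: 0 = non-skin, 1 = skin
pred = model.predict(hsv.reshape(-1, 3))
mask = (pred.reshape(hsv.shape[:2]) * 255).astype(np.uint8)

segmented = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("segmented", segmented)
cv2.waitKey(0)
```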
Hint: Press any key on your keyboard to go to the next image.
After showing all the images from the dataset, the Python script opens the video capture device and uses the detector to segment the hands in the frames from your main webcam. You have to press the `q` key to quit the program.
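A minimal sketch of such a webcam loop is shown below, again assuming a joblib-persisted model with a hypothetical file name.

```python
import cv2
import joblib  # assumption: models persisted with joblib

model = joblib.load("models/decision_tree.pkl")  # hypothetical file name

cap = cv2.VideoCapture(0)                        # main webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    hsv  = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    pred = model.predict(hsv.reshape(-1, 3))
    mask = (pred.reshape(hsv.shape[:2]) * 255).astype("uint8")
    cv2.imshow("skin", cv2.bitwise_and(frame, frame, mask=mask))
    if cv2.waitKey(1) & 0xFF == ord("q"):        # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```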
To improve the skin detector, it is necessary to perform some additional tasks, e.g.:
- Join the features of all images from the dataset into a single dataset (see the sketch after this list)
- Add more images of human hands with different backgrounds and illumination conditions
- Use pre-processing methods (e.g., spatial filtering, mathematical morphology) to remove noise from the input images
- Include infrared images to handle
- Adapt the detector to handle different skin colors
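For the first suggestion, a simple way to merge the per-image text files is sketched below; the folder path, column names, and output file name are assumptions.

```python
import glob
import pandas as pd

# Concatenate all per-image datasets into a single combined dataset
columns = ["R", "G", "B", "H", "S", "V", "label"]
parts = [pd.read_csv(path, names=columns) for path in glob.glob("datasets/*.txt")]
combined = pd.concat(parts, ignore_index=True)
combined.to_csv("datasets/combined.txt", header=False, index=False)
```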