Image classification using Python

Adrián Rodríguez Saínz
5 min read · Apr 8, 2021

Introduction

This blog post is part of the “Turing Machine and Deep Learning with Python” course of Turing Students, a student association of Erasmus University Rotterdam.

We used 3 approaches to classify images according to the emotion they depict.

Dataset

The dataset used contains 32,298 grayscale images, each 48x48 pixels in size. The training set, from which the model learns patterns, contains 28,709 images, while the test set, against which we measure accuracy, contains 3,589 images. Every image has a label attached to it, representing the emotion it illustrates. Overall we have 7 labels: happy, sad, fear, surprise, neutral, angry, and disgust.
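As a sketch of how such a dataset can be loaded, assuming a FER-2013-style CSV with an “emotion” label column, a space-separated “pixels” column, and a “Usage” column marking the train/test split (these names are assumptions about the file format, not necessarily what we used):

```python
import numpy as np
import pandas as pd

# Assumed file name and column layout (FER-2013-style CSV).
data = pd.read_csv("fer2013.csv")

# Parse each space-separated pixel string into a flat vector of 2304 values.
X = np.stack([np.array(s.split(), dtype=np.uint8) for s in data["pixels"]])
y = data["emotion"].to_numpy()

# Split according to the "Usage" column.
train_mask = data["Usage"] == "Training"
X_train, y_train = X[train_mask], y[train_mask]
X_test, y_test = X[~train_mask], y[~train_mask]

print(X_train.shape, X_test.shape)  # e.g. (28709, 2304) (3589, 2304)
```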

Example of pictures with their corresponding labels

The distribution of our data is as such:

Number of pictures per label; the images used for training are under “Trainingimages”, and the images used for testing are under “Testimages”

Models used

To start with, there are 7 potential labels for each picture, which means that if we were to pick randomly, we would have 14.286% accuracy (1/7). We therefore have to come up with a model capable of more than that.

We used 3 algorithms: K-means Clustering, SVM (Support Vector Machine), and CNN (Convolutional Neural Network).

K-means Clustering

Our first method is K-means Clustering, an unsupervised clustering algorithm that allocates data points into clusters based on similarity.

Using an unsupervised method for a supervised task was an interesting way to see their uses firsthand and gather additional understanding about how machine learning models work.

Instead of feeding 7 categories to the model to find, we let the model find the differences in the data and cluster the points into groups, which we can later attribute to our predetermined categories.
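A minimal sketch of this workflow, assuming the flattened X_train/X_test arrays and integer labels from above (placeholder names): each learned cluster is attributed to its majority emotion label, which lets an unsupervised model act as a classifier on the test set.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, homogeneity_score

k = 128
kmeans = KMeans(n_clusters=k, random_state=0).fit(X_train)

# Attribute each cluster to the most common true label among its members.
cluster_to_label = np.zeros(k, dtype=int)
for c in range(k):
    members = y_train[kmeans.labels_ == c]
    if len(members) > 0:
        cluster_to_label[c] = np.bincount(members).argmax()

# Classify test images via their nearest cluster's majority label.
y_pred = cluster_to_label[kmeans.predict(X_test)]

print("inertia:", kmeans.inertia_)
print("homogeneity:", homogeneity_score(y_train, kmeans.labels_))
print("accuracy:", accuracy_score(y_test, y_pred))
```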

Visual representation of K-means Clustering

It was a nice method to use to see how the results change if we play around with the number of clusters.

As we set the number of clusters higher, accuracy and homogeneity (images within the same cluster sharing the same label) increase, while inertia (the sum of squared distances from points to their cluster’s center) decreases.

This makes sense: as the number of clusters increases, the algorithm can form clusters out of data points that are more similar to each other, which naturally shrinks the distances within each cluster, and with them the inertia.

Similarly, accuracy and homogeneity are expected to go up as the number of clusters increases. See the visualizations below.

With k=3, the model creates 3 clusters, which are way too broad and inaccurate
With k=10, the model segments the data into 10 clusters, which are much more accurate, with the points within each cluster being much more similar to each other

Our specific model’s inertia, accuracy and homogeneity with different k values:

Inertia, Homogeneity and Accuracy with k being 7, 10, 32, 64, and 128
Testing the model with our test dataset

The accuracy we reached during testing with 128 clusters was 29.3396%, which is roughly double that of randomly guessing (14.286%).

SVM

Secondly, we used SVM to classify our images. SVM is a supervised machine learning algorithm that separates data points into classes by finding the decision boundaries with the largest margin between them; in our case, the classes are the emotion labels attached to each picture.

In order to make our data suitable for SVM, we need to reduce its dimensionality, as a single image currently consists of 2,304 features (48x48 pixels).

We use PCA to reduce the number of components as much as possible while still explaining most of the variance. We set our goal at 90% explained variance.

We need exactly 104 components to explain 90% of the variance

We plotted explained variance against the number of components, and even though 104 components suffice to explain 90% of the variance, we chose 150 components to have some leeway.
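A sketch of how that threshold can be found with scikit-learn, again assuming the X_train/X_test placeholders from above:

```python
import numpy as np
from sklearn.decomposition import PCA

# Fit a full PCA to inspect the cumulative explained variance curve.
pca = PCA().fit(X_train)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance >= 90%.
n_components = np.argmax(cumulative >= 0.90) + 1
print(n_components)  # 104 in our case

# Refit with some leeway and project the data into the reduced space.
pca = PCA(n_components=150).fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
```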

The PCA components we get are called “eigenfaces”: vectors that, combined, make up a face. In theory, any face can be reconstructed from an “average” face plus a weighted sum of eigenfaces.
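In scikit-learn terms, this “average face plus weighted sum of eigenfaces” is exactly what PCA’s inverse_transform computes. A small sketch using the 150-component pca fitted above:

```python
import numpy as np

# Project one test image onto the 150 eigenfaces to get its weights...
weights = pca.transform(X_test[:1])                 # shape (1, 150)

# ...then rebuild it as the mean face plus the weighted sum of eigenfaces.
reconstruction = pca.mean_ + weights @ pca.components_

# inverse_transform performs the same reconstruction in one call.
assert np.allclose(reconstruction, pca.inverse_transform(weights))

projected_image = reconstruction.reshape(48, 48)    # a "projected" picture
```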

To illustrate the difference between the original pictures and the pictures reconstructed from 150 components, 10 examples are shown below, with the original pictures on top and their respective reconstructions below them:

As you can see, even though the pictures have lost some detail, the main features are still clearly visible

For SVM, we need to choose a C and a gamma: C controls how heavily classification errors are penalized, while gamma controls the reach of individual training points and thus the curvature of the decision boundary.
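A sketch of how such parameters can be tuned with a grid search over the PCA-projected features from above; the grid values here are illustrative assumptions, not our exact search space:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative grid around the values reported below.
param_grid = {"C": [1, 5, 10], "gamma": [0.001, 0.005, 0.01]}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
search.fit(X_train_pca, y_train)

print(search.best_params_)               # e.g. {'C': 5, 'gamma': 0.005}
print(search.score(X_test_pca, y_test))  # accuracy on the test set
```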

Our optimal parameters are C=5 and gamma=0.005; the results are illustrated below:

We reached an average accuracy of 48%, which is significantly better than the high twenties we accomplished with K-means Clustering.

Above is a confusion matrix comparing the labels predicted by our model against the true labels

CNN

The third and last model we used was a CNN. It is a well-known and popular algorithm for image classification tasks, so we had high hopes for this one.

With a CNN, each image passes through several layers: convolutional layers, pooling layers, dropout layers, batch normalization layers, and dense layers.

Structure of our model
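Below is a minimal Keras sketch of such a layer stack; the filter counts and dropout rates are illustrative assumptions, not our exact architecture. X_train_img here stands for the training images reshaped to (n, 48, 48, 1) and scaled to [0, 1].

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),

    # Convolution + batch norm + pooling + dropout blocks.
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),

    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),

    # Dense head over the flattened feature maps.
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),  # one output per emotion label
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train for 100 epochs, tracking validation accuracy to watch for overfitting.
history = model.fit(X_train_img, y_train, epochs=100,
                    validation_data=(X_test_img, y_test))
```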

We trained the model for 100 epochs, and even though we had a straight-up not-fun time with overfitting in the beginning, the final structure reached a validation accuracy of 63.69% thanks to the Dropout layers, which we were really happy with.

Accuracy and Validation Accuracy for each epoch

Conclusion

We ended up with the following results during our testing:

Random choice: 14.286%
K-means Clustering: 29.3396%
SVM: 48%
CNN: 63.69%

CNN has proven to be the most accurate model of the 3, with a result of 63.69%. Our images are grayscale with a resolution of 48x48, which makes distinguishing emotions quite challenging even for humans, so an accuracy above 60% was a delight to accomplish.

I dare you to guess with an accuracy of at least 63.69%

Thanks for reading ❤

Adrian Rodriguez Sainz
