Train your image recognition service

In previous blog posts we looked at the capabilities and technical implementation details of various cloud image recognition services. Their standard classifiers may be unable to detect your own objects in new domains. For those domains we have to create a so-called custom classifier to detect the objects. In this blog post I share some of our experiments with training custom classifiers.

A custom classifier is a model we train ourselves using a data set of images we provide. Our data set is composed of a training set and a test set. The training set contains positive and negative examples of the objects we want to classify. Our test set, on the other hand, contains images of the same objects, but these images are not included in the training set and are only used to measure the quality of the service. Using the training set we create and train a model that we can then use to classify images, in the same way we classify images using the built-in classifiers.
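Constructing the two sets can be as simple as a deterministic shuffle-and-split of your labelled images. Below is a minimal sketch; the file names and the 80/20 split ratio are our own assumptions for illustration, not something any provider prescribes.

```python
import random

def split_dataset(image_paths, test_fraction=0.2, seed=42):
    """Shuffle the labelled images and split them into a training and a test set.

    Images in the test set are never shown to the service during training;
    they are only used afterwards to measure the quality of the classifier.
    """
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)     # seeded shuffle, so the split is reproducible
    n_test = max(1, int(len(paths) * test_fraction))
    return paths[n_test:], paths[:n_test]  # (training set, test set)

# Hypothetical file names, just for illustration
images = [f"cow_{i:02d}.jpg" for i in range(10)]
train, test = split_dataset(images)
```

Keeping the split seeded matters here: if the test images leak into the training set, the quality scores of the service become meaningless.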

Unfortunately, not all cloud service providers support the creation and use of custom classifiers at this moment. We could only test with IBM and Clarifai.

Training set

To create a classifier you have to construct a good training set. IBM's service requires at least 10 images per class and works best with about 50. Clarifai does not have such a requirement, but like IBM it performs better with a larger training set. Our training sets contain images with different lighting conditions, shot from different angles.
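IBM's service accepts the training examples as zip archives, one archive of positive examples per class plus an optional archive of negative examples. The helper below is a sketch of how such an archive can be prepared; the exact upload format and endpoint should be checked against the provider's documentation.

```python
import zipfile
from pathlib import Path

def bundle_examples(image_paths, zip_path):
    """Bundle a list of example images into one zip archive for upload.

    For IBM's service we used one archive of positive examples per class
    (e.g. one per cow) and one shared archive of negative examples.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in image_paths:
            # Store files flat, without their local directory structure
            archive.write(path, arcname=Path(path).name)
    return zip_path
```

A separate archive per class keeps the labelling implicit: the archive name (or the form field it is uploaded under) identifies the class, so the individual images need no tags of their own.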

Test set

To make a fair comparison we have to test all models with the same set of images. Our test set contains the following:


  1. Image of Bontje (light brown and white cow)
  2. Image of Maxima (black and white cow)
  3. Image of Lila (purple and white cow)
  4. Image of Bella (white cow)
  5. Image with all cows
  6. Image with no cows

Comparing results objectively is hard to do manually, because the services return confidence scores rather than a binary correct/incorrect answer. It is therefore important to rank the results using an objective metric that takes this into account. Root Mean Squared Error (RMSE) is a good candidate for comparing various models with the same test set.

In the table below you will find an overview of RMSE scores after creating a custom classifier using our various training sets and predicting the images using our test set. A lower RMSE means a better prediction.

Training set   Clarifai   IBM
set 1          0.47       0.42
set 2          0.46       0.48
set 3          0.44       0.38
set 4          0.44       0.35
set 5          0.43       0.38

In the table below you can view the confidences returned by Clarifai and IBM with a model trained using training set 4 and tested with test set 4.

Cow      Clarifai   IBM    Expected value
Bella    0.24       0.56   1
Bontje   0.29       0.42   0
Lila     0.14       0.04   0
Maxima   0.31       0.04   0
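As a worked example of the metric, here is how RMSE can be computed over the confidence scores in the table above. Note that the RMSE scores in the first table were computed over the full test set, so the numbers below will not match them exactly.

```python
import math

def rmse(predicted, expected):
    """Root Mean Squared Error between predicted confidences and expected values."""
    assert len(predicted) == len(expected)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(predicted))

# Confidences from the table above (Bella, Bontje, Lila, Maxima)
expected = [1, 0, 0, 0]
clarifai = [0.24, 0.29, 0.14, 0.31]
ibm      = [0.56, 0.42, 0.04, 0.04]

# A lower RMSE indicates predictions closer to the expected values
clarifai_rmse = rmse(clarifai, expected)
ibm_rmse = rmse(ibm, expected)
```

On this single test image IBM's confidences land noticeably closer to the expected values than Clarifai's, which matches the overall picture in the RMSE table.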


We can observe that a larger number of pictures, shot from different angles and against various backgrounds, generally leads to a lower RMSE, which means the service performs better at detecting objects. The improvement per added picture becomes smaller once you have a fair amount of pictures. Furthermore, negative examples reduce the confidence for false positives (objects the service reports even though they are not present). IBM seems to outperform Clarifai and is clearly better at classifying objects when using your own classifier (note: based on only one training set).

Custom classifiers definitely allow developers to go beyond what already exists in image recognition, as they can use their own vocabulary and object domains. Creating and training a custom classifier takes little to no coding, and using it to classify images is only slightly different from using a standard classifier. However, a custom classifier does require a carefully composed and tagged training set, as the quality of the training set directly influences the quality of the custom classifier. The results definitely look promising and Mirabeau continues its research in this field. We are curious to see when providers like AWS or Microsoft Azure will offer features to train your own classifier.