Image recognition with standard cloud services – first findings

Author
Hayo Rubingh

We are very excited about working with cognitive cloud services: services based on artificial intelligence that help you build smart applications such as chatbots, authentication by video recognition, or speech-controlled interfaces. We are first taking a look at the standard services offered by various public cloud providers, such as Amazon Web Services, Microsoft, Google and IBM, as well as specialized services such as Clarifai and Cloudsight. Our attention goes to text, speech and image recognition services. In this post, I'm going to share some preliminary findings on automated image recognition; conclusions will follow at a later date.


Our first step was to focus on recognizing objects in photos and to ask whether we're ready to make cloud services more intelligent by teaching them new behaviors. To this end, we developed a test application that has each image analyzed sequentially by six different cloud services. We analyze the results and also measure how quickly each service returns them. And since each service has its own interface (API), we're also gathering relevant information for our developer teams.
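To give an idea of what such a test application does, here is a minimal sketch in Python. It is not our actual test application: the analyzer functions are hypothetical stand-ins, and a real adapter would call the provider's own API at that point. The sketch only shows the sequential loop, the timing, and the fact that one failing service shouldn't stop the run.

```python
import time
from typing import Callable, Dict, List, Tuple

# A tag is a (label, confidence) pair; an analyzer takes raw image bytes
# and returns the tags it found. Both types are assumptions for this sketch.
Tag = Tuple[str, float]
Analyzer = Callable[[bytes], List[Tag]]


def run_comparison(image: bytes, analyzers: Dict[str, Analyzer]) -> Dict[str, dict]:
    """Send one image to each service in turn and record tags and latency."""
    report = {}
    for name, analyze in analyzers.items():
        start = time.perf_counter()
        try:
            tags = analyze(image)
            error = None
        except Exception as exc:  # one failing service should not stop the run
            tags, error = [], str(exc)
        elapsed = time.perf_counter() - start
        report[name] = {"tags": tags, "seconds": elapsed, "error": error}
    return report


# Hypothetical stand-ins; a real adapter would call the provider's API here.
def fake_service_a(image: bytes) -> List[Tag]:
    return [("cow", 0.99), ("grass", 0.98)]


def fake_service_b(image: bytes) -> List[Tag]:
    raise TimeoutError("service did not respond")


report = run_comparison(
    b"fake image bytes",
    {"A": fake_service_a, "B": fake_service_b},
)
for name, result in report.items():
    print(name, f"{result['seconds']:.4f}s", result["tags"], result["error"])
```

Hiding every service behind the same function signature keeps the comparison loop identical for all providers; the differences between the APIs stay inside the adapters.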

This gives us a lot of useful information that helps us make better choices when adopting AI cloud services, and it also lets us build up the necessary implementation experience.

Initial findings

We had the following image analyzed:

[Image: cows]


This yields the following results:

| Microsoft         | IBM                | Google                        | AWS                 | Clarifai              | Cloudsight |
|-------------------|--------------------|-------------------------------|---------------------|-----------------------|------------|
| grass, 0.9999994  | azure color, 0.974 | pasture, 0.9630213            | Animal, 98.73827    | cow, 0.9997515        | see note 3 |
| cow, 0.9999822    | mammal, 0.856      | habitat, 0.9611388            | Cattle, 98.73827    | agriculture, 0.998435 | -          |
| sky, 0.9977474    | animal, 0.856      | dairy cow, 0.9576128          | Cow, 98.73827       | milk, 0.9980802       | -          |
| field, 0.977616   | ruminant, 0.762    | grazing, 0.8915433            | Dairy Cow, 98.73827 | cattle, 0.9974641     | -          |
| animal, 0.8061956 | cattle, 0.725      | cattle like mammal, 0.8588901 | Mammal, 98.73827    | beef cattle, 0.996951 | -          |
| (8 more)          | (5 more)           | (8 more)                      | (5 more)            | (13 more)             | -          |


Note 1: All services except Cloudsight (see Note 3) return a result set of tags describing the objects found in the image.

Note 2: The number that follows each tag is the so-called confidence level: the degree to which the service is certain that the object in question has been detected in the picture. Mind the scale: AWS reports values between 0 and 100, the other services between 0 and 1. According to Clarifai, the confidence level for seeing some milk is 0.9980802, which is very doubtful in my opinion.

Note 3: Cloudsight doesn’t return any tags, but describes the analyzed image in one complete sentence: “herd of cattle at green grass field during daytime”. Impressive, and useful for generating automated captions for pictures, for example. Microsoft's image recognition service is also capable of providing full descriptions. In our case Microsoft returned: “a herd of cattle grazing on a lush green field”.
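Because the services don't all use the same scale or the same capitalization, results have to be normalized before they can be compared. The sketch below does that for the top tag per service from the table above. The function name and the set of percent-scale services are assumptions made for this illustration, based on the values in the table.

```python
# Top tag per service for the cow photo, copied from the table above.
raw_top_tags = {
    "Microsoft": ("grass", 0.9999994),
    "IBM": ("azure color", 0.974),
    "Google": ("pasture", 0.9630213),
    "AWS": ("Animal", 98.73827),
    "Clarifai": ("cow", 0.9997515),
}

# Assumption based on the values in the table: AWS reports 0-100, the rest 0-1.
PERCENT_SCALE_SERVICES = {"AWS"}


def normalize(service: str, label: str, confidence: float):
    """Return a lower-cased label and a confidence on the 0-1 scale."""
    if service in PERCENT_SCALE_SERVICES:
        confidence = confidence / 100.0
    return label.lower(), round(confidence, 4)


normalized = {s: normalize(s, *tag) for s, tag in raw_top_tags.items()}
for service, (label, confidence) in normalized.items():
    print(f"{service:10s} {label:12s} {confidence:.4f}")
```

After this step the AWS value for 'animal' reads 0.9874 and sits on the same scale as the other services.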


What's notable is just how different the returned results can be:

  • There's a huge difference in the number of detected objects. We have seen examples ranging from 3 to 20 items found;
  • The item descriptions are just as divergent. Here's what we get for the item 'cow': “cow, Friesian cattle, dairy cow, cattle, beef cattle”. IBM detects that our cow is Friesian cattle (confidence 0.574), very nice!
  • Confidence levels differ widely as well. In this specific case, IBM is in general a lot less certain than the others, while Microsoft and Clarifai are almost 100% certain that there is a cow in this image.
  • No service detected the cowshed in the upper right corner of the picture.
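The differences listed above can be made measurable by counting how many services return the same tag. The following is a small sketch that uses only the five tags per service shown in the first table (the hidden '(n more)' tags are not included, so the numbers are indicative at best). Labels are compared case-insensitively; the variable names are chosen for this illustration.

```python
from collections import defaultdict

# The five visible tags per service for the cow photo (Cloudsight returns no tags).
visible_tags = {
    "Microsoft": ["grass", "cow", "sky", "field", "animal"],
    "IBM": ["azure color", "mammal", "animal", "ruminant", "cattle"],
    "Google": ["pasture", "habitat", "dairy cow", "grazing", "cattle like mammal"],
    "AWS": ["Animal", "Cattle", "Cow", "Dairy Cow", "Mammal"],
    "Clarifai": ["cow", "agriculture", "milk", "cattle", "beef cattle"],
}

# Map each lower-cased tag to the set of services that returned it.
services_by_tag = defaultdict(set)
for service, tags in visible_tags.items():
    for tag in tags:
        services_by_tag[tag.lower()].add(service)

# Tags returned by more than one service, and the number returned by only one.
shared = {t: sorted(s) for t, s in services_by_tag.items() if len(s) > 1}
unique_count = sum(1 for s in services_by_tag.values() if len(s) == 1)

for tag, services in sorted(shared.items()):
    print(f"{tag:10s} {', '.join(services)}")
print("tags returned by a single service:", unique_count)
```

On these visible rows, no tag is shared by all five tagging services: 'cow', 'animal' and 'cattle' are each returned by three of them, and 12 of the 17 distinct tags come from a single service.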


Another example

A second test with the image below provided further insights. For clarity, we've highlighted the most accurate description (bathroom) in green, and the incorrect evaluations in red.

[Image: bathroom]


| Microsoft           | IBM                 | Google                      | AWS                       | Clarifai                | Cloudsight |
|---------------------|---------------------|-----------------------------|---------------------------|-------------------------|------------|
| wall, 0.9887767     | room, 0.935         | room, 0.9144701             | Bathroom, 95.45831        | bathroom, 0.9996419     | -          |
| indoor, 0.9882076   | bathroom, 0.906     | bathroom, 0.8838118         | Indoors, 95.45831         | faucet, 0.9989328       | -          |
| bathroom, 0.9870275 | gray color, 0.88    | plumbing fixture, 0.6657044 | Interior Design, 81.83018 | contemporary, 0.9988343 | -          |
| room, 0.76094       | beige color, 0.802  | floor, 0.6204514            | Room, 81.83018            | bathtub, 0.9980046      | -          |
| toilet, 0.7345799   | booth, 0.577        | interior design, 0.5699441  | Apartment, 55.53356       | washcloset, 0.9974676   | -          |
| tile, 0.3172223     | closet, 0.577       | apartment, 0.5122291        | Housing, 55.53356         | shower, 0.9965346       | -          |
| tiled, 0.2736091    | shower stall, 0.576 | -                           | Furniture, 51.09834       | family, 0.9932662       | -          |

Full-sentence descriptions:

  • Microsoft: “a bathroom with a sink and a mirror”
  • Cloudsight: “white front load washer and dryer set near vanity sink and shower room”
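One way to read the results above is to check at which position each tagging service lists 'bathroom', the most accurate description. The sketch below does this with the tags in the order the services returned them; the helper name rank_of is made up for this illustration.

```python
# Tags for the bathroom photo, in the order each service returned them.
bathroom_photo_tags = {
    "Microsoft": ["wall", "indoor", "bathroom", "room", "toilet", "tile", "tiled"],
    "IBM": ["room", "bathroom", "gray color", "beige color", "booth", "closet",
            "shower stall"],
    "Google": ["room", "bathroom", "plumbing fixture", "floor", "interior design",
               "apartment"],
    "AWS": ["Bathroom", "Indoors", "Interior Design", "Room", "Apartment",
            "Housing", "Furniture"],
    "Clarifai": ["bathroom", "faucet", "contemporary", "bathtub", "washcloset",
                 "shower", "family"],
}


def rank_of(target, tags):
    """1-based position of target in tags (case-insensitive), or None if absent."""
    lowered = [t.lower() for t in tags]
    return lowered.index(target) + 1 if target in lowered else None


ranks = {service: rank_of("bathroom", tags)
         for service, tags in bathroom_photo_tags.items()}
for service, rank in ranks.items():
    print(f"{service:10s} lists 'bathroom' at position {rank}")
```

AWS and Clarifai put 'bathroom' first, IBM and Google second, and Microsoft third.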


What is striking is the good description that Cloudsight gives of the test photo. Impressive! That said, there are some interesting discussions about the deep learning capabilities of Cloudsight.

To recap: this first test already gives us a lot of insight into the possibilities of the standard cognitive cloud services. As for image analysis, we are now exploring self-learning opportunities. In other words, how can we train a service to further improve the accuracy of its analyses? Ultimately, the goal is optimum accuracy, so that your service offers the best user experience.

Want to stay posted about AI for your industry?

Want to know more, or stay in touch with our cognitive services research? Follow our blog at blog.mirabeau.nl, and if you have specific questions, please get in touch with Edgard Beckand, who leads Mirabeau Labs: ebeckand@mirabeau.nl
