Image recognition with standard cloud services – first findings
We are very excited about working with cognitive cloud services: services, based on artificial intelligence, that help you build smart applications such as chatbots, authentication through video recognition or speech-controlled interfaces. We will first look at the standard services offered by public cloud providers such as Amazon Web Services, Microsoft, Google and IBM, as well as specialized services such as Clarifai and Cloudsight. We focus on text, speech and image recognition services. In this post, I share a number of preliminary findings on automated image recognition; conclusions will follow at a later date.
Our first step was to focus on recognizing objects in photos, and to ask whether we are ready to make cloud services more intelligent by teaching them new behaviors. To this end, we developed a test application that sends each image to six different cloud services in sequence. We analyze the results and, at the same time, measure how quickly each service returns them. Since each service has its own interface (API), we are also gathering relevant information for our developer teams.
This gives us a wealth of useful information that helps us make better choices about adopting AI cloud services, and also lets us gain the necessary implementation experience.
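The core loop of such a test application can be sketched roughly as follows. This is a minimal Python sketch, not our actual implementation: the `Service` callable type and the two `fake_service_*` stubs are hypothetical stand-ins for the real SDK clients, each of which has its own API. Each service is called in turn, its wall-clock latency is measured with `time.perf_counter`, and a failing service is recorded rather than aborting the run.

```python
import time
from typing import Callable, Dict, List, Tuple

# A "service" is any callable that takes raw image bytes and returns a list
# of (label, confidence) tags. Real clients would wrap each provider's SDK;
# the stubs below are hypothetical placeholders for local testing.
Tag = Tuple[str, float]
Service = Callable[[bytes], List[Tag]]


def analyze_sequentially(image: bytes, services: Dict[str, Service]) -> Dict[str, dict]:
    """Send one image to each service in turn; record tags, latency and errors."""
    results: Dict[str, dict] = {}
    for name, analyze in services.items():
        start = time.perf_counter()
        try:
            tags = analyze(image)
            error = None
        except Exception as exc:  # one failing service should not stop the run
            tags, error = [], str(exc)
        elapsed = time.perf_counter() - start
        results[name] = {"tags": tags, "seconds": elapsed, "error": error}
    return results


# Hypothetical stand-ins for real service clients.
def fake_service_a(image: bytes) -> List[Tag]:
    return [("cow", 0.99), ("grass", 0.95)]


def fake_service_b(image: bytes) -> List[Tag]:
    raise TimeoutError("service unavailable")


if __name__ == "__main__":
    report = analyze_sequentially(b"image bytes", {"A": fake_service_a, "B": fake_service_b})
    for name, result in report.items():
        print(name, result["tags"], f"{result['seconds']:.4f}s", result["error"])
```

Calling the services one after another keeps the timing measurements independent of each other; a parallel version would be faster but would make per-service latencies harder to compare.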
Initial findings
We had the following image analyzed:
This yields the following results:
Microsoft | IBM | Google | AWS | Clarifai | Cloudsight
---|---|---|---|---|---
grass, 0.9999994 | azure color, 0.974 | pasture, 0.9630213 | Animal, 98.73827 | cow, 0.9997515 | see Note 3
cow, 0.9999822 | mammal, 0.856 | habitat, 0.9611388 | Cattle, 98.73827 | agriculture, 0.998435 | -
sky, 0.9977474 | animal, 0.856 | dairy cow, 0.9576128 | Cow, 98.73827 | milk, 0.9980802 | -
field, 0.977616 | ruminant, 0.762 | grazing, 0.8915433 | Dairy Cow, 98.73827 | cattle, 0.9974641 | -
animal, 0.8061956 | cattle, 0.725 | cattle like mammal, 0.8588901 | Mammal, 98.73827 | beef cattle, 0.996951 | -
(8 more) | (5 more) | (8 more) | (5 more) | (13 more) | -
Note 1: All services except Cloudsight (see Note 3) return a result set of tags describing the objects found in the image.
Note 2: The number after each tag is the so-called confidence level: the degree to which the service is certain that the object in question appears in the picture. AWS reports it as a percentage (0 to 100); the other tag-based services use a scale from 0 to 1. According to Clarifai, the confidence level for seeing milk is 0.9980802, which is very doubtful in my opinion.
Note 3: Cloudsight doesn't return object tags; instead, it describes the analyzed image in one complete sentence: “herd of cattle at green grass field during daytime”. Impressive, and useful for generating automated captions for pictures, for example. Microsoft's image recognition service can also provide full descriptions; in our case it returned “a herd of cattle grazing on a lush green field”.
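Because AWS reports confidence as a percentage while the other tag-based services use a 0 to 1 scale, the results have to be normalized before they can be compared or filtered with a common threshold. A minimal Python sketch of that step follows; the function names and the `PERCENT_SCALE_SERVICES` set are our own illustrative choices, and Cloudsight's caption is left out because it carries no scores.

```python
from typing import List, Tuple

Tag = Tuple[str, float]

# Assumption: AWS reports confidence as a percentage (0-100), while the other
# tag-based services in our test use a 0-1 scale.
PERCENT_SCALE_SERVICES = {"AWS"}


def normalize(service: str, tags: List[Tag]) -> List[Tag]:
    """Lowercase the labels and put every confidence on a 0-1 scale."""
    factor = 100.0 if service in PERCENT_SCALE_SERVICES else 1.0
    return [(label.lower(), score / factor) for label, score in tags]


def above_threshold(tags: List[Tag], threshold: float) -> List[Tag]:
    """Keep only the tags a service is at least `threshold` sure about."""
    return [(label, score) for label, score in tags if score >= threshold]


# Top five tags for the cow image, copied from the first results table.
aws = normalize("AWS", [("Animal", 98.73827), ("Cattle", 98.73827), ("Cow", 98.73827),
                        ("Dairy Cow", 98.73827), ("Mammal", 98.73827)])
ibm = normalize("IBM", [("azure color", 0.974), ("mammal", 0.856), ("animal", 0.856),
                        ("ruminant", 0.762), ("cattle", 0.725)])

# At a threshold of 0.8, AWS keeps all five tags but IBM keeps only three.
print(len(above_threshold(aws, 0.8)), above_threshold(ibm, 0.8))
```

Note that a single threshold does not mean the same thing for every service: each vendor calibrates its scores differently, so a fixed cut-off is only a rough first filter.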
What's notable is just how different the returned results can be:
- There's a huge difference in the number of detected objects: we have seen examples ranging from 3 to 20 items.
- The item descriptions are just as divergent. Here's what we get for the item 'cow': “cow, Friesian cattle, dairy cow, cattle, beef cattle”. IBM detects that our cow is Friesian cattle (confidence 0.574). Very nice!
- Confidence levels also differ widely in this example. IBM is a lot less certain than the others here, while Microsoft and Clarifai are almost 100% certain that there is a cow in the image.
- No service detected the cowshed in the upper right corner of the picture.
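To compare services despite their divergent labels, one option is to map synonyms onto a single concept and take each service's most confident matching tag. The sketch below assumes a hand-written, hypothetical synonym set built from the labels listed above (including IBM's "Friesian cattle", which sits outside its visible top five) and uses the top five tags of four of the services from the first table, with the AWS percentages divided by 100.

```python
from typing import Dict, List, Optional, Set, Tuple

Tag = Tuple[str, float]

# Hypothetical synonym set: the labels the services used for our cow.
COW_SYNONYMS: Set[str] = {"cow", "friesian cattle", "dairy cow", "cattle", "beef cattle"}


def best_match(tags: List[Tag], synonyms: Set[str]) -> Optional[Tag]:
    """Highest-confidence tag whose lowercased label is a synonym, or None."""
    matches = [(label, score) for label, score in tags if label.lower() in synonyms]
    return max(matches, key=lambda tag: tag[1], default=None)


# Top five tags per service (0-1 scale; AWS percentages divided by 100).
results: Dict[str, List[Tag]] = {
    "Microsoft": [("grass", 0.9999994), ("cow", 0.9999822), ("sky", 0.9977474),
                  ("field", 0.977616), ("animal", 0.8061956)],
    "IBM": [("azure color", 0.974), ("mammal", 0.856), ("animal", 0.856),
            ("ruminant", 0.762), ("cattle", 0.725)],
    "AWS": [("Animal", 0.9873827), ("Cattle", 0.9873827), ("Cow", 0.9873827),
            ("Dairy Cow", 0.9873827), ("Mammal", 0.9873827)],
    "Clarifai": [("cow", 0.9997515), ("agriculture", 0.998435), ("milk", 0.9980802),
                 ("cattle", 0.9974641), ("beef cattle", 0.996951)],
}

for service, tags in results.items():
    print(service, best_match(tags, COW_SYNONYMS))
```

IBM's best match comes out at 0.725 against 0.98 or higher for the other three, which reflects the confidence gap described above. A fixed synonym list does not scale well; a shared taxonomy or word embeddings would be more robust, but that goes beyond this first test.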
Another example
A second test, with the image below, provided further insights. For clarity, we've highlighted the most accurate description (bathroom) in green and the incorrect evaluations in red.
Microsoft | IBM | Google | AWS | Clarifai | Cloudsight
---|---|---|---|---|---
wall, 0.9887767 | room, 0.935 | room, 0.9144701 | Bathroom, 95.45831 | bathroom, 0.9996419 | -
indoor, 0.9882076 | bathroom, 0.906 | bathroom, 0.8838118 | Indoors, 95.45831 | faucet, 0.9989328 | -
bathroom, 0.9870275 | gray color, 0.88 | plumbing fixture, 0.6657044 | Interior Design, 81.83018 | contemporary, 0.9988343 | -
room, 0.76094 | beige color, 0.802 | floor, 0.6204514 | Room, 81.83018 | bathtub, 0.9980046 | -
toilet, 0.7345799 | booth, 0.577 | interior design, 0.5699441 | Apartment, 55.53356 | washcloset, 0.9974676 | -
tile, 0.3172223 | closet, 0.577 | apartment, 0.5122291 | Housing, 55.53356 | shower, 0.9965346 | -
tiled, 0.2736091 | shower stall, 0.576 | - | Furniture, 51.09834 | family, 0.9932662 | -
- | - | - | - | - | -
a bathroom with a sink and a mirror | - | - | - | - | white front load washer and dryer set near vanity sink and shower room
What is striking is how well Cloudsight describes the test photo. Impressive! That said, there are some interesting discussions about the deep learning capabilities of Cloudsight.
To sum up: this first test already gives us a lot of insight into the possibilities of standard cognitive cloud services. For image analysis, we are now exploring self-learning opportunities; in other words, how can we train a service to further improve the accuracy of its analyses? Ultimately, the goal is optimum accuracy, to offer the best possible user experience of your service.
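One simple way to score this second test is to check where the expected label, bathroom, ends up in each service's ranked tag list, and whether the one-sentence descriptions mention it. The sketch below is an illustration under our own assumptions: the scoring rule (1-based rank, exact word match) is not a standard metric, and the tag lists are the labels of four of the services from the bathroom table, in the order returned (highest confidence first).

```python
from typing import Dict, List, Optional


def rank_of(expected: str, labels: List[str]) -> Optional[int]:
    """1-based position of `expected` in a ranked label list, or None if absent."""
    lowered = [label.lower() for label in labels]
    return lowered.index(expected) + 1 if expected in lowered else None


def caption_mentions(expected: str, caption: str) -> bool:
    """True if a one-sentence description contains the expected word exactly."""
    return expected in caption.lower().split()


tags: Dict[str, List[str]] = {
    "Microsoft": ["wall", "indoor", "bathroom", "room", "toilet", "tile", "tiled"],
    "IBM": ["room", "bathroom", "gray color", "beige color", "booth", "closet", "shower stall"],
    "AWS": ["Bathroom", "Indoors", "Interior Design", "Room", "Apartment", "Housing", "Furniture"],
    "Clarifai": ["bathroom", "faucet", "contemporary", "bathtub", "washcloset", "shower", "family"],
}
captions: Dict[str, str] = {
    "Microsoft": "a bathroom with a sink and a mirror",
    "Cloudsight": "white front load washer and dryer set near vanity sink and shower room",
}

for service, labels in tags.items():
    print(service, rank_of("bathroom", labels))
for service, caption in captions.items():
    print(service, caption_mentions("bathroom", caption))
```

An exact word match is deliberately crude: it penalizes Cloudsight's description, which never uses the word "bathroom" even though "shower room" and "vanity sink" describe the scene well. Judging captions fairly would need a looser semantic comparison.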
Want to stay posted about AI for your industry?
Want to know more, or stay in touch with our cognitive services research? Follow our blog at blog.mirabeau.nl, and if you have specific questions, please get in touch with Edgard Beckand, who leads Mirabeau Labs: ebeckand@mirabeau.nl