How to interact with cloud image recognition services
Recently my colleagues published two blog posts about cognitive cloud services. In the first post, Peet described a conceptual solution to show the power of today’s image recognition services. In the second, Hayo discussed the first functional findings from the tests we conducted to compare the quality of these services. In this post I will look at the technical side, in particular the details that are good to know when you implement these services in your own application.
Communicate with cloud services
Generally speaking, cloud services provide two ways of communicating: through a REST API or through an SDK. The availability of SDKs varies a lot between providers. If you plan to use the REST API directly you should have no issues, because every cloud service we tested offers one. The table below gives an overview of the API each cloud service offers and the languages its SDKs support.
Interface | AWS | Clarifai | Cloudsight | Google | IBM | Microsoft |
---|---|---|---|---|---|---|
API | REST API | REST API | REST API | REST API | REST API | REST API |
SDK languages | Android, JavaScript, iOS, Java, .NET, Node.js, PHP, Ruby, Python | JavaScript, Python, Java, Objective-C | None | C#, Go, Java, Node.js, PHP, Python | Node.js, Java, Python | Android, C#, Node.js, Swift |
AWS and Google provide the most exhaustive sets of SDKs, covering a wide variety of languages, which means you can prototype quickly without having to write the client implementation yourself.
Test application
To test the various cloud services I created a test application, written in C#, that fires requests at all of them. For AWS it uses the Amazon SDK, and for all other cloud services it calls the REST APIs through RestSharp. We chose the Amazon SDK over the REST API because the SDK takes care of request signing, which made it slightly easier to use.
This setup allowed us to abstract the cloud service implementations and switch between cloud services without knowing the specific implementation details beforehand. Below you can see a snippet from our LabelDetectionTester:
    // The image we are going to use as input
    string imageFile = Utils.getFilePath("image.jpg");

    // The cloud providers we are going to test
    List<CloudProvider> cloudProviders = new List<CloudProvider>
    {
        AzureCloud.getInstance(),
        AmazonCloud.getInstance(),
        ClarifaiCloud.getInstance(),
        CloudsightCloud.getInstance(),
        GoogleCloud.getInstance(),
        IBMCloud.getInstance()
    };

    foreach (CloudProvider cloudProvider in cloudProviders)
    {
        // Classify an image
        ImageClassification imageClassification = cloudProvider.classifyImage(imageFile);

        // Print the cloud provider name and image description if one is present
        Console.WriteLine("Cloud provider: " + cloudProvider.getName());
        if (imageClassification.description != null)
            Console.WriteLine("Description: " + imageClassification.description);

        // Sort the tags by confidence, highest first
        imageClassification.tags.Sort((x, y) => y.confidence.CompareTo(x.confidence));

        // Print each tag name and confidence
        foreach (Tag tag in imageClassification.tags)
            Console.WriteLine(tag.ToString(true));
    }
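The snippet above relies on a shared provider abstraction that is not shown in this post. A minimal sketch of what it could look like follows. Only the member names used in the snippet (`getName`, `classifyImage`, `description`, `tags`, `confidence`, `ToString(bool)`) come from our code; the field types, the `name` field and the meaning of the boolean parameter are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical tag model: a label plus the provider's confidence score.
public class Tag
{
    public string name;
    public double confidence;

    // Assumption: the flag controls whether the confidence is printed.
    public string ToString(bool includeConfidence)
    {
        return includeConfidence
            ? name + " (" + confidence.ToString("0.00", CultureInfo.InvariantCulture) + ")"
            : name;
    }
}

// Hypothetical provider-neutral result: some services fill only the tags,
// some only the description, and some both.
public class ImageClassification
{
    public string description;                // null when the service returns no description
    public List<Tag> tags = new List<Tag>();  // empty when the service returns no tags
}

// Each cloud service gets one subclass that hides its SDK or REST calls.
public abstract class CloudProvider
{
    public abstract string getName();
    public abstract ImageClassification classifyImage(string imageFilePath);
}
```

Keeping `tags` as an empty list rather than null means the sorting and printing loop also works for a provider that returns only a description.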
Findings
All providers except Cloudsight return a result in the same request-response cycle. Cloudsight requires you to execute a request to start a classification and send one or more additional requests to poll for a result. This requires only a little more effort to implement.
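The start-then-poll pattern can be sketched with a small generic helper. This is not Cloudsight-specific code: `PollUntilReady` and its parameters are hypothetical names, and the `fetchResult` delegate stands in for whatever request retrieves the result, returning null while the classification is still in progress.

```csharp
using System;
using System.Threading;

public static class Polling
{
    // Calls fetchResult until it returns a non-null value or maxAttempts is reached.
    public static T PollUntilReady<T>(Func<T> fetchResult, int maxAttempts, TimeSpan delay)
        where T : class
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            T result = fetchResult();
            if (result != null)
                return result;

            // Wait before the next poll, but not after the final attempt.
            if (attempt < maxAttempts)
                Thread.Sleep(delay);
        }
        throw new TimeoutException("No result after " + maxAttempts + " attempts.");
    }
}
```

A provider implementation would first send the request that starts the classification, keep the identifier it gets back, and then pass a delegate that queries the result for that identifier to `PollUntilReady`, for example with ten attempts and a delay of one second.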
All services allow you to upload an image as part of the classification request. Some require you to send the image as a Base64 string in the JSON body, while others require you to attach the file as multipart form data. Alternatively, you can classify an image by providing its URL or, for some services, a reference to an object in cloud storage. The table below gives an overview of the methods each service supports.
Method | AWS | Clarifai | Cloudsight | Google | IBM | Microsoft |
---|---|---|---|---|---|---|
Direct upload | Form | Base64 | Form | Base64 | Form | Form |
URL | No | Yes | Yes | No | Yes | Yes |
Cloud storage | S3 object | n.a. | n.a. | Google Cloud Storage | n.a. | n.a. |
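For the services that expect Base64, the request body can be built roughly as follows. This is a sketch only: the property name `image` is a placeholder, because each service defines its own JSON schema, and `Base64Upload` is a hypothetical helper rather than part of our test application.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class Base64Upload
{
    // Builds a JSON request body that carries the image as a Base64 string.
    // The property name "image" is a placeholder, not a real service schema.
    public static string BuildJsonBody(byte[] imageBytes)
    {
        var body = new Dictionary<string, string>
        {
            { "image", Convert.ToBase64String(imageBytes) }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

The bytes would typically come from `File.ReadAllBytes(imageFile)`. Form uploads need no such encoding step, because the HTTP client attaches the file as a multipart part.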
Some cloud services label an image with tags (words or short phrases, each with a confidence score), while others return a description of the image as a whole. An example tag is “Frisian cattle” with confidence 0.5, while an example description is “herd of cattle at green grass field during daytime”. Microsoft is the only service that returns both; Cloudsight is the only one that returns no tags and gives just a description.
Result | AWS | Clarifai | Cloudsight | Google | IBM | Microsoft |
---|---|---|---|---|---|---|
Tags | Yes | Yes | No | Yes | Yes | Yes |
Description | No | No | Yes | No | No | Yes |
Ease of use
The REST APIs we tested are relatively easy to use, as they require little code. Below you can find a snippet that analyzes an image using the Microsoft Azure Computer Vision API.
    using Newtonsoft.Json;
    using RestSharp;

    public class AnalyzeImageRequest
    {
        // Uploads the image as form data and asks for both tags and a description
        public static AnalyzeImageResponse getTagsAndDescription(string subscriptionKey, string imageFilePath)
        {
            var client = new RestClient("https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze");
            var request = new RestRequest(Method.POST);
            request.AddQueryParameter("visualFeatures", "Tags,Description");
            request.AddQueryParameter("language", "en");
            request.AddHeader("Ocp-Apim-Subscription-Key", subscriptionKey);
            request.AddFile("image", imageFilePath);
            request.AddHeader("Content-Type", "multipart/form-data");

            IRestResponse response = client.Execute(request);
            return JsonConvert.DeserializeObject<AnalyzeImageResponse>(response.Content);
        }
    }
A similar request to Amazon Rekognition, using the AWSSDK.Rekognition package, looks like this:
    using Amazon;
    using Amazon.Rekognition;
    using Amazon.Rekognition.Model;

    // Create the request
    DetectLabelsRequest request = new DetectLabelsRequest()
    {
        Image = new Image()
        {
            Bytes = Utils.getImageMemoryStream(imageFilePath)
        },
        MaxLabels = 10,
        MinConfidence = 0
    };

    // Create configuration
    AmazonRekognitionConfig config = new AmazonRekognitionConfig()
    {
        RegionEndpoint = RegionEndpoint.EUWest1,
        ServiceURL = "https://rekognition.eu-west-1.amazonaws.com/"
    };

    // Fire request
    AmazonRekognitionClient client = new AmazonRekognitionClient(amazonAccessKeyId, amazonSecretAccessId, config);
    DetectLabelsResponse response = client.DetectLabels(request);
Conclusion
Generally speaking, I had little trouble implementing the various cloud services. As always, clear and simple documentation makes implementation much easier. The features Cloudsight offers are limited and its request mechanism is unconventional: you can only retrieve an image description, and you have to poll for it. Furthermore, they don’t provide an SDK. However, they definitely deserve a shout-out for their simple documentation; the API consists of only two endpoints, so you could write your own SDK in no time.
Clarifai, IBM and Google provide easy-to-find code snippets for their APIs and SDKs, while Microsoft provides API samples in its documentation and SDK samples in GitHub projects.
To make a request to Clarifai you first have to authenticate to retrieve an access token that expires. For the other cloud services you can make requests directly with the keys or tokens shown in their management panels. Amazon requires you to sign requests, which takes some extra effort if you use the REST API. Their SDK handles this for you, is well written and is available for many languages.
All things considered, the SDKs provided are useful for quick prototypes, and the availability of an SDK for a certain language may be the deciding factor when choosing a cloud service. However, REST APIs are widely supported, and implementing a service on top of one is a minor investment in the long run.
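One way to deal with an expiring token is to cache it and request a new one only when it has run out. The sketch below is an assumption about how this could be done, not code from our test application: `AccessTokenCache` is a hypothetical class, and the `requestToken` delegate stands in for the provider's authentication call, returning the token together with its lifetime.

```csharp
using System;

public class AccessTokenCache
{
    private readonly Func<(string Token, TimeSpan Lifetime)> requestToken;
    private readonly Func<DateTime> now;
    private string token;
    private DateTime expiresAt = DateTime.MinValue;

    // The clock is injectable so that expiry can be simulated; it defaults to UTC time.
    public AccessTokenCache(Func<(string Token, TimeSpan Lifetime)> requestToken,
                            Func<DateTime> now = null)
    {
        this.requestToken = requestToken;
        this.now = now ?? (() => DateTime.UtcNow);
    }

    // Returns the cached token, requesting a fresh one when none is cached
    // or when the cached one has expired.
    public string GetToken()
    {
        if (token == null || now() >= expiresAt)
        {
            var fresh = requestToken();
            token = fresh.Token;
            expiresAt = now() + fresh.Lifetime;
        }
        return token;
    }
}
```

In practice you may want to refresh slightly before the reported expiry time, so that a request already in flight does not fail with a token that has just expired.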
Cloudsight: "man in grey crewneck t shirt sitting on chair looking at you"
Want to stay posted about AI for your industry?
Want to know more, or stay in touch with our cognitive services research? Follow our blog at blog.mirabeau.nl. If you have specific questions, please get in touch with Edgard Beckand, who leads Mirabeau Labs: ebeckand@mirabeau.nl