How to interact with cloud image recognition services

Author
Hasan Bilen

My colleagues recently published two blog posts about cognitive cloud services. In the first, Peet described a conceptual solution that shows the power of today’s image recognition services. In the second, Hayo discussed the first functional findings from the tests we conducted to compare the quality of these services. In this post I look at the technical side, in particular the details that are good to know when you implement these services in your own application.

Communicate with cloud services

Generally speaking, cloud services offer two ways to communicate: through a REST API or through an SDK. The availability of SDKs varies a lot. If you plan to use the REST API directly you should have no issues, because every cloud service we tested offers one. The table below gives an overview of the cloud services, their APIs and the languages their SDKs support.

Service     API       SDK languages
AWS         REST API  Android, JavaScript, iOS, Java, .NET, Node.js, PHP, Ruby, Python
Clarifai    REST API  JavaScript, Python, Java, Objective-C
Cloudsight  REST API  None
Google      REST API  C#, Go, Java, Node.js, PHP, Python
IBM         REST API  Node.js, Java, Python
Microsoft   REST API  Android, C#, Node.js, Swift

AWS and Google provide SDKs for the widest variety of languages, which means you can prototype quickly without having to write the client code yourself.

Test application

To test the various cloud services I wrote a test application in C# that fires requests at all of them. For AWS it uses the Amazon SDK; for all other cloud services it calls the REST APIs through RestSharp. We chose the Amazon SDK over the REST API because the SDK takes care of request signing, which made it slightly easier to use.

[Figure: image recognition test application diagram]

The setup above allowed us to abstract away the cloud service implementations and to switch between cloud services without knowing the specific implementation details beforehand. Below you can see a snippet from our LabelDetectionTester:

// The image we are going to use as input
string imageFile = Utils.getFilePath("image.jpg");

// The Cloud providers we are going to test
List<CloudProvider> cloudProviders = new List<CloudProvider>
{
    AzureCloud.getInstance(),
    AmazonCloud.getInstance(),
    ClarifaiCloud.getInstance(),
    CloudsightCloud.getInstance(),
    GoogleCloud.getInstance(),
    IBMCloud.getInstance()
};


foreach (CloudProvider cloudProvider in cloudProviders)
{
    // Classify an image
    ImageClassification imageClassification = cloudProvider.classifyImage(imageFile);

    // Print the cloud provider name and image description if one is present
    Console.WriteLine("Cloud provider: " + cloudProvider.getName());
    if(imageClassification.description != null)
        Console.WriteLine("Description: " + imageClassification.description);

    // Sort the tags by confidence
    imageClassification.tags.Sort((x, y) => y.confidence.CompareTo(x.confidence));

    // Print each tag name and confidence
    foreach (Tag tag in imageClassification.tags)
        Console.WriteLine(tag.ToString(true));
}
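
The snippet above relies on a few types that are not shown in this post. As a rough idea of what such an abstraction could look like, here is a minimal sketch. The member names used by the snippet (getName, classifyImage, description, tags, confidence, ToString(bool)) come from the code above; everything else is an assumption for illustration, including the name field, the meaning of the boolean flag, and the FakeCloud provider with its made-up confidence values.

```csharp
using System.Collections.Generic;
using System.Globalization;

// One label returned by a provider, with its confidence between 0 and 1
public class Tag
{
    public string name;        // assumed field name, not shown in the original snippet
    public double confidence;

    // Assumption: the flag controls whether the confidence is included in the output
    public string ToString(bool includeConfidence)
    {
        if (!includeConfidence)
            return name;
        return name + " (" + confidence.ToString("0.00", CultureInfo.InvariantCulture) + ")";
    }
}

// The provider-independent result of classifying one image
public class ImageClassification
{
    public string description;                 // null when the provider returns no description
    public List<Tag> tags = new List<Tag>();   // empty when the provider returns no tags
}

// Every cloud service is wrapped in a class that derives from this one
public abstract class CloudProvider
{
    public abstract string getName();
    public abstract ImageClassification classifyImage(string imageFile);
}

// Hypothetical stand-in provider that returns canned data instead of calling a service
public class FakeCloud : CloudProvider
{
    public override string getName() { return "Fake"; }

    public override ImageClassification classifyImage(string imageFile)
    {
        var result = new ImageClassification();
        result.description = "herd of cattle at green grass field during daytime";
        result.tags.Add(new Tag { name = "Frisian cattle", confidence = 0.5 });
        result.tags.Add(new Tag { name = "grass", confidence = 0.9 }); // made-up value
        return result;
    }
}
```

With types like these, a provider that returns only tags leaves description at null, and a provider that returns only a description leaves tags empty, so the loop in the snippet above works unchanged for every service.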


Findings

All providers except Cloudsight return a result in the same request-response cycle. Cloudsight requires you to execute a request to start a classification and send one or more additional requests to poll for a result. This requires only a little more effort to implement.
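
The request-then-poll pattern can be sketched roughly as follows. This is not Cloudsight’s actual API: the helper name PollUntilReady is hypothetical, and the fetch delegate stands in for whatever follow-up request you send, assuming it returns null as long as the result is not ready.

```csharp
using System;
using System.Threading;

public static class Polling
{
    // Calls fetch() until it returns a non-null result or maxAttempts is reached.
    // Assumption: fetch wraps the follow-up request and returns null while the
    // classification is still in progress.
    public static T PollUntilReady<T>(Func<T> fetch, int maxAttempts, TimeSpan delay) where T : class
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            T result = fetch();
            if (result != null)
                return result;

            // Wait before the next attempt, but not after the last one
            if (attempt < maxAttempts)
                Thread.Sleep(delay);
        }

        throw new TimeoutException("No result after " + maxAttempts + " attempts");
    }
}
```

The first request starts the classification and gives you something to identify it by; the delegate passed to PollUntilReady then asks for the result of that classification. Capping the number of attempts keeps a stuck classification from blocking the test run forever.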

All services allow you to upload an image in the classification request. Some require a Base64-encoded blob in the JSON body, while others require the file as multipart form data. Alternatively, some services can classify an image from an image URL or from a cloud storage object. The overview below shows which methods each service supports.

               AWS        Clarifai  Cloudsight  Google                IBM   MSFT
Direct upload  Form       Base64    Form        Base64                Form  Form
URL            No         Yes       Yes         No                    Yes   Yes
Cloud storage  S3 object  n.a.      n.a.        Google Cloud Storage  n.a.  n.a.
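
For the services that expect a Base64 blob, preparing the request body could look roughly like the sketch below. The JSON shape with a single "image" property is purely hypothetical; each service defines its own body format, so check the documentation of the service you are calling.

```csharp
using System;
using System.IO;

public static class ImagePayload
{
    // Builds a JSON body that carries the image as a Base64 string.
    // Assumption: the property name "image" is made up for illustration.
    public static string BuildJsonBody(byte[] imageBytes)
    {
        // Base64 output only contains A-Z, a-z, 0-9, '+', '/' and '=',
        // so it can be placed in a JSON string without escaping
        string base64 = Convert.ToBase64String(imageBytes);
        return "{\"image\":\"" + base64 + "\"}";
    }

    // Convenience overload that reads the image from disk first
    public static string BuildJsonBodyFromFile(string imageFilePath)
    {
        return BuildJsonBody(File.ReadAllBytes(imageFilePath));
    }
}
```

Keep in mind that Base64 encoding makes the payload about a third larger than the original file, which matters if a service limits the request size.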


Some cloud services tag images with words or phrases, while others return a description of the image. An example tag is “Frisian cattle” with a confidence of 0.5; an example description is “herd of cattle at green grass field during daytime”. Microsoft is the only service that returns both, and Cloudsight is the only one that returns no tags, only a description.

             AWS  Clarifai  Cloudsight  Google  IBM  MSFT
Tags         Yes  Yes       No          Yes     Yes  Yes
Description  No   No        Yes         No      No   Yes


Ease of use

The tested REST APIs are relatively easy to use, as they require little code. Below is a snippet that analyzes an image using the Microsoft Azure Computer Vision API.


using Newtonsoft.Json;
using RestSharp;

public class AnalyzeImageRequest
{
    // Sends the image to the Computer Vision "analyze" endpoint and returns its tags and description
    public static AnalyzeImageResponse getTagsAndDescription(string subscriptionKey, string imageFilePath)
    {
        var client = new RestClient("https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze");
        var request = new RestRequest(Method.POST);

        // Ask for both tags and a description, in English
        request.AddQueryParameter("visualFeatures", "Tags,Description");
        request.AddQueryParameter("language", "en");

        // Authenticate with the subscription key
        request.AddHeader("Ocp-Apim-Subscription-Key", subscriptionKey);

        // Upload the image as multipart form data
        request.AddFile("image", imageFilePath);
        request.AddHeader("Content-Type", "multipart/form-data");

        IRestResponse response = client.Execute(request);

        // Map the JSON response onto our own response class
        return JsonConvert.DeserializeObject<AnalyzeImageResponse>(response.Content);
    }
}

With the AWS SDK for .NET (the AWSSDK.Rekognition package), a similar request to Amazon Rekognition looks like this:

// Create the request
DetectLabelsRequest request = new DetectLabelsRequest()
{
    Image = new Image()
    {
        Bytes = Utils.getImageMemoryStream(imageFilePath)
    },
    MaxLabels = 10,
    MinConfidence = 0
};

// Create configuration
AmazonRekognitionConfig config = new AmazonRekognitionConfig()
{
    RegionEndpoint = RegionEndpoint.EUWest1,
    ServiceURL = "https://rekognition.eu-west-1.amazonaws.com/"
};

// Fire request
AmazonRekognitionClient client = new AmazonRekognitionClient(amazonAccessKeyId, amazonSecretAccessId, config);
DetectLabelsResponse response = client.DetectLabels(request);

Conclusion

Generally speaking, I didn’t have much trouble implementing the various cloud services. Good, simple documentation makes implementation much easier. Cloudsight offers limited features and has an unconventional request mechanism: you can only retrieve an image description, and you have to poll for it. Furthermore, they don’t provide an SDK. They do deserve a shout-out for their simple documentation, though, as it covers only two endpoints, so you could write your own SDK in no time.

Clarifai, IBM and Google provide easy-to-find code snippets for their APIs and SDKs, while Microsoft provides API samples in its documentation and SDK samples in GitHub projects.

To make a request to Clarifai you first have to authenticate to retrieve an access token, which expires. For the other cloud services you can make requests directly with the tokens shown in their management panels. Amazon requires you to sign requests, which takes some extra effort, but their SDK is well written and available for many languages.
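
Because an expiring token has to be refreshed every now and then, it helps to hide that logic behind a small cache. The sketch below is one way to do that, not Clarifai’s own client code: the class name AccessTokenCache is hypothetical, and the requestToken delegate stands in for the actual authentication request, assuming it returns the token together with its lifetime.

```csharp
using System;

public class AccessTokenCache
{
    private readonly Func<Tuple<string, TimeSpan>> requestToken; // performs the authentication request
    private readonly Func<DateTime> now;                         // clock, replaceable in tests
    private string token;
    private DateTime expiresAt = DateTime.MinValue;

    public AccessTokenCache(Func<Tuple<string, TimeSpan>> requestToken, Func<DateTime> now = null)
    {
        this.requestToken = requestToken;
        this.now = now ?? (() => DateTime.UtcNow);
    }

    // Returns the cached token, requesting a new one only when none is
    // cached yet or the cached one has expired
    public string GetToken()
    {
        if (token == null || now() >= expiresAt)
        {
            Tuple<string, TimeSpan> fresh = requestToken();
            token = fresh.Item1;
            expiresAt = now() + fresh.Item2;
        }
        return token;
    }
}
```

Every classification request then calls GetToken() instead of authenticating itself. In practice you may want to refresh slightly before the real expiry time, so a token does not expire while a request is in flight.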

All things considered, the SDKs provided are useful for quick prototypes, and the availability of an SDK for a certain language may be the deciding factor when you choose a cloud service. However, REST APIs are widely supported, and implementing the services yourself is a minor investment in the long run.

[Photo: Hasan Bilen, as described by Cloudsight]

Cloudsight: "man in grey crewneck t shirt sitting on chair looking at you"


Want to stay posted about AI for your industry?

Want to know more or stay in touch with our cognitive services research? Follow our blog at blog.mirabeau.nl. If you have specific questions, please get in touch with Edgard Beckand, who leads Mirabeau Labs, at ebeckand@mirabeau.nl.