Retrieving insights from video content

Cloud cognitive services help us build many great solutions. For example, in this blogpost we discussed our face recognition demo and how it makes use of these services. Building on this experience, we decided to delve further into the field of image and video analysis using the same AI-powered cloud cognitive services. Together with Ster, we applied the vision and video APIs of Microsoft Azure to find out what interesting data we can gather from commercials.

Why automated content analysis?

Does the commercial contain any dogs or cats? Are there any celebrities acting in it? What emotions do the actors express? Many more questions like these are interesting for content analysis purposes. Until now, answering them was a manual process. This is doable for a couple of commercials, but if your dataset contains thousands of videos, this approach quickly becomes impractical. With the advancements in image and video analysis, we saw the opportunity to automate this process. Supported by machine learning algorithms, patterns can then be found in this data that provide insights into which features of a commercial contribute to its success. Does displaying dogs in a commercial lead to a higher conversion rate? This is what data analysts and marketing professionals would like to find out.

Proof of Concept

Together with Ster, we built a Proof of Concept to explore the possibilities of automated content analysis. We used three well-known commercials (available on YouTube) and three sheets of manually obtained data from these commercials. For each video, the data was ordered by frame (sampled at 1 frame per second). For each frame, the dataset contained the objects that are present, whether there is a person, what that person's emotion is, and what text is displayed. Our first goal was to see if we could gather the same data in an automated fashion. To do this, we explored the possibilities of the Computer Vision API and the Face API from Microsoft Azure.
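To make the 1 frame per second sampling concrete, here is a minimal sketch of how the frames to analyze could be selected. The function name and its parameters are our own illustration and not part of any Azure SDK; reading the actual frames from the video file is left out.

```python
from __future__ import annotations


def sample_frame_indices(total_frames: int, fps: float,
                         samples_per_second: float = 1.0) -> list[int]:
    """Return the indices of the frames to keep when sampling at a fixed rate.

    Hypothetical helper: total_frames and fps are assumed to come from the
    video container's metadata.
    """
    if total_frames < 0 or fps <= 0 or samples_per_second <= 0:
        raise ValueError("total_frames must be >= 0; fps and rate must be > 0")

    indices: list[int] = []
    k = 0
    while True:
        timestamp = k / samples_per_second   # seconds since start of video
        index = round(timestamp * fps)       # nearest frame to that timestamp
        if index >= total_frames:
            break
        # Skip duplicates, which occur if the sampling rate exceeds the fps.
        if not indices or index != indices[-1]:
            indices.append(index)
        k += 1
    return indices


# A 4-second clip at 25 fps yields one frame index per second.
print(sample_frame_indices(100, 25.0))
```

Each selected frame can then be saved as an image and sent to the vision APIs one at a time.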

Sample frame retrieved from IKEA commercial

Sample frame retrieved from Heineken commercial

The results

The good news is that, using services from Microsoft Azure, we were very successful in obtaining results similar to the manual analysis. However, the APIs are not limited to these features, so we decided to extract even more data: the objects in the frame, a description of the frame, whether there is a person, the gender, age and emotion of that person, the dominant color, the displayed text, and whether the frame contains adult content. We also looked at the Video Indexer of Microsoft Azure and were able to extract data such as when a new scene begins, when motion is detected, and what lines are spoken in the video, indexed by speaker. These are all great features, and the results emphasize the potential of these AI-powered cloud services. Another great characteristic of these APIs is that they improve over time. This means that our results improve as well without any intervention on our part, which is of course a great benefit of using cloud services.

Feature     | IKEA                                  | Heineken
Objects     | Indoor, person, table, sitting, chair | Person, building, woman, standing, man
Description | A group of people sitting at a table. | A man and a woman standing in front of a building.
Gender      | Female                                | Male
Age         | 38.1                                  | 52
Emotion     | Neutral                               | Happiness
Color       | Grey                                  | Black
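As an illustration of how one row of such a table can be built, the sketch below flattens a single image-analysis response into a per-frame record. We assume the response is a JSON object shaped like the Computer Vision "Analyze Image" output (keys such as tags, description.captions, color.dominantColorForeground and adult.isAdultContent); check the API reference for the version you use. FrameRecord, to_frame_record and the confidence threshold are hypothetical names of our own, and the confidence values in the example are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FrameRecord:
    second: int                       # position of the sampled frame
    objects: list[str]                # tag names above the threshold
    description: Optional[str]        # best caption, if any
    dominant_color: Optional[str]
    is_adult_content: Optional[bool]


def to_frame_record(second: int, analysis: dict[str, Any],
                    min_confidence: float = 0.5) -> FrameRecord:
    """Flatten an (assumed) Analyze Image response into one table row."""
    objects = [tag["name"] for tag in analysis.get("tags", [])
               if tag.get("confidence", 0.0) >= min_confidence]
    captions = analysis.get("description", {}).get("captions", [])
    # Keep the caption the service is most confident about.
    best = max(captions, key=lambda c: c.get("confidence", 0.0), default=None)
    return FrameRecord(
        second=second,
        objects=objects,
        description=best["text"] if best else None,
        dominant_color=analysis.get("color", {}).get("dominantColorForeground"),
        is_adult_content=analysis.get("adult", {}).get("isAdultContent"),
    )


# Illustrative response only; the confidence values are invented.
example = {
    "tags": [{"name": "indoor", "confidence": 0.98},
             {"name": "person", "confidence": 0.95},
             {"name": "lamp", "confidence": 0.20}],
    "description": {"captions": [
        {"text": "A group of people sitting at a table.", "confidence": 0.90}]},
    "color": {"dominantColorForeground": "Grey"},
    "adult": {"isAdultContent": False},
}
print(to_frame_record(12, example))
```

Gender, age and emotion would be added in the same way from the face results. Which color field best matches the "Color" row above is an assumption on our side.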

Next steps

Since our tool is just a Proof of Concept, there is room for improvement. First of all, the data could be made more accurate by combining the results of multiple APIs. Secondly, we would like to add features such as logo detection to answer questions like: for how long is a specific brand logo displayed in the commercial? The answers to this and similar questions can provide valuable insights. To conclude, we are very excited about the possibilities of cloud video analysis services and hope to continue powering our tools with cloud cognitive services.
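As a small illustration of the logo question, the sketch below turns per-frame detections into display intervals and a total duration. It assumes frames sampled at 1 per second and a list of the seconds at which a logo detector (not shown, and not part of our Proof of Concept) reported the brand; the function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def logo_intervals(seconds_with_logo: Iterable[int]) -> list[tuple[int, int]]:
    """Merge consecutive sampled seconds into (start, end) intervals.

    Each sampled second t is treated as covering [t, t + 1), so the end of
    an interval is exclusive.
    """
    intervals: list[tuple[int, int]] = []
    for t in sorted(set(seconds_with_logo)):
        if intervals and intervals[-1][1] == t:
            # This second directly follows the previous interval: extend it.
            intervals[-1] = (intervals[-1][0], t + 1)
        else:
            intervals.append((t, t + 1))
    return intervals


def total_logo_seconds(seconds_with_logo: Iterable[int]) -> int:
    """Total on-screen time, at the 1-second resolution of the sampling."""
    return sum(end - start for start, end in logo_intervals(seconds_with_logo))


detections = [3, 4, 5, 12, 13, 20]   # made-up detection timestamps
print(logo_intervals(detections))
print(total_logo_seconds(detections))
```

With 1 frame per second, the result is only accurate to about a second; a higher sampling rate would give a finer estimate at the cost of more API calls.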