Image Processing with Microsoft Cognitive Services API and Azure DocumentDB

At Build 2016, Microsoft rebranded Project Oxford and introduced it as Microsoft Cognitive Services. In total, there are now 21 APIs under 5 categories available in Cognitive Services.

The Computer Vision API is one of the APIs under the Vision category and is used to bring image processing into your application. It extracts and returns rich information about the visual content found in an image.

In this article, we will learn how to use the Computer Vision API and store the serialized result in Azure DocumentDB. Schema-free databases suit this scenario perfectly because we can easily store the data as a document, and we can change the data structure and what we expect from the API at any time.

Prerequisites:

Cognitive API subscription
Azure subscription and Azure DocumentDB Account
Visual Studio 2015 Community

Solution Structure:

To keep the example simple, our solution is composed of two projects. The first is called Infrastructure and contains the data access logic and services. The second is an ASP.NET Core project called Web, on top of which we will build our API. This solution is simplified on purpose; in a real-world application you would be expected to separate the data access logic and the domain services into different layers (hopefully!).

Implementation:

The first step is to store URLs and subscription keys in a config file (rather than hard-coding the values). For this example, we are going to create two properties called DocumentDB and CognitiveService in the API project's appsettings.json.

{
  "Logging": {
    "IncludeScopes": false,
    "LogLevel": {
      "Default": "Debug",
      "System": "Information",
      "Microsoft": "Information"
    }
  },
  "DocumentDB": {
    "Database": "[PUT_YOUR_DATABASE_NAME_HERE(Eg. CognitiveDB)]",
    "Collection": "[PUT_YOUR_COLLECTION_NAME_HERE(Eg. Images)]",
    "Endpoint": "[PUT_YOUR_DOCUMENTDB_ACCOUNT_URI_HERE(Eg. https://xxx.documents.azure.com:443/)]",
    "AuthKey": "[PUT_YOUR_DOCUMENTDB_KEY_HERE]"
  },
  "CognitiveService": {
    "ComputerVision": {
      "Url": "https://api.projectoxford.ai/vision/v1.0/analyze?",
      "SubscriptionKey": "[PUT_YOUR_SUBSCRIPTION_KEY_HERE]",
      "ContentType": "application/json"
    }
  }
}

Currently, appsettings.json is the configuration file in ASP.NET Core. At the time of writing, there is some speculation that the ASP.NET team may later replace it with web.config or another type of config file.

For the DocumentDB property, we need the DocumentDB database name, collection name, endpoint URL (the DocumentDB account endpoint) and key. You can find all of this information in the DocumentDB account blade.

For the CognitiveService property, we need the service URL, the Computer Vision API subscription key and the content type. The URL points to the Project Oxford domain at the moment, but you can always get the latest address from the official documentation. For this example, we specify JSON as the content type.

In this example, we are going to implement a simplified repository to work with the database. The main purpose is to show that the data access layer should be segregated behind interfaces, and how to implement the repository pattern on top of DocumentDB. Therefore, we need to define the repository contract first:

public interface IImageRepository
{
    Task<Document> CreateAsync(Image image);
}

The contract is called IImageRepository. In the next step, we implement the concrete repository class that implements IImageRepository:

public class ImageRepository : IImageRepository
{
    private string endpoint;
    private string authKey;
    private string databaseId;
    private string collectionId;
 
    private DocumentClient client;
    private Database database;
    private DocumentCollection collection;
 
    public ImageRepository(string endpoint, string authKey, string databaseId, string collectionId)
    {
        this.endpoint = endpoint;
        this.authKey = authKey;
        this.databaseId = databaseId;
        this.collectionId = collectionId;
    }
 
    /// <summary>
    /// Represent Azure DocumentDB client service
    /// </summary>
    public DocumentClient Client
    {
        get
        {
            if (client == null)
            {
                Uri endpointUri = new Uri(this.endpoint);
                client = new DocumentClient(endpointUri, this.authKey);
            }
 
            return client;
        }
    }
 
    /// <summary>
    /// Represent context Database
    /// </summary>
    public Database Database
    {
        get
        {
            if (database == null)
            {
                database = ReadOrCreateDatabase();
            }
 
            return database;
        }
    }
 
    /// <summary>
    /// Represent context collection
    /// </summary>
    public DocumentCollection Collection
    {
        get
        {
            if (collection == null)
            {
                collection = ReadOrCreateCollection(Database.SelfLink);
            }
 
            return collection;
        }
    }
 
    /// <summary>
    /// Read or Create context database Id
    /// </summary>
    /// <returns></returns>
    private Database ReadOrCreateDatabase()
    {
        var database = this.Client.CreateDatabaseQuery()
                        .Where(d => d.Id == this.databaseId)
                        .AsEnumerable()
                        .FirstOrDefault();
 
        if (database == null)
        {
            database = this.Client.CreateDatabaseAsync(new Database { Id = this.databaseId }).Result;
        }
 
        return database;
    }
 
    /// <summary>
    /// Read or Create given collection Id
    /// </summary>
    /// <param name="databaseLink">Database self-link</param>
    /// <returns></returns>
    private DocumentCollection ReadOrCreateCollection(string databaseLink)
    {
        var collection = this.Client.CreateDocumentCollectionQuery(databaseLink)
                          .Where(c => c.Id == this.collectionId)
                          .AsEnumerable()
                          .FirstOrDefault();
 
        if (collection == null)
        {
            collection = this.Client.CreateDocumentCollectionAsync(databaseLink, new DocumentCollection { Id = this.collectionId }).Result;
        }
 
        return collection;
    }
 
    /// <summary>
    /// Creates Document in context collection
    /// </summary>
    /// <param name="image">Image Document</param>
    /// <returns></returns>
    public async Task<Document> CreateAsync(Image image)
    {
        if (string.IsNullOrEmpty(image.Id))
        {
            image.Id = GenerateImageId();
        }
 
        return await this.Client.CreateDocumentAsync(this.Collection.SelfLink, image);
    }
 
    /// <summary>
    /// Generates unique identifier for the document
    /// </summary>
    /// <returns>Unique string Identifier</returns>
    private string GenerateImageId()
    {
        return Guid.NewGuid().ToString();
    }
}

In a real-world application, you would be expected to have a generic base repository (of type T) that all repositories build on. The Client property creates the DocumentDB client, which is essential for working with DocumentDB. The Database property calls ReadOrCreateDatabase to read the database or create it if it does not exist, and the Collection property follows the same logic through ReadOrCreateCollection. GenerateImageId generates a unique string ID for the document; a real-world application will likely need a more deliberate approach to generating document IDs.

As you can see in the repository definition and implementation, we are going to store an object of type Image in the database. Image is a POCO that represents the document structure and is populated from the API response. Below are the definitions of the Image, Tag and Metadata classes:
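As a rough illustration of the generic base repository mentioned above, the sketch below defines a hypothetical IRepository<T> contract with an in-memory implementation. Neither type is part of the example solution or of the DocumentDB SDK; the in-memory store is only a stand-in so that the sketch runs without a database.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical generic contract; the article itself only defines IImageRepository.
public interface IRepository<T> where T : class
{
    Task<T> CreateAsync(T item);
    Task<T> GetAsync(string id);
}

// In-memory stand-in, used only to keep the sketch runnable without DocumentDB.
public class InMemoryRepository<T> : IRepository<T> where T : class
{
    private readonly ConcurrentDictionary<string, T> items =
        new ConcurrentDictionary<string, T>();
    private readonly Func<T, string> getId;

    // getId tells the repository how to read the identifier of an item.
    public InMemoryRepository(Func<T, string> getId)
    {
        this.getId = getId;
    }

    public Task<T> CreateAsync(T item)
    {
        string id = getId(item);
        if (!items.TryAdd(id, item))
        {
            throw new InvalidOperationException("An item with id '" + id + "' already exists.");
        }
        return Task.FromResult(item);
    }

    // Returns null when no item with the given id exists.
    public Task<T> GetAsync(string id)
    {
        T item;
        items.TryGetValue(id, out item);
        return Task.FromResult(item);
    }
}
```

A DocumentDB-backed implementation of the same contract could then reuse the client, database and collection handling shown in ImageRepository.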

public class Image
{
    [JsonProperty(PropertyName = "id")]
    public string Id { get; set; }
 
    [JsonProperty(PropertyName = "tags")]
    public List<Tag> Tags { get; set; }
 
    [JsonProperty(PropertyName = "metadata")]
    public Metadata Metadata { get; set; }
}
 
public class Tag
{
    [JsonProperty(PropertyName = "name")]
    public string Name { get; set; }
 
    [JsonProperty(PropertyName = "confidence")]
    public decimal Confidence { get; set; }
 
    [JsonProperty(PropertyName = "hint")]
    public string Hint { get; set; }
}
 
public class Metadata
{
    [JsonProperty(PropertyName = "width")]
    public int Width { get; set; }
 
    [JsonProperty(PropertyName = "height")]
    public int Height { get; set; }
 
    [JsonProperty(PropertyName = "format")]
    public string Format { get; set; }
}

As you can see, all properties are annotated with Newtonsoft's JsonProperty attribute to specify how the data is going to be serialized. Id is a special property in DocumentDB and is expected to be a unique string, as mentioned before.

Next, we need to define our service contract for processing images. Below is the service definition, which has only one method to process the image.

public interface ICognitiveService
{
    Task<Image> ProcessImage(string imageUrl);
}

Now it’s time to implement the ProcessImage method in the concrete service class. Again, it is simplified to make it easier to understand.

public class CognitiveService : ICognitiveService
{
    private string uri;
    private string subscriptionKey;
    private string contentType;
 
    public CognitiveService(string url, string subscriptionKey, string contentType)
    {
        this.uri = ($"{url}visualFeatures=Tags");
        this.subscriptionKey = subscriptionKey;
        this.contentType = contentType;
    }
 
    public async Task<Image> ProcessImage(string imageUrl)
    {
        // Instantiate a HTTP Client
        var client = new HttpClient();
 
        // Pass subscription key thru the HTTP Request Header
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
 
        // Serialize the request body so the image URL is properly JSON-escaped
        byte[] byteData = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(new { url = imageUrl }));
 
        using (var content = new ByteArrayContent(byteData))
        {
            // Specify Request body Content-Type
            content.Headers.ContentType = new MediaTypeHeaderValue(contentType);
 
            // Send Post Request
            HttpResponseMessage response = await client.PostAsync(uri, content);
 
            // Read Response body into the image model
            return await response.Content.ReadAsAsync<Image>();
        }
 
    }
}

As you can see in the constructor, we only ask the API to return Tags. You can extend this if you want to get another set of information about the image (e.g. categories instead of tags), and you can also pass multiple comma-separated values to the API. In ProcessImage, the Computer Vision API subscription key is passed in the Ocp-Apim-Subscription-Key request header, and the URL of the image is passed in the request body. Finally, the API response is deserialized into an Image object.

Now, in the API project, we need to register our repository and service so that they can be injected into the pipeline. In this example, we use the ASP.NET Core built-in DI; therefore, we register our repository and service in the ConfigureServices method of Startup.cs.
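To make the comma-separated option concrete, here is a minimal sketch of how the request URL could be composed for several features. VisionUrlBuilder is a hypothetical helper that is not part of the example solution, and it assumes the base URL ends with "?" as in the appsettings.json above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the original service hard-codes "visualFeatures=Tags".
public static class VisionUrlBuilder
{
    // baseUrl is assumed to end with '?', matching the configured Url value.
    public static string Build(string baseUrl, IEnumerable<string> visualFeatures)
    {
        // The API accepts multiple features as one comma-separated value.
        string features = string.Join(",", visualFeatures);
        return baseUrl + "visualFeatures=" + features;
    }
}
```

For example, VisionUrlBuilder.Build("https://api.projectoxford.ai/vision/v1.0/analyze?", new[] { "Tags", "Categories" }) produces a URL ending in visualFeatures=Tags,Categories. Keep in mind that the Image class would also need matching properties to capture any additional data the API returns.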

services.AddSingleton<IImageRepository, ImageRepository>(s =>
{
    string databaseId = Configuration["DocumentDB:Database"];
    string collectionId = Configuration["DocumentDB:Collection"];
    string endpoint = Configuration["DocumentDB:Endpoint"];
    string authKey = Configuration["DocumentDB:AuthKey"];

    return new ImageRepository(endpoint, authKey, databaseId, collectionId);
});

services.AddScoped<ICognitiveService, CognitiveService>(s =>
{
    string url = Configuration["CognitiveService:ComputerVision:Url"];
    string subscriptionKey = Configuration["CognitiveService:ComputerVision:SubscriptionKey"];
    string contentType = Configuration["CognitiveService:ComputerVision:ContentType"];

    return new CognitiveService(url, subscriptionKey, contentType);
});

As the last step, we only need to create a method inside an API controller to orchestrate the API workflow. For this example, we will call it ProcessImage. A real-world application is expected to have better validation and exception handling, which are not implemented here for the sake of simplicity.

[HttpPost]
public async Task ProcessImage([FromBody]ProcessImagePayload payload)
{
    if(ModelState.IsValid)
    {
        var image = await _cognitiveService.ProcessImage(payload.Url);
        if (image != null)
        {
            await _imageRepository.CreateAsync(image);
        }
        else
        {
            Response.StatusCode = (int)HttpStatusCode.BadRequest;
        }
    }
    else
    {
        Response.StatusCode = (int)HttpStatusCode.BadRequest;
    }
}

Both the repository and the cognitive service are injected into the controller through the controller's constructor.
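The ProcessImagePayload type bound by the action is not shown in this article. A minimal sketch of what it might look like follows, assuming a single Url property (the only member the controller uses) validated with data annotations. The PayloadValidation helper is hypothetical and only demonstrates, outside of MVC, the kind of check that ModelState.IsValid relies on.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Assumed shape of the request payload; only Url is known from the controller code.
public class ProcessImagePayload
{
    [Required]
    [Url]
    public string Url { get; set; }
}

// Hypothetical helper that runs the same data annotation checks by hand.
public static class PayloadValidation
{
    public static bool IsValid(ProcessImagePayload payload)
    {
        var context = new ValidationContext(payload);
        var results = new List<ValidationResult>();
        return Validator.TryValidateObject(payload, context, results, validateAllProperties: true);
    }
}
```

With these attributes, a missing or malformed URL fails validation, so the action responds with 400 Bad Request before the Computer Vision API is called.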

The solution structure is illustrated below:

[Figure: solution structure (Cognitive-Services_Computer-Vision-API_03)]

Now if you call the API and send an image URL through API payload, you will get the extracted information from the image in the form of a document in DocumentDB.

You can find the source code for this example on GitHub.

Estimating Azure DocumentDB Throughput Needs

When you create a collection in your Azure DocumentDB database account, you have to estimate and specify the throughput size, and Azure DocumentDB reserves resources to satisfy your application's throughput needs. You pay for the reserved resources allocated to your collection regardless of usage, which is why it’s important to estimate the throughput size correctly to reduce operating costs.

What is the unit of throughput measurement?
Request units (RU) per second is the unit of throughput measurement. Azure reserves the specified amount of RU/s as your collection's throughput. A single request unit represents the processing capacity required to read a single 1 KB document. Depending on your document, other requests such as create, update and delete consume more request units.

[Figure: Specifying throughput size. You have to specify the throughput size when you create a collection.]

How to measure throughput size?
As a first step, it’s always good to start with the default throughput size, then monitor and measure the request units consumed by common operations and adjust the throughput size accordingly. When you query a collection, Azure returns the request charge in the portal, or through the x-ms-request-charge response header in code. This gives you an idea of the cost of your queries.

[Figure: Azure portal returns request charge. You can see the request charge in the Azure portal.]
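To read the request charge in code, you can inspect the x-ms-request-charge header on the HTTP response. The sketch below assumes you are working with a raw HttpResponseMessage; RequestChargeReader is a hypothetical helper, not part of any SDK. If you use the DocumentDB .NET SDK instead, the same value should be available through the RequestCharge property of the response object.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Net.Http;

// Hypothetical helper for reading the request charge from a raw HTTP response.
public static class RequestChargeReader
{
    public const string HeaderName = "x-ms-request-charge";

    // Returns the charge in request units, or null when the header is missing or not numeric.
    public static double? Read(HttpResponseMessage response)
    {
        IEnumerable<string> values;
        if (!response.Headers.TryGetValues(HeaderName, out values))
        {
            return null;
        }

        double charge;
        bool parsed = double.TryParse(
            values.FirstOrDefault(),
            NumberStyles.Float,
            CultureInfo.InvariantCulture,
            out charge);

        return parsed ? charge : (double?)null;
    }
}
```

Logging this value for your most common operations gives you the per-operation costs needed for a throughput estimate.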

Many factors are involved in request unit measurement, such as the number of document properties, indexes, document size and data consistency. Therefore, RU cost differs from one application to another. Once you have an idea of the cost of your application's queries and the estimated number of requests per second, you can estimate how much throughput you need to satisfy your application's needs.
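The estimate itself is simple arithmetic: multiply the measured cost of each operation by how often it runs per second, then add the results. Here is a minimal sketch; OperationProfile and ThroughputEstimator are hypothetical names, and the figures in the usage note are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of one kind of operation your application performs.
public class OperationProfile
{
    public string Name { get; set; }
    public double RequestUnitsPerOperation { get; set; }
    public double OperationsPerSecond { get; set; }
}

public static class ThroughputEstimator
{
    // Sums (RU per operation x operations per second) over all operation types.
    public static double EstimateRequestUnitsPerSecond(IEnumerable<OperationProfile> profiles)
    {
        return profiles.Sum(p => p.RequestUnitsPerOperation * p.OperationsPerSecond);
    }
}
```

For instance, 100 reads per second at 1 RU each, 20 creates per second at 5 RU each and 5 queries per second at 10 RU each add up to 100 + 100 + 50 = 250 RU/s. You would replace these assumed costs with the request charges you actually measure.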

Set Alarms
One of the best ways to monitor your service's performance and consumed RUs is to set alarms. You can define as many types of alarm as you want to make sure the reserved throughput is enough, but not more than enough. For example, you can monitor the number of throttled requests to make sure enough resources have been allocated, or check the consumed request units to make sure you are not allocating more than you need. If you keep getting notifications that you are not consuming the expected RUs, then it’s time to scale down.

[Figure: Setting alarms. The administrator is notified when consumed RUs are lower than expected.]
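The reasoning behind those two alarms can be expressed as a small decision rule. The sketch below is only an illustration: ThroughputAdvisor is a hypothetical helper, and the 50% low-utilization threshold is an assumption you would tune for your own workload.

```csharp
public enum ThroughputAdvice
{
    Keep,
    ScaleUp,
    ScaleDown
}

// Hypothetical helper that mirrors the alarm logic described above.
public static class ThroughputAdvisor
{
    public static ThroughputAdvice Recommend(
        int throttledRequests,
        double consumedRuPerSecond,
        double reservedRuPerSecond,
        double lowUtilization = 0.5) // assumed threshold, not an Azure default
    {
        // Throttled requests mean the reserved throughput is not enough.
        if (throttledRequests > 0)
        {
            return ThroughputAdvice.ScaleUp;
        }

        // Consistently low consumption means you are paying for unused capacity.
        if (consumedRuPerSecond < reservedRuPerSecond * lowUtilization)
        {
            return ThroughputAdvice.ScaleDown;
        }

        return ThroughputAdvice.Keep;
    }
}
```

In practice you would feed it values observed over a sustained period rather than a single sample, so that a short spike or quiet spell does not trigger a change.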

Brief overview of Azure DocumentDB Document Properties

When you create a document in Azure DocumentDB, regardless of the properties in the document, Azure creates and populates some default properties for it behind the scenes. Here is a brief overview of these document properties:

_rid: The unique resource identifier. Each document in Azure DocumentDB must also have an id, a string identifier that is unique across the collection, is specified in the document and can be changed over time. By contrast, _rid is generated by Azure and cannot be changed.

_ts: The last time the document was modified, stored as a timestamp in seconds since the Unix epoch. Azure updates the value whenever you modify the document. This can be very useful when your application needs to get the changes made after a specific date.

_self: The unique addressable URI of the resource. Here is an example of a _self link:
dbs/DppTAA==/colls/DppTAvdAA=/docs/DppTAMYvdAABAAAAAAA==/
As you can see, it starts with dbs and the database resource ID, followed by colls and the collection resource ID, and finally docs and the document's _rid.

_etag: ETags are used by Azure to manage optimistic concurrency and to prevent users from overwriting each other's changes. Stas Kondratyev posted a comprehensive article about ETags here.

_attachments: The path to the document’s attachments.
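Two of these properties are easy to work with in code. The sketch below converts the timestamp (stored as _ts in the document JSON) into a DateTimeOffset and splits a self link into its resource IDs. DocumentSystemProperties is a hypothetical helper, and the parser assumes the dbs/.../colls/.../docs/... layout shown in the example above.

```csharp
using System;

// Hypothetical helpers for two of the system-generated document properties.
public static class DocumentSystemProperties
{
    // The timestamp holds the last-modified time as seconds since the Unix epoch (UTC).
    public static DateTimeOffset ToLastModified(long ts)
    {
        return DateTimeOffset.FromUnixTimeSeconds(ts);
    }

    // Returns the database, collection and document resource IDs, in that order.
    public static string[] ParseSelfLink(string selfLink)
    {
        string[] segments = selfLink.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        bool wellFormed = segments.Length == 6
            && segments[0] == "dbs"
            && segments[2] == "colls"
            && segments[4] == "docs";

        if (!wellFormed)
        {
            throw new FormatException("Unexpected self link format: " + selfLink);
        }

        return new[] { segments[1], segments[3], segments[5] };
    }
}
```

Comparing the converted timestamp with a cutoff date is one way to find documents that changed after a given point in time.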

How to get Document Properties?

To get the document properties, select the document you want from the collection and click the Properties icon on the Document blade. All of the properties are then listed in the Properties blade.