Getting Started with Azure Content Moderator

What is Azure Content Moderator and how do I use it?

Image by Brett Jordan

Azure Content Moderator is an Azure service that uses machine learning to help moderate different forms of content, such as text, video, and images. The service provides both a Moderation API and a Review API. It is also worth learning about because knowledge of this service is needed to pass the AI-100 exam. In this post I will go over what you can do with Content Moderator and how to use it from the Python library.

Text Moderation

Text moderation is used in many types of applications, such as chatbots, discussion forums, and document verification. You supply text to the Azure Text Moderation API and it returns a list of potentially harmful words, the type of harmful content they represent, and any personally identifiable information found in the text.

Creating a Content Moderator in Azure

First, open the Azure Portal (portal.azure.com). Once it has loaded, click the search bar at the top, type in “Content Moderator”, and click the Marketplace item. You will then need to select your subscription, resource group, region, and pricing tier (choose Free F0 for testing), and enter a name (this name must be unique). Next, hit Review and Create and, if it passes validation, hit Create. The Content Moderator will then deploy, which should complete within a few seconds to a few minutes.

Now that it’s deployed, you can check out the following page to test the API. Click on your region of choice (Australia East in my case), then scroll down to the Ocp-Apim-Subscription-Key field and paste in the key from your Azure Content Moderator service (available under the “Keys and Endpoints” tab). Hitting Send should now give you a response like the one below. It has three major sections: PII (personally identifiable information), Classification, and Terms. We will go over each section in more detail below.

Example of an API Response

Pragma: no-cache
apim-request-id: 53ee7e6e-fc5a-4c78-9170-26507c5a6***
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
CSP-Billing-Usage: CognitiveServices.ContentModerator.Transaction=1
Cache-Control: no-cache
Date: Fri, 16 Oct 2020 00:53:21 GMT
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Content-Length: 1085
Content-Type: application/json; charset=utf-8
Expires: -1

{
  "OriginalText": "Is this a cr*p a** email abcdef@abcd.com, phone: 6657789887, IP: 255.255.255.255, 1 Microsoft Way, Redmond, WA 98052",
  "NormalizedText": "   cr*p a** email abcdef@abcd.com, phone: 6657789887, IP: 255.255.255.255, 1 Microsoft Way, Redmond, WA 98052",
  "Misrepresentation": null,
  "PII": {
    "Email": [{
      "Detected": "abcdef@abcd.com",
      "SubType": "Regular",
      "Text": "abcdef@abcd.com",
      "Index": 25
    }],
    "IPA": [{
      "SubType": "IPV4",
      "Text": "255.255.255.255",
      "Index": 65
    }],
    "Phone": [{
      "CountryCode": "US",
      "Text": "6657789887",
      "Index": 49
    }],
    "Address": [{
      "Text": "1 Microsoft Way, Redmond, WA 98052",
      "Index": 82
    }],
    "SSN": []
  },
  "Classification": {
    "ReviewRecommended": true,
    "Category1": {
      "Score": 0.0011853053001686931
    },
    "Category2": {
      "Score": 0.49122610688209534
    },
    "Category3": {
      "Score": 0.98799997568130493
    }
  },
  "Language": "eng",
  "Terms": [{
    "Index": 3,
    "OriginalIndex": 10,
    "ListId": 0,
    "Term": "cr*p"
  }, {
    "Index": 8,
    "OriginalIndex": 15,
    "ListId": 0,
    "Term": "a**"
  }],
  "Status": {
    "Code": 3000,
    "Description": "OK",
    "Exception": null
  },
  "TrackingId": "AUE_ibiza_677e548d-ccf1-475f-b0ed-3a1fa1fd2137_ContentModerator.F0_78430acc-0452-45c9-bf7f-1e45b5651***"
}
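The request that the test console sends can also be built in code. The sketch below uses only the Python standard library and constructs the POST request without sending it. The URL path and query parameter names are my assumption of what the console uses for the text Screen operation, and the endpoint and key are placeholders, so check them against the console page for your region before relying on them.

```python
import urllib.parse
import urllib.request


def build_screen_text_request(endpoint, subscription_key, text,
                              language="eng", autocorrect=True,
                              pii=True, classify=True):
    """Build (but do not send) a POST request for the text Screen operation."""
    # Query parameter names are assumed to match the ones shown in the console.
    query = urllib.parse.urlencode({
        "language": language,
        "autocorrect": autocorrect,
        "PII": pii,
        "classify": classify,
    })
    url = (f"{endpoint.rstrip('/')}"
           f"/contentmoderator/moderate/v1.0/ProcessText/Screen?{query}")
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


# Hypothetical endpoint and key: substitute the values from your own resource.
request = build_screen_text_request(
    "https://my-resource.cognitiveservices.azure.com",
    "my-subscription-key",
    "Is this a cr*p email abcdef@abcd.com",
)
print(request.get_method(), request.full_url)
# Passing the request to urllib.request.urlopen() would actually send it.
```

Keeping the request construction separate from sending makes it easy to inspect the URL and headers before any transaction is billed against your key.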

Personally identifiable information

The Content Moderation API can detect when personal information is present in the text sent to it. The fields it can detect include email addresses, mailing addresses, IP addresses, phone numbers, and social security numbers. When one of these fields is detected, the Content Moderation service returns a PII object in the JSON response, like the one below. For each match, the object contains the type of field, the matched text, and its location (Index), so you can censor or remove it from the text.

"PII": {
    "Email": [{
      "Detected": "abcdef@abcd.com",
      "SubType": "Regular",
      "Text": "abcdef@abcd.com",
      "Index": 25
    }],
    "IPA": [{
      "SubType": "IPV4",
      "Text": "255.255.255.255",
      "Index": 65
    }],
    "Phone": [{
      "CountryCode": "US",
      "Text": "6657789887",
      "Index": 49
    }],
    "Address": [{
      "Text": "1 Microsoft Way, Redmond, WA 98052",
      "Index": 82
    }],
    "SSN": []
}
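As a rough illustration of how the Index values could be used to censor the text, here is a minimal sketch. The redact_pii helper is hypothetical (it is not part of the Content Moderator library), and it assumes that Index is a character offset into OriginalText, which is how the sample response above behaves.

```python
def redact_pii(original_text, pii, mask="*"):
    """Mask every PII span in original_text, keeping the text length unchanged.

    mask must be a single character so the offsets of later spans stay valid.
    """
    chars = list(original_text)
    for entries in pii.values():  # Email, IPA, Phone, Address, SSN
        for entry in entries:
            start = entry["Index"]
            end = start + len(entry["Text"])
            # Only mask when the offset really points at the reported text.
            if original_text[start:end] == entry["Text"]:
                chars[start:end] = mask * len(entry["Text"])
    return "".join(chars)


# Values copied from the example response above.
text = ("Is this a cr*p a** email abcdef@abcd.com, phone: 6657789887, "
        "IP: 255.255.255.255, 1 Microsoft Way, Redmond, WA 98052")
pii = {
    "Email": [{"Detected": "abcdef@abcd.com", "SubType": "Regular",
               "Text": "abcdef@abcd.com", "Index": 25}],
    "IPA": [{"SubType": "IPV4", "Text": "255.255.255.255", "Index": 65}],
    "Phone": [{"CountryCode": "US", "Text": "6657789887", "Index": 49}],
    "Address": [{"Text": "1 Microsoft Way, Redmond, WA 98052", "Index": 82}],
    "SSN": [],
}
print(redact_pii(text, pii))
```

Masking with a same-length string, rather than deleting the span, means every other Index in the response still points at the right place.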

Classification

The JSON response returned by the Content Moderation API contains a Classification field, which recommends whether the text should be reviewed and scores it against each of the following categories.

  • Category 1: Potential presence of language that might be considered sexually explicit or adult in certain situations.
  • Category 2: Potential presence of language that might be considered sexually suggestive or mature in certain situations.
  • Category 3: Potential presence of language that might be considered offensive in certain situations.

"Classification": {
    "ReviewRecommended": true,
    "Category1": {
      "Score": 0.0011853053001686931
    },
    "Category2": {
      "Score": 0.49122610688209534
    },
    "Category3": {
      "Score": 0.98799997568130493
    }
}
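If ReviewRecommended alone is too coarse for your application, you could apply your own cut-off to the category scores. The sketch below is one way to do that. The flagged_categories helper and the 0.5 default threshold are illustrative assumptions, not values taken from the service.

```python
def flagged_categories(classification, threshold=0.5):
    """Return the names of categories whose score is at or above threshold."""
    return sorted(
        name for name, value in classification.items()
        # Skip ReviewRecommended, which is a plain boolean rather than a score.
        if isinstance(value, dict) and value.get("Score", 0.0) >= threshold
    )


# Values copied from the example response above.
classification = {
    "ReviewRecommended": True,
    "Category1": {"Score": 0.0011853053001686931},
    "Category2": {"Score": 0.49122610688209534},
    "Category3": {"Score": 0.98799997568130493},
}
print(flagged_categories(classification))        # ['Category3']
print(flagged_categories(classification, 0.4))   # ['Category2', 'Category3']
```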

Terms

When you pass text to the Content Moderation API, any word that potentially contains profanity is returned as a Term object in the JSON response. The Term object contains Index and OriginalIndex values to locate the word, the term detected, and the ID of the list that matched it. The ListId will not be zero if you are using a custom list of words.

"Terms": [{
    "Index": 3,
    "OriginalIndex": 10,
    "ListId": 0,
    "Term": "cr*p"
  }, {
    "Index": 8,
    "OriginalIndex": 15,
    "ListId": 0,
    "Term": "a**"
  }]
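In the sample response above, OriginalIndex appears to be an offset into OriginalText, while Index is an offset into NormalizedText. The short sketch below checks that reading against the sample data. The locate_terms helper is hypothetical, and the two text fields are shortened copies of the ones in the example response.

```python
def locate_terms(response):
    """Return (term, list_id, slice of OriginalText, slice of NormalizedText)."""
    results = []
    for term in response["Terms"]:
        length = len(term["Term"])
        start_original = term["OriginalIndex"]
        start_normalized = term["Index"]
        results.append((
            term["Term"],
            term["ListId"],
            response["OriginalText"][start_original:start_original + length],
            response["NormalizedText"][start_normalized:start_normalized + length],
        ))
    return results


# Shortened copy of the example response above.
response = {
    "OriginalText": "Is this a cr*p a** email",
    "NormalizedText": "   cr*p a** email",
    "Terms": [
        {"Index": 3, "OriginalIndex": 10, "ListId": 0, "Term": "cr*p"},
        {"Index": 8, "OriginalIndex": 15, "ListId": 0, "Term": "a**"},
    ],
}
for row in locate_terms(response):
    print(row)
```

Both slices come back equal to the reported term for this sample, so either offset can be used as long as it is paired with the matching text field.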

Image Moderation

Image Moderation in Azure Content Moderator works similarly to Text Moderation, but it is used to analyse images for adult and racy content, detect text with Optical Character Recognition (OCR), and detect faces. In this post I will only go over detecting adult and racy content within images, but you can play around with the other uses from within the Image Moderation API Console.

Using the Image Moderation API Console

The easiest way to test the Image Moderation functions of the Content Moderator is to use the API Console, which you can access using the following link. In the console, select your region and then enter your subscription key from the Azure portal in the Ocp-Apim-Subscription-Key field. Once that’s done, change the request body from the default value to the image of your choice and hit Send. You should get back a JSON response like the one below.

Example API Response

Pragma: no-cache
apim-request-id: 576338ac-132e-4e22-89f8-df8056ce05af
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
CSP-Billing-Usage: CognitiveServices.ContentModerator.Transaction=1
Cache-Control: no-cache
Date: Fri, 16 Oct 2020 04:58:20 GMT
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Content-Length: 458
Content-Type: application/json; charset=utf-8
Expires: -1

{
  "AdultClassificationScore": 0.021854337304830551,
  "IsImageAdultClassified": false,
  "RacyClassificationScore": 0.045791342854499817,
  "IsImageRacyClassified": false,
  "Result": false,
  "AdvancedInfo": [{
    "Key": "ImageDownloadTimeInMs",
    "Value": "1586"
  }, {
    "Key": "ImageSizeInBytes",
    "Value": "273405"
  }],
  "Status": {
    "Code": 3000,
    "Description": "OK",
    "Exception": null
  },
  "TrackingId": "AUE_ibiza_677e548d-ccf1-475f-b0ed-3a1fa1fd2137_ContentModerator.F0_2fbb5e6f-c7ab-4143-a16e-cb18c7511621"
}
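The same console request can be sketched in code. The example below builds, but does not send, the POST request for evaluating an image by URL. The URL path and the DataRepresentation/Value body shape are my assumption of what the console sends by default, and the endpoint, key, and image URL are placeholders.

```python
import json
import urllib.request


def build_image_evaluate_request(endpoint, subscription_key, image_url):
    """Build (but do not send) a POST request that evaluates an image by URL."""
    # Body shape assumed from the console's default request body.
    body = json.dumps({"DataRepresentation": "URL", "Value": image_url})
    url = (f"{endpoint.rstrip('/')}"
           "/contentmoderator/moderate/v1.0/ProcessImage/Evaluate")
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


# Hypothetical values: substitute your own endpoint, key, and image URL.
request = build_image_evaluate_request(
    "https://my-resource.cognitiveservices.azure.com",
    "my-subscription-key",
    "https://example.com/sample.jpg",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```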

Looking at the example API response, we can see that the main fields are the following.

  • AdultClassificationScore - This value ranges from 0 to 1 and represents how likely the image is to contain content that may be considered adult or explicit in certain scenarios.
  • IsImageAdultClassified - This applies internal thresholds to the AdultClassificationScore to convert the result into a Boolean value.
  • RacyClassificationScore - This value ranges from 0 to 1 and represents how likely the image is to contain racy content.
  • IsImageRacyClassified - This applies internal thresholds to the RacyClassificationScore to convert the result into a Boolean value.
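Because the raw scores are returned alongside the Boolean flags, you can layer stricter thresholds of your own on top of the internal ones. The sketch below shows one way to do this. The review_reasons helper and its default thresholds are illustrative assumptions, not values used by the service.

```python
def review_reasons(result, adult_threshold=0.5, racy_threshold=0.5):
    """Return the reasons ('adult', 'racy') an image should be reviewed."""
    reasons = []
    # Honour the service's own flag, or our stricter custom threshold.
    if (result["IsImageAdultClassified"]
            or result["AdultClassificationScore"] >= adult_threshold):
        reasons.append("adult")
    if (result["IsImageRacyClassified"]
            or result["RacyClassificationScore"] >= racy_threshold):
        reasons.append("racy")
    return reasons


# Values copied from the example response above.
result = {
    "AdultClassificationScore": 0.021854337304830551,
    "IsImageAdultClassified": False,
    "RacyClassificationScore": 0.045791342854499817,
    "IsImageRacyClassified": False,
}
print(review_reasons(result))                       # []
print(review_reasons(result, racy_threshold=0.04))  # ['racy']
```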

The Image Moderation API can also detect text with OCR, detect faces, and create custom lists. I may go through this functionality in a future post.

Using the API in a Python Application

If you want to follow along, check out this Google Colab workbook I’ve created, which guides you through the Python code required to use the Content Moderator API.

The code snippet below uses the Content Moderator Python library to analyse a text file with the Content Moderation API and then prints the response to the console. To use the snippet, create a text file named azure_cm_text.txt in the ./text_files/ folder, then replace the CONTENT_MODERATOR_ENDPOINT and CONTENT_MODERATOR_SUBSCRIPTION_KEY variables with the corresponding values from your Azure Content Moderator (available under the “Keys and Endpoints” tab).

# Notebook magic (Colab/Jupyter). In a plain script, run pip install from a shell instead.
%pip install azure-cognitiveservices-vision-contentmoderator

import os.path
from pprint import pprint

from azure.cognitiveservices.vision.contentmoderator import ContentModeratorClient
from msrest.authentication import CognitiveServicesCredentials

# Replace these placeholders with the values from your Content Moderator resource.
CONTENT_MODERATOR_ENDPOINT = "CONTENT_MODERATOR_ENDPOINT"
CONTENT_MODERATOR_SUBSCRIPTION_KEY = "CONTENT_MODERATOR_SUBSCRIPTION_KEY"

# Authenticate against the service with the subscription key.
client = ContentModeratorClient(
    endpoint=CONTENT_MODERATOR_ENDPOINT,
    credentials=CognitiveServicesCredentials(CONTENT_MODERATOR_SUBSCRIPTION_KEY)
)

# Send the file's contents for screening and print the parsed response.
with open(os.path.join('./text_files/', 'azure_cm_text.txt'), "rb") as text_fd:
    screen = client.text_moderation.screen_text(
        text_content_type="text/plain",
        text_content=text_fd,
        language="eng",
        autocorrect=True,
        pii=True
    )
    pprint(screen.as_dict())
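The printed dictionary can be long, so it may help to boil it down to the parts you care about. The sketch below is a hypothetical summarise_screen helper. It assumes that as_dict() returns lower-case, snake_case keys such as pii, terms, and term, which differs from the PascalCase names in the raw JSON shown earlier, so check the keys in your own output first. The sample dictionary is hand-written for illustration, not real output.

```python
def summarise_screen(result):
    """Condense a screen_text result dict into its language, terms, and PII counts."""
    pii = result.get("pii") or {}
    terms = result.get("terms") or []
    return {
        "language": result.get("language"),
        "flagged_terms": [t["term"] for t in terms],
        # Count matches per PII type, skipping types with no matches.
        "pii_counts": {kind: len(items) for kind, items in pii.items() if items},
    }


# Hand-written sample in the assumed as_dict() shape.
sample = {
    "language": "eng",
    "terms": [{"index": 3, "original_index": 10, "list_id": 0, "term": "cr*p"}],
    "pii": {
        "email": [{"text": "abcdef@abcd.com", "index": 25}],
        "phone": [{"text": "6657789887", "index": 49}],
        "ssn": [],
    },
}
print(summarise_screen(sample))
```

In the snippet above, you would call it as summarise_screen(screen.as_dict()).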

For more details on the Python API, as well as the C# and Java APIs, check out this Microsoft documentation page.

Overall, this post has shown that the Azure Content Moderator API is a powerful solution for content moderation. The main concepts we went through were Text Moderation and Image Moderation. In a future post I will go through Video Moderation and create a project with the Video Moderation API.

If you enjoyed this post or found it helpful, feel free to leave a comment below or react to the post.