TEXTGAIN TEXT ANALYTICS API (1.2.0)


The TEXTGAIN API provides a set of URLs to which you can send GET and POST requests. Texts to be analyzed are sent through the q parameter, and your personal key must be sent with each request. The server responds with JSON, a standardized, compact data format.

The API description lists the specifications for the GET requests. For alphabetic languages, this request method should suffice. For some languages, however, the URL might become too long to handle (resulting in an HTTP 414 error). In those cases, you can send your requests through POST with a JSON body, using the same parameters as the GET requests. The response will be identical.
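The GET-vs-POST choice described above can be sketched as follows. The base URL and the 2,000-character threshold are assumptions for illustration, not part of this specification:

```python
from urllib.parse import urlencode

API_BASE = "https://api.textgain.com"  # assumed base URL; check your subscription details


def build_request(endpoint, text, key, max_url_length=2000):
    """Return (method, url, body) for a request to the given endpoint.

    Falls back from GET to POST with a JSON body when the encoded URL
    would grow too long (which could otherwise trigger HTTP 414).
    The 2000-character threshold is a conservative assumption.
    """
    params = {"q": text, "key": key}
    url = f"{API_BASE}/{endpoint}?{urlencode(params)}"
    if len(url) <= max_url_length:
        return ("GET", url, None)
    # Same parameters, sent as a JSON body instead; the response is identical.
    return ("POST", f"{API_BASE}/{endpoint}", params)
```

A short text produces a plain GET URL, while a long text switches to POST with the same parameters as the request body.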

Administration

Check your available credits

This endpoint shows the credits you spent in the last 24 hours (used) and your total available credits according to your subscription plan (allowance). Calling this endpoint does not spend any credits.

query Parameters
key
required
string

Your unique API key

Responses

Response samples

Content type
application/json
{
  "used": 101,
  "key": "example-key",
  "allowance": 10000
}
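The sample response above can be parsed to compute how many credits remain. The remaining = allowance - used arithmetic is an inference from the field names, not stated by the API:

```python
import json

# Sample response from the credits endpoint, as shown above.
response = '{"used": 101, "key": "example-key", "allowance": 10000}'
data = json.loads(response)

# Credits left = total allowance minus credits spent in the last 24 hours.
remaining = data["allowance"] - data["used"]
print(remaining)  # 9899
```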

Identification

Language identification

Language identification detects the language a text is written in. Different languages use different characters. For example, Russian (Кириллица), Chinese (汉字) and Arabic (العربية) are easy to distinguish. Languages that use the same characters (e.g., the Latin alphabet, abc) often have cues that set them apart (e.g., é ↔ ë).

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

include
Array of strings (Language)
Items Enum: "af" "am" "an" "ar" "as" "ast" "av" "az" "ba" "bar" "bcl" "be" "bg" "bn" "bo" "bpy" "br" "bs" "bxr" "ca" "cbk" "ce" "ceb" "co" "cs" "cv" "cy" "da" "de" "diq" "dv" "el" "en" "eo" "es" "et" "eu" "fa" "fi" "fr" "fy" "ga" "gd" "gl" "gn" "gom" "gu" "gv" "he" "hi" "hif" "hr" "hsb" "ht" "hu" "hy" "ia" "id" "ie" "ilo" "io" "is" "it" "ja" "jbo" "jv" "ka" "kk" "km" "kn" "ko" "krc" "ku" "kv" "kw" "ky" "la" "lb" "lez" "lmo" "lo" "lrc" "lt" "lv" "mai" "mg" "mhr" "min" "mk" "ml" "mn" "mr" "mrj" "ms" "mt" "mwl" "my" "myv" "mzn" "nap" "nds" "ne" "new" "nl" "nn" "oc" "or" "os" "pa" "pam" "pfl" "pl" "pms" "pnb" "ps" "pt" "qu" "rm" "ro" "ru" "rue" "sa" "sah" "sc" "scn" "sco" "sd" "sh" "si" "sk" "sl" "so" "sq" "sr" "su" "sv" "sw" "ta" "te" "tg" "th" "tk" "tl" "tr" "tt" "tyv" "ug" "uk" "ur" "uz" "vec" "vep" "vi" "vo" "war" "wuu" "xal" "xmf" "yi" "yo" "yue" "zh"

The languages to include in the results, in case you already know which languages a text may be in.

exclude
Array of strings (Language)
Items Enum: "af" "am" "an" "ar" "as" "ast" "av" "az" "ba" "bar" "bcl" "be" "bg" "bn" "bo" "bpy" "br" "bs" "bxr" "ca" "cbk" "ce" "ceb" "co" "cs" "cv" "cy" "da" "de" "diq" "dv" "el" "en" "eo" "es" "et" "eu" "fa" "fi" "fr" "fy" "ga" "gd" "gl" "gn" "gom" "gu" "gv" "he" "hi" "hif" "hr" "hsb" "ht" "hu" "hy" "ia" "id" "ie" "ilo" "io" "is" "it" "ja" "jbo" "jv" "ka" "kk" "km" "kn" "ko" "krc" "ku" "kv" "kw" "ky" "la" "lb" "lez" "lmo" "lo" "lrc" "lt" "lv" "mai" "mg" "mhr" "min" "mk" "ml" "mn" "mr" "mrj" "ms" "mt" "mwl" "my" "myv" "mzn" "nap" "nds" "ne" "new" "nl" "nn" "oc" "or" "os" "pa" "pam" "pfl" "pl" "pms" "pnb" "ps" "pt" "qu" "rm" "ro" "ru" "rue" "sa" "sah" "sc" "scn" "sco" "sd" "sh" "si" "sk" "sl" "so" "sq" "sr" "su" "sv" "sw" "ta" "te" "tg" "th" "tk" "tl" "tr" "tt" "tyv" "ug" "uk" "ur" "uz" "vec" "vep" "vi" "vo" "war" "wuu" "xal" "xmf" "yi" "yo" "yue" "zh"

The languages to exclude from the results, in case you are sure your texts are not in those languages.

key
required
string

Your unique API key

Responses

Response samples

Content type
application/json
{
  "confidence": 0.96,
  "language": "string",
  "iso-693-3": "string",
  "name": "string"
}
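A minimal sketch of consuming this response. The field values below are illustrative, and the 0.5 confidence threshold is an arbitrary choice, not part of the API:

```python
import json

# Illustrative response; actual values come from the language endpoint.
response = '{"confidence": 0.96, "language": "fr", "iso-693-3": "fra", "name": "French"}'
result = json.loads(response)

# Trust the prediction only above a chosen confidence threshold.
if result["confidence"] >= 0.5:
    detected = result["language"]
else:
    detected = None  # too uncertain; fall back to a default or retry
```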

Arabic Dialect Identification

Arabic Dialect Identification detects the regional variant of Arabic a text is written in. This endpoint was developed in collaboration with RedCrow.co.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

key
required
string

Your unique API key

Responses

Response samples

Content type
application/json
{
  "dialect": "Egypt"
}

Genre identification

Genre classification predicts the type of text, based on its length, tone of voice and content.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

key
required
string

Your unique API key

Responses

Response samples

Content type
application/json
{
  "confidence": 0.96,
  "genre": "string"
}

Mediatopics

This endpoint predicts mediatopic IDs (previously known as IPTC codes) for a given text. The classifier was trained on Dutch, but can also predict topics for other languages. More information about the mediatopics taxonomy is available at http://cv.iptc.org/newscodes/subjectcode.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

key
required
string

Your unique API key

lang
required
string
Enum: "nl" "None"

Responses

Response samples

Content type
application/json
{
  "confidence": 0.96,
  "id": "string",
  "mediatopic": "string"
}

Lexicon

Part-of-Speech Tagging

Part-of-speech tagging identifies sentence breaks and word types. Words have different roles depending on how they are used. For example, the word shop can be a noun (a shop, object) or a verb (to shop, action).

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum: "en" "es" "cs" "da" "de" "fr" "it" "nl" "pt" "ru" "sv" "sw"
key
required
string

Your unique API key

Responses

Response samples

Content type
application/json
{
  "text": [],
  "confidence": 0.96
}

Lemmatization

Lemmatization involves the morphological analysis of words to reduce them to their dictionary form (lemma). It is more powerful than stemming, which simply strips morphological suffixes rather than taking a word's part of speech and allomorphic transformations into account. For example, "bathing" would be stemmed to "bath", but lemmatized as "bathe".
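The stemming-vs-lemmatization distinction can be illustrated with a toy sketch. The naive suffix stripper and the tiny lemma table below are purely illustrative, not the API's actual algorithm:

```python
def naive_stem(word):
    """Naive stemmer: blindly strip a known suffix, ignoring morphology."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


# A lemmatizer instead maps inflected forms to their dictionary form,
# using part-of-speech and allomorphic knowledge (here, a toy lookup table).
LEMMAS = {"bathing": "bathe", "bathed": "bathe", "baths": "bath"}


def toy_lemmatize(word):
    return LEMMAS.get(word, word)


print(naive_stem("bathing"))     # "bath"
print(toy_lemmatize("bathing"))  # "bathe"
```

The stemmer produces "bath" by removing "-ing", while the lemmatizer recovers the true dictionary form "bathe".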

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text