TEXTGAIN TEXT ANALYTICS API (1.1.9.7)


The TEXTGAIN API provides a set of URLs to which you can send GET and POST requests. Texts to be analyzed are sent through the q parameter. You also need to send your personal key with each request. The server responds with a JSON string, a standardized, compact data format.

The API description lists the specifications for the GET requests. For alphabetic languages, this request method should suffice. For some languages, the URL might become too long to handle (resulting in an HTTP 414 error). In these cases you can send your requests through POST in JSON, using the same parameters as the GET requests. The response will be identical.
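
As a rough illustration of the GET/POST choice described above, the sketch below builds a GET URL carrying q and key, and switches to a JSON POST body when the URL grows long. The helper name prepare_request and the 2000-character threshold are assumptions made for this example, not values taken from the API. The sketch only prepares the request and does not send it.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.textgain.com"
MAX_URL_LENGTH = 2000  # assumed threshold; the actual server limit is not documented here


def prepare_request(endpoint, text, key, **extra):
    """Return (method, url, body) for a TEXTGAIN call without sending it."""
    params = {"q": text, "key": key, **extra}
    url = f"{BASE_URL}/{endpoint}?{urlencode(params)}"
    if len(url) <= MAX_URL_LENGTH:
        return "GET", url, None
    # Long URLs risk an HTTP 414 error, so send the same parameters as JSON via POST.
    body = json.dumps(params).encode("utf-8")
    return "POST", f"{BASE_URL}/{endpoint}", body
```

The returned tuple can be handed to any HTTP client. For the POST case, a Content-Type: application/json header would presumably be needed, although the text above does not state this explicitly.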

Administration

Check your available credits

This endpoint shows the credits you spent in the last 24 hours (USED) and your total available credits, according to your subscription plan (ALLOWANCE). Calling this endpoint doesn't spend any credits.

query Parameters
key
required
string

Your unique API key

Responses

200

The server returns a dictionary detailing credits used in the last 24 hours and total allowance

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /diagnostics

TEXTGAIN

https://api.textgain.com/diagnostics

Response samples

Content type
application/json
{
  "used": 101,
  "key": "example-key",
  "allowance": 10000
}
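
A minimal sketch of how the sample response might be used to estimate the credits still available. Treating the remainder as allowance minus used is an assumed interpretation of the two fields, and the helper name remaining_credits is hypothetical.

```python
import json

# Sample response body from the documentation above.
sample = '{"used": 101, "key": "example-key", "allowance": 10000}'


def remaining_credits(diagnostics):
    """Assumed interpretation: allowance minus credits used in the last 24 hours."""
    return diagnostics["allowance"] - diagnostics["used"]


info = json.loads(sample)
print(remaining_credits(info))
```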

Identification

language identification

Language identification detects the language a text is written in. Different languages use different characters. For example, Russian (Кириллица), Chinese (汉字) and Arabic (العربية) are easy to distinguish. Languages that use the same characters (e.g., Latin alphabet, abc) often have cues that set them apart (e.g., é ↔ ë).

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

include
Array of strings (Language)
Items Enum: "af" "am" "an" "ar" "as" "ast" "av" "az" "ba" "bar" "bcl" "be" "bg" "bn" "bo" "bpy" "br" "bs" "bxr" "ca" "cbk" "ce" "ceb" "co" "cs" "cv" "cy" "da" "de" "diq" "dv" "el" "en" "eo" "es" "et" "eu" "fa" "fi" "fr" "fy" "ga" "gd" "gl" "gn" "gom" "gu" "gv" "he" "hi" "hif" "hr" "hsb" "ht" "hu" "hy" "ia" "id" "ie" "ilo" "io" "is" "it" "ja" "jbo" "jv" "ka" "kk" "km" "kn" "ko" "krc" "ku" "kv" "kw" "ky" "la" "lb" "lez" "lmo" "lo" "lrc" "lt" "lv" "mai" "mg" "mhr" "min" "mk" "ml" "mn" "mr" "mrj" "ms" "mt" "mwl" "my" "myv" "mzn" "nap" "nds" "ne" "new" "nl" "nn" "oc" "or" "os" "pa" "pam" "pfl" "pl" "pms" "pnb" "ps" "pt" "qu" "rm" "ro" "ru" "rue" "sa" "sah" "sc" "scn" "sco" "sd" "sh" "si" "sk" "sl" "so" "sq" "sr" "su" "sv" "sw" "ta" "te" "tg" "th" "tk" "tl" "tr" "tt" "tyv" "ug" "uk" "ur" "uz" "vec" "vep" "vi" "vo" "war" "wuu" "xal" "xmf" "yi" "yo" "yue" "zh"

Specify the languages you want to include in the results, in case you are sure about the languages a text may be in.

exclude
Array of strings (Language)
Items Enum: "af" "am" "an" "ar" "as" "ast" "av" "az" "ba" "bar" "bcl" "be" "bg" "bn" "bo" "bpy" "br" "bs" "bxr" "ca" "cbk" "ce" "ceb" "co" "cs" "cv" "cy" "da" "de" "diq" "dv" "el" "en" "eo" "es" "et" "eu" "fa" "fi" "fr" "fy" "ga" "gd" "gl" "gn" "gom" "gu" "gv" "he" "hi" "hif" "hr" "hsb" "ht" "hu" "hy" "ia" "id" "ie" "ilo" "io" "is" "it" "ja" "jbo" "jv" "ka" "kk" "km" "kn" "ko" "krc" "ku" "kv" "kw" "ky" "la" "lb" "lez" "lmo" "lo" "lrc" "lt" "lv" "mai" "mg" "mhr" "min" "mk" "ml" "mn" "mr" "mrj" "ms" "mt" "mwl" "my" "myv" "mzn" "nap" "nds" "ne" "new" "nl" "nn" "oc" "or" "os" "pa" "pam" "pfl" "pl" "pms" "pnb" "ps" "pt" "qu" "rm" "ro" "ru" "rue" "sa" "sah" "sc" "scn" "sco" "sd" "sh" "si" "sk" "sl" "so" "sq" "sr" "su" "sv" "sw" "ta" "te" "tg" "th" "tk" "tl" "tr" "tt" "tyv" "ug" "uk" "ur" "uz" "vec" "vep" "vi" "vo" "war" "wuu" "xal" "xmf" "yi" "yo" "yue" "zh"

Specify the languages you want to exclude from the results, in case you are sure your texts are not in those languages.

key
required
string

Your unique API key

Responses

200

The server returns the predicted ISO 639-1 language code in lang, a confidence measure, the ISO 639-3 code and the canonical name for the language.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /language

TEXTGAIN

https://api.textgain.com/language

Response samples

Content type
application/json
{
  "confidence": 0.96,
  "language": "string",
  "iso-693-3": "string",
  "name": "string"
}
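
The confidence measure can be used to decide whether to trust a prediction. The sketch below assumes a response shaped like the sample above. The 0.8 cut-off, the helper name accept_language and the example values are made up for illustration.

```python
def accept_language(result, min_confidence=0.8):
    """Return the language code if the prediction is confident enough, else None."""
    if result.get("confidence", 0.0) >= min_confidence:
        return result.get("language")
    return None


# Hypothetical response; the values are invented for this example.
example = {"confidence": 0.96, "language": "nl", "name": "Dutch"}
accepted = accept_language(example)
```

A suitable threshold depends on the texts being analyzed: short or mixed-language inputs will likely need a lower cut-off or a manual check.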

Arabic Dialect Identification

Arabic Dialect Identification detects the regional variant of Arabic a text is written in. This endpoint was developed in collaboration with RedCrow.co.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

key
required
string

Your unique API key

Responses

200

The server returns the predicted Arabic dialect.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /adi

TEXTGAIN

https://api.textgain.com/adi

Response samples

Content type
application/json
{
  "dialect": "Egypt"
}

Genre identification

Genre classification predicts the type of text, based on its length, tone of voice and content.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

key
required
string

Your unique API key

Responses

200

The server returns the predicted genre and a confidence measure.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /genre

TEXTGAIN

https://api.textgain.com/genre

Response samples

Content type
application/json
{
  "confidence": 0.96,
  "genre": "string"
}

Mediatopics

This endpoint predicts mediatopic-IDs (previously known as IPTC-codes) for a given text. The classifier was trained on Dutch, but is also able to predict topics for other languages. More information about the mediatopics taxonomy can be found at http://cv.iptc.org/newscodes/subjectcode.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

key
required
string

Your unique API key

lang
required
string
Enum: "nl" "None"

Responses

200

The server returns the predicted mediatopic and its id, as well as a confidence measure.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /mediatopics

TEXTGAIN

https://api.textgain.com/mediatopics

Response samples

Content type
application/json
{
  "confidence": 0.96,
  "id": "string",
  "mediatopic": "string"
}
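
Since q is limited to 3000 characters, a longer article has to be split before it can be sent. The following is one possible way to do that, breaking on spaces where it can. The helper name split_text is hypothetical, and how per-chunk predictions should be combined afterwards is left to the caller.

```python
MAX_CHARS = 3000  # limit on q stated in the parameter description


def split_text(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters, preferring breaks at spaces."""
    chunks = []
    while len(text) > limit:
        # Look for the last space that still keeps the chunk within the limit.
        cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space found: fall back to a hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```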

Lexicon

Part-of-Speech Tagging

Part-of-speech tagging identifies sentence breaks and word types. Words have different roles depending on how they are used. For example, the word shop can be a noun (a shop, object) or a verb (to shop, action).

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum: "en" "es" "cs" "da" "de" "fr" "it" "nl" "pt" "ru" "sv" "sw"
key
required
string

Your unique API key

Responses

200

The server returns a list of sentences. Each sentence is a list of phrases (often also referred to as chunks). Each phrase is a list of dictionaries, with each dictionary describing the tag of the word.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /tag

TEXTGAIN

https://api.textgain.com/tag

Response samples

Content type
application/json
{
  "text": [],
  "confidence": 0.96
}
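
The nested response (sentences, then phrases, then word dictionaries) can be walked with three loops. The sketch below is only illustrative: the "word" and "tag" keys inside each dictionary are assumptions, since the exact keys are not listed above, and flatten_tags is a hypothetical helper.

```python
def flatten_tags(response):
    """Yield (sentence_index, phrase_index, word_dict) for every word in the response."""
    for s, sentence in enumerate(response.get("text", [])):
        for p, phrase in enumerate(sentence):
            for word in phrase:
                yield s, p, word


# Hypothetical response; the "word" and "tag" keys are assumed, not documented above.
example = {
    "text": [
        [
            [{"word": "I", "tag": "PRON"}],
            [{"word": "shop", "tag": "VERB"}],
        ]
    ],
    "confidence": 0.96,
}

rows = list(flatten_tags(example))
```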

Lemmatization

Lemmatization involves the morphological analysis of words to reduce them to their dictionary form (lemma). It is more powerful than stemming, which simply strips affixes, rather than taking into account a word's part-of-speech and allomorphic transformations. For example, "bathing" would be stemmed to "bath", but would be lemmatized as "bathe".
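
The difference can be sketched with a toy example. The suffix-stripping function and the one-entry lemma table below are deliberately simplistic stand-ins written for this illustration. They are not the method used by the API.

```python
def naive_stem(word):
    """Toy stemmer: strip one common English suffix, ignoring part of speech."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Tiny hand-written lemma table, standing in for real morphological analysis.
LEMMAS = {("bathing", "VERB"): "bathe"}


def toy_lemma(word, pos):
    """Look up the lemma for a (word, part-of-speech) pair, defaulting to the word itself."""
    return LEMMAS.get((word, pos), word)


stemmed = naive_stem("bathing")
lemmatized = toy_lemma("bathing", "VERB")
```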

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum: "nl" "en" "de" "es" "pt" "fr" "pl"
key
required
string

Your unique API key

Responses

200

The server returns a list of sentences. Each sentence is a list of phrases (often also referred to as chunks). Each phrase is a list of dictionaries, with each dictionary describing the lemma and the tag of the word.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /lemma

TEXTGAIN

https://api.textgain.com/lemma

Response samples

Content type
application/json
{
  "text": []
}

Readability

Tangconstructie

A ‘tangconstructie’ in Dutch is a sentence construction in which grammatically conjoined words are far apart in the sentence, enclosing other words or clauses. For example, the combination of a determiner and a noun constitutes a grammatical entity, but this entity can be split up by several other items in the sentence. A ‘tangconstructie’ is not ungrammatical, but it is stylistically frowned upon. A substantial distance between the conjoining parts of a clause can make a sentence incomprehensible and reduce overall readability. This classifier identifies the separate parts of a ‘tangconstructie’.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Value: "nl"
key
required
string

Your unique API key

Responses

200

The server returns a list of sentences, each consisting of a list of dictionaries that indicate for each word whether it is part of a 'tangconstructie' or not. Dictionary keys will be set to the word-string. If a word is identified as part of a 'tangconstructie', the corresponding dictionary value will be set to the sentence-index of the grammatically conjoined part of the 'tangconstructie'. Otherwise, the corresponding dictionary value will be set to the Boolean value False.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /tang

TEXTGAIN

https://api.textgain.com/tang

Response samples

Content type
application/json
{
  "text": []
}
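
Based on the response description above (one single-key dictionary per word, with either False or the index of the partner word as value), the flagged pairs in a sentence could be collected roughly as follows. The sketch assumes zero-based indices, which the description does not confirm. The helper name tang_spans and the example sentence are hypothetical.

```python
def tang_spans(sentence):
    """Return sorted (start, end, distance) tuples for each flagged word pair in one sentence."""
    spans = set()
    for i, entry in enumerate(sentence):
        for word, partner in entry.items():
            # False marks words outside a construction; an integer points at the partner word.
            # bool is a subclass of int in Python, so it is excluded explicitly.
            if isinstance(partner, int) and not isinstance(partner, bool):
                start, end = sorted((i, partner))
                spans.add((start, end, end - start))
    return sorted(spans)


# Hypothetical sentence entry: "De ... auto" encloses the words in between.
example = [
    {"De": 5},
    {"door": False},
    {"de": False},
    {"buurman": False},
    {"gekochte": False},
    {"auto": 0},
]

spans = tang_spans(example)
```

The distance value gives a simple indication of how far apart the two parts are, which could be used to flag only the widest constructions.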

Passive Voice

The use of the passive voice helps you to draw attention away from the agent of the action. Stylistically, however, it is often frowned upon, because it reduces readability. This classifier identifies the verbs involved in the passive voice of a sentence.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum: "de" "en" "nl"
key
required
string

Your unique API key

Responses

200

The server returns a list of dictionaries, indicating for each word whether it is part of a verbal unit in the passive voice.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /passive

TEXTGAIN

https://api.textgain.com/passive

Response samples

Content type
application/json
{
  "text": []
}

Syllable counts / Hyphenation

Readability metrics often rely on syllable counts. Hyphenation and syllabification go hand in hand. This classifier outputs hyphenation patterns and syllable counts. It is fairly robust to noisy language (see example *awsome).

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum: "af" "as" "be" "bg" "bn" "ca" "cop" "cs" "cu" "cy" "da" "de" "de-1901" "de-1996" "de-ch-1901" "el-monoton" "el-polyton" "en" "en-gb" "en-us" "eo" "es" "et" "eu" "fi" "fr" "fur" "ga" "gl" "grc" "gu" "hi" "hr" "hsb" "hu" "hy" "ia" "id" "is" "it" "ka" "kmr" "kn" "la" "la-x-classic" "la-x-liturgic" "lt" "lv" "ml" "mn-cyrl" "mr" "mul-ethi" "nl" "nn" "oc" "or" "pa" "pl" "pms" "pt" "rm" "ro" "ru" "sa" "sh-cyrl" "sh-latn" "sk" "sl" "sr-cyrl" "sv" "ta" "te" "th" "tk" "tr" "uk" "zh-latn-pinyin"
key
required
string

Your unique API key

Responses

200

The server returns a list with a dictionary for each word, describing the number of syllables (n_syllables) and the hyphenation pattern.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /syllables

TEXTGAIN

https://api.textgain.com/syllables

Response samples

Content type
application/json
{
  "text":
    [