TEXTGAIN TEXT ANALYTICS API (1.1.6)

The TEXTGAIN API provides a set of URLs to which you can send GET and POST requests. Texts to be analyzed are sent through the q parameter. You also need to send your personal key with each request. The server responds with a JSON string, a standardized, compact data format.
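As a minimal sketch of how such a GET request can be assembled, the helper below percent-encodes the q and key parameters (plus any endpoint-specific ones such as lang) into a URL. The function name `build_get_url` and the placeholder key are illustrative assumptions, not part of the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.textgain.com"


def build_get_url(endpoint: str, text: str, key: str, **params: str) -> str:
    """Return the GET URL for an endpoint such as 'language' or 'sentiment'."""
    query = {"q": text, "key": key, **params}
    # urlencode percent-encodes the UTF-8 bytes of every value.
    return f"{BASE_URL}/{endpoint}?{urlencode(query)}"


url = build_get_url("sentiment", "Loved this book!", "YOUR_API_KEY", lang="en")
print(url)
```

Passing the resulting URL to any HTTP client (for example `urllib.request.urlopen`) should return the JSON string described above.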

The API description lists the specifications for the GET requests. For alphabetic languages, this request method should suffice. For some languages, the URL might become too long to handle (resulting in an HTTP 414 error). In these cases you can send your requests through POST in JSON, using the same parameters as the GET requests. The response will be identical.
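The GET-or-POST choice can be sketched as follows. Non-Latin text grows quickly when percent-encoded, so the helper falls back to a JSON POST once the URL passes a length limit. The 2000-character threshold, the `make_request` name and the Content-Type header are assumptions; the actual server limit is not stated in this description.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.textgain.com"
MAX_URL_LENGTH = 2000  # assumed threshold, not a documented server limit


def make_request(endpoint: str, params: dict) -> Request:
    """Build a GET request, or a JSON POST when the URL would be too long."""
    url = f"{BASE_URL}/{endpoint}"
    get_url = f"{url}?{urlencode(params)}"
    if len(get_url) <= MAX_URL_LENGTH:
        return Request(get_url, method="GET")
    # Same parameters as the GET request, sent as a JSON body instead.
    body = json.dumps(params).encode("utf-8")
    headers = {"Content-Type": "application/json"}  # assumed header
    return Request(url, data=body, headers=headers, method="POST")


short = make_request("language", {"q": "hello", "key": "YOUR_API_KEY"})
long_ = make_request("language", {"q": "汉" * 1000, "key": "YOUR_API_KEY"})
print(short.get_method(), long_.get_method())
```

The returned object is only constructed here; `urllib.request.urlopen(request)` would actually send it.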

Identification

Language Identification

Language identification detects the language a text is written in. Different languages use different characters. For example, Russian (Кириллица), Chinese (汉字) and Arabic (العربية) are easy to distinguish. Languages that use the same characters (e.g., the Latin alphabet, abc) often have cues that set them apart (e.g., é ↔ ë).

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

include
Array of string (Language)
Items Enum:"af" "am" "an" "ar" "as" "ast" "av" "az" "ba" "bar" "bcl" "be" "bg" "bn" "bo" "bpy" "br" "bs" "bxr" "ca" "cbk" "ce" "ceb" "co" "cs" "cv" "cy" "da" "de" "diq" "dv" "el" "en" "eo" "es" "et" "eu" "fa" "fi" "fr" "fy" "ga" "gd" "gl" "gn" "gom" "gu" "gv" "he" "hi" "hif" "hr" "hsb" "ht" "hu" "hy" "ia" "id" "ie" "ilo" "io" "is" "it" "ja" "jbo" "jv" "ka" "kk" "km" "kn" "ko" "krc" "ku" "kv" "kw" "ky" "la" "lb" "lez" "lmo" "lo" "lrc" "lt" "lv" "mai" "mg" "mhr" "min" "mk" "ml" "mn" "mr" "mrj" "ms" "mt" "mwl" "my" "myv" "mzn" "nap" "nds" "ne" "new" "nl" "nn" "no" "oc" "or" "os" "pa" "pam" "pfl" "pl" "pms" "pnb" "ps" "pt" "qu" "rm" "ro" "ru" "rue" "sa" "sah" "sc" "scn" "sco" "sd" "sh" "si" "sk" "sl" "so" "sq" "sr" "su" "sv" "sw" "ta" "te" "tg" "th" "tk" "tl" "tr" "tt" "tyv" "ug" "uk" "ur" "uz" "vec" "vep" "vi" "vo" "war" "wuu" "xal" "xmf" "yi" "yo" "yue" "zh"

Specify the languages you want to include in the results, in case you are sure about the languages a text may be in.

exclude
Array of string (Language)
Items Enum:"af" "am" "an" "ar" "as" "ast" "av" "az" "ba" "bar" "bcl" "be" "bg" "bn" "bo" "bpy" "br" "bs" "bxr" "ca" "cbk" "ce" "ceb" "co" "cs" "cv" "cy" "da" "de" "diq" "dv" "el" "en" "eo" "es" "et" "eu" "fa" "fi" "fr" "fy" "ga" "gd" "gl" "gn" "gom" "gu" "gv" "he" "hi" "hif" "hr" "hsb" "ht" "hu" "hy" "ia" "id" "ie" "ilo" "io" "is" "it" "ja" "jbo" "jv" "ka" "kk" "km" "kn" "ko" "krc" "ku" "kv" "kw" "ky" "la" "lb" "lez" "lmo" "lo" "lrc" "lt" "lv" "mai" "mg" "mhr" "min" "mk" "ml" "mn" "mr" "mrj" "ms" "mt" "mwl" "my" "myv" "mzn" "nap" "nds" "ne" "new" "nl" "nn" "no" "oc" "or" "os" "pa" "pam" "pfl" "pl" "pms" "pnb" "ps" "pt" "qu" "rm" "ro" "ru" "rue" "sa" "sah" "sc" "scn" "sco" "sd" "sh" "si" "sk" "sl" "so" "sq" "sr" "su" "sv" "sw" "ta" "te" "tg" "th" "tk" "tl" "tr" "tt" "tyv" "ug" "uk" "ur" "uz" "vec" "vep" "vi" "vo" "war" "wuu" "xal" "xmf" "yi" "yo" "yue" "zh"

Specify the languages you want to exclude from the results, in case you are sure your texts are not in those languages.

key
required
string

Your unique API key
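Because q is limited to 3000 characters, longer documents have to be split on the client side before they are submitted. The sketch below cuts at spaces where possible. The `split_text` helper is hypothetical, and it assumes the server counts characters the same way Python's `len` does.

```python
MAX_CHARS = 3000  # limit on the q parameter, as listed above


def split_text(text: str, limit: int = MAX_CHARS) -> list:
    """Split text into chunks of at most `limit` characters, cutting at spaces."""
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        # Last space that still keeps the chunk within the limit.
        cut = rest.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space available: fall back to a hard cut
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks


parts = split_text("word " * 1000)
print(len(parts), [len(p) for p in parts])
```

Each chunk can then be sent as a separate request; how the per-chunk results are combined is left to the caller.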

Responses

200

The server returns the predicted ISO 639-1 language code in lang, a confidence measure, the ISO 639-3 code and the canonical name for the language.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /language
TEXTGAIN
https://api.textgain.com/language

Response samples

application/json
{
  "confidence": 0.96,
  "language": "string",
  "iso-693-3": "string",
  "name": "string"
}
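A response of this shape can be consumed as in the sketch below, which only accepts a prediction above a confidence threshold. The description names the code field lang while the sample uses language, so the helper checks both. The 0.8 threshold and the example values ("nl", "nld", "Dutch") are illustrative assumptions.

```python
import json

# Hand-written example in the shape of the response sample (values are made up).
raw = '{"confidence": 0.96, "language": "nl", "iso-693-3": "nld", "name": "Dutch"}'


def detected_language(response: dict, min_confidence: float = 0.8):
    """Return the predicted language code, or None if the prediction is too uncertain."""
    if response.get("confidence", 0.0) < min_confidence:
        return None
    # Accept either field name, since the text and the sample differ.
    return response.get("language", response.get("lang"))


code = detected_language(json.loads(raw))
print(code)
```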

Arabic Dialect Identification

Arabic Dialect Identification detects the regional variant of Arabic a text is written in. This endpoint was developed in collaboration with RedCrow.co.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

key
required
string

Your unique API key

Responses

200

The server returns the predicted Arabic dialect.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /adi
TEXTGAIN
https://api.textgain.com/adi

Response samples

application/json
{
  "dialect": "Egypt"
}

Genre Identification

Genre classification predicts the type of text, based on its length, tone of voice and content.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

key
required
string

Your unique API key

Responses

200

The server returns the predicted genre and a confidence measure.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /genre
TEXTGAIN
https://api.textgain.com/genre

Response samples

application/json
{
  "confidence": 0.96,
  "genre": "string"
}

Lexicon

Part-of-Speech Tagging

Part-of-speech tagging identifies sentence breaks and word types. Words have different roles depending on how they are used. For example, the word shop can be a noun (a shop, object) or a verb (to shop, action).

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum:"en" "es" "cs" "da" "de" "fr" "it" "no" "nl" "pt" "ru" "sv" "sw"
key
required
string

Your unique API key

Responses

200

The server returns a list of sentences. Each sentence is a list of phrases (often also referred to as chunks). Each phrase is a list of dictionaries, with each dictionary describing the tag of the word.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /tag
TEXTGAIN
https://api.textgain.com/tag

Response samples

application/json
{
  "text": [ ],
  "confidence": 0.96
}
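The nested sentences, phrases and word dictionaries described above can be flattened as in this sketch. The dictionary keys "word" and "tag" and the tag names are assumptions, since the sample above does not show the inner structure.

```python
# Hand-made response following the described nesting:
# sentences -> phrases -> word dictionaries.
response = {
    "text": [
        [  # sentence 1
            [{"word": "I", "tag": "PRON"}],
            [{"word": "shop", "tag": "VERB"}],
            [{"word": "at", "tag": "PREP"},
             {"word": "a", "tag": "DET"},
             {"word": "shop", "tag": "NOUN"}],
        ]
    ],
    "confidence": 0.96,
}


def flatten(sentences):
    """Return (word, tag) pairs in reading order."""
    return [(token["word"], token["tag"])
            for sentence in sentences
            for phrase in sentence
            for token in phrase]


def words_with_tag(sentences, tag):
    """Keep only the words that carry the given tag."""
    return [word for word, t in flatten(sentences) if t == tag]


print(words_with_tag(response["text"], "NOUN"))
```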

Lemmatization

Lemmatization involves the morphological analysis of words to reduce them to their dictionary form (lemma). It is more powerful than stemming, which simply strips morphological affixes rather than taking into account a word's part-of-speech and allomorphic transformations. For example, "bathing" would be stemmed to "bath", but would be lemmatized as "bathe".
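The difference can be illustrated with a toy example. The suffix stripper and the two-entry lemma table below are hand-made for illustration only and say nothing about how the service itself is implemented.

```python
SUFFIXES = ("ing", "ed", "s")

# Tiny hand-made lookup standing in for real morphological analysis.
LEMMAS = {"bathing": "bathe", "bathed": "bathe"}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemma(word: str) -> str:
    """Look the word up; fall back to the word itself when unknown."""
    return LEMMAS.get(word, word)


print(toy_stem("bathing"), toy_lemma("bathing"))
```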

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum:"nl" "en" "de"
key
required
string

Your unique API key

Responses

200

The server returns a list of sentences. Each sentence is a list of phrases (often also referred to as chunks). Each phrase is a list of dictionaries, with each dictionary describing the lemma and the tag of the word.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /lemma
TEXTGAIN
https://api.textgain.com/lemma

Response samples

application/json
{
  "text": [ ]
}

Readability

Passive Voice

The use of the passive voice helps you to draw attention away from the agent of the action. Stylistically, however, it is often frowned upon, because it reduces readability. This classifier identifies the verbs involved in the passive voice of a sentence.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum:"de" "en" "nl"
key
required
string

Your unique API key

Responses

200

The server returns a list of dictionaries, indicating for each word whether it is part of a verbal unit in the passive voice.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /passive
TEXTGAIN
https://api.textgain.com/passive

Response samples

application/json
{
  "text": [ ]
}

Syllable counts / Hyphenation

Readability metrics often rely on syllable counts. Hyphenation and syllabification go hand in hand. This classifier outputs hyphenation patterns and syllable counts. It is fairly robust to noisy language (see example *awsome).

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum:"af" "as" "be" "bg" "bn" "ca" "cop" "cs" "cu" "cy" "da" "de" "de-1901" "de-1996" "de-ch-1901" "el-monoton" "el-polyton" "en" "en-gb" "en-us" "eo" "es" "et" "eu" "fi" "fr" "fur" "ga" "gl" "grc" "gu" "hi" "hr" "hsb" "hu" "hy" "ia" "id" "is" "it" "ka" "kmr" "kn" "la" "la-x-classic" "la-x-liturgic" "lt" "lv" "ml" "mn-cyrl" "mr" "mul-ethi" "no" "nl" "nn" "oc" "or" "pa" "pl" "pms" "pt" "rm" "ro" "ru" "sa" "sh-cyrl" "sh-latn" "sk" "sl" "sr-cyrl" "sv" "ta" "te" "th" "tk" "tr" "uk" "zh-latn-pinyin"
key
required
string

Your unique API key

Responses

200

The server returns a list with a dictionary for each word, describing the number of syllables (n_syllables) and the hyphenation pattern.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /syllables
TEXTGAIN
https://api.textgain.com/syllables

Response samples

application/json
{
  "text": [ ]
}
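To show how syllable counts feed a readability metric, the sketch below computes the Flesch Reading Ease score (an English-language formula) from per-word n_syllables values. The "word" key, the sample values and the sentence count supplied by the caller are assumptions; only n_syllables is named in the response description.

```python
def flesch_reading_ease(words, n_sentences):
    """Flesch Reading Ease: 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    if not words or n_sentences < 1:
        raise ValueError("need at least one word and one sentence")
    n_words = len(words)
    n_syllables = sum(w["n_syllables"] for w in words)
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)


# Hand-made word list in the assumed response shape.
words = [
    {"word": "This", "n_syllables": 1},
    {"word": "is", "n_syllables": 1},
    {"word": "awsome", "n_syllables": 2},
    {"word": "stuff", "n_syllables": 1},
]
score = flesch_reading_ease(words, n_sentences=1)
print(round(score, 3))
```

Higher scores indicate text that is easier to read.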

Concept Extraction & Conversion

Concept Extraction

Concept extraction identifies keywords, key phrases and ‘named entities’ – names of persons, products, organizations, locations, dates, and so on. Keywords are nouns that appear more often in a text, and often at the start of a text. Named entities frequently start with a capital letter (e.g., Barack Obama). Concept extraction can be used to summarize a text or, for example, to check whether two texts discuss similar topics.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum:"ar" "en" "es" "de" "fr" "it" "nl"
top
integer [ 1 .. 50 ]
Default: 10
key
required
string

Your unique API key

Responses

200

The server returns a list of the most salient concepts in the submitted text.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /concepts
TEXTGAIN
https://api.textgain.com/concepts

Response samples

application/json
{
  "concepts": [ ]
}
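One way to check whether two texts discuss similar topics is to compare their concept lists with Jaccard overlap, as sketched below. This assumes "concepts" holds plain strings; the sample above does not show the item format, so adapt the set construction if the items turn out to be dictionaries.

```python
def concept_overlap(response_a: dict, response_b: dict) -> float:
    """Jaccard overlap between two concept lists, ignoring case."""
    set_a = {c.lower() for c in response_a["concepts"]}
    set_b = {c.lower() for c in response_b["concepts"]}
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hand-made responses in the assumed shape.
a = {"concepts": ["Barack Obama", "election", "senate"]}
b = {"concepts": ["barack obama", "election", "budget"]}
print(concept_overlap(a, b))
```

A value of 1.0 means the two lists are identical and 0.0 means they share nothing.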

Geocoding

Geocoding looks for place names in a text (in any language) and returns a list of possible locations, along with their longitude and latitude and country of origin. Note that the results are exhaustive! For example, Berlin, Germany as well as Berlin in Colombia (Berlín) will be returned, unless you specify include or exclude parameters. The results are sorted according to population size (if known).

If you want to do language-specific filtering (for instance, if you don't want to consider From, the town in Norway), you can combine this web service with the POS-tagger and only retain the NOUNs.

Try: Eindhoven is pretty far from Россия.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

include
Array of string (Country)
Items Enum:"ac" "ad" "ae" "af" "ag" "ai" "al" "am" "ao" "aq" "ar" "as" "at" "au" "aw" "ax" "az" "ba" "bb" "bd" "be" "bf" "bg" "bh" "bi" "bj" "bl" "bm" "bn" "bo" "bq" "br" "bs" "bt" "bw" "by" "bz" "ca" "cc" "cd" "cf" "cg" "ch" "ci" "ck" "cl" "cm" "cn" "co" "cr" "cu" "cv" "cw" "cx" "cy" "cz" "de" "dg" "dj" "dk" "dm" "do" "dz" "ea" "ec" "ee" "eg" "eh" "er" "es" "et" "ez" "fi" "fj" "fk" "fm" "fo" "fr" "ga" "gb" "gd" "ge" "gf" "gg" "gh" "gi" "gl" "gm" "gn" "gp" "gq" "gr" "gs" "gt" "gu" "gw" "gy" "hk" "hn" "hr" "ht" "hu" "ic" "id" "ie" "il" "im" "in" "io" "iq" "ir" "is" "it" "je" "jm" "jo" "jp" "ke" "kg" "kh" "ki" "km" "kn" "kp" "kr" "kw" "ky" "kz" "la" "lb" "lc" "li" "lk" "lr" "ls" "lt" "lu" "lv" "ly" "ma" "mc" "md" "me" "mf" "mg" "mh" "mk" "ml" "mm" "mn" "mo" "mp" "mq" "mr" "ms" "mt" "mu" "mv" "mw" "mx" "my" "mz" "na" "nc" "ne" "nf" "ng" "ni" "nl" "no" "np" "nr" "nu" "nz" "om" "pa" "pe" "pf" "pg" "ph" "pk" "pl" "pm" "pn" "pr" "ps" "pt" "pw" "py" "qa" "re" "ro" "rs" "ru" "rw" "sa" "sb" "sc" "sd" "se" "sg" "sh" "si" "sj" "sk" "sl" "sm" "sn" "so" "sr" "ss" "st" "sv" "sx" "sy" "sz" "ta" "tc" "td" "tf" "tg" "th" "tj" "tk" "tl" "tm" "tn" "to" "tr" "tt" "tv" "tw" "tz" "ua" "ug" "um" "un" "us" "uy" "uz" "va" "vc" "ve" "vg" "vi" "vn" "vu" "wf" "ws" "xk" "ye" "yt" "za" "zm" "zw"

Specify the alpha-2 code(s) of the country or countries you want to cover in the output.

exclude
Array of string (Country)
Items Enum:"ac" "ad" "ae" "af" "ag" "ai" "al" "am" "ao" "aq" "ar" "as" "at" "au" "aw" "ax" "az" "ba" "bb" "bd" "be" "bf" "bg" "bh" "bi" "bj" "bl" "bm" "bn" "bo" "bq" "br" "bs" "bt" "bw" "by" "bz" "ca" "cc" "cd" "cf" "cg" "ch" "ci" "ck" "cl" "cm" "cn" "co" "cr" "cu" "cv" "cw" "cx" "cy" "cz" "de" "dg" "dj" "dk" "dm" "do" "dz" "ea" "ec" "ee" "eg" "eh" "er" "es" "et" "ez" "fi" "fj" "fk" "fm" "fo" "fr" "ga" "gb" "gd" "ge" "gf" "gg" "gh" "gi" "gl" "gm" "gn" "gp" "gq" "gr" "gs" "gt" "gu" "gw" "gy" "hk" "hn" "hr" "ht" "hu" "ic" "id" "ie" "il" "im" "in" "io" "iq" "ir" "is" "it" "je" "jm" "jo" "jp" "ke" "kg" "kh" "ki" "km" "kn" "kp" "kr" "kw" "ky" "kz" "la" "lb" "lc" "li" "lk" "lr" "ls" "lt" "lu" "lv" "ly" "ma" "mc" "md" "me" "mf" "mg" "mh" "mk" "ml" "mm" "mn" "mo" "mp" "mq" "mr" "ms" "mt" "mu" "mv" "mw" "mx" "my" "mz" "na" "nc" "ne" "nf" "ng" "ni" "nl" "no" "np" "nr" "nu" "nz" "om" "pa" "pe" "pf" "pg" "ph" "pk" "pl" "pm" "pn" "pr" "ps" "pt" "pw" "py" "qa" "re" "ro" "rs" "ru" "rw" "sa" "sb" "sc" "sd" "se" "sg" "sh" "si" "sj" "sk" "sl" "sm" "sn" "so" "sr" "ss" "st" "sv" "sx" "sy" "sz" "ta" "tc" "td" "tf" "tg" "th" "tj" "tk" "tl" "tm" "tn" "to" "tr" "tt" "tv" "tw" "tz" "ua" "ug" "um" "un" "us" "uy" "uz" "va" "vc" "ve" "vg" "vi" "vn" "vu" "wf" "ws" "xk" "ye" "yt" "za" "zm" "zw"

Specify the alpha-2 code(s) of the country or countries you want to exclude from the output.

key
required
string

Your unique API key

Responses

200

The server returns an exhaustive list of candidate locations found in the submitted data. Each location is represented as a dictionary describing population size (sort-parameter), longitude, latitude, location type and country.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /geocode
TEXTGAIN
https://api.textgain.com/geocode

Response samples

application/json
[
  { },
  { },
  { },
  { }
]
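Since the candidate list is exhaustive, a client will often narrow it down. The sketch below picks the most populous candidate, optionally restricted to a set of country codes, and treats an unknown population as zero. The key names ("name", "country", "population") and all values are assumptions made for illustration; the sample above does not show the dictionary contents.

```python
def most_populous(candidates, allowed_countries=None):
    """Return the candidate with the largest known population, or None if none qualify."""
    pool = [c for c in candidates
            if allowed_countries is None or c.get("country") in allowed_countries]
    # A missing or null population counts as 0 so that known sizes win.
    return max(pool, key=lambda c: c.get("population") or 0, default=None)


# Made-up candidates in the assumed shape (population figures are not real data).
candidates = [
    {"name": "Berlin", "country": "de", "population": 1000000},
    {"name": "Berlín", "country": "co", "population": None},
]
print(most_populous(candidates)["country"])
print(most_populous(candidates, {"co"})["name"])
```

The same narrowing can be done on the server side with the include and exclude parameters.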

Translation

A simple translation engine that finds English translations for words in a text. This is a word-based translation model and should not be considered a machine translation solution. Try: مقالة ويكيبيديا

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Value:"ar"
key
required
string

Your unique API key

Responses

200

The server returns a list of dictionaries, each pairing a word with its translation, covering only those words the pipeline was able to translate.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /translate
TEXTGAIN
https://api.textgain.com/translate

Response samples

application/json
[
  { },
  { }
]
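Because only translatable words come back, a client may want to overlay the translations on the original word sequence. The sketch below builds such a gloss, leaving untranslated words unchanged. The keys "word" and "translation", the whitespace tokenization and the sample pair are assumptions.

```python
def gloss(tokens, pairs):
    """Replace each token by its translation when one is available."""
    lookup = {p["word"]: p["translation"] for p in pairs}
    return [lookup.get(token, token) for token in tokens]


tokens = "مقالة ويكيبيديا".split()
# Hand-made response in the assumed shape: only the first word was translated.
pairs = [{"word": "مقالة", "translation": "article"}]
print(gloss(tokens, pairs))
```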

Sentiment Analysis

Sentiment Analysis

Sentiment analysis predicts whether a text is objective (fact) or subjective (opinion). Subjective text contains adverbs and adjectives with a positive or negative ‘polarity’ that capture the author’s personal opinion (e.g., an excellent opportunity or a bad product).

Try: q: Loved this book! and lang: en

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum:"en" "es" "ar" "da" "de" "fr" "it" "ja" "nl" "no" "pl" "pt" "ru" "sv" "sw" "zh"
key
required
string

Your unique API key

Responses

200

The server returns the predicted sentiment polarity (between -1 (negative) and 1 (positive)) and a confidence measure.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /sentiment
TEXTGAIN
https://api.textgain.com/sentiment

Response samples

application/json
{
  "polarity": 0.88,
  "confidence": 0.7
}
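A polarity score is often reduced to a coarse label. The sketch below does so with a small neutral band around zero and a minimum confidence. Both thresholds and the label names are assumptions to tune for your data, not values defined by the API.

```python
def sentiment_label(response: dict, neutral_band: float = 0.1,
                    min_confidence: float = 0.5) -> str:
    """Map a polarity in [-1, 1] to 'positive', 'negative', 'neutral' or 'uncertain'."""
    if response.get("confidence", 0.0) < min_confidence:
        return "uncertain"
    polarity = float(response["polarity"])
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


print(sentiment_label({"polarity": 0.88, "confidence": 0.7}))
```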

Sentiment Tagging

Sentiment tagging provides a sentiment label for each word in a text: 0 (neutral), 1 (positive) or -1 (negative). Note that these labels are assigned on a word-by-word basis and do not take into account contextual information, such as negation ("not good").

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum:"sw" "nl" "it" "fr" "en" "pt"
key
required
string

Your unique API key

Responses

200

The server returns a list of tokens, with each token represented as a dictionary holding the word and its sentiment value.

400

bad input parameter

403

forbidden (Available to Professional Subscriptions and up)

429

too many requests (upgrade your license or wait till tomorrow)
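The status codes used across this API (200, 400, 403, 429, and 414 for over-long GET URLs) can be mapped to a client-side reaction, as sketched below. The action names are hypothetical labels for your own code, not something the server returns.

```python
# Status codes mentioned in this description, mapped to assumed client actions.
ACTIONS = {
    200: "ok",
    400: "fix-request",           # bad input parameter
    403: "upgrade-subscription",  # endpoint not included in the current plan
    414: "retry-with-post",       # URL too long for GET
    429: "wait-or-upgrade",       # request quota exhausted
}


def next_action(status: int) -> str:
    """Return the suggested reaction for an HTTP status code."""
    return ACTIONS.get(status, "unexpected")


print(next_action(429))
```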

get /sentimenttag
TEXTGAIN
https://api.textgain.com/sentimenttag

Response samples

application/json
[
  { },
  { }
]
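The negation caveat can be made concrete. In the sketch below, summing word-level labels scores "not good" as positive, while a simple client-side heuristic flips the label that follows a negator. The keys "word" and "sentiment", the negator list and the heuristic itself are assumptions and not part of the service.

```python
NEGATORS = {"not", "no", "never"}  # illustrative English negators


def naive_score(tokens):
    """Sum the word-level labels without looking at context."""
    return sum(t["sentiment"] for t in tokens)


def negation_aware_score(tokens):
    """Flip the next non-neutral label after a negator."""
    total, flip = 0, False
    for t in tokens:
        if t["word"].lower() in NEGATORS:
            flip = True
            continue
        label = t["sentiment"]
        if label != 0:
            total += -label if flip else label
            flip = False
    return total


# Hand-made tokens in the assumed response shape.
tokens = [{"word": "not", "sentiment": 0}, {"word": "good", "sentiment": 1}]
print(naive_score(tokens), negation_aware_score(tokens))
```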

Profiling

Age Classification

Age prediction estimates whether a text is written by an adolescent or an adult. Online, adolescents use more informal language, including abbreviated utterances (omg, wow) and mood (awesome, lame). Adolescents tend to talk about school, parents, and partying. Adults tend to talk about work, children, health, and use more complex sentence structures.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum:"en" "es" "de" "fr" "nl"
key
required
string

Your unique API key

Responses

200

The server returns the predicted age range (25- or 25+) and a confidence measure.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /age
TEXTGAIN
https://api.textgain.com/age

Response samples

application/json
{
  "age": "25-",
  "confidence": 0.75
}

Gender Classification

Gender prediction estimates whether a text is written by a man or a woman. Statistically, women tend to talk more about people and relationships (family, friends), while men are more interested in objects and things (e.g., cars, games). As a result, women will use more personal pronouns (I, you, we) in a social context and men will use more determiners (a, an, the) and more quantifiers (one, many).

DISCLAIMER: We acknowledge that gender is not a binary matter, but we currently cannot predict a more fine-grained classification.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

by
string <utf-8>

Name, nickname, handle

lang
required
string
Enum:"en" "es" "da" "de" "fi" "fr" "it" "nl" "no" "pl" "pt" "sv"
key
required
string

Your unique API key

Responses

200

The server returns the predicted gender (male or female) and a confidence measure.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /gender
TEXTGAIN
https://api.textgain.com/gender

Response samples

application/json
{
  "gender": "f",
  "confidence": 0.95
}

Gender Tagging

Gender tagging provides for each word in a text a male, female or neutral tag. These tags are estimated on observed language usage by male and female writers. Gender tagging differs from gender prediction, in that it indicates which words the respective genders have been observed to use more in writing, as opposed to measuring typical male vs female writing style.

Disclaimer: these values are taken from large corpora and reflect the gender bias patterns present in the data.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum:"de" "en" "es" "fr" "it" "nl" "pl" "pt" "sv"
key
required
string

Your unique API key

Responses

200

The server returns a list of tokens, with each token represented as a dictionary holding the word and its gender value.

400

bad input parameter

403

forbidden (Available to Enterprise Subscriptions and up)

429

too many requests (upgrade your license or wait till tomorrow)

get /gendertag
TEXTGAIN
https://api.textgain.com/gendertag

Response samples

application/json
[
  { },
  { },
  { }
]

Education Prediction

Education prediction estimates whether a text displays basic or advanced writing skills. Statistically, people with higher education will use more formal language and use more punctuation marks (, ; :), correct spelling and capitalization, longer words and sentences and fewer emoji (cf. idk lol just talkin ☺☺☺).

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

key
required
string

Your unique API key

Responses

200

The server returns the predicted education level (low(-) or high(+)) and a confidence measure.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /education
TEXTGAIN
https://api.textgain.com/education

Response samples

application/json
{
  "education": "-",
  "confidence": 0.88
}

Personality Prediction

Personality prediction estimates whether a text is written by an extraverted or an introverted person. Extraverts tend to be more sociable, assertive and playful, while introverts are more solitary, reserved and shy. As a result, extraverts will use we more often, and more positive adjectives and less formal language. Introverts will use I more often, and they employ a broader vocabulary.

query Parameters
q
required
string <utf-8> <= 3000 characters

UTF-8 encoded text

lang
required
string
Enum:"en" "nl"
key
required
string

Your unique API key

Responses

200

The server returns the predicted personality type (Introvert or Extravert) and a confidence measure.

400

bad input parameter

429

too many requests (upgrade your license or wait till tomorrow)

get /personality
TEXTGAIN
https://api.textgain.com/personality

Response samples

application/json
{
  "personality": "I",
  "confidence": 0.88
}
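The profiling endpoints return compact codes such as "25-", "-" or "I". The sketch below expands them into readable labels and masks low-confidence predictions. The code "E" for extravert, the wording of the labels and the 0.6 threshold are assumptions; the other codes are taken from the samples and descriptions above.

```python
LABELS = {
    "age": {"25-": "under 25", "25+": "25 or older"},
    "education": {"-": "low", "+": "high"},
    "personality": {"I": "introvert", "E": "extravert"},  # "E" is assumed
}


def describe(response: dict, min_confidence: float = 0.6) -> dict:
    """Expand profiling codes into labels; mark weak predictions as 'uncertain'."""
    confident = response.get("confidence", 0.0) >= min_confidence
    result = {}
    for field, mapping in LABELS.items():
        if field in response:
            code = response[field]
            # Unknown codes are passed through unchanged.
            result[field] = mapping.get(code, code) if confident else "uncertain"
    return result


print(describe({"personality": "I", "confidence": 0.88}))
```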