
A small module to compute textual lexical richness


LexicalRichness

A small Python module to compute textual lexical richness measures.


$ pip install lexicalrichness


>>> from lexicalrichness import LexicalRichness

# Sample text to analyze.
>>> text = """Measure of textual lexical diversity, computed as the mean length of sequential words in
                a text that maintains a minimum threshold TTR score.

                Iterates over words until TTR scores falls below a threshold, then increase factor
                counter by 1 and start over. McCarthy and Jarvis (2010, pg. 385) recommends a factor
                threshold in the range of [0.660, 0.750].
                (McCarthy 2005, McCarthy and Jarvis 2010)"""

# Instantiate a new text object (pass use_TextBlob=True to use the TextBlob tokenizer).
>>> lex = LexicalRichness(text)

# Return word count.
>>> lex.words

# Return (unique) term count.
>>> lex.terms

# Return type-token ratio (TTR) of text.
>>> lex.ttr

# Return root type-token ratio (RTTR) of text.
>>> lex.rttr

# Return corrected type-token ratio (CTTR) of text.
>>> lex.cttr

# Return mean segmental type-token ratio (MSTTR).
>>> lex.msttr(segment_window=25)

# Return moving average type-token ratio (MATTR).
>>> lex.mattr(window_size=25)

# Return Measure of Textual Lexical Diversity (MTLD).
>>> lex.mtld(threshold=0.72)

# Return hypergeometric distribution diversity (HD-D) measure.
>>> lex.hdd(draws=42)
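
The MTLD procedure described in the sample text above can be sketched in plain Python. This is a simplified, forward-only illustration and not the package's implementation: the function name `mtld_forward`, the partial-factor rule for the leftover words, and the return values for empty or fully unique input are all assumptions made for this sketch.

```python
def mtld_forward(tokens, threshold=0.72):
    """Forward-only MTLD sketch: number of tokens divided by the factor count.

    A factor is completed each time the running TTR falls below `threshold`
    (which must be < 1); the running counts are then reset.
    """
    if not tokens:
        return 0.0  # assumption: empty input scores zero

    factors = 0.0
    seen = set()      # unique terms in the current run
    run_length = 0    # tokens in the current run

    for token in tokens:
        seen.add(token)
        run_length += 1
        if len(seen) / run_length < threshold:
            factors += 1
            seen.clear()
            run_length = 0

    # Assumption: leftover words count as a partial factor, proportional to
    # how far their TTR has fallen from 1 toward the threshold.
    if run_length:
        factors += (1 - len(seen) / run_length) / (1 - threshold)

    if factors == 0:
        return float("inf")  # assumption: TTR never dropped below 1
    return len(tokens) / factors


sample = "the cat sat on the mat and the dog sat on the log".split()
print(mtld_forward(sample, threshold=0.72))
```

Splitting on whitespace stands in for a real tokenizer here; results from `lex.mtld` may differ because the package tokenizes and processes the text in its own way.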

Attributes and properties

wordlist    list of words
words       number of words (w)
terms       number of unique terms (t)
tokenizer   tokenizer used
ttr         type-token ratio computed as t / w (Chotlos 1944, Templin 1957)
rttr        root TTR computed as t / sqrt(w) (Guiraud 1954, 1960)
cttr        corrected TTR computed as t / sqrt(2w) (Carroll 1964)
Herdan      log(t) / log(w) (Herdan 1960, 1964)
Summer      log(log(t)) / log(log(w)) (Summer 1966)
Dugast      (log(w) ** 2) / (log(w) - log(t)) (Dugast 1978)
Maas        (log(w) - log(t)) / (log(w) ** 2) (Maas 1972)
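
The formulas listed above depend only on the word count w and the unique term count t, so they can be reproduced with the standard library. The sketch below is illustrative and not the package's code: the function name `simple_measures`, the dictionary keys, and the choice to return `None` where a formula is undefined are assumptions.

```python
import math


def simple_measures(tokens):
    """Compute the w/t-based richness measures from a list of tokens."""
    w = len(tokens)          # number of words
    t = len(set(tokens))     # number of unique terms
    if w < 2:
        raise ValueError("need at least two tokens (log(w) must be non-zero)")

    log_w, log_t = math.log(w), math.log(t)
    return {
        "ttr": t / w,
        "rttr": t / math.sqrt(w),
        "cttr": t / math.sqrt(2 * w),
        "herdan": log_t / log_w,
        # log(log(t)) is undefined when t == 1
        "summer": math.log(log_t) / math.log(log_w) if t >= 2 else None,
        # the denominator is zero when every word is unique (t == w)
        "dugast": log_w ** 2 / (log_w - log_t) if t < w else None,
        "maas": (log_w - log_t) / log_w ** 2,
    }


measures = simple_measures("the cat sat on the mat".split())
for name, value in measures.items():
    print(name, value)
```

Note that Dugast and Maas are reciprocals of each other, which makes a convenient sanity check.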


msttr       Mean segmental TTR (Johnson 1944)
mattr       Moving average TTR (Covington 2007, Covington and McFall 2010)
mtld        Measure of Textual Lexical Diversity (McCarthy 2005, McCarthy and Jarvis 2010)
hdd         HD-D (McCarthy and Jarvis 2007)
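
To make the difference between the two windowed measures concrete, here is a minimal sketch of MSTTR (consecutive, non-overlapping segments) and MATTR (a window that slides one word at a time). The helper names `ttr_of`, `msttr_sketch` and `mattr_sketch` are hypothetical, and dropping a trailing partial segment in MSTTR is an assumption; the package may handle these details differently.

```python
def ttr_of(tokens):
    """Type-token ratio of a non-empty token list."""
    return len(set(tokens)) / len(tokens)


def msttr_sketch(tokens, segment_window=25):
    """Mean TTR over consecutive, non-overlapping segments.

    Assumption: a trailing segment shorter than `segment_window` is dropped.
    """
    starts = range(0, len(tokens) - segment_window + 1, segment_window)
    segments = [tokens[i:i + segment_window] for i in starts]
    if not segments:
        raise ValueError("text is shorter than one segment")
    return sum(ttr_of(s) for s in segments) / len(segments)


def mattr_sketch(tokens, window_size=25):
    """Mean TTR over every window of `window_size` words, moving one word at a time."""
    starts = range(len(tokens) - window_size + 1)
    windows = [tokens[i:i + window_size] for i in starts]
    if not windows:
        raise ValueError("text is shorter than one window")
    return sum(ttr_of(win) for win in windows) / len(windows)


words = "a b a a c d".split()
print(msttr_sketch(words, segment_window=2))
print(mattr_sketch(words, window_size=2))
```

Because MATTR uses overlapping windows, it does not depend on where segment boundaries happen to fall, at the cost of computing more windows.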

Accessing method docstrings

>>> import inspect

# docstring for hdd (HD-D)
>>> print(inspect.getdoc(LexicalRichness.hdd))

Hypergeometric distribution diversity (HD-D) score.

For each term (t) in the text, compute the probability (p) of getting at least one appearance
of t with a random draw of size n < N (text size). The contribution of t to the final HD-D
score is p * (1/n). The final HD-D score thus sums over p * (1/n) with p computed for
each term t. Described in McCarthy and Jarvis 2007, pp. 465-466.
(McCarthy and Jarvis 2007)

draws: int
    Number of random draws in the hypergeometric distribution (default=42).
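
The computation described in the docstring can be written out directly with binomial coefficients. The following is a sketch under stated assumptions rather than the package's implementation: the name `hdd_sketch` is hypothetical, it relies on `math.comb` (Python 3.8 or later), and it takes p as one minus the hypergeometric probability of drawing zero occurrences of a term.

```python
from collections import Counter
from math import comb


def hdd_sketch(tokens, draws=42):
    """HD-D sketch: sum over terms of P(term appears in a draw) * (1 / draws)."""
    n_tokens = len(tokens)
    if draws < 1 or draws > n_tokens:
        raise ValueError("draws must be between 1 and the number of tokens")

    total_ways = comb(n_tokens, draws)  # ways to draw `draws` tokens from the text
    score = 0.0
    for freq in Counter(tokens).values():
        # Probability that none of the term's `freq` occurrences is drawn.
        p_absent = comb(n_tokens - freq, draws) / total_ways
        # Contribution of this term: p * (1 / n), with p = 1 - p_absent.
        score += (1 - p_absent) / draws
    return score


print(hdd_sketch("a a b c".split(), draws=2))
```

For long texts the binomial coefficients become very large integers; Python handles them exactly, though a library routine for the hypergeometric distribution may be faster.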



0.1.2 (2018-05-09)

  • First release on PyPI.

0.1.3 (2018-05-27)

  • Minor fix for a compatibility issue with hyphens (ASCII) in Python 2.

Files for lexicalrichness, version 0.1.3

Filename                        Size      File type   Python version
lexicalrichness-0.1.3.tar.gz    15.6 kB   Source      None
