Calculates multiple readability metrics for large documents.

Project description

Readability Metrics

This project is heavily based on mmautner's package, with modifications made to support the analysis of large documents.

Inspiration

When using mmautner's package, the entire document needed to be passed in at once:

large_str = "...."
rd = Readability(large_str)
print('ARI: ', rd.ARI())

However, while analyzing Supreme Court transcripts since 1956 across various metrics, my personal computer was not able to load all of the needed documents at once. To account for this, I created this package, which allows pieces of a document to be passed in incrementally. Furthermore, the text itself is not stored, only the resulting calculations. Lastly, all metrics are calculated and returned each time, so individual calculations don't need to be performed separately.

Installation

Readability metrics can be installed from PyPI:

$ pip3 install readability-metrics

Usage

Readability metrics can be used as follows:

from metrics import Readability  # import the package

rdm = Readability()
rdm.analyze_text("This is a sentence.")
rdm.analyze_text("This is part of the same document.")
rdm.analyze_text("This is also part of the same document.")
rdm.get_results()

# can continue adding text to the same analysis
rdm.analyze_text("This is also part of the same document.")
rdm.get_results()

# can clear and start a new analysis
rdm.clear()
rdm.analyze_text("This is also part of the same document.")
rdm.get_results()
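
get_results() returns a single dictionary keyed by metric name (ARI, FleschReadingEase, FleschKincaidGradeLevel, GunningFogIndex, SMOGIndex, ColemanLiauIndex, LIX, and RIX — see the sample output further below), so an individual score can be pulled out of the returned dict when only one metric is needed. A minimal sketch:

results = rdm.get_results()
print(results['ARI'])                 # access a single metric from the returned dict
for name, score in results.items():  # or iterate over all metrics at once
    print(name, score)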

You can also calculate readability metrics across multiple categories. For instance, if you had a transcript, you could calculate metrics for all speakers at once:

from metrics import Readability
from collections import defaultdict

transcript = [
    ('John George', 'Words said by John George'),
    ('Apple Dunkin', 'Words said by Apple Dunkin'),
    # ...
]

readability_per_speaker = defaultdict(Readability)

# Feed each speaker's dialogue into that speaker's Readability instance
for speaker, text in transcript:
    readability_per_speaker[speaker].analyze_text(text)

# Replace each Readability instance with its calculated results
for speaker in readability_per_speaker:
    readability_per_speaker[speaker] = readability_per_speaker[speaker].get_results()

# readability_per_speaker now in the form:
{
    "SPEAKER NAME": {
        'ARI': 12.163787878787879,
        'FleschReadingEase': 58.2319,
        'FleschKincaidGradeLevel': 11.2857,
        'GunningFogIndex': 14.5465,
        'SMOGIndex': 12.287087810503355,
        'ColemanLiauIndex': 9.5226,
        'LIX': 46.467171717171716,
        'RIX': 5.375
    },
    # more speakers ...
}

Contribution

Contributions are welcome. Please create a pull request or email me at ericwiener3@gmail.com. Also feel free to create an issue if you need help with something.

Testing

Testing can be run with pytest. Simply navigate to the project directory and run pytest.
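
For example (assuming the repository has been cloned into a directory named readability-metrics — adjust the path to wherever the project lives locally):

$ cd readability-metrics
$ pytest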

