pystylometry

A comprehensive Python package for stylometric analysis with modular architecture and optional dependencies.

Features

pystylometry provides 50+ metrics across five analysis domains:

  • Lexical Diversity: TTR, MTLD, Yule's K, Hapax ratios, and more
  • Readability: Flesch, SMOG, Gunning Fog, Coleman-Liau, ARI
  • Syntactic Analysis: POS ratios, sentence statistics (requires spaCy)
  • Authorship Attribution: Burrows' Delta, Cosine Delta, Zeta scores
  • N-gram Analysis: Character and word bigram entropy, perplexity

Installation

Install only what you need:

# Core package (lexical metrics only)
pip install pystylometry

# With readability metrics
pip install "pystylometry[readability]"

# With syntactic metrics (requires spaCy)
pip install "pystylometry[syntactic]"

# With authorship metrics
pip install "pystylometry[authorship]"

# With n-gram analysis
pip install "pystylometry[ngrams]"

# Everything (quotes keep shells such as zsh from globbing the brackets)
pip install "pystylometry[all]"

Quick Start

Using Individual Modules

from pystylometry.lexical import compute_mtld, compute_yule
from pystylometry.readability import compute_flesch

text = "Your text here..."

# Lexical diversity
mtld = compute_mtld(text)
print(f"MTLD: {mtld.mtld_average:.2f}")

yule = compute_yule(text)
print(f"Yule's K: {yule.yule_k:.2f}")

# Readability
flesch = compute_flesch(text)
print(f"Reading Ease: {flesch.reading_ease:.1f}")
print(f"Grade Level: {flesch.grade_level:.1f}")

Using the Unified API

from pystylometry import analyze

text = "Your text here..."

# Analyze with multiple metrics at once
results = analyze(text, lexical=True, readability=True)

# Access results
print(f"MTLD: {results.lexical['mtld'].mtld_average:.2f}")
print(f"Flesch: {results.readability['flesch'].reading_ease:.1f}")

Checking Available Modules

from pystylometry import get_available_modules

available = get_available_modules()
print(available)
# {'lexical': True, 'readability': True, 'syntactic': False, ...}
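
For readers curious how a check like this can work, here is a minimal sketch of optional-dependency detection using the standard library. It is not pystylometry's implementation: the `OPTIONAL_IMPORTS` mapping and the `check_available` helper are hypothetical names chosen for illustration, and the import names listed are assumptions based on the Dependencies section.

```python
from importlib.util import find_spec

# Hypothetical mapping from feature group to the top-level imports it needs.
# Illustration only; not taken from pystylometry's source.
OPTIONAL_IMPORTS = {
    "lexical": [],
    "readability": ["pronouncing"],
    "syntactic": ["spacy"],
}


def check_available(groups=None):
    """Return {group: bool}, True when every listed import can be resolved."""
    groups = OPTIONAL_IMPORTS if groups is None else groups
    return {
        name: all(find_spec(module) is not None for module in modules)
        for name, modules in groups.items()
    }


print(check_available())
```

`find_spec` only looks the module up without importing it, so a check like this stays cheap even when a heavy package such as spaCy is installed.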

API Design

Clean, Consistent Interface

Every metric function:

  • Takes text as input
  • Returns a rich result object (never just a float)
  • Includes metadata about the computation
  • Has comprehensive docstrings with formulas and references

from pystylometry.lexical import compute_yule

text = "Your text here..."

result = compute_yule(text)
# Returns: YuleResult(yule_k=..., yule_i=..., metadata={...})
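
To make the "rich result object" idea concrete, the following is a rough sketch of a dataclass result paired with a textbook Yule's K computation, K = 10^4 · (Σ f² − N) / N², where f is each word type's frequency and N is the token count. `YuleSketch` and `yule_k_sketch` are hypothetical names, the regex tokenizer is an assumption, and none of this is claimed to match the package's own `YuleResult` or `compute_yule`.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class YuleSketch:
    """Illustrative result container; not the package's YuleResult."""

    yule_k: float
    metadata: dict = field(default_factory=dict)


def yule_k_sketch(text: str) -> YuleSketch:
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    if n == 0:
        raise ValueError("text contains no word tokens")
    freqs = Counter(tokens)
    # Sum of squared type frequencies, equal to sum over i of i^2 * V_i.
    m2 = sum(count * count for count in freqs.values())
    k = 1e4 * (m2 - n) / (n * n)
    return YuleSketch(
        yule_k=k,
        metadata={"token_count": n, "type_count": len(freqs)},
    )


result = yule_k_sketch("the cat sat on the mat")
print(result.yule_k, result.metadata)
```

Returning a small object instead of a bare float lets callers inspect how many tokens the value was based on, which matters because very short texts give unstable scores.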

Available Metrics

Lexical Diversity

  • TTR - Type-Token Ratio (via stylometry-ttr)
  • MTLD - Measure of Textual Lexical Diversity
  • Yule's K - Vocabulary repetitiveness
  • Hapax Legomena - Words appearing exactly once (hapax legomena) or twice (dis legomena)
  • Sichel's S - Hapax-based richness
  • Honoré's R - Vocabulary richness constant
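
Several of the measures above reduce to simple counts over a frequency table. The sketch below uses commonly cited definitions (TTR = V/N, hapax ratio = V1/N, Sichel's S = V2/V, Honoré's R = 100 · ln N / (1 − V1/V)); the function name `lexical_sketch`, the tokenizer, and the choice of N as the hapax-ratio denominator are assumptions and may differ from what the package computes.

```python
import math
import re
from collections import Counter


def lexical_sketch(text: str) -> dict:
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)  # N: total tokens
    if n == 0:
        raise ValueError("text contains no word tokens")
    freqs = Counter(tokens)
    v = len(freqs)  # V: distinct types
    v1 = sum(1 for c in freqs.values() if c == 1)  # types seen once
    v2 = sum(1 for c in freqs.values() if c == 2)  # types seen twice
    # Honoré's R is undefined when every type is a hapax (V1 == V).
    honore_r = 100 * math.log(n) / (1 - v1 / v) if v1 < v else math.inf
    return {
        "ttr": v / n,
        "hapax_ratio": v1 / n,
        "sichel_s": v2 / v,
        "honore_r": honore_r,
    }


print(lexical_sketch("the cat and the dog and the bird"))
```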

Readability

  • Flesch Reading Ease - 0-100 difficulty scale
  • Flesch-Kincaid Grade - US grade level
  • SMOG Index - Years of education needed
  • Gunning Fog - Readability complexity
  • Coleman-Liau - Character-based grade level
  • ARI - Automated Readability Index
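
As a reference point, three of these scores can be written directly from their published formulas once the raw counts are known. This sketch takes the counts as arguments and leaves out sentence splitting and syllable counting, which is where implementations usually differ; the function names are illustrative and not part of pystylometry's API.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher scores mean easier text; typical prose falls roughly in 0-100."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def automated_readability_index(chars: int, words: int, sentences: int) -> float:
    """ARI relies on characters per word instead of syllables."""
    return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43


# Example counts: 100 words, 5 sentences, 150 syllables, 450 characters.
print(flesch_reading_ease(100, 5, 150))
print(flesch_kincaid_grade(100, 5, 150))
print(automated_readability_index(450, 100, 5))
```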

Syntactic (requires spaCy)

  • POS Ratios - Noun/verb/adjective/adverb ratios
  • Lexical Density - Content vs function words
  • Sentence Statistics - Length, variation, complexity
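
The ratios above are straightforward once each token carries a part-of-speech tag. The sketch below works on pre-tagged `(token, tag)` pairs using Universal POS labels so that it runs without spaCy; with spaCy installed, such pairs could be built from `token.text` and `token.pos_`. The `CONTENT_TAGS` set and the `pos_sketch` helper are assumptions for illustration, and the package may define content words differently.

```python
from collections import Counter

# Assumption: these Universal POS tags count as content words.
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}


def pos_sketch(tagged):
    """Return (POS ratios, lexical density) for (token, tag) pairs."""
    tags = [tag for _, tag in tagged if tag != "PUNCT"]  # ignore punctuation
    if not tags:
        raise ValueError("no non-punctuation tokens")
    counts = Counter(tags)
    total = len(tags)
    ratios = {tag: counts[tag] / total for tag in ("NOUN", "VERB", "ADJ", "ADV")}
    density = sum(counts[tag] for tag in CONTENT_TAGS) / total
    return ratios, density


tagged = [
    ("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"), ("jumps", "VERB"),
    ("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"), ("dog", "NOUN"),
    (".", "PUNCT"),
]
print(pos_sketch(tagged))
```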

Authorship (pure Python, no extra dependencies)

  • Burrows' Delta - Author distance measure
  • Cosine Delta - Angular distance
  • Zeta Scores - Distinctive word usage
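
For intuition, Burrows' Delta is usually described as the mean absolute difference between two texts' z-scored frequencies over the corpus's most frequent words. The sketch below follows that description in plain Python. The function names, the `top_n` default, and the use of the population standard deviation are assumptions, each text is assumed to be a non-empty token list, and the package's implementation may differ in all of these.

```python
import statistics
from collections import Counter


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one tokenized text."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocab]


def burrows_delta(corpus, a, b, top_n=3):
    """corpus maps a text name to its token list; a and b are two names."""
    totals = Counter()
    for tokens in corpus.values():
        totals.update(tokens)
    vocab = [word for word, _ in totals.most_common(top_n)]
    profiles = {name: relative_freqs(toks, vocab) for name, toks in corpus.items()}

    diffs = []
    for i in range(len(vocab)):
        column = [profile[i] for profile in profiles.values()]
        mean = statistics.mean(column)
        std = statistics.pstdev(column)  # assumption: population std
        if std == 0:
            continue  # identical usage in every text carries no signal
        z_a = (profiles[a][i] - mean) / std
        z_b = (profiles[b][i] - mean) / std
        diffs.append(abs(z_a - z_b))
    return sum(diffs) / len(diffs) if diffs else 0.0


corpus = {
    "x": "the cat the dog".split(),
    "y": "a cat a dog a".split(),
    "z": "the the the cat".split(),
}
print(burrows_delta(corpus, "x", "y"))
```

A smaller Delta suggests more similar function-word habits. Real studies typically use far more texts and a `top_n` in the hundreds.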

N-grams (pure Python, no extra dependencies)

  • Character Bigram Entropy - Character predictability
  • Word Bigram Entropy - Word sequence predictability
  • Perplexity - Language model fit
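
One plausible reading of bigram entropy is the Shannon entropy, in bits, of the distribution of adjacent pairs. The sketch below implements that reading; `bigram_entropy` is an illustrative name, and the package may instead compute a conditional entropy or apply smoothing. Passing a string gives character bigrams and passing a token list gives word bigrams.

```python
import math
from collections import Counter


def bigram_entropy(items) -> float:
    """Shannon entropy (bits) of adjacent-pair frequencies in a sequence."""
    bigrams = list(zip(items, items[1:]))
    if not bigrams:
        raise ValueError("need at least two items to form a bigram")
    counts = Counter(bigrams)
    total = len(bigrams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(bigram_entropy("abracadabra"))               # character bigrams
print(bigram_entropy("the cat saw the cat".split()))  # word bigrams
```

Lower entropy means the next pair is more predictable: a sequence that keeps repeating the same bigram scores zero.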

Dependencies

Core (always installed):

  • stylometry-ttr

Optional:

  • readability: pronouncing (for syllable counting)
  • syntactic: spacy>=3.8.0
  • authorship: None (pure Python + stdlib)
  • ngrams: None (pure Python + stdlib)

Development

# Clone the repository
git clone https://github.com/craigtrim/pystylometry
cd pystylometry

# Install with dev dependencies
pip install -e ".[dev,all]"

# Run tests
make test

# Run linters
make lint

# Format code
make format

Project Status

🚧 Phase 1 - Core Lexical Metrics (In Progress)

  • Project structure
  • MTLD implementation
  • Yule's K implementation
  • Hapax ratios implementation
  • Tests
  • v0.1.0 release

See pystylometry-plan.md for the full roadmap.

Why pystylometry?

  • Modular: Install only what you need
  • Consistent: Uniform API across all metrics
  • Rich Results: Dataclass objects with metadata, not just numbers
  • Well-Documented: Formulas, references, and interpretations
  • Type-Safe: Full type hints for IDE support
  • Tested: Comprehensive test suite

References

See stylometry-metrics.md for the complete metrics reference table with formulas.

License

MIT License - see LICENSE file for details.

Author

Craig Trim (craigtrim@gmail.com)

Contributing

Contributions welcome! Please open an issue or PR on GitHub.
