pystylometry
A comprehensive Python package for stylometric analysis with modular architecture and optional dependencies.
Features
pystylometry provides 50+ metrics across five analysis domains:
- Lexical Diversity: TTR, MTLD, Yule's K, Hapax ratios, and more
- Readability: Flesch, SMOG, Gunning Fog, Coleman-Liau, ARI
- Syntactic Analysis: POS ratios, sentence statistics (requires spaCy)
- Authorship Attribution: Burrows' Delta, Cosine Delta, Zeta scores
- N-gram Analysis: Character and word bigram entropy, perplexity
Installation
Install only what you need (quote the extras so that shells such as zsh do not treat the square brackets as glob patterns):
# Core package (lexical metrics only)
pip install pystylometry
# With readability metrics
pip install "pystylometry[readability]"
# With syntactic metrics (requires spaCy)
pip install "pystylometry[syntactic]"
# With authorship metrics
pip install "pystylometry[authorship]"
# With n-gram analysis
pip install "pystylometry[ngrams]"
# Everything
pip install "pystylometry[all]"
Quick Start
Using Individual Modules
from pystylometry.lexical import compute_mtld, compute_yule
from pystylometry.readability import compute_flesch
text = "Your text here..."
# Lexical diversity
mtld = compute_mtld(text)
print(f"MTLD: {mtld.mtld_average:.2f}")
yule = compute_yule(text)
print(f"Yule's K: {yule.yule_k:.2f}")
# Readability
flesch = compute_flesch(text)
print(f"Reading Ease: {flesch.reading_ease:.1f}")
print(f"Grade Level: {flesch.grade_level:.1f}")
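To make `mtld_average` less of a black box, here is a minimal pure-Python sketch of MTLD as described by McCarthy and Jarvis: walk the tokens, close a "factor" each time the running type-token ratio falls to 0.72, credit a partial factor for the leftover segment, and average a forward and a backward pass. The tokenizer, the helper names (`tokenize`, `mtld`) and the fallback for texts that never complete a factor are assumptions for illustration, not pystylometry's implementation.

```python
import re


def tokenize(text: str) -> list[str]:
    # Naive lowercase word tokenizer (an assumption made for this sketch)
    return re.findall(r"[a-z']+", text.lower())


def _mtld_pass(tokens: list[str], threshold: float = 0.72) -> float:
    factors = 0.0
    types: set[str] = set()
    count = 0
    for token in tokens:
        types.add(token)
        count += 1
        if len(types) / count <= threshold:
            # Running TTR hit the threshold: count one full factor, start a new segment
            factors += 1
            types.clear()
            count = 0
    if count:
        # Credit a partial factor for the unfinished final segment
        factors += (1 - len(types) / count) / (1 - threshold)
    # Assumed convention: if no factor accumulated (every token unique), return N
    return len(tokens) / factors if factors else float(len(tokens))


def mtld(tokens: list[str], threshold: float = 0.72) -> float:
    # Average of a forward and a backward pass over the token sequence
    return (_mtld_pass(tokens, threshold) + _mtld_pass(tokens[::-1], threshold)) / 2


print(f"MTLD: {mtld(tokenize('Your text here...')):.2f}")
```

Because the threshold is applied to a running ratio, very short texts give unstable values; averaging the two directions softens this but does not remove it.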
Using the Unified API
from pystylometry import analyze
text = "Your text here..."
# Analyze with multiple metrics at once
results = analyze(text, lexical=True, readability=True)
# Access results
print(f"MTLD: {results.lexical['mtld'].mtld_average:.2f}")
print(f"Flesch: {results.readability['flesch'].reading_ease:.1f}")
Checking Available Modules
from pystylometry import get_available_modules
available = get_available_modules()
print(available)
# {'lexical': True, 'readability': True, 'syntactic': False, ...}
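A check like this can be built on the standard library's `importlib.util.find_spec`, which reports whether a module is importable without actually importing it. The sketch below is hypothetical: the `check_modules` name and the group-to-module mapping are assumptions drawn from the Dependencies list, not pystylometry's code. `pronouncing` and `spacy` are the import names of the readability and syntactic dependencies.

```python
import importlib.util
from typing import Dict, Optional

# Hypothetical mapping from feature group to the module it needs to import
# (None means the group needs nothing beyond the core install)
OPTIONAL_DEPENDENCIES: Dict[str, Optional[str]] = {
    "lexical": None,
    "readability": "pronouncing",
    "syntactic": "spacy",
}


def check_modules(deps: Dict[str, Optional[str]] = OPTIONAL_DEPENDENCIES) -> Dict[str, bool]:
    # find_spec returns None for a missing top-level module without importing it
    return {
        group: module is None or importlib.util.find_spec(module) is not None
        for group, module in deps.items()
    }


print(check_modules())
```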
API Design
Clean, Consistent Interface
Every metric function:
- Takes text as input
- Returns a rich result object (never just a float)
- Includes metadata about the computation
- Has comprehensive docstrings with formulas and references
from pystylometry.lexical import compute_yule
text = "Your text here..."
result = compute_yule(text)
# Returns: YuleResult(yule_k=..., yule_i=..., metadata={...})
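The result-object pattern can be sketched with a frozen dataclass. `TTRResult` and `compute_ttr_sketch` below are hypothetical names used only to illustrate the shape (metric values plus a `metadata` dict); they are not part of pystylometry, and the whitespace tokenizer is a simplification.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class TTRResult:
    # Hypothetical result type: metric values plus metadata about the computation
    ttr: float
    total_tokens: int
    unique_tokens: int
    metadata: Dict[str, Any] = field(default_factory=dict)


def compute_ttr_sketch(text: str) -> TTRResult:
    # Whitespace tokenization and lowercasing are simplifications for this sketch
    tokens = text.lower().split()
    types = set(tokens)
    return TTRResult(
        ttr=len(types) / len(tokens) if tokens else 0.0,
        total_tokens=len(tokens),
        unique_tokens=len(types),
        metadata={"tokenizer": "whitespace", "lowercased": True},
    )


result = compute_ttr_sketch("The cat saw the dog")
print(result)
```

A frozen dataclass keeps results immutable and provides a readable `repr` without extra code.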
Available Metrics
Lexical Diversity
- TTR - Type-Token Ratio (via stylometry-ttr)
- MTLD - Measure of Textual Lexical Diversity
- Yule's K - Vocabulary repetitiveness
- Hapax Legomena - Words appearing exactly once (hapax legomena) or twice (dis legomena)
- Sichel's S - Hapax-based richness
- Honoré's R - Vocabulary richness constant
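Several of these measures are simple functions of the frequency spectrum V_i (the number of word types occurring exactly i times), with N tokens and V types. The sketch below assumes the standard textbook formulas: Yule's K = 10^4 * (sum of i^2 * V_i - N) / N^2, hapax ratio = V_1 / V, Sichel's S = V_2 / V, and Honoré's R = 100 * log(N) / (1 - V_1 / V). It is illustrative and may differ from pystylometry's normalization or edge-case handling.

```python
import math
from collections import Counter
from typing import Dict, List


def frequency_spectrum(tokens: List[str]) -> Dict[int, int]:
    # Maps i -> V_i, the number of types that occur exactly i times
    return dict(Counter(Counter(tokens).values()))


def lexical_measures(tokens: List[str]) -> Dict[str, float]:
    if not tokens:
        raise ValueError("need at least one token")
    n = len(tokens)
    spectrum = frequency_spectrum(tokens)
    v = sum(spectrum.values())  # number of distinct types
    v1, v2 = spectrum.get(1, 0), spectrum.get(2, 0)
    s2 = sum(i * i * v_i for i, v_i in spectrum.items())
    return {
        "ttr": v / n,
        "yule_k": 1e4 * (s2 - n) / (n * n),
        "hapax_ratio": v1 / v,
        "sichel_s": v2 / v,
        # Honoré's R is undefined (division by zero) when every type is a hapax
        "honore_r": 100 * math.log(n) / (1 - v1 / v) if v1 < v else math.inf,
    }


print(lexical_measures("a a b c c c".split()))
```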
Readability
- Flesch Reading Ease - 0-100 difficulty scale (higher scores indicate easier text)
- Flesch-Kincaid Grade - US grade level
- SMOG Index - Years of education needed
- Gunning Fog - Grade level from sentence length and the share of complex (3+ syllable) words
- Coleman-Liau - Character-based grade level
- ARI - Automated Readability Index
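The Flesch, Flesch-Kincaid, ARI and Coleman-Liau formulas depend only on sentence, word, syllable and character counts. The sketch below uses the standard published coefficients; the regex tokenization and the vowel-group syllable counter are crude assumptions standing in for the CMU Pronouncing Dictionary lookup that the `pronouncing` dependency provides, so its scores will not match the package exactly.

```python
import re
from typing import Dict


def count_syllables(word: str) -> int:
    # Heuristic (an assumption): count vowel groups, discount a silent final "e"
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> Dict[str, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    s, w = len(sentences), len(words)
    syllables = sum(count_syllables(word) for word in words)
    chars = sum(ch.isalnum() for ch in text)  # letters and digits, for ARI
    letters = sum(ch.isalpha() for ch in text)  # letters only, for Coleman-Liau
    return {
        "flesch_reading_ease": 206.835 - 1.015 * (w / s) - 84.6 * (syllables / w),
        "flesch_kincaid_grade": 0.39 * (w / s) + 11.8 * (syllables / w) - 15.59,
        "ari": 4.71 * (chars / w) + 0.5 * (w / s) - 21.43,
        # Coleman-Liau: L = letters per 100 words, S = sentences per 100 words
        "coleman_liau": 0.0588 * (100 * letters / w) - 0.296 * (100 * s / w) - 15.8,
    }


print(readability("The cat sat. The dog ran."))
```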
Syntactic (requires spaCy)
- POS Ratios - Noun/verb/adjective/adverb ratios
- Lexical Density - Content vs function words
- Sentence Statistics - Length, variation, complexity
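Once each token has a part-of-speech tag, POS ratios and lexical density reduce to counting. This sketch takes pre-tagged `(token, tag)` pairs using Universal POS tags (the tags spaCy exposes as `token.pos_`), skips punctuation and whitespace, and assumes that nouns, proper nouns, verbs, adjectives and adverbs count as content words; pystylometry's exact tag sets may differ, and `pos_profile` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed content-word tags (Universal POS); the package's choice may differ
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}
IGNORED_TAGS = {"PUNCT", "SPACE"}


def pos_profile(tagged: List[Tuple[str, str]]) -> Dict[str, float]:
    # tagged holds (token, UPOS tag) pairs, e.g. (token.text, token.pos_) from spaCy
    tags = [tag for _, tag in tagged if tag not in IGNORED_TAGS]
    if not tags:
        raise ValueError("no word tokens to score")
    counts = Counter(tags)
    total = len(tags)
    return {
        "noun_ratio": counts["NOUN"] / total,
        "verb_ratio": counts["VERB"] / total,
        "adjective_ratio": counts["ADJ"] / total,
        "adverb_ratio": counts["ADV"] / total,
        "lexical_density": sum(counts[t] for t in CONTENT_TAGS) / total,
    }


sample = [("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"), ("jumps", "VERB"),
          ("over", "ADP"), ("the", "DET"), ("dog", "NOUN"), (".", "PUNCT")]
print(pos_profile(sample))
```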
Authorship (requires scikit-learn, scipy)
- Burrows' Delta - Author distance measure
- Cosine Delta - Angular distance
- Zeta Scores - Distinctive word usage
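Burrows' Delta compares texts on the z-scored relative frequencies of the most frequent words (MFW): it is the mean absolute difference between two z-score vectors, and Cosine Delta replaces that with the cosine distance between the same vectors. The minimal sketch below (hypothetical `delta_scores` helper, population standard deviation, z-scores computed over the reference texts only) illustrates the idea and is not pystylometry's implementation; Zeta is not covered.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def _rel_freqs(tokens: List[str], vocab: List[str]) -> List[float]:
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]


def delta_scores(corpus: Dict[str, List[str]], query: List[str],
                 n_mfw: int = 100) -> Dict[str, Tuple[float, float]]:
    # Hypothetical helper: returns {text_name: (burrows_delta, cosine_delta)}
    combined = Counter(tok for toks in corpus.values() for tok in toks)
    vocab = [w for w, _ in combined.most_common(n_mfw)]  # most frequent words
    profiles = {name: _rel_freqs(toks, vocab) for name, toks in corpus.items()}
    columns = list(zip(*profiles.values()))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]

    def zscores(freqs: List[float]) -> List[float]:
        # Words with zero spread across the corpus contribute a z-score of 0
        return [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(freqs, means, stds)]

    zq = zscores(_rel_freqs(query, vocab))
    results = {}
    for name, freqs in profiles.items():
        zc = zscores(freqs)
        burrows = mean(abs(a - b) for a, b in zip(zq, zc))
        dot = sum(a * b for a, b in zip(zq, zc))
        norms = math.sqrt(sum(a * a for a in zq)) * math.sqrt(sum(b * b for b in zc))
        cosine = 1 - dot / norms if norms > 0 else 1.0
        results[name] = (burrows, cosine)
    return results


corpus = {
    "A": "the cat the cat a dog".split(),
    "B": "a dog a dog a bird".split(),
    "C": "the bird sang a song".split(),
}
scores = delta_scores(corpus, corpus["A"])
print(min(scores, key=lambda name: scores[name][0]))
```

Lower values mean more similar style, so the candidate with the smallest Delta is the attributed author in this simple nearest-neighbour reading.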
N-grams (requires nltk)
- Character Bigram Entropy - Character predictability
- Word Bigram Entropy - Word sequence predictability
- Perplexity - Language model fit
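Bigram entropy is the Shannon entropy of the bigram frequency distribution, and perplexity measures how well a model predicts a sequence: 2 raised to the average negative log2 probability per bigram. The sketch below works on characters (pass a string) or words (pass a list); the add-one smoothed bigram model used for perplexity is an assumption chosen for brevity, and pystylometry may define or smooth these differently.

```python
import math
from collections import Counter
from typing import Sequence


def bigram_entropy(items: Sequence) -> float:
    # Shannon entropy (bits) of the bigram distribution; pass a str for characters
    # or a list of words for word bigrams
    bigrams = list(zip(items, items[1:]))
    if not bigrams:
        raise ValueError("need at least two items")
    total = len(bigrams)
    return -sum((c / total) * math.log2(c / total) for c in Counter(bigrams).values())


def bigram_perplexity(train: Sequence, test: Sequence) -> float:
    # Add-one (Laplace) smoothed bigram model, an assumption chosen for brevity
    vocab_size = len(set(train) | set(test))
    context_counts = Counter(train[:-1])  # how often each item starts a bigram
    bigram_counts = Counter(zip(train, train[1:]))
    pairs = list(zip(test, test[1:]))
    if not pairs:
        raise ValueError("test sequence needs at least two items")
    log_prob = sum(
        math.log2((bigram_counts[(a, b)] + 1) / (context_counts[a] + vocab_size))
        for a, b in pairs
    )
    # Perplexity = 2 ** (average negative log2 probability per bigram)
    return 2 ** (-log_prob / len(pairs))


print(bigram_entropy("the cat sat on the mat"))
print(bigram_perplexity("the cat sat on the mat".split(), "the cat sat".split()))
```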
Dependencies
Core (always installed):
- stylometry-ttr
Optional:
- readability: pronouncing (for syllable counting)
- syntactic: spacy>=3.8.0
- authorship: None (pure Python + stdlib)
- ngrams: None (pure Python + stdlib)
Development
# Clone the repository
git clone https://github.com/craigtrim/pystylometry
cd pystylometry
# Install with dev dependencies
pip install -e ".[dev,all]"
# Run tests
make test
# Run linters
make lint
# Format code
make format
Project Status
🚧 Phase 1 - Core Lexical Metrics (In Progress)
- Project structure
- MTLD implementation
- Yule's K implementation
- Hapax ratios implementation
- Tests
- v0.1.0 release
See pystylometry-plan.md for the full roadmap.
Why pystylometry?
- Modular: Install only what you need
- Consistent: Uniform API across all metrics
- Rich Results: Dataclass objects with metadata, not just numbers
- Well-Documented: Formulas, references, and interpretations
- Type-Safe: Full type hints for IDE support
- Tested: Comprehensive test suite
References
See stylometry-metrics.md for the complete metrics reference table with formulas.
License
MIT License - see LICENSE file for details.
Author
Craig Trim (craigtrim@gmail.com)
Contributing
Contributions welcome! Please open an issue or PR on GitHub.
Download files
File details
Details for the file pystylometry-0.1.0.tar.gz.
File metadata
- Download URL: pystylometry-0.1.0.tar.gz
- Upload date:
- Size: 24.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a1ea52625a9cdca0752ea25a4a1363d84ebf45454cf638b8fa2882d94b19d84b |
| MD5 | 711b113d7765a8c24736babd2ebc4708 |
| BLAKE2b-256 | 79ecc423792f0905a140134072c0140f7e119251e62250ec3c79716c5ee60914 |
File details
Details for the file pystylometry-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pystylometry-0.1.0-py3-none-any.whl
- Upload date:
- Size: 32.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e80ad553ac66ec6f131df9255fbdde2a956e2720d93562780e7293dd1e9445b |
| MD5 | 6d37fab64250cd383a01a64cffca0a88 |
| BLAKE2b-256 | 887ac7bf7599c9c4da697bc7bd4fb5ef3e4acd4d763dd3680e9778f683b92d96 |