
Quantitative author fingerprinting & stylometric analysis - offline CLI tool

Project description

Stylometry CLI (local/offline) — v1.0

This is a small, offline Python tool that extracts stylometric features and patterns from text and, optionally, computes simple similarity signals between corpora using character n-grams.

It’s designed to slot into your Stylometry Orchestrator workflow by emitting SAO-style ResultBundle_*.json files plus CSV artifacts.

What it does

For each document (and each chunk of a document), it computes:

  • Lexical

    • word count, unique word count
    • average word length
    • MATTR lexical diversity (more length-robust than raw TTR)
  • Syntactic (proxy)

    • average sentence length
    • sentence length variation (population SD)
  • Habitual

    • function word frequencies (configurable list)
    • punctuation rates (commas/semicolons/etc per 1000 words and per sentence)
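To make the MATTR metric concrete: it slides a fixed-size window over the token stream, takes the type-token ratio inside each window, and averages the results, which keeps the score comparable across documents of different lengths. The sketch below is a hypothetical implementation (the `mattr` helper is illustrative, not the tool's actual code):

```python
from collections import deque

def mattr(tokens, window=500):
    """Moving-Average Type-Token Ratio: average the TTR over every
    sliding window of `window` tokens. Raw TTR drops as texts get
    longer; averaging fixed windows removes most of that length bias."""
    if not tokens:
        return 0.0
    if len(tokens) < window:
        # Too short for even one full window: fall back to plain TTR.
        return len(set(tokens)) / len(tokens)
    counts = {}          # token -> count inside the current window
    win = deque()
    ttrs = []
    for tok in tokens:
        win.append(tok)
        counts[tok] = counts.get(tok, 0) + 1
        if len(win) > window:
            old = win.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        if len(win) == window:
            ttrs.append(len(counts) / window)
    return sum(ttrs) / len(ttrs)
```

The incremental count map keeps each window update O(1), so the whole pass is linear in document length rather than O(n × window).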

If 2+ corpora are provided and there are enough chunks, it also computes:

  • Char n-gram TF-IDF centroid cosine similarity across corpora (corpus_similarity_char_ngrams.csv)
  • Nearest-centroid chunk assignment (chunk_assignments_char_ngrams.csv)
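The comparison step can be sketched as: fit a single char-n-gram TF-IDF space over every chunk from every corpus, average each corpus's chunk vectors into a centroid, and take pairwise cosine similarities between centroids. This is a minimal illustration under those assumptions (the `corpus_centroid_similarity` helper is hypothetical, not the tool's code):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def corpus_centroid_similarity(chunks_by_label, ngram_range=(3, 5)):
    """Fit one char-n-gram TF-IDF space over all chunks, average each
    corpus's chunk vectors into a centroid, and return the labels plus
    the pairwise cosine-similarity matrix between centroids."""
    labels = list(chunks_by_label)
    all_chunks = [c for lab in labels for c in chunks_by_label[lab]]
    vec = TfidfVectorizer(analyzer="char_wb", ngram_range=ngram_range)
    X = vec.fit_transform(all_chunks)          # sparse (n_chunks, n_features)
    centroids, start = [], 0
    for lab in labels:
        n = len(chunks_by_label[lab])
        # Mean of a sparse slice returns a matrix; densify for stacking.
        centroids.append(np.asarray(X[start:start + n].mean(axis=0)))
        start += n
    return labels, cosine_similarity(np.vstack(centroids))
```

Nearest-centroid chunk assignment would follow the same pattern: score each chunk vector against every centroid with `cosine_similarity` and assign it to the argmax. Fitting one shared vocabulary across all corpora is what makes the centroids directly comparable.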

Note: these are signals, not definitive authorship proof. Topic/genre/boilerplate can dominate.

Requirements

  • Windows, macOS, or Linux
  • Python 3.12+
  • dependencies from requirements.txt (installed via pip)

Install (Windows PowerShell)

py -3.12 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt

Quick check:

python -c "import numpy, pandas, sklearn; print('ok')"

Input formats

You provide one or more --corpus LABEL=PATH arguments.

PATH can be:

  • a single .txt / .md file
  • a folder containing .txt / .md files (recursively)
  • a .zip archive containing .txt / .md files (recursively)

Examples of folder layouts that work:

Single corpus

my_corpus/
  speech1.txt
  speech2.txt
  speech3.txt

Multiple corpora

corpora/
  A/
    doc1.txt
    doc2.txt
  B/
    doc3.txt
    doc4.txt

You can point each corpus to its subfolder:

  • --corpus A=corpora/A --corpus B=corpora/B
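Resolving a PATH argument into documents can be sketched roughly as follows (the `collect_texts` helper and its behavior are assumptions for illustration, not the tool's actual loader):

```python
import zipfile
from pathlib import Path

TEXT_EXTS = {".txt", ".md"}

def collect_texts(path):
    """Yield (name, text) pairs for a corpus PATH: a single .txt/.md
    file, a folder (searched recursively), or a .zip archive."""
    p = Path(path)
    if p.is_file() and p.suffix.lower() == ".zip":
        with zipfile.ZipFile(p) as zf:
            for info in zf.infolist():
                if Path(info.filename).suffix.lower() in TEXT_EXTS:
                    # Forgiving decode so one bad byte doesn't kill a run.
                    yield info.filename, zf.read(info).decode("utf-8", errors="replace")
    elif p.is_file() and p.suffix.lower() in TEXT_EXTS:
        yield p.name, p.read_text(encoding="utf-8", errors="replace")
    elif p.is_dir():
        for f in sorted(p.rglob("*")):
            if f.suffix.lower() in TEXT_EXTS:
                yield str(f.relative_to(p)), f.read_text(encoding="utf-8", errors="replace")
```

Sorting the recursive listing keeps document order (and therefore chunk numbering) reproducible across runs.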

Run examples

1) Characterize a single document

python stylometry_run.py --task characterize --corpus TextA=./speech1.txt --output ./out_textA

2) Build a profile from many documents (single corpus)

python stylometry_run.py --task profile_build --corpus PersonX=./my_corpus --output ./out_personx

3) Compare two corpora

python stylometry_run.py --task compare --corpus A=./corpora/A --corpus B=./corpora/B --output ./out_compare

4) Use zip archives

python stylometry_run.py --task compare --corpus A=./A.zip --corpus B=./B.zip --output ./out_compare_zip

Outputs

The output folder contains:

  • manifest.json — corpus manifest (doc list + word counts + local provenance paths)
  • doc_metrics.csv — per-document metrics
  • chunk_metrics.csv — per-chunk metrics
  • ResultBundle_ArtifactExtractor.json — SAO-compatible bundle describing artifacts produced
  • run_metadata.json — parameters and reproducibility info

If 2+ corpora and enough chunks:

  • corpus_similarity_char_ngrams.csv
  • chunk_assignments_char_ngrams.csv
  • ResultBundle_Comparator.json

If matplotlib is installed and working, it also saves:

  • plot_avg_sentence_len_boxplot.png
  • plot_mattr_boxplot.png

Useful options

  • --chunk-words 1200 — set chunk size (default 1200)
  • --mattr-window 500 — MATTR window size (default 500)
  • --function-words-file path.txt — override function word list (newline-delimited)
  • --include-chunk-text — include chunk text in chunk_metrics.csv (can be large)
  • --char-analyzer char_wb|char — default char_wb (often better for stylometry)
  • --max-features 50000 and --min-df 2 — control n-gram feature size
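One plausible reading of --chunk-words, sketched below (the `chunk_words` helper is hypothetical, not the tool's code): split each document into consecutive runs of at most N whitespace-delimited tokens, with the final chunk allowed to be shorter.

```python
def chunk_words(text, chunk_size=1200):
    """Split a document into consecutive chunks of at most
    `chunk_size` whitespace-delimited tokens."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]
```

Smaller chunks give more data points for the per-chunk metrics and the nearest-centroid assignment, at the cost of noisier estimates per chunk.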

Notes for political/public-figure corpora

Prepared remarks and official publications can reflect speechwriters, staff editing, or transcript normalization. Use “channel-specific” corpora where possible (e.g., floor speeches vs press releases vs prepared remarks).

Troubleshooting

  • If plots aren’t produced: ensure matplotlib is installed and the output folder is writable.
  • If you hit Unicode errors: convert source files to UTF-8; otherwise the script falls back to a forgiving decode.
  • If it’s slow on huge corpora: increase --min-df, reduce --max-features, or reduce corpus size.

Download files

Source distribution

  • stylometry_cli-1.0.0.tar.gz (28.8 kB)

Built distribution

  • stylometry_cli-1.0.0-py3-none-any.whl (29.6 kB)

File details

Details for the file stylometry_cli-1.0.0.tar.gz.

File metadata

  • Download URL: stylometry_cli-1.0.0.tar.gz
  • Size: 28.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for stylometry_cli-1.0.0.tar.gz:

  • SHA256: 4513f192425d9acddeabe5d34279f146f779670c0dbd729ad54b5dc0c99eee2d
  • MD5: b1ed9322b77bd69421601a09e4241500
  • BLAKE2b-256: 5d3fb87e73cbb2313e80530b59519fe22e305399b4df417c12ebefcd58ea0017

Provenance

The following attestation bundles were made for stylometry_cli-1.0.0.tar.gz:

Publisher: publish.yml on SpectreDeath/stylometry-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file stylometry_cli-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: stylometry_cli-1.0.0-py3-none-any.whl
  • Size: 29.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for stylometry_cli-1.0.0-py3-none-any.whl:

  • SHA256: 541adf6e270b47d6232d97c4be124817410ea521fef4a1a1bd18cfbfc21df92a
  • MD5: b4d8330dd7def0ce7cefcf9085742590
  • BLAKE2b-256: e401066c956387cb1a719cd467d5984d5af2fbc3f258a93cc945bf1612394c26

Provenance

The following attestation bundles were made for stylometry_cli-1.0.0-py3-none-any.whl:

Publisher: publish.yml on SpectreDeath/stylometry-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
