
Count words in LaTeX documents while ignoring commands, math, and common non-text regions.


latex-wc

A small CLI tool that counts words in LaTeX .tex files while trying to ignore LaTeX “noise”:

  • Removes comments (% ...)
  • Removes common math forms ($...$, $$...$$, \(...\), \[...\], and common math environments)
  • Drops common non-content commands (e.g., citations/refs/urls/labels)
  • Strips LaTeX command names while preserving human-visible brace text
  • Tokenizes words and reports totals + top-N frequencies

Optionally writes:

  • words.txt (one token per line)
  • top_words.csv (ranked word frequency table)

Heuristic by design: the goal is a human-ish word count, not a TeX-perfect parse.
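The cleaning steps listed above can be sketched in a few regular expressions. This is a minimal illustration, not the package's actual implementation: the names `strip_latex`, `tokenize`, `MATH_ENVS`, and `DROP_COMMANDS`, the exact patterns, and the lowercasing of tokens are all assumptions made for the example.

```python
import re

# Hypothetical lists for illustration; the real tool's lists may differ.
MATH_ENVS = ("equation", "align", "gather", "multline", "eqnarray", "displaymath", "math")
DROP_COMMANDS = ("cite", "citep", "citet", "ref", "eqref", "label", "url", "begin", "end")


def strip_latex(text: str) -> str:
    """Heuristically reduce LaTeX source to its human-visible text."""
    # 1. Comments: an unescaped % up to the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", text)
    # 2. Math environments, then display math, then inline math.
    for env in MATH_ENVS:
        pattern = r"\\begin\{" + env + r"\*?\}.*?\\end\{" + env + r"\*?\}"
        text = re.sub(pattern, " ", text, flags=re.DOTALL)
    text = re.sub(r"\$\$.*?\$\$", " ", text, flags=re.DOTALL)
    text = re.sub(r"\\\[.*?\\\]", " ", text, flags=re.DOTALL)
    text = re.sub(r"\\\(.*?\\\)", " ", text, flags=re.DOTALL)
    text = re.sub(r"(?<!\\)\$.*?(?<!\\)\$", " ", text, flags=re.DOTALL)
    # 3. Non-content commands: drop the command together with its argument.
    drop = "|".join(DROP_COMMANDS)
    text = re.sub(r"\\(?:" + drop + r")\*?(?:\[[^\]]*\])*\{[^{}]*\}", " ", text)
    # 4. Remaining commands: drop the name (and one optional [..] argument),
    #    but keep the text inside braces.
    text = re.sub(r"\\[A-Za-z]+\*?(?:\[[^\]]*\])?", " ", text)
    text = re.sub(r"\\.", " ", text)  # escaped characters such as \% or \&
    return text.replace("{", " ").replace("}", " ")


def tokenize(text: str, min_len: int = 1) -> list[str]:
    """Split cleaned text into lowercase word tokens of at least min_len characters."""
    words = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text)
    return [w.lower() for w in words if len(w) >= min_len]
```

With this sketch, `\textbf{bold}` contributes the word "bold", while `\cite{foo2020}` and `$x^2$` contribute nothing. A regex approach like this cannot handle nested braces or user-defined macros, which is consistent with the "heuristic by design" goal.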


Install (recommended: isolated CLI via uv / pipx)

Distribution name: latex-wc
CLI command: latex-wc
Import package: latex_wc

Option A: One-off run with uvx (no install)

uvx latex-wc ./paper.tex

To count every .tex file under a directory (searched recursively):

uvx latex-wc ./tex/

Option B: Install as a persistent tool with uv

uv tool install latex-wc
latex-wc ./paper.tex

Upgrade later:

uv tool upgrade latex-wc

Option C: Install as a persistent tool with pipx

pipx install latex-wc
latex-wc ./paper.tex

Upgrade later:

pipx upgrade latex-wc

Usage

Basic

Pass either a file or a directory:

latex-wc ./paper.tex
latex-wc ./thesis/     # recursively counts all *.tex under ./thesis (one combined report)
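File-or-directory handling of this kind can be sketched with pathlib. The function name `discover_tex_files` and the sorted ordering are assumptions for illustration; the project's own discovery.py may behave differently.

```python
from pathlib import Path


def discover_tex_files(path: str) -> list[Path]:
    """Return [path] for a single file, or every *.tex file found recursively under a directory."""
    p = Path(path)
    if p.is_file():
        return [p]
    # rglob walks subdirectories; sorting keeps the order deterministic.
    return sorted(f for f in p.rglob("*.tex") if f.is_file())
```

In directory mode, the tokens from every discovered file would then be pooled into one combined report.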

Backwards-compatible flag

--document-path is still supported (positional PATH wins if both are provided):

latex-wc --document-path ./paper.tex
latex-wc ./paper.tex --document-path ./ignored.tex   # counts ./paper.tex; the flag value is ignored

Arguments

  • PATH (positional, optional) Path to a .tex file or a directory. If omitted, latex-wc uses $DOCUMENT_PATH if it is set; otherwise it searches the current directory recursively.

  • --top N Number of top words to display. Default: 100

  • --min-len N Minimum token length to include. Default: 1

  • --out-dir DIR If set, writes words.txt and top_words.csv into this directory. Default: $LOG_DIR if set; if both are empty, nothing is written.

  • --debug Enables verbose debug logging to stderr (stdout remains the main report output).
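The options and fallbacks above can be modelled with argparse. This is a sketch under stated assumptions: the helper names `build_parser`, `resolve_path`, and `resolve_out_dir` are hypothetical, and the relative priority of --document-path over $DOCUMENT_PATH is a guess, since the text only states that the positional PATH wins over the flag.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="latex-wc")
    parser.add_argument("path", nargs="?", default=None)
    parser.add_argument("--document-path", default=None)
    parser.add_argument("--top", type=int, default=100)
    parser.add_argument("--min-len", type=int, default=1)
    parser.add_argument("--out-dir", default=None)
    parser.add_argument("--debug", action="store_true")
    return parser


def resolve_path(args: argparse.Namespace, env=os.environ) -> str:
    # Positional PATH wins over --document-path (documented).
    # Flag before $DOCUMENT_PATH is an assumption; "." means "search the current directory".
    return args.path or args.document_path or env.get("DOCUMENT_PATH") or "."


def resolve_out_dir(args: argparse.Namespace, env=os.environ):
    # None means that no output files are written.
    return args.out_dir or env.get("LOG_DIR") or None
```

Passing the environment as a parameter keeps the resolution logic easy to test without modifying the real process environment.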

Examples

# Count words, show top 50, ignore tokens shorter than 4 chars
latex-wc ./paper.tex --top 50 --min-len 4

# Count all .tex files under a directory (combined report)
latex-wc ./tex/ --top 25

# Write outputs to ./logs/
latex-wc ./paper.tex --out-dir ./logs

# Use env vars (no args)
DOCUMENT_PATH=./paper.tex LOG_DIR=./logs latex-wc

# Verbose debug logs
latex-wc ./paper.tex --debug

Output

The CLI prints:

  • Document path (file mode) or directory + number of files (directory mode)
  • Total words
  • Unique words
  • Top-N word frequency list
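The totals and the top-N list can be derived from a token list with collections.Counter. The function name `summarize` and the choice to apply the minimum length before computing totals are assumptions for this sketch, not confirmed behaviour of the tool.

```python
from collections import Counter


def summarize(tokens: list[str], top: int = 100, min_len: int = 1) -> dict:
    """Compute total words, unique words, and the top-N frequency list."""
    kept = [t for t in tokens if len(t) >= min_len]
    counts = Counter(kept)
    return {
        "total": len(kept),
        "unique": len(counts),
        "top": counts.most_common(top),  # list of (word, count), most frequent first
    }
```

For example, `summarize(["the", "cat", "the", "a"], top=1, min_len=2)` keeps three tokens, two of them distinct, and ranks "the" first.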

If --out-dir is set, two files are written:

  • words.txt — one token per line
  • top_words.csv (rank,word,count)
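Writing the two files could look like the sketch below. The function name `write_outputs`, the header row in the CSV, and the creation of the directory when it is missing are assumptions; only the file names and the rank,word,count layout come from the description above.

```python
import csv
from collections import Counter
from pathlib import Path


def write_outputs(tokens: list[str], out_dir: str, top: int = 100) -> None:
    """Write words.txt (one token per line) and top_words.csv (rank,word,count)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "words.txt").write_text("".join(t + "\n" for t in tokens), encoding="utf-8")
    with (out / "top_words.csv").open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["rank", "word", "count"])  # header row is an assumption
        ranked = Counter(tokens).most_common(top)
        for rank, (word, count) in enumerate(ranked, start=1):
            writer.writerow([rank, word, count])
```

Opening the CSV with `newline=""` follows the csv module's guidance and avoids doubled line endings on Windows.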

Development (repo)

This section is for contributors; most users should use uvx, uv tool, or pipx above.

Requirements:

  • Python >=3.11
  • uv

Common commands:

make sync
make test
make lint
make build

Project layout:

.
├── src/
│   └── latex_wc/
│       ├── cli.py
│       ├── discovery.py
│       ├── latex_tokens.py
│       ├── counting.py
│       ├── writers.py
│       ├── io_utils.py
│       └── models.py
└── tests/
