
Count words in LaTeX documents while ignoring commands, math, and common non-text regions.

latex-wc

A small CLI tool that counts words in LaTeX .tex files while trying to ignore LaTeX “noise”:

  • Removes comments (% ...)
  • Removes common math forms ($...$, $$...$$, \(...\), \[...\], and common math environments)
  • Drops common non-content commands (e.g., citations/refs/urls/labels)
  • Strips LaTeX command names while preserving human-visible brace text
  • Tokenizes words and reports totals + top-N frequencies

Optionally writes:

  • words.txt (one token per line)
  • top_words.csv (ranked word frequency table)

Heuristic by design: the goal is a human-ish word count, not a TeX-perfect parse.
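The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: every regular expression, the list of dropped commands, and the function names `strip_latex` and `tokenize` are assumptions made for the example.

```python
import re

# All patterns here are illustrative assumptions, not latex-wc's real rules.
COMMENT = re.compile(r"(?<!\\)%.*")  # unescaped % through end of line
MATH_ENV = re.compile(
    r"\\begin\{(equation|align|gather|multline)\*?\}.*?\\end\{\1\*?\}",
    re.DOTALL,
)
MATH_INLINE = re.compile(
    r"\$\$.*?\$\$|\$.*?\$|\\\(.*?\\\)|\\\[.*?\\\]",
    re.DOTALL,
)
# Commands whose arguments are not prose (citations, refs, urls, labels).
DROP_CMDS = re.compile(
    r"\\(cite\w*|ref|eqref|url|label)\*?(\[[^\]]*\])*\{[^{}]*\}"
)
COMMAND_NAME = re.compile(r"\\[A-Za-z]+\*?")
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def strip_latex(source: str) -> str:
    """Remove comments, math, and non-content commands; keep brace text."""
    text = COMMENT.sub("", source)
    text = MATH_ENV.sub(" ", text)
    text = MATH_INLINE.sub(" ", text)
    text = DROP_CMDS.sub(" ", text)
    text = COMMAND_NAME.sub(" ", text)  # drop the name, keep its {argument}
    return text.replace("{", " ").replace("}", " ")


def tokenize(source: str) -> list[str]:
    """Lowercased word tokens from the cleaned text."""
    return [w.lower() for w in WORD.findall(strip_latex(source))]
```

A sketch like this shares the trade-off described above: it can miscount unusual constructs (for example, an escaped `\$` in running text), which is acceptable when the goal is an approximate, human-style count.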


Install (recommended: isolated CLI via uv / pipx)

Distribution name: latex-wc
CLI command: latex-wc
Import package: latex_wc

Option A: One-off run with uvx (no install)

uvx latex-wc ./paper.tex

To count every .tex file under a directory, pass the directory instead:

uvx latex-wc ./tex/

Option B: Install as a persistent tool with uv

uv tool install latex-wc
latex-wc ./paper.tex

Upgrade later:

uv tool upgrade latex-wc

Option C: Install as a persistent tool with pipx

pipx install latex-wc
latex-wc ./paper.tex

Upgrade later:

pipx upgrade latex-wc

Usage

Basic

Pass either a file or a directory:

latex-wc ./paper.tex
latex-wc ./thesis/     # recursively counts all *.tex under ./thesis (one combined report)
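One plausible way to implement the file-or-directory behavior is with `pathlib`. The function name `find_tex_files` and the sorted ordering are assumptions for this sketch; the tool's own discovery module may behave differently.

```python
from pathlib import Path


def find_tex_files(path: Path) -> list[Path]:
    """Return the .tex files to count for a file or directory argument.

    Hypothetical helper: a single file is returned as-is, a directory is
    searched recursively for *.tex files.
    """
    if path.is_file():
        return [path]
    # rglob walks all subdirectories; sorting keeps the order deterministic.
    return sorted(p for p in path.rglob("*.tex") if p.is_file())
```

Calling `find_tex_files(Path("./thesis"))` would then yield every `.tex` file below `./thesis`, which can be fed into a single combined count.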

Backwards-compatible flag

--document-path is still supported (positional PATH wins if both are provided):

latex-wc --document-path ./paper.tex
latex-wc ./paper.tex --document-path ./ignored.tex

Arguments

  • PATH (positional, optional): Path to a .tex file or a directory. If omitted, the tool uses $DOCUMENT_PATH if set; otherwise it searches the current directory recursively.

  • --top N: Number of top words to display. Default: 100

  • --min-len N: Minimum token length to include. Default: 1

  • --out-dir DIR: If set, writes words.txt and top_words.csv into this directory. Default: $LOG_DIR if set; if neither is provided, nothing is written.

  • --debug: Enables verbose debug logging to stderr (stdout remains the main report output).

Examples

# Count words, show top 50, ignore tokens shorter than 4 chars
latex-wc ./paper.tex --top 50 --min-len 4

# Count all .tex files under a directory (combined report)
latex-wc ./tex/ --top 25

# Write outputs to ./logs/
latex-wc ./paper.tex --out-dir ./logs

# Use env vars (no args)
DOCUMENT_PATH=./paper.tex LOG_DIR=./logs latex-wc

# Verbose debug logs
latex-wc ./paper.tex --debug

Output

The CLI prints:

  • Document path (file mode) or directory + number of files (directory mode)
  • Total words
  • Unique words
  • Top-N word frequency list
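The three reported figures can be derived from a token list with `collections.Counter`. This is a sketch under the assumption that the --min-len filter is applied before counting; the function name `summarize` is hypothetical.

```python
from collections import Counter


def summarize(
    tokens: list[str], top: int = 100, min_len: int = 1
) -> tuple[int, int, list[tuple[str, int]]]:
    """Return (total words, unique words, top-N list) for a token list.

    Assumes tokens shorter than min_len are excluded from every figure.
    """
    kept = [t for t in tokens if len(t) >= min_len]
    counts = Counter(kept)
    return len(kept), len(counts), counts.most_common(top)
```

For example, `summarize(["the", "cat", "the", "a"], top=1, min_len=2)` returns `(3, 2, [("the", 2)])`.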

If --out-dir is set, two files are written:

  • words.txt: one token per line
  • top_words.csv: columns rank,word,count
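Writing the two files could look like the following sketch. The header row and the choice to write only the top-N words are assumptions; the source states only the file names and the rank, word, count columns. The function name `write_outputs` is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def write_outputs(tokens: list[str], out_dir: Path, top: int = 100) -> None:
    """Write words.txt and top_words.csv into out_dir (created if missing)."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # words.txt: one token per line, in document order.
    words_file = out_dir / "words.txt"
    words_file.write_text("".join(f"{t}\n" for t in tokens), encoding="utf-8")

    # top_words.csv: ranked frequency table, with an assumed header row.
    csv_file = out_dir / "top_words.csv"
    with csv_file.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["rank", "word", "count"])
        ranked = Counter(tokens).most_common(top)
        for rank, (word, count) in enumerate(ranked, start=1):
            writer.writerow([rank, word, count])
```

Opening the CSV with `newline=""` follows the `csv` module's recommendation and avoids doubled line endings on Windows.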

Development (repo)

This section is for contributors; most users should use uvx, uv tool, or pipx above.

Requirements:

  • Python >=3.11
  • uv

Common commands:

make sync
make test
make lint
make build

Project layout:

.
├── src/
│   └── latex_wc/
│       ├── cli.py
│       ├── discovery.py
│       ├── latex_tokens.py
│       ├── counting.py
│       ├── writers.py
│       ├── io_utils.py
│       └── models.py
└── tests/


