
Count words in LaTeX documents while ignoring commands, math, and common non-text regions.

latex-wc

A small CLI tool that counts words in LaTeX .tex files while trying to ignore LaTeX “noise”:

  • Removes comments (% ...)
  • Removes common math forms ($...$, $$...$$, \(...\), \[...\], and common math environments)
  • Drops common non-content commands (e.g., citations/refs/urls/labels)
  • Strips LaTeX command names while preserving human-visible brace text
  • Tokenizes words and reports totals + top-N frequencies

Optionally writes:

  • words.txt (one token per line)
  • top_words.csv (ranked word frequency table)

Heuristic by design: the goal is a human-ish word count, not a TeX-perfect parse.
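The stripping steps listed above can be approximated with a handful of regular expressions. The following is a rough sketch only, not the code latex-wc ships: `rough_tokens` and every pattern below are hypothetical, the set of math environments and dropped commands is deliberately tiny, and treating only alphabetic runs as words is an assumption.

```python
import re

# Hypothetical patterns for illustration; the real tool may differ in every detail.
COMMENT_RE = re.compile(r"(?<!\\)%.*")  # unescaped % to end of line
MATH_RE = re.compile(
    r"\$\$.*?\$\$|(?<!\\)\$.*?(?<!\\)\$|\\\(.*?\\\)|\\\[.*?\\\]"
    r"|\\begin\{(equation|align)\*?\}.*?\\end\{\1\*?\}",
    re.DOTALL,
)
# Commands whose argument is not human-visible prose (assumed short list).
DROP_CMD_RE = re.compile(r"\\(?:cite|ref|label|url)\*?(?:\[[^\]]*\])?\{[^{}]*\}")
COMMAND_RE = re.compile(r"\\[A-Za-z]+\*?")  # command name only, braces are kept
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def rough_tokens(tex: str) -> list[str]:
    """Return lowercase word tokens after stripping common LaTeX noise."""
    text = COMMENT_RE.sub("", tex)
    text = MATH_RE.sub(" ", text)
    text = DROP_CMD_RE.sub(" ", text)
    text = COMMAND_RE.sub(" ", text)
    text = text.replace("{", " ").replace("}", " ")
    return [w.lower() for w in WORD_RE.findall(text)]


sample = r"\section{Intro} We study $x^2$ \cite{foo}. % note"
print(rough_tokens(sample))  # ['intro', 'we', 'study']
```

The order matters in this sketch: comments go first so that a `$` inside a comment cannot open a math span, and whole commands such as `\cite{...}` are dropped before bare command names are stripped, so their arguments never reach the tokenizer.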


Install (recommended: isolated CLI via uv / pipx)

Distribution name: latex-wc
CLI command: latex-wc
Import package: latex_wc

Option A: One-off run with uvx (no install)

uvx latex-wc ./paper.tex

To count every .tex file under a directory, pass the directory instead:

uvx latex-wc ./tex/

Option B: Install as a persistent tool with uv

uv tool install latex-wc
latex-wc ./paper.tex

Upgrade later:

uv tool upgrade latex-wc

Option C: Install as a persistent tool with pipx

pipx install latex-wc
latex-wc ./paper.tex

Upgrade later:

pipx upgrade latex-wc

Usage

Basic

Pass either a file or a directory:

latex-wc ./paper.tex
latex-wc ./thesis/     # recursively counts all *.tex under ./thesis (one combined report)
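As an illustration of file mode versus directory mode, here is a minimal sketch using pathlib. `find_tex_files` is a hypothetical helper, not part of the package's documented API, and the sorted order is an assumption made so that results are deterministic.

```python
from pathlib import Path


def find_tex_files(path: Path) -> list[Path]:
    """Return [path] for a single file, or every *.tex file under a directory."""
    if path.is_file():
        return [path]
    # rglob walks the directory tree recursively; sorting keeps output stable.
    return sorted(p for p in path.rglob("*.tex") if p.is_file())
```

With a helper like this, `find_tex_files(Path("./thesis"))` would yield the list of files that feed one combined report.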

Backwards-compatible flag

--document-path is still supported (positional PATH wins if both are provided):

latex-wc --document-path ./paper.tex
latex-wc ./paper.tex --document-path ./ignored.tex

Arguments

  • PATH (positional, optional): path to a .tex file or a directory. If omitted, latex-wc uses $DOCUMENT_PATH if it is set; otherwise it searches the current directory recursively.

  • --top N: number of top words to display. Default: 100

  • --min-len N: minimum token length to include. Default: 1

  • --out-dir DIR: directory to write words.txt and top_words.csv into. Default: $LOG_DIR if set; if neither is set, nothing is written.

  • --debug: enables verbose debug logging to stderr (stdout remains the main report output).
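The precedence rules above can be sketched with argparse. This is an assumed reconstruction, not the contents of cli.py: `build_parser`, `resolve_path`, and `resolve_out_dir` are hypothetical names, and ranking --document-path above $DOCUMENT_PATH is a guess (the README only states that the positional PATH wins over the flag).

```python
import argparse
from pathlib import Path
from typing import Mapping, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="latex-wc")
    parser.add_argument("path", nargs="?", metavar="PATH")
    parser.add_argument("--document-path", dest="document_path")
    parser.add_argument("--top", type=int, default=100, metavar="N")
    parser.add_argument("--min-len", type=int, default=1, metavar="N")
    parser.add_argument("--out-dir", default=None, metavar="DIR")
    parser.add_argument("--debug", action="store_true")
    return parser


def resolve_path(args: argparse.Namespace, env: Mapping[str, str]) -> Path:
    # Positional PATH wins, then the legacy flag (assumed), then the env var,
    # and finally the current directory.
    chosen = args.path or args.document_path or env.get("DOCUMENT_PATH") or "."
    return Path(chosen)


def resolve_out_dir(args: argparse.Namespace, env: Mapping[str, str]) -> Optional[Path]:
    # An unset or empty value means "write nothing".
    chosen = args.out_dir or env.get("LOG_DIR") or ""
    return Path(chosen) if chosen else None
```

Passing the environment in as a mapping (for example `os.environ`) rather than reading it inside the functions keeps the resolution logic easy to test.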

Examples

# Count words, show top 50, ignore tokens shorter than 4 chars
latex-wc ./paper.tex --top 50 --min-len 4

# Count all .tex files under a directory (combined report)
latex-wc ./tex/ --top 25

# Write outputs to ./logs/
latex-wc ./paper.tex --out-dir ./logs

# Use env vars (no args)
DOCUMENT_PATH=./paper.tex LOG_DIR=./logs latex-wc

# Verbose debug logs
latex-wc ./paper.tex --debug

Output

The CLI prints:

  • Document path (file mode) or directory + number of files (directory mode)
  • Total words
  • Unique words
  • Top-N word frequency list
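The figures in this report map naturally onto collections.Counter. A minimal sketch, assuming that --min-len filters tokens before anything is counted; `summarize` is a hypothetical name, not the package's API.

```python
from collections import Counter


def summarize(tokens, top=100, min_len=1):
    """Return (total words, unique words, top-N list of (word, count))."""
    kept = [t for t in tokens if len(t) >= min_len]
    counts = Counter(kept)
    return len(kept), len(counts), counts.most_common(top)
```

For example, `summarize(["the", "cat", "the", "a"], top=2, min_len=2)` returns `(3, 2, [("the", 2), ("cat", 1)])`.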

If --out-dir is set, two files are written:

  • words.txt: one token per line
  • top_words.csv: columns rank,word,count
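To make the two file formats concrete, here is a small sketch of how such files could be produced with the standard library. `write_outputs` is a hypothetical helper, and both the header row in the CSV and the 1-based rank are assumptions based on the column names above.

```python
import csv
from collections import Counter
from pathlib import Path


def write_outputs(tokens, out_dir, top=100):
    """Write words.txt and top_words.csv into out_dir, creating it if needed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # words.txt: one token per line, in document order.
    (out_dir / "words.txt").write_text(
        "".join(f"{t}\n" for t in tokens), encoding="utf-8"
    )

    # top_words.csv: ranked frequency table (header row is an assumption).
    with (out_dir / "top_words.csv").open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["rank", "word", "count"])
        ranked = Counter(tokens).most_common(top)
        for rank, (word, count) in enumerate(ranked, start=1):
            writer.writerow([rank, word, count])
```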

Development (repo)

This section is for contributors; most users should use uvx, uv tool, or pipx above.

Requirements:

  • Python >=3.11
  • uv

Common commands:

make sync
make test
make lint
make build

Project layout:

.
├── src/
│   └── latex_wc/
│       ├── cli.py
│       ├── discovery.py
│       ├── latex_tokens.py
│       ├── counting.py
│       ├── writers.py
│       ├── io_utils.py
│       └── models.py
└── tests/
