
Count words in LaTeX documents while ignoring commands, math, and common non-text regions.


latex-wc

A small CLI tool that counts words in LaTeX .tex files while trying to ignore LaTeX “noise”:

  • Removes comments (% ...)
  • Removes common math forms ($...$, $$...$$, \(...\), \[...\], and common math environments)
  • Drops common non-content commands (e.g., citations/refs/urls/labels)
  • Strips LaTeX command names while preserving human-visible brace text
  • Tokenizes words and reports totals + top-N frequencies

Optionally writes:

  • words.txt (one token per line)
  • top_words.csv (ranked word frequency table)

Heuristic by design: the goal is a human-ish word count, not a TeX-perfect parse.
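The cleanup steps listed above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the names `strip_latex`, `tokenize`, `MATH_ENVS`, and `DROP_COMMANDS`, and every regular expression below, are assumptions made for the example.

```python
import re

# Hypothetical lists; the real tool may recognise different names.
MATH_ENVS = ("equation", "align", "gather", "multline", "eqnarray", "displaymath", "math")
DROP_COMMANDS = ("cite", "citep", "citet", "ref", "eqref", "pageref", "label", "url")


def strip_latex(text: str) -> str:
    """Heuristically reduce LaTeX source to human-visible text."""
    # 1. Comments: an unescaped % through the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", text)
    # 2. Math: environments first, then display forms, then inline forms.
    for env in MATH_ENVS:
        pattern = r"\\begin\{%s\*?\}.*?\\end\{%s\*?\}" % (env, env)
        text = re.sub(pattern, " ", text, flags=re.DOTALL)
    text = re.sub(r"\$\$.*?\$\$", " ", text, flags=re.DOTALL)
    text = re.sub(r"\\\[.*?\\\]", " ", text, flags=re.DOTALL)
    text = re.sub(r"\\\(.*?\\\)", " ", text, flags=re.DOTALL)
    text = re.sub(r"(?<!\\)\$.*?(?<!\\)\$", " ", text, flags=re.DOTALL)
    # 3. Non-content commands: drop the command together with its argument.
    drop = "|".join(DROP_COMMANDS)
    text = re.sub(r"\\(?:%s)\*?(?:\[[^\]]*\])*\{[^{}]*\}" % drop, " ", text)
    # 4. Remaining commands: remove the name, keep the text inside braces.
    text = re.sub(r"\\[a-zA-Z]+\*?(?:\[[^\]]*\])*", " ", text)
    text = re.sub(r"\\.", " ", text)  # escaped characters such as \% or \&
    return text.replace("{", " ").replace("}", " ")


def tokenize(text: str, min_len: int = 1) -> list[str]:
    """Lower-cased alphabetic tokens of at least min_len characters."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    return [w.lower() for w in words if len(w) >= min_len]
```

Ordering matters in a sketch like this: comments go first so that a `$` inside a comment cannot open a math span, and `$$...$$` is handled before `$...$` so display math is not split into two inline matches.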


Install (recommended: isolated CLI via uv / pipx)

Distribution name: latex-wc
CLI command: latex-wc
Import package: latex_wc

Option A: One-off run with uvx (no install)

uvx latex-wc ./paper.tex

To count every .tex file under a directory:

uvx latex-wc ./tex/

Option B: Install as a persistent tool with uv

uv tool install latex-wc
latex-wc ./paper.tex

Upgrade later:

uv tool upgrade latex-wc

Option C: Install as a persistent tool with pipx

pipx install latex-wc
latex-wc ./paper.tex

Upgrade later:

pipx upgrade latex-wc

Usage

Basic

Pass either a file or a directory:

latex-wc ./paper.tex
latex-wc ./thesis/     # recursively counts all *.tex under ./thesis (one combined report)
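For illustration, the file-or-directory behaviour could be implemented along these lines. The function name `find_tex_files` is hypothetical and the sorting is an assumption; the package's own discovery logic may differ.

```python
from pathlib import Path


def find_tex_files(path: Path) -> list[Path]:
    """Return [path] for a single file, or every *.tex file below a directory."""
    if path.is_file():
        return [path]
    # rglob walks the tree recursively; sorting keeps the order stable across runs.
    return sorted(p for p in path.rglob("*.tex") if p.is_file())
```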

Backwards-compatible flag

--document-path is still supported (positional PATH wins if both are provided):

latex-wc --document-path ./paper.tex
latex-wc ./paper.tex --document-path ./ignored.tex

Arguments

  • PATH (positional, optional): Path to a .tex file or a directory. If omitted, uses $DOCUMENT_PATH if set; otherwise searches the current directory recursively.

  • --top N: Number of top words to display. Default: 100

  • --min-len N: Minimum token length to include. Default: 1

  • --out-dir DIR: If set, writes words.txt and top_words.csv into this directory. Default: $LOG_DIR; if that is unset or empty, nothing is written.

  • --debug: Enables verbose debug logging to stderr (stdout remains the main report output).

Examples

# Count words, show top 50, ignore tokens shorter than 4 chars
latex-wc ./paper.tex --top 50 --min-len 4

# Count all .tex files under a directory (combined report)
latex-wc ./tex/ --top 25

# Write outputs to ./logs/
latex-wc ./paper.tex --out-dir ./logs

# Use env vars (no args)
DOCUMENT_PATH=./paper.tex LOG_DIR=./logs latex-wc

# Verbose debug logs
latex-wc ./paper.tex --debug

Output

The CLI prints:

  • Document path (file mode) or directory + number of files (directory mode)
  • Total words
  • Unique words
  • Top-N word frequency list
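The totals and the ranked list can be derived from a token list with `collections.Counter`. This is a sketch under the assumption that tokens are already cleaned and lower-cased; `summarize` is a hypothetical name, not part of the package's documented API.

```python
from collections import Counter


def summarize(tokens: list[str], top: int = 100) -> dict:
    """Compute total words, unique words, and the top-N (word, count) pairs."""
    counts = Counter(tokens)
    return {
        "total": sum(counts.values()),
        "unique": len(counts),
        "top": counts.most_common(top),
    }
```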

If --out-dir is set, two files are written:

  • words.txt: one token per line
  • top_words.csv: columns rank,word,count
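As a rough sketch, the two output files could be produced with the standard library alone. The function name `write_outputs` and its parameters are hypothetical; only the file names and the `rank,word,count` header come from the description above.

```python
import csv
from pathlib import Path


def write_outputs(tokens: list[str], ranked: list[tuple[str, int]], out_dir: Path) -> None:
    """Write words.txt (one token per line) and top_words.csv (rank,word,count)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "words.txt").write_text("\n".join(tokens) + "\n", encoding="utf-8")
    # newline="" lets the csv module control line endings itself.
    with (out_dir / "top_words.csv").open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["rank", "word", "count"])
        for rank, (word, count) in enumerate(ranked, start=1):
            writer.writerow([rank, word, count])
```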

Development (repo)

This section is for contributors; most users should use uvx, uv tool, or pipx above.

Requirements:

  • Python >=3.11
  • uv

Common commands:

make sync
make test
make lint
make build

Project layout:

.
├── src/
│   └── latex_wc/
│       ├── cli.py
│       ├── discovery.py
│       ├── latex_tokens.py
│       ├── counting.py
│       ├── writers.py
│       ├── io_utils.py
│       └── models.py
└── tests/
