Count words in LaTeX documents while ignoring commands, math, and common non-text regions.

latex-wc

A small CLI tool that counts words in LaTeX .tex files while trying to ignore LaTeX “noise”:

  • Removes comments (% ...)
  • Removes common math forms ($...$, $$...$$, \(...\), \[...\], and common math environments)
  • Drops common non-content commands (e.g., citations/refs/urls/labels)
  • Strips LaTeX command names while preserving human-visible brace text
  • Tokenizes words and reports totals + top-N frequencies

Optionally writes:

  • words.txt (one token per line)
  • top_words.csv (ranked word frequency table)

Heuristic by design: the goal is a human-ish word count, not a TeX-perfect parse.
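The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code shipped in latex_wc: the function names (strip_latex, tokenize, count_words), the environment and command lists, and the regular expressions are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical lists for illustration; the real tool may cover more cases.
MATH_ENVS = ("equation", "align", "gather", "multline", "eqnarray", "displaymath", "math")
DROP_COMMANDS = ("cite", "citep", "citet", "ref", "eqref", "label", "url")


def strip_latex(text: str) -> str:
    """Heuristically reduce LaTeX source to its human-visible text."""
    # Comments: an unescaped % through the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", text)
    # Math environments such as \begin{equation} ... \end{equation}.
    for env in MATH_ENVS:
        pattern = r"\\begin\{%s\*?\}.*?\\end\{%s\*?\}" % (env, env)
        text = re.sub(pattern, " ", text, flags=re.DOTALL)
    # Display math first ($$...$$, \[...\]), then inline math (\(...\), $...$).
    text = re.sub(r"\$\$.*?\$\$", " ", text, flags=re.DOTALL)
    text = re.sub(r"\\\[.*?\\\]", " ", text, flags=re.DOTALL)
    text = re.sub(r"\\\(.*?\\\)", " ", text, flags=re.DOTALL)
    text = re.sub(r"(?<!\\)\$.*?(?<!\\)\$", " ", text, flags=re.DOTALL)
    # Non-content commands are dropped together with their arguments.
    drop = "|".join(DROP_COMMANDS)
    text = re.sub(r"\\(?:%s)\*?(?:\[[^\]]*\])*\{[^{}]*\}" % drop, " ", text)
    # Remaining command names are stripped, but their brace text is kept.
    text = re.sub(r"\\[A-Za-z]+\*?(?:\[[^\]]*\])*", " ", text)
    text = re.sub(r"\\.", " ", text)  # escaped symbols such as \% or \&
    return text.replace("{", " ").replace("}", " ")


def tokenize(text: str, min_len: int = 1) -> list[str]:
    """Lowercase the text and split it into word tokens of at least min_len characters."""
    words = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    return [w for w in words if len(w) >= min_len]


def count_words(tokens: list[str], top: int = 100):
    """Return (total words, unique words, top-N list of (word, count) pairs)."""
    counts = Counter(tokens)
    return len(tokens), len(counts), counts.most_common(top)
```

A regex pass like this will miscount in edge cases (nested braces, verbatim blocks, custom macros), which is consistent with the heuristic goal stated above.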


Install (recommended: isolated CLI via uv / pipx)

Distribution name: latex-wc
CLI command: latex-wc
Import package: latex_wc

Option A: One-off run with uvx (no install)

uvx latex-wc ./paper.tex

To count every .tex file under a directory (searched recursively):

uvx latex-wc ./tex/

Option B: Install as a persistent tool with uv

uv tool install latex-wc
latex-wc ./paper.tex

Upgrade later:

uv tool upgrade latex-wc

Option C: Install as a persistent tool with pipx

pipx install latex-wc
latex-wc ./paper.tex

Upgrade later:

pipx upgrade latex-wc

Usage

Basic

Pass either a file or a directory:

latex-wc ./paper.tex
latex-wc ./thesis/     # recursively counts all *.tex under ./thesis (one combined report)
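As a rough sketch of how file-or-directory handling could work, the snippet below resolves a path to a list of .tex files. The function name find_tex_files is hypothetical and the sorted order is an assumption; the actual logic in discovery.py may differ.

```python
from pathlib import Path


def find_tex_files(path: Path) -> list[Path]:
    """Return [path] for a single file, or every *.tex file below a directory."""
    if path.is_file():
        return [path]
    if path.is_dir():
        # rglob walks the directory tree recursively; sorting keeps the order stable.
        return sorted(p for p in path.rglob("*.tex") if p.is_file())
    raise FileNotFoundError(f"No such file or directory: {path}")
```

In directory mode, the text of every file found this way would feed into one combined count.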

Backwards-compatible flag

--document-path is still supported (positional PATH wins if both are provided):

latex-wc --document-path ./paper.tex
latex-wc ./paper.tex --document-path ./ignored.tex

Arguments

  • PATH (positional, optional): Path to a .tex file or a directory. If omitted, uses $DOCUMENT_PATH if set; otherwise searches the current directory recursively.

  • --top N: Number of top words to display. Default: 100

  • --min-len N: Minimum token length to include. Default: 1

  • --out-dir DIR: If set, writes words.txt and top_words.csv into this directory. Default: $LOG_DIR if set; if that is unset or empty, nothing is written.

  • --debug: Enables verbose debug logging to stderr (stdout remains the main report output).

Examples

# Count words, show top 50, ignore tokens shorter than 4 chars
latex-wc ./paper.tex --top 50 --min-len 4

# Count all .tex files under a directory (combined report)
latex-wc ./tex/ --top 25

# Write outputs to ./logs/
latex-wc ./paper.tex --out-dir ./logs

# Use env vars (no args)
DOCUMENT_PATH=./paper.tex LOG_DIR=./logs latex-wc

# Verbose debug logs
latex-wc ./paper.tex --debug

Output

The CLI prints:

  • Document path (file mode) or directory + number of files (directory mode)
  • Total words
  • Unique words
  • Top-N word frequency list

If --out-dir is set, two files are written:

  • words.txt (one token per line)
  • top_words.csv (columns: rank,word,count)
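For illustration, the two output files could be produced as shown below. The function name write_outputs is hypothetical, and writing rank,word,count as a header row is an assumption; the real writers.py may format the files differently.

```python
import csv
from collections import Counter
from pathlib import Path


def write_outputs(tokens: list[str], out_dir: Path, top: int = 100) -> None:
    """Write words.txt (one token per line) and top_words.csv (rank,word,count)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    words_file = out_dir / "words.txt"
    words_file.write_text("".join(f"{t}\n" for t in tokens), encoding="utf-8")
    # newline="" lets the csv module control line endings itself.
    with (out_dir / "top_words.csv").open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["rank", "word", "count"])  # header row is an assumption
        ranked = Counter(tokens).most_common(top)
        for rank, (word, count) in enumerate(ranked, start=1):
            writer.writerow([rank, word, count])
```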

Development (repo)

This section is for contributors; most users should use uvx, uv tool, or pipx above.

Requirements:

  • Python >=3.11
  • uv

Common commands:

make sync
make test
make lint
make build

Project layout:

.
├── src/
│   └── latex_wc/
│       ├── cli.py
│       ├── discovery.py
│       ├── latex_tokens.py
│       ├── counting.py
│       ├── writers.py
│       ├── io_utils.py
│       └── models.py
└── tests/
