Count words in LaTeX documents while ignoring commands, math, and common non-text regions.
LaTeX Word Count
A small CLI tool that counts words in a LaTeX .tex file while trying to ignore LaTeX “noise”:
- Removes comments (% ...)
- Removes common math forms ($...$, $$...$$, \(...\), \[...\], and common math environments)
- Drops common non-content commands (e.g., citations/refs/urls/labels)
- Strips LaTeX command names while preserving human-visible brace text
- Tokenizes words and reports totals + top-N frequencies
It can also optionally write:
- words.txt (one token per line)
- top_words.csv (ranked word frequency table)
Heuristic by design: the goal is a human-ish word count, not a TeX-perfect parse.
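The cleanup steps listed above can be approximated with a handful of regular expressions. The following is a minimal sketch, not the package's actual code: the function name strip_latex, the MATH_ENVS and DROP_COMMANDS lists, and the exact patterns are all assumptions made for illustration.

```python
import re

# Hypothetical lists; the real tool may recognise different names.
MATH_ENVS = ("equation", "align", "gather", "multline", "eqnarray", "displaymath", "math")
DROP_COMMANDS = ("cite", "citep", "citet", "ref", "eqref", "label", "url")


def strip_latex(text: str) -> str:
    """Heuristically reduce LaTeX source to its human-visible text."""
    # 1. Comments: an unescaped % through the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", text)

    # 2. Math environments such as \begin{equation}...\end{equation}.
    envs = "|".join(MATH_ENVS)
    text = re.sub(
        r"\\begin\{(" + envs + r")\*?\}.*?\\end\{\1\*?\}", " ", text, flags=re.DOTALL
    )

    # 3. Display math first ($$...$$, \[...\]), then inline (\(...\), $...$).
    text = re.sub(r"\$\$.*?\$\$", " ", text, flags=re.DOTALL)
    text = re.sub(r"\\\[.*?\\\]", " ", text, flags=re.DOTALL)
    text = re.sub(r"\\\(.*?\\\)", " ", text, flags=re.DOTALL)
    text = re.sub(r"(?<!\\)\$.*?(?<!\\)\$", " ", text, flags=re.DOTALL)

    # 4. Non-content commands: drop the command together with its argument.
    cmds = "|".join(DROP_COMMANDS)
    text = re.sub(r"\\(?:" + cmds + r")\*?(?:\[[^\]]*\])*\{[^{}]*\}", " ", text)

    # 5. Remaining commands: drop the name (and optional [..] args), keep brace text.
    text = re.sub(r"\\[A-Za-z]+\*?(?:\[[^\]]*\])*", " ", text)
    text = re.sub(r"\\.", " ", text)  # escaped symbols such as \% or \&

    # 6. Remove the braces themselves and collapse whitespace.
    text = text.replace("{", " ").replace("}", " ")
    return re.sub(r"\s+", " ", text).strip()
```

Because each step is a plain substitution, nested or unusual constructs (for example a $ inside a verbatim block) will be handled imperfectly, which matches the heuristic intent described above.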
Install (PyPI)
This project is designed to be used as an isolated CLI tool via uv.
One-off run (no project setup)
uvx latex-wc --document-path ./paper.tex
Install as a persistent tool
uv tool install latex-wc
latex-wc --document-path ./paper.tex
- Distribution name: latex-word-count
- CLI command: latex-wc
- Import package: latex_wc
Usage
Basic
latex-wc --document-path ./paper.tex
Arguments
- --document-path: Path to the .tex file. Default: $DOCUMENT_PATH if set, otherwise ./current_doc.tex.
- --top: Number of top words to display. Default: 100.
- --min-len: Minimum token length to include. Default: 1.
- --out-dir: If set, writes words.txt and top_words.csv into this directory. Default: $LOG_DIR if set; if empty, nothing is written.
Examples:
# Count words, show top 50, ignore tokens shorter than 4 chars
latex-wc --document-path ./paper.tex --top 50 --min-len 4
# Write outputs to ./logs/
latex-wc --document-path ./paper.tex --out-dir ./logs
# Use environment variables instead of flags
DOCUMENT_PATH=./paper.tex LOG_DIR=./logs latex-wc
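For readers curious how flags with environment-variable defaults like these are typically wired up, here is a small argparse sketch. It only mirrors the documented flags and defaults; the function name build_parser is hypothetical and the real cli.py may be organised differently.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="latex-wc")
    parser.add_argument(
        "--document-path",
        # $DOCUMENT_PATH if set (and non-empty), otherwise ./current_doc.tex
        default=os.environ.get("DOCUMENT_PATH") or "./current_doc.tex",
        help="Path to the .tex file.",
    )
    parser.add_argument("--top", type=int, default=100, help="Number of top words to display.")
    parser.add_argument("--min-len", type=int, default=1, help="Minimum token length to include.")
    parser.add_argument(
        "--out-dir",
        # $LOG_DIR if set; an empty or missing value means nothing is written
        default=os.environ.get("LOG_DIR") or None,
        help="Directory for words.txt and top_words.csv.",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Reading the environment when the parser is built means an explicit flag always wins over the variable, which is the precedence the examples above suggest.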
Output
The CLI prints:
- Document path
- Total words
- Unique words
- Top-N word frequency list
If --out-dir is set, two files are written:
- words.txt: one token per line
- top_words.csv: columns rank,word,count
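As a rough illustration of the two output files, the sketch below writes them with the standard library. The function name write_outputs, the presence of a header row in the CSV, and limiting the CSV to the top-N words are assumptions; only the file names and the rank,word,count column order come from the description above.

```python
import csv
from collections import Counter
from pathlib import Path


def write_outputs(tokens: list[str], out_dir: str, top: int = 100) -> None:
    """Hypothetical writer producing words.txt and top_words.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # words.txt: one token per line, in document order.
    (out / "words.txt").write_text("\n".join(tokens) + "\n", encoding="utf-8")

    # top_words.csv: rank,word,count (header row is an assumption).
    with (out / "top_words.csv").open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["rank", "word", "count"])
        ranked = Counter(tokens).most_common(top)
        for rank, (word, count) in enumerate(ranked, start=1):
            writer.writerow([rank, word, count])
```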
Build / Local Development
Requirements
- Python >=3.11
- uv installed
Sync deps
make sync
Run locally (repo version)
make run
Or test against the included sample:
make sample
Lint / Format
make lint
Tests
make test
Build artifacts (wheel + sdist)
make build
Preflight checks (build + metadata)
make preflight
Using a local build in another repo (preflight install)
After building in this repo (make build), you can install the wheel or sdist into any other directory using uv only.
From the other repo:
uv init --layout=bare
uv add /ABS/PATH/TO/dist/latex_word_count-0.1.0-py3-none-any.whl
uv run -- latex-wc --document-path ./paper.tex
You can also install the sdist:
uv add /ABS/PATH/TO/dist/latex_word_count-0.1.0.tar.gz
Project Layout
.
├── current_doc.tex
├── Makefile
├── pyproject.toml
├── src/
│   └── latex_wc/
│       ├── cli.py
│       ├── latex_tokens.py
│       ├── counting.py
│       ├── writers.py
│       ├── io_utils.py
│       └── models.py
├── tests/
└── uv.lock
Notes / Behavior
- Encoding: reads as UTF-8, falls back to Latin-1 if needed.
- LaTeX handling is heuristic (by design). It aims to approximate a human word count and intentionally drops some content associated with references/URLs/etc.
- Tokenization is English-letter oriented (A-Za-z with optional apostrophes).
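The encoding fallback and the tokenization rule from these notes can be sketched as follows. The names read_text, tokenize, and TOKEN_RE are hypothetical, and lowercasing tokens before counting is an assumption the notes do not confirm.

```python
import re
from collections import Counter
from pathlib import Path

# Runs of ASCII letters, optionally joined by apostrophes (e.g. "don't").
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def read_text(path: str) -> str:
    """Read as UTF-8, falling back to Latin-1 if the bytes are not valid UTF-8."""
    data = Path(path).read_bytes()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this decode cannot fail.
        return data.decode("latin-1")


def tokenize(text: str, min_len: int = 1) -> list[str]:
    """Extract word tokens; lowercasing is an assumption of this sketch."""
    return [t for t in TOKEN_RE.findall(text.lower()) if len(t) >= min_len]


if __name__ == "__main__":
    tokens = tokenize("Don't stop; it's 42 TeX-like!", min_len=1)
    print(len(tokens), len(set(tokens)), Counter(tokens).most_common(3))
```

One consequence of the letter-only pattern is that digits are skipped and accented or hyphenated words are split at the non-ASCII character or hyphen, so counts for non-English text will be approximate.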