
Count words in LaTeX documents while ignoring commands, math, and common non-text regions.


LaTeX Word Count

A small CLI tool that counts words in a LaTeX .tex file while trying to ignore LaTeX “noise”:

  • Removes comments (% ...)
  • Removes common math forms ($...$, $$...$$, \(...\), \[...\], and common math environments)
  • Drops common non-content commands (e.g., citations/refs/urls/labels)
  • Strips LaTeX command names while preserving human-visible brace text
  • Tokenizes words and reports totals + top-N frequencies

It can also optionally write:

  • words.txt (one token per line)
  • top_words.csv (ranked word frequency table)

Heuristic by design: the goal is a human-ish word count, not a TeX-perfect parse.


Install (PyPI)

This project is designed to be used as an isolated CLI tool via uv.

One-off run (no project setup)

uvx latex-wc --document-path ./paper.tex

Install as a persistent tool

uv tool install latex-wc
latex-wc --document-path ./paper.tex

  • Distribution name: latex-wc
  • CLI command: latex-wc
  • Import package: latex_wc


Usage

Basic

latex-wc --document-path ./paper.tex

Arguments

  • --document-path Path to the .tex file. Default: $DOCUMENT_PATH if set, otherwise ./current_doc.tex.

  • --top Number of top words to display. Default: 100.

  • --min-len Minimum token length to include. Default: 1.

  • --out-dir If set, writes words.txt and top_words.csv into this directory. Default: $LOG_DIR if set; if neither is provided, no files are written.

Examples:

# Count words, show top 50, ignore tokens shorter than 4 chars
latex-wc --document-path ./paper.tex --top 50 --min-len 4

# Write outputs to ./logs/
latex-wc --document-path ./paper.tex --out-dir ./logs

# Use environment variables instead of flags
DOCUMENT_PATH=./paper.tex LOG_DIR=./logs latex-wc
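The flag defaults described under Arguments could be wired up with argparse roughly as follows. This is a sketch under stated assumptions: the function name build_parser is hypothetical, and treating an empty environment variable the same as an unset one is a guess about the tool's behavior.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose defaults fall back to environment variables."""
    parser = argparse.ArgumentParser(prog="latex-wc")
    parser.add_argument(
        "--document-path",
        # Assumption: an empty $DOCUMENT_PATH is treated like an unset one.
        default=os.environ.get("DOCUMENT_PATH") or "./current_doc.tex",
        help="Path to the .tex file.",
    )
    parser.add_argument("--top", type=int, default=100,
                        help="Number of top words to display.")
    parser.add_argument("--min-len", type=int, default=1,
                        help="Minimum token length to include.")
    parser.add_argument(
        "--out-dir",
        # None means "write nothing"; $LOG_DIR supplies the default if set.
        default=os.environ.get("LOG_DIR") or None,
        help="Directory for words.txt and top_words.csv.",
    )
    return parser
```

Because the environment is read when the parser is built, an explicit flag on the command line always overrides the corresponding variable.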

Output

The CLI prints:

  • Document path
  • Total words
  • Unique words
  • Top-N word frequency list

If --out-dir is set, two files are written:

  • words.txt: one token per line
  • top_words.csv: CSV with the columns rank,word,count
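For readers who want to consume or reproduce these two files, the sketch below shows one way they could be produced. The function name write_outputs is hypothetical, and limiting the CSV to the top-N rows (rather than the full frequency table) is an assumption; only the file names and the rank,word,count header come from the description above.

```python
import csv
from collections import Counter
from pathlib import Path


def write_outputs(tokens: list[str], out_dir: str, top: int = 100) -> None:
    """Write words.txt and top_words.csv into out_dir, creating it if needed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # words.txt: one token per line, in document order.
    (out / "words.txt").write_text(
        "".join(f"{token}\n" for token in tokens), encoding="utf-8"
    )

    # top_words.csv: ranked frequency table with a header row.
    with (out / "top_words.csv").open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["rank", "word", "count"])
        ranked = Counter(tokens).most_common(top)
        for rank, (word, count) in enumerate(ranked, start=1):
            writer.writerow([rank, word, count])
```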

Build / Local Development

Requirements

  • Python >=3.11
  • uv installed

Sync deps

make sync

Run locally (repo version)

make run

Or test against the included sample:

make sample

Lint / Format

make lint

Tests

make test

Build artifacts (wheel + sdist)

make build

Preflight checks (build + metadata)

make preflight

Using a local build in another repo (preflight install)

After building in this repo (make build), you can install the wheel or sdist into any other directory using uv only.

From the other repo:

uv init --layout=bare
uv add /ABS/PATH/TO/dist/latex_wc-0.1.0-py3-none-any.whl
uv run -- latex-wc --document-path ./paper.tex

You can also install the sdist:

uv add /ABS/PATH/TO/dist/latex_wc-0.1.0.tar.gz

Project Layout

.
├── current_doc.tex
├── Makefile
├── pyproject.toml
├── src/
│   └── latex_wc/
│       ├── cli.py
│       ├── latex_tokens.py
│       ├── counting.py
│       ├── writers.py
│       ├── io_utils.py
│       └── models.py
├── tests/
└── uv.lock

Notes / Behavior

  • Encoding: reads as UTF-8, falls back to Latin-1 if needed.
  • LaTeX handling is heuristic (by design). It aims to approximate a human word count and intentionally drops some content associated with references/URLs/etc.
  • Tokenization is English-letter oriented (A-Za-z with optional apostrophes).
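The encoding fallback and the English-letter tokenization noted above might look like the following. This is an illustrative sketch rather than the package's code: the names read_text, tokenize, and WORD_RE are hypothetical, and lowercasing tokens before counting is an assumption.

```python
import re
from pathlib import Path

# ASCII letters, optionally joined by internal apostrophes (e.g. "don't").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def read_text(path: str) -> str:
    """Decode a file as UTF-8, falling back to Latin-1 on decode errors."""
    data = Path(path).read_bytes()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return data.decode("latin-1")


def tokenize(text: str, min_len: int = 1) -> list[str]:
    """Extract lowercase word tokens of at least min_len characters."""
    return [w.lower() for w in WORD_RE.findall(text) if len(w) >= min_len]
```

One consequence of an ASCII-only pattern like this is that accented words are split or truncated: "café" would yield the token "caf". Documents in languages other than English will therefore be undercounted or miscounted.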
