llmstxt-gen

AST-aware llms.txt generator for Python, JavaScript/TypeScript, and Go codebases.

What problem this solves

LLM coding agents work best when they have an accurate, up-to-date map of the code they are working on. The llms.txt standard exists to give them exactly that: a single Markdown file at the root of a project that lists the public surface area and points at deeper documentation.

Most existing generators build that file by scraping a project's published docs site. Scrapers go stale the moment your code changes, they bring along marketing prose the agent does not need, and they cannot describe code that has not been documented yet. The result is an llms.txt that confidently lists deprecated APIs.

llmstxt-gen takes a different approach. It reads your Python, JavaScript/TypeScript, or Go source code directly, parses it with tree-sitter into an Abstract Syntax Tree, and extracts the things an agent actually needs: function signatures, type hints, docstrings, class hierarchies, and exported symbols. The result is a token-efficient, always-current Markdown file you can regenerate from a pre-commit hook or a CI job.

No scraping. No cloud calls. No framework lock-in.

Installation

pip install llmstxt-gen

Requires Python 3.11 or newer. The PyPI distribution name and the installed CLI command are both llmstxt-gen; the Python import name is llmstxt_gen, since import names cannot contain hyphens.

Quick start

From the root of any Python, JavaScript/TypeScript, or Go project:

llmstxt-gen generate

You will get two files in the project root:

  • llms.txt: a compact summary suitable for inclusion in an agent's initial context
  • llms-full.txt: the full detailed reference

To preview without writing files:

llmstxt-gen generate --dry-run

To get a quick read on what would be included:

llmstxt-gen stats

Example output

A small Python module like:

"""Tiny calculator module."""

def add(a: int, b: int = 0) -> int:
    """Return the sum of a and b."""
    return a + b

produces this entry in llms-full.txt:

## src/calc.py

Tiny calculator module.

### Functions

#### `add(a: int, b: int = 0) -> int`

Return the sum of a and b.

and a one-line entry in llms.txt:

calc: Tiny calculator module.
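
The two output shapes above can be sketched as a pair of small rendering functions. This is an assumption-laden illustration, not the tool's code: render_full_entry, render_summary_line, and the dictionary keys signature and doc are hypothetical names, and only the layout shown in the example is reproduced.

```python
from pathlib import PurePosixPath


def render_full_entry(path: str, module_doc: str, functions: list[dict[str, str]]) -> str:
    """Render one module section in the llms-full.txt layout shown above."""
    lines = [f"## {path}", "", module_doc, "", "### Functions", ""]
    for fn in functions:
        # Each function gets a level-4 heading with its signature in backticks.
        lines += [f"#### `{fn['signature']}`", "", fn["doc"], ""]
    return "\n".join(lines).rstrip() + "\n"


def render_summary_line(path: str, module_doc: str) -> str:
    """Render the one-line llms.txt entry: module name plus first docstring line."""
    module = PurePosixPath(path).stem
    stripped = module_doc.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return f"{module}: {first_line}"
```

Calling render_summary_line("src/calc.py", "Tiny calculator module.") yields the summary line from the example, which shows why the summary file stays compact: it carries one line per module regardless of how many symbols the module defines.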

Configuration

All options live in your pyproject.toml under [tool.llmstxt_gen]. Every key is optional.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | directory name | Project name shown in the heading |
| description | string | "" | Short tagline shown as a blockquote |
| version | string | "" | Project version |
| include | list of strings | [] (all) | Paths to scan, relative to the repo root |
| exclude | list of strings | [] | Additional patterns to skip, beyond .gitignore |
| extensions | list of strings | [".py", ".js", ".jsx", ".ts", ".tsx", ".go"] | File extensions to consider |
| output_dir | string | "." | Where to write the output files |
| output_summary | string | "llms.txt" | Filename for the summary file |
| output_full | string | "llms-full.txt" | Filename for the full reference |
| include_private | bool | false | Include private or non-exported symbols |
| max_tokens_summary | int | 8000 | Token budget for llms.txt |
| max_tokens_full | int | 32000 | Token budget for llms-full.txt |
| languages | list of strings | ["python", "typescript", "go"] | Parsers to activate |
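
The include, exclude, and extensions options together decide which files are scanned. The sketch below shows one plausible way those three settings could interact. The matching rules are assumptions: this README does not specify the pattern syntax, so the example treats entries as directory prefixes or fnmatch-style globs and leaves .gitignore handling out entirely. The function name is_selected is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

DEFAULT_EXTENSIONS = (".py", ".js", ".jsx", ".ts", ".tsx", ".go")


def _under(path: PurePosixPath, prefix: str) -> bool:
    """True if `path` is `prefix` itself or lives somewhere below it."""
    prefix_path = PurePosixPath(prefix)
    return path == prefix_path or prefix_path in path.parents


def is_selected(rel_path, include=(), exclude=(), extensions=DEFAULT_EXTENSIONS) -> bool:
    """Decide whether a repo-relative path would be scanned (illustrative rules)."""
    path = PurePosixPath(rel_path)
    if path.suffix not in extensions:
        return False
    # An empty include list means "scan everything", matching the table default.
    if include and not any(_under(path, prefix) for prefix in include):
        return False
    # Assumption: exclude entries may be directory prefixes or glob patterns.
    return not any(_under(path, pat) or fnmatch(rel_path, pat) for pat in exclude)
```

With include = ["src/"] and exclude = ["src/internal/"], a file at src/calc.py is selected while src/internal/secret.py is skipped.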

Example:

[tool.llmstxt_gen]
include = ["src/"]
exclude = ["src/internal/"]
include_private = false
max_tokens_summary = 6000

CI integration

Pre-commit hook

repos:
  - repo: local
    hooks:
      - id: llmstxt-gen
        name: llmstxt-gen
        entry: llmstxt-gen generate
        language: system
        pass_filenames: false
        always_run: true

GitHub Actions

name: Update llms.txt
on:
  push:
    branches: [main]

jobs:
  update:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install llmstxt-gen
      - run: llmstxt-gen generate
      - uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: "chore: refresh llms.txt"

More integrations live in docs/ci-integration.md.
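
If you would rather fail a build than auto-commit, a small check script is one option. The sketch below is an assumption about how such a check could look, not something shipped with the project: it runs llmstxt-gen generate, then uses git diff --exit-code to see whether the tracked output files changed. The names outputs_are_stale and main are hypothetical, and the file list assumes the default output names.

```python
import subprocess
import sys

# Assumes the default output_summary and output_full filenames.
OUTPUT_FILES = ("llms.txt", "llms-full.txt")


def outputs_are_stale(run=subprocess.run) -> bool:
    """Regenerate the files, then ask git whether the tracked copies changed."""
    run(["llmstxt-gen", "generate"], check=True)
    # `git diff --exit-code` returns 1 when the working tree differs from the index.
    diff = run(["git", "diff", "--exit-code", "--", *OUTPUT_FILES], check=False)
    return diff.returncode != 0


def main() -> int:
    """Return a process exit code: 1 if the committed files are out of date."""
    if outputs_are_stale():
        print("llms.txt is out of date; run `llmstxt-gen generate` and commit.", file=sys.stderr)
        return 1
    return 0
```

To use it as a script, end the file with sys.exit(main()). Note that git diff only reports tracked files, so this check will not catch output files that have never been committed.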

How it compares to scraper-based approaches

Scraper-based generators for the llmstxt.org format crawl a published documentation site and concatenate the rendered HTML. They work without source access, which is their main advantage. The drawbacks are real:

  • They cannot describe undocumented code, so newer modules are invisible.
  • They drift whenever code lands before the docs site rebuilds.
  • They include navigation chrome, marketing copy, and rendered examples that bloat the agent's context window.
  • They cannot reliably recover type information, since rendered HTML is lossy.

llmstxt-gen reads the source. Its output reflects what is in the repository at the moment you generate it, and each entry maps one-to-one to a symbol an agent will end up calling.

Contributing

See CONTRIBUTING.md. Bug reports and pull requests are welcome.

License

MIT. See LICENSE.

Roadmap (not yet implemented)

  • Rust port for large monorepos
  • Parser support for Ruby and Java
  • Optional semantic pruning via a local model
  • A hosted GitHub App for zero-config setup
