Build a high-quality llms.txt for any website. Model-agnostic, SSRF-safe, no hallucinated URLs.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

mackgrenfell

These details have not been verified by PyPI

Project links

Homepage

Project description

llmstxt-generator

Build a high-quality llms.txt for any website — from one command.

llmstxt-gen stripe.com

It crawls a site the same way an AI agent would — homepage, sitemap, robots.txt, the highest-signal pages — and writes a clean, spec-compliant llms.txt that gives models a faithful map of what the site is and where its important content lives. Every link in the output is one the generator actually saw: no invented URLs.

Model-agnostic by design. Runs against OpenAI, Anthropic, DeepSeek, Together, OpenRouter, Groq, or a local Ollama with a single flag.

Built by Trakkr — the AI visibility platform. This is the open-source engine behind Trakkr's free llms.txt tool.

What is `llms.txt`?

llms.txt is a simple Markdown file at a site's root (example.com/llms.txt) that tells AI models and agents what a site is about and which pages matter, without making them wade through navigation, scripts, and boilerplate. Think of it as robots.txt for meaning instead of access — a curated, machine-readable index of your most important content. The format is defined at llmstxt.org.

It's moving from convention to standard. In May 2026, Google added an llms.txt check to Lighthouse's new Agentic Browsing audit, putting it alongside the performance and accessibility signals teams already track. A good llms.txt is fast becoming table stakes for being well-represented in AI search and assistants.

Quickstart

pip install llmstxt-generator      # or: pipx install llmstxt-generator
# or install the latest straight from source:
#   pip install "git+https://github.com/trakkr-aisearch/llms-txt-generator"
export OPENAI_API_KEY=sk-...        # the only thing the default needs

llmstxt-gen stripe.com             # print to stdout
llmstxt-gen stripe.com -o llms.txt # write to a file
llmstxt-gen stripe.com --verbose   # watch the live discovery trace

As a library:

from llmstxt_generator import generate_llms_txt

result = generate_llms_txt("stripe.com")   # needs OPENAI_API_KEY
print(result.content)

print(result.pages_read, "pages read")
print(result.validation["link_count"], "links")
print(result.validation["dropped_invented_links"], "hallucinated URLs dropped")

Example output

Real output from llmstxt-gen stripe.com (default model, ~$0.001, ~20s, 23 links, 0 hallucinated URLs dropped). Trimmed for length — the full files for Stripe, Vercel, and Anthropic are in examples/.

# Stripe

> Stripe is a financial services platform that provides businesses with tools to
> accept payments, manage financial operations, and implement custom revenue
> models. It serves a diverse range of clients, from startups to large
> enterprises, across various industries.

## Payments Solutions

- [Stripe Payments](https://stripe.com/payments): Accept payments online and in person globally with a payments solution built for any business.
- [Payment methods](https://stripe.com/payments/payment-methods): Explore popular local payment methods to improve conversion rates for businesses.
- [Stripe Payments documentation](https://docs.stripe.com/payments.md): A guide to integrating Stripe's payments APIs.

## Connect Solutions

- [Stripe Connect](https://stripe.com/connect): Embed payments into products with seamless onboarding and global payouts.
- [Marketplace payments](https://stripe.com/connect/marketplaces): Tools for onboarding and paying out freelancers and sellers.

## Enterprise Solutions

- [Enterprise Payment Solutions for Large Businesses](https://stripe.com/enterprise): Tailored financial solutions for large enterprises.
- [Pricing & Fees](https://stripe.com/pricing): Details on Stripe's processing fees and pricing models for businesses.

## Optional

- [Stripe Newsroom](https://stripe.com/newsroom): Latest news and updates about Stripe's partnerships and innovations.
- [Legal](https://stripe.com/legal): Access Stripe's legal documents and policies.

Use any model

The default is OpenAI's gpt-4o-mini (cheap, fast, widely available). Switch providers with a flag or an env var — any OpenAI-compatible Chat Completions endpoint works, plus a native Anthropic adapter.

llmstxt-gen stripe.com --provider deepseek
llmstxt-gen stripe.com --provider anthropic --model claude-haiku-4-5-20251001
llmstxt-gen stripe.com --provider openrouter --model openai/gpt-4o-mini
LLMSTXT_PROVIDER=ollama llmstxt-gen stripe.com   # local, no key

Provider matrix

Provider	`--provider`	API key env	Default model	Notes
OpenAI	`openai` (default)	`OPENAI_API_KEY`	`gpt-4o-mini`	Works out of the box
Anthropic	`anthropic`	`ANTHROPIC_API_KEY`	`claude-haiku-4-5-20251001`	`pip install 'llmstxt-generator[anthropic]'`
DeepSeek	`deepseek`	`DEEPSEEK_API_KEY`	`deepseek-chat`	OpenAI-compatible
Together	`together`	`TOGETHER_API_KEY`	`meta-llama/Llama-3.3-70B-Instruct-Turbo`	OpenAI-compatible
OpenRouter	`openrouter`	`OPENROUTER_API_KEY`	`openai/gpt-4o-mini`	Any model on OpenRouter
Groq	`groq`	`GROQ_API_KEY`	`llama-3.3-70b-versatile`	OpenAI-compatible
Ollama	`ollama`	(none)	`llama3.1`	Local `http://localhost:11434/v1`
Any other	custom	`LLMSTXT_API_KEY`	set `--model`	Point `--base-url` at any OpenAI-compatible API

Override anything via env: LLMSTXT_PROVIDER, LLMSTXT_MODEL, LLMSTXT_BASE_URL, LLMSTXT_API_KEY. Arguments beat env; env beats defaults.

# A custom OpenAI-compatible gateway:
LLMSTXT_BASE_URL=https://my-gateway.internal/v1 \
LLMSTXT_API_KEY=... \
llmstxt-gen stripe.com --provider custom --model my-model

How it works

A fixed four-phase pipeline — no open-ended agent loop, so the cost and runtime are bounded and predictable (roughly a cent or less per site on the default model).

1. Discover  ──  fetch the homepage, robots.txt, sitemap.xml, and any existing
                 llms.txt; optionally ask the model what it knows about the brand
                 cold (to sharpen the summary, never to invent page content).

2. Enrich    ──  score every discovered URL (shallow + high-value slugs win),
                 then fetch the top pages for their real titles and descriptions.

3. Compose   ──  one streamed model call writes the llms.txt live, grounded only
                 in the pages we actually read.

4. Finalize  ──  strip code fences, validate every link against what we saw,
                 de-duplicate, drop emptied sections, and score the structure.

No hallucinated URLs. Phase 4 checks every link against the set of URLs the crawler actually discovered. When discovery is rich, on-site URLs the model assembled from real context are allowed; when discovery is sparse (a bot-walled or JS-only site), it switches to strict mode and keeps only URLs it literally saw — so the model can't fabricate a site map from memory. Duplicate links (the "eleven titles all pointing at the homepage" failure) are collapsed.

Same-site only, redirect-aware. Links are constrained to the apex domain and its subdomains. The effective host is taken from where the homepage actually resolved, so apex→www and rebrand redirects are handled correctly.

SSRF-safe. Every outbound fetch is screened by _safe_url: non-HTTP schemes, localhost, cloud metadata endpoints, and private / loopback / link-local / reserved IP ranges are all refused. Safe to point at user-supplied domains.

CLI reference

llmstxt-gen DOMAIN [options]

  -o, --output FILE       Write the file here instead of stdout.
  -v, --verbose           Print the live discovery/compose trace to stderr.
      --json              Emit the full result (file + stats) as JSON.

  --provider NAME         openai | anthropic | deepseek | together |
                          openrouter | groq | ollama | <custom>
  --model NAME            Override the provider's default model.
  --base-url URL          OpenAI-compatible base URL (for custom endpoints).
  --api-key KEY           API key (prefer env vars for secrets).

  --max-pages N           Max pages to read for real titles/metas (default 12).
  --no-cold-knowledge     Skip the cold-knowledge prior.
  --version

stdout receives only the llms.txt, so it pipes cleanly; the trace and diagnostics go to stderr.

Library API

from llmstxt_generator import (
    generate_llms_txt,         # sync, returns LlmsTxtResult
    generate_llms_txt_async,   # async, returns LlmsTxtResult
    generate_llms_txt_stream,  # async generator of trace events
    resolve_config,            # build a GeneratorConfig from env/args
    GeneratorConfig,
)

# Override provider/model/tuning inline:
result = generate_llms_txt("stripe.com", provider="deepseek", max_enrich_pages=20)

# Or stream the trace yourself:
import asyncio
async def main():
    async for event in generate_llms_txt_stream("stripe.com"):
        print(event["type"])
asyncio.run(main())

LlmsTxtResult carries content, structure, validation, pages_read, pages_discovered, tokens, cost_usd, elapsed_s, and more.

Development

git clone https://github.com/trakkr-aisearch/llms-txt-generator
cd llmstxt-generator
pip install -e ".[dev]"
pytest          # the test suite is fully offline — no network, no API key

See CONTRIBUTING.md.

License

MIT © Trakkr. See LICENSE.

Made by Trakkr — track and improve how your brand shows up in ChatGPT, Perplexity, Gemini, Google AI Overviews, and Claude. If this tool is useful, Trakkr is the platform behind it.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

mackgrenfell

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

0.1.2

May 29, 2026

0.1.1

May 29, 2026

This version

0.1.0

May 29, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmstxt_generator-0.1.0.tar.gz (36.0 kB view details)

Uploaded May 29, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llmstxt_generator-0.1.0-py3-none-any.whl (30.3 kB view details)

Uploaded May 29, 2026 Python 3

File details

Details for the file llmstxt_generator-0.1.0.tar.gz.

File metadata

Download URL: llmstxt_generator-0.1.0.tar.gz
Upload date: May 29, 2026
Size: 36.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llmstxt_generator-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`1144d02813c9dccc73996e79dbf56d177833e8cbcd77b400853cad85e302b845`
MD5	`d1855e686f99d8f20aa65f6ea51a60a9`
BLAKE2b-256	`f4993ad3bbd78e083d4cb185931a0fe033e2d671e24f356fdfeca95a9bbbb35c`

See more details on using hashes here.

Provenance

The following attestation bundles were made for llmstxt_generator-0.1.0.tar.gz:

Publisher: publish.yml on trakkr-aisearch/llms-txt-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmstxt_generator-0.1.0.tar.gz
- Subject digest: 1144d02813c9dccc73996e79dbf56d177833e8cbcd77b400853cad85e302b845
- Sigstore transparency entry: 1672220004
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: trakkr-aisearch/llms-txt-generator@1ef7aeadc3fb689aca3d1aa0c39bf531b033ed46
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/trakkr-aisearch
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1ef7aeadc3fb689aca3d1aa0c39bf531b033ed46
- Trigger Event: release

File details

Details for the file llmstxt_generator-0.1.0-py3-none-any.whl.

File metadata

Download URL: llmstxt_generator-0.1.0-py3-none-any.whl
Upload date: May 29, 2026
Size: 30.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llmstxt_generator-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b188c2b9ad92cf3b1a1ae2351a78a8b10331b1af8fe1f978d04717a9a019e54b`
MD5	`447a6f005552ff272f15a798422919a5`
BLAKE2b-256	`412a74eb2e342e811b6b2b4faa3b3dce282cf1648b0e0929264526965a3c62e3`

See more details on using hashes here.

Provenance

The following attestation bundles were made for llmstxt_generator-0.1.0-py3-none-any.whl:

Publisher: publish.yml on trakkr-aisearch/llms-txt-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmstxt_generator-0.1.0-py3-none-any.whl
- Subject digest: b188c2b9ad92cf3b1a1ae2351a78a8b10331b1af8fe1f978d04717a9a019e54b
- Sigstore transparency entry: 1672220023
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: trakkr-aisearch/llms-txt-generator@1ef7aeadc3fb689aca3d1aa0c39bf531b033ed46
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/trakkr-aisearch
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1ef7aeadc3fb689aca3d1aa0c39bf531b033ed46
- Trigger Event: release

llmstxt-generator 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

llmstxt-generator

What is llms.txt?

Quickstart

Example output

Use any model

Provider matrix

How it works

CLI reference

Library API

Development

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

What is `llms.txt`?