Validate llms.txt files locally, over HTTP, and in CI.
llms-txt-audit
llms-txt-audit is a small Python CLI and GitHub Action for validating /llms.txt files before they ship.
It checks whether your file follows the shape described by the official llms.txt proposal: a Markdown file with a project title, short summary, curated resource sections, and links to the pages an LLM should read for context.
Need to create the file first? Use the AltRepo llms.txt Generator & Validator to draft a spec-shaped file, then use this CLI to protect it in CI.
Why this exists
A useful llms.txt file is curated. It should not be a blind sitemap dump, and it should not contain robots.txt permission rules.
This tool helps teams catch common mistakes:
- missing project title
- missing blockquote summary
- no H2 resource sections
- no Markdown resource links
- robots.txt directives placed in llms.txt
- private-looking URLs such as /admin, /account, /checkout, or /wp-admin
- duplicate links
- links with no explanation
- relative links when absolute URLs are required
Install
For local development from this repository:
python -m pip install -e .
After installation, the command is available as:
llms-txt-audit --help
You can also run it without installing by setting PYTHONPATH:
PYTHONPATH=src python -m llms_txt_audit.cli public/llms.txt
Basic usage
Audit a local file:
llms-txt-audit public/llms.txt
Audit a directory that contains llms.txt:
llms-txt-audit public/
Audit a hosted file:
llms-txt-audit https://example.com/llms.txt
Discover /llms.txt from a site root:
llms-txt-audit https://example.com --discover
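As a rough illustration of what discovery means here, the site root is mapped to the root-level /llms.txt URL. The sketch below assumes that behavior from the description above; `discover_llms_txt_url` is a hypothetical helper written for this example, not a function exported by llms-txt-audit.

```python
from urllib.parse import urlsplit, urlunsplit


def discover_llms_txt_url(site_root: str) -> str:
    """Return the conventional /llms.txt URL for a site root (hypothetical helper)."""
    parts = urlsplit(site_root)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) site URL, got {site_root!r}")
    # Drop any path, query, or fragment and point at the root-level file.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


print(discover_llms_txt_url("https://example.com"))
```

The real CLI may handle redirects or trailing paths differently; treat this only as a picture of the URL mapping.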
Print JSON for scripts or dashboards:
llms-txt-audit public/llms.txt --json
Use strict CI behavior:
llms-txt-audit public/llms.txt --strict --fail-on-warning
Example output
llms.txt audit: public/llms.txt
PASS H1 title found. line 1
PASS Blockquote summary found. line 3
PASS 3 H2 section(s) found.
PASS 4 Markdown resource link(s) found.
PASS Optional section found.
Score: 100/100
Sections: 3 Links: 4 Optional: yes
GitHub Action
Use the action from inside this repository:
name: Validate llms.txt
on:
  pull_request:
  push:
    branches: [main]
jobs:
  llms-txt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./llms-txt-audit
        with:
          target: public/llms.txt
          strict: "true"
          fail-on-warning: "true"
When this project is published as a standalone GitHub Action, replace ./llms-txt-audit with the public action reference.
The action runs directly from src, so it does not need to build or install the package during the workflow.
Action inputs
| Input | Default | Purpose |
|---|---|---|
| target | public/llms.txt | Local path, directory, hosted /llms.txt URL, or site URL when discover is true. |
| discover | false | Treat target as a site root and audit https://site.com/llms.txt. |
| strict | false | Make recommended structure checks stricter. |
| require-optional | false | Warn when ## Optional is missing. |
| fail-on-warning | false | Exit non-zero if warnings are found. |
| no-relative-links | false | Warn when resource URLs are relative. |
Exit codes
| Code | Meaning |
|---|---|
| 0 | Audit completed without errors. |
| 1 | Audit found errors, or warnings when --fail-on-warning is enabled. |
| 2 | The target could not be read. |
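If you call the CLI from a Python script rather than a shell step, the exit codes can be turned into readable messages. This is a minimal sketch: `describe_exit_code` and `run_audit` are hypothetical wrapper names, and `run_audit` assumes the `llms-txt-audit` command is installed and on PATH.

```python
import subprocess

# Meanings copied from the exit-code table in this README.
EXIT_MEANINGS = {
    0: "Audit completed without errors.",
    1: "Audit found errors, or warnings when --fail-on-warning is enabled.",
    2: "The target could not be read.",
}


def describe_exit_code(code: int) -> str:
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}.")


def run_audit(target: str, *extra_args: str) -> int:
    """Run the CLI and return its exit code without raising on failure."""
    result = subprocess.run(["llms-txt-audit", target, *extra_args], check=False)
    return result.returncode
```

A script could then call `code = run_audit("public/llms.txt", "--fail-on-warning")`, print `describe_exit_code(code)`, and exit with the same code so CI still fails when the audit does.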
What this checks
Required structure
A strong llms.txt file should start like this:
# Project Name

> One short summary explaining what this site or project is.

Optional notes that help an LLM understand how to use the linked resources.

## Core

- [Overview](https://example.com/index.html.md): Project overview

## Docs

- [Quick start](https://example.com/docs/start.html.md): First setup path

## Optional

- [Sitemap](https://example.com/sitemap.xml): Complete canonical URL list
The audit checks for:
- one H1 title
- a blockquote summary
- H2 sections
- Markdown list links
- an optional ## Optional section
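The structure checks listed above can be approximated in a few lines of Python. This is a minimal sketch, not the tool's actual implementation: the function name `summarize_structure`, the result keys, and the regular expressions are all assumptions made for illustration.

```python
import re

H1 = re.compile(r"^#\s+\S")          # "# Title", but not "## Section"
H2 = re.compile(r"^##\s+\S")         # "## Section", but not "### Deeper"
BLOCKQUOTE = re.compile(r"^>\s*\S")  # "> summary text"
LIST_LINK = re.compile(r"^\s*[-*]\s+\[[^\]]+\]\([^)\s]+\)")  # "- [name](url)"


def summarize_structure(text: str) -> dict:
    """Count the structural elements an llms.txt file is expected to have."""
    lines = text.splitlines()
    h2_titles = [line[2:].strip() for line in lines if H2.match(line)]
    return {
        "h1_count": sum(1 for line in lines if H1.match(line)),
        "has_summary": any(BLOCKQUOTE.match(line) for line in lines),
        "h2_count": len(h2_titles),
        "link_count": sum(1 for line in lines if LIST_LINK.match(line)),
        "has_optional": any(t.lower() == "optional" for t in h2_titles),
    }


SAMPLE = """# Project Name

> One short summary explaining what this site or project is.

## Core

- [Overview](https://example.com/index.html.md): Project overview

## Optional

- [Sitemap](https://example.com/sitemap.xml): Complete canonical URL list
"""

print(summarize_structure(SAMPLE))
```

The sketch works line by line and ignores edge cases such as headings inside fenced code, which a real validator would likely need to handle.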
Link hygiene
The audit also flags:
- private-looking URLs
- duplicate URLs
- missing link descriptions
- unsupported URL schemes
- relative URLs when --no-relative-links is enabled
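To make the link hygiene rules concrete, here is a hedged sketch of how such checks could be written. It is not the tool's own code: `link_warnings`, the warning strings, and the crude path-prefix test for private-looking URLs are assumptions chosen for this example.

```python
import re
from urllib.parse import urlsplit

# "- [name](url)" followed by an optional ": description".
LINK = re.compile(r"^\s*[-*]\s+\[([^\]]+)\]\(([^)\s]+)\)(.*)$")
PRIVATE_PREFIXES = ("/admin", "/account", "/checkout", "/wp-admin")


def link_warnings(text: str, no_relative_links: bool = False) -> list:
    """Return (line_number, message) pairs for suspicious resource links."""
    warnings = []
    seen = set()
    for number, line in enumerate(text.splitlines(), start=1):
        match = LINK.match(line)
        if not match:
            continue
        _name, url, rest = match.groups()
        parts = urlsplit(url)
        if parts.scheme and parts.scheme not in ("http", "https"):
            warnings.append((number, "unsupported URL scheme"))
        if not parts.scheme and no_relative_links:
            warnings.append((number, "relative URL"))
        # Prefix match only, so "/administrator" would also be flagged.
        if parts.path.startswith(PRIVATE_PREFIXES):
            warnings.append((number, "private-looking URL"))
        if url in seen:
            warnings.append((number, "duplicate URL"))
        seen.add(url)
        if not rest.lstrip(" :").strip():
            warnings.append((number, "missing link description"))
    return warnings


SAMPLE = """## Docs
- [Start](https://example.com/start.md): First setup path
- [Start again](https://example.com/start.md): Same page twice
- [Admin](https://example.com/admin/login): Staff login
- [Guide](/docs/guide.md)
- [Files](ftp://example.com/files): Archive
"""

for number, message in link_warnings(SAMPLE, no_relative_links=True):
    print(f"line {number}: {message}")
```

Duplicates are detected by exact URL string here; a stricter check might normalize trailing slashes or fragments first.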
robots.txt confusion
llms.txt is for context, not crawler permissions. If the file contains lines like this:
User-agent: *
Disallow: /admin
…the audit warns you to move those rules to robots.txt.
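A check like this can be sketched as a simple line scan. The example below is illustrative only: `find_robots_directives` is a hypothetical name, and the set of directive keywords it looks for is an assumption, not the list the tool uses.

```python
import re

# Common robots.txt directive names (assumed list for this sketch).
ROBOTS_DIRECTIVE = re.compile(
    r"^\s*(user-agent|disallow|allow|crawl-delay)\s*:", re.IGNORECASE
)


def find_robots_directives(text: str) -> list:
    """Return (line_number, line) pairs that look like robots.txt rules."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if ROBOTS_DIRECTIVE.match(line)
    ]


SAMPLE = """# Site
User-agent: *
Disallow: /admin
- [Docs](https://example.com/docs.md): Documentation
"""

for number, line in find_robots_directives(SAMPLE):
    print(f"line {number}: move to robots.txt -> {line}")
```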
What this does not do
This tool does not crawl your whole website or auto-generate a final curated file from every URL. That is intentional. A good llms.txt file should be selected by humans or documentation owners.
Run tests
The test suite uses Python's standard library only:
PYTHONPATH=src python -m unittest discover -s tests -v
On Windows PowerShell:
$env:PYTHONPATH='src'
python -m unittest discover -s tests -v