Validate llms.txt files locally, over HTTP, and in CI.

llms-txt-audit

llms-txt-audit is a small Python CLI and GitHub Action for validating /llms.txt files before they ship.

It checks whether your file follows the shape described by the official llms.txt proposal: a Markdown file with a project title, short summary, curated resource sections, and links to the pages an LLM should read for context.

Need to create the file first? Use the AltRepo llms.txt Generator & Validator to draft a spec-shaped file, then use this CLI to protect it in CI.

Why this exists

A useful llms.txt file is curated. It should not be a blind sitemap dump, and it should not contain robots.txt permission rules.

This tool helps teams catch common mistakes:

  • missing project title
  • missing blockquote summary
  • no H2 resource sections
  • no Markdown resource links
  • robots.txt directives placed in llms.txt
  • private-looking URLs such as /admin, /account, /checkout, or /wp-admin
  • duplicate links
  • links with no explanation
  • relative links when absolute URLs are required

Install

For local development from this repository:

python -m pip install -e .

After installation, the command is available as:

llms-txt-audit --help

You can also run it from the repository root without installing, by pointing PYTHONPATH at src:

PYTHONPATH=src python -m llms_txt_audit.cli public/llms.txt

Basic usage

Audit a local file:

llms-txt-audit public/llms.txt

Audit a directory that contains llms.txt:

llms-txt-audit public/

Audit a hosted file:

llms-txt-audit https://example.com/llms.txt

Discover /llms.txt from a site root:

llms-txt-audit https://example.com --discover
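
The --discover behavior can be pictured as plain URL joining. The following is a minimal sketch under that assumption, not the tool's actual code; the helper name discover_url is hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def discover_url(site_root: str) -> str:
    """Return the /llms.txt URL for a site root (hypothetical helper)."""
    parts = urlsplit(site_root)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) site URL: {site_root!r}")
    # The leading slash makes urljoin replace any existing path, so
    # https://example.com/docs/ still resolves to the root-level file.
    return urljoin(site_root, "/llms.txt")


print(discover_url("https://example.com"))
```

Resolving against the root rather than the current path matches the description of /llms.txt as a file served from the site root.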

Print JSON for scripts or dashboards:

llms-txt-audit public/llms.txt --json

Use strict CI behavior:

llms-txt-audit public/llms.txt --strict --fail-on-warning

Example output

llms.txt audit: public/llms.txt

PASS  H1 title found. line 1
PASS  Blockquote summary found. line 3
PASS  3 H2 section(s) found.
PASS  4 Markdown resource link(s) found.
PASS  Optional section found.

Score: 100/100
Sections: 3  Links: 4  Optional: yes

GitHub Action

Use the action from inside this repository:

name: Validate llms.txt

on:
  pull_request:
  push:
    branches: [main]

jobs:
  llms-txt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./llms-txt-audit
        with:
          target: public/llms.txt
          strict: "true"
          fail-on-warning: "true"

When this project is published as a standalone GitHub Action, replace ./llms-txt-audit with the public action reference.

The action runs directly from src, so it does not need to build or install the package during the workflow.

Action inputs

Input              Default          Purpose
target             public/llms.txt  Local path, directory, hosted /llms.txt URL, or site URL when discover is true.
discover           false            Treat target as a site root and audit https://site.com/llms.txt.
strict             false            Make recommended structure checks stricter.
require-optional   false            Warn when ## Optional is missing.
fail-on-warning    false            Exit non-zero if warnings are found.
no-relative-links  false            Warn when resource URLs are relative.

Exit codes

Code  Meaning
0     Audit completed without errors.
1     Audit found errors, or warnings when --fail-on-warning is enabled.
2     The target could not be read.
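
A script that wraps the CLI can branch on these codes. The sketch below assumes the llms-txt-audit command is on PATH and uses only the flags shown earlier; the helper names run_audit and describe_exit are hypothetical.

```python
import subprocess

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "audit completed without errors",
    1: "audit found errors (or warnings with --fail-on-warning)",
    2: "target could not be read",
}


def describe_exit(code: int) -> str:
    """Map a return code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_audit(target: str) -> int:
    """Run the CLI with the strict CI flags and return its exit code.

    Assumes llms-txt-audit is installed and available on PATH.
    """
    result = subprocess.run(
        ["llms-txt-audit", target, "--strict", "--fail-on-warning"],
        check=False,  # inspect the code ourselves instead of raising
    )
    return result.returncode


# Show the mapping without invoking the CLI.
for code in (0, 1, 2):
    print(code, describe_exit(code))
```

Calling describe_exit(run_audit("public/llms.txt")) then gives a readable status for logs or dashboards.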

What this checks

Required structure

A well-formed llms.txt file looks like this:

# Project Name

> One short summary explaining what this site or project is.

Optional notes that help an LLM understand how to use the linked resources.

## Core
- [Overview](https://example.com/index.html.md): Project overview

## Docs
- [Quick start](https://example.com/docs/start.html.md): First setup path

## Optional
- [Sitemap](https://example.com/sitemap.xml): Complete canonical URL list

The audit checks for:

  • one H1 title
  • a blockquote summary
  • H2 sections
  • Markdown list links
  • an optional ## Optional section
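
To make the structural checks concrete, here is a rough sketch of how they could be expressed with line-based regular expressions. It is an illustration only; the function name check_structure and the returned keys are assumptions, and the real tool may parse the file differently.

```python
import re

H1 = re.compile(r"^# +\S")              # "# Project Name"
H2 = re.compile(r"^## +(.+?)\s*$")      # "## Core"
QUOTE = re.compile(r"^> *\S")           # "> One short summary"
LINK = re.compile(r"^\s*[-*] +\[[^\]]+\]\([^)\s]+\)")  # "- [Title](url)"


def check_structure(text: str) -> dict:
    """Collect the structural facts listed above (hypothetical helper)."""
    lines = text.splitlines()
    sections = [m.group(1) for line in lines if (m := H2.match(line))]
    return {
        "h1_count": sum(1 for line in lines if H1.match(line)),
        "has_summary": any(QUOTE.match(line) for line in lines),
        "sections": sections,
        "link_count": sum(1 for line in lines if LINK.match(line)),
        "has_optional": "Optional" in sections,
    }


sample = """# Project Name

> One short summary explaining what this site or project is.

## Core
- [Overview](https://example.com/index.html.md): Project overview

## Optional
- [Sitemap](https://example.com/sitemap.xml): Complete canonical URL list
"""
print(check_structure(sample))
```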

Link hygiene

The audit also flags:

  • private-looking URLs
  • duplicate URLs
  • missing link descriptions
  • unsupported URL schemes
  • relative URLs when --no-relative-links is enabled
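
The link checks can be sketched in a similar way. This example assumes one Markdown link per line and treats the path prefixes named earlier (/admin, /account, /checkout, /wp-admin) as the "private-looking" heuristic; check_links and its message wording are hypothetical, not the tool's output.

```python
import re
from urllib.parse import urlsplit

# [title](url) optionally followed by ": description"
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?")
PRIVATE_PREFIXES = ("/admin", "/account", "/checkout", "/wp-admin")


def check_links(text: str, no_relative_links: bool = False) -> list[str]:
    """Return warning strings for the link problems listed above."""
    warnings = []
    seen = set()
    for number, line in enumerate(text.splitlines(), start=1):
        match = LINK.search(line)
        if not match:
            continue
        _title, url, description = match.groups()
        parts = urlsplit(url)
        if url in seen:
            warnings.append(f"line {number}: duplicate URL {url}")
        seen.add(url)
        if parts.scheme and parts.scheme not in ("http", "https"):
            warnings.append(f"line {number}: unsupported scheme {parts.scheme}")
        if not parts.scheme and no_relative_links:
            warnings.append(f"line {number}: relative URL {url}")
        # Prefix match is a heuristic, so /administrator is flagged too.
        if parts.path.startswith(PRIVATE_PREFIXES):
            warnings.append(f"line {number}: private-looking URL {url}")
        if not (description or "").strip():
            warnings.append(f"line {number}: missing description")
    return warnings


sample_links = """- [Overview](https://example.com/index.md): Project overview
- [Overview again](https://example.com/index.md): Same page twice
- [Admin](https://example.com/admin/settings): Admin panel
- [Start](/docs/start)
"""
for warning in check_links(sample_links, no_relative_links=True):
    print(warning)
```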

robots.txt confusion

llms.txt is for context, not crawler permissions. If the file contains lines like this:

User-agent: *
Disallow: /admin

…the audit warns you to move those rules to robots.txt.
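
Detecting such lines does not require a full robots.txt parser. A minimal sketch, assuming a small set of directive names is enough to catch the common case (the function name and the directive list are illustrative, not taken from the tool):

```python
import re

# Common robots.txt directives; the exact list is an assumption.
ROBOTS_DIRECTIVE = re.compile(
    r"^\s*(user-agent|disallow|allow|crawl-delay)\s*:", re.IGNORECASE
)


def find_robots_directives(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that look like robots.txt rules."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if ROBOTS_DIRECTIVE.match(line)
    ]


sample_robots = "# Project Name\n\nUser-agent: *\nDisallow: /admin\n"
print(find_robots_directives(sample_robots))
```

Because the pattern is anchored at the start of the line, Markdown list links such as "- [Allow list](...)" are not matched.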

What this does not do

This tool does not crawl your whole website, and it does not auto-generate a curated file from every URL it finds. That is intentional: a good llms.txt file is hand-picked by people, typically the documentation owners.

Run tests

The test suite uses Python's standard library only:

PYTHONPATH=src python -m unittest discover -s tests -v

On Windows PowerShell:

$env:PYTHONPATH='src'
python -m unittest discover -s tests -v

Download files

Source Distribution

llms_txt_audit-0.1.0.tar.gz (10.5 kB)

Built Distribution

llms_txt_audit-0.1.0-py3-none-any.whl (11.3 kB)

File details

Details for the file llms_txt_audit-0.1.0.tar.gz.

File metadata

  • Download URL: llms_txt_audit-0.1.0.tar.gz
  • Upload date:
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llms_txt_audit-0.1.0.tar.gz

Algorithm    Hash digest
SHA256       e2a1d9a04e2d1dfc372d5961372bda9bd4beff98a2d78d0766256b7148249d55
MD5          9c79c88433b415f065aa2eca3e3627b9
BLAKE2b-256  81b328f0c1cf6b1e069acad68d89b2bb537e03b35c8aa50e4ad21414cc11827b

File details

Details for the file llms_txt_audit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: llms_txt_audit-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 11.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llms_txt_audit-0.1.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       c3b5c98a00e2251e8754c56b3662081f24a6287b7d0ad3c71ac253a39ae16400
MD5          f9c58751d68d31dfc8aae20d14a13b50
BLAKE2b-256  ac1bc4d4af3594e696eb4f808f7e110bc4f86b3967a88ed756e11553ab0c32c8