LLM-ready index generator for websites — spec, validator, and CLI tools

llmindex

License: AGPL-3.0 | Spec: CC BY 4.0 | Python 3.11+

A machine-readable index standard for LLM and AI search discovery.

llmindex defines a lightweight JSON manifest at /.well-known/llmindex.json that tells LLMs, AI search engines, and crawlers where to find structured information about your website — products, policies, FAQs, and more.

Think of it as robots.txt for the AI era: instead of telling crawlers what not to access, llmindex tells AI agents what is available and where to find it.

Why llmindex?

AI search engines (ChatGPT, Perplexity, Gemini, etc.) are increasingly answering user questions by reading website content directly. But today, there's no standard way for websites to tell AI agents:

  • What information is available (products, policies, FAQs)
  • Where to find it (structured URLs, not scattered HTML)
  • In what format (Markdown for reading, JSONL for data)

llmindex solves this. One JSON file at a well-known path gives AI agents everything they need to understand and represent your business accurately.

| Without llmindex | With llmindex |
| --- | --- |
| AI agents guess what pages matter | AI agents know exactly where to look |
| Product info scattered across HTML | Clean Markdown pages + structured JSONL feed |
| No way to verify domain ownership | Built-in DNS/HTTP verification |
| No version control for AI-facing data | Timestamped, versioned manifest |

How It Works

1. LLM agent visits your site
2. Fetches /.well-known/llmindex.json    ← discovers the manifest
3. Reads endpoints → /llm/products       ← gets product listing
4. Reads feed → /llm/feed/products.jsonl ← gets structured data
5. Answers user's question accurately    ← better AI responses

Your website structure:

/.well-known/llmindex.json   ← Entry point (manifest)
/llm/products                ← Product listing (Markdown)
/llm/policies                ← Shipping, returns, warranty
/llm/faq                     ← Frequently asked questions
/llm/about                   ← Brand & contact info
/llm/feed/products.jsonl     ← Machine-readable product feed
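
The discovery steps above can be sketched from the agent's side. This is a minimal illustration, not part of the llmindex tooling: the helper names `manifest_url` and `resources_to_fetch` are hypothetical, and the manifest is passed in as an already-parsed dict so the sketch stays independent of any HTTP client.

```python
from urllib.parse import urljoin

# Well-known path taken from the layout above.
WELL_KNOWN_PATH = "/.well-known/llmindex.json"


def manifest_url(site_url: str) -> str:
    """Return the manifest URL for the origin of any URL on the site."""
    # An absolute path replaces whatever path/query the site URL carries.
    return urljoin(site_url, WELL_KNOWN_PATH)


def resources_to_fetch(manifest: dict) -> list[tuple[str, str]]:
    """List (name, url) pairs: endpoints first, then feeds."""
    pairs = []
    for section in ("endpoints", "feeds"):
        # "feeds" is optional, so fall back to an empty mapping.
        for name, url in manifest.get(section, {}).items():
            pairs.append((name, url))
    return pairs


# Hypothetical, trimmed-down manifest used only for illustration.
manifest = {
    "endpoints": {"products": "https://acme-outdoor.com/llm/products"},
    "feeds": {"products_jsonl": "https://acme-outdoor.com/llm/feed/products.jsonl"},
}

print(manifest_url("https://acme-outdoor.com"))
for name, url in resources_to_fetch(manifest):
    print(name, url)
```

An agent would fetch the first URL, parse the response as JSON, and then walk the returned pairs in order.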

Example Manifest

{
  "version": "0.1",
  "updated_at": "2026-02-17T10:00:00Z",
  "entity": {
    "name": "ACME Outdoor Gear",
    "canonical_url": "https://acme-outdoor.com"
  },
  "language": "en",
  "topics": ["outdoor-gear", "camping", "hiking"],
  "endpoints": {
    "products": "https://acme-outdoor.com/llm/products",
    "policies": "https://acme-outdoor.com/llm/policies",
    "faq": "https://acme-outdoor.com/llm/faq",
    "about": "https://acme-outdoor.com/llm/about"
  },
  "feeds": {
    "products_jsonl": "https://acme-outdoor.com/llm/feed/products.jsonl"
  }
}

Quick Start

For Website Owners (Adopt the Standard)

You don't need the CLI tool. Just create a JSON file at /.well-known/llmindex.json on your server:

{
  "version": "0.1",
  "updated_at": "2026-02-20T00:00:00Z",
  "entity": {
    "name": "Your Brand Name",
    "canonical_url": "https://your-site.com"
  },
  "language": "en",
  "topics": ["your-industry"],
  "endpoints": {
    "products": "https://your-site.com/llm/products",
    "policies": "https://your-site.com/llm/policies",
    "faq": "https://your-site.com/llm/faq",
    "about": "https://your-site.com/llm/about"
  }
}

Then create the /llm/* pages as simple Markdown or HTML files. That's it.
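
If you would rather generate the file than write it by hand, a few lines of Python are enough. The sketch below is an illustration only: `build_manifest` is a hypothetical helper, not part of the llmindex CLI, and it assumes the four endpoint pages live under `/llm/` as in the layout shown earlier.

```python
import json
from datetime import datetime, timezone


def build_manifest(name: str, site_url: str, topics: list[str],
                   language: str = "en") -> dict:
    """Build a minimal manifest dict with the required v0.1 fields."""
    base = site_url.rstrip("/")
    # UTC timestamp without microseconds, e.g. 2026-02-20T00:00:00Z
    now = datetime.now(timezone.utc).replace(microsecond=0)
    return {
        "version": "0.1",
        "updated_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "entity": {"name": name, "canonical_url": base},
        "language": language,
        "topics": topics,
        "endpoints": {
            page: f"{base}/llm/{page}"
            for page in ("products", "policies", "faq", "about")
        },
    }


manifest = build_manifest("Your Brand Name", "https://your-site.com", ["your-industry"])
print(json.dumps(manifest, indent=2))
```

Write the printed JSON to `.well-known/llmindex.json` in your web root, and regenerate it whenever the content changes so `updated_at` stays current.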

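Validate your manifest against the JSON Schema (this needs the jsonschema package installed and the schema file from the repository):

python -c "
import json, jsonschema

# Load the v0.1 schema and the manifest to check
with open('spec/schemas/llmindex-0.1.schema.json') as f:
    schema = json.load(f)
with open('your-manifest.json') as f:
    data = json.load(f)

# Raises jsonschema.ValidationError if the manifest does not conform
jsonschema.validate(data, schema)
print('Valid!')
"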

For Developers (Use the CLI Generator)

The CLI tool auto-generates all llmindex files from a product CSV:

# Install
pip install -e .

# Generate from CSV
llmindex generate \
  --site "ACME Outdoor" \
  --url https://acme-outdoor.com \
  --input-csv products.csv \
  --topic outdoor-gear \
  --topic camping \
  --output-dir dist

This generates:

dist/
├── .well-known/llmindex.json      # The manifest
├── llm/
│   ├── products.md                # Product listing grouped by category
│   ├── policies.md                # Policies page (template)
│   ├── faq.md                     # FAQ page (template)
│   ├── about.md                   # About page (template)
│   └── feed/
│       └── products.jsonl         # Machine-readable feed (1 JSON per line)
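
Because the feed holds one JSON object per line, a consumer can read it incrementally instead of loading the whole file. The following is a minimal sketch of such a reader; `parse_jsonl` is a hypothetical helper, and the `id` and `title` keys in the sample are assumed to mirror the product fields described under Input Formats.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank line, reporting the line number on errors."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record


# Illustrative sample lines; a real feed would be read from products.jsonl.
sample_lines = [
    '{"id": "P001", "title": "Trail Tent"}',
    "",
    '{"id": "P002", "title": "Camp Stove"}',
]
for product in parse_jsonl(sample_lines):
    print(product["id"], product["title"])
```

Passing an open file object works the same way, since iterating over a text file yields its lines.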

Run the Demo

# Run the interactive demo
python examples/demo.py

# Run the test suite
pip install -e ".[dev]"
pytest

CLI Reference

llmindex generate

llmindex generate [OPTIONS]

Input (one required):
  -i, --input-csv         PATH   Products CSV file
      --input-json        PATH   Products JSON file (array of objects)
      --input-shopify-csv PATH   Shopify product export CSV

Options:
  -s, --site        TEXT   Entity/brand name (required)
  -u, --url         TEXT   Canonical HTTPS URL (required)
  -o, --output-dir  PATH   Output directory (default: dist)
  -l, --language    TEXT   Primary language, BCP-47 (default: en)
  -t, --topic       TEXT   Category topics (repeatable)
      --base-url    TEXT   Base URL for endpoints (defaults to --url)
      --currency    TEXT   Default currency for Shopify imports (default: USD)

llmindex validate

llmindex validate MANIFEST_PATH [OPTIONS]

Arguments:
  MANIFEST_PATH     PATH   Path to llmindex.json file (required)

Options:
  -f, --feed        PATH   Path to products.jsonl (auto-detected if omitted)

Example:

llmindex validate dist/.well-known/llmindex.json
llmindex validate dist/.well-known/llmindex.json --feed dist/llm/feed/products.jsonl

Input Formats

CSV (default)

Standard CSV with columns matching the product schema.

| Column | Required | Description |
| --- | --- | --- |
| id | Yes | Unique product ID |
| title | Yes | Product name |
| url | Yes | Product page URL |
| image_url | No | Product image URL |
| price | Yes | Numeric price |
| currency | Yes | ISO 4217 code (USD, EUR, etc.) |
| availability | Yes | in_stock, out_of_stock, or preorder |
| brand | No | Brand name |
| category | No | Product category |
| updated_at | No | ISO 8601 datetime |

See cli/sample_data/sample.csv for a working example with 20 products.
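
Before running the generator it can help to check a CSV against the column rules above. The sketch below is an illustration of those rules only; `row_problems` is a hypothetical helper and does not reproduce the CLI's own validation, and the sample rows are made up.

```python
import csv
import io

# Required columns and allowed availability values, taken from the table above.
REQUIRED_COLUMNS = ("id", "title", "url", "price", "currency", "availability")
AVAILABILITY_VALUES = {"in_stock", "out_of_stock", "preorder"}


def row_problems(row: dict) -> list[str]:
    """Return human-readable problems for one CSV row (empty list if none)."""
    problems = [
        f"missing {col}"
        for col in REQUIRED_COLUMNS
        if not (row.get(col) or "").strip()
    ]
    price = (row.get("price") or "").strip()
    if price:
        try:
            float(price)
        except ValueError:
            problems.append(f"price is not numeric: {price!r}")
    availability = (row.get("availability") or "").strip()
    if availability and availability not in AVAILABILITY_VALUES:
        problems.append(f"unknown availability: {availability!r}")
    return problems


# Made-up sample: the second row has a bad price and availability value.
sample_csv = """id,title,url,price,currency,availability
P001,Trail Tent,https://acme-outdoor.com/p/trail-tent,199.00,USD,in_stock
P002,Camp Stove,https://acme-outdoor.com/p/camp-stove,abc,USD,sold_out
"""

for row in csv.DictReader(io.StringIO(sample_csv)):
    print(row["id"], row_problems(row))
```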

JSON

A JSON array of product objects with the same fields:

llmindex generate --site "TechCo" --url https://techco.com --input-json products.json

See cli/sample_data/sample.json for an example.

Shopify CSV Export

Import directly from Shopify's product CSV export format. Handles are deduplicated (one product per handle), and product URLs are auto-constructed from your store URL:

llmindex generate \
  --site "My Store" \
  --url https://mystore.com \
  --input-shopify-csv shopify_products.csv \
  --currency USD

See cli/sample_data/sample_shopify.csv for an example.
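
To make the deduplication concrete: a Shopify export typically repeats the same handle on several rows, one per variant. The sketch below keeps the first row per handle and builds a product URL from the store URL. It is an assumption-laden illustration, not the importer's actual code: `dedupe_by_handle` is hypothetical, the `Handle` and `Title` column names follow Shopify's export format as commonly documented, and the `/products/<handle>` path is Shopify's usual storefront convention, which may differ from what the CLI produces.

```python
import csv
import io


def dedupe_by_handle(rows, store_url: str) -> list[dict]:
    """Keep the first row per handle and attach a constructed product URL."""
    base = store_url.rstrip("/")
    products: dict[str, dict] = {}
    for row in rows:
        handle = (row.get("Handle") or "").strip()
        if not handle or handle in products:
            continue  # skip blank handles and repeated variant rows
        products[handle] = {
            "id": handle,
            "title": row.get("Title", ""),
            # Assumed URL pattern: <store>/products/<handle>
            "url": f"{base}/products/{handle}",
        }
    return list(products.values())


# Made-up export: the second row is a variant of the first product.
sample_export = """Handle,Title,Variant Price
trail-tent,Trail Tent,199.00
trail-tent,,219.00
camp-stove,Camp Stove,49.00
"""

rows = csv.DictReader(io.StringIO(sample_export))
for product in dedupe_by_handle(rows, "https://mystore.com"):
    print(product["id"], product["url"])
```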

Industry Examples

Each example includes a complete llmindex.json manifest and /llm content pages.

E-commerce — Outdoor Gear Store

Full example with product feed and domain verification.

spec/examples/ecommerce/
├── llmindex.json              # Manifest with feeds + verify
├── llm/
│   ├── products.md            # 10 products, grouped by category
│   ├── policies.md            # Shipping, returns, warranty
│   ├── faq.md                 # Customer FAQ
│   └── about.md               # Brand story
└── feed/
    └── products.jsonl         # 10-line JSONL product feed

Manifest:

{
  "version": "0.1",
  "updated_at": "2026-02-17T10:00:00Z",
  "entity": {
    "name": "ACME Outdoor Gear",
    "canonical_url": "https://acme-outdoor.com"
  },
  "language": "en",
  "topics": ["outdoor-gear", "camping", "hiking"],
  "endpoints": {
    "products": "https://acme-outdoor.com/llm/products",
    "policies": "https://acme-outdoor.com/llm/policies",
    "faq": "https://acme-outdoor.com/llm/faq",
    "about": "https://acme-outdoor.com/llm/about"
  },
  "feeds": {
    "products_jsonl": "https://acme-outdoor.com/llm/feed/products.jsonl"
  },
  "verify": {
    "method": "dns_txt",
    "value": "llmindex-verify-a1b2c3d4e5"
  }
}

Local Business — Bakery

Minimal manifest with required fields only. No product feed needed.

spec/examples/local-business/
├── llmindex.json              # Minimal required fields
└── llm/
    ├── products.md            # Menu items
    ├── policies.md            # Store policies
    ├── faq.md                 # Common questions
    └── about.md               # About the bakery

Manifest:

{
  "version": "0.1",
  "updated_at": "2026-02-15T09:00:00Z",
  "entity": {
    "name": "Sunrise Bakery",
    "canonical_url": "https://sunrise-bakery.com"
  },
  "language": "en",
  "topics": ["bakery", "pastry", "local-business"],
  "endpoints": {
    "products": "https://sunrise-bakery.com/llm/products",
    "policies": "https://sunrise-bakery.com/llm/policies",
    "faq": "https://sunrise-bakery.com/llm/faq",
    "about": "https://sunrise-bakery.com/llm/about"
  }
}

SaaS — Productivity Tool

Software product with license field, no product feed.

spec/examples/saas/
├── llmindex.json              # Manifest with license field
└── llm/
    ├── products.md            # Pricing plans
    ├── policies.md            # Terms, privacy, SLA
    ├── faq.md                 # Product FAQ
    └── about.md               # Company info

Manifest:

{
  "version": "0.1",
  "updated_at": "2026-02-10T14:30:00Z",
  "entity": {
    "name": "TaskFlow",
    "canonical_url": "https://taskflow.io"
  },
  "language": "en",
  "topics": ["saas", "productivity", "project-management"],
  "endpoints": {
    "products": "https://taskflow.io/llm/products",
    "policies": "https://taskflow.io/llm/policies",
    "faq": "https://taskflow.io/llm/faq",
    "about": "https://taskflow.io/llm/about"
  },
  "license": "CC-BY-4.0"
}

Specification

The full llmindex v0.1 specification: spec/spec.md

Required Manifest Fields

| Field | Type | Description |
| --- | --- | --- |
| version | string | "0.1" |
| updated_at | string | ISO 8601 datetime |
| entity.name | string | Brand/company name |
| entity.canonical_url | string | Homepage (HTTPS) |
| language | string | BCP-47 language code |
| topics | array | Category tags (1+ required) |
| endpoints | object | URLs: products, policies, faq, about |
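
For a quick check without any dependencies, the required fields can be tested with plain Python. This is a rough sketch and no substitute for the JSON Schema, which remains authoritative: `missing_required` is a hypothetical helper, and it assumes all four endpoint keys are mandatory, as the table suggests.

```python
# Field names taken from the required-fields table above.
TOP_LEVEL = ("version", "updated_at", "entity", "language", "topics", "endpoints")
ENTITY_KEYS = ("name", "canonical_url")
ENDPOINT_KEYS = ("products", "policies", "faq", "about")  # assumed all required


def missing_required(manifest: dict) -> list[str]:
    """Return the names of required fields that are absent or empty."""
    problems = [name for name in TOP_LEVEL if name not in manifest]
    entity = manifest.get("entity")
    if isinstance(entity, dict):
        problems += [f"entity.{key}" for key in ENTITY_KEYS if key not in entity]
    endpoints = manifest.get("endpoints")
    if isinstance(endpoints, dict):
        problems += [f"endpoints.{key}" for key in ENDPOINT_KEYS if key not in endpoints]
    if "topics" in manifest and not manifest["topics"]:
        problems.append("topics (needs at least one entry)")
    return problems


# Trimmed copy of the bakery example with one endpoint left out on purpose.
manifest = {
    "version": "0.1",
    "updated_at": "2026-02-15T09:00:00Z",
    "entity": {"name": "Sunrise Bakery", "canonical_url": "https://sunrise-bakery.com"},
    "language": "en",
    "topics": ["bakery"],
    "endpoints": {
        "products": "https://sunrise-bakery.com/llm/products",
        "policies": "https://sunrise-bakery.com/llm/policies",
        "faq": "https://sunrise-bakery.com/llm/faq",
    },
}
print(missing_required(manifest))
```

The check only looks at presence; it does not verify that URLs use HTTPS or that `updated_at` is a valid ISO 8601 datetime.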

Optional Fields

| Field | Description |
| --- | --- |
| feeds | Machine-readable data feeds (products_jsonl, offers_json) |
| verify | Domain ownership proof (dns_txt or http_file) |
| sig | Cryptographic signature (JWS with EdDSA) |
| license | SPDX license identifier or URL |

JSON Schema

Machine-readable schema for validation: spec/schemas/llmindex-0.1.schema.json

Comparison with llms.txt

| | llmindex | llms.txt |
| --- | --- | --- |
| Format | JSON (structured, machine-parseable) | Plain text |
| Location | /.well-known/llmindex.json | /llms.txt |
| Schema validation | JSON Schema provided | No formal schema |
| Structured data feeds | JSONL product feeds | Not specified |
| Domain verification | DNS TXT / HTTP file | Not specified |
| Versioning | Semantic versioning built-in | Not specified |
| Focus | Structured discovery + data | Guidance for LLMs |

llmindex and llms.txt serve complementary purposes. llmindex focuses on structured, machine-parseable discovery; llms.txt focuses on human-readable guidance. You can use both.

Packages

| Package | Registry | Description |
| --- | --- | --- |
| llmindex | PyPI | CLI generator + validator (Python) |
| @llmindex/schema | npm | JSON Schema + TypeScript types |

Integrations

  • Next.js — Static files, API routes, or middleware rewrite
  • WordPress — Static file, rewrite rules, or WooCommerce product feed

Project Structure

openllmindex/
├── spec/                        # The llmindex specification
│   ├── spec.md                  # v0.1 specification document
│   ├── schemas/                 # JSON Schema for validation
│   │   └── llmindex-0.1.schema.json
│   ├── examples/                # Industry examples
│   │   ├── ecommerce/           #   Full store with feed + verify
│   │   ├── local-business/      #   Minimal bakery
│   │   └── saas/                #   SaaS with license
│   └── test-vectors/            # Invalid manifests for testing
├── cli/                         # Generator CLI tool
│   ├── llmindex_cli/            # CLI application (Typer)
│   │   ├── main.py              # Entry point
│   │   ├── models.py            # Pydantic data models
│   │   ├── validators.py        # Schema + feed validation
│   │   └── generators/          # Output generators
│   ├── importers/               # Data importers (CSV, JSON, Shopify)
│   ├── sample_data/             # Sample data for testing
│   └── tests/                   # Test suite (70+ tests)
├── packages/                    # Published packages
│   └── schema/                  # @llmindex/schema (npm)
├── integrations/                # Platform integrations
│   ├── nextjs/                  #   Next.js middleware + examples
│   └── wordpress/               #   WordPress + WooCommerce
├── docs/                        # GitHub Pages documentation site
├── examples/                    # Usage examples
├── pyproject.toml               # Python package config
├── LICENSE                      # AGPL-3.0
└── README.md

License

The code is licensed under AGPL-3.0; the specification is licensed under CC BY 4.0.

Contributing

Contributions welcome. Please open an issue first to discuss what you'd like to change.

Areas for Contribution

  • New industry examples — Healthcare, education, real estate, etc.
  • New importers — Shopify, WooCommerce, JSON, XML
  • Validators — Standalone validation tools for manifest + feeds
  • Integrations — WordPress plugin, Next.js middleware, etc.
  • Translations — Spec and examples in other languages
