LLM-ready index generator for websites — spec, validator, and CLI tools
llmindex
A machine-readable index standard for LLM and AI search discovery.
llmindex defines a lightweight JSON manifest at /.well-known/llmindex.json that tells LLMs, AI search engines, and crawlers where to find structured information about your website — products, policies, FAQs, and more.
Think of it as robots.txt for the AI era: instead of telling crawlers what not to access, llmindex tells AI agents what is available and where to find it.
Why llmindex?
AI search engines (ChatGPT, Perplexity, Gemini, etc.) are increasingly answering user questions by reading website content directly. But today, there's no standard way for websites to tell AI agents:
- What information is available (products, policies, FAQs)
- Where to find it (structured URLs, not scattered HTML)
- In what format (Markdown for reading, JSONL for data)
llmindex solves this. One JSON file at a well-known path gives AI agents everything they need to understand and represent your business accurately.
| Without llmindex | With llmindex |
|---|---|
| AI agents guess what pages matter | AI agents know exactly where to look |
| Product info scattered across HTML | Clean Markdown pages + structured JSONL feed |
| No way to verify domain ownership | Built-in DNS/HTTP verification |
| No version control for AI-facing data | Timestamped, versioned manifest |
How It Works
1. LLM agent visits your site
2. Fetches /.well-known/llmindex.json ← discovers the manifest
3. Reads endpoints → /llm/products ← gets product listing
4. Reads feed → /llm/feed/products.jsonl ← gets structured data
5. Answers user's question accurately ← better AI responses
Your website structure:
/.well-known/llmindex.json ← Entry point (manifest)
/llm/products ← Product listing (Markdown)
/llm/policies ← Shipping, returns, warranty
/llm/faq ← Frequently asked questions
/llm/about ← Brand & contact info
/llm/feed/products.jsonl ← Machine-readable product feed
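The agent side of steps 1 to 5 can be sketched with the Python standard library. This illustration rests on assumptions and does not ship with llmindex. The helper names manifest_url, fetch_manifest, and discovery_targets are hypothetical, and a real agent would also need error handling and caching.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

WELL_KNOWN_PATH = "/.well-known/llmindex.json"


def manifest_url(site_url: str) -> str:
    """Build the well-known manifest URL for a site (step 2)."""
    # An absolute path replaces any path already present on site_url.
    return urljoin(site_url, WELL_KNOWN_PATH)


def fetch_manifest(site_url: str, timeout: float = 10.0) -> dict:
    """Download and parse the manifest over HTTP(S) (hypothetical helper)."""
    with urlopen(manifest_url(site_url), timeout=timeout) as resp:
        return json.load(resp)


def discovery_targets(manifest: dict) -> list[str]:
    """Collect the URLs an agent would read next (steps 3 and 4)."""
    endpoints = list(manifest.get("endpoints", {}).values())
    feeds = list(manifest.get("feeds", {}).values())
    return endpoints + feeds
```

For the example manifest in the next section, discovery_targets returns the four endpoint URLs followed by the products_jsonl feed URL.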
Example Manifest
{
"version": "0.1",
"updated_at": "2026-02-17T10:00:00Z",
"entity": {
"name": "ACME Outdoor Gear",
"canonical_url": "https://acme-outdoor.com"
},
"language": "en",
"topics": ["outdoor-gear", "camping", "hiking"],
"endpoints": {
"products": "https://acme-outdoor.com/llm/products",
"policies": "https://acme-outdoor.com/llm/policies",
"faq": "https://acme-outdoor.com/llm/faq",
"about": "https://acme-outdoor.com/llm/about"
},
"feeds": {
"products_jsonl": "https://acme-outdoor.com/llm/feed/products.jsonl"
}
}
Quick Start
For Website Owners (Adopt the Standard)
You don't need the CLI tool. Just create a JSON file at /.well-known/llmindex.json on your server:
{
"version": "0.1",
"updated_at": "2026-02-20T00:00:00Z",
"entity": {
"name": "Your Brand Name",
"canonical_url": "https://your-site.com"
},
"language": "en",
"topics": ["your-industry"],
"endpoints": {
"products": "https://your-site.com/llm/products",
"policies": "https://your-site.com/llm/policies",
"faq": "https://your-site.com/llm/faq",
"about": "https://your-site.com/llm/about"
}
}
Then create the /llm/* pages as simple Markdown or HTML files. That's it.
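Validate your manifest against the JSON Schema (this needs the jsonschema package, and the schema path is relative to the repository root):
pip install jsonschema
python -c "
import json, jsonschema
schema = json.load(open('spec/schemas/llmindex-0.1.schema.json'))
data = json.load(open('your-manifest.json'))
jsonschema.validate(data, schema)  # raises jsonschema.ValidationError if invalid
print('Valid!')
"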
For Developers (Use the CLI Generator)
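The CLI tool auto-generates all llmindex files from a product CSV (JSON and Shopify CSV inputs are also supported; see Input Formats):
# Install from a local clone of this repository
pip install -e .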
# Generate from CSV
llmindex generate \
--site "ACME Outdoor" \
--url https://acme-outdoor.com \
--input-csv products.csv \
--topic outdoor-gear \
--topic camping \
--output-dir dist
This generates:
dist/
├── .well-known/llmindex.json # The manifest
├── llm/
│ ├── products.md # Product listing grouped by category
│ ├── policies.md # Policies page (template)
│ ├── faq.md # FAQ page (template)
│ ├── about.md # About page (template)
│ └── feed/
│ └── products.jsonl # Machine-readable feed (1 JSON per line)
Run the Demo
# Run the interactive demo
python examples/demo.py
# Run the test suite
pip install -e ".[dev]"
pytest
CLI Reference
llmindex generate
llmindex generate [OPTIONS]
Input (one required):
-i, --input-csv PATH Products CSV file
--input-json PATH Products JSON file (array of objects)
--input-shopify-csv PATH Shopify product export CSV
Options:
-s, --site TEXT Entity/brand name (required)
-u, --url TEXT Canonical HTTPS URL (required)
-o, --output-dir PATH Output directory (default: dist)
-l, --language TEXT Primary language, BCP-47 (default: en)
-t, --topic TEXT Category topics (repeatable)
--base-url TEXT Base URL for endpoints (defaults to --url)
--currency TEXT Default currency for Shopify imports (default: USD)
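Here is a rough picture of how these options might map onto manifest fields, written as a hypothetical build_manifest function. It is not the CLI's implementation. The base/llm/name endpoint layout is inferred from the example manifests in this README. The feeds entry is added unless with_feed is False. The two ValueError checks mirror the spec's HTTPS and topics requirements rather than documented CLI behavior.

```python
from datetime import datetime, timezone

ENDPOINT_NAMES = ("products", "policies", "faq", "about")


def build_manifest(site, url, topics, language="en", base_url=None, with_feed=True):
    """Assemble a v0.1 manifest dict from generate-style options (hypothetical)."""
    if not url.startswith("https://"):
        raise ValueError("canonical URL must use HTTPS")
    if not topics:
        raise ValueError("the spec requires at least one topic")
    base = (base_url or url).rstrip("/")  # --base-url falls back to --url
    manifest = {
        "version": "0.1",
        # ISO 8601 UTC timestamp in the same shape as the examples
        "updated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "entity": {"name": site, "canonical_url": url},
        "language": language,
        "topics": list(topics),
        # Endpoint layout assumed from the example manifests: <base>/llm/<name>
        "endpoints": {name: f"{base}/llm/{name}" for name in ENDPOINT_NAMES},
    }
    if with_feed:
        manifest["feeds"] = {"products_jsonl": f"{base}/llm/feed/products.jsonl"}
    return manifest
```

For instance, calling build_manifest("ACME Outdoor", "https://acme-outdoor.com", ["outdoor-gear", "camping"]) gives a dict shaped like the example manifest, with updated_at set to the current UTC time.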
llmindex validate
llmindex validate MANIFEST_PATH [OPTIONS]
Arguments:
MANIFEST_PATH PATH Path to llmindex.json file (required)
Options:
-f, --feed PATH Path to products.jsonl (auto-detected if omitted)
Example:
llmindex validate dist/.well-known/llmindex.json
llmindex validate dist/.well-known/llmindex.json --feed dist/llm/feed/products.jsonl
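Feed validation can work line by line, since each line of products.jsonl is an independent JSON object. The sketch below shows the idea with the standard library. It is not the CLI's validators.py, and it assumes feed records follow the CSV column names and rules from Input Formats: required fields, the three availability values, a numeric price, and unique ids.

```python
import json

REQUIRED = ("id", "title", "url", "price", "currency", "availability")
AVAILABILITY = {"in_stock", "out_of_stock", "preorder"}


def validate_feed_lines(lines):
    """Return (line_number, message) pairs for problems in JSONL feed lines."""
    errors = []
    seen_ids = set()
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # tolerate blank lines, such as a trailing newline
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((lineno, "line is not a JSON object"))
            continue
        missing = [field for field in REQUIRED if field not in record]
        if missing:
            errors.append((lineno, "missing fields: " + ", ".join(missing)))
        availability = record.get("availability")
        if "availability" in record and not (
            isinstance(availability, str) and availability in AVAILABILITY
        ):
            errors.append((lineno, f"bad availability: {availability!r}"))
        price = record.get("price")
        if "price" in record and (isinstance(price, bool) or not isinstance(price, (int, float))):
            errors.append((lineno, "price must be a number"))
        rec_id = record.get("id")
        if isinstance(rec_id, (str, int)):
            if rec_id in seen_ids:
                errors.append((lineno, f"duplicate id: {rec_id!r}"))
            seen_ids.add(rec_id)
    return errors


# Usage (iterating a file object yields one line at a time):
# with open("dist/llm/feed/products.jsonl", encoding="utf-8") as f:
#     for lineno, message in validate_feed_lines(f):
#         print(f"line {lineno}: {message}")
```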
Input Formats
CSV (default)
Standard CSV with columns matching the product schema.
| Column | Required | Description |
|---|---|---|
| id | Yes | Unique product ID |
| title | Yes | Product name |
| url | Yes | Product page URL |
| image_url | No | Product image URL |
| price | Yes | Numeric price |
| currency | Yes | ISO 4217 code (USD, EUR, etc.) |
| availability | Yes | in_stock, out_of_stock, or preorder |
| brand | No | Brand name |
| category | No | Product category |
| updated_at | No | ISO 8601 datetime |
See cli/sample_data/sample.csv for a working example with 20 products.
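To make the column rules concrete, here is a standard-library sketch that converts rows with these columns into JSONL feed lines. The function name csv_to_jsonl is hypothetical. The output shape is an assumption, not the generator's documented format: CSV column names become JSON keys, price becomes a number, and empty optional columns are dropped.

```python
import csv
import io
import json

REQUIRED = ("id", "title", "url", "price", "currency", "availability")
OPTIONAL = ("image_url", "brand", "category", "updated_at")
AVAILABILITY = {"in_stock", "out_of_stock", "preorder"}


def _cell(row: dict, column: str) -> str:
    """Return a stripped cell value, or "" when the column is absent or empty."""
    return (row.get(column) or "").strip()


def csv_to_jsonl(csv_text: str) -> str:
    """Convert product CSV text into JSONL: one JSON object per line."""
    out_lines = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for rownum, row in enumerate(reader, start=2):  # row 1 is the header
        record = {col: _cell(row, col) for col in REQUIRED}
        missing = [col for col, value in record.items() if not value]
        if missing:
            raise ValueError(f"row {rownum}: missing {', '.join(missing)}")
        if record["availability"] not in AVAILABILITY:
            raise ValueError(f"row {rownum}: bad availability {record['availability']!r}")
        record["price"] = float(record["price"])  # raises ValueError if not numeric
        record["currency"] = record["currency"].upper()  # ISO 4217 codes are upper case
        for col in OPTIONAL:
            if _cell(row, col):
                record[col] = _cell(row, col)
        out_lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in out_lines)
```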
JSON
A JSON array of product objects with the same fields:
llmindex generate --site "TechCo" --url https://techco.com --input-json products.json
See cli/sample_data/sample.json for an example.
Shopify CSV Export
Import directly from Shopify's product CSV export format. Handles are deduplicated (one product per handle), and product URLs are auto-constructed from your store URL:
llmindex generate \
--site "My Store" \
--url https://mystore.com \
--input-shopify-csv shopify_products.csv \
--currency USD
See cli/sample_data/sample_shopify.csv for an example.
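The handle deduplication can be illustrated with a hypothetical sketch; it is not the importer's code. It assumes the standard Shopify export columns Handle, Title, Vendor, Type, Variant Price, and Image Src. It keeps the first row seen for each handle and builds URLs with Shopify's usual /products/handle path. Availability is left out because it depends on inventory columns this sketch does not interpret.

```python
import csv
import io


def shopify_export_to_products(csv_text: str, store_url: str, currency: str = "USD") -> list:
    """Collapse a Shopify product export to one product record per Handle."""
    base = store_url.rstrip("/")
    products = {}  # dicts preserve insertion order, so output follows the file
    for row in csv.DictReader(io.StringIO(csv_text)):
        handle = (row.get("Handle") or "").strip()
        if not handle or handle in products:
            continue  # extra variant/image rows repeat an earlier product's Handle
        price = (row.get("Variant Price") or "").strip()
        products[handle] = {
            "id": handle,
            "title": (row.get("Title") or "").strip(),
            "url": f"{base}/products/{handle}",  # Shopify's usual product path
            "price": float(price) if price else None,
            "currency": currency,  # mirrors the --currency default
            "brand": (row.get("Vendor") or "").strip(),
            "category": (row.get("Type") or "").strip(),
            "image_url": (row.get("Image Src") or "").strip(),
        }
    return list(products.values())
```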
Industry Examples
Each example includes a complete llmindex.json manifest and /llm content pages.
E-commerce — Outdoor Gear Store
Full example with product feed and domain verification.
spec/examples/ecommerce/
├── llmindex.json # Manifest with feeds + verify
├── llm/
│ ├── products.md # 10 products, grouped by category
│ ├── policies.md # Shipping, returns, warranty
│ ├── faq.md # Customer FAQ
│ └── about.md # Brand story
└── feed/
└── products.jsonl # 10-line JSONL product feed
{
"version": "0.1",
"updated_at": "2026-02-17T10:00:00Z",
"entity": {
"name": "ACME Outdoor Gear",
"canonical_url": "https://acme-outdoor.com"
},
"language": "en",
"topics": ["outdoor-gear", "camping", "hiking"],
"endpoints": {
"products": "https://acme-outdoor.com/llm/products",
"policies": "https://acme-outdoor.com/llm/policies",
"faq": "https://acme-outdoor.com/llm/faq",
"about": "https://acme-outdoor.com/llm/about"
},
"feeds": {
"products_jsonl": "https://acme-outdoor.com/llm/feed/products.jsonl"
},
"verify": {
"method": "dns_txt",
"value": "llmindex-verify-a1b2c3d4e5"
}
}
Local Business — Bakery
Minimal manifest with required fields only. No product feed needed.
spec/examples/local-business/
├── llmindex.json # Minimal required fields
└── llm/
├── products.md # Menu items
├── policies.md # Store policies
├── faq.md # Common questions
└── about.md # About the bakery
{
"version": "0.1",
"updated_at": "2026-02-15T09:00:00Z",
"entity": {
"name": "Sunrise Bakery",
"canonical_url": "https://sunrise-bakery.com"
},
"language": "en",
"topics": ["bakery", "pastry", "local-business"],
"endpoints": {
"products": "https://sunrise-bakery.com/llm/products",
"policies": "https://sunrise-bakery.com/llm/policies",
"faq": "https://sunrise-bakery.com/llm/faq",
"about": "https://sunrise-bakery.com/llm/about"
}
}
SaaS — Productivity Tool
Software product with license field, no product feed.
spec/examples/saas/
├── llmindex.json # Manifest with license field
└── llm/
├── products.md # Pricing plans
├── policies.md # Terms, privacy, SLA
├── faq.md # Product FAQ
└── about.md # Company info
{
"version": "0.1",
"updated_at": "2026-02-10T14:30:00Z",
"entity": {
"name": "TaskFlow",
"canonical_url": "https://taskflow.io"
},
"language": "en",
"topics": ["saas", "productivity", "project-management"],
"endpoints": {
"products": "https://taskflow.io/llm/products",
"policies": "https://taskflow.io/llm/policies",
"faq": "https://taskflow.io/llm/faq",
"about": "https://taskflow.io/llm/about"
},
"license": "CC-BY-4.0"
}
Specification
The full llmindex v0.1 specification: spec/spec.md
Required Manifest Fields
| Field | Type | Description |
|---|---|---|
| version | string | "0.1" |
| updated_at | string | ISO 8601 datetime |
| entity.name | string | Brand/company name |
| entity.canonical_url | string | Homepage (HTTPS) |
| language | string | BCP-47 language code |
| topics | array | Category tags (1+ required) |
| endpoints | object | URLs: products, policies, faq, about |
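For a quick sanity check without installing jsonschema, the Required Manifest Fields table translates into a few presence and type checks. This is a simplified, hypothetical helper, and the published JSON Schema remains the authority. It assumes all four endpoint keys are required, as in every example here. It only checks that canonical_url starts with https:// and that updated_at parses as a datetime.

```python
from datetime import datetime

ENDPOINT_KEYS = ("products", "policies", "faq", "about")


def required_field_errors(manifest: dict) -> list[str]:
    """Check the required v0.1 fields (simplified; not a schema replacement)."""
    errors = []
    if manifest.get("version") != "0.1":
        errors.append('version must be "0.1"')
    try:
        # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it
        datetime.fromisoformat(str(manifest.get("updated_at", "")).replace("Z", "+00:00"))
    except ValueError:
        errors.append("updated_at must be an ISO 8601 datetime")
    entity = manifest.get("entity")
    if not isinstance(entity, dict) or not entity.get("name"):
        errors.append("entity.name is required")
    if not isinstance(entity, dict) or not str(entity.get("canonical_url", "")).startswith("https://"):
        errors.append("entity.canonical_url must be an HTTPS URL")
    if not isinstance(manifest.get("language"), str) or not manifest["language"]:
        errors.append("language must be a BCP-47 code")
    topics = manifest.get("topics")
    if not isinstance(topics, list) or not topics:
        errors.append("topics must be a non-empty array")
    endpoints = manifest.get("endpoints")
    if not isinstance(endpoints, dict):
        errors.append("endpoints must be an object")
    else:
        for key in ENDPOINT_KEYS:
            if key not in endpoints:
                errors.append(f"endpoints.{key} is required")
    return errors
```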
Optional Fields
| Field | Description |
|---|---|
| feeds | Machine-readable data feeds (products_jsonl, offers_json) |
| verify | Domain ownership proof (dns_txt or http_file) |
| sig | Cryptographic signature (JWS with EdDSA) |
| license | SPDX license identifier or URL |
JSON Schema
Machine-readable schema for validation: spec/schemas/llmindex-0.1.schema.json
Comparison with llms.txt
| | llmindex | llms.txt |
|---|---|---|
| Format | JSON (structured, machine-parseable) | Plain text (Markdown) |
| Location | /.well-known/llmindex.json | /llms.txt |
| Schema validation | JSON Schema provided | No formal schema |
| Structured data feeds | JSONL product feeds | Not specified |
| Domain verification | DNS TXT / HTTP file | Not specified |
| Versioning | Semantic versioning built-in | Not specified |
| Focus | Structured discovery + data | Guidance for LLMs |
llmindex and llms.txt serve complementary purposes. llmindex focuses on structured, machine-parseable discovery; llms.txt focuses on human-readable guidance. You can use both.
Packages
| Package | Registry | Description |
|---|---|---|
| openllmindex | PyPI | CLI generator + validator (Python); provides the llmindex command |
| @llmindex/schema | npm | JSON Schema + TypeScript types |
Integrations
- Next.js — Static files, API routes, or middleware rewrite
- WordPress — Static file, rewrite rules, or WooCommerce product feed
Project Structure
openllmindex/
├── spec/ # The llmindex specification
│ ├── spec.md # v0.1 specification document
│ ├── schemas/ # JSON Schema for validation
│ │ └── llmindex-0.1.schema.json
│ ├── examples/ # Industry examples
│ │ ├── ecommerce/ # Full store with feed + verify
│ │ ├── local-business/ # Minimal bakery
│ │ └── saas/ # SaaS with license
│ └── test-vectors/ # Invalid manifests for testing
├── cli/ # Generator CLI tool
│ ├── llmindex_cli/ # CLI application (Typer)
│ │ ├── main.py # Entry point
│ │ ├── models.py # Pydantic data models
│ │ ├── validators.py # Schema + feed validation
│ │ └── generators/ # Output generators
│ ├── importers/ # Data importers (CSV, JSON, Shopify)
│ ├── sample_data/ # Sample data for testing
│ └── tests/ # Test suite (70+ tests)
├── packages/ # Published packages
│ └── schema/ # @llmindex/schema (npm)
├── integrations/ # Platform integrations
│ ├── nextjs/ # Next.js middleware + examples
│ └── wordpress/ # WordPress + WooCommerce
├── docs/ # GitHub Pages documentation site
├── examples/ # Usage examples
├── pyproject.toml # Python package config
├── LICENSE # AGPL-3.0
└── README.md
License
- Specification (spec/): CC BY 4.0, free to adopt, reference, and build upon
- CLI tools (cli/): AGPL-3.0-or-later
Contributing
Contributions welcome. Please open an issue first to discuss what you'd like to change.
Areas for Contribution
- New industry examples: healthcare, education, real estate, etc.
- New importers: WooCommerce, XML, and other sources (CSV, JSON, and Shopify CSV are already supported)
- Validators: standalone validation tools for manifest + feeds
- Integrations: WordPress plugin, Next.js middleware, etc.
- Translations: spec and examples in other languages