LLMRing public model registry CLI and tools (manual curation workflow)
LLMRing Registry
⚠️ Pre-release notice
The pricing, token limits, and capabilities in this registry are under active validation and may be inaccurate. Do not rely on these numbers for production decisions. Always verify against the providers' official documentation.
Complies with source-of-truth v3.5
The official model registry for LLMRing - providing up-to-date pricing, capabilities, and metadata for all major LLM providers.
Overview
The LLMRing Registry is the source of truth for model information across the LLMRing ecosystem. It automatically extracts and maintains accurate model data from provider documentation, serving it through GitHub Pages for global, free access.
Key Features:
- Daily automated extraction from provider documentation
- Dual extraction approach (HTML + PDF) for accuracy
- Versioned JSON files with historical snapshots
- Served via GitHub Pages at https://llmring.github.io/registry/
- No API keys required for access
Architecture
Registry (This Repo)
├── Extraction Pipeline
│   ├── HTML Scraping (BeautifulSoup + Regex)
│   └── PDF Analysis (via LLMRing's unified interface)
│       ├── OpenAI: Assistants API
│       ├── Anthropic: Direct PDF support
│       └── Google: Direct PDF support
└── Output
    ├── models/
    │   ├── openai.json
    │   ├── anthropic.json
    │   └── google.json
    └── manifest.json
Quick Start
Installation
# Clone the repository
git clone https://github.com/llmring/registry.git
cd registry
# Install with uv (recommended)
uv sync
# Or with pip
pip install -e .
Manual Curation Workflow (Human-validated)
- Gather source materials (optional but recommended):
# Show where to get docs and how to save PDFs
uv run registry sources
# Fetch pricing/docs HTML (lightweight)
uv run registry fetch-html --provider openai --output-dir html_cache
# Generate PDFs with a headless browser
# (Playwright is installed via dependencies; install browsers once per machine)
uv run playwright install chromium
uv run registry fetch --provider openai --output-dir pdfs
- Generate a draft JSON using the extractor (best-effort), or by hand:
# Best-effort draft generation from PDFs (writes to drafts/ only)
uv run registry extract --provider openai --pdfs-dir pdfs
- Review differences vs current curated file:
uv run registry review-draft --provider openai --draft drafts/openai.2025-08-20.json
# Inspect the generated .diff.json report
# Optionally accept all to create a reviewed file
uv run registry review-draft --provider openai --draft drafts/openai.2025-08-20.json --accept-all
- Promote the reviewed file to publish and archive:
uv run registry promote --provider openai --reviewed drafts/openai.reviewed.json
This will:
- Bump version and updated_at
- Write openai/v/<version>/models.json
- Replace openai/models.json
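The review and promote steps can be sketched in Python as follows. This is a minimal illustration, not the CLI's actual implementation: the function names `diff_models` and `promote`, the shape of the diff report, and the rule that the new version is the current file's version plus one are all assumptions. Only the output paths and the two bumped fields come from the list above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def diff_models(current: dict, draft: dict) -> dict:
    """Compare the "models" dictionaries of a curated file and a draft."""
    old, new = current.get("models", {}), draft.get("models", {})
    changed = {}
    for key in old.keys() & new.keys():
        # Record every field whose value differs between the two files.
        fields = {
            name: {"old": old[key].get(name), "new": new[key].get(name)}
            for name in sorted(old[key].keys() | new[key].keys())
            if old[key].get(name) != new[key].get(name)
        }
        if fields:
            changed[key] = fields
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": changed,
    }


def promote(reviewed: dict, root: Path, provider: str) -> dict:
    """Bump version and updated_at, archive a snapshot, replace the current file."""
    current_path = root / provider / "models.json"
    previous = json.loads(current_path.read_text()) if current_path.exists() else {}
    published = dict(reviewed)
    # Assumption: the version is a simple integer counter.
    published["version"] = int(previous.get("version", 0)) + 1
    published["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    archive_path = root / provider / "v" / str(published["version"]) / "models.json"
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(published, indent=2)
    archive_path.write_text(text)   # <provider>/v/<version>/models.json
    current_path.write_text(text)   # <provider>/models.json
    return published
```

Writing the archived snapshot and the current file from the same serialized text keeps the two copies identical, which is the property a versioned archive needs.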
Legacy Automation (deprecated)
Commands like fetch, fetch-html, and extract* remain for reference but are deprecated. The official process is manual, human-validated curation.
# View available commands
uv run registry --help
# Fetch latest documentation
uv run registry fetch-html --provider all
# Extract models with comprehensive dual-source validation
uv run registry extract-comprehensive --provider all
# List all extracted models
uv run registry list
# Export to markdown for documentation
uv run registry export --output markdown > models.md
Extraction System
The registry uses a dual extraction approach for maximum accuracy:
1. HTML Extraction
- Fast regex-based extraction from provider websites
- Captures current pricing and basic model information
- No API keys required
2. PDF Extraction
- Uses LLMRing's unified interface (requires API keys)
- Extracts detailed capabilities and specifications
- Automatically uses optimal method per provider:
  - OpenAI: Assistants API for PDF processing
  - Anthropic: Native PDF support with Claude
  - Google: Direct PDF support with Gemini
3. Validation & Consensus
- Compares both sources for each field
- Marks confidence levels:
  - Certain: Both sources agree
  - Probable: Single source only
  - Uncertain: Sources conflict
- Interactive mode available for manual resolution
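The consensus rule can be expressed as a small function. The sketch below is an assumption about how the comparison might look, not the registry's code: the name `reconcile` is hypothetical, and the "missing" label for a field absent from both sources is not one of the documented levels.

```python
from typing import Any, Optional, Tuple


def reconcile(html_value: Any, pdf_value: Any) -> Tuple[Optional[Any], str]:
    """Return (value, confidence) for one field seen in the HTML and PDF sources."""
    if html_value is None and pdf_value is None:
        # Assumption: neither source has the field; not a documented level.
        return None, "missing"
    if html_value is None or pdf_value is None:
        # Only one source provides the field.
        value = pdf_value if html_value is None else html_value
        return value, "probable"
    if html_value == pdf_value:
        return html_value, "certain"
    # The sources disagree: leave the value unset for manual resolution.
    return None, "uncertain"
```

For example, `reconcile(0.15, 0.15)` yields a "certain" value, while `reconcile(0.15, 0.60)` yields no value and an "uncertain" label that interactive mode would then resolve.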
Model Schema
Each provider's JSON file contains models with this structure (dictionary, not list):
{
  "provider": "openai",
  "version": 2,
  "updated_at": "2025-08-20T00:00:00Z",
  "models": {
    "openai:gpt-4o-mini": {
      "provider": "openai",
      "model_name": "gpt-4o-mini",
      "display_name": "GPT-4 Optimized Mini",
      "max_input_tokens": 128000,
      "max_output_tokens": 16384,
      "dollars_per_million_tokens_input": 0.15,
      "dollars_per_million_tokens_output": 0.60,
      "supports_vision": true,
      "supports_function_calling": true,
      "supports_json_mode": true,
      "supports_parallel_tool_calls": true,
      "is_active": true
    }
  }
}
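Because models are stored in a dictionary keyed by "provider:model_name" and prices are expressed per million tokens, a request's cost follows directly from the two price fields. The sketch below uses a trimmed copy of the example above; `estimate_cost` is a hypothetical helper, not part of the registry.

```python
import json

# Trimmed copy of the schema example, kept inline so the sketch is self-contained.
REGISTRY_JSON = """
{
  "provider": "openai",
  "version": 2,
  "models": {
    "openai:gpt-4o-mini": {
      "provider": "openai",
      "model_name": "gpt-4o-mini",
      "dollars_per_million_tokens_input": 0.15,
      "dollars_per_million_tokens_output": 0.60
    }
  }
}
"""


def estimate_cost(model: dict, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars of one request against the given model entry."""
    return (
        input_tokens * model["dollars_per_million_tokens_input"]
        + output_tokens * model["dollars_per_million_tokens_output"]
    ) / 1_000_000


registry = json.loads(REGISTRY_JSON)
model = registry["models"]["openai:gpt-4o-mini"]  # dictionary lookup, no list scan
cost = estimate_cost(model, input_tokens=10_000, output_tokens=2_000)
print(f"Estimated cost: ${cost:.4f}")
```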
Commands Reference
Fetching Documentation
# Fetch HTML pages (no browser required)
uv run registry fetch-html --provider openai
# Fetch as PDFs (requires Playwright)
uv run registry fetch --provider all
Extraction
# Extract from HTML only
uv run registry extract-html --provider all
# Extract from PDFs only (requires LLM API keys)
uv run registry extract --provider all
# Comprehensive extraction (recommended)
uv run registry extract-comprehensive --provider all
# Interactive mode for conflict resolution
uv run registry extract-comprehensive --provider all --interactive
Data Management
# List all models with pricing
uv run registry list
# Validate JSON structure
uv run registry validate
# Export for documentation
uv run registry export --output markdown
uv run registry export --output json
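The checks performed by the validate command are not described here. As a rough illustration of what structural validation against the model schema could involve, the sketch below treats every field from the schema example as required; that requirement, the error messages, and the function name `validate_registry` are assumptions.

```python
from typing import Any, Dict, List

# Assumption: every field in the schema example is treated as required.
EXPECTED_TYPES = {
    "provider": str,
    "model_name": str,
    "display_name": str,
    "max_input_tokens": int,
    "max_output_tokens": int,
    "dollars_per_million_tokens_input": (int, float),
    "dollars_per_million_tokens_output": (int, float),
    "supports_vision": bool,
    "supports_function_calling": bool,
    "supports_json_mode": bool,
    "supports_parallel_tool_calls": bool,
    "is_active": bool,
}


def _type_ok(value: Any, kind: Any) -> bool:
    # bool is a subclass of int in Python, so handle it explicitly.
    if isinstance(value, bool):
        return kind is bool
    return isinstance(value, kind)


def validate_registry(data: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    models = data.get("models")
    if not isinstance(models, dict):
        return ['"models" must be a dictionary, not a list']
    errors = []
    for key, model in models.items():
        if not isinstance(model, dict):
            errors.append(f"{key}: entry must be an object")
            continue
        expected_key = f"{model.get('provider')}:{model.get('model_name')}"
        if key != expected_key:
            errors.append(f"{key}: key should be {expected_key}")
        for field, kind in EXPECTED_TYPES.items():
            if field not in model:
                errors.append(f"{key}: missing field {field}")
            elif not _type_ok(model[field], kind):
                errors.append(f"{key}: field {field} has the wrong type")
    return errors
```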
Environment Variables
For PDF extraction (optional but recommended):
# Choose one or more providers
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."
The system will automatically use the best available model for extraction.
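How the extractor ranks models is not specified in this document. The sketch below only illustrates the first step such a selection would need, namely detecting which providers have a key configured; `configured_providers` is a hypothetical helper, and the mapping simply mirrors the three variables listed above.

```python
import os
from typing import List, Mapping

# Environment variable names as listed in this section.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers whose API key is set to a non-empty value."""
    return [
        provider
        for provider, variable in PROVIDER_ENV_VARS.items()
        if env.get(variable, "").strip()
    ]


if __name__ == "__main__":
    print("Providers available for PDF extraction:", configured_providers())
```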
Automation
GitHub Actions Workflow
The registry updates automatically via GitHub Actions:
# .github/workflows/update-registry.yml
name: Update Registry
on:
  schedule:
    - cron: '0 6 * * *' # Daily at 6 AM UTC
  workflow_dispatch:
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - run: pip install uv
      - run: uv sync
      - run: uv run registry fetch-html --provider all
      - run: uv run registry extract-comprehensive --provider all
      - run: |
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          git add models/
          git commit -m "Update model registry $(date +%Y-%m-%d)" || true
          git push
Development
Project Structure
registry/
├── src/registry/
│   ├── __main__.py              # CLI entry point
│   ├── extract_comprehensive.py # Dual-source extraction
│   ├── extract_from_html.py     # HTML regex patterns
│   ├── extraction/
│   │   ├── pdf_parser.py        # LLMRing-based PDF extraction
│   │   └── model_curator.py     # Model selection logic
│   └── fetch_html.py            # Web scraping
├── models/                      # Output JSON files
├── pdfs/                        # Cached PDF documentation
└── html_cache/                  # Cached HTML pages
Adding a New Provider
- Add URL mappings in fetch_html.py:
PROVIDER_URLS = {
    "newprovider": {
        "pricing": "https://newprovider.com/pricing",
        "models": "https://newprovider.com/docs/models"
    }
}
- Add extraction patterns in extract_from_html.py:
from typing import Any, Dict, List

def extract_newprovider_models(html: str) -> List[Dict[str, Any]]:
    # Add regex patterns for the provider's HTML structure
    pass
- Test extraction:
uv run registry fetch-html --provider newprovider
uv run registry extract-comprehensive --provider newprovider
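To make the extraction step concrete, here is a minimal sketch of a regex-based extractor. The HTML layout (one table row per model with name, input price, and output price) is invented for illustration, so the pattern would need to be rewritten for a real provider's pricing page. The output field names follow the model schema.

```python
import re
from typing import Any, Dict, List

# Hypothetical layout: <tr><td>name</td><td>$input</td><td>$output</td></tr>
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>(?P<name>[\w.\-]+)</td>\s*"
    r"<td>\$(?P<input>\d+(?:\.\d+)?)</td>\s*"
    r"<td>\$(?P<output>\d+(?:\.\d+)?)</td>\s*</tr>"
)


def extract_newprovider_models(html: str) -> List[Dict[str, Any]]:
    """Extract model names and per-million-token prices from pricing HTML."""
    return [
        {
            "provider": "newprovider",
            "model_name": match.group("name"),
            "dollars_per_million_tokens_input": float(match.group("input")),
            "dollars_per_million_tokens_output": float(match.group("output")),
        }
        for match in ROW_PATTERN.finditer(html)
    ]


SAMPLE_HTML = """
<table>
  <tr><th>Model</th><th>Input / 1M</th><th>Output / 1M</th></tr>
  <tr><td>np-small</td><td>$0.50</td><td>$1.50</td></tr>
  <tr> <td>np-large-2.1</td> <td>$3</td> <td>$15</td> </tr>
</table>
"""

for entry in extract_newprovider_models(SAMPLE_HTML):
    print(entry["model_name"], entry["dollars_per_million_tokens_input"])
```

The header row uses `<th>` cells and is therefore skipped by the pattern, which only matches `<td>` rows.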
Testing
# Run tests
uv run pytest
# Test extraction for a specific provider
uv run registry extract-comprehensive --provider openai --interactive
# Validate output
uv run registry validate --models-dir models
Integration with LLMRing
The registry serves as the data source for the entire LLMRing ecosystem:
- Static Hosting: JSON files are served via GitHub Pages
- Registry URL: https://llmring.github.io/registry/
- Manifest: Contains version info and provider index
- Updates: Daily via GitHub Actions
Client usage:
from llmring import LLMRing
# Automatically fetches latest registry
ring = LLMRing()
# Get available models
models = ring.get_available_models()
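Since the files are static JSON, they can also be read without the client. The sketch below uses only the standard library. The path `<provider>/models.json` under the registry URL is an assumption based on the layout written by the promote step; the published layout may differ, so check the manifest before relying on it. Both function names are hypothetical.

```python
import json
from urllib.request import urlopen

REGISTRY_BASE_URL = "https://llmring.github.io/registry/"


def provider_models_url(provider: str, base_url: str = REGISTRY_BASE_URL) -> str:
    """Build the URL of a provider's current models file."""
    # Assumption: published files mirror the promote layout (<provider>/models.json).
    return f"{base_url.rstrip('/')}/{provider}/models.json"


def fetch_provider_models(provider: str, timeout: float = 10.0) -> dict:
    """Download and parse a provider's models file (performs a network request)."""
    with urlopen(provider_models_url(provider), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    data = fetch_provider_models("openai")
    print(sorted(data.get("models", {})))
```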
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Priority Areas
- Add more providers (Cohere, AI21, etc.)
- Improve extraction patterns for better accuracy
- Add support for embedding models
- Enhance capability detection
License
MIT License - see LICENSE for details.
Links
- Registry Data: https://llmring.github.io/registry/
- Main Project: https://github.com/llmring/llmring
- Documentation: https://llmring.ai/docs
- API Reference: https://api.llmring.ai
Built with ❤️ by the LLMRing team