
LLMRing public model registry CLI and tools (manual curation workflow)


LLMRing Registry

โš ๏ธ Pre-release notice

The pricing, token limits, and capabilities in this registry are under active validation and may be inaccurate. Do not rely on these numbers for production decisions. Always verify against the providers' official documentation.

Complies with source-of-truth v3.5

The official model registry for LLMRing, providing pricing, capabilities, and metadata for major LLM providers (currently OpenAI, Anthropic, and Google).

Overview

The LLMRing Registry is the source of truth for model information across the LLMRing ecosystem. It maintains model data drawn from provider documentation and validated by human curators, and serves it through GitHub Pages for free, global access.

Key Features:

  • 📅 Daily automated extraction from provider documentation
  • 🔍 Dual extraction approach (HTML + PDF) for accuracy
  • 📦 Versioned JSON files with historical snapshots
  • 🌐 Served via GitHub Pages at https://llmring.github.io/registry/
  • 🔓 No API keys required for access

Architecture

Registry (This Repo)
├── Extraction Pipeline
│   ├── HTML Scraping (BeautifulSoup + Regex)
│   └── PDF Analysis (via LLMRing's unified interface)
│       ├── OpenAI: Assistants API
│       ├── Anthropic: Direct PDF support
│       └── Google: Direct PDF support
└── Output
    ├── models/
    │   ├── openai.json
    │   ├── anthropic.json
    │   └── google.json
    └── manifest.json

Quick Start

Installation

# Clone the repository
git clone https://github.com/llmring/registry.git
cd registry

# Install with uv (recommended)
uv sync

# Or with pip
pip install -e .

Manual Curation Workflow (Human-validated)

  1. Gather source materials (optional but recommended):
# Show where to get docs and how to save PDFs
uv run registry sources

# Fetch pricing/docs HTML (lightweight)
uv run registry fetch-html --provider openai --output-dir html_cache

# Generate PDFs with a headless browser
# (Playwright is installed via dependencies; install browsers once per machine)
uv run playwright install chromium
uv run registry fetch --provider openai --output-dir pdfs
  2. Generate a draft JSON using the extractor (best-effort), or by hand:
# Best-effort draft generation from PDFs (writes to drafts/ only)
uv run registry extract --provider openai --pdfs-dir pdfs
  3. Review differences against the current curated file:
uv run registry review-draft --provider openai --draft drafts/openai.2025-08-20.json
# Inspect the generated .diff.json report

# Optionally accept all to create a reviewed file
uv run registry review-draft --provider openai --draft drafts/openai.2025-08-20.json --accept-all
  4. Promote the reviewed file to publish and archive it:
uv run registry promote --provider openai --reviewed drafts/openai.reviewed.json

This will:

  • Bump version and updated_at
  • Write openai/v/<version>/models.json
  • Replace openai/models.json
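The three effects of promote listed above can be sketched in a few lines of Python. This is a hypothetical illustration, not the CLI's actual code: the function name promote, the choice to base the new version on the currently published file, and the UTC timestamp format (copied from the schema example) are all assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def promote(reviewed_path: Path, registry_dir: Path) -> Path:
    """Hypothetical sketch of the promote step (not the registry CLI's code)."""
    data = json.loads(reviewed_path.read_text())
    provider = data["provider"]
    provider_dir = registry_dir / provider

    # Assumption: bump relative to the currently published file (0 if none yet).
    current = provider_dir / "models.json"
    previous = json.loads(current.read_text())["version"] if current.exists() else 0
    data["version"] = previous + 1
    data["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    payload = json.dumps(data, indent=2)

    # Archive an immutable snapshot under <provider>/v/<version>/models.json.
    archive = provider_dir / "v" / str(data["version"]) / "models.json"
    archive.parent.mkdir(parents=True, exist_ok=True)
    archive.write_text(payload)

    # Replace the "latest" file that clients read.
    current.write_text(payload)
    return archive
```

Writing the archived snapshot before overwriting <provider>/models.json means a failure midway never leaves the latest file pointing at a version that was not archived.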

Legacy Automation (deprecated)

Commands like fetch, fetch-html, and extract* remain for reference but are deprecated. The official process is manual, human-validated curation.

# View available commands
uv run registry --help

# Fetch latest documentation
uv run registry fetch-html --provider all

# Extract models with comprehensive dual-source validation
uv run registry extract-comprehensive --provider all

# List all extracted models
uv run registry list

# Export to markdown for documentation
uv run registry export --output markdown > models.md

Extraction System

The registry uses a dual extraction approach, cross-checking each field against two independent sources:

1. HTML Extraction

  • Fast regex-based extraction from provider websites
  • Captures current pricing and basic model information
  • No API keys required

2. PDF Extraction

  • Uses LLMRing's unified interface (requires API keys)
  • Extracts detailed capabilities and specifications
  • Automatically uses optimal method per provider:
    • OpenAI: Assistants API for PDF processing
    • Anthropic: Native PDF support with Claude
    • Google: Direct PDF support with Gemini

3. Validation & Consensus

  • Compares both sources for each field
  • Marks confidence levels:
    • Certain: Both sources agree
    • Probable: Single source only
    • Uncertain: Sources conflict
  • An interactive mode (--interactive) is available for resolving conflicts manually
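The per-field consensus rule above maps directly onto a small function. This is a minimal sketch assuming None means "the source did not report this field"; the name reconcile and the extra "missing" label for fields neither source reports are hypothetical and not among the documented confidence levels.

```python
from typing import Any, Optional, Tuple


def reconcile(html_value: Optional[Any], pdf_value: Optional[Any]) -> Tuple[Optional[Any], str]:
    """Return (value, confidence) for one field; None means 'not found in that source'."""
    if html_value is not None and pdf_value is not None:
        if html_value == pdf_value:
            return html_value, "certain"   # both sources agree
        return None, "uncertain"           # sources conflict: needs manual resolution
    if html_value is not None:
        return html_value, "probable"      # HTML only
    if pdf_value is not None:
        return pdf_value, "probable"       # PDF only
    return None, "missing"                 # hypothetical label: neither source has it
```

Exact equality is the simplest possible agreement test; a real implementation might normalize units or allow a small tolerance for prices before declaring a conflict.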

Model Schema

Each provider's JSON file stores its models in a dictionary (not a list) keyed by "provider:model_name", with entries of this structure:

{
  "provider": "openai",
  "version": 2,
  "updated_at": "2025-08-20T00:00:00Z",
  "models": {
    "openai:gpt-4o-mini": {
      "provider": "openai",
      "model_name": "gpt-4o-mini",
      "display_name": "GPT-4o mini",
      "max_input_tokens": 128000,
      "max_output_tokens": 16384,
      "dollars_per_million_tokens_input": 0.15,
      "dollars_per_million_tokens_output": 0.60,
      "supports_vision": true,
      "supports_function_calling": true,
      "supports_json_mode": true,
      "supports_parallel_tool_calls": true,
      "is_active": true
    }
  }
}
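Because models is keyed by "provider:model_name", looking up a model is a single dictionary access, and the per-million-token prices turn into a request cost with one multiplication per direction. The sketch below uses a trimmed copy of the example entry; estimate_cost is a hypothetical helper, not part of the registry CLI or the llmring library, and the prices carry the pre-release caveat at the top of this page. For 10,000 input and 2,000 output tokens: 0.01 × $0.15 + 0.002 × $0.60 = $0.0015 + $0.0012 = $0.0027.

```python
import json
from typing import Any, Dict

# Trimmed copy of the schema example above.
EXAMPLE = json.loads("""
{
  "provider": "openai",
  "version": 2,
  "updated_at": "2025-08-20T00:00:00Z",
  "models": {
    "openai:gpt-4o-mini": {
      "provider": "openai",
      "model_name": "gpt-4o-mini",
      "max_input_tokens": 128000,
      "max_output_tokens": 16384,
      "dollars_per_million_tokens_input": 0.15,
      "dollars_per_million_tokens_output": 0.60
    }
  }
}
""")


def estimate_cost(registry: Dict[str, Any], key: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request from registry prices (hypothetical helper)."""
    model = registry["models"][key]  # dict keyed by "provider:model_name"
    if input_tokens > model["max_input_tokens"] or output_tokens > model["max_output_tokens"]:
        raise ValueError(f"token counts exceed the limits listed for {key}")
    return (
        input_tokens / 1_000_000 * model["dollars_per_million_tokens_input"]
        + output_tokens / 1_000_000 * model["dollars_per_million_tokens_output"]
    )


cost = estimate_cost(EXAMPLE, "openai:gpt-4o-mini", input_tokens=10_000, output_tokens=2_000)
print(f"${cost:.4f}")  # prints $0.0027
```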

Commands Reference

Fetching Documentation

# Fetch HTML pages (no browser required)
uv run registry fetch-html --provider openai

# Fetch as PDFs (requires Playwright)
uv run registry fetch --provider all

Extraction

# Extract from HTML only
uv run registry extract-html --provider all

# Extract from PDFs only (requires LLM API keys)
uv run registry extract --provider all

# Comprehensive extraction (recommended)
uv run registry extract-comprehensive --provider all

# Interactive mode for conflict resolution
uv run registry extract-comprehensive --provider all --interactive

Data Management

# List all models with pricing
uv run registry list

# Validate JSON structure
uv run registry validate

# Export for documentation
uv run registry export --output markdown
uv run registry export --output json
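The exact checks behind registry validate are not documented here. As a rough idea of what "validate JSON structure" can mean for the Model Schema above, here is a hypothetical sketch; check_provider_file and its rules (required top-level keys, dict keyed by "provider:model_name", matching provider, non-negative prices) are assumptions drawn from the schema example, not the CLI's implementation.

```python
from typing import Any, Dict, List

REQUIRED_TOP_LEVEL = ("provider", "version", "updated_at", "models")
PRICE_FIELDS = ("dollars_per_million_tokens_input", "dollars_per_million_tokens_output")


def check_provider_file(data: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the file looks well-formed."""
    problems = [f"missing top-level key: {k}" for k in REQUIRED_TOP_LEVEL if k not in data]
    models = data.get("models")
    if not isinstance(models, dict):
        problems.append("'models' must be a dictionary keyed by 'provider:model_name'")
        return problems
    for key, model in models.items():
        # Each key should be exactly "<provider>:<model_name>".
        expected = f"{model.get('provider')}:{model.get('model_name')}"
        if key != expected:
            problems.append(f"{key}: key does not match {expected!r}")
        if model.get("provider") != data.get("provider"):
            problems.append(f"{key}: provider differs from file provider")
        for field in PRICE_FIELDS:
            price = model.get(field)
            if price is not None and price < 0:
                problems.append(f"{key}: {field} is negative")
    return problems
```

Returning a list of messages rather than raising on the first error lets a curator see every problem in a draft at once.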

Environment Variables

For PDF extraction (optional but recommended):

# Choose one or more providers
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."

The system will automatically use the best available model for extraction.
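Before a PDF extraction run it can help to confirm which of the keys above are actually set. The sketch below is a hypothetical pre-flight check using only the environment variable names listed in this section; it does not reproduce how the registry chooses an extraction model.

```python
import os
from typing import Dict, List, Mapping, Optional

# Environment variable names from the section above.
PROVIDER_KEYS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List providers whose API key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    print("PDF extraction keys found for:", ", ".join(found) or "none (HTML extraction only)")
```

Passing a plain dict as env keeps the check testable without touching the real process environment.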

Automation

GitHub Actions Workflow

The registry updates automatically via GitHub Actions:

# .github/workflows/update-registry.yml
name: Update Registry
on:
  schedule:
    - cron: '0 6 * * *'  # Daily at 6 AM UTC
  workflow_dispatch:

permissions:
  contents: write  # lets the final step push with the default GITHUB_TOKEN

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - run: pip install uv
      - run: uv sync
      - run: uv run registry fetch-html --provider all
      - run: uv run registry extract-comprehensive --provider all
      - run: |
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          git add models/
          git commit -m "Update model registry $(date +%Y-%m-%d)" || true
          git push

Development

Project Structure

registry/
├── src/registry/
│   ├── __main__.py               # CLI entry point
│   ├── extract_comprehensive.py  # Dual-source extraction
│   ├── extract_from_html.py      # HTML regex patterns
│   ├── extraction/
│   │   ├── pdf_parser.py         # LLMRing-based PDF extraction
│   │   └── model_curator.py      # Model selection logic
│   └── fetch_html.py             # Web scraping
├── models/                       # Output JSON files
├── pdfs/                         # Cached PDF documentation
└── html_cache/                   # Cached HTML pages

Adding a New Provider

  1. Add URL mappings in fetch_html.py:
PROVIDER_URLS = {
    "newprovider": {
        "pricing": "https://newprovider.com/pricing",
        "models": "https://newprovider.com/docs/models"
    }
}
  2. Add extraction patterns in extract_from_html.py:
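from typing import Any, Dict, List

def extract_newprovider_models(html: str) -> List[Dict[str, Any]]:
    # Add regex patterns matching the provider's HTML structure and
    # return one dict per model found (empty until patterns are added).
    return []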
  3. Test extraction:
uv run registry fetch-html --provider newprovider
uv run registry extract-comprehensive --provider newprovider
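To make the extraction-pattern step concrete, here is a hypothetical filled-in extractor. It assumes the provider publishes a pricing table with one row per model in the form <tr><td>name</td><td>$input</td><td>$output</td></tr>, with prices already quoted per million tokens; real pages differ, so the regex is illustrative only. The output uses field names from the Model Schema section.

```python
import re
from typing import Any, Dict, List

# Hypothetical markup: <tr><td>example-small</td><td>$0.20</td><td>$0.80</td></tr>
ROW = re.compile(
    r"<tr>\s*<td>\s*(?P<name>[\w.\-]+)\s*</td>\s*"
    r"<td>\s*\$(?P<inp>\d+(?:\.\d+)?)\s*</td>\s*"
    r"<td>\s*\$(?P<out>\d+(?:\.\d+)?)\s*</td>\s*</tr>"
)


def extract_newprovider_models(html: str) -> List[Dict[str, Any]]:
    """Return one partial model record per matching pricing-table row."""
    models = []
    for m in ROW.finditer(html):
        models.append({
            "provider": "newprovider",
            "model_name": m.group("name"),
            # Assumes the table already quotes prices per million tokens.
            "dollars_per_million_tokens_input": float(m.group("inp")),
            "dollars_per_million_tokens_output": float(m.group("out")),
        })
    return models
```

Because regex extraction breaks silently when a page layout changes, a test that feeds a saved HTML snippet from html_cache/ through the extractor is a cheap guard.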

Testing

# Run tests
uv run pytest

# Test extraction for a specific provider
uv run registry extract-comprehensive --provider openai --interactive

# Validate output
uv run registry validate --models-dir models

Integration with LLMRing

The registry serves as the data source for the entire LLMRing ecosystem:

  1. Static Hosting: JSON files are served via GitHub Pages
  2. Registry URL: https://llmring.github.io/registry/
  3. Manifest: Contains version info and provider index
  4. Updates: Daily via GitHub Actions
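Since the files are static JSON on GitHub Pages, any HTTP client can read them without an API key. The sketch below uses only the standard library. The path layout <base>/<provider>/models.json is an assumption taken from the promote step (which replaces openai/models.json); the architecture tree shows models/openai.json instead, so check manifest.json for the actual paths. fetch_provider_models is a hypothetical helper, not part of the llmring library.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "https://llmring.github.io/registry"


def fetch_provider_models(
    provider: str,
    base_url: str = BASE_URL,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> Dict[str, Any]:
    """Download and parse <base_url>/<provider>/models.json (assumed layout)."""
    url = f"{base_url.rstrip('/')}/{provider}/models.json"
    with opener(url, timeout=10) as resp:
        return json.load(resp)
```

A call such as fetch_provider_models("openai")["models"] would return the dictionary described in the Model Schema section if the assumed path is correct. The opener parameter exists so the function can be exercised offline with a stub.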

Client usage:

from llmring import LLMRing

# Automatically fetches latest registry
ring = LLMRing()

# Get available models
models = ring.get_available_models()

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Priority Areas

  • Add more providers (Cohere, AI21, etc.)
  • Improve extraction patterns for better accuracy
  • Add support for embedding models
  • Enhance capability detection

License

MIT License - see LICENSE for details.

Built with โค๏ธ by the LLMRing team


Download files


Source Distribution

llmring_registry-0.1.0.tar.gz (34.4 kB)

Uploaded Source

Built Distribution


llmring_registry-0.1.0-py3-none-any.whl (33.8 kB)

Uploaded Python 3

File details

Details for the file llmring_registry-0.1.0.tar.gz.

File metadata

  • Download URL: llmring_registry-0.1.0.tar.gz
  • Upload date:
  • Size: 34.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.4

File hashes

Hashes for llmring_registry-0.1.0.tar.gz
Algorithm Hash digest
SHA256 0147fd93084d7427fe7d2ed54b1206d2d8e2d128170e4063bd5433d6a901d0ec
MD5 a0b1d63272b7f50993b3b111a42d8ab7
BLAKE2b-256 2d2feb0aa767251e70d404a895c8229fd7c32dc5ec159b34155ab1ed7751115b


File details

Details for the file llmring_registry-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for llmring_registry-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fd0c456b7fb0b22e3a08f8764b58e725f5aeebbc354b3bc2b478015ff25190d3
MD5 72c0cc45470e655f5d16094c1f71f9e5
BLAKE2b-256 962135448c7b8ecb618e2ebbc60dc7902340d7dc04592811634dfa7083bfa737

