LLMRing public model registry CLI and tools (manual curation workflow)

LLMRing Registry

⚠️ Pre-release notice

The pricing, token limits, and capabilities in this registry are under active validation and may be inaccurate. Do not rely on these numbers for production decisions. Always verify against the providers' official documentation.

Complies with source-of-truth v3.5

The official model registry for LLMRing, providing pricing, capabilities, and metadata for major LLM providers (currently OpenAI, Anthropic, and Google).

Overview

The LLMRing Registry is the source of truth for model information across the LLMRing ecosystem. Model data is drafted from provider documentation, validated by a human reviewer, and served through GitHub Pages for free, global access.

Key Features:

📝 Human-validated curation, with tooling that drafts updates from provider documentation
🔍 Dual extraction approach (HTML + PDF) to cross-check values
📦 Versioned JSON files with historical snapshots
🌐 Served via GitHub Pages at https://llmring.github.io/registry/
🔓 No API keys required for access

Architecture

Registry (This Repo)
├── Extraction Pipeline
│   ├── HTML Scraping (BeautifulSoup + Regex)
│   └── PDF Analysis (via LLMRing's unified interface)
│       ├── OpenAI: Assistants API
│       ├── Anthropic: Direct PDF support
│       └── Google: Direct PDF support
└── Output
    ├── models/
    │   ├── openai.json
    │   ├── anthropic.json
    │   └── google.json
    └── manifest.json

Quick Start

Installation

# Clone the repository
git clone https://github.com/llmring/registry.git
cd registry

# Install with uv (recommended)
uv sync

# Or with pip
pip install -e .

Manual Curation Workflow (Human-validated)

Gather source materials (optional but recommended):

# Show where to get docs and how to save PDFs
uv run registry sources

# Fetch pricing/docs HTML (lightweight)
uv run registry fetch-html --provider openai --output-dir html_cache

# Generate PDFs with a headless browser
# (Playwright is installed via dependencies; install browsers once per machine)
uv run playwright install chromium
uv run registry fetch --provider openai --output-dir pdfs

Generate a draft JSON using the extractor (best-effort), or by hand:

# Best-effort draft generation from PDFs (writes to drafts/ only)
uv run registry extract --provider openai --pdfs-dir pdfs

Review differences against the current curated file:

uv run registry review-draft --provider openai --draft drafts/openai.2025-08-20.json
# Inspect the generated .diff.json report

# Optionally accept all to create a reviewed file
uv run registry review-draft --provider openai --draft drafts/openai.2025-08-20.json --accept-all
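The exact layout of the .diff.json report written by review-draft is not documented here. As a rough mental model, a field-level comparison between the curated file and a draft might look like the following minimal sketch. It assumes both files follow the Model Schema shown later in this README; diff_models and its report shape are hypothetical, not the CLI's implementation.

```python
from typing import Any, Dict


def diff_models(current: Dict[str, Any], draft: Dict[str, Any]) -> Dict[str, Any]:
    """Field-level comparison of two provider files (hypothetical helper)."""
    cur = current.get("models", {})
    new = draft.get("models", {})
    report: Dict[str, Any] = {
        "added": sorted(set(new) - set(cur)),    # models only in the draft
        "removed": sorted(set(cur) - set(new)),  # models only in the curated file
        "changed": {},                           # model key -> {field: {old, new}}
    }
    for key in sorted(set(cur) & set(new)):
        fields = set(cur[key]) | set(new[key])
        changes = {
            field: {"old": cur[key].get(field), "new": new[key].get(field)}
            for field in sorted(fields)
            if cur[key].get(field) != new[key].get(field)
        }
        if changes:
            report["changed"][key] = changes
    return report
```

Keeping "added", "removed", and "changed" separate lets a reviewer focus on price or limit changes to existing models, which are the easiest to get wrong.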

Promote the reviewed file to publish and archive:

uv run registry promote --provider openai --reviewed drafts/openai.reviewed.json

This will:

- Bump version and updated_at
- Write openai/v/<version>/models.json
- Replace openai/models.json
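The three promote steps can be sketched in Python to make the file layout concrete. This is a minimal illustration, not the CLI's code. It assumes that the version increments by one from the currently published file, that updated_at uses the UTC format from the schema example, and that registry_root is the directory containing openai/. The promote function here is hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def promote(reviewed_path: Path, registry_root: Path, provider: str) -> Path:
    """Hypothetical sketch: bump version, archive a snapshot, replace current file."""
    reviewed = json.loads(reviewed_path.read_text())
    current_path = registry_root / provider / "models.json"

    # Assumption: the new version is the published version plus one.
    current_version = 0
    if current_path.exists():
        current_version = json.loads(current_path.read_text()).get("version", 0)
    reviewed["version"] = current_version + 1
    reviewed["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Write the immutable snapshot first, then overwrite the "latest" file.
    archive_path = registry_root / provider / "v" / str(reviewed["version"]) / "models.json"
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(reviewed, indent=2) + "\n"
    archive_path.write_text(text)
    current_path.write_text(text)
    return archive_path
```

Writing the versioned snapshot before replacing the current file means a failure partway through never leaves a published version without its archive copy.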

Legacy Automation (deprecated)

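The fully automated pipeline (running fetch, fetch-html, and extract* unattended and committing the results directly) is deprecated. These commands remain available for gathering sources and drafting, but the official process is manual, human-validated curation.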

# View available commands
uv run registry --help

# Fetch latest documentation
uv run registry fetch-html --provider all

# Extract models with comprehensive dual-source validation
uv run registry extract-comprehensive --provider all

# List all extracted models
uv run registry list

# Export to markdown for documentation
uv run registry export --output markdown > models.md

Extraction System

The registry uses a dual extraction approach so that each value can be cross-checked against a second source:

1. HTML Extraction

Fast regex-based extraction from provider websites
Captures current pricing and basic model information
No API keys required

2. PDF Extraction

Uses LLMRing's unified interface (requires API keys)
Extracts detailed capabilities and specifications
Automatically uses optimal method per provider:
- OpenAI: Assistants API for PDF processing
- Anthropic: Native PDF support with Claude
- Google: Direct PDF support with Gemini

3. Validation & Consensus

Compares both sources for each field
Marks confidence levels:
- Certain: Both sources agree
- Probable: Single source only
- Uncertain: Sources conflict
Interactive mode available for manual resolution
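The confidence rules above can be expressed as a small per-field function. This is a minimal sketch, not the registry's consensus code. The names Confidence, reconcile, and reconcile_model are hypothetical, and keeping the HTML value on a conflict is an arbitrary placeholder choice; in practice conflicting fields go to manual resolution.

```python
from enum import Enum
from typing import Any, Dict, Optional, Tuple


class Confidence(Enum):
    CERTAIN = "certain"      # both sources agree
    PROBABLE = "probable"    # only one source reported a value
    UNCERTAIN = "uncertain"  # sources disagree


def reconcile(html_value: Optional[Any], pdf_value: Optional[Any]) -> Tuple[Optional[Any], Optional[Confidence]]:
    """Pick a value for one field and label its confidence (sketch)."""
    if html_value is None and pdf_value is None:
        return None, None
    if html_value is None or pdf_value is None:
        return (pdf_value if html_value is None else html_value), Confidence.PROBABLE
    if html_value == pdf_value:
        return html_value, Confidence.CERTAIN
    # Assumption: keep the HTML value as a placeholder and flag it for review.
    return html_value, Confidence.UNCERTAIN


def reconcile_model(html: Dict[str, Any], pdf: Dict[str, Any]) -> Dict[str, Tuple[Optional[Any], Optional[Confidence]]]:
    """Apply reconcile() to every field reported by either source."""
    return {field: reconcile(html.get(field), pdf.get(field)) for field in sorted(set(html) | set(pdf))}
```

Fields labelled UNCERTAIN are the natural candidates to surface in interactive mode.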

Model Schema

Each provider's JSON file stores its models as a dictionary keyed by "provider:model_name" (not as a list), with this structure:

{
  "provider": "openai",
  "version": 2,
  "updated_at": "2025-08-20T00:00:00Z",
  "models": {
    "openai:gpt-4o-mini": {
      "provider": "openai",
      "model_name": "gpt-4o-mini",
      "display_name": "GPT-4o mini",
      "max_input_tokens": 128000,
      "max_output_tokens": 16384,
      "dollars_per_million_tokens_input": 0.15,
      "dollars_per_million_tokens_output": 0.60,
      "supports_vision": true,
      "supports_function_calling": true,
      "supports_json_mode": true,
      "supports_parallel_tool_calls": true,
      "is_active": true
    }
  }
}
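Because prices are stored per million tokens, the cost of a request is (input_tokens × input price + output_tokens × output price) / 1,000,000. A minimal sketch of reading a provider file and applying that formula follows; load_models and estimate_cost are illustrative helpers, not part of the registry CLI, and the prices are copied from the example entry above (which, per the pre-release notice, may be inaccurate).

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_models(path: Path) -> Dict[str, Dict[str, Any]]:
    """Return the "models" dict of a provider file, keyed by "provider:model_name"."""
    return json.loads(path.read_text())["models"]


def estimate_cost(model: Dict[str, Any], input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request; registry prices are per million tokens."""
    return (
        input_tokens * model["dollars_per_million_tokens_input"]
        + output_tokens * model["dollars_per_million_tokens_output"]
    ) / 1_000_000


# Prices copied from the example "openai:gpt-4o-mini" entry.
gpt_4o_mini = {
    "dollars_per_million_tokens_input": 0.15,
    "dollars_per_million_tokens_output": 0.60,
}
cost = estimate_cost(gpt_4o_mini, input_tokens=10_000, output_tokens=2_000)
```

Worked through by hand: 10,000 × $0.15 / 1M = $0.0015 and 2,000 × $0.60 / 1M = $0.0012, so the request costs about $0.0027.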

Commands Reference

Fetching Documentation

# Fetch HTML pages (no browser required)
uv run registry fetch-html --provider openai

# Fetch as PDFs (requires Playwright)
uv run registry fetch --provider all

Extraction

# Extract from HTML only
uv run registry extract-html --provider all

# Extract from PDFs only (requires LLM API keys)
uv run registry extract --provider all

# Comprehensive extraction (HTML + PDF, cross-checked)
uv run registry extract-comprehensive --provider all

# Interactive mode for conflict resolution
uv run registry extract-comprehensive --provider all --interactive

Data Management

# List all models with pricing
uv run registry list

# Validate JSON structure
uv run registry validate

# Export for documentation
uv run registry export --output markdown
uv run registry export --output json
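The checks performed by validate are not spelled out here. A minimal sketch of the kind of structural check that fits the Model Schema might look like this; check_provider_file is hypothetical, and treating the fields in REQUIRED_FIELDS as mandatory is an assumption.

```python
from typing import Any, Dict, List

# Assumption: which fields are mandatory is illustrative, not the registry's rule.
REQUIRED_FIELDS = (
    "provider",
    "model_name",
    "max_input_tokens",
    "dollars_per_million_tokens_input",
    "dollars_per_million_tokens_output",
)


def check_provider_file(doc: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the file looks valid."""
    models = doc.get("models")
    if not isinstance(models, dict):
        return ["'models' must be a dictionary keyed by 'provider:model_name'"]
    errors: List[str] = []
    for key, entry in models.items():
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            errors.append(f"{key}: missing {', '.join(missing)}")
        # Keys should be exactly "<provider>:<model_name>".
        expected = f"{entry.get('provider')}:{entry.get('model_name')}"
        if key != expected:
            errors.append(f"{key}: key does not match {expected!r}")
        if entry.get("provider") != doc.get("provider"):
            errors.append(f"{key}: provider differs from file-level provider")
    return errors
```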

Environment Variables

Required for PDF extraction only (HTML extraction needs no API keys):

# Choose one or more providers
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."

The extractor automatically selects the best available model from the providers whose keys are set.
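Before a PDF run, it can help to confirm which keys are actually set. The sketch below is a hypothetical preflight check, not the extractor's selection logic; the preference order simply mirrors the order of the exports above and is an arbitrary assumption.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Assumption: arbitrary preference order, matching the exports listed above.
PROVIDER_KEYS = (
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("google", "GOOGLE_API_KEY"),
)


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Providers whose API key variable is set to a non-empty value."""
    return [name for name, var in PROVIDER_KEYS if env.get(var)]


def pick_extraction_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """First configured provider, or None if PDF extraction cannot run."""
    available = configured_providers(env)
    return available[0] if available else None
```

Passing a plain dict as env makes the check easy to test without touching the real environment.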

Automation

GitHub Actions Workflow

The legacy automated pipeline runs through GitHub Actions (see Legacy Automation; it is not the official publishing path):

# .github/workflows/update-registry.yml
name: Update Registry
on:
  schedule:
    - cron: '0 6 * * *'  # Daily at 6 AM UTC
  workflow_dispatch:

permissions:
  contents: write  # let GITHUB_TOKEN push the updated JSON back to the repo

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - run: pip install uv
      - run: uv sync
      - run: uv run registry fetch-html --provider all
      - run: uv run registry extract-comprehensive --provider all
      - run: |
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          git add models/
          git commit -m "Update model registry $(date +%Y-%m-%d)" || true
          git push

Development

Project Structure

registry/
├── src/registry/
│   ├── __main__.py           # CLI entry point
│   ├── extract_comprehensive.py  # Dual-source extraction
│   ├── extract_from_html.py  # HTML regex patterns
│   ├── extraction/
│   │   ├── pdf_parser.py     # LLMRing-based PDF extraction
│   │   └── model_curator.py  # Model selection logic
│   └── fetch_html.py         # Web scraping
├── models/                   # Output JSON files
├── pdfs/                     # Cached PDF documentation
└── html_cache/               # Cached HTML pages

Adding a New Provider

Add URL mappings in fetch_html.py:

PROVIDER_URLS = {
    "newprovider": {
        "pricing": "https://newprovider.com/pricing",
        "models": "https://newprovider.com/docs/models"
    }
}

Add extraction patterns in extract_from_html.py:

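from typing import Any, Dict, List

def extract_newprovider_models(html: str) -> List[Dict[str, Any]]:
    models: List[Dict[str, Any]] = []
    # Add regex patterns for the provider's HTML structure, appending one dict per model
    return models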
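As an illustration of what a filled-in extractor could look like, here is a minimal sketch against hypothetical markup: one table row per model with name, input price, and output price in USD per million tokens. The HTML shape and model names are invented for the example; a real provider page will need its own patterns, and the output keys follow the Model Schema.

```python
import re
from typing import Any, Dict, List

# Hypothetical markup: <tr><td>name</td><td>$in</td><td>$out</td></tr>
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>(?P<name>[\w.\-]+)</td>\s*"
    r"<td>\$(?P<input>\d+(?:\.\d+)?)</td>\s*"
    r"<td>\$(?P<output>\d+(?:\.\d+)?)</td>\s*</tr>"
)


def extract_newprovider_models(html: str) -> List[Dict[str, Any]]:
    """Turn each matching pricing row into a partial model entry."""
    models: List[Dict[str, Any]] = []
    for match in ROW_PATTERN.finditer(html):
        models.append({
            "provider": "newprovider",
            "model_name": match["name"],
            "dollars_per_million_tokens_input": float(match["input"]),
            "dollars_per_million_tokens_output": float(match["output"]),
        })
    return models
```

Anchoring each pattern on the table cells (rather than searching for bare dollar amounts) keeps unrelated prices elsewhere on the page from being attributed to a model.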

Test extraction:

uv run registry fetch-html --provider newprovider
uv run registry extract-comprehensive --provider newprovider

Testing

# Run tests
uv run pytest

# Test extraction for a specific provider
uv run registry extract-comprehensive --provider openai --interactive

# Validate output
uv run registry validate --models-dir models

Integration with LLMRing

The registry serves as the data source for the entire LLMRing ecosystem:

Static Hosting: JSON files are served via GitHub Pages
Registry URL: https://llmring.github.io/registry/
Manifest: Contains version info and provider index
Updates: Published when reviewed files are promoted (the daily GitHub Actions run is legacy)

Client usage:

from llmring import LLMRing

# Automatically fetches latest registry
ring = LLMRing()

# Get available models
models = ring.get_available_models()
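Tools that only need the raw JSON can read it straight from GitHub Pages without the LLMRing client. A minimal standard-library sketch follows; the base URL comes from this README, but the file paths under it (for example manifest.json or openai/models.json) are assumptions, since this README shows both models/openai.json and openai/models.json layouts.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import urlopen

# Base URL from this README; paths appended to it are assumptions.
REGISTRY_URL = "https://llmring.github.io/registry"


def fetch_json(path: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Download and parse one JSON file from the hosted registry."""
    with urlopen(f"{REGISTRY_URL}/{path}", timeout=timeout) as resp:
        return json.load(resp)


def find_model(provider_doc: Dict[str, Any], provider: str, model_name: str) -> Optional[Dict[str, Any]]:
    """Look up one entry in a provider file's "models" dict."""
    return provider_doc.get("models", {}).get(f"{provider}:{model_name}")
```

Usage would be along the lines of doc = fetch_json("openai/models.json") followed by find_model(doc, "openai", "gpt-4o-mini"), adjusting the path to whatever layout the site actually serves.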

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Priority Areas

Add more providers (Cohere, AI21, etc.)
Improve extraction patterns for better accuracy
Add support for embedding models
Enhance capability detection

License

MIT License - see LICENSE for details.

This version: 0.1.0 (released Aug 20, 2025)

Download files

Source Distribution

llmring_registry-0.1.0.tar.gz (34.4 kB)

Uploaded Aug 20, 2025 Source

Built Distribution

llmring_registry-0.1.0-py3-none-any.whl (33.8 kB)

Uploaded Aug 20, 2025 Python 3

File details

Details for the file llmring_registry-0.1.0.tar.gz.

File metadata

Download URL: llmring_registry-0.1.0.tar.gz
Upload date: Aug 20, 2025
Size: 34.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.4

File hashes

Hashes for llmring_registry-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0147fd93084d7427fe7d2ed54b1206d2d8e2d128170e4063bd5433d6a901d0ec`
MD5	`a0b1d63272b7f50993b3b111a42d8ab7`
BLAKE2b-256	`2d2feb0aa767251e70d404a895c8229fd7c32dc5ec159b34155ab1ed7751115b`

File details

Details for the file llmring_registry-0.1.0-py3-none-any.whl.

File metadata

Download URL: llmring_registry-0.1.0-py3-none-any.whl
Upload date: Aug 20, 2025
Size: 33.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.4

File hashes

Hashes for llmring_registry-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fd0c456b7fb0b22e3a08f8764b58e725f5aeebbc354b3bc2b478015ff25190d3`
MD5	`72c0cc45470e655f5d16094c1f71f9e5`
BLAKE2b-256	`962135448c7b8ecb618e2ebbc60dc7902340d7dc04592811634dfa7083bfa737`
