Official Python SDK for the Refyne API - LLM-powered web extraction

Refyne SDK for Python

Official Python SDK for the Refyne API - LLM-powered web extraction that transforms unstructured websites into clean, typed data.

API Endpoint: https://api.refyne.uk | Documentation: refyne.uk/docs

Features

  • Async-First: Built on httpx for async/await support
  • Type-Safe: Full type hints and dataclasses
  • Smart Caching: Respects Cache-Control headers automatically
  • Auto-Retry: Handles rate limits and transient errors with exponential backoff
  • SOLID Design: Dependency injection for loggers, HTTP clients, and caches
  • API Version Compatibility: Warns about breaking changes
  • Python 3.9+: Supports Python 3.9 through 3.13
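The Auto-Retry bullet mentions exponential backoff. As a rough illustration of the idea (not the SDK's actual retry code, whose base delay, cap, and jitter strategy are not documented here), the delays between attempts can be computed like this. The function name and all default values are assumptions made for the example.

```python
import random


def backoff_delays(max_retries: int = 3, base: float = 0.5,
                   cap: float = 30.0, jitter: bool = False) -> list:
    """Return the wait time (seconds) before each retry attempt.

    Hypothetical helper: the delay doubles on every attempt and is
    clamped to `cap`. With jitter enabled, each delay is replaced by a
    random value between 0 and the computed delay.
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays


print(backoff_delays(4))  # [0.5, 1.0, 2.0, 4.0]
```

Jitter is commonly added so that many clients hitting the same rate limit do not all retry at the same instant.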

Installation

pip install refyne

Quick Start

import asyncio
from refyne import Refyne

async def main():
    # Create client
    client = Refyne(api_key="your_api_key")

    # Extract structured data from a web page
    result = await client.extract(
        url="https://example.com/product/123",
        schema={
            "name": {"type": "string", "description": "Product name"},
            "price": {"type": "number", "description": "Price in USD"},
            "in_stock": {"type": "boolean"},
        },
    )

    print(result.data)
    # {"name": "Example Product", "price": 29.99, "in_stock": True}

    # Don't forget to close the client
    await client.close()

asyncio.run(main())

Using Context Manager

async with Refyne(api_key="your_api_key") as client:
    result = await client.extract(url=url, schema=schema)
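The context manager form presumably closes the client for you, even when a request raises. The sketch below shows that general pattern with a stand-in class; `FakeClient` is hypothetical and only mirrors the `close()` method shown in Quick Start, so it runs without the SDK or a network connection.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an async client; not the Refyne class."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True

    async def __aenter__(self) -> "FakeClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and on exceptions, like a try/finally.
        await self.close()


async def demo() -> FakeClient:
    client = FakeClient()
    try:
        async with client:
            raise RuntimeError("simulated request failure")
    except RuntimeError:
        pass
    return client


client = asyncio.run(demo())
print(client.closed)  # True
```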

Crawl Jobs

Extract data from multiple pages:

import asyncio  # needed for asyncio.sleep in the polling loop

from refyne import Refyne, JobStatus

async with Refyne(api_key="your_api_key") as client:
    # Start a crawl job
    job = await client.crawl(
        url="https://example.com/products",
        schema={"name": "string", "price": "number"},
        options={
            "followSelector": "a.product-link",
            "maxPages": 20,
            "delay": "1s",
        },
    )

    print(f"Job started: {job.job_id}")

    # Poll for completion
    status = await client.jobs.get(job.job_id)
    while status.status in (JobStatus.PENDING, JobStatus.RUNNING):
        await asyncio.sleep(2)
        status = await client.jobs.get(job.job_id)
        print(f"Progress: {status.page_count} pages")

    # Get results
    results = await client.jobs.get_results(job.job_id)
    print(f"Extracted {results.page_count} pages")
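The polling loop above runs until the job leaves the pending/running states, with no upper bound on how long it waits. A minimal sketch of a reusable helper with a timeout follows; `poll_until` and its parameters are hypothetical names for this example, and the demo uses plain strings instead of real job objects so it runs offline.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def poll_until(
    fetch: Callable[[], Awaitable[T]],
    is_done: Callable[[T], bool],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> T:
    """Call `fetch` every `interval` seconds until `is_done` accepts the result.

    Raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = await fetch()
        if is_done(value):
            return value
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"not finished after {timeout} seconds")
        await asyncio.sleep(interval)


async def _demo() -> str:
    # Simulated status sequence standing in for repeated job lookups.
    statuses = iter(["pending", "running", "completed"])

    async def fetch() -> str:
        return next(statuses)

    return await poll_until(
        fetch, lambda s: s not in ("pending", "running"), interval=0.01
    )


final_status = asyncio.run(_demo())
print(final_status)  # completed
```

With the SDK, `fetch` could be `lambda: client.jobs.get(job.job_id)` and `is_done` could be `lambda s: s.status not in (JobStatus.PENDING, JobStatus.RUNNING)`.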

Configuration

from refyne import Refyne

client = Refyne(
    api_key="your_api_key",
    base_url="https://api.refyne.uk",  # Override API URL
    timeout=60.0,                       # Request timeout (seconds)
    max_retries=3,                      # Retry attempts
    logger=my_logger,                   # Custom logger
    cache=my_cache,                     # Custom cache
    cache_enabled=True,                 # Enable/disable caching
    user_agent_suffix="MyApp/1.0",     # Custom User-Agent
    verify_ssl=True,                    # SSL verification
)
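In practice you may not want the API key hard-coded. One possible approach is to build the constructor arguments from environment variables. The variable names `REFYNE_API_KEY`, `REFYNE_BASE_URL`, and `REFYNE_TIMEOUT` are assumptions for this sketch; nothing here indicates that the SDK reads them itself.

```python
import os


def client_kwargs_from_env(env=None) -> dict:
    """Build keyword arguments for Refyne(...) from environment variables.

    Hypothetical helper; the variable names are illustrative only.
    """
    env = os.environ if env is None else env
    api_key = env.get("REFYNE_API_KEY")
    if not api_key:
        raise RuntimeError("REFYNE_API_KEY is not set")
    kwargs = {"api_key": api_key}
    if "REFYNE_BASE_URL" in env:
        kwargs["base_url"] = env["REFYNE_BASE_URL"]
    if "REFYNE_TIMEOUT" in env:
        kwargs["timeout"] = float(env["REFYNE_TIMEOUT"])
    return kwargs


# A plain dict is passed here so the example does not depend on your shell.
example = client_kwargs_from_env(
    {"REFYNE_API_KEY": "your_api_key", "REFYNE_TIMEOUT": "30"}
)
print(example)  # {'api_key': 'your_api_key', 'timeout': 30.0}
```

The result can then be passed as `Refyne(**client_kwargs_from_env())`.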

Custom Logger

Inject your own logger:

from __future__ import annotations  # lets `dict | None` annotations work on Python 3.9

from refyne import Logger, Refyne

class MyLogger:
    def debug(self, msg: str, meta: dict | None = None) -> None:
        print(f"[DEBUG] {msg}")

    def info(self, msg: str, meta: dict | None = None) -> None:
        print(f"[INFO] {msg}")

    def warn(self, msg: str, meta: dict | None = None) -> None:
        print(f"[WARN] {msg}")

    def error(self, msg: str, meta: dict | None = None) -> None:
        print(f"[ERROR] {msg}")

client = Refyne(api_key="...", logger=MyLogger())
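If your application already uses the standard `logging` module, a thin adapter can expose the same four methods shown above. This is a sketch that assumes the SDK only calls `debug`, `info`, `warn`, and `error` with a message and an optional `meta` dict, as in the example; `StdlibLoggerAdapter` is a hypothetical name.

```python
import logging
from typing import Optional


class StdlibLoggerAdapter:
    """Forward debug/info/warn/error(msg, meta) calls to a logging.Logger."""

    def __init__(self, logger: logging.Logger) -> None:
        self._logger = logger

    def _log(self, level: int, msg: str, meta: Optional[dict]) -> None:
        # The meta dict is attached to the record as `record.meta`.
        self._logger.log(level, msg, extra={"meta": meta or {}})

    def debug(self, msg: str, meta: Optional[dict] = None) -> None:
        self._log(logging.DEBUG, msg, meta)

    def info(self, msg: str, meta: Optional[dict] = None) -> None:
        self._log(logging.INFO, msg, meta)

    def warn(self, msg: str, meta: Optional[dict] = None) -> None:
        self._log(logging.WARNING, msg, meta)

    def error(self, msg: str, meta: Optional[dict] = None) -> None:
        self._log(logging.ERROR, msg, meta)


class _ListHandler(logging.Handler):
    """Collects records in memory so the demo can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = _ListHandler()
base = logging.getLogger("refyne-demo")
base.setLevel(logging.DEBUG)
base.addHandler(handler)

adapter = StdlibLoggerAdapter(base)
adapter.warn("rate limited", {"retry_after": 2})
print(handler.records[0].levelname, handler.records[0].meta)
```

An instance could then be passed as `Refyne(api_key="...", logger=StdlibLoggerAdapter(base))`.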

Custom Cache

The SDK respects Cache-Control headers. Provide a custom cache:

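from __future__ import annotations  # lets `CacheEntry | None` annotations work on Python 3.9

from refyne import Cache, CacheEntry, Refyne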

class RedisCache:
    async def get(self, key: str) -> CacheEntry | None:
        # Fetch from Redis
        ...

    async def set(self, key: str, entry: CacheEntry) -> None:
        # Store in Redis with TTL from entry.expires_at
        ...

    async def delete(self, key: str) -> None:
        # Delete from Redis
        ...

client = Refyne(api_key="...", cache=RedisCache())
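For local testing, the same three-method shape can be backed by a dictionary. The sketch below uses its own `DemoEntry` dataclass rather than the SDK's `CacheEntry`, because the only field confirmed above is `expires_at`; treating it as a Unix timestamp in seconds is an assumption.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DemoEntry:
    """Hypothetical cache entry; only `expires_at` mirrors the README."""
    value: Any
    expires_at: float  # assumed: Unix timestamp in seconds


class InMemoryCache:
    """Dictionary-backed cache with get/set/delete and lazy expiry."""

    def __init__(self) -> None:
        self._store: Dict[str, DemoEntry] = {}

    async def get(self, key: str) -> Optional[DemoEntry]:
        entry = self._store.get(key)
        if entry is None:
            return None
        if entry.expires_at <= time.time():
            # Expired entries are dropped when they are next read.
            del self._store[key]
            return None
        return entry

    async def set(self, key: str, entry: DemoEntry) -> None:
        self._store[key] = entry

    async def delete(self, key: str) -> None:
        self._store.pop(key, None)


async def _demo():
    cache = InMemoryCache()
    await cache.set("fresh", DemoEntry("a", time.time() + 60))
    await cache.set("stale", DemoEntry("b", time.time() - 1))
    return await cache.get("fresh"), await cache.get("stale")


fresh, stale = asyncio.run(_demo())
print(fresh.value, stale)  # a None
```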

BYOK (Bring Your Own Key)

Use your own LLM provider API keys:

# Configure your OpenAI key
await client.llm.upsert_key(
    provider="openai",
    api_key="sk-...",
    default_model="gpt-4o",
)

# Set fallback chain
await client.llm.set_chain([
    {"provider": "openai", "model": "gpt-4o"},
    {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"},
    {"provider": "credits", "model": "default"},
])

# Extract using your keys
result = await client.extract(
    url="https://example.com/product",
    schema={"title": "string"},
    llm_config={
        "provider": "openai",
        "model": "gpt-4o-mini",
    },
)

print(f"Used BYOK: {result.usage.is_byok}")
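How Refyne resolves the fallback chain on the server is not described here. As a general illustration of what a fallback chain usually means, the sketch below tries each entry in order and returns the first success. `run_with_fallback` and `fake_call` are hypothetical and do not call any LLM provider.

```python
from typing import Callable, Dict, List


def run_with_fallback(
    chain: List[Dict[str, str]],
    call: Callable[[Dict[str, str]], str],
) -> str:
    """Try each chain entry in order; return the first successful result."""
    errors = []
    for entry in chain:
        try:
            return call(entry)
        except Exception as exc:
            errors.append(f"{entry['provider']}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


chain = [
    {"provider": "openai", "model": "gpt-4o"},
    {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"},
    {"provider": "credits", "model": "default"},
]


def fake_call(entry: Dict[str, str]) -> str:
    # Simulate the first provider being unavailable.
    if entry["provider"] == "openai":
        raise ConnectionError("simulated outage")
    return f"handled by {entry['provider']}"


print(run_with_fallback(chain, fake_call))  # handled by anthropic
```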

Error Handling

from refyne import (
    RefyneError,
    RateLimitError,
    ValidationError,
    AuthenticationError,
)

try:
    await client.extract(url=url, schema=schema)
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except ValidationError as e:
    print(f"Validation errors: {e.errors}")
except AuthenticationError:
    print("Invalid API key")
except RefyneError as e:
    print(f"API error: {e.message} ({e.status})")
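The SDK already retries rate-limited requests, but once its retries are exhausted you may still want an application-level retry that honours `retry_after`. The sketch below shows one way to do that. It defines a local stand-in `RateLimitError` so it runs without the SDK installed; `call_with_rate_limit_retry` is a hypothetical helper.

```python
import asyncio


class RateLimitError(Exception):
    """Local stand-in with a `retry_after` attribute, as used above."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


async def call_with_rate_limit_retry(operation, max_attempts: int = 3):
    """Await `operation()`; on RateLimitError, sleep for retry_after and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except RateLimitError as exc:
            if attempt == max_attempts:
                raise
            await asyncio.sleep(exc.retry_after)


calls = {"count": 0}


async def flaky_operation() -> str:
    # Fails twice with a rate limit, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError(retry_after=0.01)
    return "ok"


outcome = asyncio.run(call_with_rate_limit_retry(flaky_operation))
print(outcome, calls["count"])  # ok 3
```

With the real client, `operation` could be `lambda: client.extract(url=url, schema=schema)`.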

API Reference

Main Client

| Method | Description |
| --- | --- |
| client.extract(url, schema) | Extract data from a single page |
| client.crawl(url, schema, options) | Start an async crawl job |
| client.analyze(url, depth) | Analyze a site and suggest schema |
| client.get_usage() | Get usage statistics |

Sub-Clients

| Client | Methods |
| --- | --- |
| client.jobs | list(), get(id), get_results(id) |
| client.schemas | list(), get(id), create(), update(), delete() |
| client.sites | list(), get(id), create(), update(), delete() |
| client.keys | list(), create(), revoke(id) |
| client.llm | list_providers(), list_keys(), upsert_key(), get_chain(), set_chain() |

Documentation

Full documentation is available at refyne.uk/docs.

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linter
ruff check src tests

# Run type checker
mypy src

Testing with Demo Site

A demo site is available at demo.refyne.uk for testing SDK functionality. The site contains realistic data across multiple content types:

| Endpoint | Content Type | Example Use Case |
| --- | --- | --- |
| https://demo.refyne.uk/products | Product catalog | Extract prices, descriptions, ratings |
| https://demo.refyne.uk/jobs | Job listings | Extract salaries, requirements, companies |
| https://demo.refyne.uk/blog | Blog posts | Extract articles, authors, tags |
| https://demo.refyne.uk/news | News articles | Extract headlines, sources, timestamps |

Example:

result = await client.extract(
    url="https://demo.refyne.uk/products/1",
    schema={
        "name": "string",
        "price": "number",
        "description": "string",
        "brand": "string",
        "rating": "number",
    },
)
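When testing against the demo site, it can be useful to check that the returned data matches the shorthand schema you sent. The sketch below is a hypothetical checker for the three type names used in this README (`string`, `number`, `boolean`). It works on plain dicts, and the sample data is made up for the example rather than taken from the demo site.

```python
TYPE_MAP = {"string": str, "number": (int, float), "boolean": bool}


def check_types(data: dict, schema: dict) -> list:
    """Return a list of problems found when comparing data to the schema."""
    problems = []
    for field, type_name in schema.items():
        if field not in data:
            problems.append(f"{field}: missing")
            continue
        value = data[field]
        expected = TYPE_MAP[type_name]
        # bool is a subclass of int, so reject it explicitly for numbers.
        is_bool_as_number = isinstance(value, bool) and type_name == "number"
        if is_bool_as_number or not isinstance(value, expected):
            problems.append(
                f"{field}: expected {type_name}, got {type(value).__name__}"
            )
    return problems


sample = {"name": "Widget", "price": "9.99"}
schema = {"name": "string", "price": "number", "rating": "number"}
print(check_types(sample, schema))
# ['price: expected number, got str', 'rating: missing']
```

With a real response, `check_types(result.data, schema)` would apply the same check, assuming `result.data` is a dict as in Quick Start.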

License

MIT License - see LICENSE for details.
