Refyne SDK for Python

Official Python SDK for the Refyne API - LLM-powered web extraction that transforms unstructured websites into clean, typed data.

API Endpoint: https://api.refyne.uk | Documentation: refyne.uk/docs

Features

  • Async-First: Built on httpx for async/await support
  • Type-Safe: Full type hints and dataclasses
  • Smart Caching: Respects Cache-Control headers automatically
  • Auto-Retry: Handles rate limits and transient errors with exponential backoff
  • SOLID Design: Dependency injection for loggers, HTTP clients, and caches
  • API Version Compatibility: Warns about breaking changes
  • Python 3.9+: Supports Python 3.9 through 3.13

Installation

pip install refyne

Quick Start

import asyncio
from refyne import Refyne

async def main():
    # Create client
    client = Refyne(api_key="your_api_key")

    # Extract structured data from a web page
    result = await client.extract(
        url="https://example.com/product/123",
        schema={
            "name": {"type": "string", "description": "Product name"},
            "price": {"type": "number", "description": "Price in USD"},
            "in_stock": {"type": "boolean"},
        },
    )

    print(result.data)
    # {"name": "Example Product", "price": 29.99, "in_stock": True}

    # Don't forget to close the client
    await client.close()

asyncio.run(main())

Using Context Manager

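# Run inside an async function; url and schema are defined as in Quick Start.
# The client is closed automatically when the block exits.
async with Refyne(api_key="your_api_key") as client:
    result = await client.extract(url=url, schema=schema)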

Crawl Jobs

Extract data from multiple pages:

import asyncio

from refyne import Refyne, JobStatus

async with Refyne(api_key="your_api_key") as client:
    # Start a crawl job
    job = await client.crawl(
        url="https://example.com/products",
        schema={"name": "string", "price": "number"},
        options={
            "followSelector": "a.product-link",
            "maxPages": 20,
            "delay": "1s",
        },
    )

    print(f"Job started: {job.job_id}")

    # Poll for completion
    status = await client.jobs.get(job.job_id)
    while status.status in (JobStatus.PENDING, JobStatus.RUNNING):
        await asyncio.sleep(2)
        status = await client.jobs.get(job.job_id)
        print(f"Progress: {status.page_count} pages")

    # Get results
    results = await client.jobs.get_results(job.job_id)
    print(f"Extracted {results.page_count} pages")
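
The polling loop above runs until the job leaves the pending or running state, so a stuck job would keep it waiting indefinitely. The following is a minimal sketch of a bounded polling helper. It is not part of the SDK: poll_until, its parameters, and the default interval and timeout are all hypothetical names and values chosen for illustration.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def poll_until(
    fetch: Callable[[], Awaitable[T]],
    is_done: Callable[[T], bool],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> T:
    """Call fetch() every `interval` seconds until is_done(value) is true.

    Hypothetical helper, not an SDK function. Raises TimeoutError when the
    next wait would pass the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = await fetch()
        if is_done(value):
            return value
        # Give up instead of sleeping past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"not finished after {timeout} seconds")
        await asyncio.sleep(interval)
```

With the client from the example above, this could be called as poll_until(lambda: client.jobs.get(job.job_id), lambda s: s.status not in (JobStatus.PENDING, JobStatus.RUNNING)).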

Configuration

from refyne import Refyne

client = Refyne(
    api_key="your_api_key",
    base_url="https://api.refyne.uk",  # Override API URL
    timeout=60.0,                       # Request timeout (seconds)
    max_retries=3,                      # Retry attempts
    logger=my_logger,                   # Custom logger
    cache=my_cache,                     # Custom cache
    cache_enabled=True,                 # Enable/disable caching
    user_agent_suffix="MyApp/1.0",     # Custom User-Agent
    verify_ssl=True,                    # SSL verification
)
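
The max_retries setting works together with the exponential backoff mentioned under Features. The SDK's actual retry schedule is not described here, so the following only sketches the general technique. The backoff_delay helper and its base and cap values are assumptions, not SDK behavior.

```python
def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Return the wait in seconds before retry number `attempt` (0-based).

    Hypothetical helper: the delay doubles on each attempt and is clamped
    to `cap`. The base and cap values are illustrative assumptions.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(cap, base * (2 ** attempt))


# Delays for three retries with the assumed defaults.
delays = [backoff_delay(attempt) for attempt in range(3)]
print(delays)  # [0.5, 1.0, 2.0]
```

Production retry loops usually add random jitter to each delay so that many clients do not retry at the same instant.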

Custom Logger

Inject your own logger:

from __future__ import annotations  # lets "dict | None" hints work on Python 3.9

from refyne import Logger, Refyne

class MyLogger:
    def debug(self, msg: str, meta: dict | None = None) -> None:
        print(f"[DEBUG] {msg}")

    def info(self, msg: str, meta: dict | None = None) -> None:
        print(f"[INFO] {msg}")

    def warn(self, msg: str, meta: dict | None = None) -> None:
        print(f"[WARN] {msg}")

    def error(self, msg: str, meta: dict | None = None) -> None:
        print(f"[ERROR] {msg}")

client = Refyne(api_key="...", logger=MyLogger())

Custom Cache

The SDK respects Cache-Control headers. Provide a custom cache:

from __future__ import annotations  # lets "CacheEntry | None" hints work on Python 3.9

from refyne import Cache, CacheEntry, Refyne

class RedisCache:
    async def get(self, key: str) -> CacheEntry | None:
        # Fetch from Redis
        ...

    async def set(self, key: str, entry: CacheEntry) -> None:
        # Store in Redis with TTL from entry.expires_at
        ...

    async def delete(self, key: str) -> None:
        # Delete from Redis
        ...

client = Refyne(api_key="...", cache=RedisCache())
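
To show what the three cache methods have to do, here is a minimal in-memory sketch with the same get/set/delete shape. It does not import the SDK. DemoEntry is a hypothetical stand-in for CacheEntry, and treating expires_at as a Unix timestamp in seconds is an assumption, so check the real CacheEntry fields before adapting it.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class DemoEntry:
    """Hypothetical stand-in for the SDK's CacheEntry."""
    value: Any
    expires_at: float  # assumption: Unix timestamp in seconds


class InMemoryCache:
    """Dictionary-backed cache that drops entries once they expire."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._store: Dict[str, DemoEntry] = {}
        self._clock = clock  # injectable so tests can control time

    async def get(self, key: str) -> Optional[DemoEntry]:
        entry = self._store.get(key)
        if entry is None:
            return None
        if entry.expires_at <= self._clock():
            # Expired: remove it and report a miss.
            del self._store[key]
            return None
        return entry

    async def set(self, key: str, entry: DemoEntry) -> None:
        self._store[key] = entry

    async def delete(self, key: str) -> None:
        self._store.pop(key, None)


async def demo() -> None:
    cache = InMemoryCache()
    await cache.set("page", DemoEntry(value={"name": "x"}, expires_at=time.time() + 60))
    print(await cache.get("page"))


if __name__ == "__main__":
    asyncio.run(demo())
```

Passing the clock as a parameter keeps expiry logic testable without sleeping.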

BYOK (Bring Your Own Key)

Use your own LLM provider API keys:

# Configure your OpenAI key
await client.llm.upsert_key(
    provider="openai",
    api_key="sk-...",
    default_model="gpt-4o",
)

# Set fallback chain
await client.llm.set_chain([
    {"provider": "openai", "model": "gpt-4o"},
    {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"},
    {"provider": "credits", "model": "default"},
])

# Extract using your keys
result = await client.extract(
    url="https://example.com/product",
    schema={"title": "string"},
    llm_config={
        "provider": "openai",
        "model": "gpt-4o-mini",
    },
)

print(f"Used BYOK: {result.usage.is_byok}")

Error Handling

from refyne import (
    RefyneError,
    RateLimitError,
    ValidationError,
    AuthenticationError,
)

try:
    await client.extract(url=url, schema=schema)
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except ValidationError as e:
    print(f"Validation errors: {e.errors}")
except AuthenticationError:
    print("Invalid API key")
except RefyneError as e:
    print(f"API error: {e.message} ({e.status})")
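
If a rate-limit error still reaches your code, one option is to wait for the advertised retry_after interval and try again. The sketch below illustrates that pattern without importing the SDK. DemoRateLimitError is a hypothetical stand-in for RateLimitError, and call_with_rate_limit_retry is an illustrative helper, not an SDK function.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class DemoRateLimitError(Exception):
    """Hypothetical stand-in for the SDK's RateLimitError."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


async def call_with_rate_limit_retry(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run make_call(), waiting retry_after seconds after each rate limit.

    Re-raises the error once `attempts` calls have been made.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await make_call()
        except DemoRateLimitError as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            await sleep(exc.retry_after)
    raise AssertionError("unreachable")
```

In real use, make_call would be something like lambda: client.extract(url=url, schema=schema), with RateLimitError in place of the stand-in exception.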

API Reference

Main Client

Method                                Description
client.extract(url, schema)           Extract data from a single page
client.crawl(url, schema, options)    Start an async crawl job
client.analyze(url, depth)            Analyze a site and suggest a schema
client.get_usage()                    Get usage statistics

Sub-Clients

Client            Methods
client.jobs       list(), get(id), get_results(id)
client.schemas    list(), get(id), create(), update(), delete()
client.sites      list(), get(id), create(), update(), delete()
client.keys       list(), create(), revoke(id)
client.llm        list_providers(), list_keys(), upsert_key(), get_chain(), set_chain()

Documentation

Full documentation is available at refyne.uk/docs.

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linter
ruff check src tests

# Run type checker
mypy src

Testing with Demo Site

A demo site is available at demo.refyne.uk for testing SDK functionality. The site contains realistic data across multiple content types:

Endpoint                           Content Type       Example Use Case
https://demo.refyne.uk/products    Product catalog    Extract prices, descriptions, ratings
https://demo.refyne.uk/jobs        Job listings       Extract salaries, requirements, companies
https://demo.refyne.uk/blog        Blog posts         Extract articles, authors, tags
https://demo.refyne.uk/news        News articles      Extract headlines, sources, timestamps

Example:

result = await client.extract(
    url="https://demo.refyne.uk/products/1",
    schema={
        "name": "string",
        "price": "number",
        "description": "string",
        "brand": "string",
        "rating": "number",
    },
)

License

MIT License - see LICENSE for details.
