A Python package for uncertainAPI

UncertainAPI

AI-Powered Web Extraction Framework - Extract data from any website using multi-agent AI orchestration.

Fully Async: Built with async/await for a 10-20x performance improvement on concurrent operations

Overview

UncertainAPI is a Django/DRF framework that uses AI agents to intelligently navigate websites, handle authentication, solve captchas, and extract structured data, with no manual selectors required. Built on AG2 (formerly AutoGen) for robust multi-agent orchestration.

Key Features

  • AI-Orchestrated Extraction: Multi-agent system handles the entire extraction workflow
  • Smart Authentication: Automatic login, OAuth, token, and session management
  • Zero Configuration: Works without explicit selectors or DOM knowledge
  • Pagination Support: Automatically detects and navigates through pages
  • Pydantic Schemas: Define data structure, AI handles extraction
  • Background Tasks: Schedule extraction with Celery, Django-Q, or custom backends
  • Pluggable: Swap browser backends (Playwright, Selenium, httpx), AI providers (OpenAI, Anthropic, local models)
  • DRF Integration: Extractors as async API endpoints with caching and serialization
  • Async-First: All I/O operations use async/await for maximum performance
  • Loguru Logging: Beautiful, structured logging with automatic context
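The async claims above come down to overlapping I/O waits instead of running them back to back. A small standard-library sketch of that effect follows. `fake_fetch` is a hypothetical stand-in for a page load or model call, not part of UncertainAPI, and the measured ratio depends on the workload; it is not a benchmark of the library.

```python
import asyncio
import time


async def fake_fetch(delay=0.05):
    """Simulate one I/O-bound operation (hypothetical stand-in)."""
    await asyncio.sleep(delay)
    return delay


async def sequential(n):
    """Run n operations one after another; total time is roughly n * delay."""
    return [await fake_fetch() for _ in range(n)]


async def concurrent(n):
    """Run n operations at once; total time is roughly one delay."""
    return await asyncio.gather(*(fake_fetch() for _ in range(n)))


def timed(coro_fn, n):
    """Return the wall-clock seconds taken by coro_fn(n)."""
    start = time.perf_counter()
    asyncio.run(coro_fn(n))
    return time.perf_counter() - start
```

Comparing `timed(sequential, 10)` with `timed(concurrent, 10)` shows the gap; it widens as the number of overlapping operations grows.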

Installation

# Basic installation
pip install uncertainapi

# With specific backends (quotes stop shells such as zsh from expanding the brackets)
pip install "uncertainapi[playwright]"    # Playwright (recommended)
pip install "uncertainapi[selenium]"      # Selenium
pip install "uncertainapi[requests]"      # Lightweight requests+BeautifulSoup

# With AI providers
pip install "uncertainapi[openai]"        # OpenAI
pip install "uncertainapi[anthropic]"     # Anthropic Claude
pip install "uncertainapi[litellm]"       # LiteLLM (Ollama, local models)

# With task schedulers
pip install "uncertainapi[celery]"        # Celery
pip install "uncertainapi[django-q]"      # Django-Q

# Full installation (all extras)
pip install "uncertainapi[full]"

Quick Start

1. Create a New Project

uncertainapi createproject myproject
cd myproject
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Configure Settings

Edit myproject/settings.py:

INSTALLED_APPS = [
    # ...
    'rest_framework',
    'uncertainAPI',
    'myapp',  # Your extractor app
]

UNCERTAINAPI = {
    'AI_PROVIDER': {
        'BACKEND': 'uncertainAPI.ai.providers.openai.OpenAIProvider',
        'API_KEY': 'your-openai-api-key',
        'MODEL': 'gpt-4o',
    },
    'BROWSER_BACKEND': {
        'BACKEND': 'uncertainAPI.browsers.playwright.PlaywrightBackend',
        'HEADLESS': True,
        'TIMEOUT': 30000,
    },
}
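Hardcoding the key in settings.py makes it easy to commit a secret by accident. A minimal sketch of reading it from the environment instead follows. The variable names `OPENAI_API_KEY` and `UNCERTAINAPI_MODEL` and the helper `ai_provider_from_env` are assumptions for illustration, not names UncertainAPI defines.

```python
import os


def ai_provider_from_env(env=None):
    """Build the AI_PROVIDER block from environment variables.

    `env` defaults to os.environ; passing a plain dict makes this testable.
    The variable names are illustrative, not defined by UncertainAPI.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "BACKEND": "uncertainAPI.ai.providers.openai.OpenAIProvider",
        "API_KEY": api_key,
        "MODEL": env.get("UNCERTAINAPI_MODEL", "gpt-4o"),
    }


# In settings.py the AI_PROVIDER entry would then become:
# UNCERTAINAPI = {"AI_PROVIDER": ai_provider_from_env(), ...}
```

Failing loudly when the key is missing surfaces a misconfiguration at startup, before the first extraction request runs.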

3. Define Your Schema

Create myapp/schemas.py:

from uncertainAPI.schemas import ExtractionSchema
from typing import Optional

class ProductSchema(ExtractionSchema):
    """Product data schema."""
    
    name: str
    price: float
    description: str
    in_stock: bool
    rating: Optional[float] = None
    image_url: Optional[str] = None

4. Create an Extractor View

Create myapp/views.py:

from uncertainAPI.views import ExtractorView
from .schemas import ProductSchema

class ProductScraperView(ExtractorView):
    """Extract product data from e-commerce sites."""
    
    url = "https://example.com/products"
    schema_class = ProductSchema
    enable_caching = True
    cache_ttl = 3600
    pagination = True  # Automatically handle pagination
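`enable_caching` and `cache_ttl` are not described further here. Presumably a repeated request within `cache_ttl` (assumed to be seconds) is answered from the cache without re-running the extraction. A toy in-memory sketch of that time-to-live behaviour follows. `TTLCache` and its injectable `clock` are illustrative names, not UncertainAPI's implementation, which according to the settings goes through a Django cache alias.

```python
import time


class TTLCache:
    """Cache values for `ttl` seconds, recomputing them after expiry."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._store = {}    # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive call
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value
```

With `ttl=3600`, a second call for the same key within the hour returns the stored result; the first call after that runs `compute` again.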

5. Add URL Pattern

In myapp/urls.py:

from django.urls import path
from .views import ProductScraperView

urlpatterns = [
    path('extract/products/', ProductScraperView.as_view(), name='extract-products'),
]

6. Run and Test

python manage.py migrate

# Use ASGI server (required for async views)
pip install "uvicorn[standard]"
uvicorn myproject.asgi:application --reload

Visit: http://localhost:8000/extract/products/ (this assumes myapp/urls.py is included at the root of the project's urls.py)

Note: UncertainAPI requires an ASGI server (Uvicorn, Daphne) for async support. Traditional WSGI servers won't work.
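Once the server is running, the endpoint can also be called from a script. A minimal sketch using only the standard library follows. The helper name `fetch_extracted` and the 120-second timeout are assumptions, and the shape of the JSON payload is not specified here, so the function returns whatever the endpoint sends.

```python
import json
from urllib.request import urlopen


def fetch_extracted(url="http://localhost:8000/extract/products/",
                    timeout=120, opener=urlopen):
    """GET the extractor endpoint and decode its JSON body.

    Extraction drives a browser and an LLM, so a generous timeout is used.
    `opener` is injectable so the function can be exercised without a server.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example, with the development server from step 6 running:
# data = fetch_extracted()
# print(json.dumps(data, indent=2))
```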

Architecture

UncertainAPI uses a multi-agent architecture powered by AG2:

Request → ExtractorView → ExtractorOrchestrator → Multi-Agent System
                                                ├─ CoordinatorAgent (orchestrates workflow)
                                                ├─ NavigatorAgent (browser control, auth, captcha)
                                                ├─ ExtractionAgent (data extraction)
                                                ├─ ValidationAgent (quality assurance)
                                                └─ PaginationAgent (handles pagination)
                                                         ↓
                                            Structured Data → DRF Response

Advanced Usage

Custom Authentication

from uncertainAPI.auth.basic import BasicAuthHandler
from uncertainAPI.views import ExtractorView

from .schemas import DashboardSchema  # assumed to be defined in myapp/schemas.py

class AuthenticatedScraperView(ExtractorView):
    url = "https://members.example.com/dashboard"
    schema_class = DashboardSchema
    auth_handler_class = BasicAuthHandler
    
    def get_auth_handler(self, request):
        credentials = {
            "username": request.user.username,
            "password": request.data.get("password"),
        }
        return BasicAuthHandler(
            credentials=credentials,
            username_selector="#username",
            password_selector="#password",
            submit_selector="button[type='submit']",
        )

Background Extraction

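from uncertainAPI.views import ExtractorView

from .schemas import DataSchema  # assumed to be defined in myapp/schemas.py

class BackgroundScraperView(ExtractorView):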
    url = "https://example.com/large-dataset"
    schema_class = DataSchema
    enable_background = True  # Schedule task instead of blocking
    
    def post(self, request):
        # Task scheduled, returns immediately
        return super().post(request)

Persistence to Database

from uncertainAPI.mixins import PersistenceMixin
from uncertainAPI.views import ExtractorView
from .models import Product
from .schemas import ProductSchema

class PersistentScraperView(PersistenceMixin, ExtractorView):
    url = "https://example.com/products"
    schema_class = ProductSchema
    model_class = Product  # Auto-saves to database

Custom Agent Configuration

from uncertainAPI.agents.orchestrator import ExtractorOrchestrator
from uncertainAPI.agents.roles import NavigatorAgent, ExtractionAgent
from uncertainAPI.views import ExtractorView

class CustomScraperView(ExtractorView):
    def get_orchestrator(self, url, request):
        orchestrator = super().get_orchestrator(url, request)
        
        # Add custom agents (MyCustomAgent is your own agent class, defined elsewhere)
        custom_agents = [
            MyCustomAgent(name="custom"),
            *orchestrator.get_agents(),
        ]
        orchestrator.agents = custom_agents
        
        return orchestrator

Multiple Browser Backends

# In settings.py - switch to Selenium
UNCERTAINAPI = {
    'BROWSER_BACKEND': {
        'BACKEND': 'uncertainAPI.browsers.selenium.SeleniumBackend',
        'BROWSER_TYPE': 'chrome',
        'HEADLESS': True,
    },
}

# Or use lightweight requests for static pages
UNCERTAINAPI = {
    'BROWSER_BACKEND': {
        'BACKEND': 'uncertainAPI.browsers.requests.RequestsBackend',
        'TIMEOUT': 30,
    },
}

AI Provider Options

# OpenAI (default)
UNCERTAINAPI = {
    'AI_PROVIDER': {
        'BACKEND': 'uncertainAPI.ai.providers.openai.OpenAIProvider',
        'API_KEY': 'sk-...',
        'MODEL': 'gpt-4o',
    },
}

# Anthropic Claude
UNCERTAINAPI = {
    'AI_PROVIDER': {
        'BACKEND': 'uncertainAPI.ai.providers.anthropic.AnthropicProvider',
        'API_KEY': 'sk-ant-...',
        'MODEL': 'claude-3-5-sonnet-20241022',
    },
}

# Local models via Ollama
UNCERTAINAPI = {
    'AI_PROVIDER': {
        'BACKEND': 'uncertainAPI.ai.providers.litellm.LiteLLMProvider',
        'MODEL': 'ollama/llama2',
        'API_BASE': 'http://localhost:11434',
    },
}
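The three provider blocks above are alternatives, since only one `UNCERTAINAPI['AI_PROVIDER']` can be active at a time. One possible way to switch between them without editing settings.py is sketched below. The `PROVIDERS` table copies the documented values, while the helper `select_provider` and the short provider names are assumptions for illustration.

```python
# Provider blocks copied from the examples above, keyed by a short name.
PROVIDERS = {
    "openai": {
        "BACKEND": "uncertainAPI.ai.providers.openai.OpenAIProvider",
        "MODEL": "gpt-4o",
    },
    "anthropic": {
        "BACKEND": "uncertainAPI.ai.providers.anthropic.AnthropicProvider",
        "MODEL": "claude-3-5-sonnet-20241022",
    },
    "ollama": {
        "BACKEND": "uncertainAPI.ai.providers.litellm.LiteLLMProvider",
        "MODEL": "ollama/llama2",
        "API_BASE": "http://localhost:11434",
    },
}


def select_provider(name, api_key=None):
    """Return a copy of the named provider block, adding API_KEY if given."""
    if name not in PROVIDERS:
        known = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"unknown provider {name!r}; expected one of: {known}")
    config = dict(PROVIDERS[name])  # copy so the table is never mutated
    if api_key is not None:
        config["API_KEY"] = api_key
    return config
```

In settings.py this could be used as `UNCERTAINAPI = {"AI_PROVIDER": select_provider("ollama")}`. The local Ollama block needs no API key, which is why `api_key` is optional.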

Configuration Reference

Settings Structure

UNCERTAINAPI = {
    'AI_PROVIDER': {
        'BACKEND': str,              # AI provider class path
        'API_KEY': str,              # API key
        'MODEL': str,                # Model name
        'BASE_URL': str,             # Optional: custom API base URL
    },
    'BROWSER_BACKEND': {
        'BACKEND': str,              # Browser backend class path
        'HEADLESS': bool,            # Run headless
        'TIMEOUT': int,              # Timeout in milliseconds
        'BROWSER_TYPE': str,         # Browser type (for Selenium)
    },
    'ORCHESTRATOR': {
        'BACKEND': str,              # Custom orchestrator class
        'MAX_ROUNDS': int,           # Max agent conversation rounds
        'PATTERN': str,              # Orchestration pattern ('auto', 'sequential')
        'ENABLE_HUMAN_VALIDATION': bool,  # Human-in-the-loop
    },
    'TASK_SCHEDULER': {
        'BACKEND': str,              # Task scheduler class path
    },
    'CACHE_BACKEND': str,            # Django cache alias
    'DEFAULT_AUTH_STORAGE': str,     # 'database', 'cache', or 'memory'
}
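A settings dictionary can be checked against the structure above before the server starts. The sketch below is a hypothetical helper, not something UncertainAPI ships. It type-checks only the documented keys and ignores any others, because individual backends may accept extra options.

```python
# Expected types, transcribed from the settings reference above.
REFERENCE = {
    "AI_PROVIDER": {"BACKEND": str, "API_KEY": str, "MODEL": str, "BASE_URL": str},
    "BROWSER_BACKEND": {"BACKEND": str, "HEADLESS": bool, "TIMEOUT": int,
                        "BROWSER_TYPE": str},
    "ORCHESTRATOR": {"BACKEND": str, "MAX_ROUNDS": int, "PATTERN": str,
                     "ENABLE_HUMAN_VALIDATION": bool},
    "TASK_SCHEDULER": {"BACKEND": str},
    "CACHE_BACKEND": str,
    "DEFAULT_AUTH_STORAGE": str,
}


def check_settings(settings, reference=REFERENCE, prefix=""):
    """Return a list of type problems found in `settings` (empty if none)."""
    problems = []
    for key, value in settings.items():
        if key not in reference:
            continue  # undocumented keys are left alone
        expected = reference[key]
        path = prefix + key
        if isinstance(expected, dict):
            if isinstance(value, dict):
                problems.extend(check_settings(value, expected, path + "."))
            else:
                problems.append(f"{path}: expected dict, got {type(value).__name__}")
        elif type(value) is not expected:  # exact match, so True is not an int
            problems.append(
                f"{path}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

For example, `check_settings({"BROWSER_BACKEND": {"TIMEOUT": "30"}})` reports that `BROWSER_BACKEND.TIMEOUT` should be an int.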

CLI Commands

# Create new project
uncertainapi createproject myproject

# Create new extractor app
uncertainapi startapp myapp

Testing

# Install dev dependencies
pip install "uncertainapi[dev]"

# Run tests
pytest

# With coverage
pytest --cov=uncertainAPI --cov-report=term-missing

Project Structure

myproject/
├── manage.py
├── requirements.txt
├── myproject/
│   ├── __init__.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
└── myapp/
    ├── __init__.py
    ├── apps.py
    ├── schemas.py      # Pydantic extraction schemas
    ├── views.py        # Extractor views
    ├── models.py       # Optional: persistence models
    └── urls.py

How It Works

  1. Request: User hits extractor endpoint
  2. Coordination: CoordinatorAgent plans the extraction workflow
  3. Navigation: NavigatorAgent uses browser tools to fetch the page
  4. Authentication: If needed, handles login/OAuth/captcha automatically
  5. Extraction: ExtractionAgent analyzes HTML and extracts data matching schema
  6. Validation: ValidationAgent checks data quality and completeness
  7. Pagination: If enabled, PaginationAgent detects and navigates additional pages
  8. Response: Validated data returned as JSON via DRF

All agent interactions are orchestrated by AG2 for robust, adaptive behavior.
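The control flow of steps 3 to 7 can be sketched with plain functions standing in for the agents. This is a toy model under stated assumptions: in UncertainAPI the steps are carried out by LLM-backed AG2 agents, and none of the names below (`FAKE_SITE`, `navigate`, `extract`, `validate`, `next_page`, `run_extraction`) come from the library.

```python
import asyncio

# Toy site: page URL -> (items on that page, URL of the next page or None).
FAKE_SITE = {
    "/products?page=1": ([{"name": "Lamp", "price": 19.5}], "/products?page=2"),
    "/products?page=2": (
        [{"name": "Desk", "price": 120.0}, {"name": "Chair", "price": None}],
        None,
    ),
}


async def navigate(url):
    """Step 3: fetch a page (here, a dictionary lookup)."""
    await asyncio.sleep(0)  # stands in for real browser I/O
    return FAKE_SITE[url]


def extract(page):
    """Step 5: pull candidate records out of the page."""
    items, _next_url = page
    return list(items)


def validate(records):
    """Step 6: keep only records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


def next_page(page):
    """Step 7: return the URL of the next page, or None on the last page."""
    return page[1]


async def run_extraction(start_url, paginate=True, max_pages=10):
    """Loop navigate -> extract -> validate until the pages run out."""
    results, url, visited = [], start_url, 0
    while url is not None and visited < max_pages:
        page = await navigate(url)
        results.extend(validate(extract(page)))
        visited += 1
        url = next_page(page) if paginate else None
    return results
```

`asyncio.run(run_extraction("/products?page=1"))` returns the Lamp and Desk records. The Chair record is dropped by `validate` because its price is missing, and `max_pages` guards against a site whose "next" links never end.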

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

License

MIT License - see LICENSE for details.

Acknowledgments

  • Built on AG2 for multi-agent orchestration
  • Inspired by the need for intelligent, adaptable web extraction
  • Designed with SOLID principles and extensibility in mind

Documentation

Complete Documentation Index - All guides organized by topic

UncertainAPI - Because web extraction shouldn't require certainty about the DOM structure.
