A Python package for uncertainAPI
UncertainAPI
AI-Powered Web Extraction Framework - Extract data from any website using multi-agent AI orchestration.
Fully Async: Built with async/await for a 10-20x performance improvement on concurrent operations
Overview
UncertainAPI is a Django/DRF framework that uses AI agents to intelligently navigate websites, handle authentication, solve captchas, and extract structured data, with no manual selectors required. Built on AG2 (formerly AutoGen) for robust multi-agent orchestration.
Key Features
- AI-Orchestrated Extraction: Multi-agent system handles the entire extraction workflow
- Smart Authentication: Automatic login, OAuth, token, and session management
- Zero Configuration: Works without explicit selectors or DOM knowledge
- Pagination Support: Automatically detects and navigates through pages
- Pydantic Schemas: Define data structure, AI handles extraction
- Background Tasks: Schedule extraction with Celery, Django-Q, or custom backends
- Pluggable: Swap browser backends (Playwright, Selenium, httpx), AI providers (OpenAI, Anthropic, local models)
- DRF Integration: Extractors as async API endpoints with caching and serialization
- Async-First: All I/O operations use async/await for maximum performance
- Loguru Logging: Beautiful, structured logging with automatic context
Installation
# Basic installation
pip install uncertainapi
# With specific backends
pip install uncertainapi[playwright] # Playwright (recommended)
pip install uncertainapi[selenium] # Selenium
pip install uncertainapi[requests] # Lightweight requests+BeautifulSoup
# With AI providers
pip install uncertainapi[openai] # OpenAI
pip install uncertainapi[anthropic] # Anthropic Claude
pip install uncertainapi[litellm] # LiteLLM (Ollama, local models)
# With task schedulers
pip install uncertainapi[celery] # Celery
pip install uncertainapi[django-q] # Django-Q
# Full installation (all extras)
pip install uncertainapi[full]
Quick Start
1. Create a New Project
uncertainapi createproject myproject
cd myproject
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
2. Configure Settings
Edit myproject/settings.py:
INSTALLED_APPS = [
    # ...
    'rest_framework',
    'uncertainAPI',
    'myapp',  # Your extractor app
]

UNCERTAINAPI = {
    'AI_PROVIDER': {
        'BACKEND': 'uncertainAPI.ai.providers.openai.OpenAIProvider',
        'API_KEY': 'your-openai-api-key',
        'MODEL': 'gpt-4o',
    },
    'BROWSER_BACKEND': {
        'BACKEND': 'uncertainAPI.browsers.playwright.PlaywrightBackend',
        'HEADLESS': True,
        'TIMEOUT': 30000,
    },
}
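The settings above use a placeholder string for the API key. As a minimal sketch, assuming the key is supplied through an environment variable, the same dictionary can be built without committing secrets to the repository. The helper name `build_uncertainapi_settings` and the variable names `OPENAI_API_KEY` and `UNCERTAINAPI_MODEL` are illustrative choices, not part of UncertainAPI.

```python
import os


def build_uncertainapi_settings(env=None):
    """Build the UNCERTAINAPI settings dict, reading the API key from the environment."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")  # assumed variable name
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "AI_PROVIDER": {
            "BACKEND": "uncertainAPI.ai.providers.openai.OpenAIProvider",
            "API_KEY": api_key,
            # Hypothetical override; falls back to the model used in this README.
            "MODEL": env.get("UNCERTAINAPI_MODEL", "gpt-4o"),
        },
        "BROWSER_BACKEND": {
            "BACKEND": "uncertainAPI.browsers.playwright.PlaywrightBackend",
            "HEADLESS": True,
            "TIMEOUT": 30000,
        },
    }
```

In `settings.py` this would be used as `UNCERTAINAPI = build_uncertainapi_settings()`, so a missing key fails at startup rather than at the first extraction.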
3. Define Your Schema
Create myapp/schemas.py:
from typing import Optional

from uncertainAPI.schemas import ExtractionSchema


class ProductSchema(ExtractionSchema):
    """Product data schema."""

    name: str
    price: float
    description: str
    in_stock: bool
    rating: Optional[float] = None
    image_url: Optional[str] = None
4. Create an Extractor View
Create myapp/views.py:
from uncertainAPI.views import ExtractorView

from .schemas import ProductSchema


class ProductScraperView(ExtractorView):
    """Extract product data from e-commerce sites."""

    url = "https://example.com/products"
    schema_class = ProductSchema
    enable_caching = True
    cache_ttl = 3600
    pagination = True  # Automatically handle pagination
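The `enable_caching` and `cache_ttl = 3600` attributes suggest that results are reused for up to an hour before a fresh extraction runs. How UncertainAPI implements this is not shown here; the following is a generic in-memory sketch of time-to-live caching, with a hypothetical `TTLCache` class, to illustrate the behaviour the setting implies.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value):
        """Store a value that expires `ttl` seconds from now."""
        self._store[key] = (self.clock() + self.ttl, value)
```

A view would consult such a cache keyed by URL before starting the agents, and only run the extraction on a miss.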
5. Add URL Pattern
In myapp/urls.py:
from django.urls import path

from .views import ProductScraperView

urlpatterns = [
    path('extract/products/', ProductScraperView.as_view(), name='extract-products'),
]
6. Run and Test
python manage.py migrate
# Use ASGI server (required for async views)
pip install uvicorn[standard]
uvicorn myproject.asgi:application --reload
Visit: http://localhost:8000/extract/products/
Note: UncertainAPI requires an ASGI server (Uvicorn, Daphne) for async support. Traditional WSGI servers won't work.
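To illustrate why async I/O helps with concurrent operations, here is a small standard-library sketch. It does not use UncertainAPI; `fetch_page` is a stand-in that simulates a slow network call with `asyncio.sleep`, and all function names are illustrative. The actual speed-up in a real project depends on the workload.

```python
import asyncio
import time


async def fetch_page(url, delay=0.1):
    """Stand-in for an I/O-bound call such as a page fetch (simulated with sleep)."""
    await asyncio.sleep(delay)
    return f"content of {url}"


async def fetch_all(urls):
    """Fetch every URL concurrently; results come back in input order."""
    return await asyncio.gather(*(fetch_page(u) for u in urls))


def timed_fetch(urls):
    """Run fetch_all and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all(urls))
    return results, time.perf_counter() - start
```

With five URLs, the simulated waits overlap, so the total time is close to one delay rather than five.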
Architecture
UncertainAPI uses a multi-agent architecture powered by AG2:
Request → ExtractorView → ExtractorOrchestrator → Multi-Agent System
    ├─ CoordinatorAgent (orchestrates workflow)
    ├─ NavigatorAgent (browser control, auth, captcha)
    ├─ ExtractionAgent (data extraction)
    ├─ ValidationAgent (quality assurance)
    └─ PaginationAgent (handles pagination)
                ↓
Structured Data → DRF Response
Advanced Usage
Custom Authentication
from uncertainAPI.auth.basic import BasicAuthHandler
from uncertainAPI.views import ExtractorView

from .schemas import DashboardSchema  # assumed to be defined in myapp/schemas.py


class AuthenticatedScraperView(ExtractorView):
    url = "https://members.example.com/dashboard"
    schema_class = DashboardSchema
    auth_handler_class = BasicAuthHandler

    def get_auth_handler(self, request):
        credentials = {
            "username": request.user.username,
            "password": request.data.get("password"),
        }
        return BasicAuthHandler(
            credentials=credentials,
            username_selector="#username",
            password_selector="#password",
            submit_selector="button[type='submit']",
        )
Background Extraction
from uncertainAPI.views import ExtractorView

from .schemas import DataSchema  # assumed to be defined in myapp/schemas.py


class BackgroundScraperView(ExtractorView):
    url = "https://example.com/large-dataset"
    schema_class = DataSchema
    enable_background = True  # Schedule task instead of blocking

    def post(self, request):
        # Task scheduled, returns immediately
        return super().post(request)
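The "schedule and return immediately" behaviour can be sketched with the standard library alone. This is not how UncertainAPI's task schedulers are implemented (that is not shown here); `BackgroundRunner`, `slow_extraction`, and `demo` are hypothetical names, and the in-process `asyncio` task stands in for a Celery or Django-Q job.

```python
import asyncio
import uuid


class BackgroundRunner:
    """Hypothetical in-process scheduler: submit a coroutine, get a task id back."""

    def __init__(self):
        self._tasks = {}

    def submit(self, coro):
        task_id = uuid.uuid4().hex
        # create_task schedules the coroutine without waiting for it to finish.
        self._tasks[task_id] = asyncio.create_task(coro)
        return task_id

    def status(self, task_id):
        return "done" if self._tasks[task_id].done() else "pending"

    async def result(self, task_id):
        return await self._tasks[task_id]


async def slow_extraction(url):
    await asyncio.sleep(0.05)  # stands in for a long-running extraction
    return {"url": url, "items": 3}


async def demo():
    runner = BackgroundRunner()
    task_id = runner.submit(slow_extraction("https://example.com/large-dataset"))
    first_status = runner.status(task_id)  # read before the work has finished
    data = await runner.result(task_id)
    return first_status, runner.status(task_id), data
```

The caller receives a task id right away and can poll for status later. In-process tasks are lost if the server restarts, which is one reason to prefer a dedicated scheduler backend in production.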
Persistence to Database
from uncertainAPI.mixins import PersistenceMixin
from uncertainAPI.views import ExtractorView

from .models import Product
from .schemas import ProductSchema


class PersistentScraperView(PersistenceMixin, ExtractorView):
    url = "https://example.com/products"
    schema_class = ProductSchema
    model_class = Product  # Auto-saves to database
Custom Agent Configuration
from uncertainAPI.agents.orchestrator import ExtractorOrchestrator
from uncertainAPI.agents.roles import NavigatorAgent, ExtractionAgent
from uncertainAPI.views import ExtractorView

# MyCustomAgent is a placeholder for your own agent class;
# define or import it before using this view.


class CustomScraperView(ExtractorView):
    def get_orchestrator(self, url, request):
        orchestrator = super().get_orchestrator(url, request)
        # Add custom agents ahead of the default ones
        custom_agents = [
            MyCustomAgent(name="custom"),
            *orchestrator.get_agents(),
        ]
        orchestrator.agents = custom_agents
        return orchestrator
Multiple Browser Backends
# In settings.py - switch to Selenium
UNCERTAINAPI = {
    'BROWSER_BACKEND': {
        'BACKEND': 'uncertainAPI.browsers.selenium.SeleniumBackend',
        'BROWSER_TYPE': 'chrome',
        'HEADLESS': True,
    },
}

# Or use lightweight requests for static pages
UNCERTAINAPI = {
    'BROWSER_BACKEND': {
        'BACKEND': 'uncertainAPI.browsers.requests.RequestsBackend',
        'TIMEOUT': 30,
    },
}
AI Provider Options
# OpenAI (default)
UNCERTAINAPI = {
    'AI_PROVIDER': {
        'BACKEND': 'uncertainAPI.ai.providers.openai.OpenAIProvider',
        'API_KEY': 'sk-...',
        'MODEL': 'gpt-4o',
    },
}

# Anthropic Claude
UNCERTAINAPI = {
    'AI_PROVIDER': {
        'BACKEND': 'uncertainAPI.ai.providers.anthropic.AnthropicProvider',
        'API_KEY': 'sk-ant-...',
        'MODEL': 'claude-3-5-sonnet-20241022',
    },
}

# Local models via Ollama
UNCERTAINAPI = {
    'AI_PROVIDER': {
        'BACKEND': 'uncertainAPI.ai.providers.litellm.LiteLLMProvider',
        'MODEL': 'ollama/llama2',
        'API_BASE': 'http://localhost:11434',
    },
}
Configuration Reference
Settings Structure
UNCERTAINAPI = {
    'AI_PROVIDER': {
        'BACKEND': str,   # AI provider class path
        'API_KEY': str,   # API key
        'MODEL': str,     # Model name
        'BASE_URL': str,  # Optional: custom API base URL
    },
    'BROWSER_BACKEND': {
        'BACKEND': str,       # Browser backend class path
        'HEADLESS': bool,     # Run headless
        'TIMEOUT': int,       # Timeout in milliseconds
        'BROWSER_TYPE': str,  # Browser type (for Selenium)
    },
    'ORCHESTRATOR': {
        'BACKEND': str,                   # Custom orchestrator class
        'MAX_ROUNDS': int,                # Max agent conversation rounds
        'PATTERN': str,                   # Orchestration pattern ('auto', 'sequential')
        'ENABLE_HUMAN_VALIDATION': bool,  # Human-in-the-loop
    },
    'TASK_SCHEDULER': {
        'BACKEND': str,  # Task scheduler class path
    },
    'CACHE_BACKEND': str,         # Django cache alias
    'DEFAULT_AUTH_STORAGE': str,  # 'database', 'cache', or 'memory'
}
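Several `BACKEND` entries above are dotted class paths given as strings. A common way to turn such a string into a class is a dynamic import; the sketch below shows the general technique with the standard library. Whether UncertainAPI resolves its backends exactly this way is an assumption, and `load_class` is a hypothetical helper (Django ships a similar utility, `django.utils.module_loading.import_string`).

```python
from importlib import import_module


def load_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted path")
    module = import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"module {module_path!r} has no attribute {class_name!r}"
        ) from exc
```

For example, `load_class("collections.OrderedDict")` returns the `OrderedDict` class. Because the class is looked up by name at runtime, swapping a backend only requires changing the string in settings.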
CLI Commands
# Create new project
uncertainapi createproject myproject
# Create new extractor app
uncertainapi startapp myapp
Testing
# Install dev dependencies
pip install uncertainapi[dev]
# Run tests
pytest
# With coverage
pytest --cov=uncertainAPI --cov-report=term-missing
Project Structure
myproject/
├── manage.py
├── requirements.txt
├── myproject/
│   ├── __init__.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
└── myapp/
    ├── __init__.py
    ├── apps.py
    ├── schemas.py    # Pydantic extraction schemas
    ├── views.py      # Extractor views
    ├── models.py     # Optional: persistence models
    └── urls.py
How It Works
- Request: User hits extractor endpoint
- Coordination: CoordinatorAgent plans the extraction workflow
- Navigation: NavigatorAgent uses browser tools to fetch the page
- Authentication: If needed, handles login/OAuth/captcha automatically
- Extraction: ExtractionAgent analyzes HTML and extracts data matching schema
- Validation: ValidationAgent checks data quality and completeness
- Pagination: If enabled, PaginationAgent detects and navigates additional pages
- Response: Validated data returned as JSON via DRF
All agent interactions are orchestrated by AG2 for robust, adaptive behavior.
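The control flow of the steps above can be sketched without any AI or browser involved. In the sketch below each agent role is replaced by a plain function over a fake two-page site; every name (`FAKE_SITE`, `navigate`, `extract`, `validate`, `next_page`, `run_extraction`) is hypothetical and none of this is UncertainAPI or AG2 code. It only shows how coordination, extraction, validation, and pagination fit together.

```python
import asyncio

# Fake two-page site: each URL maps to (items, next_page_url or None).
FAKE_SITE = {
    "https://example.com/products": (
        [{"name": "A", "price": 1.0}],
        "https://example.com/products?page=2",
    ),
    "https://example.com/products?page=2": (
        [{"name": "B", "price": 2.5}, {"name": "C", "price": None}],
        None,
    ),
}


async def navigate(url):
    """Navigation step: fetch the page (here, a lookup in FAKE_SITE)."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return FAKE_SITE[url]


def extract(page):
    """Extraction step: pull candidate records out of the page."""
    items, _next_url = page
    return items


def validate(records):
    """Validation step: keep only records with every field present."""
    return [r for r in records if all(v is not None for v in r.values())]


def next_page(page):
    """Pagination step: return the next URL, or None on the last page."""
    return page[1]


async def run_extraction(url, paginate=True):
    """Coordinator: navigate, extract, validate, then follow pages if enabled."""
    results = []
    while url is not None:
        page = await navigate(url)
        results.extend(validate(extract(page)))
        url = next_page(page) if paginate else None
    return results
```

Running `asyncio.run(run_extraction("https://example.com/products"))` walks both pages and drops the record with a missing price. With `paginate=False` only the first page is processed.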
Contributing
Contributions welcome! Please read CONTRIBUTING.md first.
License
MIT License - see LICENSE for details.
Acknowledgments
- Built on AG2 for multi-agent orchestration
- Inspired by the need for intelligent, adaptable web extraction
- Designed with SOLID principles and extensibility in mind
Documentation
Complete Documentation Index - All guides organized by topic
Quick Links:
- Installation Guide - Install locally (pre-PyPI)
- Getting Started - Complete walkthrough
- Quick Reference - One-page cheat sheet
- Quick Start - 5-minute tutorial
- Async Architecture - Async patterns
- Deployment - Production setup
Support
- Issues: GitHub Issues
- Examples: See the /examples directory
UncertainAPI - Because web extraction shouldn't require certainty about the DOM structure.