Python SDK for Maxun - web automation and data extraction

Maxun Python SDK

The official Python SDK for Maxun — turn any website into an API.

Works with both Maxun Cloud and Maxun Open Source.

What can you do with the Maxun SDK?

  • Extract structured data from any website
  • Scrape entire pages as Markdown or HTML
  • Crawl multiple pages automatically to discover and scrape content
  • Perform web searches and extract results as metadata or full content
  • Use AI to extract data with natural language prompts
  • Capture screenshots (visible area or full page)
  • Automate workflows with clicks, form fills, and navigation
  • Schedule recurring jobs to keep your data fresh
  • Get webhooks when extractions complete
  • Handle pagination automatically (scroll, click, load more)

Installation

pip install maxun

With LLM support:

pip install "maxun[anthropic]"   # Anthropic Claude
pip install "maxun[openai]"      # OpenAI GPT
pip install "maxun[all]"         # All LLM providers

Local Development

Dependencies are declared in pyproject.toml.

To install the SDK locally in editable mode:

cd python-sdk
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -e .               # core only
pip install -e ".[all]"        # core + all LLM providers

Configuration

from maxun import Config

config = Config(
    api_key="your-api-key",            # Required
    base_url="https://app.maxun.dev/api/sdk/",  # Optional, defaults to localhost
    team_id="your-team-uuid",          # Optional, for team-scoped robots
)

Environment variables are supported via a .env file (uses python-dotenv):

MAXUN_API_KEY=your-api-key
MAXUN_BASE_URL=https://app.maxun.dev/api/sdk/
MAXUN_TEAM_ID=your-team-uuid
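If you prefer to read those variables yourself rather than rely on the .env support, a minimal sketch follows. The helper name is hypothetical; only the variable names above come from the SDK. Pass the resulting kwargs to Config as usual.

```python
import os

# Hypothetical helper: collect Maxun settings from the environment.
# Only MAXUN_API_KEY is required; the others are included when set.
def config_kwargs_from_env() -> dict:
    kwargs = {"api_key": os.environ["MAXUN_API_KEY"]}  # raises KeyError if missing
    if base_url := os.getenv("MAXUN_BASE_URL"):
        kwargs["base_url"] = base_url
    if team_id := os.getenv("MAXUN_TEAM_ID"):
        kwargs["team_id"] = team_id
    return kwargs

# Usage (after python-dotenv's load_dotenv() if you keep a .env file):
# config = Config(**config_kwargs_from_env())
```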

Core Classes

Scrape

Scrape full pages as Markdown, HTML, or screenshots.

from maxun import Scrape, Config

scraper = Scrape(Config(api_key="..."))

robot = await scraper.create(
    "Page Scraper",
    "https://example.com",
    formats=["markdown", "html"],
)

result = await robot.run()
print(result["data"]["markdown"])

Available formats: "markdown", "html", "screenshot-visible", "screenshot-fullpage"
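The snippets in this README use top-level await for brevity. In a regular script, wrap the calls in an async entry point; a minimal sketch (the SDK calls are shown as comments and replaced with a placeholder so the sketch stands alone):

```python
import asyncio

# The SDK methods shown above are coroutines, so run them inside an
# async entry point. Substitute any example from this page for the body.
async def main():
    # scraper = Scrape(Config(api_key="..."))
    # robot = await scraper.create("Page Scraper", "https://example.com",
    #                              formats=["markdown"])
    # result = await robot.run()
    return "ok"  # placeholder result so this sketch runs standalone

if __name__ == "__main__":
    asyncio.run(main())
```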

Crawl

Crawl multiple pages starting from a URL.

from maxun import Crawl, CrawlConfig, Config

crawler = Crawl(Config(api_key="..."))

robot = await crawler.create(
    "Site Crawler",
    "https://example.com",
    CrawlConfig(
        mode="domain",       # "domain" | "subdomain" | "path"
        limit=50,
        max_depth=3,
        use_sitemap=True,
        follow_links=True,
        respect_robots=True,
    ),
)

result = await robot.run()

Search

Search the web and collect results.

from maxun import Search, SearchConfig, Config

searcher = Search(Config(api_key="..."))

robot = await searcher.create(
    "AI News Search",
    SearchConfig(
        query="artificial intelligence 2025",
        mode="discover",     # "discover" | "scrape"
        limit=10,
    ),
)

result = await robot.run()

Extract

Build robots that extract structured data from pages.

from maxun import Extract, Config

extractor = Extract(Config(api_key="..."))

# Capture specific text fields
robot = await (
    extractor
    .create("My Robot")
    .navigate("https://example.com")
    .capture_text({"Title": "h1", "Price": ".price"})
)

# Capture a list of items with optional pagination
robot = await (
    extractor
    .create("Product List")
    .navigate("https://shop.example.com")
    .capture_list({
        "selector": "article.product",
        "pagination": {"type": "clickNext", "selector": "a.next"},
        "maxItems": 100,
    })
)

result = await robot.run()

LLM Extraction

Use a natural language prompt to extract data.

from maxun import Extract, Config

extractor = Extract(Config(api_key="..."))

robot = await extractor.extract(
    prompt="Extract the product name, price, and rating",
    url="https://shop.example.com/product/123",
    llm_provider="anthropic",
    llm_model="claude-3-5-sonnet-20241022",
    llm_api_key="your-anthropic-key",
)

result = await robot.run()

Robot Management

All robot types return a Robot instance with a consistent API:

# Run the robot
result = await robot.run()

# Schedule recurring runs
await robot.schedule({
    "runEvery": 1,
    "runEveryUnit": "DAYS",
    "timezone": "UTC",
})

# Add a webhook
await robot.add_webhook({
    "url": "https://your-server.com/webhook",
    "events": ["run.completed", "run.failed"],
})

# Get execution history
runs = await robot.get_runs()
latest = await robot.get_latest_run()
specific = await robot.get_run("run-id")

# Update metadata or workflow
await robot.update({"meta": {"name": "New Name"}})
await robot.refresh()   # reload from server

# Delete
await robot.delete()

Scheduling

from maxun import ScheduleConfig

await robot.schedule({
    "runEvery": 6,
    "runEveryUnit": "HOURS",   # MINUTES | HOURS | DAYS | WEEKS | MONTHS
    "timezone": "America/New_York",
})

# Stop scheduling
await robot.unschedule()

# Read current schedule
schedule = robot.get_schedule()

Webhooks

await robot.add_webhook({
    "url": "https://your-server.com/webhook",
    "events": ["run.completed", "run.failed"],
})

webhooks = robot.get_webhooks()
await robot.remove_webhooks()
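On the receiving side you will need an endpoint that dispatches on the event name. The exact payload shape is not documented here, so the sketch below assumes a JSON body with an "event" field matching the names used at registration ("run.completed", "run.failed"); treat that as an assumption to verify against your actual webhook deliveries.

```python
import json

# Hypothetical webhook dispatcher: assumes the delivery body is JSON
# with an "event" field naming the subscribed event.
def handle_webhook(body: bytes) -> str:
    payload = json.loads(body)
    event = payload.get("event")
    if event == "run.completed":
        return "completed"   # e.g. fetch the run's data here
    if event == "run.failed":
        return "failed"      # e.g. alert or re-run here
    return "ignored"
```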

Error Handling

from maxun import MaxunError

try:
    result = await robot.run()
except MaxunError as e:
    print(f"Error {e.status_code}: {e}")
    print(f"Details: {e.details}")
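Since the exception exposes status_code, you can retry transient failures (rate limits, server errors) with backoff. A sketch, not part of the SDK; it only assumes the raised error carries a status_code attribute as MaxunError does:

```python
import asyncio

# Sketch of a retry wrapper for transient failures, assuming the raised
# exception exposes `status_code` (as MaxunError does). Non-transient
# errors and the final failed attempt are re-raised unchanged.
async def run_with_retries(run, attempts: int = 3, base_delay: float = 1.0):
    for attempt in range(attempts):
        try:
            return await run()
        except Exception as e:
            status = getattr(e, "status_code", None)
            if status not in (429, 500, 502, 503) or attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)  # exponential backoff

# Usage: result = await run_with_retries(robot.run)
```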

Types Reference

  • Config: SDK configuration (api_key, base_url, team_id)
  • CrawlConfig: Crawl robot configuration
  • SearchConfig: Search robot configuration
  • ScheduleConfig: Schedule configuration
  • WebhookConfig: Webhook configuration
  • ExtractListConfig: List capture configuration
  • PaginationConfig: Pagination strategy
  • MaxunError: SDK exception with status_code and details

Examples

See the examples/ directory for complete working examples.

Requirements

  • Python 3.8+
  • httpx >= 0.24.0
  • python-dotenv >= 1.0.0
  • Optional: anthropic >= 0.18.0, openai >= 1.0.0
