The Official Python SDK for Thordata - AI Data Infrastructure & Proxy Network.

Thordata Python SDK

Official Python Client for Thordata APIs

Proxy Network • SERP API • Web Unlocker • Web Scraper API

📦 Installation

pip install thordata-sdk

Optional dependencies for Scraping Browser examples:

pip install playwright

🔐 Configuration

Set the following environment variables (recommended):

# Required for SERP, Universal, and Proxy Network
export THORDATA_SCRAPER_TOKEN="your_scraper_token"

# Required for Web Scraper Tasks & Account Management
export THORDATA_PUBLIC_TOKEN="your_public_token"
export THORDATA_PUBLIC_KEY="your_public_key"

# Optional: Default Proxy Credentials (for Proxy Network)
export THORDATA_RESIDENTIAL_USERNAME="user"
export THORDATA_RESIDENTIAL_PASSWORD="pass"
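Before creating a client, it can help to confirm that the variables a given feature needs are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` and the feature grouping are illustrative and not part of the SDK.

```python
import os

# Variable names come from the configuration section above;
# the grouping by feature is an illustrative assumption.
FEATURE_VARS = {
    "scraping": ["THORDATA_SCRAPER_TOKEN"],
    "tasks_and_account": ["THORDATA_PUBLIC_TOKEN", "THORDATA_PUBLIC_KEY"],
}


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


for feature, names in FEATURE_VARS.items():
    missing = missing_env_vars(names)
    if missing:
        print(f"{feature}: missing {', '.join(missing)}")
```

Passing a plain dictionary as `environ` makes the check easy to exercise in tests without touching the real environment.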

🚀 Quick Start

import os

from thordata import ThordataClient

# Initialize with the scraper token read from the environment
client = ThordataClient(scraper_token=os.environ["THORDATA_SCRAPER_TOKEN"])

# 1. SERP Search
print("--- SERP Search ---")
results = client.serp_search("python tutorial", engine="google")
print(f"Title: {results['organic'][0]['title']}")

# 2. Universal Scrape (Web Unlocker)
print("\n--- Universal Scrape ---")
html = client.universal_scrape("https://httpbin.org/html")
print(f"HTML Length: {len(html)}")
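The quick start indexes `results['organic'][0]['title']` directly, which raises an exception if a search returns no organic results. A defensive accessor might look like the sketch below. It assumes the response is a dictionary shaped like the one used in the quick start; `first_organic_title` is a hypothetical helper, not an SDK function.

```python
def first_organic_title(results, default=None):
    """Return the title of the first organic result, or `default` if absent."""
    organic = results.get("organic") or []
    if not organic:
        return default
    return organic[0].get("title", default)


# Example with hand-written dictionaries (no network access needed).
print(first_organic_title({"organic": [{"title": "Python Tutorial"}]}))
print(first_organic_title({"organic": []}, default="(no results)"))
```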

📚 Core Features

🌐 Proxy Network

Easily generate proxy URLs with geo-targeting and sticky sessions. The SDK handles connection pooling automatically.

from thordata import ProxyConfig, ProxyProduct

# Create a proxy configuration
proxy = ProxyConfig(
    username="user",
    password="pass",
    product=ProxyProduct.RESIDENTIAL,
    country="us",
    city="new_york",
    session_id="session123",
    session_duration=10  # Sticky for 10 mins
)

# Use with the client (high performance)
response = client.get("https://httpbin.org/ip", proxy_config=proxy)
print(response.json())

# Or get the URL string for other libs (requests, scrapy, etc.)
proxy_url = proxy.build_proxy_url()
print(f"Proxy URL: {proxy_url}")
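A proxy URL string can be passed to any HTTP library that accepts one. As a minimal sketch, the snippet below wires such a URL into the standard library's `urllib`. The URL shown is a placeholder for illustration only; in practice it would be the string returned by `proxy.build_proxy_url()`, whose exact format is not shown in this README.

```python
import urllib.request


def build_opener_with_proxy(proxy_url):
    """Return a urllib opener that routes HTTP and HTTPS traffic via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder value; substitute the result of proxy.build_proxy_url().
example_proxy_url = "http://user:pass@proxy.example.com:8000"
opener = build_opener_with_proxy(example_proxy_url)

# Calling opener.open("https://httpbin.org/ip", timeout=30) would send
# the request through the proxy; it is left out here to avoid network access.
```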

🔍 SERP API

Real-time search results from Google, Bing, Yandex, etc.

from thordata import SerpRequest, Engine

# Simple
results = client.serp_search(
    query="pizza near me",
    engine=Engine.GOOGLE_MAPS,
    country="us"
)

# Advanced (Strongly Typed)
request = SerpRequest(
    query="AI news",
    engine="google_news",
    num=50,
    time_filter="week",
    location="San Francisco",
    render_js=True
)
results = client.serp_search_advanced(request)

🔓 Universal Scraping API (Web Unlocker)

Bypass Cloudflare, CAPTCHAs, and other anti-bot systems.

html = client.universal_scrape(
    url="https://example.com/protected",
    js_render=True,
    wait_for=".content",
    country="gb",
    output_format="html"
)

🕷️ Web Scraper API (Async Tasks)

Create and manage asynchronous scraping tasks for large batch jobs.

# 1. Create Task
task_id = client.create_scraper_task(
    file_name="my_task",
    spider_id="universal",
    spider_name="universal",
    parameters={"url": "https://example.com"}
)
print(f"Task Created: {task_id}")

# 2. Wait for Completion
status = client.wait_for_task(task_id, max_wait=600)

# 3. Get Result
if status == "ready":
    download_url = client.get_task_result(task_id)
    print(f"Result: {download_url}")
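To illustrate what waiting with a `max_wait` budget involves, here is a generic polling loop. It is a sketch of the general pattern, not the SDK's implementation of `wait_for_task`: `poll_until`, its parameters, and the `"failed"` state are assumptions made for the example, and only `"ready"` appears in the README.

```python
import time


def poll_until(get_status, done_states, max_wait=600.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a value in done_states.

    Raises TimeoutError if another sleep would exceed max_wait seconds.
    """
    deadline = clock() + max_wait
    while True:
        status = get_status()
        if status in done_states:
            return status
        if clock() + interval > deadline:
            raise TimeoutError(f"still {status!r} after {max_wait} seconds")
        sleep(interval)


# Simulated status sequence standing in for repeated status checks.
statuses = iter(["running", "running", "ready"])
final = poll_until(lambda: next(statuses), {"ready", "failed"},
                   max_wait=10, interval=0)
print(final)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.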

📹 Video/Audio Tasks

Download content from YouTube and other supported platforms.

from thordata import CommonSettings

task_id = client.create_video_task(
    file_name="video_{{VideoID}}",
    spider_id="youtube_video_by-url",
    spider_name="youtube.com",
    parameters={"url": "https://youtube.com/watch?v=..."},
    common_settings=CommonSettings(resolution="1080p")
)

📊 Account Management

Access usage statistics, manage sub-users, and whitelist IPs.

# Get Usage Stats
stats = client.get_usage_statistics("2024-01-01", "2024-01-31")
print(f"Balance: {stats.balance_gb():.2f} GB")

# List Proxy Users
users = client.list_proxy_users()
print(f"Active Sub-users: {users.user_count}")

# Whitelist IP
client.add_whitelist_ip("1.2.3.4")
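Since a whitelist entry must be a valid address, it may be worth validating user input before calling `add_whitelist_ip`. The sketch below uses the standard library's `ipaddress` module; `normalize_ip` is a hypothetical helper, and whether the API accepts IPv6 addresses is not stated in this README.

```python
import ipaddress


def normalize_ip(value):
    """Return the canonical string form of an IP address, or raise ValueError."""
    return str(ipaddress.ip_address(value.strip()))


print(normalize_ip(" 1.2.3.4 "))

try:
    normalize_ip("1.2.3.256")
except ValueError as exc:
    print(f"Rejected: {exc}")
```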

⚙️ Advanced Usage

Async Client

For high-concurrency applications, use AsyncThordataClient.

import asyncio
from thordata import AsyncThordataClient

async def main():
    async with AsyncThordataClient(scraper_token="...") as client:
        # SERP
        results = await client.serp_search("async python")
        
        # Universal
        html = await client.universal_scrape("https://example.com")

asyncio.run(main())

Note: AsyncThordataClient does not support HTTPS proxy tunneling (TLS-in-TLS) due to aiohttp limitations. For proxy network requests, use the sync client.
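In high-concurrency code it is usually wise to cap the number of requests in flight. The following sketch shows one common way to do that with `asyncio.Semaphore`. `gather_bounded` and `fake_search` are illustrative names; `fake_search` stands in for a call such as `await client.serp_search(query)` so the example runs without network access.

```python
import asyncio


async def gather_bounded(coro_factories, limit=5):
    """Run coroutines from coro_factories with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            return await factory()

    # gather() returns results in the same order as its inputs.
    return await asyncio.gather(*(run(f) for f in coro_factories))


async def fake_search(query):
    # Stand-in for an SDK call; yields control once and echoes the query.
    await asyncio.sleep(0)
    return {"query": query}


async def demo():
    queries = ["async python", "aiohttp", "asyncio semaphore"]
    factories = [lambda q=q: fake_search(q) for q in queries]
    return await gather_bounded(factories, limit=2)


results = asyncio.run(demo())
print([r["query"] for r in results])
```

Factories (zero-argument callables) are used instead of coroutine objects so that each coroutine is only created once a slot is available.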

Custom Retry Configuration

from thordata import RetryConfig

retry = RetryConfig(
    max_retries=5,
    backoff_factor=1.5,
    retry_on_status_codes={429, 500, 502, 503, 504}
)

client = ThordataClient(..., retry_config=retry)
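To get a feel for what `max_retries` and `backoff_factor` imply, the sketch below computes a plain exponential schedule in which each delay doubles. This is an assumption for illustration: the README does not document the SDK's exact formula, which may add jitter or a cap. `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(max_retries, backoff_factor):
    """Delay in seconds before each retry, doubling after every attempt."""
    return [backoff_factor * (2 ** attempt) for attempt in range(max_retries)]


delays = backoff_delays(max_retries=5, backoff_factor=1.5)
print(delays)       # [1.5, 3.0, 6.0, 12.0, 24.0]
print(sum(delays))  # 46.5 seconds of waiting in the worst case
```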

📄 License

MIT License
