The Official Python SDK for Thordata - AI Data Infrastructure & Proxy Network.
Thordata Python SDK
📦 Installation
```bash
pip install thordata-sdk
```
Optional dependency for the Scraping Browser examples:
```bash
pip install playwright
```
🔐 Configuration
Set the following environment variables (recommended):
```bash
# Required for SERP, Universal, and Proxy Network
export THORDATA_SCRAPER_TOKEN="your_scraper_token"

# Required for Web Scraper Tasks & Account Management
export THORDATA_PUBLIC_TOKEN="your_public_token"
export THORDATA_PUBLIC_KEY="your_public_key"

# Optional: Default Proxy Credentials (for Proxy Network)
export THORDATA_RESIDENTIAL_USERNAME="user"
export THORDATA_RESIDENTIAL_PASSWORD="pass"
```
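A small pre-flight check can catch missing variables before the first request fails. The sketch below is a hypothetical helper, not part of the SDK: it only reads the variable names listed above and groups them by the features their comments mention.

```python
import os
from typing import Mapping

# Hypothetical helper (not part of thordata-sdk). Variable names come from the
# configuration list above, grouped by the features their comments mention.
REQUIRED_VARS = {
    "serp_universal_proxy": ["THORDATA_SCRAPER_TOKEN"],
    "web_scraper_account": ["THORDATA_PUBLIC_TOKEN", "THORDATA_PUBLIC_KEY"],
}


def missing_vars(feature: str, env: Mapping[str, str] = os.environ) -> list:
    """Return the required variables for `feature` that are unset or empty."""
    return [name for name in REQUIRED_VARS[feature] if not env.get(name)]


if __name__ == "__main__":
    for feature in REQUIRED_VARS:
        missing = missing_vars(feature)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{feature}: {status}")
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.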
🚀 Quick Start
```python
import os

from thordata import ThordataClient

# Initialize with the scraper token read from the environment
client = ThordataClient(scraper_token=os.environ["THORDATA_SCRAPER_TOKEN"])

# 1. SERP Search
print("--- SERP Search ---")
results = client.serp_search("python tutorial", engine="google")
print(f"Title: {results['organic'][0]['title']}")

# 2. Universal Scrape (Web Unlocker)
print("\n--- Universal Scrape ---")
html = client.universal_scrape("https://httpbin.org/html")
print(f"HTML Length: {len(html)}")
```
📚 Core Features
🌐 Proxy Network
Easily generate proxy URLs with geo-targeting and sticky sessions. The SDK handles connection pooling automatically.
```python
from thordata import ProxyConfig, ProxyProduct

# Create a proxy configuration
proxy = ProxyConfig(
    username="user",
    password="pass",
    product=ProxyProduct.RESIDENTIAL,
    country="us",
    city="new_york",
    session_id="session123",
    session_duration=10,  # Sticky session for 10 minutes
)

# Use with the client (reuses its pooled connections)
response = client.get("https://httpbin.org/ip", proxy_config=proxy)
print(response.json())

# Or get the URL string for other libraries (requests, Scrapy, etc.)
proxy_url = proxy.build_proxy_url()
print(f"Proxy URL: {proxy_url}")
```
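If you prefer the standard library to `requests`, the URL string from `build_proxy_url()` can be plugged into `urllib.request.ProxyHandler`. This sketch assumes the returned string is an ordinary `scheme://user:pass@host:port` proxy URL; the placeholder host below is hypothetical.

```python
import urllib.request


def make_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build a urllib opener that routes HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Hypothetical placeholder; in practice pass proxy.build_proxy_url().
    opener = make_proxied_opener("http://user:pass@proxy.example.com:8000")
    # opener.open("https://httpbin.org/ip") would now be sent through the proxy.
    print([type(h).__name__ for h in opener.handlers])
```

`urllib` reads the credentials embedded in the proxy URL and sends them as basic proxy authentication, so no extra handler is needed for a user/password proxy.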
🔍 SERP API
Real-time search results from Google, Bing, Yandex, etc.
```python
from thordata import SerpRequest, Engine

# Simple
results = client.serp_search(
    query="pizza near me",
    engine=Engine.GOOGLE_MAPS,
    country="us",
)

# Advanced (Strongly Typed)
request = SerpRequest(
    query="AI news",
    engine="google_news",
    num=50,
    time_filter="week",
    location="San Francisco",
    render_js=True,
)
results = client.serp_search_advanced(request)
```
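The Quick Start indexes `results['organic'][0]` directly, which raises if a query returns no organic results. A defensive sketch, assuming (as the Quick Start does) that results are a dict with an `organic` list of dicts; the `link` key is an assumption about the response schema, so adjust it to what your responses actually contain.

```python
from typing import Any, Dict, List, Tuple


def organic_hits(results: Dict[str, Any], limit: int = 10) -> List[Tuple[str, str]]:
    """Return up to `limit` (title, link) pairs, skipping malformed entries.

    Assumes an 'organic' list of dicts; the 'link' key name is hypothetical.
    """
    hits: List[Tuple[str, str]] = []
    for item in results.get("organic") or []:
        if len(hits) >= limit:
            break
        if not isinstance(item, dict):
            continue
        title, link = item.get("title"), item.get("link")
        if title and link:
            hits.append((title, link))
    return hits
```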
🔓 Universal Scraping API (Web Unlocker)
Bypass Cloudflare, CAPTCHAs, and anti-bot systems.
```python
html = client.universal_scrape(
    url="https://example.com/protected",
    js_render=True,
    wait_for=".content",
    country="gb",
    output_format="html",
)
```
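`universal_scrape` returns the page HTML as a string (the Quick Start calls `len()` on it), so any HTML parser works on the result. A minimal standard-library sketch that pulls out the `<title>` text as a quick sanity check that the page was actually unlocked; it is an illustration, not an SDK feature.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the stripped <title> text, or "" if the page has none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()
```

An empty title on a page you expect to have one is a cheap signal that you received a challenge or error page instead of the real content.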
🕷️ Web Scraper API (Async Tasks)
Create and monitor asynchronous scraping tasks for large-scale jobs.
```python
# 1. Create Task
task_id = client.create_scraper_task(
    file_name="my_task",
    spider_id="universal",
    spider_name="universal",
    parameters={"url": "https://example.com"},
)
print(f"Task Created: {task_id}")

# 2. Wait for Completion
status = client.wait_for_task(task_id, max_wait=600)

# 3. Get Result
if status == "ready":
    download_url = client.get_task_result(task_id)
    print(f"Result: {download_url}")
```
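`wait_for_task` blocks until the task finishes or `max_wait` elapses. If you need custom behaviour (progress logging, a different polling interval), the generic poll-with-deadline pattern looks like the sketch below. It is not the SDK's implementation: `get_status` is any callable you supply, times are assumed to be in seconds, and the terminal status names other than `"ready"` are assumptions.

```python
import time
from typing import Callable

# "ready" appears in the example above; the failure names are assumptions.
TERMINAL_STATUSES = {"ready", "failed", "error"}


def poll_until_done(
    get_status: Callable[[], str],
    max_wait: float = 600.0,
    interval: float = 5.0,
    max_interval: float = 60.0,
) -> str:
    """Call get_status() until it returns a terminal status or max_wait elapses.

    The sleep interval doubles after each non-terminal poll, capped at
    max_interval. Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + max_wait
    while True:
        status = get_status()
        if status.lower() in TERMINAL_STATUSES:
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"task still {status!r} after {max_wait}s")
        time.sleep(min(interval, remaining))
        interval = min(interval * 2, max_interval)
```

Doubling the interval keeps early polls responsive for short tasks while avoiding a flood of status requests for long ones.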
📹 Video/Audio Tasks
Download content from YouTube and other supported platforms.
```python
from thordata import CommonSettings

task_id = client.create_video_task(
    file_name="video_{{VideoID}}",
    spider_id="youtube_video_by-url",
    spider_name="youtube.com",
    parameters={"url": "https://youtube.com/watch?v=..."},
    common_settings=CommonSettings(resolution="1080p"),
)
```
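The `{{VideoID}}` placeholder in `file_name` suggests the service substitutes each video's ID into the output name. To predict those names locally (for example, to skip downloads you already have), a sketch that extracts the ID from common YouTube URL shapes; both the substitution behaviour and these helpers are assumptions, not SDK features.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video ID from watch?v=..., youtu.be/... or /shorts/... URLs."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in {"youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith("/shorts/"):
            return parsed.path.split("/")[2] or None
    return None


def expected_file_name(template: str, url: str) -> Optional[str]:
    """Fill {{VideoID}} locally (hypothetical mirror of the service's naming)."""
    video_id = youtube_video_id(url)
    return template.replace("{{VideoID}}", video_id) if video_id else None
```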
📊 Account Management
Access usage statistics, manage sub-users, and whitelist IPs.
```python
# Get Usage Stats
stats = client.get_usage_statistics("2024-01-01", "2024-01-31")
print(f"Balance: {stats.balance_gb():.2f} GB")

# List Proxy Users
users = client.list_proxy_users()
print(f"Active Sub-users: {users.user_count}")

# Whitelist IP
client.add_whitelist_ip("1.2.3.4")
```
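Judging by the example above, `get_usage_statistics` takes start and end dates as `YYYY-MM-DD` strings. A small standard-library sketch computes the previous calendar month in that format and validates an address before it is whitelisted; both helpers are hypothetical conveniences, not SDK functions.

```python
import datetime
import ipaddress
from typing import Tuple


def previous_month_range(today: datetime.date) -> Tuple[str, str]:
    """Return (first_day, last_day) of the month before `today` as YYYY-MM-DD."""
    first_of_this_month = today.replace(day=1)
    last_of_prev = first_of_this_month - datetime.timedelta(days=1)
    first_of_prev = last_of_prev.replace(day=1)
    return first_of_prev.isoformat(), last_of_prev.isoformat()


def checked_ip(address: str) -> str:
    """Return the normalised address; raise ValueError if it is not a valid IP."""
    return str(ipaddress.ip_address(address.strip()))
```

Usage, assuming the `client` from the Quick Start: `start, end = previous_month_range(datetime.date.today())`, then `client.get_usage_statistics(start, end)` and `client.add_whitelist_ip(checked_ip(user_input))`.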
⚙️ Advanced Usage
Async Client
For high-concurrency applications, use AsyncThordataClient.
```python
import asyncio

from thordata import AsyncThordataClient


async def main():
    async with AsyncThordataClient(scraper_token="...") as client:
        # SERP
        results = await client.serp_search("async python")

        # Universal
        html = await client.universal_scrape("https://example.com")


asyncio.run(main())
```
Note: AsyncThordataClient does not support HTTPS proxy tunneling (TLS-in-TLS) due to aiohttp limitations. For proxy network requests, use the sync client.
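For many URLs, `asyncio.gather` plus a semaphore keeps the number of in-flight requests bounded. In this sketch `fetch` stands for any coroutine function, such as `client.universal_scrape` from the example above; the default limit of 10 is an arbitrary assumption, not a documented rate limit.

```python
import asyncio
from typing import Awaitable, Callable, List, Union


async def scrape_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = 10,
) -> List[Union[str, BaseException]]:
    """Run fetch(url) for every URL with at most `limit` calls in flight.

    Results keep the input order; failures are returned as exception objects
    instead of cancelling the remaining requests.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)
```

Inside `main()`, usage would look like `pages = await scrape_all(urls, client.universal_scrape)`; check each item with `isinstance(item, BaseException)` before treating it as HTML.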
Custom Retry Configuration
```python
from thordata import RetryConfig

retry = RetryConfig(
    max_retries=5,
    backoff_factor=1.5,
    retry_on_status_codes={429, 500, 502, 503, 504},
)
client = ThordataClient(..., retry_config=retry)
```
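The README does not spell out how `backoff_factor` becomes a delay. One common scheme, used here purely as an assumption, waits `backoff_factor * 2 ** attempt` seconds before retry number `attempt` (counting from 0) and retries only on the listed status codes. The sketch below makes that concrete; the SDK's actual formula may differ.

```python
from typing import List, Set


def backoff_delays(max_retries: int, backoff_factor: float) -> List[float]:
    """Delays (seconds) before each retry under a plain exponential scheme."""
    return [backoff_factor * 2**attempt for attempt in range(max_retries)]


def should_retry(status: int, attempt: int, max_retries: int, retry_codes: Set[int]) -> bool:
    """True if `status` is retryable and the retry budget is not yet spent."""
    return status in retry_codes and attempt < max_retries


if __name__ == "__main__":
    # Values mirror the RetryConfig example above.
    print(backoff_delays(5, 1.5))  # → [1.5, 3.0, 6.0, 12.0, 24.0]
```

Under this assumption, the configuration above could spend up to 46.5 seconds sleeping on a single persistently failing request, which is worth weighing against your own request timeouts.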
📄 License
MIT License