# ContentAPI Python SDK

Official Python SDK for ContentAPI — extract structured content from any URL.
## Features

- 🌐 Web extraction — Get clean markdown/text from any webpage
- 🎬 YouTube — Transcripts, metadata, and summaries
- 🐦 Twitter/X — Thread and tweet extraction
- 🤖 Reddit — Post extraction
- 🔍 Web search — Search the web programmatically
- 📦 Batch — Extract multiple URLs in a single request
- ⚡ Async support — Full async/await with httpx
- 🔄 Auto-retry — Exponential backoff on rate limits and server errors
- 📐 Type-safe — Pydantic v2 models with full type hints
## Installation

```bash
pip install contentapi
```
## Quick Start

```python
from contentapi import ContentAPI

client = ContentAPI(api_key="sk_live_...")

# Extract web content
result = client.web.extract("https://example.com")
print(result.title)       # "Example Domain"
print(result.content)     # Extracted content as markdown
print(result.word_count)  # 17
```
## Usage

### Web Extraction

```python
# Default extraction
result = client.web.extract("https://example.com")

# Specify output format
result = client.web.extract("https://example.com", format="markdown")
result = client.web.extract("https://example.com", format="text")

# Access structured data
print(result.title)
print(result.content)
print(result.word_count)
print(result.metadata.language)     # "en"
print(result.metadata.description)  # Meta description

# Page structure
for item in result.structure or []:
    print(item.tag, item.text)
```
### YouTube

```python
# Get transcript with segments
transcript = client.youtube.transcript("https://youtube.com/watch?v=dQw4w9WgXcQ")
print(transcript.title)      # Video title
print(transcript.channel)    # Channel name
print(transcript.full_text)  # All segments joined
print(transcript.word_count)

for segment in transcript.segments:
    print(f"[{segment.start:.1f}s] {segment.text}")

# Get video metadata
metadata = client.youtube.metadata("https://youtube.com/watch?v=dQw4w9WgXcQ")
print(metadata.title)
print(metadata.description)
print(metadata.view_count)
print(metadata.duration)  # seconds
print(metadata.published_at)
print(metadata.tags)
```
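Segment start times are plain seconds, so converting a transcript to subtitle formats only needs a timestamp helper. Here is a minimal, SDK-independent sketch that formats a start time as an SRT-style `HH:MM:SS,mmm` timestamp (the function name is ours, not part of the SDK):

```python
def srt_timestamp(seconds: float) -> str:
    """Convert a start time in seconds to an SRT-style HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

print(srt_timestamp(0.0))    # 00:00:00,000
print(srt_timestamp(83.25))  # 00:01:23,250
```

Paired with `transcript.segments`, this is enough to emit each segment as a numbered SRT cue.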
### Twitter / X

```python
thread = client.twitter.thread("https://x.com/user/status/123456789")
print(thread.author)   # "@user"
print(thread.content)  # Thread text

for tweet in thread.tweets or []:
    print(tweet.text, tweet.likes)
```

### Reddit

```python
post = client.reddit.post("https://reddit.com/r/Python/comments/abc123/my_post/")
print(post.title)
print(post.subreddit)  # "r/Python"
print(post.author)
print(post.score)
print(post.content)
```
### Web Search

```python
results = client.search("python RAG tutorial", count=5)
print(f"Found {results.total_results} results")

for item in results.results:
    print(f"{item.title}: {item.url}")
    print(f"  {item.snippet}")
```
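Search engines often return the same page more than once under slightly different titles. A small order-preserving deduplication pass, sketched here with plain dicts standing in for the SDK's result models, keeps only the first hit per URL:

```python
def dedupe_by_url(results):
    """Keep the first result for each URL, preserving the original order."""
    seen = set()
    unique = []
    for item in results:
        if item["url"] not in seen:
            seen.add(item["url"])
            unique.append(item)
    return unique

hits = [
    {"title": "RAG tutorial", "url": "https://a.example/rag"},
    {"title": "RAG tutorial (mirror)", "url": "https://a.example/rag"},
    {"title": "Another guide", "url": "https://b.example/guide"},
]
print([h["title"] for h in dedupe_by_url(hits)])  # ['RAG tutorial', 'Another guide']
```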
### Batch Extraction

```python
batch = client.batch([
    "https://example.com",
    "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "https://x.com/user/status/123",
])

print(f"{batch.summary.succeeded}/{batch.summary.total} succeeded")

for item in batch.results:
    if item.success:
        print(f"✅ {item.url}: {item.data}")
    else:
        print(f"❌ {item.url}: {item.error}")
```
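When post-processing a batch, it is often handier to split successes from failures up front than to branch inside the loop. A sketch against the `success` field shown above, using plain dicts in place of the SDK's result objects:

```python
def partition_results(results):
    """Split batch items into (succeeded, failed) lists by their `success` flag."""
    succeeded = [r for r in results if r["success"]]
    failed = [r for r in results if not r["success"]]
    return succeeded, failed

items = [
    {"url": "https://example.com", "success": True, "data": {"title": "Example"}},
    {"url": "https://bad.example", "success": False, "error": "extraction failed"},
]
ok, bad = partition_results(items)
print(len(ok), len(bad))  # 1 1
```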
## Async Usage

```python
import asyncio

from contentapi import ContentAPI

async def main():
    async with ContentAPI(api_key="sk_live_...", async_mode=True) as client:
        # All methods return coroutines in async mode
        result = await client.web.extract("https://example.com")
        print(result.title)

        # Parallel requests
        web, yt = await asyncio.gather(
            client.web.extract("https://example.com"),
            client.youtube.transcript("https://youtube.com/watch?v=dQw4w9WgXcQ"),
        )

asyncio.run(main())
```

You can also use the async methods explicitly:

```python
result = await client.web.aextract("https://example.com")
transcript = await client.youtube.atranscript("https://youtube.com/watch?v=...")
```
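`asyncio.gather` starts every request at once, which can trip the API's rate limits on large jobs. A generic helper (not part of the SDK) caps how many coroutines run concurrently with a semaphore; the dummy `fake_fetch` below stands in for real client calls:

```python
import asyncio

async def gather_limited(coro_factories, limit=3):
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def run(factory):
        async with sem:
            return await factory()

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(run(f) for f in coro_factories))

async def demo():
    async def fake_fetch(n):
        await asyncio.sleep(0.01)  # stand-in for a network call
        return n * 2

    results = await gather_limited([lambda n=n: fake_fetch(n) for n in range(5)], limit=2)
    print(results)  # [0, 2, 4, 6, 8]

asyncio.run(demo())
```

With the real client you would pass factories like `lambda u=u: client.web.extract(u)` for each URL.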
## Error Handling

```python
from contentapi import (
    ContentAPI,
    ContentAPIError,
    AuthenticationError,
    RateLimitError,
    QuotaExceededError,
    ExtractionError,
    NotFoundError,
)

client = ContentAPI(api_key="sk_live_...")

try:
    result = client.web.extract("https://example.com")
except AuthenticationError:
    print("Invalid API key!")
except RateLimitError as e:
    print(f"Rate limited! Retry after {e.retry_after}s")
except QuotaExceededError:
    print("Out of credits!")
except ExtractionError as e:
    print(f"Extraction failed: {e.message}")
except NotFoundError:
    print("Endpoint not found")
except ContentAPIError as e:
    print(f"API error [{e.status_code}]: {e.message}")
```
## Automatic Retries

The SDK automatically retries on:

- 429 — Rate limit exceeded (with exponential backoff)
- 503 — Service unavailable
- Timeouts — Network timeouts

Default: 3 retries with exponential backoff (1s → 2s → 4s).

```python
# Customize retry behavior
client = ContentAPI(
    api_key="sk_live_...",
    max_retries=5,
    timeout=30.0,
)
```
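The 1s → 2s → 4s schedule is plain doubling: the delay before retry attempt `n` (0-indexed) is `base * 2**n`. A standalone sketch of that arithmetic, with an upper cap that is our own illustrative addition (the SDK's behavior beyond three retries is not specified above):

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base * 2**attempt, clamped to an upper cap."""
    return min(base * (2 ** attempt), cap)

print([backoff_delay(n) for n in range(3)])  # [1.0, 2.0, 4.0]
```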
## Configuration

```python
client = ContentAPI(
    api_key="sk_live_...",               # Required
    base_url="https://api.example.com",  # Custom base URL
    timeout=60.0,                        # Request timeout in seconds
    max_retries=3,                       # Max retry attempts
)
```
## Credits Tracking

Every response includes credit usage:

```python
result = client.web.extract("https://example.com")
print(result.credits_used)       # 1
print(result.credits_remaining)  # 99
```
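Since every response carries `credits_used` and `credits_remaining`, a long-running job can track spend and stop before the quota runs out. A minimal sketch (the `CreditTracker` class is ours, not part of the SDK):

```python
class CreditTracker:
    """Accumulate per-request credit usage and flag when the remaining budget is low."""

    def __init__(self, minimum_remaining: int = 10):
        self.minimum_remaining = minimum_remaining
        self.used = 0

    def record(self, credits_used: int, credits_remaining: int) -> bool:
        """Record one response; return True while it is safe to keep going."""
        self.used += credits_used
        return credits_remaining >= self.minimum_remaining

tracker = CreditTracker(minimum_remaining=10)
print(tracker.record(1, 99))  # True
print(tracker.record(1, 5))   # False
print(tracker.used)           # 2
```

After each extraction you would call `tracker.record(result.credits_used, result.credits_remaining)` and break out of the loop on `False`.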
## Context Manager

```python
# Sync
with ContentAPI(api_key="sk_live_...") as client:
    result = client.web.extract("https://example.com")

# Async
async with ContentAPI(api_key="sk_live_...", async_mode=True) as client:
    result = await client.web.extract("https://example.com")
```
## Requirements

- Python ≥ 3.9
- httpx ≥ 0.25
- pydantic ≥ 2.0
## License

MIT — see LICENSE.
## File details

### contentapi-0.1.0.tar.gz

- Size: 15.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | 743159fc48df841a058c67e4ec8b13f3cac4662cec637d7d35c044bedb9bd189 |
| MD5 | 094008012d91098f287ac771a3e31fde |
| BLAKE2b-256 | febc502ea7d7bb96e23ff611e1dfc7f5bdcdcc8d878a1fc1d6464f6c355b738a |

### contentapi-0.1.0-py3-none-any.whl

- Size: 22.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | be154d5999c319375a2f371f4bc596be4d3436b9bd5cab5e8e214415b5a40946 |
| MD5 | 1b4e420074ab18279a22a0f9fbd267e6 |
| BLAKE2b-256 | 2a6abe19fe812bfb851117adfa03f80975c6c7760f669e706a744e2f84250c32 |