
ContentAPI Python SDK

Official Python SDK for ContentAPI — extract structured content from any URL.


Features

  • 🌐 Web extraction — Clean markdown/text from any webpage, with JS rendering
  • 🎬 YouTube — Transcripts, metadata, comments, chapters, summaries, channels, playlists
  • 🐦 Twitter/X — Tweet and thread extraction
  • 🤖 Reddit — Post extraction
  • 🔍 Web search — Search the web programmatically
  • 🧠 AI extraction — Extract structured data with JSON schema or natural language
  • 📝 AI summarization — Summarize any content with AI
  • 🔗 Site crawling — Crawl entire websites (async with polling)
  • 🔄 URL monitoring — Detect changes on web pages
  • 📦 Batch — Extract multiple URLs in a single request
  • ⚡ Async support — Full async/await with httpx
  • 🔄 Auto-retry — Exponential backoff on rate limits and server errors
  • 📐 Type-safe — Pydantic v2 models with full type hints

Installation

pip install contentapi

Quick Start

from contentapi import ContentAPI

client = ContentAPI(api_key="sk_live_...")

# Extract web content
result = client.web.extract("https://example.com")
print(result.title)       # "Example Domain"
print(result.content)     # Extracted content as markdown
print(result.word_count)  # 17

Usage

Web Extraction

# Default extraction
result = client.web.extract("https://example.com")

# JavaScript rendering (for SPAs)
result = client.web.extract("https://spa-app.com", render_js=True)

# Bypass robots.txt
result = client.web.extract("https://example.com", ignore_robots=True)

# RAG chunking
result = client.web.extract("https://example.com", chunk_size=500, chunk_overlap=50)

# Access structured data
print(result.title)
print(result.content)
print(result.word_count)

YouTube

# Get transcript (with Whisper AI fallback for videos without captions)
transcript = client.youtube.transcript("https://youtube.com/watch?v=dQw4w9WgXcQ")
print(transcript.title)
print(transcript.full_text)

for segment in transcript.segments:
    print(f"[{segment.start:.1f}s] {segment.text}")

# Get video metadata
metadata = client.youtube.metadata("https://youtube.com/watch?v=dQw4w9WgXcQ")
print(metadata.view_count, metadata.published_at)

# Get top comments
comments = client.youtube.comments("https://youtube.com/watch?v=dQw4w9WgXcQ", limit=20)
for c in comments.comments:
    print(f"@{c.author}: {c.text} ({c.likes} likes)")

# Get chapters from description
chapters = client.youtube.chapters("https://youtube.com/watch?v=dQw4w9WgXcQ")
for ch in chapters.chapters:
    print(f"{ch.formatted_time} - {ch.title}")

# AI-generated summary
summary = client.youtube.summary("https://youtube.com/watch?v=dQw4w9WgXcQ")
print(summary.summary)
print(summary.key_points)
print(summary.topics)

# Channel metadata + recent videos
channel = client.youtube.channel("@mkbhd")
print(f"{channel.name} - {channel.subscribers} subscribers")
for video in channel.recent_videos:
    print(f"  {video.title} ({video.views} views)")

# Playlist extraction
playlist = client.youtube.playlist("https://youtube.com/playlist?list=PLrAXt...")
for video in playlist.videos:
    print(f"#{video.position} {video.title}")

AI Schema Extraction

# Extract structured data with a JSON schema
data = client.ai.extract(
    url="https://news.ycombinator.com",
    schema={
        "top_stories": [{"title": "string", "points": "number", "url": "string"}]
    }
)
print(data.extracted)  # Structured data matching your schema

# Or use natural language
data = client.ai.extract(
    url="https://amazon.com/product/...",
    prompt="Extract the product name, price, and rating"
)
print(data.extracted)

AI Summarization

result = client.ai.summarize(
    content="Long article text here...",
    title="Optional title"
)
print(result.summary)     # Concise 2-3 sentence summary
print(result.key_points)  # List of key takeaways
print(result.topics)      # Auto-detected topics

Site Crawling

# Start an async crawl
crawl = client.crawl.start(
    url="https://docs.example.com",
    max_pages=50,
    include_patterns=["/docs/*"],
    webhook_url="https://myapp.com/hook"  # Optional: get notified when done
)
print(f"Crawl started: {crawl.crawl_id}")

# Poll for results
import time
while True:
    status = client.crawl.get(crawl.crawl_id)
    if status.status in ("completed", "failed"):
        break
    print(f"Progress: {status.pages_completed}/{status.pages_found}")
    time.sleep(5)

# Access results
for page in status.results:
    print(f"{page.url}: {page.word_count} words")
    print(page.content[:200])

URL Monitoring (Change Detection)

# Create a monitor
monitor = client.monitor.create(
    url="https://competitor.com/pricing",
    interval_hours=24,
    webhook_url="https://myapp.com/changes"
)
print(f"Monitor active: {monitor.monitor_id}")

# List all monitors
monitors = client.monitor.list()
for m in monitors.monitors:
    print(f"{m.url} — next check: {m.next_check}")

# Get change history
details = client.monitor.get(monitor.monitor_id)
for check in details.checks:
    if check.changed:
        print(f"Changed at {check.checked_at}: {check.diff_summary}")

# Delete a monitor
client.monitor.delete(monitor.monitor_id)

Twitter / X

tweet = client.twitter.tweet("https://x.com/user/status/123456789")
print(tweet.content)

thread = client.twitter.thread("https://x.com/user/status/123456789")
for t in thread.tweets:
    print(t.text, t.likes)

Reddit

post = client.reddit.post("https://reddit.com/r/Python/comments/abc123/my_post/")
print(post.title, post.score)
print(post.content)

Web Search

results = client.search("python RAG tutorial", count=5)
for item in results.results:
    print(f"{item.title}: {item.url}")

Batch Extraction

batch = client.batch([
    "https://example.com",
    "https://youtube.com/watch?v=dQw4w9WgXcQ",
])
print(f"{batch.summary.succeeded}/{batch.summary.total} succeeded")

Async Usage

import asyncio
from contentapi import ContentAPI

async def main():
    async with ContentAPI(api_key="sk_live_...", async_mode=True) as client:
        # Parallel requests
        web, yt = await asyncio.gather(
            client.web.extract("https://example.com"),
            client.youtube.transcript("https://youtube.com/watch?v=dQw4w9WgXcQ"),
        )
        print(web.title, yt.full_text[:100])

asyncio.run(main())

Error Handling

from contentapi import (
    ContentAPI,
    ContentAPIError,
    AuthenticationError,
    RateLimitError,
    QuotaExceededError,
    ExtractionError,
)

try:
    result = client.web.extract("https://example.com")
except AuthenticationError:
    print("Invalid API key!")
except RateLimitError as e:
    print(f"Rate limited! Retry after {e.retry_after}s")
except QuotaExceededError:
    print("Out of credits!")
except ExtractionError as e:
    print(f"Extraction failed: {e.message}")

The SDK automatically retries on 429 and 503 errors with exponential backoff.
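The retry behavior can be illustrated with a minimal sketch. This is not the SDK's actual implementation — `with_backoff` and `RetryableError` are hypothetical names standing in for the internal retry loop and for 429/503 responses:

```python
import random
import time


class RetryableError(Exception):
    """Stand-in for a 429 or 503 response."""


def with_backoff(request, max_retries=3, base_delay=1.0):
    """Call `request()`, retrying on RetryableError with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RetryableError:
            if attempt == max_retries:
                raise  # Retries exhausted; surface the error to the caller.
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus jitter
            # so concurrent clients don't retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

With `max_retries=3`, a request that keeps failing is attempted four times in total before the error is raised.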

Configuration

client = ContentAPI(
    api_key="sk_live_...",               # Required
    base_url="https://api.example.com",  # Custom base URL
    timeout=60.0,                        # Request timeout (seconds)
    max_retries=3,                       # Max retry attempts
)

Also Available

  • TypeScript SDK — npm install contentapi
  • MCP Server — npx @contentapi/mcp-server (for Claude, Cursor, Windsurf)
  • LangChain — pip install langchain-contentapi
  • LlamaIndex — pip install llamaindex-contentapi

Requirements

  • Python ≥ 3.9
  • httpx ≥ 0.25
  • pydantic ≥ 2.0

License

MIT — see LICENSE.
