PostCrawl Python SDK

Official Python SDK for PostCrawl - The Fastest LLM-Ready Social Media Crawler. Extract and search content from Reddit and TikTok with a simple, type-safe Python interface.

Features

  • 🔍 Search across Reddit and TikTok with advanced filtering
  • 📊 Extract content from social media URLs with optional comments
  • 🚀 Combined search and extract in a single operation
  • 🏷️ Type-safe with Pydantic models and full type hints
  • ⚡ Async/await support with synchronous convenience methods
  • 🛡️ Comprehensive error handling with detailed exceptions
  • 📈 Rate limiting support with credit tracking
  • 🔄 Automatic retries for network errors
  • 🎯 Platform-specific models for Reddit and TikTok data with strong typing
  • 📝 Rich content formatting with markdown support
  • 🐍 Python 3.10+ with modern type annotations and snake_case naming

Installation

Using uv (Recommended)

uv is a fast Python package manager that we recommend:

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Add postcrawl to your project
uv add postcrawl

Using pip

pip install postcrawl

Optional: Environment Variables

For loading API keys from .env files:

uv add python-dotenv
# or
pip install python-dotenv

Requirements

  • Python 3.10 or later

Quick Start

Async Usage (Recommended)

import asyncio
from postcrawl import PostCrawlClient

async def main():
    # Initialize the client with your API key
    async with PostCrawlClient(api_key="sk_your_api_key_here") as pc:
        # Search for content
        results = await pc.search(
            social_platforms=["reddit"],
            query="machine learning",
            results=10,
            page=1
        )

        # Process results
        for post in results:
            print(f"{post.title} - {post.url}")
            print(f"  Date: {post.date}")
            print(f"  Snippet: {post.snippet[:100]}...")

# Run the async function
asyncio.run(main())
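
Because the client is async, independent searches can be issued concurrently rather than one after another. The following is a minimal sketch, not part of the SDK: `search_many` is a hypothetical helper, and it assumes a single client instance tolerates overlapping requests, which this README does not confirm.

```python
import asyncio
from typing import Any


async def search_many(pc: Any, queries: list[str], results: int = 10) -> dict[str, list[Any]]:
    """Run one Reddit search per query concurrently and map query -> results.

    `pc` is any object exposing the async `search(...)` method shown above.
    Assumption: the client can serve several requests at the same time.
    """
    tasks = [
        pc.search(social_platforms=["reddit"], query=q, results=results, page=1)
        for q in queries
    ]
    # gather() returns responses in the same order as the tasks,
    # so zip() pairs each response with the query that produced it.
    responses = await asyncio.gather(*tasks)
    return {q: list(r) for q, r in zip(queries, responses)}
```

Inside `main()` this could be awaited as `await search_many(pc, ["machine learning", "data science"])`. Keep the credit cost in mind, since every query is billed as a separate search.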

Synchronous Usage

from postcrawl import PostCrawlClient

# Initialize the client
pc = PostCrawlClient(api_key="sk_your_api_key_here")

# Search synchronously
results = pc.search_sync(
    social_platforms=["reddit", "tiktok"],
    query="artificial intelligence",
    results=5
)

# Extract content from URLs
posts = pc.extract_sync(
    urls=["https://reddit.com/r/...", "https://tiktok.com/@..."],
    include_comments=True
)

API Reference

Search

results = await pc.search(
    social_platforms=["reddit", "tiktok"],
    query="your search query",
    results=10,  # 1-100
    page=1       # pagination
)
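
The `page` parameter can be used to walk through a larger result set. The sketch below is one way to do that; `search_all_pages` is a hypothetical helper, and it assumes that pages are numbered from 1 (as in the example above) and that an empty page signals the end of the results.

```python
from typing import Any


async def search_all_pages(pc: Any, query: str, per_page: int = 10, max_pages: int = 3) -> list[Any]:
    """Collect results from page 1 up to max_pages, stopping at the first empty page.

    `pc` is any object exposing the async `search(...)` method shown above.
    """
    collected: list[Any] = []
    for page in range(1, max_pages + 1):
        batch = list(
            await pc.search(
                social_platforms=["reddit"],
                query=query,
                results=per_page,
                page=page,
            )
        )
        if not batch:  # assumption: an empty page means there is nothing more to fetch
            break
        collected.extend(batch)
    return collected
```

The `max_pages` cap keeps credit usage bounded, since each page is a separate search request.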

Extract

posts = await pc.extract(
    urls=["https://reddit.com/...", "https://tiktok.com/..."],
    include_comments=True,
    response_mode="raw"  # or "markdown"
)

Search and Extract

posts = await pc.search_and_extract(
    social_platforms=["reddit"],
    query="search query",
    results=5,
    page=1,
    include_comments=False,
    response_mode="markdown"
)
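
With `response_mode="markdown"`, the returned posts can be stitched into a single text block, for example to pass to an LLM. This is a sketch under assumptions: `build_context` is a hypothetical helper, and it relies only on the `url`, `markdown`, and `error` attributes listed for ExtractedPost under Response Models.

```python
from typing import Any, Iterable


def build_context(posts: Iterable[Any], max_chars: int = 8000) -> str:
    """Join the markdown of successfully extracted posts into one string.

    Posts with an error or without markdown are skipped. The character
    budget counts the sections only, not the separators between them.
    """
    sections: list[str] = []
    used = 0
    for post in posts:
        if post.error or not post.markdown:
            continue
        section = f"Source: {post.url}\n\n{post.markdown}"
        if used + len(section) > max_chars:
            break  # stop before exceeding the character budget
        sections.append(section)
        used += len(section)
    return "\n\n---\n\n".join(sections)
```

The result of `search_and_extract(...)` can be passed straight in as `posts`.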

Synchronous Methods

# All methods have synchronous versions
results = pc.search_sync(...)
posts = pc.extract_sync(...)
combined = pc.search_and_extract_sync(...)

Examples

Check out the examples/ directory for complete working examples:

  • search_101.py - Basic search functionality demo
  • extract_101.py - Content extraction demo
  • search_and_extract_101.py - Combined operation demo

Run examples with:

# Using uv (recommended)
uv run python examples/search_101.py

# Or with standard Python
cd examples
python search_101.py

Response Models

SearchResult

Response from the search endpoint:

  • title: Title of the search result
  • url: URL of the search result
  • snippet: Text snippet from the content
  • date: Date of the post (e.g., "Dec 28, 2024")
  • image_url: URL of associated image (can be empty string)
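
Since `date` is a display string rather than a date object, sorting or filtering by date needs a parsing step. The sketch below assumes every value follows the "Dec 28, 2024" pattern from the example above, which is not guaranteed; `parse_result_date` is a hypothetical helper and returns `None` for anything that does not match.

```python
from datetime import date, datetime
from typing import Optional


def parse_result_date(text: str) -> Optional[date]:
    """Parse a SearchResult.date string such as "Dec 28, 2024".

    Returns None when the text does not match that format.
    Note: %b matches month abbreviations in the current locale.
    """
    try:
        return datetime.strptime(text.strip(), "%b %d, %Y").date()
    except ValueError:
        return None
```

The same defensive approach applies to `image_url`: treat an empty string as "no image" before using it.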

ExtractedPost

  • url: Original URL
  • source: Platform name ("reddit" or "tiktok")
  • raw: Raw content data (RedditPost or TiktokPost object) - strongly typed
  • markdown: Markdown formatted content (when response_mode="markdown")
  • error: Error message if extraction failed
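
Because a failed extraction is reported through the `error` field rather than necessarily raising, a batch can contain both successes and failures. A minimal sketch of separating the two, using only the `url` and `error` attributes listed above (`split_by_outcome` is a hypothetical helper, not an SDK function):

```python
from typing import Any, Iterable


def split_by_outcome(posts: Iterable[Any]) -> tuple[list[Any], dict[str, str]]:
    """Separate extracted posts into successes and a url -> error message map."""
    succeeded: list[Any] = []
    failed: dict[str, str] = {}
    for post in posts:
        if post.error:
            failed[post.url] = post.error
        else:
            succeeded.append(post)
    return succeeded, failed
```

The `failed` map can then be logged or used to retry only the URLs that did not extract.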

Working with Platform-Specific Types

The SDK provides type-safe access to platform-specific data:

import asyncio
from postcrawl import PostCrawlClient, RedditPost, TiktokPost

async def main():
    async with PostCrawlClient(api_key="sk_your_api_key_here") as pc:
        # Extract content with proper type handling
        posts = await pc.extract(urls=["https://reddit.com/..."])

    for post in posts:
        if post.error:
            # Extraction failed for this URL; raw data is not available
            print(f"Error: {post.error}")
        elif isinstance(post.raw, RedditPost):
            # Access Reddit-specific fields with snake_case attributes
            print(f"Subreddit: r/{post.raw.subreddit_name}")
            print(f"Score: {post.raw.score}")
            print(f"Title: {post.raw.title}")
            print(f"Upvotes: {post.raw.upvotes}")
            print(f"Created: {post.raw.created_at}")
            if post.raw.comments:
                print(f"Comments: {len(post.raw.comments)}")
        elif isinstance(post.raw, TiktokPost):
            # Access TikTok-specific fields with snake_case attributes
            print(f"Username: @{post.raw.username}")
            print(f"Likes: {post.raw.likes}")
            print(f"Total Comments: {post.raw.total_comments}")
            print(f"Created: {post.raw.created_at}")
            if post.raw.hashtags:
                print(f"Hashtags: {', '.join(post.raw.hashtags)}")

asyncio.run(main())

Error Handling

from postcrawl.exceptions import (
    AuthenticationError,      # Invalid API key
    InsufficientCreditsError, # Not enough credits
    RateLimitError,          # Rate limit exceeded
    ValidationError          # Invalid parameters
)
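
The import above lists the exception classes but not how they might be handled. One common pattern is to retry only the errors that can resolve on their own and let the rest propagate. The sketch below is a generic helper, not SDK functionality: `call_with_backoff` and its parameters are hypothetical, and which exceptions are worth retrying is left to the caller. `RateLimitError` is a plausible candidate, while `AuthenticationError`, `InsufficientCreditsError`, and `ValidationError` are unlikely to succeed on a second attempt.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def call_with_backoff(
    make_call: Callable[[], Awaitable[Any]],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> Any:
    """Await make_call(), retrying with exponential backoff on the given exceptions.

    Any other exception, and the last failure once attempts run out,
    propagates to the caller unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2x, 4x, ... before the next try
            await asyncio.sleep(base_delay * 2**attempt)
```

With the SDK this might be called as `await call_with_backoff(lambda: pc.search(social_platforms=["reddit"], query="python"), retry_on=(RateLimitError,))`.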

Development

This project uses uv for dependency management. See DEVELOPMENT.md for detailed setup and contribution guidelines.

Quick Development Setup

# Clone the repository
git clone https://github.com/post-crawl/python-sdk.git
cd python-sdk

# Install dependencies
uv sync

# Run tests
make test

# Run all checks (format, lint, test)
make check

# Build the package
make build

Available Commands

make help         # Show all available commands
make format       # Format code with black and ruff
make lint         # Run linting and type checking
make test         # Run test suite
make check        # Run format, lint, and tests
make build        # Build distribution packages
make verify       # Verify package installation
make publish-test # Publish to TestPyPI

API Key Management

Environment Variables (Recommended)

Store your API key securely in environment variables:

export POSTCRAWL_API_KEY="sk_your_api_key_here"

Or use a .env file:

# .env
POSTCRAWL_API_KEY=sk_your_api_key_here

Then load it in your code:

import os
from dotenv import load_dotenv
from postcrawl import PostCrawlClient

load_dotenv()
pc = PostCrawlClient(api_key=os.getenv("POSTCRAWL_API_KEY"))
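
Note that `os.getenv` returns `None` when the variable is unset, and this README does not say how the client reacts to a missing key. A small guard that fails early with a clear message can make misconfiguration easier to spot. `require_api_key` below is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "POSTCRAWL_API_KEY",
) -> str:
    """Return the API key from the environment or raise a clear error.

    `env` defaults to os.environ; passing a plain dict is handy in tests.
    """
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return key
```

After `load_dotenv()`, the client could then be created with `PostCrawlClient(api_key=require_api_key())`.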

Security Best Practices

  • Never hardcode API keys in your source code
  • Add .env to .gitignore to prevent accidental commits
  • Use environment variables in production
  • Rotate keys regularly through the PostCrawl dashboard
  • Set key permissions to limit access to specific operations

Rate Limits & Credits

PostCrawl uses a credit-based system:

  • Search: ~1 credit per 10 results
  • Extract: ~1 credit per URL (without comments)
  • Extract with comments: ~3 credits per URL
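
The approximate rates above can be turned into a rough budget check before running a job. This is only an estimate: the rates are marked as approximate, rounding partial blocks of 10 search results up to a full credit is an assumption, and `estimate_credits` is a hypothetical helper rather than an SDK function.

```python
import math


def estimate_credits(search_results: int = 0, urls: int = 0, include_comments: bool = False) -> int:
    """Rough credit estimate based on the approximate rates listed above."""
    # ~1 credit per 10 search results (rounding up is an assumption)
    search_cost = math.ceil(search_results / 10)
    # ~3 credits per URL with comments, ~1 without
    per_url = 3 if include_comments else 1
    return search_cost + urls * per_url
```

For example, a 25-result search followed by extracting 4 URLs with comments comes to an estimated 3 + 12 = 15 credits.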

Rate limit details are returned in response headers and are available on the client as rate_limit_info after a request:

pc = PostCrawlClient(api_key="sk_...")
results = await pc.search(...)

print(f"Rate limit: {pc.rate_limit_info['limit']}")
print(f"Remaining: {pc.rate_limit_info['remaining']}")
print(f"Reset at: {pc.rate_limit_info['reset']}")

Support

License

MIT License - see LICENSE file for details.
