Python SDK for PostCrawl - The Fastest LLM-Ready Social Media Crawler

Maintainers

kdcokenny ps428

PostCrawl Python SDK

Official Python SDK for PostCrawl - The Fastest LLM-Ready Social Media Crawler. Extract and search content from Reddit and TikTok with a simple, type-safe Python interface.

Features

🔍 Search across Reddit and TikTok with advanced filtering
📊 Extract content from social media URLs with optional comments
🚀 Combined search and extract in a single operation
🏷️ Type-safe with Pydantic models and full type hints
⚡ Async/await support with synchronous convenience methods
🛡️ Comprehensive error handling with detailed exceptions
📈 Rate limiting support with credit tracking
🔄 Automatic retries for network errors
🎯 Platform-specific models for Reddit and TikTok data with strong typing
📝 Rich content formatting with markdown support
🐍 Python 3.10+ with modern type annotations and snake_case naming

Installation

Using uv (Recommended)

uv is a fast Python package manager that we recommend:

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Add postcrawl to your project
uv add postcrawl

Using pip

pip install postcrawl

Optional: Environment Variables

For loading API keys from .env files:

uv add python-dotenv
# or
pip install python-dotenv

Requirements

Python 3.10 or higher
PostCrawl API key (Get one for free)

Quick Start

Async Usage (Recommended)

import asyncio
from postcrawl import PostCrawlClient

async def main():
    # Initialize the client with your API key
    async with PostCrawlClient(api_key="sk_your_api_key_here") as pc:
        # Search for content
        results = await pc.search(
            social_platforms=["reddit"],
            query="machine learning",
            results=10,
            page=1
        )

        # Process results
        for post in results:
            print(f"{post.title} - {post.url}")
            print(f"  Date: {post.date}")
            print(f"  Snippet: {post.snippet[:100]}...")

# Run the async function
asyncio.run(main())
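Because the client is asynchronous, several independent searches can be awaited together instead of one after another. The following is a minimal sketch, assuming pc.search accepts the keyword arguments shown above and that concurrent calls on a single client are safe (the source does not confirm this). search_many is a hypothetical helper, not part of the SDK.

```python
import asyncio
from typing import Any


async def search_many(pc: Any, queries: list[str], *, platform: str = "reddit",
                      results: int = 10) -> dict[str, list[Any]]:
    """Run one search per query concurrently and map each query to its results.

    `pc` is assumed to be an open PostCrawlClient, or any object exposing an
    awaitable `search` method with the keyword arguments used in this README.
    """
    tasks = [
        pc.search(social_platforms=[platform], query=q, results=results, page=1)
        for q in queries
    ]
    # gather preserves input order, so each result list lines up with its query.
    pages = await asyncio.gather(*tasks)
    return {q: list(page) for q, page in zip(queries, pages)}
```

Inside an async with PostCrawlClient(...) as pc block, this would be called as await search_many(pc, ["machine learning", "data science"]). Each concurrent search presumably consumes credits like a normal search.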

Synchronous Usage

from postcrawl import PostCrawlClient

# Initialize the client
pc = PostCrawlClient(api_key="sk_your_api_key_here")

# Search synchronously
results = pc.search_sync(
    social_platforms=["reddit", "tiktok"],
    query="artificial intelligence",
    results=5
)

# Extract content from URLs
posts = pc.extract_sync(
    urls=["https://reddit.com/r/...", "https://tiktok.com/@..."],
    include_comments=True
)

API Reference

Search

results = await pc.search(
    social_platforms=["reddit", "tiktok"],
    query="your search query",
    results=10,  # 1-100
    page=1       # pagination
)
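The page parameter can be driven from a loop to gather more results than a single call returns. This is a sketch under one stated assumption: a page shorter than the requested size is treated as the last page, since the source does not document how the API signals the end of results. collect_pages is a hypothetical helper name.

```python
from typing import Any


async def collect_pages(pc: Any, query: str, *, platforms: list[str],
                        per_page: int = 10, max_pages: int = 3) -> list[Any]:
    """Fetch up to `max_pages` pages of search results and return them as one list.

    Assumption: a page with fewer than `per_page` items means there are no
    further pages. The API's real end-of-results signal is not documented here.
    """
    collected: list[Any] = []
    for page in range(1, max_pages + 1):
        batch = list(await pc.search(
            social_platforms=platforms, query=query, results=per_page, page=page
        ))
        collected.extend(batch)
        if len(batch) < per_page:
            break  # short or empty page: assume nothing more to fetch
    return collected
```

The max_pages cap keeps credit usage bounded even if the stopping assumption turns out to be wrong.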

Extract

posts = await pc.extract(
    urls=["https://reddit.com/...", "https://tiktok.com/..."],
    include_comments=True,
    response_mode="raw",
    comment_filter_config={
        "min_score": 10,
        "max_depth": 2
    }
)

Search and Extract

posts = await pc.search_and_extract(
    social_platforms=["reddit"],
    query="search query",
    results=5,
    page=1,
    include_comments=True,
    response_mode="markdown",
    comment_filter_config={
        "tier_limits": {"0": 5, "1": 3},
        "preserve_high_quality_threads": True
    }
)

Comment Filtering

The comment_filter_config parameter filters comments server-side, which reduces data transfer and improves performance. The examples in this README pass it either as a plain dictionary or as a CommentFilterConfig instance:

from postcrawl.types import CommentFilterConfig

posts = await pc.extract(
    urls=["..."],
    include_comments=True,
    comment_filter_config=CommentFilterConfig(
        # Limit comments by depth level
        tier_limits={
            "0": 10, # Max 10 top-level comments
            "1": 5,  # Max 5 replies per comment
            "2": 2   # Max 2 nested replies
        },
        
        # Minimum score/likes threshold
        min_score=10,
        
        # Minimum quality relative to top comment (0.0-1.0)
        top_comment_percentile=0.1,
        
        # Maximum depth to traverse
        max_depth=5,
        
        # Preserve more replies for high-quality threads
        preserve_high_quality_threads=True,
        high_quality_thread_score=100
    )
)

Synchronous Methods

# All methods have synchronous versions
results = pc.search_sync(...)
posts = pc.extract_sync(...)
combined = pc.search_and_extract_sync(...)

Examples

Check out the examples/ directory for complete working examples:

search_101.py - Basic search functionality demo
extract_101.py - Content extraction demo
search_and_extract_101.py - Combined operation demo

Run examples with:

# Using uv (recommended)
uv run python examples/search_101.py

# Or with standard Python
cd examples
python search_101.py

Response Models

SearchResult

Response from the search endpoint:

title: Title of the search result
url: URL of the search result
snippet: Text snippet from the content
date: Date of the post (e.g., "Dec 28, 2024")
image_url: URL of associated image (can be empty string)
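Since date arrives as display text rather than a date object, sorting or filtering results requires parsing it first. The sketch below handles only the "Dec 28, 2024" shape given as the example above. Whether the API ever returns other shapes is not stated, so anything that fails to match yields None. parse_result_date is a hypothetical helper.

```python
from datetime import date, datetime
from typing import Optional


def parse_result_date(text: str) -> Optional[date]:
    """Parse a SearchResult date such as "Dec 28, 2024"; return None if it does not match."""
    try:
        # %b is the abbreviated month name; matching assumes an English (or C) locale.
        return datetime.strptime(text.strip(), "%b %d, %Y").date()
    except ValueError:
        return None
```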

ExtractedPost

url: Original URL
source: Platform name ("reddit" or "tiktok")
raw: Raw content data (RedditPost or TiktokPost object) - strongly typed
markdown: Markdown formatted content (when response_mode="markdown")
error: Error message if extraction failed
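Because each ExtractedPost carries its own error field, one batch can mix successful and failed extractions. A small sketch of separating the two, assuming a failed post has a non-empty error and a successful one has None or an empty string (split_by_error is a hypothetical helper):

```python
from typing import Any, Iterable


def split_by_error(posts: Iterable[Any]) -> tuple[list[Any], list[Any]]:
    """Return (succeeded, failed), where failed posts have a truthy `error` field."""
    succeeded: list[Any] = []
    failed: list[Any] = []
    for post in posts:
        (failed if post.error else succeeded).append(post)
    return succeeded, failed
```

The failed list keeps the original url on each item, so those URLs can be logged or retried separately.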

Working with Platform-Specific Types

The SDK provides type-safe access to platform-specific data:

import asyncio
from postcrawl import PostCrawlClient, RedditPost, TiktokPost

async def main():
    async with PostCrawlClient(api_key="sk_your_api_key_here") as pc:
        # Extract content with proper type handling
        posts = await pc.extract(urls=["https://reddit.com/..."])

    for post in posts:
        if post.error:
            print(f"Error: {post.error}")
        elif isinstance(post.raw, RedditPost):
            # Access Reddit-specific fields with snake_case attributes
            print(f"Subreddit: r/{post.raw.subreddit_name}")
            print(f"Score: {post.raw.score}")
            print(f"Title: {post.raw.title}")
            print(f"Upvotes: {post.raw.upvotes}")
            print(f"Created: {post.raw.created_at}")
            if post.raw.comments:
                print(f"Comments: {len(post.raw.comments)}")
        elif isinstance(post.raw, TiktokPost):
            # Access TikTok-specific fields with snake_case attributes
            print(f"Username: @{post.raw.username}")
            print(f"Likes: {post.raw.likes}")
            print(f"Total Comments: {post.raw.total_comments}")
            print(f"Created: {post.raw.created_at}")
            if post.raw.hashtags:
                print(f"Hashtags: {', '.join(post.raw.hashtags)}")

asyncio.run(main())

Error Handling

from postcrawl.exceptions import (
    AuthenticationError,      # Invalid API key
    InsufficientCreditsError, # Not enough credits
    RateLimitError,          # Rate limit exceeded
    ValidationError          # Invalid parameters
)
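The imports above only name the exception classes. One way to act on them is to retry rate-limited calls with exponential backoff while letting other failures propagate. The sketch below is generic and makes no assumption about the SDK beyond RateLimitError being raised when the limit is exceeded, as the comment above indicates. with_backoff, its parameters, and the delay schedule are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def with_backoff(
    make_call: Callable[[], Awaitable[Any]],
    retry_on: tuple[type[BaseException], ...],
    *,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Any:
    """Await `make_call()`, retrying with exponential backoff on `retry_on` errors.

    Any other exception (for example an authentication failure) propagates
    immediately, as does the last retryable error once attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            await sleep(base_delay * 2 ** attempt)  # base, 2x base, 4x base, ...
```

With the SDK, a call might look like await with_backoff(lambda: pc.search(social_platforms=["reddit"], query="machine learning", results=10), (RateLimitError,)). AuthenticationError, InsufficientCreditsError, and ValidationError are deliberately left out of retry_on, since repeating the same request is unlikely to fix them.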

Development

This project uses uv for dependency management. See DEVELOPMENT.md for detailed setup and contribution guidelines.

Quick Development Setup

# Clone the repository
git clone https://github.com/post-crawl/python-sdk.git
cd python-sdk

# Install dependencies
uv sync

# Run tests
make test

# Run all checks (format, lint, test)
make check

# Build the package
make build

Available Commands

make help         # Show all available commands
make format       # Format code with black and ruff
make lint         # Run linting and type checking
make test         # Run test suite
make check        # Run format, lint, and tests
make build        # Build distribution packages
make verify       # Verify package installation
make publish-test # Publish to TestPyPI

API Key Management

Environment Variables (Recommended)

Store your API key securely in environment variables:

export POSTCRAWL_API_KEY="sk_your_api_key_here"

Or use a .env file:

# .env
POSTCRAWL_API_KEY=sk_your_api_key_here

Then load it in your code:

import os
from dotenv import load_dotenv
from postcrawl import PostCrawlClient

load_dotenv()
pc = PostCrawlClient(api_key=os.getenv("POSTCRAWL_API_KEY"))
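Note that os.getenv returns None when the variable is missing, so a misconfigured environment would only surface later as a failed request. A minimal sketch of failing fast instead; require_api_key is a hypothetical helper, and only the variable name POSTCRAWL_API_KEY comes from this README.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    name: str = "POSTCRAWL_API_KEY") -> str:
    """Return the API key from `env` (default: os.environ) or raise a clear error."""
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return key
```

After load_dotenv(), the client could then be created with PostCrawlClient(api_key=require_api_key()).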

Security Best Practices

Never hardcode API keys in your source code
Add .env to .gitignore to prevent accidental commits
Use environment variables in production
Rotate keys regularly through the PostCrawl dashboard
Set key permissions to limit access to specific operations

Rate Limits & Credits

PostCrawl uses a credit-based system:

Search: ~1 credit per 10 results
Extract: ~1 credit per URL (without comments)
Extract with comments: ~3 credits per URL
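The approximate rates above can be turned into a rough budget check before running a large job. This is a sketch only: the figures are the approximate ones listed above, and rounding a search up to a whole credit is an assumption rather than documented billing behavior. Both function names are hypothetical.

```python
import math


def estimate_search_credits(results: int) -> int:
    """Approximate credits for one search: about 1 credit per 10 results.

    Rounding up to a whole credit is an assumption, not documented behavior.
    """
    if results < 1:
        raise ValueError("results must be at least 1")
    return math.ceil(results / 10)


def estimate_extract_credits(url_count: int, include_comments: bool = False) -> int:
    """Approximate credits for an extract call: ~1 per URL, ~3 per URL with comments."""
    if url_count < 0:
        raise ValueError("url_count must not be negative")
    return url_count * (3 if include_comments else 1)
```

For example, a 25-result search followed by extracting 4 URLs with comments would be estimated at 3 + 12 = 15 credits under these assumptions.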

Rate limits are returned in response headers, which the client exposes as rate_limit_info after a request:

pc = PostCrawlClient(api_key="sk_...")
results = await pc.search(...)

print(f"Rate limit: {pc.rate_limit_info['limit']}")
print(f"Remaining: {pc.rate_limit_info['remaining']}")
print(f"Reset at: {pc.rate_limit_info['reset']}")

Support

Documentation: github.com/post-crawl/python-sdk
Issues: github.com/post-crawl/python-sdk/issues
Email: support@postcrawl.com

License

MIT License - see LICENSE file for details.

Release history

1.2.0 (this version): Nov 18, 2025
1.1.0: Nov 12, 2025
1.0.0: Jul 10, 2025

Download files

Download the file for your platform.

Source Distribution

postcrawl-1.2.0.tar.gz (18.3 kB)

Uploaded Nov 18, 2025 Source

Built Distribution

postcrawl-1.2.0-py3-none-any.whl (18.1 kB)

Uploaded Nov 18, 2025 Python 3

File details

Details for the file postcrawl-1.2.0.tar.gz.

File metadata

Download URL: postcrawl-1.2.0.tar.gz
Upload date: Nov 18, 2025
Size: 18.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.7.20

File hashes

Hashes for postcrawl-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`b431037ad80e6c811a36612abe11f9437e84937c901a5d5d6baf4030a291cb3e`
MD5	`2ca77299817d4b454264bbd8f56cf00e`
BLAKE2b-256	`337ae3c60d28e0730cdbe93d75aee931783afa9cae7ed113c3d140d2a72a8645`

File details

Details for the file postcrawl-1.2.0-py3-none-any.whl.

File metadata

Download URL: postcrawl-1.2.0-py3-none-any.whl
Upload date: Nov 18, 2025
Size: 18.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.7.20

File hashes

Hashes for postcrawl-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5f8ffea3acfb7a6289a20751e5bb627d343de6e6fb89d1f405cd99f60566e7e1`
MD5	`7932defa1872eb90ceb0db19c6179ff4`
BLAKE2b-256	`eeb43e128e45fb6eeb493a139b076bbde9ba50ccaa6a14b04a9c188a6436de6c`
