Reddit Extraction and Data Dumper — a modern, async-ready library for extracting Reddit data without API keys.

REDD

Reddit Extraction and Data Dumper

A modern, async-ready Python library for extracting Reddit data. No API keys required.


Table of Contents

  1. Features
  2. Installation
  3. Quick Start
  4. API Reference
  5. Architecture
  6. Examples
  7. Contributing
  8. Disclaimer
  9. License

1. Features

  • No API keys — uses Reddit's public .json endpoints.
  • Sync and async — choose Redd or AsyncRedd depending on your stack.
  • Typed models — frozen dataclasses instead of raw dictionaries.
  • Hexagonal architecture — swap HTTP adapters without touching business logic.
  • Auto-pagination — fetch hundreds of posts with a single call.
  • User-Agent rotation — built-in rotation to reduce ban risk.
  • Proxy support — pass a proxy URL and scrape at scale.
  • Throttling — configurable random sleep between paginated requests.
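
To make the first bullet concrete: Reddit serves a JSON view of a listing when `.json` is appended to its path. The sketch below builds such a URL with the standard library only. The helper name `listing_url` and its signature are hypothetical and are not part of REDD; how REDD constructs its URLs internally is not documented here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.reddit.com"


def listing_url(subreddit: str, category: str = "hot", **params: object) -> str:
    """Build a public JSON listing URL for a subreddit (hypothetical helper)."""
    path = f"/r/{quote(subreddit)}/{quote(category)}.json"
    # Drop unset parameters so the query string stays minimal.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


print(listing_url("Python", "top", t="week", limit=10))
# https://www.reddit.com/r/Python/top.json?t=week&limit=10
```

The `t` and `limit` query parameters are the ones Reddit's listing endpoints accept for the time window and page size.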

2. Installation

With uv (recommended):

uv add redd

With pip:

pip install redd

For async support (requires httpx):

uv add redd httpx

3. Quick Start

3.1. Synchronous usage

from redd import Redd, Category, TimeFilter

with Redd() as r:
    # Search Reddit
    results = r.search("Python programming", limit=5)
    for item in results:
        print(f"  {item.title}")

    # Fetch top posts from a subreddit
    posts = r.get_subreddit_posts(
        "Python",
        limit=10,
        category=Category.TOP,
        time_filter=TimeFilter.WEEK,
    )
    for post in posts:
        print(f"  [{post.score:>5}] {post.title}")

    # Get full post details with comments
    detail = r.get_post("/r/Python/comments/abc123/example_post/")
    print(f"  {detail.title} -- {len(detail.comments)} comments")

    # Scrape user activity
    items = r.get_user("spez", limit=10)
    for item in items:
        print(f"  [{item.kind}] {item.title or item.body[:80]}")

3.2. Asynchronous usage

import asyncio
from redd import AsyncRedd

async def main():
    async with AsyncRedd() as r:
        results = await r.search("machine learning", limit=5)
        for item in results:
            print(item.title)

asyncio.run(main())

3.3. Configuration

from redd import Redd

r = Redd(
    proxy="http://user:pass@host:port",  # Optional proxy URL
    timeout=15.0,                        # Request timeout in seconds
    rotate_user_agent=True,              # Rotate the User-Agent on each request
    throttle=(1.0, 3.0),                 # (min, max) random sleep in seconds between pages
)
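
To illustrate what a `throttle` range means in combination with auto-pagination, here is a minimal, self-contained sketch of cursor-based paging with a random pause between pages. It is not REDD's implementation: `paginate`, `fake_fetch`, and the in-memory `PAGES` data are hypothetical names invented for this example, and the sleep function is injectable so the demo runs without actually waiting.

```python
import random
import time
from typing import Callable, Iterator, Optional

# A page is (items, next_cursor); next_cursor is None on the last page.
Page = tuple[list[str], Optional[str]]


def paginate(
    fetch_page: Callable[[Optional[str]], Page],
    limit: int,
    throttle: tuple[float, float] = (1.0, 3.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield up to `limit` items, pausing a random interval between pages."""
    cursor: Optional[str] = None
    yielded = 0
    while yielded < limit:
        items, cursor = fetch_page(cursor)
        for item in items:
            if yielded >= limit:
                return
            yield item
            yielded += 1
        if cursor is None or not items:
            return  # no further pages
        if yielded < limit:
            # Sleep only when another page will actually be requested.
            sleep(random.uniform(*throttle))


# Hypothetical in-memory pages standing in for HTTP responses.
PAGES = {
    None: (["a", "b"], "t3_b"),
    "t3_b": (["c", "d"], "t3_d"),
    "t3_d": (["e"], None),
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return PAGES[cursor]


delays: list[float] = []
print(list(paginate(fake_fetch, limit=5, sleep=delays.append)))
# ['a', 'b', 'c', 'd', 'e']
print(len(delays))
# 2
```

Three pages are fetched, so only two pauses are recorded; each pause is drawn uniformly from the configured range.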

4. API Reference

4.1. Clients

Class      Description
Redd       Synchronous client (requests)
AsyncRedd  Asynchronous client (httpx)

Both clients support context managers and expose the same API surface.

4.2. Methods

Method                                                             Description
search(query, *, limit, sort, after, before)                       Search all of Reddit
search_subreddit(subreddit, query, *, limit, sort, after, before)  Search within a subreddit
get_post(permalink)                                                Get full post details and comment tree
get_user(username, *, limit)                                       Get a user's recent activity
get_subreddit_posts(subreddit, *, limit, category, time_filter)    Fetch subreddit listings
get_user_posts(username, *, limit, category, time_filter)          Fetch a user's submitted posts
download_image(image_url, *, output_dir)                           Download an image
close()                                                            Release HTTP resources

4.3. Models

All models are frozen dataclasses.

Model          Fields
SearchResult   title, url, description, subreddit
PostDetail     title, author, body, score, url, subreddit, created_utc, num_comments, comments
Comment        author, body, score, replies
SubredditPost  title, author, permalink, score, num_comments, created_utc, subreddit, url, image_url, thumbnail_url
UserItem       kind, subreddit, url, created_utc, title, body
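
As a rough illustration of what a frozen dataclass gives you, the sketch below defines a stand-in `Comment` with the four fields listed for that model and walks a nested reply tree. It is not REDD's actual class definition: the field types, the use of a tuple for `replies`, and the helper `count_comments` are assumptions made for this example.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Comment:
    """Illustrative stand-in; types and defaults are assumptions."""

    author: str
    body: str
    score: int
    replies: tuple["Comment", ...] = ()


def count_comments(comments) -> int:
    """Count every comment in a tree, including nested replies."""
    return sum(1 + count_comments(c.replies) for c in comments)


thread = (
    Comment("alice", "First!", 3, replies=(Comment("bob", "Second.", 1),)),
    Comment("carol", "Nice post.", 5),
)
print(count_comments(thread))
# 3

try:
    thread[0].score = 100  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("Comment is immutable")
```

Because instances cannot be mutated after creation, parsed results can be shared between callers without defensive copying.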

4.4. Enums

Enum          Values
Category      HOT, TOP, NEW, RISING
UserCategory  HOT, TOP, NEW
TimeFilter    HOUR, DAY, WEEK, MONTH, YEAR, ALL
SortOrder     RELEVANCE, HOT, TOP, NEW, COMMENTS

4.5. Exceptions

Exception      Description
ReddError      Base exception for all REDD errors
HttpError      HTTP request failed after retries
ParseError     Reddit's JSON could not be parsed into domain models
NotFoundError  Requested resource does not exist
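
Since every error derives from `ReddError`, callers can catch specific failures first and fall back to the base class last. The sketch below uses locally defined stand-in classes so that it runs without the library installed. The assumption that each exception subclasses `ReddError` directly, and the helper `describe_failure`, are illustrative only.

```python
class ReddError(Exception):
    """Stand-in for the documented base exception."""


class HttpError(ReddError):
    """Stand-in: HTTP request failed after retries."""


class ParseError(ReddError):
    """Stand-in: JSON could not be parsed into domain models."""


class NotFoundError(ReddError):
    """Stand-in: requested resource does not exist."""


def describe_failure(action) -> str:
    """Run `action` and map each documented error type to a short label."""
    try:
        action()
    except NotFoundError:
        return "not found"
    except HttpError:
        return "http failure"
    except ParseError:
        return "unexpected payload"
    except ReddError:
        return "other REDD error"  # catch-all for anything else from the library
    return "ok"


def missing_post():
    raise NotFoundError("no such post")


print(describe_failure(missing_post))
# not found
print(describe_failure(lambda: None))
# ok
```

Listing `NotFoundError` before `HttpError` keeps the handler order correct even if the real hierarchy nests one under the other.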

5. Architecture

REDD follows hexagonal architecture (ports and adapters), separating business logic from I/O concerns:

graph LR
    subgraph Public API
        A["Redd (sync)"]
        B["AsyncRedd (async)"]
    end

    subgraph Core
        C["Parsing Layer"]
        D["Domain Models"]
        E["Enums"]
    end

    subgraph Ports
        F["HttpPort"]
        G["AsyncHttpPort"]
    end

    subgraph Adapters
        H["RequestsHttpAdapter"]
        I["HttpxAsyncAdapter"]
    end

    A --> C
    B --> C
    C --> D
    C --> E
    A --> F
    B --> G
    F -.implements.-> H
    G -.implements.-> I
    H --> J["reddit.com"]
    I --> J

Directory layout

src/redd/
├── __init__.py           # Public API surface
├── _client.py            # Sync client (Redd)
├── _async_client.py      # Async client (AsyncRedd)
├── _parsing.py           # JSON to domain model parsing (I/O-free)
├── _exceptions.py        # Error hierarchy
│
├── domain/               # Pure domain layer
│   ├── models.py         # Frozen dataclasses
│   └── enums.py          # Type-safe enumerations
│
├── ports/                # Abstract interfaces
│   └── http.py           # HttpPort and AsyncHttpPort protocols
│
└── adapters/             # Concrete implementations
    ├── http_sync.py      # requests-based adapter
    └── http_async.py     # httpx-based adapter

The parsing module has no I/O dependencies. Clients interact with the HTTP layer exclusively through protocol-based ports, making it straightforward to swap adapters, mock dependencies in tests, or add new transports.
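
The sketch below shows the general ports-and-adapters idea with `typing.Protocol`: the calling code depends only on a port, so a canned in-memory adapter can replace the network in tests. The method name `get_json`, the `FakeHttpAdapter` class, and the helper functions are hypothetical. REDD's real `HttpPort` signature is not documented in this README.

```python
import json
from typing import Any, Protocol


class HttpPort(Protocol):
    """Hypothetical port: anything with a matching get_json method qualifies."""

    def get_json(self, url: str) -> Any: ...


class FakeHttpAdapter:
    """Test adapter that serves canned JSON instead of calling the network."""

    def __init__(self, responses: dict[str, str]) -> None:
        self.responses = responses
        self.requested: list[str] = []

    def get_json(self, url: str) -> Any:
        self.requested.append(url)
        return json.loads(self.responses[url])


def parse_titles(payload: dict) -> list[str]:
    """Pure parsing step: no I/O, only dictionary traversal."""
    return [child["data"]["title"] for child in payload["data"]["children"]]


def fetch_titles(http: HttpPort, url: str) -> list[str]:
    """Client-side step: talks to the port, then delegates to the parser."""
    return parse_titles(http.get_json(url))


URL = "https://www.reddit.com/r/Python/hot.json"
CANNED = json.dumps(
    {"data": {"children": [{"data": {"title": "Hello"}}, {"data": {"title": "World"}}]}}
)

adapter = FakeHttpAdapter({URL: CANNED})
print(fetch_titles(adapter, URL))
# ['Hello', 'World']
```

Because `Protocol` uses structural typing, `FakeHttpAdapter` satisfies the port without inheriting from it.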


6. Examples

See the examples/ directory for runnable scripts.

Fetch hot posts from a subreddit (subreddit_hot_posts.py):

from redd import Category, Redd

with Redd() as r:
    posts = r.get_subreddit_posts("brdev", limit=10, category=Category.HOT)

    for i, post in enumerate(posts, 1):
        print(f"{i:>2}. [{post.score:>5}] {post.title}")
        print(f"     by u/{post.author} — {post.num_comments} comments")
        print(f"     {post.url}")
        print()

Sample output:

 1. [   91] Qual o plano B de vocês caso a área piore muito?
     by u/Spiritual_Pangolin18 — 185 comments
     https://www.reddit.com/r/brdev/comments/1rnytuh/...

 2. [   83] Fuçando minhas coisas, encontrei um código de 600 linhas em Portugol
     by u/Dramatic-Revenue-802 — 7 comments
     https://www.reddit.com/r/brdev/comments/1ro269a/...

7. Contributing

Contributions are welcome. Please read CONTRIBUTING.md for guidelines on setting up the project, running tests, and submitting changes.


8. Disclaimer

Use responsibly. Reddit may rate-limit or ban IPs that make excessive requests. Consider using rotating proxies for large-scale scraping.


9. License

MIT. See LICENSE for details.

Copyright (c) 2025 Elias Biondo

