
REDD

Reddit Extraction and Data Dumper


A modern, async-ready Python library for extracting Reddit data. No API keys required.


Table of Contents

  1. Features
  2. Installation
  3. Quick Start
  4. API Reference
  5. Architecture
  6. Examples
  7. Contributing
  8. Disclaimer
  9. License

1. Features

  • No API keys — uses Reddit's public .json endpoints.
  • Sync and async — choose Redd or AsyncRedd depending on your stack.
  • Typed models — frozen dataclasses instead of raw dictionaries.
  • Hexagonal architecture — swap HTTP adapters without touching business logic.
  • Auto-pagination — fetch hundreds of posts with a single call.
  • User-Agent rotation — built-in rotation to reduce ban risk.
  • Proxy support — pass a proxy URL and scrape at scale.
  • Throttling — configurable random sleep between paginated requests.

2. Installation

With uv (recommended):

uv add redd

With pip:

pip install redd

For async support (requires httpx):

uv add redd httpx

3. Quick Start

3.1. Synchronous usage

from redd import Redd, Category, TimeFilter

with Redd() as r:
    # Search Reddit
    results = r.search("Python programming", limit=5)
    for item in results:
        print(f"  {item.title}")

    # Fetch top posts from a subreddit
    posts = r.get_subreddit_posts(
        "Python",
        limit=10,
        category=Category.TOP,
        time_filter=TimeFilter.WEEK,
    )
    for post in posts:
        print(f"  [{post.score:>5}] {post.title}")

    # Get full post details with comments
    detail = r.get_post("/r/Python/comments/abc123/example_post/")
    print(f"  {detail.title} -- {len(detail.comments)} comments")

    # Scrape user activity
    items = r.get_user("spez", limit=10)
    for item in items:
        print(f"  [{item.kind}] {item.title or item.body[:80]}")

3.2. Asynchronous usage

import asyncio
from redd import AsyncRedd

async def main():
    async with AsyncRedd() as r:
        results = await r.search("machine learning", limit=5)
        for item in results:
            print(item.title)

asyncio.run(main())

3.3. Configuration

r = Redd(
    proxy="http://user:pass@host:port",  # Optional proxy
    timeout=15.0,                        # Request timeout in seconds
    rotate_user_agent=True,              # Rotate UA per request
    throttle=(1.0, 3.0),                 # Random sleep range between pages
)

4. API Reference

4.1. Clients

| Class | Description |
| --- | --- |
| `Redd` | Synchronous client (`requests`) |
| `AsyncRedd` | Asynchronous client (`httpx`) |

Both clients support context managers and expose the same API surface.

4.2. Methods

| Method | Description |
| --- | --- |
| `search(query, *, limit, sort, after, before)` | Search all of Reddit |
| `search_subreddit(subreddit, query, *, limit, sort, after, before)` | Search within a subreddit |
| `get_post(permalink)` | Get full post details and comment tree |
| `get_user(username, *, limit)` | Get a user's recent activity |
| `get_subreddit_posts(subreddit, *, limit, category, time_filter)` | Fetch subreddit listings |
| `get_user_posts(username, *, limit, category, time_filter)` | Fetch a user's submitted posts |
| `download_image(image_url, *, output_dir)` | Download an image |
| `close()` | Release HTTP resources |
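Since REDD works without API keys, these methods ultimately map onto Reddit's public `.json` endpoints. A rough, stand-alone sketch of the kind of URLs involved; the exact paths and parameters REDD builds are not documented here, so treat these helpers as illustrative:

```python
from urllib.parse import urlencode

BASE = "https://www.reddit.com"

def search_url(query, limit=25, sort="relevance"):
    """Sitewide search via the public JSON endpoint (illustrative)."""
    return f"{BASE}/search.json?{urlencode({'q': query, 'limit': limit, 'sort': sort})}"

def subreddit_listing_url(subreddit, category="top", limit=25):
    """Subreddit listing, e.g. /r/Python/top.json (illustrative)."""
    return f"{BASE}/r/{subreddit}/{category}.json?{urlencode({'limit': limit})}"

print(search_url("python", limit=5))
print(subreddit_listing_url("Python", category="top", limit=10))
```

Appending `.json` to almost any Reddit listing URL returns the same data the HTML page renders, which is what makes key-free extraction possible.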

4.3. Models

All models are frozen dataclasses.

| Model | Fields |
| --- | --- |
| `SearchResult` | `title`, `url`, `description`, `subreddit` |
| `PostDetail` | `title`, `author`, `body`, `score`, `url`, `subreddit`, `created_utc`, `num_comments`, `comments` |
| `Comment` | `author`, `body`, `score`, `replies` |
| `SubredditPost` | `title`, `author`, `permalink`, `score`, `num_comments`, `created_utc`, `subreddit`, `url`, `image_url`, `thumbnail_url` |
| `UserItem` | `kind`, `subreddit`, `url`, `created_utc`, `title`, `body` |
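Frozen dataclasses give you immutable records: attribute assignment after construction raises an error. A stand-alone illustration of that behavior (this class only mimics the shape of `Comment`; it is not the library's actual definition):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Comment:
    author: str
    body: str
    score: int
    replies: tuple = ()  # immutable container for nested replies

c = Comment(author="spez", body="hello", score=42)
try:
    c.score = 0  # mutation is rejected on frozen dataclasses
except FrozenInstanceError:
    print("frozen")  # → frozen
```

Immutability means parsed results can be shared across threads or cached without defensive copying, unlike raw dictionaries.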

4.4. Enums

| Enum | Values |
| --- | --- |
| `Category` | `HOT`, `TOP`, `NEW`, `RISING` |
| `UserCategory` | `HOT`, `TOP`, `NEW` |
| `TimeFilter` | `HOUR`, `DAY`, `WEEK`, `MONTH`, `YEAR`, `ALL` |
| `SortOrder` | `RELEVANCE`, `HOT`, `TOP`, `NEW`, `COMMENTS` |

4.5. Exceptions

| Exception | Description |
| --- | --- |
| `ReddError` | Base exception for all REDD errors |
| `HttpError` | HTTP request failed after retries |
| `ParseError` | Reddit's JSON could not be parsed into domain models |
| `NotFoundError` | Requested resource does not exist |
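Because every error derives from `ReddError`, callers can catch specific failures first and fall back to the base class. A stand-alone sketch of that pattern; the classes below mirror the names in the table but are not the library's actual definitions:

```python
class ReddError(Exception):
    """Base for all REDD errors (illustrative re-declaration)."""

class HttpError(ReddError):
    """HTTP request failed after retries."""

class NotFoundError(ReddError):
    """Requested resource does not exist."""

def handle(exc):
    """Order matters: most specific subclass first, base class last."""
    try:
        raise exc
    except NotFoundError:
        return "missing"
    except HttpError:
        return "retry later"
    except ReddError:
        return "generic failure"

print(handle(NotFoundError()))  # → missing
print(handle(HttpError()))      # → retry later
```

A bare `except ReddError` is enough when you only care that REDD failed; subclass handlers let you distinguish a dead resource from a flaky connection.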

5. Architecture

REDD follows hexagonal architecture (ports and adapters), separating business logic from I/O concerns:

graph LR
    subgraph Public API
        A["Redd (sync)"]
        B["AsyncRedd (async)"]
    end

    subgraph Core
        C["Parsing Layer"]
        D["Domain Models"]
        E["Enums"]
    end

    subgraph Ports
        F["HttpPort"]
        G["AsyncHttpPort"]
    end

    subgraph Adapters
        H["RequestsHttpAdapter"]
        I["HttpxAsyncAdapter"]
    end

    A --> C
    B --> C
    C --> D
    C --> E
    A --> F
    B --> G
    F -.implements.-> H
    G -.implements.-> I
    H --> J["reddit.com"]
    I --> J

Directory layout

src/redd/
├── __init__.py           # Public API surface
├── _client.py            # Sync client (Redd)
├── _async_client.py      # Async client (AsyncRedd)
├── _parsing.py           # JSON to domain model parsing (I/O-free)
├── _exceptions.py        # Error hierarchy
│
├── domain/               # Pure domain layer
│   ├── models.py         # Frozen dataclasses
│   └── enums.py          # Type-safe enumerations
│
├── ports/                # Abstract interfaces
│   └── http.py           # HttpPort and AsyncHttpPort protocols
│
└── adapters/             # Concrete implementations
    ├── http_sync.py      # requests-based adapter
    └── http_async.py     # httpx-based adapter

The parsing module has no I/O dependencies. Clients interact with the HTTP layer exclusively through protocol-based ports, making it straightforward to swap adapters, mock dependencies in tests, or add new transports.
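The port/adapter split described above can be exercised with `typing.Protocol`: any object with the right method shape satisfies the port, so tests can inject a fake transport. A minimal stand-alone sketch (this `HttpPort` is illustrative, not the library's actual protocol):

```python
from typing import Protocol

class HttpPort(Protocol):
    def get_json(self, url: str) -> dict: ...

class FakeHttpAdapter:
    """Test double that satisfies HttpPort structurally, with no real I/O."""
    def __init__(self, canned: dict):
        self.canned = canned

    def get_json(self, url: str) -> dict:
        return self.canned[url]

def fetch_title(http: HttpPort, url: str) -> str:
    """Business logic depends only on the port, never on the transport."""
    return http.get_json(url)["title"]

adapter = FakeHttpAdapter({"https://example.com/post.json": {"title": "hello"}})
print(fetch_title(adapter, "https://example.com/post.json"))  # → hello
```

Swapping `FakeHttpAdapter` for a `requests`- or `httpx`-backed adapter changes the transport without touching `fetch_title`, which is the payoff of the hexagonal layout.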


6. Examples

See the examples/ directory for runnable scripts.

Fetch hot posts from a subreddit (subreddit_hot_posts.py):

from redd import Category, Redd

with Redd() as r:
    posts = r.get_subreddit_posts("brdev", limit=10, category=Category.HOT)

    for i, post in enumerate(posts, 1):
        print(f"{i:>2}. [{post.score:>5}] {post.title}")
        print(f"     by u/{post.author} — {post.num_comments} comments")
        print(f"     {post.url}")
        print()

Sample output:

 1. [   91] Qual o plano B de vocês caso a área piore muito?
     by u/Spiritual_Pangolin18 — 185 comments
     https://www.reddit.com/r/brdev/comments/1rnytuh/...

 2. [   83] Fuçando minhas coisas, encontrei um código de 600 linhas em Portugol
     by u/Dramatic-Revenue-802 — 7 comments
     https://www.reddit.com/r/brdev/comments/1ro269a/...

7. Contributing

Contributions are welcome. Please read CONTRIBUTING.md for guidelines on setting up the project, running tests, and submitting changes.


8. Disclaimer

Use responsibly. Reddit may rate-limit or ban IPs that make excessive requests. Consider using rotating proxies for large-scale scraping.


9. License

MIT. See LICENSE for details.

Copyright (c) 2025 Elias Biondo
