REDD
Reddit Extraction and Data Dumper
A modern, async-ready Python library for extracting Reddit data. No API keys required.
Table of Contents
- Features
- Installation
- Quick Start
- API Reference
- Architecture
- Examples
- Contributing
- Disclaimer
- License
1. Features
- No API keys — uses Reddit's public `.json` endpoints.
- Sync and async — choose `Redd` or `AsyncRedd` depending on your stack.
- Typed models — frozen dataclasses instead of raw dictionaries.
- Hexagonal architecture — swap HTTP adapters without touching business logic.
- Auto-pagination — fetch hundreds of posts with a single call.
- User-Agent rotation — built-in rotation to reduce ban risk.
- Proxy support — pass a proxy URL and scrape at scale.
- Throttling — configurable random sleep between paginated requests.
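User-Agent rotation and randomized throttling are standard scraping safeguards. The sketch below illustrates the general idea only; function and variable names are hypothetical, not REDD's internals:

```python
import random
import time

# Hypothetical illustration of per-request User-Agent rotation and
# randomized inter-request delays; not REDD's actual implementation.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def build_headers() -> dict:
    # Pick a fresh User-Agent for each request to reduce ban risk.
    return {"User-Agent": random.choice(USER_AGENTS)}

def throttle(low: float = 1.0, high: float = 3.0) -> float:
    # Sleep a random interval between paginated requests.
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay

print(build_headers()["User-Agent"])
```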
2. Installation
With uv (recommended):

```shell
uv add redd
```

With pip:

```shell
pip install redd
```

For async support (requires httpx):

```shell
uv add redd httpx
```
3. Quick Start
3.1. Synchronous usage
```python
from redd import Redd, Category, TimeFilter

with Redd() as r:
    # Search Reddit
    results = r.search("Python programming", limit=5)
    for item in results:
        print(f"  {item.title}")

    # Fetch top posts from a subreddit
    posts = r.get_subreddit_posts(
        "Python",
        limit=10,
        category=Category.TOP,
        time_filter=TimeFilter.WEEK,
    )
    for post in posts:
        print(f"  [{post.score:>5}] {post.title}")

    # Get full post details with comments
    detail = r.get_post("/r/Python/comments/abc123/example_post/")
    print(f"  {detail.title} -- {len(detail.comments)} comments")

    # Scrape user activity
    items = r.get_user("spez", limit=10)
    for item in items:
        print(f"  [{item.kind}] {item.title or item.body[:80]}")
```
3.2. Asynchronous usage
```python
import asyncio

from redd import AsyncRedd

async def main():
    async with AsyncRedd() as r:
        results = await r.search("machine learning", limit=5)
        for item in results:
            print(item.title)

asyncio.run(main())
```
3.3. Configuration
```python
r = Redd(
    proxy="http://user:pass@host:port",  # Optional proxy
    timeout=15.0,                        # Request timeout in seconds
    rotate_user_agent=True,              # Rotate UA per request
    throttle=(1.0, 3.0),                 # Random sleep range between pages
)
```
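Auto-pagination presumably follows the `after` cursor that Reddit's listing endpoints return. The sketch below demonstrates that cursor-following pattern with a stubbed fetch function; it is an assumption about the mechanism, not REDD's code:

```python
# Cursor-based pagination over Reddit-style listings.
# fetch_page is a stand-in for an HTTP call returning listing JSON.
def fetch_page(after=None):
    pages = {
        None: {"data": {"children": [{"id": "a"}, {"id": "b"}], "after": "t3_b"}},
        "t3_b": {"data": {"children": [{"id": "c"}], "after": None}},
    }
    return pages[after]

def paginate(limit):
    items, after = [], None
    while len(items) < limit:
        page = fetch_page(after)["data"]
        items.extend(page["children"])
        after = page["after"]
        if after is None:  # no more pages
            break
    return items[:limit]

print([item["id"] for item in paginate(10)])  # ['a', 'b', 'c']
```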
4. API Reference
4.1. Clients
| Class | Description |
|---|---|
| `Redd` | Synchronous client (requests) |
| `AsyncRedd` | Asynchronous client (httpx) |
Both clients support context managers and expose the same API surface.
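Context-manager support conventionally means `__exit__` calls `close()`, so `with` blocks and manual cleanup are equivalent. A minimal stand-in class illustrating that equivalence (not the library's code):

```python
class FakeClient:
    # Minimal stand-in mirroring the context-manager behaviour
    # described above; REDD's real clients release an HTTP session.
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()          # leaving the with-block releases resources
        return False          # never swallow exceptions

with FakeClient() as c:
    pass
print(c.closed)  # True
```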
4.2. Methods
| Method | Description |
|---|---|
| `search(query, *, limit, sort, after, before)` | Search all of Reddit |
| `search_subreddit(subreddit, query, *, limit, sort, after, before)` | Search within a subreddit |
| `get_post(permalink)` | Get full post details and comment tree |
| `get_user(username, *, limit)` | Get a user's recent activity |
| `get_subreddit_posts(subreddit, *, limit, category, time_filter)` | Fetch subreddit listings |
| `get_user_posts(username, *, limit, category, time_filter)` | Fetch a user's submitted posts |
| `download_image(image_url, *, output_dir)` | Download an image |
| `close()` | Release HTTP resources |
4.3. Models
All models are frozen dataclasses.
| Model | Fields |
|---|---|
| `SearchResult` | `title`, `url`, `description`, `subreddit` |
| `PostDetail` | `title`, `author`, `body`, `score`, `url`, `subreddit`, `created_utc`, `num_comments`, `comments` |
| `Comment` | `author`, `body`, `score`, `replies` |
| `SubredditPost` | `title`, `author`, `permalink`, `score`, `num_comments`, `created_utc`, `subreddit`, `url`, `image_url`, `thumbnail_url` |
| `UserItem` | `kind`, `subreddit`, `url`, `created_utc`, `title`, `body` |
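Frozen dataclasses reject attribute assignment after construction. The definition below is an illustration built from the `SearchResult` row above, not the library's actual source:

```python
from dataclasses import dataclass, FrozenInstanceError

# Illustrative frozen dataclass in the style of REDD's models.
@dataclass(frozen=True)
class SearchResult:
    title: str
    url: str
    description: str
    subreddit: str

result = SearchResult("Example", "https://example.com", "A post", "Python")
try:
    result.title = "mutated"      # any assignment raises
except FrozenInstanceError:
    print("models are immutable")
```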
4.4. Enums
| Enum | Values |
|---|---|
| `Category` | `HOT`, `TOP`, `NEW`, `RISING` |
| `UserCategory` | `HOT`, `TOP`, `NEW` |
| `TimeFilter` | `HOUR`, `DAY`, `WEEK`, `MONTH`, `YEAR`, `ALL` |
| `SortOrder` | `RELEVANCE`, `HOT`, `TOP`, `NEW`, `COMMENTS` |
4.5. Exceptions
| Exception | Description |
|---|---|
| `ReddError` | Base exception for all REDD errors |
| `HttpError` | HTTP request failed after retries |
| `ParseError` | Reddit's JSON could not be parsed into domain models |
| `NotFoundError` | Requested resource does not exist |
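Because all errors share a base class, catching `ReddError` covers every failure mode. The classes below are a plausible reconstruction from the table, for illustration only; the real ones live in `redd`:

```python
class ReddError(Exception):
    """Base exception for all REDD errors."""

class HttpError(ReddError):
    """HTTP request failed after retries."""

class ParseError(ReddError):
    """Reddit's JSON could not be parsed into domain models."""

class NotFoundError(ReddError):
    """Requested resource does not exist."""

# One except clause handles any library error:
try:
    raise NotFoundError("/r/doesnotexist")
except ReddError as exc:
    print(type(exc).__name__)  # NotFoundError
```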
5. Architecture
REDD follows hexagonal architecture (ports and adapters), separating business logic from I/O concerns:
```mermaid
graph LR
    subgraph Public API
        A["Redd (sync)"]
        B["AsyncRedd (async)"]
    end
    subgraph Core
        C["Parsing Layer"]
        D["Domain Models"]
        E["Enums"]
    end
    subgraph Ports
        F["HttpPort"]
        G["AsyncHttpPort"]
    end
    subgraph Adapters
        H["RequestsHttpAdapter"]
        I["HttpxAsyncAdapter"]
    end
    A --> C
    B --> C
    C --> D
    C --> E
    A --> F
    B --> G
    F -.implements.-> H
    G -.implements.-> I
    H --> J["reddit.com"]
    I --> J
```
Directory layout:

```text
src/redd/
├── __init__.py          # Public API surface
├── _client.py           # Sync client (Redd)
├── _async_client.py     # Async client (AsyncRedd)
├── _parsing.py          # JSON to domain model parsing (I/O-free)
├── _exceptions.py       # Error hierarchy
│
├── domain/              # Pure domain layer
│   ├── models.py        # Frozen dataclasses
│   └── enums.py         # Type-safe enumerations
│
├── ports/               # Abstract interfaces
│   └── http.py          # HttpPort and AsyncHttpPort protocols
│
└── adapters/            # Concrete implementations
    ├── http_sync.py     # requests-based adapter
    └── http_async.py    # httpx-based adapter
```
The parsing module has no I/O dependencies. Clients interact with the HTTP layer exclusively through protocol-based ports, making it straightforward to swap adapters, mock dependencies in tests, or add new transports.
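The pattern of depending on a protocol rather than a concrete HTTP client can be sketched as follows. The names and the `get_json` signature are assumptions for illustration, not the actual `HttpPort` definition:

```python
from typing import Protocol

class HttpPort(Protocol):
    # Structural interface: anything with a matching get_json satisfies it.
    def get_json(self, url: str) -> dict: ...

class FakeHttpAdapter:
    # Test double: satisfies HttpPort structurally, performs no real I/O.
    def __init__(self, canned: dict):
        self.canned = canned

    def get_json(self, url: str) -> dict:
        return self.canned

def fetch_title(http: HttpPort, url: str) -> str:
    # Business logic depends only on the port, not on requests or httpx,
    # so adapters can be swapped or mocked without touching this code.
    return http.get_json(url)["data"]["title"]

adapter = FakeHttpAdapter({"data": {"title": "hello"}})
print(fetch_title(adapter, "https://www.reddit.com/r/Python.json"))  # hello
```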
6. Examples
See the examples/ directory for runnable scripts.
Fetch hot posts from a subreddit (`subreddit_hot_posts.py`):

```python
from redd import Category, Redd

with Redd() as r:
    posts = r.get_subreddit_posts("brdev", limit=10, category=Category.HOT)
    for i, post in enumerate(posts, 1):
        print(f"{i:>2}. [{post.score:>5}] {post.title}")
        print(f"    by u/{post.author} — {post.num_comments} comments")
        print(f"    {post.url}")
        print()
```
Sample output:

```text
 1. [   91] Qual o plano B de vocês caso a área piore muito?
     by u/Spiritual_Pangolin18 — 185 comments
     https://www.reddit.com/r/brdev/comments/1rnytuh/...
 2. [   83] Fuçando minhas coisas, encontrei um código de 600 linhas em Portugol
     by u/Dramatic-Revenue-802 — 7 comments
     https://www.reddit.com/r/brdev/comments/1ro269a/...
```
7. Contributing
Contributions are welcome. Please read CONTRIBUTING.md for guidelines on setting up the project, running tests, and submitting changes.
8. Disclaimer
Use responsibly. Reddit may rate-limit or ban IPs that make excessive requests. Consider using rotating proxies for large-scale scraping.
9. License
MIT. See LICENSE for details.
Copyright (c) 2025 Elias Biondo