RedditMiner: Subreddit Image Scraper

RedditMiner is a lightweight, open-source Python tool for scraping image URLs, gallery URLs, and comments from any public or private subreddit using your browser session cookies. No Reddit API credentials are required, and it works even for NSFW and restricted subreddits.
Features
- Cookie Authentication: Uses your browser session cookies for seamless access.
- Image & Gallery Support: Extracts direct image links and all images from Reddit galleries.
- Deep Pagination: Efficiently fetches large numbers of posts using Reddit's pagination.
- Command-Line Interface: Specify the subreddit and options directly via command line.
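The deep-pagination feature works by following Reddit's `after` cursor from page to page until enough posts are collected or the listing runs out. A minimal, library-agnostic sketch of that loop (the `paginate` helper and its `fetch_page` contract are illustrative assumptions, not the tool's actual API):

```python
def paginate(fetch_page, total=200):
    # Collect posts across listing pages using Reddit's `after` cursor.
    # fetch_page(after) must return (posts, next_after); next_after is
    # None once the listing is exhausted.
    posts, after = [], None
    while len(posts) < total:
        page, after = fetch_page(after)
        posts.extend(page)
        if after is None:  # reached the end of the listing
            break
    return posts[:total]
```

In the real scraper, `fetch_page` would issue a request such as `GET https://www.reddit.com/r/<subreddit>/new.json?limit=100&after=<cursor>` and read the next cursor from the response's `data.after` field.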
Installation
- Clone the repository:

  git clone https://github.com/MisbahKhan0009/RedditMiner.git
  cd RedditMiner

- Install dependencies:

  pip install requests

- Export your Reddit cookies:

  - Log into Reddit in your browser.
  - Use a browser extension like "EditThisCookie" or "Get cookies.txt" to export your cookies for reddit.com.
  - Save the exported file as cookies.txt in the project root directory.
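Extensions like "Get cookies.txt" export in the Netscape cookie-file format, which the standard library's `MozillaCookieJar` can read directly into a `requests` session. A sketch of how such loading might look (the `session_from_cookies` helper is hypothetical; the scraper's actual internals may differ):

```python
from http.cookiejar import MozillaCookieJar

import requests

def session_from_cookies(path="cookies.txt"):
    # Load a Netscape-format cookies.txt and attach it to a session.
    jar = MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    session = requests.Session()
    session.cookies = jar
    # Reddit rejects requests without a browser-like User-Agent.
    session.headers["User-Agent"] = "Mozilla/5.0 (RedditMiner)"
    return session
```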
Usage
Run the scraper with your desired subreddit:
python main.py --subreddit EarthPorn
Optional arguments
- --limit: Number of posts to scrape (default: 100)
- --sort: Sort order (new, hot, top, etc.; default: new)
- --output-mode: Output format. Options:
  - post (default): Full post data (JSON)
  - image_url: Only image URLs (from both the image_url and gallery_images fields, TXT file)
  - post_with_comments: Full post data with comments (JSON; same as post if --with-comment is not set)
- --with-comment: Include top-level comments for each post (JSON output modes only). Comments from "AutoModerator" are automatically skipped.
Rate Limiting: If Reddit returns a 429 (Too Many Requests) error, the scraper will automatically slow down and retry after 60 seconds. This helps avoid being blocked by Reddit's rate limits. For best results, avoid running multiple scrapes in parallel and consider using a fresh set of cookies if you encounter repeated rate limiting.
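The slow-down-and-retry behaviour described above amounts to checking for a 429 status and sleeping before trying again. A minimal sketch, assuming a `requests`-style response object (the `get_with_backoff` helper is illustrative, not the scraper's actual function):

```python
import time

def get_with_backoff(session, url, max_retries=3, wait=60):
    # Retry a GET whenever Reddit answers 429 (Too Many Requests).
    for attempt in range(max_retries):
        resp = session.get(url)
        if resp.status_code != 429:
            return resp
        # Honor a Retry-After header when present, else wait 60 seconds.
        delay = int(resp.headers.get("Retry-After", wait))
        time.sleep(delay)
    return resp  # give up after max_retries, return the last response
```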
Examples:
Scrape 200 top posts and save as JSON:
python main.py --subreddit funny --limit 200 --sort top
Scrape only image URLs (TXT file):
python main.py --subreddit funny --output-mode image_url
Scrape posts with top-level comments included (JSON):
python main.py --subreddit funny --output-mode post --with-comment
Each post in the output JSON will have a comments field containing a list of top-level comments (author, body, score, created_utc). Comments from "AutoModerator" are excluded.
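Filtering to those four fields while skipping AutoModerator can be sketched as follows. The input shape here is an assumption based on Reddit's public comment Listing JSON, and the helper name is hypothetical:

```python
def top_level_comments(listing):
    # Extract top-level comments (author, body, score, created_utc)
    # from a Reddit comment Listing, skipping AutoModerator.
    out = []
    for child in listing["data"]["children"]:
        if child["kind"] != "t1":  # skip "more" stubs and non-comments
            continue
        c = child["data"]
        if c["author"] == "AutoModerator":
            continue
        out.append({
            "author": c["author"],
            "body": c["body"],
            "score": c["score"],
            "created_utc": c["created_utc"],
        })
    return out
```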
Scrape and immediately download all images:
python main.py --subreddit funny --output-mode image_url --download-images
You can customize the download directory and parallelism:
python main.py --subreddit funny --output-mode image_url --download-images --output-dir my_images --max-workers 16
Downloaded images are automatically organized by subreddit:
- For example, images from r/EarthPorn will be saved in images/EarthPorn/ by default.
- If you specify a custom output directory, images will be saved in <output-dir>/<subreddit>/.
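Parallel downloading into a per-subreddit directory can be sketched with a thread pool, matching the `--output-dir` and `--max-workers` options above. This is a hypothetical helper, not the tool's actual implementation; the `get` parameter defaults to `requests.get`:

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

def download_all(urls, subreddit, output_dir="images", max_workers=16, get=None):
    # Download each URL into <output-dir>/<subreddit>/ in parallel.
    if get is None:
        import requests
        get = requests.get
    dest = os.path.join(output_dir, subreddit)
    os.makedirs(dest, exist_ok=True)

    def fetch(url):
        # Name the file after the last path segment of the URL.
        name = os.path.basename(urlparse(url).path) or "image"
        path = os.path.join(dest, name)
        resp = get(url, timeout=30)
        resp.raise_for_status()
        with open(path, "wb") as f:
            f.write(resp.content)
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```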
Results are saved as:
- JSON: output/images_[subreddit]_[timestamp].json
- TXT (image URLs): output/images_[subreddit]_[timestamp].txt
- Downloaded images: images/<subreddit>/ (or <output-dir>/<subreddit>/ if specified)
Project Structure
RedditMiner/
│
├── redditminer/
│ ├── __init__.py
│ └── scraper.py # Core scraping logic and RedditImageScraper class
│
├── main.py # Command-line entry point
├── cookies.txt # Your exported Reddit cookies
├── README.md
└── ...
Contributing
Contributions are welcome! Please open issues or submit pull requests for new features, bug fixes, or improvements.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Disclaimer
This tool is intended for personal and educational use. Please respect Reddit's Terms of Service and do not use this tool for spamming or violating site rules.
File details
Details for the file redditminer-1.0.0.tar.gz.
File metadata
- Download URL: redditminer-1.0.0.tar.gz
- Upload date:
- Size: 9.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 11de1bab2526a1b3631caff880bb5855050112c3aa4e2e8ddf276b3b667ce539 |
| MD5 | f1688173e97785471c8a5bc5a8c9fea5 |
| BLAKE2b-256 | 0217d3191be5ae9012e0fa247693c2b6833d6927dcbcfcacdb91edaba88a4153 |
File details
Details for the file redditminer-1.0.0-py3-none-any.whl.
File metadata
- Download URL: redditminer-1.0.0-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 907e9d665ff17315c374361a1730bc8f3d9927e9d9595e56583a375698645595 |
| MD5 | f936f28a6b0e348b4cd74b2bb18404bf |
| BLAKE2b-256 | 59594a9d0c88174b78cb198fb88c2c1ef1caa91100db4e1e27ca45d109dfcd72 |