Simple test package
Scrapeddit
Overview
Scrapeddit is a Python class for scraping images from Reddit subreddits and packaging them as PyTorch datasets. It collects image data from multiple subreddits so the data can be fed directly into machine learning pipelines or data analysis projects.
Key Features
- Reddit Scraping: Automatically retrieves image URLs from specified subreddits using the PRAW library.
- Flexible Configuration: Users can customize parameters such as subreddit names, post limits, sorting methods, and content safety filters.
- Data Transformation: Supports image transformation and resizing to fit specific requirements.
- Error Handling: Handles invalid subreddits, restricted subreddits, and failed image downloads gracefully, so a single failure does not interrupt data collection.
- Data Visualization: Provides visualization tools that show how the collected images are distributed across subreddits.
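The scraping, configuration, and error-handling features above can be sketched as a single function. This is a minimal illustration, not Scrapeddit's own code: the names `collect_image_urls` and `is_image_url`, the extension-based image check, and the choice to record failures in a list are all assumptions. It expects `reddit` to behave like a `praw.Reddit` instance, whose subreddits expose listing methods such as `hot`, `new`, and `top` that accept a `limit` argument, and whose submissions carry `url` and `over_18` attributes.

```python
from typing import Iterable, List, Tuple

# Hypothetical rule for this sketch: a post counts as an image if its URL
# path ends in one of these extensions.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def is_image_url(url: str) -> bool:
    """Return True if the URL (ignoring any query string) ends in an image extension."""
    path = url.split("?", 1)[0].lower()
    return path.endswith(IMAGE_EXTENSIONS)


def collect_image_urls(
    reddit,
    subreddits: Iterable[str],
    limit: int = 100,
    sort: str = "hot",
    allow_nsfw: bool = False,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Collect (subreddit, image_url) pairs and record failing subreddits.

    Hypothetical helper, not part of Scrapeddit's documented API.
    """
    images: List[Tuple[str, str]] = []
    failures: List[Tuple[str, str]] = []
    for name in subreddits:
        try:
            # PRAW objects are lazy, so errors for missing or private subreddits
            # typically surface while the listing is being iterated.
            listing = getattr(reddit.subreddit(name), sort)(limit=limit)
            for post in listing:
                if post.over_18 and not allow_nsfw:
                    continue  # content-safety filter
                if is_image_url(post.url):
                    images.append((name, post.url))
        except Exception as exc:  # e.g. prawcore's NotFound or Forbidden
            # Record the failure and move on; images already collected from
            # this subreddit before the error are kept.
            failures.append((name, type(exc).__name__))
    return images, failures
```

With real credentials this might be called as `collect_image_urls(praw.Reddit(client_id=..., client_secret=..., user_agent=...), ["pics", "EarthPorn"], limit=50, sort="top")`. Returning failures instead of raising keeps one bad subreddit from aborting the whole run.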
Usage
- Initialization: Instantiate the ScrapeditDataset class with a list of subreddit names and optional parameters for customization.
- Data Loading: Access the dataset like any other PyTorch dataset, allowing for seamless integration into machine learning workflows.
- Data Analysis: Use the provided visualization functions to gain insights into the distribution of data sources and explore the collected dataset.
- Model Training: Wrap the ScrapeditDataset in PyTorch's DataLoader for efficient batch processing during model training.
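The usage steps above rely on the standard map-style dataset protocol. The sketch below shows that protocol on a hypothetical `SubredditImageDataset`; it is not Scrapeddit's `ScrapeditDataset`, and its constructor arguments, the use of the subreddit as the class label, and the `source_counts` helper are assumptions. It avoids importing torch because a map-style dataset for `DataLoader` only needs `__len__` and `__getitem__`; a real implementation would normally subclass `torch.utils.data.Dataset`.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


class SubredditImageDataset:
    """Hypothetical map-style dataset over (subreddit, image_url) pairs."""

    def __init__(
        self,
        samples: List[Tuple[str, str]],
        load_image: Callable[[str], object],
        transform: Optional[Callable[[object], object]] = None,
    ):
        self.samples = samples
        self.load_image = load_image  # e.g. download the URL and open it with Pillow
        self.transform = transform    # e.g. a torchvision.transforms.Compose
        # Assumption: each subreddit becomes one class, labelled in sorted order.
        self.classes = sorted({sub for sub, _ in samples})
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int):
        subreddit, url = self.samples[index]
        image = self.load_image(url)
        if self.transform is not None:
            image = self.transform(image)
        return image, self.class_to_idx[subreddit]

    def source_counts(self) -> Counter:
        """Number of samples per subreddit, i.e. the data behind a bar chart."""
        return Counter(sub for sub, _ in self.samples)
```

In practice `load_image` could fetch the URL with `requests.get(url, timeout=10)` and open the bytes with `PIL.Image.open(io.BytesIO(resp.content)).convert("RGB")`. With a transform such as `transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])`, every sample has the same shape, so `torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)` can batch it with the default collate function. The counts from `source_counts()` can be plotted with `plt.bar(list(counts), list(counts.values()))`.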
Requirements
- Python 3.x
- PRAW
- pandas
- requests
- matplotlib
- Pillow
- torch
- torchvision
- tqdm
Acknowledgements
- PRAW: The Python Reddit API Wrapper
- tqdm: A Fast, Extensible Progress Bar for Python and CLI
- Pillow: The Python Imaging Library (Fork)
License
This project is licensed under the MIT License. See the LICENSE file for details.
Download files
File details
Details for the file scrapeddit-0.0.4.tar.gz.
File metadata
- Download URL: scrapeddit-0.0.4.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d53a96cdd17b85fad14fa6d2b2c9272ea3ab287cd5843f4a837fae5ed4101dcc |
| MD5 | 819341f29a593a9d1ceb0297d729c84d |
| BLAKE2b-256 | 6e58925ffa0d4982672ae911eb28a345efbc20479ed4536a545799adbd4ee431 |
File details
Details for the file scrapeddit-0.0.4-py3-none-any.whl.
File metadata
- Download URL: scrapeddit-0.0.4-py3-none-any.whl
- Upload date:
- Size: 7.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c8e312a01cd73c78eccdd7f5ace658dbf8e5683d8898fe83e76841e744b933f2 |
| MD5 | 5946c2ec6864b1bc264c2d7ccc180101 |
| BLAKE2b-256 | 60ecb0d58b709a0d3fcf7c6d238d393373d5b4f1567705f97aac582de25d9ea6 |
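The SHA256 digests listed above can be used to check a downloaded file before installing it. The sketch below is a minimal, standard-library-only illustration; the helper names `sha256_of` and `verify_download` are hypothetical and not part of Scrapeddit.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 against a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For an untampered source distribution, `verify_download("scrapeddit-0.0.4.tar.gz", "d53a96cdd17b85fad14fa6d2b2c9272ea3ab287cd5843f4a837fae5ed4101dcc")` should return `True`. Reading in chunks keeps memory use constant regardless of file size.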