Simple test package

Project description

Scrapeddit

Overview

Scrapeddit provides a Python class for scraping images from Reddit subreddits and exposing them as a PyTorch dataset. It collects image data from one or more subreddits so the images can be fed directly into machine learning pipelines or data analysis projects.

Key Features

  • Reddit Scraping: Automatically retrieves image URLs from specified subreddits using the PRAW library.
  • Flexible Configuration: Users can customize parameters such as subreddit names, post limits, sorting methods, and content safety filters.
  • Data Transformation: Supports image transformation and resizing to fit specific requirements.
  • Error Handling: Handles invalid subreddits, restricted subreddits, and failed image downloads so that a single bad source does not interrupt data collection.
  • Data Visualization: Provides visualization tools to understand the distribution of data sources across different subreddits.
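
As a rough illustration of the scraping and content-safety features listed above, the sketch below picks direct image links out of post records and optionally drops posts flagged as not safe for work. It is not Scrapeddit's implementation: the `Post` record, `IMAGE_EXTENSIONS`, and `select_image_urls` are hypothetical names, and real posts would come from PRAW rather than a hand-built list.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical: file extensions treated as direct image links.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


@dataclass
class Post:
    """Hypothetical stand-in for the post fields used in this sketch."""
    subreddit: str
    url: str
    over_18: bool = False


def select_image_urls(posts, allow_nsfw=False):
    """Return (subreddit, url) pairs for posts that link directly to an image."""
    selected = []
    for post in posts:
        if post.over_18 and not allow_nsfw:
            continue  # content safety filter
        # Compare against the URL path only, so query strings are ignored.
        path = urlparse(post.url).path.lower()
        if path.endswith(IMAGE_EXTENSIONS):
            selected.append((post.subreddit, post.url))
    return selected


posts = [
    Post("pics", "https://i.redd.it/abc123.jpg"),
    Post("pics", "https://www.reddit.com/r/pics/comments/xyz/"),
    Post("art", "https://i.imgur.com/def456.PNG?size=large"),
    Post("art", "https://i.redd.it/ghi789.png", over_18=True),
]
print(select_image_urls(posts))
```

Matching on the parsed URL path rather than the raw string is one possible design choice; it keeps links with query parameters while still rejecting comment pages.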

Usage

  1. Initialization: Instantiate the ScrapeditDataset class with a list of subreddit names and optional parameters for customization.
  2. Data Loading: Access the dataset like any other PyTorch dataset, allowing for seamless integration into machine learning workflows.
  3. Data Analysis: Use the provided visualization functions to gain insights into the distribution of data sources and explore the collected dataset.
  4. Model Training: Pass the ScrapeditDataset to PyTorch's DataLoader for efficient batch processing during model training.
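
The workflow above can be sketched without network access or PyTorch installed. The class below is a hypothetical stand-in, not the real ScrapeditDataset: it implements `__len__` and `__getitem__`, the two methods a PyTorch map-style dataset provides, over (subreddit, url) records, and counts samples per subreddit as a text-only version of the data-source distribution. Every name in the block is an assumption made for illustration.

```python
from collections import Counter


class ToySubredditDataset:
    """Hypothetical stand-in exposing the map-style dataset protocol."""

    def __init__(self, records):
        # records: list of (subreddit, image_url) pairs
        self.records = list(records)
        # Map each subreddit name to an integer label, in sorted order.
        names = sorted({name for name, _ in self.records})
        self.label_of = {name: i for i, name in enumerate(names)}

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        subreddit, url = self.records[index]
        # A real dataset would download and transform the image here.
        return url, self.label_of[subreddit]

    def source_counts(self):
        """Count samples per subreddit."""
        return Counter(name for name, _ in self.records)


def batches(dataset, batch_size):
    """Yield lists of samples; a plain-Python analogue of batched loading."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


dataset = ToySubredditDataset([
    ("pics", "https://i.redd.it/a.jpg"),
    ("art", "https://i.redd.it/b.png"),
    ("pics", "https://i.redd.it/c.jpg"),
])
print(len(dataset), dataset[0])
print(dataset.source_counts())
print([len(batch) for batch in batches(dataset, batch_size=2)])
```

With PyTorch installed, an object shaped like this can typically be handed to `torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)` in place of the `batches` helper, since DataLoader relies on `__len__` and `__getitem__` for map-style datasets.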

Requirements

  • Python 3.x
  • PRAW
  • pandas
  • requests
  • matplotlib
  • Pillow
  • torch
  • torchvision
  • tqdm

Acknowledgements

  • PRAW: The Python Reddit API Wrapper
  • tqdm: A Fast, Extensible Progress Bar for Python and CLI
  • Pillow: The Python Imaging Library

License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Source Distribution

scrapeddit-0.3.0.tar.gz (6.4 kB)

Uploaded Source

Built Distribution

scrapeddit-0.3.0-py3-none-any.whl (7.8 kB)

Uploaded Python 3

File details

Details for the file scrapeddit-0.3.0.tar.gz.

File metadata

  • Download URL: scrapeddit-0.3.0.tar.gz
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.5

File hashes

Hashes for scrapeddit-0.3.0.tar.gz
  • SHA256: 9e045fc9d8cbc92b26e2076fbb091168b856bc235adcf055d282f69ae79a9d8a
  • MD5: d76067ba5b24dc7d58d9ba42cc1466ab
  • BLAKE2b-256: 1bffcd2fb9380c7e7b1ce6cfee745f3753813950ef1d3b5740ce5071f9821c21
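
A downloaded file can be checked against the SHA256 value listed above with the standard library. The helper names `sha256_of_file` and `matches_published_hash` are made up for this sketch, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the empty bytes returned at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the archive sits in the current directory, `matches_published_hash("scrapeddit-0.3.0.tar.gz", "9e045fc9d8cbc92b26e2076fbb091168b856bc235adcf055d282f69ae79a9d8a")` should return True for an intact download.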

File details

Details for the file scrapeddit-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: scrapeddit-0.3.0-py3-none-any.whl
  • Size: 7.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.5

File hashes

Hashes for scrapeddit-0.3.0-py3-none-any.whl
  • SHA256: 7e1f6e380f1fd1b9921f0a0c1e06ec16dd8636a4d9e2808b71f13e29b3c0e240
  • MD5: 3bb53c31d633085b07b73b3e8fe89894
  • BLAKE2b-256: 0c04995b9e5f462e97db2a0c832c456d15c3f4b90b0409ecd91ea837f0f583e3
