Take snowball samples of Reddit data

Project description

sampleReddit 🫴

A streamlined interface for generating snowball samples of Reddit data.

Snowball sampling is a data collection method that starts with a small set of seeds and iteratively collects data from their connections. This method is particularly useful for collecting data from social media platforms, where the connections between users and communities are often of primary interest. sampleReddit also outputs full documentation of each sampling process.
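Conceptually, snowball sampling is a bounded breadth-first traversal: each wave expands only the nodes discovered in the previous wave. The sketch below shows the idea on a toy in-memory graph. `snowball_sample`, the adjacency-dict representation and the node names are illustrative assumptions, not part of sampleReddit's API.

```python
def snowball_sample(graph, seeds, n_waves):
    """Return every node reachable from `seeds` within `n_waves` hops.

    `graph` is a plain adjacency dict (node -> list of neighbours). This is a
    generic illustration, not sampleReddit's internal representation.
    """
    sampled = set(seeds)
    frontier = list(seeds)
    for _ in range(n_waves):
        next_frontier = []
        for node in frontier:
            for neighbour in graph.get(node, []):
                if neighbour not in sampled:  # only expand newly found nodes
                    sampled.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
        if not frontier:  # nothing new to expand, so stop early
            break
    return sampled


# Toy graph: subreddits link to posts, posts link to commenters.
graph = {
    "politics": ["post1", "post2"],
    "news": ["post3"],
    "post1": ["alice", "bob"],
    "post2": ["bob"],
    "post3": ["carol"],
}
print(sorted(snowball_sample(graph, ["politics", "news"], n_waves=2)))
# ['alice', 'bob', 'carol', 'news', 'politics', 'post1', 'post2', 'post3']
```

Wave 1 reaches the posts and wave 2 reaches their commenters, which mirrors the seed-subreddit, post, commenter chain that sampleReddit follows.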

Installation

sampleReddit can be installed from PyPI using pip:

pip install sampleReddit

Quick Start

An annotated example of how to go from a list of seed subreddits to a snowball sample of Reddit comments can be found in this script.

Usage

The core functionality of sampleReddit resides in the sample_reddit function:

import sampleReddit as sr

# `instance` is an authenticated Reddit API instance
# (see the note on setup_access below).
sampling_frame, users_df = sr.sample_reddit(
    api_instance=instance,
    seed_subreddits=["politics", "news"],
    post_filter="top",
    time_period="year",
    n_posts=3,
    log_file_path="path/to/log/file.log",
)

The call above draws a snowball sample of Reddit users: it collects the top 3 posts of the past year from the "politics" and "news" subreddits, then the usernames of every user who commented on those posts. It returns two objects:

  1. A Python dictionary object that documents the sampling frame. It maps subreddits to posts and posts to comments.
  2. A pandas DataFrame with a single column called "users" that lists the users who were sampled.
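The exact layout of the sampling-frame dictionary is documented in the wiki rather than here. Assuming, purely for illustration, a nested shape of `{subreddit: {post_id: [comment_id, ...]}}`, a small helper such as the hypothetical `summarize_frame` below can report how many posts and comments were collected per subreddit.

```python
def summarize_frame(frame):
    """Count posts and comments per subreddit.

    ASSUMPTION: `frame` is shaped {subreddit: {post_id: [comment_id, ...]}}.
    Check the documentation for the real structure before relying on this.
    """
    summary = {}
    for subreddit, posts in frame.items():
        n_comments = sum(len(comments) for comments in posts.values())
        summary[subreddit] = {"posts": len(posts), "comments": n_comments}
    return summary


# Toy frame with made-up IDs, used only to exercise the helper.
example_frame = {
    "politics": {"p1": ["c1", "c2"], "p2": ["c3"]},
    "news": {"p3": []},
}
print(summarize_frame(example_frame))
# {'politics': {'posts': 2, 'comments': 3}, 'news': {'posts': 1, 'comments': 0}}
```

On the DataFrame side, ordinary pandas calls apply, for example `users_df["users"].nunique()` to count distinct sampled users.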

The library also provides lower-level functions that only sample posts from a subreddit, or comments from a list of post IDs. For a full list of functions, see the documentation.
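Chaining the two lower-level steps by hand follows the same pattern as `sample_reddit`: seeds, then posts, then comments and their authors. The sketch below wires the stages together with injected fetch functions so it runs offline. `two_stage_sample`, `fetch_posts` and `fetch_comments` are hypothetical stand-ins, not sampleReddit functions; in real use their role would be played by the library's post- and comment-sampling functions, whose actual names and signatures are listed in the documentation.

```python
def two_stage_sample(seed_subreddits, n_posts, fetch_posts, fetch_comments):
    """Seeds -> posts -> comments, returning (frame, sorted unique authors).

    `fetch_posts(subreddit, n)` returns post IDs and `fetch_comments(post_id)`
    returns (comment_id, author) pairs. Both are HYPOTHETICAL callables
    standing in for API lookups.
    """
    frame = {}
    authors = set()
    for subreddit in seed_subreddits:
        frame[subreddit] = {}
        for post_id in fetch_posts(subreddit, n_posts):
            comments = fetch_comments(post_id)
            frame[subreddit][post_id] = [comment_id for comment_id, _ in comments]
            # Skip missing authors (e.g. deleted accounts).
            authors.update(author for _, author in comments if author)
    return frame, sorted(authors)


# Offline stand-ins with made-up data.
FAKE_POSTS = {"politics": ["p1", "p2"], "news": ["p3"]}
FAKE_COMMENTS = {
    "p1": [("c1", "alice"), ("c2", "bob")],
    "p2": [("c3", "bob"), ("c4", None)],
    "p3": [("c5", "carol")],
}

frame, users = two_stage_sample(
    ["politics", "news"],
    n_posts=3,
    fetch_posts=lambda sub, n: FAKE_POSTS[sub][:n],
    fetch_comments=lambda post_id: FAKE_COMMENTS[post_id],
)
print(users)  # ['alice', 'bob', 'carol']
```

Injecting the fetchers keeps the traversal logic testable without network access or API credentials.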

Note: Any access to the Reddit API requires an application that is registered with Reddit via their developer portal. Once your app is registered, the setup_access function can be used to create an authenticated Reddit API instance. For instructions on how to set up a registered Reddit API application, refer to this guide.[^1]

[^1]: You will need a regular Reddit user account to complete the app authentication setup.
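Whatever the exact arguments of setup_access are, it is good practice to keep the app's client ID and secret out of your scripts. A minimal sketch, assuming the credentials live in environment variables whose names (`REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USER_AGENT`) are hypothetical. The returned keys match the keyword arguments of PRAW's `praw.Reddit` constructor, which the optional `make_praw_instance` helper passes them to.

```python
import os

# HYPOTHETICAL variable names; use whatever your deployment defines.
ENV_KEYS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "user_agent": "REDDIT_USER_AGENT",
}


def load_reddit_credentials(env=None):
    """Read app credentials from environment variables instead of source code."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_KEYS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in ENV_KEYS.items()}


def make_praw_instance():
    """Build a read-only PRAW client from the credentials (needs `pip install praw`)."""
    import praw  # imported lazily so the loader above works without PRAW

    return praw.Reddit(**load_reddit_credentials())


fake_env = {
    "REDDIT_CLIENT_ID": "my-client-id",
    "REDDIT_CLIENT_SECRET": "my-secret",
    "REDDIT_USER_AGENT": "sampleReddit demo by u/your_username",
}
print(load_reddit_credentials(fake_env)["user_agent"])
```

Whether a PRAW instance built this way can be passed as `api_instance` in place of the output of setup_access is an assumption; check the documentation before relying on it.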

Testing is performed on Python 3.10, but everything should work on 3.6 or later.

Documentation

Full package documentation can be found in this repo's wiki.

Acknowledgments

sampleReddit is built on top of the PRAW (Python Reddit API Wrapper) library, which provides a comprehensive and flexible interface for the Reddit API.


Download files

Source Distribution

samplereddit-0.1.5.tar.gz (8.8 kB)

Built Distribution

samplereddit-0.1.5-py3-none-any.whl (9.3 kB)

File details

Details for the file samplereddit-0.1.5.tar.gz.

File metadata

  • Download URL: samplereddit-0.1.5.tar.gz
  • Size: 8.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.5

File hashes

Hashes for samplereddit-0.1.5.tar.gz
Algorithm Hash digest
SHA256 7a2a7f39ad2d131f5e1451d7840db95799e7c4c6dc5976a76ee745197f94e645
MD5 78177436c163018dca4e2656485c1162
BLAKE2b-256 fc10ffbf8e7d7e21d7962cf633a466504daed6e373e19394694aea3c11fea8d1

File details

Details for the file samplereddit-0.1.5-py3-none-any.whl.

File hashes

Hashes for samplereddit-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 62379fd6c10b10e342e63fa1e68c2b3df59b77a786c2425b009061d87a4704a8
MD5 766b730a6d7d30108dce94f213469bd1
BLAKE2b-256 349681f5c995c80e3a449726f47cae06cd18b4acfef5d7ddab2c62622bd4bc7a
