Tool for extracting reddit comments

Reddit Comments Analyzer

Supports Python 3.4, 3.5, and 3.6. License: MIT.

A general-purpose package for Reddit comment analysis, data manipulation, and related tasks.

Install Instructions

From a source checkout:

pip install .

Or, for the released version:

pip install reddit-extract

How to Use

Required:

  • Reddit Client ID
  • Reddit Client Secret
  • Reddit User Agent
  • Subreddit
  • List of Thread IDs (for bulk extraction from multiple Reddit threads)
  • Dictionary of headers (must match your regex search pattern, otherwise the headers will not line up with the data retrieved; this suits threads where users reply with a fixed copy/paste form, such as "Megathreads")
  • Regex Pattern (for csv)
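
A thread ID is the short alphanumeric string that follows /comments/ in a thread's URL. As a minimal sketch, the list of IDs could be built from URLs as shown here. The helper thread_id_from_url and the example URLs are hypothetical and are not part of reddit_extract.

```python
import re

# Hypothetical helper, not part of reddit_extract: pulls the thread ID out of
# a Reddit thread URL of the form .../comments/<id>/...
_THREAD_ID = re.compile(r'/comments/([0-9a-z]+)', re.IGNORECASE)


def thread_id_from_url(url):
    """Return the thread ID found in url, or raise ValueError if there is none."""
    match = _THREAD_ID.search(url)
    if match is None:
        raise ValueError('no thread ID found in %r' % url)
    return match.group(1).lower()


# The URLs are made up for illustration; only the IDs appear in this README.
urls = [
    'https://www.reddit.com/r/nfa/comments/aw79c5/example_title/',
    'https://old.reddit.com/r/nfa/comments/b7x7n1/',
]
threads = [thread_id_from_url(u) for u in urls]
print(threads)
```

The resulting list can be passed wherever a list of thread IDs is expected.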

Extract all comments from a list of threads to csv, using defined headers and a search pattern:

import reddit_extract
reddit_extract.extract_comments_csv_bulk(<Reddit ClientID>, <Reddit ClientSecret>, <Reddit User Agent>, <Subreddit>, <Thread IDs>, <Dictionary Headers>, <Regex Pattern>)
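
Because the headers must match the search pattern, it can help to compare the two before starting a bulk extract. The following is a sketch only: check_headers_match_pattern is a hypothetical helper, not part of reddit_extract, and it assumes that each capture group in the pattern fills exactly one csv column.

```python
import re


def check_headers_match_pattern(headers, search_pattern):
    """Raise ValueError unless the pattern has one capture group per header.

    Hypothetical pre-flight check; assumes one capture group per csv column.
    """
    group_count = re.compile(search_pattern).groups
    if group_count != len(headers):
        raise ValueError(
            'pattern captures %d fields but %d headers were given'
            % (group_count, len(headers))
        )
    return group_count


# Shortened two-field example for illustration.
headers = {'Form': 'Form', 'Entity': 'Entity'}
search_pattern = r'Form: (.*)\n*Entity: (.*)'
print(check_headers_match_pattern(headers, search_pattern))
```

A mismatch raises ValueError up front, so it surfaces before any comments are fetched.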

Extract all comments from a list of threads to txt:

import reddit_extract
reddit_extract.extract_comments_txt_bulk(<Reddit ClientID>, <Reddit ClientSecret>, <Reddit User Agent>, <Subreddit>, <Thread IDs>)

Example:

import reddit_extract
threads = ['aw79c5', 'b7x7n1', 'am5uk7', 'bji681', 'abv2gl', '9klf8e']
search_pattern = r'Form: (.*)\n*Entity: (.*)\n*Pending: (.*)\n*Approved: (.*)\n*Standardized wait: (.*)\n*STATE: (.*)'
headers = {'Form': 'Form', 'Entity': 'Entity', 'Pending': 'Pending', 'Approved': 'Approved', 'Standardized Wait': 'Standardized Wait', 'STATE': 'STATE'}
reddit_extract.extract_comments_csv_bulk(<client_id>, <client_secret>, <user_agent>, 'nfa', threads, headers, search_pattern)
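
To see what the search pattern captures, it can be tried against a single comment with the standard re module before running a bulk extract. This is a sketch under stated assumptions: the sample comment is made up for illustration, and pairing the captured groups with the header keys in order is an assumption about how the columns line up, not a description of the package's internals.

```python
import re

search_pattern = r'Form: (.*)\n*Entity: (.*)\n*Pending: (.*)\n*Approved: (.*)\n*Standardized wait: (.*)\n*STATE: (.*)'
headers = {'Form': 'Form', 'Entity': 'Entity', 'Pending': 'Pending',
           'Approved': 'Approved', 'Standardized Wait': 'Standardized Wait',
           'STATE': 'STATE'}

# Made-up comment that follows the copy/paste form the pattern expects.
sample_comment = (
    'Form: Form 4\n\n'
    'Entity: Trust\n\n'
    'Pending: 01/02/2019\n\n'
    'Approved: 06/03/2019\n\n'
    'Standardized wait: 152 days\n\n'
    'STATE: TX'
)

match = re.search(search_pattern, sample_comment)
fields = match.groups() if match else ()

# Pair each header key with the captured field in the same position
# (dicts keep insertion order on Python 3.7 and later).
row = dict(zip(headers, fields))
print(row)
```

The pattern is case-sensitive, so a comment must use the labels exactly as written in the pattern (for example "Standardized wait") to match. The header keys are only column names and may be spelled differently.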

Tests

python setup.py test

Download files

Files for reddit-extract, version 0.2.0
reddit_extract-0.2.0.tar.gz (6.2 kB), source distribution