
Tool for extracting Reddit comments

Project description

Reddit Comments Analyzer

Supported Python versions: 3.4, 3.5, 3.6. License: MIT.

A general-purpose package for Reddit comment analysis, data manipulation, and related tasks.

Install Instructions

From the root of a source checkout:

pip install .

or, for the released version on PyPI:

pip install reddit-extract

How to Use

Required:

  • Reddit Client ID
  • Reddit Client Secret
  • Reddit User Agent
  • Subreddit
  • List of thread IDs, for bulk extraction from multiple Reddit threads
  • Dictionary of headers (must match your regex search pattern, otherwise the headers will not line up with the retrieved data; for example, the fields of a defined copy/paste form that users reply to in a thread, a.k.a. "Megathreads")
  • Regex pattern (CSV extraction only)
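Because the headers must line up with the pattern's capture groups, one quick sanity check before a long bulk run is to compare the number of groups with the number of headers. The helper below is hypothetical (reddit_extract is not documented to perform this check); it relies only on the `groups` attribute of a compiled pattern from Python's standard `re` module.

```python
import re


def headers_match_pattern(search_pattern, headers):
    """Return True if the pattern has exactly one capture group per header.

    Hypothetical helper for illustration; not part of reddit_extract.
    """
    # Pattern.groups is the number of capturing groups in the compiled regex
    return re.compile(search_pattern).groups == len(headers)
```

This only catches a count mismatch; it cannot tell whether the groups are in the same order as the headers.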

Extracting all comments for a list of threads to CSV, with defined headers and a search pattern:

import reddit_extract
reddit_extract.extract_comments_csv_bulk(<Reddit ClientID>, <Reddit ClientSecret>, <Reddit User Agent>, <Subreddit>, <Thread IDs>, <Dictionary Headers>, <Regex Pattern>)
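How the package builds its CSV internally is not described here. As a rough sketch of the idea, the hypothetical `comments_to_csv` helper below writes one row per comment that matches the pattern, using the standard `csv` and `re` modules. It assumes the dictionary's keys are the column names, in the same order as the capture groups, which relies on dictionaries preserving insertion order (guaranteed from Python 3.7).

```python
import csv
import re


def comments_to_csv(comments, headers, search_pattern, out):
    """Write a header row, then one CSV row per comment matching search_pattern.

    Hypothetical sketch, not reddit_extract's implementation. Assumes the
    i-th key of `headers` names the i-th capture group; comments that do
    not match the pattern are skipped.
    """
    fieldnames = list(headers)
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    pattern = re.compile(search_pattern)
    for body in comments:
        match = pattern.search(body)
        if match:
            # Pair each header with the capture group at the same position
            writer.writerow(dict(zip(fieldnames, match.groups())))
```

When writing to a real file, open it with `newline=''`, as the `csv` module documentation recommends, so line endings are not doubled on Windows.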

Extracting all comments for a list of threads to TXT:

import reddit_extract
reddit_extract.extract_comments_txt_bulk(<Reddit ClientID>, <Reddit ClientSecret>, <Reddit User Agent>, <Subreddit>, <Thread IDs>)
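This page does not say which Reddit client the package uses; assuming PRAW, a minimal sketch of pulling every comment from a list of thread IDs might look like the following. `reddit.submission(id=...)`, `comments.replace_more(limit=None)` and `comments.list()` are documented PRAW calls; the helper name `collect_comment_bodies` is hypothetical, and looking threads up by ID does not need the subreddit argument the package takes.

```python
from typing import Dict, List


def collect_comment_bodies(reddit, thread_ids: List[str]) -> Dict[str, List[str]]:
    """Return {thread_id: [comment body, ...]} for each thread ID.

    Assumes `reddit` behaves like a praw.Reddit instance; an illustration,
    not reddit_extract's actual implementation.
    """
    bodies = {}
    for thread_id in thread_ids:
        submission = reddit.submission(id=thread_id)
        # Expand every "load more comments" stub so the whole tree is fetched
        submission.comments.replace_more(limit=None)
        # .list() flattens the nested comment forest into a single list
        bodies[thread_id] = [comment.body for comment in submission.comments.list()]
    return bodies
```

With PRAW installed, `reddit` would be `praw.Reddit(client_id=..., client_secret=..., user_agent=...)`. Note that `replace_more(limit=None)` issues one extra request per stub, so very large threads can be slow.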

Example:

import reddit_extract
# Reddit API credentials: replace these placeholders with your app's values
client_id = 'YOUR_CLIENT_ID'
client_secret = 'YOUR_CLIENT_SECRET'
user_agent = 'YOUR_USER_AGENT'
threads = ['aw79c5', 'b7x7n1', 'am5uk7', 'bji681', 'abv2gl', '9klf8e']
search_pattern = r'Form: (.*)\n*Entity: (.*)\n*Pending: (.*)\n*Approved: (.*)\n*Standardized wait: (.*)\n*STATE: (.*)'
headers = {'Form': 'Form', 'Entity': 'Entity', 'Pending': 'Pending', 'Approved': 'Approved', 'Standardized Wait': 'Standardized Wait', 'STATE': 'STATE'}
# Extract comments matching the form from the listed r/nfa threads to CSV
reddit_extract.extract_comments_csv_bulk(client_id, client_secret, user_agent, 'nfa', threads, headers, search_pattern)
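To see what the example pattern captures, the standalone snippet below uses only Python's `re` module on a single comment. The comment text is made up for illustration, and pairing each capture group with the header at the same position is an assumption about how the headers are meant to line up.

```python
import re

# Pattern and header names copied from the example
search_pattern = r'Form: (.*)\n*Entity: (.*)\n*Pending: (.*)\n*Approved: (.*)\n*Standardized wait: (.*)\n*STATE: (.*)'
headers = ['Form', 'Entity', 'Pending', 'Approved', 'Standardized Wait', 'STATE']

# A made-up comment following the megathread's copy/paste form
comment = ("Form: 4\n\nEntity: Individual\n\nPending: 01/02/2019\n"
           "Approved: 03/04/2019\n\nStandardized wait: 61\n\nSTATE: TX")

# (.*) stops at the end of each line; \n* absorbs any blank lines between fields
match = re.search(search_pattern, comment)
row = dict(zip(headers, match.groups())) if match else None
print(row)
```

The pattern is matched literally and case-sensitively, so the fields must appear in this order with this exact capitalization; a comment missing any field produces no match at all.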

Tests

python setup.py test



Download files


Source Distribution

reddit_extract-0.2.0.tar.gz (6.2 kB)

File details

Details for the file reddit_extract-0.2.0.tar.gz.

File metadata

  • Download URL: reddit_extract-0.2.0.tar.gz
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.5.6

File hashes

Hashes for reddit_extract-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       0abf19cfac08d2ad4552a4ce085b60e83ea01c7ffc9a25a4878e63e63fa3364c
MD5          116677c8d1962c34e1fa8fa3df1f9760
BLAKE2b-256  b5d2a960b687ce52c887a9c6349255a411b90902578c64a5245b9501442ee63f
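To check a downloaded copy of the source distribution against its published SHA256 digest, a small sketch using only Python's standard `hashlib` module could look like this; the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib

# SHA256 digest published for reddit_extract-0.2.0.tar.gz
EXPECTED_SHA256 = "0abf19cfac08d2ad4552a4ce085b60e83ea01c7ffc9a25a4878e63e63fa3364c"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage (assumes the archive is in the current directory):
# assert sha256_of("reddit_extract-0.2.0.tar.gz") == EXPECTED_SHA256
```

pip can also enforce this at install time: list `reddit-extract==0.2.0 --hash=sha256:0abf19cfac08d2ad4552a4ce085b60e83ea01c7ffc9a25a4878e63e63fa3364c` in a requirements file and run `pip install --require-hashes -r requirements.txt`. In that mode every dependency must also be pinned with a hash.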

