Convert Reddit posts to text

Reddit2Text

reddit2text is a Python library that converts any Reddit thread into clean, readable text.

Perfect for feeding to an LLM, performing textual/data analysis, or simply archiving for offline use, reddit2text offers a straightforward interface to access and convert content from Reddit.

Features

  • Convert any Reddit thread (the post + all its comments) into structured text.
  • Include all comments, with the ability to specify the maximum comment depth.
  • Configure a custom comment delimiter, for visual separation of nested comments.

Have a Feature Idea?

Open an issue on GitHub and tell us what should be added to the next release!

Installation

Install with pip:

pip3 install reddit2text

Quickstart

First, you need to create a Reddit app to get your client_id and client_secret. Follow the instructions on Reddit's API documentation to set up your application.

Then, replace the client_id, client_secret, and user_agent values in the example below with your own credentials.

The user agent can be anything you like, but we recommend following this convention according to Reddit's guidelines: '<app type>:<app name>:<version> (by <your username>)'
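
As a small illustration of that convention, the helper below assembles a user agent string from its parts. The function name `build_user_agent` and its parameters are hypothetical and not part of reddit2text; this is only a sketch of the format described above.

```python
def build_user_agent(app_type: str, app_name: str, version: str, username: str) -> str:
    """Assemble '<app type>:<app name>:<version> (by <your username>)'."""
    # Hypothetical helper for illustration; reddit2text does not provide it.
    return f"{app_type}:{app_name}:{version} (by {username})"


user_agent = build_user_agent("script", "my_app", "v1.0", "u/reddit2text")
print(user_agent)  # script:my_app:v1.0 (by u/reddit2text)
```

The resulting string can be passed as the user_agent argument when constructing Reddit2Text.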

Here's an example:

from reddit2text import Reddit2Text

r2t = Reddit2Text(
    # example credentials
    client_id='123abc',
    client_secret='123abc',
    user_agent='script:my_app:v1.0 (by u/reddit2text)'
)

# The URL must have the post ID after the /comments/ to work, e.g. `1buyr0g`
URL = 'https://www.reddit.com/r/MadeMeSmile/comments/1buyr0g/ryan_reynolds_being_wholesome/'

output = r2t.textualize_post(URL)
print(output)

A truncated example of the output from the code above: https://pastebin.com/mmHFJtcc
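
Because the URL needs a post ID after /comments/, it can help to check a URL before passing it to textualize_post. The sketch below is one possible check, not how reddit2text itself parses URLs; the function name `extract_post_id` and the regular expression are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed URL shape: .../comments/<post id>/...
_POST_ID_PATTERN = re.compile(r"/comments/([A-Za-z0-9]+)")


def extract_post_id(url: str) -> Optional[str]:
    """Return the post ID found after /comments/, or None if there is none."""
    match = _POST_ID_PATTERN.search(url)
    return match.group(1) if match else None


url = 'https://www.reddit.com/r/MadeMeSmile/comments/1buyr0g/ryan_reynolds_being_wholesome/'
print(extract_post_id(url))  # 1buyr0g
print(extract_post_id('https://www.reddit.com/r/MadeMeSmile/'))  # None
```

A None result indicates a URL without a post ID, which would presumably not work with textualize_post.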

Extra Configuration

  • max_comment_depth: Maximum depth of comments to output, counting the top-most comment as the first level. Set to None or -1 to include all comments (the default).
  • comment_delim: String/character used to indent comments according to their nesting level. Defaults to | to mimic Reddit.

r2t = Reddit2Text(
    # credentials ...
    max_comment_depth=3,  # limit every comment chain to 3 levels, counting the top-most comment
    comment_delim='#'  # each comment level will be preceded by multiples of this string
)
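
To make the effect of these two options concrete, here is a minimal sketch of depth-limited, delimiter-prefixed rendering over a made-up comment tree. It is not reddit2text's implementation: the `render_comments` function, the dict-based comment structure, and the exact line layout (one delimiter per nesting level, followed by a space) are all assumptions chosen for illustration.

```python
from typing import Optional

# Hypothetical comment tree: each comment is a dict with 'body' and 'replies'.
comments = [
    {"body": "Top-level comment", "replies": [
        {"body": "First reply", "replies": [
            {"body": "Nested reply", "replies": []},
        ]},
    ]},
]


def render_comments(comments, comment_delim="|", max_comment_depth: Optional[int] = None, depth=1):
    """Return one line per comment, prefixed by the delimiter repeated once per level."""
    # None or -1 means no depth limit, mirroring the option described above.
    unlimited = max_comment_depth is None or max_comment_depth == -1
    if not unlimited and depth > max_comment_depth:
        return []
    lines = []
    for comment in comments:
        lines.append(f"{comment_delim * depth} {comment['body']}")
        # Recurse into replies one level deeper.
        lines.extend(render_comments(comment["replies"], comment_delim, max_comment_depth, depth + 1))
    return lines


print("\n".join(render_comments(comments, comment_delim="#", max_comment_depth=2)))
# # Top-level comment
# ## First reply
```

With max_comment_depth=2 the third-level reply is dropped, while None or -1 would keep all three comments in this sketch.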

Contributions

Contributions to reddit2text are welcome. Please submit pull requests or issues to our GitHub repository.

License

reddit2text is released under the MIT License. See the LICENSE file for more details.
