Create a SQLite database containing data pulled from Reddit about a single user.

Project description

reddit-user-to-sqlite

Stores all the content from a specific user in a SQLite database. This includes their comments and will soon include their posts.

Install

The package is available on PyPI as reddit-user-to-sqlite. Install it globally using pipx:

pipx install reddit-user-to-sqlite

Usage

The CLI currently exposes a single command: user. An archive command is planned.

user

Fetches all comments for a specific user (posts will be included once supported).

reddit-user-to-sqlite user your_username
reddit-user-to-sqlite user your_username --db my-reddit-data.db

Params

Note: the argument order is reversed from most Dogsheep packages (which take db_path first). This ordering allows for a default database name, which I prefer.

  1. username: a case-insensitive string. The leading /u/ is optional (and ignored if supplied)
  2. (optional) --db: the path to a sqlite file, which will be created or updated as needed. Defaults to reddit.db.
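Since the leading /u/ is optional and matching is case-insensitive, the CLI presumably normalizes the username before querying Reddit. Here's a minimal sketch of that normalization; the normalize_username helper is hypothetical and not part of the package:

```python
def normalize_username(raw: str) -> str:
    """Strip an optional leading /u/ (or u/) prefix and lowercase the name."""
    name = raw.strip()
    # Remove a leading "/u/" or "u/" prefix if present
    for prefix in ("/u/", "u/"):
        if name.lower().startswith(prefix):
            name = name[len(prefix):]
            break
    return name.lower()


print(normalize_username("/u/Some_User"))  # some_user
print(normalize_username("Some_User"))     # some_user
```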

A Note on Stored Data

While most Dogsheep projects grab the raw JSON output of their source APIs, Reddit's API has a lot of junk in it. So, I opted for a slimmed down approach.
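The exact schema isn't documented here, but the Datasette metadata below implies a comments table with text and timestamp columns and a subreddits table with a name column. As an illustration only (the real column set may differ), here's how you might poke at the resulting file with Python's built-in sqlite3 module:

```python
import sqlite3

# Assumed schema, inferred from the Datasette metadata in this README;
# the actual reddit-user-to-sqlite schema may differ.
db = sqlite3.connect(":memory:")  # use "reddit.db" against a real export
db.execute("CREATE TABLE comments (text TEXT, timestamp INTEGER)")
db.execute("CREATE TABLE subreddits (name TEXT)")
db.execute("INSERT INTO comments VALUES ('hello world', 1680000000)")
db.execute("INSERT INTO subreddits VALUES ('python')")

# Count comments, like Datasette's table view would
(count,) = db.execute("SELECT COUNT(*) FROM comments").fetchone()
print(count)  # 1
```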

Viewing Data

The resulting SQLite database pairs well with Datasette, a tool for exploring SQLite databases in the browser. Below is my recommended configuration.

First, install datasette:

pipx install datasette

Then, add the recommended plugins (for rendering timestamps and markdown):

pipx inject datasette datasette-render-markdown datasette-render-timestamps

Finally, create a metadata.json file with the following:

{
  "databases": {
    "reddit": {
      "tables": {
        "comments": {
          "sort_desc": "timestamp",
          "plugins": {
            "datasette-render-markdown": {
              "columns": ["text"]
            },
            "datasette-render-timestamps": {
              "columns": ["timestamp"]
            }
          }
        },
        "subreddits": {
          "sort": "name"
        }
      }
    }
  }
}
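If you'd rather generate this file than hand-edit it, the same structure can be written from Python; the keys below simply mirror the JSON above:

```python
import json

# Mirrors the metadata.json shown above
metadata = {
    "databases": {
        "reddit": {
            "tables": {
                "comments": {
                    "sort_desc": "timestamp",
                    "plugins": {
                        "datasette-render-markdown": {"columns": ["text"]},
                        "datasette-render-timestamps": {"columns": ["timestamp"]},
                    },
                },
                "subreddits": {"sort": "name"},
            }
        }
    }
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```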

Now when you run

datasette reddit.db --metadata metadata.json

you'll get nicely formatted output.

Development

This section is for people making changes to this package.

When in a virtual environment, run the following:

pip install -e '.[test]'

This installs the package in editable mode and makes its test dependencies available.

Running Tests

In your virtual environment, a simple pytest should run the unit test suite.

Motivation

I got nervous when I saw Reddit's notification of upcoming API changes. To ensure I could always access data I created, I wanted to make sure I had a backup in place before anything changed in a big way.
