Save an RSS or Atom feed to a SQLite database

feed-to-sqlite

Download an RSS or Atom feed and save it to a SQLite database. This is meant to work well with Datasette.

Installation

pip install feed-to-sqlite

CLI Usage

Let's grab the Atom feeds for items I've shared on NewsBlur and my Instapaper favorites, and save each to its own table.

feed-to-sqlite feeds.db http://chrisamico.newsblur.com/social/rss/35501/chrisamico https://www.instapaper.com/starred/rss/13475/qUh7yaOUGOSQeANThMyxXdYnho

This will use a SQLite database called feeds.db, creating it if necessary. By default, each feed gets its own table, named based on a slugified version of the feed's title.

To load all items from multiple feeds into a common (or pre-existing) table, pass a --table argument:

feed-to-sqlite feeds.db --table links <url> <url>

That will put all items in a table called links.

Each feed also gets a row in a feeds table containing its top-level metadata, and each item has a foreign key to its originating feed. This is especially useful when combining feeds into a shared table.
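
The column names used for the foreign key are not listed here, so rather than guessing them, one way to see how items point back to the feeds table is to ask SQLite directly. The following is a minimal sketch using only the standard library; the helper name foreign_keys is hypothetical and is not part of feed-to-sqlite.

```python
import sqlite3


def foreign_keys(db_path):
    """Map each table name to a list of (column, referenced_table) pairs.

    Hypothetical helper for inspecting a database such as feeds.db;
    it makes no assumption about the column names feed-to-sqlite uses.
    """
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        result = {}
        for name in tables:
            quoted = name.replace('"', '""')  # escape quotes in table names
            rows = conn.execute(f'PRAGMA foreign_key_list("{quoted}")').fetchall()
            # Each row is (id, seq, table, from, to, on_update, on_delete, match)
            result[name] = [(row[3], row[2]) for row in rows]
        return result
    finally:
        conn.close()
```

Calling foreign_keys("feeds.db") after an import should show, for each item table, which column references feeds. That column can then be used in a JOIN or as a facet in Datasette.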

Python API

One function, ingest_feed, does most of the work here. The following will create a database called feeds.db and download my NewsBlur shared items into a new table called links.

from feed_to_sqlite import ingest_feed

url = "http://chrisamico.newsblur.com/social/rss/35501/chrisamico"

ingest_feed("feeds.db", url=url, table_name="links")
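
To mirror the CLI's --table behavior from Python, ingest_feed can be called once per URL with the same table_name. The sketch below assumes that repeated calls add to an existing table, as the CLI description suggests. The wrapper name ingest_all and its ingest parameter are hypothetical, added only so the loop can be exercised without network access.

```python
def ingest_all(db_path, urls, table_name, ingest=None):
    """Load several feeds into one shared table, one ingest call per URL.

    `ingest` defaults to feed_to_sqlite.ingest_feed; passing a stand-in
    function (hypothetical hook) allows a dry run.
    """
    if ingest is None:
        # Imported lazily so a dry run does not require the package.
        from feed_to_sqlite import ingest_feed as ingest

    for url in urls:
        # Same keyword arguments as the single-feed example above.
        ingest(db_path, url=url, table_name=table_name)
```

For example, ingest_all("feeds.db", [newsblur_url, instapaper_url], "links") would put items from both feeds into links, where the two URL variables hold the feed addresses.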

Transforming data on ingest

When working in Python directly, it's possible to pass in a function to transform rows before they're saved to the database.

The normalize argument to ingest_feed is a function that will be called on each feed item, useful for fixing links or doing additional work.

Its signature is normalize(table, entry, feed_details, client):

  • table is a SQLite table (from sqlite-utils)
  • entry is one feed item, as a dictionary
  • feed_details is a dictionary of top-level feed information
  • client is an instance of httpx.Client, which can be used for outgoing HTTP requests during normalization

That function should return a dictionary representing the row to be saved. Returning a falsy value (such as None) for a given row will cause that row to be skipped.
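
As an illustration of that contract, here is a minimal sketch of a normalize function that skips items without a link and strips utm_* tracking parameters from the rest. It assumes each entry dictionary has a "link" key, which is not stated above. The table, feed_details and client arguments are accepted but unused.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(table, entry, feed_details, client):
    """Drop entries without a link and strip utm_* query parameters."""
    link = entry.get("link")  # assumption: items carry a "link" key
    if not link:
        return None  # falsy return value, so this row is skipped

    parts = urlsplit(link)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith("utm_")
    ]
    cleaned = urlunsplit(parts._replace(query=urlencode(query)))

    row = dict(entry)  # copy so the original entry is left untouched
    row["link"] = cleaned
    return row
```

It would then be passed along with the other arguments, for example ingest_feed("feeds.db", url=url, table_name="links", normalize=normalize).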

Development

Tests use pytest. Run pytest tests/ to run the test suite.
