Save an RSS or Atom feed to a SQLite database

feed-to-sqlite

Download an RSS or Atom feed and save it to a SQLite database. This is meant to work well with datasette.

Installation

pip install feed-to-sqlite

CLI Usage

Let's grab the Atom feeds for items I've shared on NewsBlur and my Instapaper favorites, saving each to its own table.

feed-to-sqlite feeds.db http://chrisamico.newsblur.com/social/rss/35501/chrisamico https://www.instapaper.com/starred/rss/13475/qUh7yaOUGOSQeANThMyxXdYnho

This will use a SQLite database called feeds.db, creating it if necessary. By default, each feed gets its own table, named based on a slugified version of the feed's title.

To load all items from multiple feeds into a common (or pre-existing) table, pass a --table argument:

feed-to-sqlite feeds.db --table links <url> <url>

That will put all items in a table called links.

Each feed also creates an entry in a feeds table containing top-level metadata for each feed. Each item will have a foreign key to the originating feed. This is especially useful if combining feeds into a shared table.
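As a sketch of what that foreign key enables, the query below joins items back to their feed's metadata. The table and column names here (links, feeds.id, feeds.title, a feed column on items) are assumptions for illustration; check your actual schema, e.g. with sqlite3 feeds.db ".schema".

```python
import sqlite3

def items_with_feed_titles(db_path):
    # Join each item to its originating feed's metadata.
    # Column names (links.feed, feeds.id, feeds.title) are assumed,
    # not prescribed by feed-to-sqlite; adjust to your schema.
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            """
            SELECT feeds.title AS feed_title, links.title AS item_title
            FROM links
            JOIN feeds ON links.feed = feeds.id
            """
        ).fetchall()
    finally:
        conn.close()
```

The same join is what makes a shared table workable in datasette: every row still knows which feed it came from.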

Python API

One function, ingest_feed, does most of the work here. The following will create a database called feeds.db and download my NewsBlur shared items into a new table called links.

from feed_to_sqlite import ingest_feed

url = "http://chrisamico.newsblur.com/social/rss/35501/chrisamico"

ingest_feed("feeds.db", url=url, table_name="links")

Transforming data on ingest

When working in Python directly, it's possible to pass in a function to transform rows before they're saved to the database.

The normalize argument to ingest_feed is a function that will be called on each feed item, useful for fixing links or doing additional work.

Its signature is normalize(table, entry, feed_details, client):

  • table is a SQLite table (from sqlite-utils)
  • entry is one feed item, as a dictionary
  • feed_details is a dictionary of top-level feed information
  • client is an instance of httpx.Client, which can be used for outgoing HTTP requests during normalization

That function should return a dictionary representing the row to be saved. Returning a falsy value for a given row will cause that row to be skipped.
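As a minimal sketch, here is a normalize function that skips entries without a link and adds a derived column. The domain field is hypothetical, added purely for illustration; the table and client arguments are accepted but unused here.

```python
from urllib.parse import urlparse

def normalize(table, entry, feed_details, client):
    # Skip entries that have no link; returning a falsy value
    # causes the row to be dropped rather than saved.
    link = entry.get("link")
    if not link:
        return None

    # Add a hypothetical derived column: the link's hostname.
    entry["domain"] = urlparse(link).netloc
    return entry
```

Pass it via the normalize argument described above, e.g. ingest_feed("feeds.db", url=url, table_name="links", normalize=normalize).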

Development

Tests use pytest. Run pytest tests/ to execute the test suite.

