feed-to-sqlite

Download an RSS or Atom feed and save it to a SQLite database. This is meant to work well with datasette.

Installation

pip install feed-to-sqlite

CLI Usage

Let's grab the Atom feeds for items I've shared on NewsBlur and my Instapaper favorites, saving each to its own table:

feed-to-sqlite feeds.db http://chrisamico.newsblur.com/social/rss/35501/chrisamico https://www.instapaper.com/starred/rss/13475/qUh7yaOUGOSQeANThMyxXdYnho

This will use a SQLite database called feeds.db, creating it if necessary. By default, each feed gets its own table, named based on a slugified version of the feed's title.

To load all items from multiple feeds into a common (or pre-existing) table, pass a --table argument:

feed-to-sqlite feeds.db --table links <url> <url>

That will put all items in a table called links.

Each feed also creates an entry in a feeds table containing top-level metadata for each feed. Each item will have a foreign key to the originating feed. This is especially useful if combining feeds into a shared table.
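Because every item carries a foreign key back to the feeds table, a shared table remains queryable per feed. The sketch below illustrates that join against an in-memory stand-in for feeds.db; the column names used here (id, title, feed) are assumptions for illustration, so inspect the schema feed-to-sqlite actually produces (e.g. with sqlite3 feeds.db .schema) before relying on them.

```python
import sqlite3

# Stand-in schema for illustration; real column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feeds (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE links (
        id INTEGER PRIMARY KEY,
        title TEXT,
        feed INTEGER REFERENCES feeds(id)
    );
    INSERT INTO feeds VALUES (1, 'NewsBlur shared items');
    INSERT INTO links VALUES (10, 'An article', 1), (11, 'Another', 1);
""")

# Count items per feed by joining through the foreign key.
rows = conn.execute("""
    SELECT feeds.title, COUNT(*) AS n
    FROM links JOIN feeds ON links.feed = feeds.id
    GROUP BY feeds.id
""").fetchall()
print(rows)  # [('NewsBlur shared items', 2)]
conn.close()
```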

Python API

One function, ingest_feed, does most of the work here. The following will create a database called feeds.db and download my NewsBlur shared items into a new table called links.

from feed_to_sqlite import ingest_feed

url = "http://chrisamico.newsblur.com/social/rss/35501/chrisamico"

ingest_feed("feeds.db", url=url, table_name="links")

Transforming data on ingest

When working in Python directly, it's possible to pass in a function to transform rows before they're saved to the database.

The normalize argument to ingest_feed is a function that will be called on each feed item, useful for fixing links or doing additional work.

Its signature is normalize(table, entry, feed_details, client):

  • table is a SQLite table (from sqlite-utils)
  • entry is one feed item, as a dictionary
  • feed_details is a dictionary of top-level feed information
  • client is an instance of httpx.Client, which can be used for outgoing HTTP requests during normalization

That function should return a dictionary representing the row to be saved. Returning a falsey value for a given row will cause that row to be skipped.
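For example, a normalize function might drop entries with no link and strip query strings before saving. This is only a sketch: the "link" key and the cleanup logic are illustrative assumptions, not part of the library.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(table, entry, feed_details, client):
    # Drop entries without a link; a falsey return skips the row.
    link = entry.get("link")
    if not link:
        return None
    # Strip the query string (e.g. tracking parameters) -- illustrative only.
    parts = urlsplit(link)
    entry["link"] = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return entry
```

Pass it in alongside the other arguments, e.g. ingest_feed("feeds.db", url=url, table_name="links", normalize=normalize).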

Development

Tests use pytest. Run pytest tests/ to run the test suite.
