
overcast-to-sqlite


Save listening history and feed/episode info from Overcast to a SQLite database. Try exploring your podcast listening habits with Datasette!

How to install

$ pip install overcast-to-sqlite

Or to upgrade:

$ pip install --upgrade overcast-to-sqlite

Authentication

Run this command to log in to Overcast (note: neither your email nor password is saved, only the auth cookie):

$ overcast-to-sqlite auth

This will create a file called auth.json in your current directory containing the required value. To save the file at a different path or filename, use the --auth=myauth.json option.

If you do not wish to save this information you can manually download the "All data" file from the Overcast account page and pass it into the save command as described below.
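The "All data" export is a standard OPML file. If you want to inspect it before importing, a minimal sketch with Python's `xml.etree` is below; it assumes the usual OPML layout of nested `outline` elements with `xmlUrl` attributes on feed entries, so verify against your own export:

```python
import xml.etree.ElementTree as ET

# A minimal OPML snippet shaped like a podcast export; the real
# Overcast "All data" file nests feeds and episodes more deeply.
opml = """<opml version="1.0">
  <body>
    <outline text="feeds">
      <outline type="rss" text="Some Show" xmlUrl="https://example.com/feed.xml"/>
      <outline type="rss" text="Other Show" xmlUrl="https://example.com/other.xml"/>
    </outline>
  </body>
</opml>"""

root = ET.fromstring(opml)
# Collect the feed URL of every outline element that carries one.
feed_urls = [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]
print(feed_urls)
```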

Fetching and saving updates

The save command retrieves all Overcast info and stores playlists, podcast feeds, and episodes in their respective tables with a primary key overcastId.

$ overcast-to-sqlite save

By default, this saves to overcast.db, but another database path can be passed as an argument.

$ overcast-to-sqlite save someother.db

By default, if an auth.json file is present in the current directory, the cookie from that file will be used. You can point to a different location using -a:

$ overcast-to-sqlite save -a /path/to/auth.json

Alternatively, you can skip authentication by passing in an OPML file you downloaded from Overcast:

$ overcast-to-sqlite save --load /path/to/overcast.opml

By default, the save command will save any OPML file it downloads adjacent to the database file in archive/overcast/. You can disable this behavior with --no-archive or -na.

For increased reporting verbosity, use the -v flag.
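Once save has run, the database can be explored with Datasette or plain sqlite3. The sketch below builds a toy stand-in for the episodes table; only the overcastId primary key is documented here, so the title column is an assumption (check your real schema with `.schema` in sqlite3):

```python
import sqlite3

# Toy stand-in for overcast.db: the episodes table is documented to use
# overcastId as its primary key; the title column here is an assumption.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE episodes (overcastId INTEGER PRIMARY KEY, title TEXT)")
con.executemany(
    "INSERT INTO episodes VALUES (?, ?)",
    [(1, "Pilot"), (2, "Episode Two"), (3, "Finale")],
)

# The same query works against a real database with:
#   sqlite3 overcast.db "SELECT COUNT(*) FROM episodes"
(count,) = con.execute("SELECT COUNT(*) FROM episodes").fetchone()
print(count)
```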

Extending and saving full feeds

The extend command downloads the XML files for all feeds you are subscribed to and extracts their tags and attributes. These are stored in the separate tables feeds_extended and episodes_extended, with primary keys xmlUrl and enclosureUrl respectively. (See points 4 and 5 below for more information.)

$ overcast-to-sqlite extend

Like the save command, this will attempt to archive feeds to archive/feeds/ by default. This can be disabled with --no-archive or -na.

It also supports the -v flag to print additional information.

There are a few caveats for this functionality:

  1. The first time this is invoked, it will need to download and parse an XML file for every feed you are subscribed to. (Subsequent invocations only require this for new episodes loaded by save.) Because this may take a long time to run if you have many feeds, it is recommended to use the -v flag to observe progress.
  2. This will increase the size of your database by approximately 2 MB per feed, which may result in a large file if you subscribe to many feeds.
  3. Certain feeds may not load due to e.g. authentication, rate limiting, or other issues. These will be logged to the console and the feed will be skipped. Likewise, an episode may appear in your episodes table but not in the extended information if it is no longer available.
  4. The _extended tables use URLs as their primary key. This may potentially lead to unjoinable / orphaned episodes if the enclosure URL (i.e. URL of the audio file) has changed since Overcast stored it.
  5. There is no guarantee of which columns will be present in these tables aside from URL, title, and description. This command attempts to capture and normalize all XML tags contained in the feed so it is likely that many columns will be created and only a few rows will have values for uncommon tags/attributes.

Any suggestions for improving on these caveats are welcome, please open an issue!
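The orphaning described in caveat 4 can be seen with a small sqlite3 sketch. episodes_extended is documented to be keyed on enclosureUrl; that the episodes table also carries an enclosureUrl column to join on is an assumption about the schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# episodes_extended is keyed on enclosureUrl; the enclosureUrl column
# on episodes itself is assumed here for the sake of the join.
con.executescript("""
CREATE TABLE episodes (overcastId INTEGER PRIMARY KEY, enclosureUrl TEXT);
CREATE TABLE episodes_extended (enclosureUrl TEXT PRIMARY KEY, duration TEXT);
INSERT INTO episodes VALUES (1, 'https://cdn.example.com/a.mp3');
INSERT INTO episodes VALUES (2, 'https://cdn.example.com/b-old.mp3');
INSERT INTO episodes_extended VALUES ('https://cdn.example.com/a.mp3', '42:00');
-- the feed now serves episode 2 under a new URL, so it never joins
""")

rows = con.execute("""
    SELECT e.overcastId, x.duration
    FROM episodes e
    LEFT JOIN episodes_extended x USING (enclosureUrl)
    ORDER BY e.overcastId
""").fetchall()
print(rows)  # the orphaned episode has NULL extended data
```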

Downloading transcripts

The transcripts command downloads episode transcripts where available.

The save and extend commands MUST be run prior to this.

For episodes with a "podcast:transcript:url" value, the transcript will be downloaded from that URL and the download's location will then be stored in "transcriptDownloadPath".

$ overcast-to-sqlite transcripts

Like previous commands, this will save transcripts to archive/transcripts/<feed title>/<episode title> by default.

A different path can be set with the -p/--path flag.

It also supports the -v flag to print additional information.

There is also a -s flag to only download transcripts for starred episodes.
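After a transcripts run, downloaded files can be located through the documented transcriptDownloadPath column. A minimal sketch (the episodes table layout beyond that column, including the title column and the example path, is assumed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# transcriptDownloadPath is the documented column; title is assumed.
con.execute("CREATE TABLE episodes (title TEXT, transcriptDownloadPath TEXT)")
con.executemany(
    "INSERT INTO episodes VALUES (?, ?)",
    [
        ("With transcript", "archive/transcripts/Show/Episode.vtt"),
        ("Without transcript", None),
    ],
)

# Episodes whose transcript was actually fetched have a non-NULL path.
downloaded = con.execute(
    "SELECT title FROM episodes WHERE transcriptDownloadPath IS NOT NULL"
).fetchall()
print(downloaded)
```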


Development

Pull requests are very welcome! For major changes, please open an issue first to discuss what you would like to change.

Setup

git clone git@github.com:hbmartin/overcast-to-sqlite.git
cd overcast-to-sqlite
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m overcast_to_sqlite.cli all -v

Code Formatting

This project is linted with ruff and uses Black code formatting.
