An Intake driver to access local or remote SQLite databases by URL.

Project description

SQLite Driver for Intake Data Catalogs

This package provides a (very) thin wrapper around the more general intake-sql driver, which can be used to generate Intake data catalogs from SQL databases.

The intake-sql driver takes an SQL Alchemy database URL and uses it to connect to and extract data from the database. This works with just fine with SQLite databases, but only when the database file is stored locally and can be referenced with a simple path.

For example this path:

/home/zane/code/catalyst/pudl-work/sqlite/pudl.sqlite

would correspond to this SQL Alchemy database URL:

sqlite:///home/zane/code/catalyst/pudl-work/sqlite/pudl.sqlite

But you can’t access a remote SQLite DB this way.
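For instance, plain SQLAlchemy is happy to open the local file, but it has no URL syntax for a database file that lives behind an HTTP or S3 URL (a sketch using SQLAlchemy's standard create_engine):

from sqlalchemy import create_engine

# The local case works: the URL points at a file on disk.
engine = create_engine(
    "sqlite:////home/zane/code/catalyst/pudl-work/sqlite/pudl.sqlite"
)

# But there is no valid SQLAlchemy URL for a remote SQLite file, so
# something like this simply can't work:
# create_engine("sqlite://https://global-power-plants.datasettes.com/global-power-plants.db")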

Why access a remote SQLite DB?

  • SQLite databases are great standalone, standardized containers for relational data that can be accessed using a huge variety of tools on almost any computing platform. They are even accepted as an archival format by the US Library of Congress!

  • Data evolves over time, and it’s often useful to have easy access to several different versions of it, and to know exactly which version you’re working with.

  • Cloud object storage is extremely cheap and convenient, and makes it easy to publish and store historical file-based data releases.

  • Managing your own bespoke local filesystem hierarchy filled with data – and coordinating with colleagues so that everyone is using the same filesystem organizational scheme – is a pain.

  • Intake catalogs can provide easy access to metadata and let you manage data versions just like software versions. Installing a new version of the data catalog points you at the new version of the data.

  • The overhead and cost of setting up and maintaining a client-server database are large compared to distributing a few files that change infrequently, are essentially read-only, and take up only a few gigabytes of space.

How does it work?

Rather than using an SQLAlchemy database URL to reference the SQLite DB, this Intake driver takes a local path or a remote URL, like:

  • ../pudl-work/sqlite/pudl.sqlite

  • https://global-power-plants.datasettes.com/global-power-plants.db

  • s3://cloudy-mc-cloudface-databucket/v1.2.3/mydata.db

For local paths, it resolves the path to an absolute path and prepends the sqlite:/// scheme before handing the resulting database URL off to intake-sql to do all the hard work.
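In other words, the local case boils down to something like this hypothetical helper (a sketch, not the actual intake-sqlite code):

from pathlib import Path

def local_sqlite_url(urlpath: str) -> str:
    # Resolve the path to an absolute path; since the result starts with
    # a slash, prepending sqlite:/// yields the four-slash absolute form.
    return f"sqlite:///{Path(urlpath).resolve()}"

local_sqlite_url("../pudl-work/sqlite/pudl.sqlite")
# e.g. 'sqlite:////home/zane/pudl-work/sqlite/pudl.sqlite'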

For remote URLs, it uses fsspec to cache a local copy of the database, and then gives intake-sql a database URL that points to the cached copy.
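The remote case looks roughly like this sketch, using fsspec's simplecache protocol chaining (again, not the actual implementation):

import fsspec

remote = "https://global-power-plants.datasettes.com/global-power-plants.db"

# Download and cache a local copy of the remote file, then build an
# SQLAlchemy URL that points at the cached copy.
with fsspec.open(
    f"simplecache::{remote}",
    simplecache={"cache_storage": "/home/zane/.cache/intake"},
) as f:
    db_url = f"sqlite:///{f.name}"  # f.name is the path of the cached file

Using the driver itself, all of that happens behind the scenes: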

import intake_sqlite

gpp_cat = intake_sqlite.SQLiteCatalog(
    urlpath="https://global-power-plants.datasettes.com/global-power-plants.db",
    storage_options={"simplecache": {"cache_storage": "/home/zane/.cache/intake"}},
)

list(gpp_cat)

# ['global-power-plants',
#  'global-power-plants_fts',
#  'global-power-plants_fts_config',
#  'global-power-plants_fts_data',
#  'global-power-plants_fts_docsize',
#  'global-power-plants_fts_idx']
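Entries in the catalog behave like any other Intake data source, so a table can be read into a pandas DataFrame with the standard Intake interface:

# Read one of the tables into a pandas DataFrame:
df = gpp_cat["global-power-plants"].read()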

About Catalyst Cooperative

Catalyst Cooperative is a small group of data wranglers and policy wonks organized as a worker-owned cooperative consultancy. Our goal is a more just, livable, and sustainable world. We integrate public data and perform custom analyses to inform public policy (Hire us!). Our focus is primarily on mitigating climate change and improving electric utility regulation in the United States.

Contact Us

  • For general support, questions, or other conversations around the project that might be of interest to others, check out the GitHub Discussions.

  • If you’d like to get occasional updates about our projects, sign up for our email list.

  • Want to schedule a time to chat with us one-on-one? Join us for Office Hours.

  • Follow us on Twitter: @CatalystCoop

  • More info on our website: https://catalyst.coop

  • For private communication about the project, or to hire us to provide customized data extraction and analysis, you can email the maintainers: pudl@catalyst.coop

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
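For most users, installing the released package from PyPI is simpler than downloading these files by hand:

pip install intake-sqlite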

Source Distribution

intake-sqlite-0.2.0.tar.gz (40.0 kB)


Built Distribution

intake_sqlite-0.2.0-py3-none-any.whl (8.4 kB)


File details

Details for the file intake-sqlite-0.2.0.tar.gz.

File metadata

  • Download URL: intake-sqlite-0.2.0.tar.gz
  • Size: 40.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.0

File hashes

Hashes for intake-sqlite-0.2.0.tar.gz

  • SHA256: f4d130ea5405440e19d899d2f323a94dacab5662beb12feadda1f6f4355801e8
  • MD5: ae61fd933f8fa9adfe78d9073653709e
  • BLAKE2b-256: 35b63a2bcefce401d85490e8467a61780980946c175775fde05a96eeeff39ddf
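To verify a downloaded archive against the published SHA256 digest, here is a minimal sketch using Python's standard library:

import hashlib

# Compare the local file's SHA256 digest to the published value above.
expected = "f4d130ea5405440e19d899d2f323a94dacab5662beb12feadda1f6f4355801e8"
with open("intake-sqlite-0.2.0.tar.gz", "rb") as f:
    assert hashlib.sha256(f.read()).hexdigest() == expected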

File details

Details for the file intake_sqlite-0.2.0-py3-none-any.whl.

File hashes

Hashes for intake_sqlite-0.2.0-py3-none-any.whl

  • SHA256: 9f1e2c41610afcb1bf607c3ee258f597b4c558b8b7aa56cd8c4fed59691a84f4
  • MD5: e02d82e98b8f1754de433e765341f178
  • BLAKE2b-256: 479edf46792c42b8a15225b6f173a32b76b4be0f43b5d309676e742453eec720
