
dbcooper-py


The dbcooper package turns a database connection into a collection of functions, handling logic for keeping track of connections and letting you take advantage of autocompletion when exploring a database.

It's especially helpful to use when authoring database-specific Python packages, for instance in an internal company package or one wrapping a public data source.

For the R version see dgrtwo/dbcooper.

Installation

pip install dbcooper

Example

Initializing the functions

The dbcooper package asks you to create the connection first. As an example, we'll use the Lahman baseball database package (lahman).

from sqlalchemy import create_engine
from dbcooper.data import lahman_sqlite

# connect to sqlite
engine = create_engine("sqlite://")

# load the lahman data into the "lahman" schema
lahman_sqlite(engine)

Next, we'll set up dbcooper:

from dbcooper import DbCooper

dbc = DbCooper(engine)

The DbCooper object contains two important things:

  • Accessors to fetch specific tables.
  • Functions for interacting with the underlying database.
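As a rough sketch of the idea (plain Python, not dbcooper's actual implementation; all names here are made up), accessors like these can be generated by attaching one function per table to an object, which is what makes autocompletion possible:

```python
class TableCollection:
    """Toy stand-in for DbCooper: one accessor function per table name."""
    def __init__(self, table_names):
        for name in table_names:
            # bind `name` per iteration via the default argument
            setattr(self, name, lambda name=name: f"SELECT * FROM {name}")

tables = TableCollection(["lahman_salaries", "lahman_teams"])
tables.lahman_salaries()  # returns the query string for that table
```

Because each table is a real attribute, editors and REPLs can tab-complete `tables.lahman_<TAB>` just as they do with a DbCooper instance.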

Using table accessors

In the example below, we'll use the "Lahman"."Salaries" table. By default, dbcooper makes this accessible as .lahman_salaries.

Plain .lahman_salaries prints out table and column info, including types and descriptions.

# show table and column descriptions
dbc.lahman_salaries

salaries

(No table description.)

name      type    description
index     BIGINT
yearID    BIGINT
teamID    TEXT
lgID      TEXT
playerID  TEXT
salary    BIGINT

Note that sqlite doesn't support table and column descriptions, so these sections are empty.
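What sqlite does expose is column names and declared types; here's a self-contained sketch using only the standard library's sqlite3 module (the table and columns are made up to mirror the display above):

```python
import sqlite3

# In-memory database with a table shaped like the salaries display above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE salaries (playerID TEXT, salary BIGINT)")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk);
# name and declared type are what the accessor display shows.
cols = [(row[1], row[2]) for row in con.execute("PRAGMA table_info(salaries)")]
```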

Calling .lahman_salaries() fetches a lazy version of the data.

dbc.lahman_salaries()
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
   index  yearID teamID lgID   playerID  salary
0      0    1985    ATL   NL  barkele01  870000
1      1    1985    ATL   NL  bedrost01  550000
2      2    1985    ATL   NL  benedbr01  545000
3      3    1985    ATL   NL   campri01  633333
4      4    1985    ATL   NL  ceronri01  625000
# .. may have more rows

Note that the result is a siuba LazyTbl object, which you can use to analyze the data.

from siuba import _, count

dbc.lahman_salaries() >> count(over_100k = _.salary > 100_000)
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
   over_100k      n
0       True  25374
1      False   1054
# .. may have more rows

Using database functions

  • .list(): Get a list of tables.
  • .tbl(): Access a table that can be worked with using siuba.
  • .query(): Perform a SQL query and work with the result.
  • ._engine: Get the underlying sqlalchemy engine.

For instance, we could start by finding the names of the tables in the Lahman database.

dbc.list()
['lahman.allstar_full',
 'lahman.appearances',
 'lahman.awards_managers',
 'lahman.awards_players',
 'lahman.awards_share_managers',
 'lahman.awards_share_players',
 'lahman.batting',
 'lahman.batting_post',
 'lahman.college_playing',
 'lahman.fielding',
 'lahman.fielding_of',
 'lahman.fielding_ofsplit',
 'lahman.fielding_post',
 'lahman.hall_of_fame',
 'lahman.home_games',
 'lahman.managers',
 'lahman.managers_half',
 'lahman.parks',
 'lahman.people',
 'lahman.pitching',
 'lahman.pitching_post',
 'lahman.salaries',
 'lahman.schools',
 'lahman.series_post',
 'lahman.teams',
 'lahman.teams_franchises',
 'lahman.teams_half']

We can access one of these tables with dbc.tbl(), then put it through any kind of siuba operation.

dbc.tbl("Salaries")
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
   index  yearID teamID lgID   playerID  salary
0      0    1985    ATL   NL  barkele01  870000
1      1    1985    ATL   NL  bedrost01  550000
2      2    1985    ATL   NL  benedbr01  545000
3      3    1985    ATL   NL   campri01  633333
4      4    1985    ATL   NL  ceronri01  625000
# .. may have more rows

from siuba import _, count
dbc.tbl("Salaries") >> count(_.yearID, sort=True)
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
   yearID     n
0    1999  1006
1    1998   998
2    1995   986
3    1996   931
4    1997   925
# .. may have more rows

If you'd rather start from a SQL query, use the .query() method.

dbc.query("""
    SELECT
        playerID,
        sum(AB) as AB
    FROM Batting
    GROUP BY playerID
""")
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
    playerID     AB
0  aardsda01      4
1  aaronha01  12364
2  aaronto01    944
3   aasedo01      5
4   abadan01     21
# .. may have more rows
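In essence, .query() sends raw SQL over the connection and wraps the result. A minimal standalone sketch with the standard library's sqlite3 module (the table and row here are made up, not the real Lahman data):

```python
import sqlite3

# Throwaway in-memory database with one table and one row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Batting (playerID TEXT, AB INTEGER)")
con.execute("INSERT INTO Batting VALUES ('aardsda01', 4)")

# The same shape of query as above: aggregate at-bats per player.
rows = con.execute(
    "SELECT playerID, sum(AB) AS AB FROM Batting GROUP BY playerID"
).fetchall()
```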

For anything else you might want to do, the sqlalchemy Engine object is available. For example, the code below shows how you can set its .echo attribute, which tells sqlalchemy to provide useful logs.

dbc._engine.echo = True
table_names = dbc.list()
2022-03-20 22:49:37,553 INFO sqlalchemy.engine.Engine PRAGMA database_list
2022-03-20 22:49:37,554 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-20 22:49:37,555 INFO sqlalchemy.engine.Engine SELECT name FROM "main".sqlite_master WHERE type='table' ORDER BY name
2022-03-20 22:49:37,555 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-20 22:49:37,556 INFO sqlalchemy.engine.Engine SELECT name FROM "lahman".sqlite_master WHERE type='table' ORDER BY name
2022-03-20 22:49:37,557 INFO sqlalchemy.engine.Engine [raw sql] ()

Note that the log messages above show that the .list() method executed two queries: one to list tables in the "main" schema (which is empty), and one to list tables in the "lahman" schema.
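A rough standalone equivalent of the catalog query in the log, using only the standard library's sqlite3 module (the two tables are made up):

```python
import sqlite3

# In-memory database with two tables to list.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE salaries (playerID TEXT)")
con.execute("CREATE TABLE teams (teamID TEXT)")

# Same catalog query as in the sqlalchemy log above.
names = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
)]
```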

Advanced Configuration

⚠️: These behaviors are well tested, but dbcooper's internals and API may change.

dbcooper can be configured in three ways, each corresponding to a class interface:

  • TableFinder: Which tables will be used by dbcooper.
  • AccessorBuilder: How table names are turned into accessors.
  • DbcDocumentedTable: The class that defines what an accessor will return.

from sqlalchemy import create_engine
from dbcooper.data import lahman_sqlite
from dbcooper import DbCooper, AccessorBuilder

engine = create_engine("sqlite://")
lahman_sqlite(engine)

Excluding a schema

from dbcooper import TableFinder

finder = TableFinder(exclude_schemas=["lahman"])
dbc_no_lahman = DbCooper(engine, table_finder=finder)
dbc_no_lahman.list()
[]

Formatting table names

from dbcooper import AccessorBuilder

# omits schema, and keeps only table name
# e.g. `salaries`, rather than `lahman_salaries`
builder = AccessorBuilder(format_from_part="table")

tbl_flat = DbCooper(engine, accessor_builder=builder)
tbl_flat.salaries()
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
   index  yearID teamID lgID   playerID  salary
0      0    1985    ATL   NL  barkele01  870000
1      1    1985    ATL   NL  bedrost01  550000
2      2    1985    ATL   NL  benedbr01  545000
3      3    1985    ATL   NL   campri01  633333
4      4    1985    ATL   NL  ceronri01  625000
# .. may have more rows

Grouping tables by schema

from dbcooper import AccessorHierarchyBuilder

tbl_nested = DbCooper(engine, accessor_builder=AccessorHierarchyBuilder())

# note the form: <schema>.<table>
tbl_nested.lahman.salaries()
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
   index  yearID teamID lgID   playerID  salary
0      0    1985    ATL   NL  barkele01  870000
1      1    1985    ATL   NL  bedrost01  550000
2      2    1985    ATL   NL  benedbr01  545000
3      3    1985    ATL   NL   campri01  633333
4      4    1985    ATL   NL  ceronri01  625000
# .. may have more rows

Don't show table documentation

from dbcooper import DbcSimpleTable

dbc_no_doc = DbCooper(engine, table_factory=DbcSimpleTable)
dbc_no_doc.lahman_salaries
DbcSimpleTable(..., 'salaries', 'lahman')

Note that some sqlalchemy dialects, such as snowflake-sqlalchemy, cannot look up metadata like table and column descriptions as reliably as others, so DbcSimpleTable may be needed to connect to Snowflake (see this issue).

Developing

# install with development dependencies
pip install -e .[dev]

# or install from requirements file
pip install -r requirements/dev.txt

Test

# run all tests, see pytest section of pyproject.toml
pytest

# run specific backends
pytest -m 'not snowflake and not bigquery'

# stop on first failure, drop into debugger
pytest -x --pdb

Release

# set version number
git tag v0.0.1

# (optional) push to github
git push origin --tags

# check version
python -m setuptools_scm
