A bit of extra usability for duckdb... inspired by fastlite and sqlite_utils
Project description
fastlite
fastlite
provides some little quality-of-life improvements for
interactive use of the wonderful
sqlite-utils library. It’s likely
to be particularly of interest to folks using Jupyter.
Install
pip install fastlite
Overview
from fastlite import *
from fastcore.utils import *
from fastcore.net import urlsave
We demonstrate fastlite
‘s features here using the ’chinook’ sample
database.
url = 'https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite'
path = Path('chinook.sqlite')
if not path.exists(): urlsave(url, path)
db = database("chinook.sqlite")
Databases have a t
property that lists all tables:
dt = db.t
dt
Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
You can use this to grab a single table…:
artist = dt.Artist
artist
<Table Artist (ArtistId, Name)>
…or multiple tables at once:
dt['Artist','Album','Track','Genre','MediaType']
[<Table Artist (ArtistId, Name)>,
<Table Album (AlbumId, Title, ArtistId)>,
<Table Track (TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, UnitPrice)>,
<Table Genre (GenreId, Name)>,
<Table MediaType (MediaTypeId, Name)>]
It also provides auto-complete in Jupyter, IPython, and nearly any other interactive Python environment:
You can check if a table is in the database already:
'Artist' in dt
True
Column work in a similar way to tables, using the c
property:
ac = artist.c
ac
ArtistId, Name
Auto-complete works for columns too:
Columns, tables, and view stringify in a format suitable for including in SQL statements. That means you can use auto-complete in f-strings.
qry = f"select * from {artist} where {ac.Name} like 'AC/%'"
print(qry)
select * from "Artist" where "Artist"."Name" like 'AC/%'
You can view the results of a select query using q
:
db.q(qry)
[{'ArtistId': 1, 'Name': 'AC/DC'}]
Views can be accessed through the v
property:
album = dt.Album
acca_sql = f"""select {album}.*
from {album} join {artist} using (ArtistId)
where {ac.Name} like 'AC/%'"""
db.create_view("AccaDaccaAlbums", acca_sql, replace=True)
acca_dacca = db.q(f"select * from {db.v.AccaDaccaAlbums}")
acca_dacca
[{'AlbumId': 1,
'Title': 'For Those About To Rock We Salute You',
'ArtistId': 1},
{'AlbumId': 4, 'Title': 'Let There Be Rock', 'ArtistId': 1}]
Dataclass support
A dataclass
type with the names, types, and defaults of the tables is
created using dataclass()
:
album_dc = album.dataclass()
Let’s try it:
album_obj = album_dc(**acca_dacca[0])
album_obj
Album(AlbumId=1, Title='For Those About To Rock We Salute You', ArtistId=1)
You can get the definition of the dataclass using fastcore’s
dataclass_src
– everything is treated as nullable, in order to handle
auto-generated database values:
src = dataclass_src(album_dc)
hl_md(src, 'python')
@dataclass
class Album:
AlbumId: int | None = None
Title: str | None = None
ArtistId: int | None = None
Because dataclass()
is dynamic, you won’t get auto-complete in editors
like vscode – it’ll only work in dynamic environments like Jupyter and
IPython. For editor support, you can export the full set of dataclasses
to a module, which you can then import from:
create_mod(db, 'db_dc')
Indexing into a table does a query on primary key:
from db_dc import Track
Track(**dt.Track[1])
Track(TrackId=1, Name='For Those About To Rock (We Salute You)', AlbumId=1, MediaTypeId=1, GenreId=1, Composer='Angus Young, Malcolm Young, Brian Johnson', Milliseconds=343719, Bytes=11170334, UnitPrice=0.99)
There’s a shortcut to select from a table – just call it as a function.
If you’ve previously called dataclass()
, returned iterms will be
constructed using that class by default. There’s lots of params you can
check out, such as limit
:
album(limit=2)
[Album(AlbumId=1, Title='For Those About To Rock We Salute You', ArtistId=1),
Album(AlbumId=2, Title='Balls to the Wall', ArtistId=2)]
Pass a truthy value as with_pk
and you’ll get tuples of primary keys
and records:
album(with_pk=1, limit=2)
[(1,
Album(AlbumId=1, Title='For Those About To Rock We Salute You', ArtistId=1)),
(2, Album(AlbumId=2, Title='Balls to the Wall', ArtistId=2))]
Indexing also uses the dataclass by default:
album[5]
Album(AlbumId=5, Title='Big Ones', ArtistId=3)
If you set xtra
fields, then indexing is also filtered by those. As a
result, for instance in this case, nothing is returned since album 5 is
not created by artist 1:
album.xtra(ArtistId=1)
try: album[5]
except NotFoundError: print("Not found")
Not found
The same filtering is done when using the table as a callable:
album()
[Album(AlbumId=1, Title='For Those About To Rock We Salute You', ArtistId=1),
Album(AlbumId=4, Title='Let There Be Rock', ArtistId=1)]
Insert, upsert, and update
The following methods accept **kwargs
, passing them along to the first
dict
param:
create
transform
transform_sql
update
insert
upsert
lookup
We can access a table that doesn’t actually exist yet:
cats = dt.cats
cats
<Table cats (does not exist yet)>
We can use keyword arguments to now create that table:
cats.create(id=int, name=str, weight=float, uid=int, pk='id')
hl_md(cats.schema, 'sql')
CREATE TABLE [cats] (
[id] INTEGER PRIMARY KEY,
[name] TEXT,
[weight] FLOAT,
[uid] INTEGER
)
It we set xtra
then the additional fields are used for insert
,
update
, and delete
:
cats.xtra(uid=2)
cat = cats.insert(name='meow', weight=6)
The inserted row is returned, including the xtra ‘uid’ field.
cat
{'id': 1, 'name': 'meow', 'weight': 6.0, 'uid': 2}
Using **
in update
here doesn’t actually achieve anything, since we
can just pass a dict
directly – it’s just to show that it works:
cat['name'] = "moo"
cat['uid'] = 1
cats.update(**cat)
cats()
[{'id': 1, 'name': 'moo', 'weight': 6.0, 'uid': 2}]
Attempts to update or insert with xtra fields are ignored.
An error is raised if there’s an attempt to update a record not matching
xtra
fields:
cats.xtra(uid=1)
try: cats.update(**cat)
except NotFoundError: print("Not found")
Not found
This all also works with dataclasses:
cats.xtra(uid=2)
cats.dataclass()
cat = cats[1]
cat
Cats(id=1, name='moo', weight=6.0, uid=2)
cat.name = 'foo'
cats.upsert(cat)
cats()
[Cats(id=1, name='foo', weight=6.0, uid=2)]
cats.drop()
cats
<Table cats (does not exist yet)>
Diagrams
If you have graphviz installed, you can create database diagrams:
diagram(db.tables)
Pass a subset of tables to just diagram those. You can also adjust the size and aspect ratio.
diagram(db.t['Artist','Album','Track','Genre','MediaType'], size=8, ratio=0.4)
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.