pandabase links pandas DataFrames to SQL databases. Supports read, append, and upsert.
Project description
pandabase
pandabase is a tool for reading & writing DataFrames to & from SQLite
By default, uses DataFrame.index as primary key. By using an explicit primary key, pandabase makes it easy to upsert pandas data into SQL databases.
Designed for time series datasets that need to be updated over time and stored to disk, but are used in-memory for computation.
Tested under Python 3.6 and 3.7, with new versions of Pandas and SQLAlchemy
Features
- primary keys (any named index is assumed to be the PK)
- also supports 'auto_index'
- insert modes: 'create_only', 'upsert', and 'append'
- replaces pd.DataFrame.to_sql and pd.read_sql
- tested under SQLite
- postgres support coming soon
- automated tests in pytest
- 96% coverage
- also includes pandabase.companda.companda for rich comparisons of DataFrames
Design Considerations
- Minimal dependencies: SQLAlchemy and Pandas are the only requirements
- Database is the source of truth: will coerce incoming DataFrames to fit existing schema
- but also is reasonably smart about how new tables are created from DataFrames
- Not horrendously slow
License
MIT license
Thanks
Code partially stolen from Dataset and pandas.sql
Installation
From your inside your virtual environment of choice:
~/$ pip install pandabase
For latest version:
~/$ git clone https://github.com/notsambeck/pandabase
~/$ cd pandabase
~/pandabase/$ pip install -r requirements.txt
~/pandabase/$ pip install .
Usage
# Python >= 3.6
>>> import pandas as pd
>>> import pandabase
>>> my_data = pd.DataFrame(index=range(7, 12),
columns=['some_number'],
data=pd.np.random.random((5,1)))
>>> my_data.index.name = 'made_up_name' # index must be named to use as PK
>>> pandabase.to_sql(my_data, table_name='my_table', con='sqlite:///new_sqlite_db.sqlite', how='create_only')
Table('my_table', ...
>>> exit()
That's all!
Your data is now persistently stored in a SQLite database, using my_data.index as primary key. To append or update data, replace 'create_only' with 'append' or 'upsert'. To store records without an explicit index, use 'autoindex=True'.
~/pandabase$ ls
brand_new_sqlite_db.sqlite
>>> import pandabase
>>> df = pandabase.read_sql('my_table', con='sqlite:///new_sqlite_db.sqlite'))
>>> df
some_number
7 0.722416
8 0.076045
9 0.213118
10 0.453716
11 0.406995
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Hashes for pandabase-0.1.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0bac8b4f47d2c60835d9866eea4d759ff494292e1f29bdf27bc9e623d1be1b12 |
|
MD5 | bce945cb7433202b663d2977330b4ceb |
|
BLAKE2b-256 | 3d0ff054480fea20fef416b38147dd628d84e59af39c81eefbf74bb74cfc416f |