EimerDB
About
EimerDB is a Python package that gives database-like functionality to parquet files stored in Google Cloud Storage. It does this by organising the parquet files in a particular way, reading and combining them with pyarrow, and then querying the combined pyarrow tables with DuckDB. It is intended for use as part of the statistical production process at Statistics Norway.
Features
Create and connect to a database
Create a new database by specifying the bucket name and a database name.
import eimerdb as db
db.create_eimerdb(bucket_name="bucket-name", db_name="prodcombasen")
Connect to your EimerDB database.
prodcombasen = db.EimerDBInstance("bucket-name", "prodcombasen")
Table Management
You can create a new table with the create_table method. Specify the table name, the schema and the partition columns, and set whether the table is editable. Each column in the schema is defined with a name, a type and a label.
schema = [
    {
        "name": "aar",
        "type": "int16",
        "label": "Årgangen."
    },
    {
        "name": "ident",
        "type": "string",
        "label": "Foretakets identifikator."
    },
    {
        "name": "skjemaversjon",
        "type": "string",
        "label": "Skjemaets versjon."
    },
    {
        "name": "råvarekode",
        "type": "string",
        "label": "Prefillet råvarekode. Disse kodene lages av NR."
    },
    {
        "name": "beskrivelse",
        "type": "string",
        "label": "Prefillet råvarebeskrivelse. Disse beskrivelsene lages av NR."
    },
    {
        "name": "forbruk",
        "type": "int64",
        "label": "Oppgitt forbruk (i 1 000 NOK) til den tilhørende råvarekoden."
    },
]

# Pass the schema as a keyword argument: a positional argument
# cannot follow a keyword argument in Python.
prodcombasen.create_table(
    table_name="prefill_prod",
    schema=schema,
    partition_columns=["aar"],
    editable=True
)
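Since every column entry needs a name, a type and a label, it can be handy to check a schema before passing it to create_table. The helper below is a minimal sketch of such a check; check_schema is a hypothetical function written for this example and is not part of EimerDB.

```python
REQUIRED_KEYS = {"name", "type", "label"}


def check_schema(schema):
    """Return a list of problems found in a schema; an empty list means none were found.

    Hypothetical helper for illustration only, not part of EimerDB.
    """
    problems = []
    seen = set()
    for i, column in enumerate(schema):
        # Every column must define a name, a type and a label.
        missing = REQUIRED_KEYS - column.keys()
        if missing:
            problems.append(f"column {i}: missing {sorted(missing)}")
            continue
        # Column names should be unique within a table.
        if column["name"] in seen:
            problems.append(f"column {i}: duplicate name {column['name']!r}")
        seen.add(column["name"])
    return problems


example = [
    {"name": "aar", "type": "int16", "label": "Årgangen."},
    {"name": "ident", "type": "string"},  # label left out on purpose
]
print(check_schema(example))
```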
Partitioning the table by one or more columns can improve query performance, because a query that selects specific partitions has less data to read.
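To see why partitioning helps, consider the sketch below. It assumes a Hive-style layout where each partition value appears as a key=value folder in the file path; that layout, the file names and the select_partitions helper are assumptions made for this example, and EimerDB's actual storage layout may differ. Only files in the requested partitions are kept, so the rest never need to be read.

```python
def select_partitions(paths, partition_select):
    """Keep only paths whose key=value segments match partition_select.

    Hypothetical helper: assumes Hive-style "key=value" folders in each path.
    """
    selected = []
    for path in paths:
        # Parse the "key=value" segments out of the path.
        parts = dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)
        # A path matches when every requested key has one of the wanted values.
        if all(
            parts.get(key) in {str(v) for v in values}
            for key, values in partition_select.items()
        ):
            selected.append(path)
    return selected


# Made-up file listing for a table partitioned by "aar".
files = [
    "prefill_prod/aar=2020/data.parquet",
    "prefill_prod/aar=2021/data.parquet",
    "prefill_prod/aar=2022/data.parquet",
]
wanted = select_partitions(files, {"aar": [2022, 2021]})
print(wanted)
```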
SQL Query Support
Query your tables with SQL syntax. You can optionally specify the partition to be queried.
prodcombasen.query(
    """SELECT *
    FROM prodcom_prefill
    WHERE produktkode = '10.13.11.20'""",
    partition_select={
        "aar": [2022, 2021]
    }
)
Updates
Perform updates using SQL statements. Each update is saved as a separate parquet file for versioning. The update files include a username column and a datetime column recording who made the update and when.
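# partitions is a dict in the same form as partition_select above,
# for example {"aar": [2022, 2021]}.
prodcombasen.query(
    """UPDATE prodcom_prefill
    SET mengde = 123
    WHERE ident = '123456'
    AND produktkode = '10.13.11.20'""",
    partition_select=partitions
)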
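The idea of storing each update separately, rather than overwriting the data, can be sketched in plain Python. In the example below the original rows are never modified; every update is appended to a change log together with a username and a timestamp, and the edited view is built by replaying the log over a copy of the original rows. The row_id key and the record_update and edited_view helpers are hypothetical names for this illustration and do not describe EimerDB's internal format.

```python
from datetime import datetime, timezone

# Original rows; these are never modified. row_id is a hypothetical key.
base = [
    {"row_id": 1, "ident": "123456", "mengde": 100},
    {"row_id": 2, "ident": "654321", "mengde": 200},
]

# Append-only log of changes, one entry per update.
changes = []


def record_update(row_id, column, value, user):
    """Store an update as a separate record with user and timestamp."""
    changes.append({
        "row_id": row_id,
        "column": column,
        "value": value,
        "user": user,
        "datetime": datetime.now(timezone.utc),
    })


def edited_view():
    """Replay the change log over a copy of the original rows."""
    rows = {row["row_id"]: dict(row) for row in base}
    for change in changes:  # entries were appended in time order
        rows[change["row_id"]][change["column"]] = change["value"]
    return list(rows.values())


record_update(1, "mengde", 123, user="newuser")
print(edited_view())
print(base)
```

Because the original rows stay untouched, both the edited and the original version of the data remain available, and the log itself shows who changed what and when.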
Easily access the unedited version of a table
Retrieve the unedited version of your data by specifying unedited=True.
prodcombasen.query(
    """SELECT *
    FROM prodcom_prefill""",
    unedited=True
)
Query the changes made to a table
You can query all the changes made to a table with the query_changes method.
prodcombasen.query_changes(
    """SELECT *
    FROM prodcom_prefill""",
    unedited=True
)
Query multiple tables
Query multiple tables in one statement using JOINs and subqueries.
prodcombasen.query(
    """SELECT
        t1.aar,
        t1.produktkode,
        t1.beskrivelse,
        SUM(t1.mengde) AS mengde
    FROM
        prefill_prod AS t1
    JOIN (
        SELECT
            t2.aar,
            t2.ident,
            t2.skjemaversjon,
            MAX(t2.dato_mottatt) AS newest_dato_mottatt
        FROM
            skjemainfo AS t2
        GROUP BY
            t2.aar,
            t2.ident,
            t2.skjemaversjon
    ) AS subquery ON
        t1.aar = subquery.aar
        AND t1.ident = subquery.ident
        AND t1.skjemaversjon = subquery.skjemaversjon
    WHERE
        t1.mengde IS NOT NULL
    GROUP BY
        t1.aar,
        t1.produktkode,
        t1.beskrivelse;""",
    partition_select={
        "aar": [2022, 2021, 2020]
    },
)
User Management (in development)
Add and remove users from your instance. Assign specific roles to users for access control.
prodcombasen.add_user(username="newuser", role="admin")
prodcombasen.remove_user(username="olduser")
Requirements
- TODO
Installation
You can install EimerDB via pip from PyPI:
pip install ssb-eimerdb
Usage
Please see the Reference Guide for details.
Contributing
Contributions are very welcome. To learn more, see the Contributor Guide.
License
Distributed under the terms of the MIT license, EimerDB is free and open source software.
Issues
If you encounter any problems, please file an issue along with a detailed description.
Credits
This project was generated from Statistics Norway's SSB PyPI Template.