EimerDB

About

EimerDB is a Python package that gives database-like functionality to Parquet files stored in Google Cloud Storage. It achieves this by organising the Parquet files in a particular layout, reading and combining them with pyarrow, and then querying the combined pyarrow tables with DuckDB. It is intended for use as part of the statistical production process at Statistics Norway.

Features

Create and connect to a database

Create a new database by specifying the bucket name and a database name.

import eimerdb as db

db.create_eimerdb(bucket_name="bucket-name", db_name="prodcombasen")

Connect to your EimerDB database.

prodcombasen = db.EimerDBInstance("bucket-name", "prodcombasen")

Table Management

You can create a new table with the create_table method. Specify the table name, the schema, the partition columns, and whether the table is editable. Each column in the schema is defined by a name, a type, and a label.

schema = [
    {
        "name": "aar",
        "type": "int16",
        "label": "The year."
    },
    {
        "name": "ident",
        "type": "string",
        "label": "The enterprise's identifier."
    },
    {
        "name": "skjemaversjon",
        "type": "string",
        "label": "The form version."
    },
    {
        "name": "råvarekode",
        "type": "string",
        "label": "Prefilled raw material code. These codes are created by NR."
    },
    {
        "name": "beskrivelse",
        "type": "string",
        "label": "Prefilled raw material description. These descriptions are created by NR."
    },
    {
        "name": "forbruk",
        "type": "int64",
        "label": "Reported consumption (in NOK 1,000) for the corresponding raw material code."
    },
]

prodcombasen.create_table(
    "prefill_prod",  # table name; passed positionally so schema can follow it
    schema,
    partition_columns=["aar"],
    editable=True,
)
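Before calling create_table, it can help to sanity-check the schema list. The helper below is a hypothetical sketch and not part of EimerDB: it assumes only what the text above states, namely that each column has a name, a type and a label, and it additionally checks that every partition column is defined in the schema.

```python
def check_schema(schema, partition_columns):
    """Return a list of problems found in a schema definition (hypothetical helper)."""
    problems = []
    names = []
    for index, column in enumerate(schema):
        # Every column is expected to carry these three keys.
        missing = {"name", "type", "label"} - set(column)
        if missing:
            problems.append(f"column {index} is missing keys: {sorted(missing)}")
        names.append(column.get("name"))

    # Column names should be unique.
    seen = set()
    for name in names:
        if name in seen:
            problems.append(f"duplicate column name: {name!r}")
        seen.add(name)

    # Partition columns must refer to columns defined in the schema.
    for column_name in partition_columns:
        if column_name not in names:
            problems.append(f"partition column {column_name!r} is not in the schema")
    return problems


example_schema = [
    {"name": "aar", "type": "int16", "label": "The year."},
    {"name": "ident", "type": "string", "label": "The enterprise's identifier."},
]
print(check_schema(example_schema, ["aar"]))
print(check_schema(example_schema, ["year"]))
```

An empty list means no problems were found; anything else is worth fixing before the table is created.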

Partitioning the table by one or more columns helps improve query performance, because a query can be restricted to the relevant partitions.
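Why partitioning helps can be illustrated with a small file-pruning sketch. It assumes a hive-style directory layout with key=value path segments, which is a common convention for partitioned Parquet data but is not confirmed here as EimerDB's actual layout; the function name and paths are hypothetical.

```python
def select_partition_files(paths, partition_select=None):
    """Keep only the paths whose partition values match partition_select."""
    if not partition_select:
        return list(paths)
    selected = []
    for path in paths:
        # Collect key=value directory segments, e.g. {"aar": "2022"}.
        values = dict(
            segment.split("=", 1) for segment in path.split("/") if "=" in segment
        )
        # A file is kept only if every requested column matches one wanted value.
        if all(
            values.get(column) in {str(value) for value in wanted}
            for column, wanted in partition_select.items()
        ):
            selected.append(path)
    return selected


files = [
    "prefill_prod/aar=2020/data.parquet",
    "prefill_prod/aar=2021/data.parquet",
    "prefill_prod/aar=2022/data.parquet",
]
print(select_partition_files(files, {"aar": [2022, 2021]}))
# prints ['prefill_prod/aar=2021/data.parquet', 'prefill_prod/aar=2022/data.parquet']
```

Only the files in matching partitions need to be read, so less data is loaded and combined before the query runs.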

SQL Query Support

Query your tables with SQL syntax. You can optionally specify the partition to be queried.

prodcombasen.query(
    """SELECT *
    FROM prodcom_prefill
    WHERE produktkode = '10.13.11.20'""",
    partition_select={
        "aar": [2022, 2021]
    },
)

Updates

Perform updates using SQL statements. Each update is saved as a separate parquet file for versioning. The update files include a username column and a datetime column recording who made the update and when.

# Partitions the update applies to (example values).
partitions = {"aar": [2022, 2021]}

prodcombasen.query(
    """UPDATE prodcom_prefill
    SET mengde = 123
    WHERE ident = '123456'
    AND produktkode = '10.13.11.20'""",
    partition_select=partitions,
)
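The idea of keeping every update as a separate versioned record can be sketched in plain Python. This is a conceptual illustration only, not EimerDB's implementation: the row_id key, the record layout and the "newest update wins" rule are assumptions made for the example.

```python
from datetime import datetime


def apply_updates(base_rows, updates):
    """Overlay update records on base rows; the newest update per row wins."""
    # Copy the base rows so the unedited data is left untouched.
    current = {row["row_id"]: dict(row) for row in base_rows}
    # Apply updates oldest first, so later updates overwrite earlier ones.
    for update in sorted(updates, key=lambda record: record["datetime"]):
        if update["row_id"] in current:
            current[update["row_id"]].update(update["changes"])
    return list(current.values())


base = [
    {"row_id": 1, "ident": "123456", "mengde": 100},
    {"row_id": 2, "ident": "654321", "mengde": 50},
]
# Each update records who made the change and when (hypothetical layout).
updates = [
    {"row_id": 1, "changes": {"mengde": 123}, "user": "alice", "datetime": datetime(2024, 1, 2)},
    {"row_id": 1, "changes": {"mengde": 110}, "user": "bob", "datetime": datetime(2024, 1, 1)},
]

edited = apply_updates(base, updates)
print(edited[0]["mengde"])  # prints 123
print(base[0]["mengde"])    # prints 100
```

Because the base rows are never modified, both the edited and the unedited view of the data stay available, and the update records themselves form a change history.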

Easily access the unedited version of a table

Retrieve the unedited version of your data by specifying unedited=True.

prodcombasen.query(
    """SELECT *
    FROM prodcom_prefill""",
    unedited=True
)

Query the changes made to a table

You can query all the changes made to the table with the query_changes method.

prodcombasen.query_changes(
    """SELECT *
    FROM prodcom_prefill""",
    unedited=True
)

Query multiple tables

Query multiple tables using JOIN and subquery.

prodcombasen.query(
    """SELECT
            t1.aar,
            t1.produktkode,
            t1.beskrivelse,
            SUM(t1.mengde) AS mengde
        FROM
            prefill_prod AS t1
        JOIN (
            SELECT
                t2.aar,
                t2.ident,
                t2.skjemaversjon,
                MAX(t2.dato_mottatt) AS newest_dato_mottatt
            FROM
                skjemainfo AS t2
            GROUP BY
                t2.aar,
                t2.ident,
                t2.skjemaversjon
        ) AS subquery ON
            t1.aar = subquery.aar
            AND t1.ident = subquery.ident
            AND t1.skjemaversjon = subquery.skjemaversjon
        WHERE
            t1.mengde IS NOT NULL
        GROUP BY
            t1.aar,
            t1.produktkode,
            t1.beskrivelse;""",
    partition_select={
        "aar": [2022, 2021, 2020]
    },
)

User Management (in development)

Add and remove users from your instance. Assign specific roles to users for access control.

prodcombasen.add_user(username="newuser", role="admin")
prodcombasen.remove_user(username="olduser")

Requirements

  • TODO

Installation

You can install EimerDB via pip from PyPI:

pip install ssb-eimerdb

Usage

Please see the Reference Guide for details.

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, EimerDB is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from Statistics Norway's SSB PyPI Template.
