datasaurus

Data Engineering framework based on Polars.rs

Project description

Datasaurus is a Data Engineering framework written in Python. It supports Python 3.8, 3.9, 3.10 and 3.11.

It is based on Polars and heavily influenced by Django.

Datasaurus offers an opinionated, feature-rich and powerful framework to help you write data pipelines, ETLs or data manipulation programs.

Documentation (TODO)

It supports:

✅ Fully supports read/write operations.
⭕ Not yet supported, but will be implemented.
💀 Won't be implemented in the near future.

Storages:

SQLite ✅
PostgreSQL ✅
MySQL ✅
MariaDB ✅
Local Storage ✅
Azure blob storage ⭕
AWS S3 ⭕

Formats:

CSV ✅
JSON ✅
PARQUET ✅
EXCEL ✅
AVRO ✅
TSV ⭕
SQL ⭕ (e.g. SQL INSERT statements)

Features:

Delta Tables ⭕
Field validations ⭕

Simple example

# settings.py
import os

from datasaurus.core.storage import PostgresStorage, StorageGroup, SqliteStorage

# We set the environment that will be used.
os.environ['DATASAURUS_ENVIRONMENT'] = 'dev'

class ProfilesData(StorageGroup):
    dev = SqliteStorage(path='/data/data.sqlite')
    live = PostgresStorage(username='user', password='user', host='localhost', database='postgres')

    
# models.py
from datasaurus.core.models import Model, StringColumn, IntegerColumn

# Assumes settings.py above is importable as `settings`.
from settings import ProfilesData

class ProfileModel(Model):
    id = IntegerColumn()
    username = StringColumn()
    mail = StringColumn()
    sex = StringColumn()

    class Meta:
        storage = ProfilesData
        table_name = 'PROFILE'
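
How a storage group might pick the storage that matches DATASAURUS_ENVIRONMENT can be sketched in plain Python. This is only an illustration of the idea, not Datasaurus's actual implementation: `FakeStorage`, `EnvGroup` and `active()` are hypothetical names, and only the environment variable and the `dev`/`live` attribute names come from the example above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeStorage:
    """Hypothetical stand-in for a storage such as SqliteStorage."""
    name: str


class EnvGroup:
    """Hypothetical base class: one class attribute per environment."""

    @classmethod
    def active(cls) -> FakeStorage:
        # Read the variable at call time, so changing it later
        # changes which storage is returned.
        env = os.environ.get("DATASAURUS_ENVIRONMENT")
        if env is None:
            raise RuntimeError("DATASAURUS_ENVIRONMENT is not set")
        storage = getattr(cls, env, None)
        if not isinstance(storage, FakeStorage):
            raise LookupError(f"{cls.__name__} has no storage for {env!r}")
        return storage


class ProfilesData(EnvGroup):
    dev = FakeStorage("sqlite-dev")
    live = FakeStorage("postgres-live")


os.environ["DATASAURUS_ENVIRONMENT"] = "dev"
print(ProfilesData.active().name)
```

Resolving the environment on every call, rather than once at import time, keeps the sketch easy to test, at the cost of a dictionary lookup per access.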

We can access the raw Polars dataframe with 'Model.df'. It is lazy: the data is only loaded when the attribute is accessed.

>>> ProfileModel.df
shape: (100, 4)
┌─────┬────────────────────┬──────────────────────────┬─────┐
│ id  ┆ username           ┆ mail                     ┆ sex │
│ --- ┆ ---                ┆ ---                      ┆ --- │
│ i64 ┆ str                ┆ str                      ┆ str │
╞═════╪════════════════════╪══════════════════════════╪═════╡
│ 1   ┆ ehayes             ┆ colleen63@hotmail.com    ┆ F   │
│ 2   ┆ thompsondeborah    ┆ judyortega@hotmail.com   ┆ F   │
│ 3   ┆ orivera            ┆ iperkins@hotmail.com     ┆ F   │
│ 4   ┆ ychase             ┆ sophia92@hotmail.com     ┆ F   │
│ …   ┆ …                  ┆ …                        ┆ …   │
│ 97  ┆ mary38             ┆ sylvia80@yahoo.com       ┆ F   │
│ 98  ┆ charlessteven      ┆ usmith@gmail.com         ┆ F   │
│ 99  ┆ plee               ┆ powens@hotmail.com       ┆ F   │
│ 100 ┆ elliottchristopher ┆ wilsonbenjamin@yahoo.com ┆ M   │
└─────┴────────────────────┴──────────────────────────┴─────┘
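
The lazy behaviour of `df`, where nothing is read until the attribute is first accessed, can be sketched with a small descriptor. This is an assumption about the mechanism rather than the library's code: `lazy_classattr` and `load_rows` are hypothetical, and a list of dicts stands in for the Polars dataframe.

```python
class lazy_classattr:
    """Hypothetical descriptor: run `loader` on first access, then cache."""

    def __init__(self, loader):
        self.loader = loader
        self.cache = {}

    def __get__(self, instance, owner):
        # Cache per class so each model loads its own data only once.
        if owner not in self.cache:
            self.cache[owner] = self.loader(owner)
        return self.cache[owner]


load_calls = []


def load_rows(model):
    # Stand-in for reading a table from storage.
    load_calls.append(model.__name__)
    return [{"id": 1, "username": "ehayes"}]


class ProfileModel:
    df = lazy_classattr(load_rows)


print(load_calls)             # [] (defining the class loads nothing)
rows = ProfileModel.df        # first access triggers the load
rows_again = ProfileModel.df  # second access is served from the cache
print(load_calls)             # ['ProfileModel']
```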

We could now create a new model whose data is derived from ProfileModel:

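import polars as pl

class FemaleProfiles(Model):
    id = IntegerColumn()
    profile_id = IntegerColumn()
    mail = StringColumn()

    def calculate_data(self):
        return (
            ProfileModel.df
            .filter(ProfileModel.sex == 'F')
            # Add a row counter that will become the new id.
            .with_row_count('new_id')
            .with_columns(
                # Keep the original id as profile_id...
                pl.col('id').alias('profile_id'),
                # ...and replace id with the new row counter.
                pl.col('new_id').alias('id'),
            )
        )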

    class Meta:
        recalculate = 'if_no_data_in_storage'
        storage = ProfilesData
        table_name = 'PROFILE_FEMALES'

Et voilà! The columns will be selected automatically from the column definitions (id, profile_id and mail).
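
One way such automatic selection could work, in the Django style the framework borrows from, is to collect the declared columns when the class is created and keep only those in the computed data. The sketch below is hypothetical (`Column`, `BaseModel` and `select_declared` are illustrative names) and uses plain dict rows instead of a Polars dataframe to stay dependency-free.

```python
class Column:
    """Hypothetical marker for a declared column."""


class BaseModel:
    columns = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Collect the names of attributes declared as Column, in order.
        cls.columns = tuple(
            name for name, value in vars(cls).items() if isinstance(value, Column)
        )

    @classmethod
    def select_declared(cls, rows):
        # Keep only the declared columns; a missing one raises KeyError.
        return [{name: row[name] for name in cls.columns} for row in rows]


class FemaleProfiles(BaseModel):
    id = Column()
    profile_id = Column()
    mail = Column()


computed = [
    {"id": 0, "profile_id": 1, "username": "ehayes",
     "mail": "colleen63@hotmail.com", "sex": "F"},
]
selected = FemaleProfiles.select_declared(computed)
print(selected)
```

Extra columns such as `username` and `sex` are dropped silently, while a declared column that is absent from the computed data fails loudly.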

If we now call:

FemaleProfiles.df

It will check whether the data exists in the storage. If it does not, it will calculate it from calculate_data and save it to the storage. The recalculate parameter can also be set to 'always'.
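
The two recalculate modes can be sketched as a small decision function. This is a simplified model, not the library's code: the storage is a plain dict keyed by table name, and `resolve_data` is a hypothetical helper. Only the values 'if_no_data_in_storage' and 'always' come from the text above.

```python
def resolve_data(storage, table_name, calculate, recalculate="if_no_data_in_storage"):
    """Return the table's data, computing and saving it when required."""
    if recalculate not in ("if_no_data_in_storage", "always"):
        raise ValueError(f"unknown recalculate mode: {recalculate!r}")

    if recalculate == "always" or table_name not in storage:
        # Compute from scratch and persist the result.
        storage[table_name] = calculate()
    return storage[table_name]


storage = {}
calls = []


def calculate_data():
    calls.append(1)
    return ["row-a", "row-b"]


resolve_data(storage, "PROFILE_FEMALES", calculate_data)  # computes and saves
resolve_data(storage, "PROFILE_FEMALES", calculate_data)  # reads from storage
resolve_data(storage, "PROFILE_FEMALES", calculate_data, recalculate="always")
print(len(calls))  # 2
```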

You can also move data to different environments or storages, making it easy to change formats or move data around:

FemaleProfiles.save(to=ProfilesData.live)

This effectively moves the data from SQLite (dev) to PostgreSQL (live).

# Can also change formats
FemaleProfiles.save(to=ProfilesData.otherenvironment, format=LocalFormat.JSON)
FemaleProfiles.save(to=ProfilesData.otherenvironment, format=LocalFormat.CSV)
FemaleProfiles.save(to=ProfilesData.otherenvironment, format=LocalFormat.PARQUET)
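
Saving the same data in several formats comes down to dispatching on a format value. The sketch below shows that idea with the standard library only. `Format` and `dump` are hypothetical stand-ins and not the Datasaurus API, and Parquet is left out because it needs a third-party library.

```python
import csv
import io
import json
from enum import Enum


class Format(Enum):
    """Hypothetical stand-in for the format choices named above."""
    JSON = "json"
    CSV = "csv"


def dump(rows, fmt):
    """Serialise a list of dict rows to text in the requested format."""
    if fmt is Format.JSON:
        return json.dumps(rows)
    if fmt is Format.CSV:
        if not rows:
            return ""
        buffer = io.StringIO()
        # Column order follows the keys of the first row.
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")


rows = [{"id": 0, "profile_id": 1, "mail": "colleen63@hotmail.com"}]
print(dump(rows, Format.JSON))
print(dump(rows, Format.CSV))
```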

Release history

0.0.2.dev4 (pre-release, this version): Dec 19, 2023
0.0.2.dev3 (pre-release): Dec 19, 2023
0.0.2.dev0 (pre-release): Dec 19, 2023
0.0.1.dev2 (pre-release): Jun 27, 2023
0.0.1.dev1 (pre-release): Jun 14, 2023

Download files

Source Distribution

datasaurus-0.0.2.dev4.tar.gz (16.9 kB)

Uploaded Dec 19, 2023 Source

Built Distribution

datasaurus-0.0.2.dev4-py3-none-any.whl (20.0 kB)

Uploaded Dec 19, 2023 Python 3

File details

Details for the file datasaurus-0.0.2.dev4.tar.gz.

File metadata

Download URL: datasaurus-0.0.2.dev4.tar.gz
Upload date: Dec 19, 2023
Size: 16.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.11.6 Linux/6.6.7-arch1-1

File hashes

Hashes for datasaurus-0.0.2.dev4.tar.gz
Algorithm	Hash digest
SHA256	`c8dd2a76fb6d52049232782ec575d2d53f634095478781a4a13c3793ec8a322a`
MD5	`34308388f237ef34c6be5d64ca17c588`
BLAKE2b-256	`34e537f1adf2e208b1a93e60d29c2b22ba4b4eb121850c8c0dab77319f7c9d46`

File details

Details for the file datasaurus-0.0.2.dev4-py3-none-any.whl.

File metadata

Download URL: datasaurus-0.0.2.dev4-py3-none-any.whl
Upload date: Dec 19, 2023
Size: 20.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.11.6 Linux/6.6.7-arch1-1

File hashes

Hashes for datasaurus-0.0.2.dev4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9e37584a072adf1184b546fe4f0a66dd13dd55ccaed6ef89102fdd609f780ce7`
MD5	`614abde226269b643c3a1531860ebce9`
BLAKE2b-256	`a9cf9a68b4764c1a2df2668e8d8141c614654d45a3d55d8ee8f9ceeca03e74f3`
