Package for working with pandas DataFrames, with specialized functions used at Energinet
Project description
Datamazing
The Datamazing package provides an interface for various transformations of data (filtering, aggregation, merging, etc.)
Interface
The interface closely resembles that of most DataFrame libraries (pandas, pyspark, SQL, etc.). For example, a group-by is implemented as group(df, by=["..."]), and a merge is implemented as merge([df1, df2], on=["..."], how="inner"). So, why not just use native pandas, pyspark, etc.?
- Parts of the native libraries have awkward interfaces (such as pandas' inconsistent use of indexing).
- The package can be extended with custom operations that are specific to the Energinet domain.
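To make the first point concrete, here is a small sketch in plain pandas (not datamazing) of the indexing behaviour the bullet alludes to. The `owners` table and its column names are invented for illustration. A group-by moves the key column into the index, while a merge works on ordinary columns, so the two results cannot be combined without an intermediate `reset_index()`.

```python
import pandas as pd

df = pd.DataFrame({"animal": ["cat", "cat", "dog"], "age": [1.0, 3.0, 5.0]})
# Hypothetical lookup table, used only for this illustration.
owners = pd.DataFrame({"animal": ["cat", "dog"], "owner": ["Ann", "Bob"]})

# After a group-by, "animal" is no longer a column: it has become the index.
grouped = df.groupby("animal").sum()

# A merge, by contrast, keeps "animal" as a regular column.
merged = df.merge(owners, on="animal", how="inner")

# To treat both results alike, the index has to be turned back into a column.
flat = grouped.reset_index()
```

An interface that always keeps keys as columns avoids this back-and-forth, which is presumably what the bullet above has in mind.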
Backends
The package contains methods with the same interface, but for different backends. Currently, two backends are supported: pandas and pyspark (though not all methods are available for both). For example, when working with pandas DataFrames, one would use
import pandas as pd
import datamazing.pandas as pdz

df = pd.DataFrame([
    {"animal": "cat", "time": pd.Timestamp("2020-01-01"), "age": 1.0},
    {"animal": "cat", "time": pd.Timestamp("2020-01-02"), "age": 3.0},
    {"animal": "dog", "time": pd.Timestamp("2020-01-01"), "age": 5.0},
])

pdz.group(df, by="animal") \
    .resample(on="time", resolution=pd.Timedelta(hours=12)) \
    .agg("interpolate")
whereas, when working with pyspark DataFrames, one would instead use
import datetime as dt
import pandas as pd  # needed for pd.Timedelta below
import pyspark.sql as ps
import datamazing.pyspark as psz

# Reuses the currently active Spark session (None if no session is active)
spark = ps.SparkSession.getActiveSession()

df = spark.createDataFrame([
    {"animal": "cat", "time": dt.datetime(2020, 1, 1), "age": 1.0},
    {"animal": "cat", "time": dt.datetime(2020, 1, 2), "age": 3.0},
    {"animal": "dog", "time": dt.datetime(2020, 1, 1), "age": 5.0},
])

psz.group(df, by="animal") \
    .resample(on="time", resolution=pd.Timedelta(hours=12)) \
    .agg("interpolate")
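For comparison, a roughly analogous computation in native pandas might look like the sketch below. It assumes that the chain above upsamples each group to a 12-hour resolution and fills the new rows by linear interpolation. That is a reading of the example, not a statement about datamazing's implementation, and the package's actual output (column order, handling of boundaries) may differ.

```python
import pandas as pd

df = pd.DataFrame([
    {"animal": "cat", "time": pd.Timestamp("2020-01-01"), "age": 1.0},
    {"animal": "cat", "time": pd.Timestamp("2020-01-02"), "age": 3.0},
    {"animal": "dog", "time": pd.Timestamp("2020-01-01"), "age": 5.0},
])

parts = []
for animal, group in df.groupby("animal"):
    # Upsample this group's time series to 12-hour steps (new rows are NaN),
    # then fill the gaps by linear interpolation.
    resampled = (
        group.set_index("time")[["age"]]
        .resample(pd.Timedelta(hours=12))
        .asfreq()
        .interpolate(method="linear")
    )
    # Restore the group key as an ordinary column.
    resampled.insert(0, "animal", animal)
    parts.append(resampled.reset_index())

result = pd.concat(parts, ignore_index=True)
```

The native version needs explicit index juggling (`set_index`, `reset_index`) and a loop over groups, which is the kind of ceremony the chained interface is meant to hide.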
Development
To set up the Python environment, run
$ pip install poetry
$ poetry install
Running the tests locally requires Java. On Debian or Ubuntu, it can be installed with:
$ sudo apt install default-jdk
To execute unit tests, run
$ pytest .
File details
Details for the file datamazing-8.0.7.tar.gz.
File metadata
- Download URL: datamazing-8.0.7.tar.gz
- Upload date:
- Size: 17.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.4 CPython/3.10.20 Linux/6.17.0-1010-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9699e2695f096222dc48c78d97a291bb14ba58bbb957f5373008a489f967dca1 |
| MD5 | 5b8a2ad6a317a58bac8541b362d780b1 |
| BLAKE2b-256 | 5373c3411e6f1639630a29218fbeab73d49459065107dacfa8f8a811fcbde16d |
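A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of` and `matches_expected` are illustrative, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest of datamazing-8.0.7.tar.gz, as listed in the table above.
EXPECTED_SHA256 = "9699e2695f096222dc48c78d97a291bb14ba58bbb957f5373008a489f967dca1"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected one, ignoring letter case."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("datamazing-8.0.7.tar.gz")` would return `True` only if the file in the current directory is byte-for-byte identical to the published source distribution.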
File details
Details for the file datamazing-8.0.7-py3-none-any.whl.
File metadata
- Download URL: datamazing-8.0.7-py3-none-any.whl
- Upload date:
- Size: 28.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.4 CPython/3.10.20 Linux/6.17.0-1010-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91731be093671b01631b7438df8519751ce6bb1f7440e4d784db5d27cf13d2b0 |
| MD5 | fb12db97ad019a0d3bec361685581350 |
| BLAKE2b-256 | 8851176ccbd22401fa1f00e6f57cd34dbbf7ad56a6ffe6d7fb429a197da1771b |