sscu-budapest utilities for scientific data engineering
datazimmer
To create a new project
- make sure that `python` points to `python>=3.8` and you have `pip` and `git`, then `pip install datazimmer`
- run `dz init project-name`
  - pulls project-template
- add a remote
  - both to git and dvc (can run `dz build-meta` to see available dvc remotes)
  - git remote can be given with `dz init`
- create, register and document steps in a pipeline you will run in different environments
- build metadata to exportable and serialized format with `dz build-meta`
  - if you defined importable data from other artifacts in the config, you can import them with `load-external-data`
  - ensure that you import envs that are served from sources you have access to
- build and run pipeline steps by running `dz run`
- validate that the data matches the datascript description with `dz validate`
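The prerequisites from the first step (Python 3.8 or newer, plus pip and git) can be checked before installing. This is a minimal sketch using only the Python standard library; `check_prerequisites` and its parameters are hypothetical helpers for illustration and are not part of datazimmer.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)              # from the python>=3.8 requirement stated above
REQUIRED_TOOLS = ("pip", "git")  # tools the setup steps rely on


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of problems; an empty list means the setup can proceed.

    `version` and `which` are injectable only to make the check easy to test.
    """
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"python {version[0]}.{version[1]} found, "
            f"need >= {MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    if issues:
        print("\n".join(issues))
    else:
        print("ready for: pip install datazimmer")
```

Note that some systems expose pip only as `pip3` or `python -m pip`, so a missing `pip` here does not always mean pip is unavailable.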
Scheduling
- a project as a whole has a cron expression in `zimmer.yaml` to determine the schedule of reruns
- additionally, aswan projects within the dz project can have different cron expressions for scheduling new runs of the aswan projects
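For readers unfamiliar with cron expressions, the sketch below splits a standard five-field expression into its named parts. It assumes the common five-field cron syntax; the key under which `zimmer.yaml` stores the expression and the exact dialect datazimmer accepts are not specified here, and `split_cron` is a hypothetical helper.

```python
# Field order of a standard five-field cron expression
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Split a five-field cron expression into a dict of named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# Hypothetical schedules: the whole project reruns weekly on Monday at 03:00,
# while one scraping project inside it runs every day at 01:30.
project_schedule = split_cron("0 3 * * 1")
scraper_schedule = split_cron("30 1 * * *")
print(project_schedule)
print(scraper_schedule)
```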
Test projects
TODO: document dogshow and everything else much better here
Lookahead
- overlapping names convention
- resolve naming confusion with colassigner, colaccessor and table feature / composite type / index base classes
- abstract composite type + subclass of entity class
  - import ACT, inherit from it and specify
  - importing composite type is impossible now if it contains foreign key :(
- add option to infer data type of assigned feature
  - can be problematic b/c pandas int/float/nan issue
- create similar sets of features in a dry way
- overlapping in entities
  - detect / signal the same type of entity
- exports: postgres, postgis, superset
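The pandas int/float/nan issue mentioned in the list above is why inferring a feature's data type from assigned values can mislead: a single missing value turns an integer column into a float one. The snippet below is only an illustration of that pandas behaviour, not of how datazimmer handles it; the variable names are made up.

```python
import pandas as pd

# A complete integer column keeps an integer dtype.
complete = pd.Series([1, 2, 3])

# One missing value forces the default inference to float (NaN is a float).
with_gap = pd.Series([1, 2, None])

# The nullable extension dtype keeps integers and stores the gap as <NA>.
nullable = pd.Series([1, 2, None], dtype="Int64")

for name, series in [("complete", complete), ("with_gap", with_gap), ("nullable", nullable)]:
    print(name, series.dtype)
```

Type inference based on `with_gap` alone would record a float feature even though the underlying values are integers, so an explicit declaration or a nullable dtype is the safer choice.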
W3C compliance plan
- test suite for compliance: https://w3c.github.io/csvw/publishing-snapshots/PR-earl/earl.html
- https://github.com/w3c/csvw
@article{tennison2015model,
title={Model for tabular data and metadata on the web},
author={Tennison, Jeni and Kellogg, Gregg and Herman, Ivan},
year={2015}
}
@article{pollock2015metadata,
title={Metadata vocabulary for tabular data},
author={Pollock, Rufus and Tennison, Jeni and Kellogg, Gregg and Herman, Ivan},
journal={W3C Recommendation},
volume={17},
year={2015}
}