
sscu-budapest utilities for scientific data engineering


datazimmer


To create a new project

  • make sure that python points to Python >= 3.8 and that pip and git are available, then run pip install datazimmer
  • run dz init project-name
  • add a remote
    • both to git and dvc (run dz build-meta to see the available dvc remotes)
    • git remote can be given with dz init
  • create, register and document steps in a pipeline you will run in different environments
  • build the metadata into an exportable, serialized format with dz build-meta
    • if you defined importable data from other artifacts in the config, you can import them with load-external-data
    • ensure that you import envs that are served from sources you have access to
  • build and run pipeline steps by running dz run
  • validate that the data matches the datascript description with dz validate
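The steps above can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the pip and dz commands are the ones named in the list, while the git and dvc remote URLs, the dvc remote name main-remote, and the idea of driving everything from one Python script are hypothetical. The sketch also does not handle the working directory; the commands after dz init are presumably meant to run inside the new project.

```python
import shlex
import subprocess

# Hypothetical placeholders: replace with your own project name and remotes.
PROJECT = "project-name"
GIT_REMOTE = "git@example.com:me/project-name.git"
DVC_REMOTE = "s3://example-bucket/project-name"


def workflow_commands(project=PROJECT, git_remote=GIT_REMOTE, dvc_remote=DVC_REMOTE):
    """Return the setup commands in the order the list above gives them."""
    return [
        ["pip", "install", "datazimmer"],
        ["dz", "init", project],
        # Standard git / dvc commands for adding a remote (URLs are assumptions).
        ["git", "remote", "add", "origin", git_remote],
        ["dvc", "remote", "add", "-d", "main-remote", dvc_remote],
        ["dz", "build-meta"],
        ["dz", "run"],
        ["dz", "validate"],
    ]


def run_workflow(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default, so nothing is installed or modified.
    run_workflow(workflow_commands())
```

Registering and documenting the pipeline steps happens in the project's own code between dz init and dz build-meta, so it is not part of the command list.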

Scheduling

  • a project as a whole has a cron expression in zimmer.yaml to determine the schedule of reruns
  • additionally, aswan projects within the dz project can each have their own cron expression for scheduling new runs
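For readers unfamiliar with cron expressions, here is a minimal sketch of how a standard five-field expression (minute, hour, day of month, month, day of week) decides whether a given moment is a scheduled run. It is not datazimmer's own scheduler, and the key under which the expression is stored in zimmer.yaml is not shown here because the text above does not state it; the helper names are hypothetical.

```python
from datetime import datetime


def _parse_field(field, low, high):
    """Expand one cron field ("*", "5", "1-5", "*/15", "1,3") into a set of ints."""
    allowed = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        allowed.update(range(start, end + 1, step))
    return allowed


def cron_matches(expr, when):
    """Return True if the datetime `when` falls on the schedule `expr`."""
    minute, hour, dom, month, dow = expr.split()
    if when.minute not in _parse_field(minute, 0, 59):
        return False
    if when.hour not in _parse_field(hour, 0, 23):
        return False
    if when.month not in _parse_field(month, 1, 12):
        return False
    dom_ok = when.day in _parse_field(dom, 1, 31)
    # cron counts Sunday as 0 (or 7); Python's weekday() counts Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    dow_ok = cron_weekday in {d % 7 for d in _parse_field(dow, 0, 7)}
    # Classic cron rule: if both day fields are restricted, either may match.
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok


if __name__ == "__main__":
    # "0 3 * * *" means every day at 03:00.
    print(cron_matches("0 3 * * *", datetime(2024, 1, 15, 3, 0)))
```

With this reading, a project-level expression such as 0 3 * * * would mean a rerun every night at 03:00, while an aswan project inside it could use a different expression, for example */15 * * * * for every quarter of an hour.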

Test projects

TODO: document dogshow and everything else much better here

Lookahead

  • overlapping names convention
  • resolve naming confusion with colassigner, colaccessor and table feature / composite type / index base classes
  • abstract composite type + subclass of entity class
    • import ACT, inherit from it and specify
    • importing composite type is impossible now if it contains foreign key :(
  • add option to infer data type of assigned feature
    • can be problematic because of the pandas int/float/NaN issue
  • create similar sets of features in a dry way
  • overlapping in entities
    • detect / signal the same type of entity
  • exports: postgres, postgis, superset
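The pandas int/float/NaN issue mentioned in the list can be illustrated with plain pandas; the snippet assumes nothing about datazimmer's API. A column of integers silently becomes a float column as soon as one value is missing, so a type inferred from the data can differ between a complete and an incomplete sample of the same feature.

```python
import pandas as pd

# The same logical feature, once complete and once with a missing value.
complete = pd.Series([1, 2, 3])
with_gap = pd.Series([1, 2, None])

print(complete.dtype)  # an integer dtype
print(with_gap.dtype)  # a float dtype: None was converted to NaN

# pandas' nullable integer dtype keeps integers and marks the gap as <NA>.
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype)
print(nullable.isna().sum())
```

Naive inference would therefore label the second series as float even though the feature is conceptually an integer, which is presumably why the item is flagged as problematic.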

W3C compliance plan

@article{tennison2015model,
  title={Model for tabular data and metadata on the web},
  author={Tennison, Jeni and Kellogg, Gregg and Herman, Ivan},
  year={2015}
}
@article{pollock2015metadata,
  title={Metadata vocabulary for tabular data},
  author={Pollock, Rufus and Tennison, Jeni and Kellogg, Gregg and Herman, Ivan},
  journal={W3C Recommendation},
  volume={17},
  year={2015}
}
