Omnipy is a high-level Python library for type-driven data wrangling and scalable workflow orchestration (under development)

Omnipy is a high-level Python library for type-driven data wrangling and scalable workflow orchestration.

Figure: Conceptual overview of Omnipy

Updates

Feb 3, 2023: Documentation of the Omnipy API is still sparse. However, for examples of running code, please check out the omnipy-examples repo.
Dec 22, 2022: Omnipy is the new name of the Python package formerly known as uniFAIR. We are very grateful to Dr. Jamin Chen, who graciously transferred ownership of the (mostly unused) "omnipy" name on PyPI to us!

Installation and use

For basic information on installation and use of omnipy, read the INSTALL.md file.

Contribute to omnipy development

For basic information on how to set up a development environment to effectively contribute to the omnipy library, read the CONTRIBUTING.md file.

Overview of Omnipy

Generic functionality

(NOTE: Read the section Transformation on the FAIRtracks.net website for a more detailed and better formatted version of the following description!)

Omnipy is designed primarily to simplify development and deployment of (meta)data transformation processes in the context of FAIRification and data brokering efforts. However, the functionality is very generic and can also be used to support research data (and metadata) transformations in a range of fields and contexts beyond life science, including day-to-day research scenarios:

Data wrangling in day-to-day research

Researchers in life science and other data-centric fields often need to extract, manipulate and integrate data and/or metadata from different sources, such as repositories, databases or flat files. Much research time is spent on trivial and not-so-trivial details of such "data wrangling":

reformat data structures
clean up errors
remove duplicate data
map and integrate dataset fields
etc.

General software for data wrangling and analysis, such as Pandas, R or Frictionless, is useful, but researchers still regularly end up with hard-to-reuse scripts, often with manual steps.
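As an illustration of the kind of ad hoc script described above, here is a minimal sketch in plain Python (no Omnipy involved) that maps field names, cleans up stray whitespace and removes duplicates. The function name `clean_records`, the field names and the sample records are all hypothetical.

```python
from typing import Any


def clean_records(
    records: list[dict[str, Any]],
    field_map: dict[str, str],
) -> list[dict[str, Any]]:
    """Rename fields, strip whitespace from strings and drop exact duplicates."""
    seen: set[tuple] = set()
    cleaned: list[dict[str, Any]] = []
    for record in records:
        # Map source field names onto the target names; unmapped names are kept.
        renamed = {field_map.get(key, key): value for key, value in record.items()}
        # Clean up one common class of errors: leading/trailing whitespace.
        tidy = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in renamed.items()
        }
        # Remove duplicates, keeping the first occurrence.
        # Assumes all values are hashable (true for the flat records used here).
        fingerprint = tuple(sorted(tidy.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(tidy)
    return cleaned


raw = [
    {"Sample ID": " S1 ", "organism": "Homo sapiens"},
    {"Sample ID": "S1", "organism": "Homo sapiens "},
    {"Sample ID": "S2", "organism": "Mus musculus"},
]
result = clean_records(raw, {"Sample ID": "sample_id"})
```

A script like this works, but its assumptions about the shape of the input live only in the author's head, which is part of what makes it hard to reuse.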

Step-wise data model transformations

With the Omnipy Python package, researchers can import (meta)data in almost any shape or form: nested JSON; tabular (relational) data; binary streams; or other data structures. Through a step-by-step process, data is continuously parsed and reshaped according to a series of data model transformations.
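The step-by-step idea can be sketched in plain Python as a chain of small functions, each taking data in one shape and returning it in a more precise one. This is not the Omnipy API; the `Track` model, the function names and the sample document are assumptions made for illustration only.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical target data model: a named track with an integer length."""
    name: str
    length: int


def parse_json(text: str) -> dict:
    # Step 1: raw text -> generic nested JSON structure.
    return json.loads(text)


def to_rows(doc: dict) -> list[dict]:
    # Step 2: nested JSON -> flat, table-like rows.
    return [{"name": name, **attrs} for name, attrs in doc["tracks"].items()]


def to_tracks(rows: list[dict]) -> list[Track]:
    # Step 3: generic rows -> specific model, coercing values on the way in.
    return [Track(name=str(row["name"]), length=int(row["length"])) for row in rows]


raw_text = '{"tracks": {"a": {"length": "10"}, "b": {"length": 25}}}'
tracks = to_tracks(to_rows(parse_json(raw_text)))
```

Each step narrows the data model: from "any JSON" to "rows with a name" to "tracks with an integer length".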

"Parse, don't validate"

Omnipy follows the principles of "Type-driven design" (read Technical note #2: "Parse, don't validate" on the FAIRtracks.net website for more info). It makes use of cutting-edge Python type hints and the popular pydantic package to "pour" data into precisely defined data models that can range from very general (e.g. "any kind of JSON data", "any kind of tabular data", etc.) to very specific (e.g. "follow the FAIRtracks JSON Schema for track files with the extra restriction of only allowing BigBED files").
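To make the "parse, don't validate" idea concrete, here is a minimal sketch that uses only the standard library rather than pydantic or Omnipy. The `BigBedFile` class, the `parse_track_file` function and the accepted file extensions are assumptions chosen for the example, not part of the FAIRtracks standard or the Omnipy API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BigBedFile:
    """A path that has already been checked to look like a BigBED file."""
    path: str

    def __post_init__(self) -> None:
        # Assumed convention for this sketch: BigBED files end in .bb or .bigbed.
        if not self.path.lower().endswith((".bb", ".bigbed")):
            raise ValueError(f"not a BigBED file: {self.path!r}")


def parse_track_file(raw: object) -> BigBedFile:
    """Turn loosely typed input into a BigBedFile, or fail loudly."""
    if not isinstance(raw, str):
        raise TypeError(f"expected a string path, got {type(raw).__name__}")
    return BigBedFile(raw.strip())


def describe(track: BigBedFile) -> str:
    # Downstream code accepts BigBedFile only, so it never needs to re-check.
    return f"BigBED track at {track.path}"


parsed = parse_track_file("  peaks.bb ")
```

The check happens once, at the boundary, and its result is recorded in the type: any function that receives a `BigBedFile` can rely on the restriction already holding.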

Data types as contracts

Omnipy tasks (single steps) or flows (workflows) are defined as transformations from specific input data models to specific output data models. pydantic-based parsing guarantees that the input and output data always follow the data models (i.e. data types). Thus, the data models define "contracts" that simplify reuse of tasks and flows in a mix-and-match fashion.
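The contract idea can be illustrated with a small sketch that only inspects declared type hints, using the standard library. It is a simplification: Omnipy is described above as enforcing contracts by parsing the actual data with pydantic, whereas this hypothetical `compose` helper merely compares annotations before wiring two steps together.

```python
from typing import Callable, get_type_hints


def compose(first: Callable, second: Callable) -> Callable:
    """Chain two single-argument steps if their declared types line up."""
    first_out = get_type_hints(first)["return"]
    second_hints = get_type_hints(second)
    # The first non-return annotation is taken as the input type of `second`.
    second_in = next(hint for name, hint in second_hints.items() if name != "return")
    if first_out != second_in:
        raise TypeError(
            f"{first.__name__} returns {first_out}, "
            f"but {second.__name__} expects {second_in}"
        )

    def combined(value):
        return second(first(value))

    return combined


def split_lines(text: str) -> list[str]:
    return text.splitlines()


def count_items(items: list[str]) -> int:
    return len(items)


def shout(text: str) -> str:
    return text.upper()


count_lines = compose(split_lines, count_items)
line_count = count_lines("a\nb\nc")
```

Attempting `compose(shout, count_items)` raises a `TypeError`, because `shout` produces `str` while `count_items` expects `list[str]`; the mismatch is caught when the steps are combined, not halfway through a run.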

Catalog of common processing steps

Omnipy is built from the ground up to be modular. We aim to provide a catalog of commonly useful functionality, including:

data import from REST API endpoints, common flat file formats, database dumps, etc.
flattening of complex, nested JSON structures
standardization of relational tabular data (i.e. removing redundancy)
mapping of tabular data between schemas
lookup and mapping of ontology terms
semi-automatic data cleaning (through e.g. OpenRefine)
support for common data manipulation software and libraries, such as Pandas, R, Frictionless, etc.

In particular, we will provide a FAIRtracks module that contains data models and processing steps to transform metadata to follow the FAIRtracks standard.
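As one example of the processing steps listed above, flattening of nested JSON can be sketched as follows. This is a generic standard-library illustration, not the implementation shipped with Omnipy; the `flatten` function and its dotted-key naming scheme are assumptions.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into a single dict with path-like keys.

    Dict keys are joined with "." and list positions are written as "[i]".
    Empty dicts and lists contribute no keys at all.
    """
    flat: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, child_prefix))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Leaf value (string, number, boolean or None).
        flat[prefix] = value
    return flat


nested = {"sample": {"id": "S1", "tags": ["a", "b"]}, "ok": True}
flat_record = flatten(nested)
```

The resulting flat record has one column per leaf value, which is a convenient starting point for mapping the data into a tabular schema.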

Figure: Catalog of commonly useful processing steps, data modules and tool integrations

Refine and apply templates

An Omnipy module typically consists of a set of generic task and flow templates with related data models, (de)serializers, and utility functions. The user can then pick task and flow templates from this extensible, modular catalog, further refine them in the context of a custom, use case-specific flow, and apply them to the desired compute engine to carry out the transformations needed to wrangle data into the required shape.

Rerun only when needed

When piecing together a custom flow in Omnipy, the user has persistent access to the state of the data at every step of the process. Persistent intermediate data allows for caching of tasks based on the input data and parameters. Hence, if the input data and parameters of a task do not change between runs, the task is not rerun. This is particularly useful for importing from REST API endpoints, as a flow can be continuously rerun without taxing the remote server; data import will only be carried out in the initial iteration or when the REST API signals that the data has changed.
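The caching behaviour described above can be approximated with a small sketch that keys results on a hash of the input data and parameters. It is a simplified illustration rather than Omnipy's own mechanism: the `CachedTask` class is hypothetical, the cache lives in memory instead of being persisted, and inputs are assumed to be JSON-serializable.

```python
import hashlib
import json
from typing import Any, Callable


class CachedTask:
    """Wrap a function so it only reruns when its data or parameters change."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        self.cache: dict[str, Any] = {}
        self.run_count = 0  # Number of times the wrapped function actually ran.

    def _key(self, data: Any, params: dict[str, Any]) -> str:
        # Serialize with sorted keys so equal inputs always give the same hash.
        payload = json.dumps({"data": data, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def __call__(self, data: Any, **params: Any) -> Any:
        key = self._key(data, params)
        if key not in self.cache:
            self.run_count += 1
            self.cache[key] = self.func(data, **params)
        return self.cache[key]


def scale(values: list[int], factor: int = 1) -> list[int]:
    return [value * factor for value in values]


task = CachedTask(scale)
first = task([1, 2, 3], factor=2)   # Runs the function.
second = task([1, 2, 3], factor=2)  # Same inputs: served from the cache.
third = task([1, 2, 3], factor=3)   # Parameter changed: runs again.
```

After these three calls, `task.run_count` is 2, since the second call found an identical input hash and skipped the work.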

Scale up with external compute resources

In the case of large datasets, the researcher can set up a flow based on a representative sample of the full dataset, in a size that is suited for running locally on, say, a laptop. Once the flow has produced the correct output on the sample data, the operation can be seamlessly scaled up to the full dataset and sent off in software containers to run on external compute resources, using e.g. Kubernetes. Such offloaded flows can be easily monitored using a web GUI.

Figure: Working with Omnipy directly from an Integrated Development Environment (IDE)

Industry-standard ETL backbone

Offloading of flows to external compute resources is provided by the integration of Omnipy with a workflow engine based on the Prefect Python package. Prefect is an industry-leading platform for dataflow automation and orchestration that brings a series of powerful features to Omnipy:

Predefined integrations with a range of compute infrastructure solutions
Predefined integration with various services to support extraction, transformation, and loading (ETL) of data and metadata
Code as workflow ("If Python can write it, Prefect can run it")
Dynamic workflows: no predefined Directed Acyclic Graphs (DAGs) needed!
Command line and web GUI-based visibility and control of jobs
Trigger jobs from external events such as GitHub commits, file uploads, etc.
Define continuously running workflows that still respond to external events
Run tasks concurrently through support for asynchronous tasks
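Two of the features above, code-defined workflows and concurrent asynchronous tasks, can be illustrated with the standard library's `asyncio`. This sketch deliberately does not use the Prefect or Omnipy APIs; the `fetch` and `flow` functions, the source names and the delays are placeholders.

```python
import asyncio


async def fetch(source: str, delay: float) -> str:
    """Stand-in for an I/O-bound task such as a REST API call."""
    await asyncio.sleep(delay)
    return f"{source}: done"


async def flow(sources: dict[str, float]) -> list[str]:
    # The workflow is ordinary Python: which tasks run is decided at run time
    # from the input, with no graph declared up front.
    tasks = [fetch(name, delay) for name, delay in sources.items()]
    # gather() runs the tasks concurrently and returns results in input order.
    return await asyncio.gather(*tasks)


results = asyncio.run(flow({"a": 0.02, "b": 0.01}))
```

Although task "b" finishes first, `results` keeps the order in which the tasks were submitted.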

Figure: Overview of the compute and storage infrastructure integrations that come built in with Prefect

Pluggable workflow engines

It is also possible to integrate Omnipy with other workflow backends by implementing new workflow engine plugins. This is relatively easy to do, as the core architecture of Omnipy allows the user to easily switch the workflow engine at runtime. Omnipy supports both traditional DAG-based and the more avant-garde code-based definition of flows. Two workflow engines are currently supported: local and prefect.
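The plugin idea can be sketched with a small interface that every engine implements, plus a runtime object that looks up the selected engine by name. All names here (`Engine`, `LocalEngine`, `RecordingEngine`, `Runtime`) are hypothetical and do not reflect Omnipy's actual classes or configuration; the second engine is a stand-in for a real backend such as Prefect.

```python
from typing import Callable, Protocol


class Engine(Protocol):
    """Interface a workflow engine plugin is assumed to implement."""

    def run(self, task: Callable[[int], int], value: int) -> int: ...


class LocalEngine:
    """Runs the task directly in the current process."""

    def run(self, task: Callable[[int], int], value: int) -> int:
        return task(value)


class RecordingEngine:
    """Stand-in for an external backend: records each task before running it."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def run(self, task: Callable[[int], int], value: int) -> int:
        self.log.append(task.__name__)
        return task(value)


class Runtime:
    """Dispatches tasks to whichever engine is currently selected."""

    def __init__(self, engines: dict[str, Engine], default: str) -> None:
        self.engines = engines
        self.engine_name = default

    def run(self, task: Callable[[int], int], value: int) -> int:
        return self.engines[self.engine_name].run(task, value)


def double(value: int) -> int:
    return value * 2


recording = RecordingEngine()
runtime = Runtime({"local": LocalEngine(), "recording": recording}, default="local")
local_result = runtime.run(double, 21)

runtime.engine_name = "recording"  # Switch engine at runtime; the task is unchanged.
recorded_result = runtime.run(double, 21)
```

Because tasks only depend on the `Engine` interface, adding a new backend means writing one more class with a `run` method and registering it under a name.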

omnipy 0.13.0
