Data transformation framework for LinkML data models

Project description

Koza: a data transformation framework

Disclaimer: Koza is in beta, and we are looking for testers!

Overview

  • Transform csv, json, yaml, jsonl, and xml files into a target csv, json, or jsonl format based on your dataclass model
  • Output data in the KGX format
  • Write data transforms in semi-declarative Python
  • Configure source files, expected columns/json properties and path filters, field filters, and metadata in yaml
  • Create or import mapping files to be used in ingests (e.g. ID mappings, type mappings)
  • Create and use translation tables to map between source and target vocabularies
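Koza's own transforms are written against its Python API and YAML configs (see the documentation). To illustrate the underlying idea of mapping source rows onto a dataclass model, applying an ID mapping, and emitting JSON Lines, here is a minimal stdlib-only sketch. It does not use Koza; `ProteinLink`, `transform_rows`, `to_jsonl`, and the column names are assumptions made for the example.

```python
import csv
import json
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List


@dataclass
class ProteinLink:
    """Hypothetical target model standing in for your own dataclass model."""

    subject: str
    object: str
    combined_score: int


def transform_rows(lines: Iterable[str], id_map: Dict[str, str]) -> List[ProteinLink]:
    """Turn space-delimited source rows into ProteinLink records.

    IDs found in id_map are replaced; unmapped IDs pass through unchanged.
    """
    records = []
    for row in csv.DictReader(lines, delimiter=" "):
        records.append(
            ProteinLink(
                subject=id_map.get(row["protein1"], row["protein1"]),
                object=id_map.get(row["protein2"], row["protein2"]),
                combined_score=int(row["combined_score"]),
            )
        )
    return records


def to_jsonl(records: Iterable[ProteinLink]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(asdict(record)) + "\n" for record in records)


if __name__ == "__main__":
    # Toy input for illustration only.
    source = [
        "protein1 protein2 combined_score",
        "P1 P2 900",
    ]
    print(to_jsonl(transform_rows(source, {"P1": "EX:P1"})), end="")
```

Keeping the model a plain dataclass means `asdict` handles serialization, so switching the output to csv or json only changes `to_jsonl`.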

Installation

Koza is available on PyPI and can be installed via pip or pipx:

[pip|pipx] install koza

Usage

NOTE: As of version 0.2.0, there is a new method for getting your ingest's KozaApp instance. Please see the updated documentation for details.

See the Koza documentation for usage information.

Try the Examples

Validate

Give Koza a local or remote csv or tsv file to get some basic information about it (headers, number of rows):

koza validate \
  --file https://raw.githubusercontent.com/monarch-initiative/koza/main/examples/data/string.tsv \
  --delimiter ' '
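For a sense of what this kind of check involves, here is a rough stdlib-only sketch that opens a local path or URL and reports the header and row count of a delimited file. It is not Koza's implementation; `open_text` and `summarize_delimited` are hypothetical helpers, and blank lines are assumed not to count as rows.

```python
import csv
import io
import urllib.request
from typing import Iterable, List, TextIO, Tuple


def open_text(location: str) -> TextIO:
    """Open a local path or an http(s) URL as UTF-8 text."""
    if location.startswith(("http://", "https://")):
        # urlopen returns a binary response; wrap it to read decoded text.
        return io.TextIOWrapper(urllib.request.urlopen(location), encoding="utf-8")
    return open(location, encoding="utf-8", newline="")


def summarize_delimited(lines: Iterable[str], delimiter: str = "\t") -> Tuple[List[str], int]:
    """Return the header fields and the number of non-blank data rows."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, [])
    # csv.reader yields [] for blank lines; skip them when counting.
    row_count = sum(1 for row in reader if row)
    return header, row_count
```

For example, opening the STRING example URL with `open_text(...)` and passing the handle to `summarize_delimited(fh, delimiter=" ")` would return its column names and the number of data rows.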

Passing a json or jsonl file confirms whether it is valid json or jsonl:

koza validate \
  --file ./examples/data/ZFIN_PHENOTYPE_0.jsonl.gz \
  --format jsonl
koza validate \
  --file ./examples/data/ddpheno.json.gz \
  --format json
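The difference between the two formats is that json must parse as one document, while jsonl must parse line by line. A minimal stdlib sketch of that distinction follows, including transparent reading of `.gz` files like the examples above; the helper names are hypothetical and this is not how Koza itself validates.

```python
import gzip
import json
from typing import Iterable, TextIO


def open_maybe_gzip(path: str) -> TextIO:
    """Open a plain or gzip-compressed (.gz) file as UTF-8 text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def is_valid_json(text: str) -> bool:
    """True if the whole text parses as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def is_valid_jsonl(lines: Iterable[str]) -> bool:
    """True if every non-blank line parses as its own JSON value."""
    for line in lines:
        if line.strip() and not is_valid_json(line):
            return False
    return True
```

Usage: `with open_maybe_gzip("./examples/data/ZFIN_PHENOTYPE_0.jsonl.gz") as fh: is_valid_jsonl(fh)` for jsonl, or `is_valid_json(fh.read())` for a single json document.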

Transform

Run the example ingest "string/protein-links-detailed" and its declarative counterpart "string-declarative/protein-links-detailed":

koza transform \
  --source examples/string/protein-links-detailed.yaml \
  --global-table examples/translation_table.yaml

koza transform \
  --source examples/string-declarative/protein-links-detailed.yaml \
  --global-table examples/translation_table.yaml
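The `--global-table` option points Koza at a shared YAML translation table. As a hedged illustration of the general idea only (not Koza's implementation or its YAML schema), the lookup below consults an ingest-specific local table before the global one. The function name, the local-over-global precedence, and the `EX:` identifiers are assumptions.

```python
from typing import Mapping, Optional


def translate(
    term: str,
    global_table: Mapping[str, str],
    local_table: Optional[Mapping[str, str]] = None,
) -> str:
    """Map a source term to a target identifier.

    A per-ingest local table is consulted first, then the shared global table.
    Raises KeyError if neither table knows the term.
    """
    if local_table and term in local_table:
        return local_table[term]
    if term in global_table:
        return global_table[term]
    raise KeyError(f"{term!r} is not in the local or global translation table")
```

Raising on unknown terms, rather than returning the input unchanged, makes vocabulary gaps visible during an ingest instead of silently leaking source terms into the output.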

Note: Koza expects a directory structure like that of the examples above,
with the source config file and the transform code in the same directory:

.
├── ...
│   ├── your_source
│   │   ├── your_ingest.yaml
│   │   └── your_ingest.py
│   └── some_translation_table.yaml
└── ...
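A quick pre-flight check of this layout can be sketched with `pathlib`. The sketch assumes the naming shown in the tree (`your_ingest.yaml` beside `your_ingest.py`); `expected_transform_path` and `check_layout` are hypothetical helpers, not part of Koza.

```python
from pathlib import Path


def expected_transform_path(config: Path) -> Path:
    """Return the transform module assumed to sit beside an ingest config.

    Follows the naming shown in the tree: your_ingest.yaml -> your_ingest.py.
    """
    return config.with_suffix(".py")


def check_layout(config: Path) -> Path:
    """Raise FileNotFoundError unless the config and its transform code both exist."""
    for path in (config, expected_transform_path(config)):
        if not path.is_file():
            raise FileNotFoundError(f"missing {path}")
    return expected_transform_path(config)
```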


Download files

Source Distribution

koza-2.1.1.tar.gz (30.4 kB)

Uploaded Source

Built Distribution

koza-2.1.1-py3-none-any.whl (41.3 kB)

Uploaded Python 3

File details

Details for the file koza-2.1.1.tar.gz.

File metadata

  • Download URL: koza-2.1.1.tar.gz
  • Size: 30.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.10.19 Linux/6.11.0-1018-azure

File hashes

Hashes for koza-2.1.1.tar.gz
Algorithm Hash digest
SHA256 8dba8d316fb756873a48d66d9085dcd4f6e4c5b8476fa41b9dbc5a607d360497
MD5 8c72a7132bc5c0a145c15786b94f88f5
BLAKE2b-256 815954cc59f120734cdab30a1f4e10ad718e3e73b4e609aeb7141a0683eaaaf8

File details

Details for the file koza-2.1.1-py3-none-any.whl.

File metadata

  • Download URL: koza-2.1.1-py3-none-any.whl
  • Size: 41.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.10.19 Linux/6.11.0-1018-azure

File hashes

Hashes for koza-2.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5348c1ebe2f5dde8287ec67220b8775ff52fb758f521e4f36afb8bd7bad585b2
MD5 838a6ba1b5f3066fa61e3927d11530a9
BLAKE2b-256 a5f7625cc873af50cf79f205004d8d484eb7fa6d5a3599f58630eabd82f3f40d
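To check a downloaded distribution against the SHA256 digests listed above, a short stdlib-only sketch follows. `verify_download` and `EXPECTED_SHA256` are hypothetical names; the digest values are copied from this page.

```python
import hashlib
from pathlib import Path
from typing import Iterable

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "koza-2.1.1.tar.gz": "8dba8d316fb756873a48d66d9085dcd4f6e4c5b8476fa41b9dbc5a607d360497",
    "koza-2.1.1-py3-none-any.whl": "5348c1ebe2f5dde8287ec67220b8775ff52fb758f521e4f36afb8bd7bad585b2",
}


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex SHA256 digest of a stream of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, chunk_size: int = 1 << 16) -> bool:
    """Compare a downloaded file's SHA256 against the published digest for its name."""
    expected = EXPECTED_SHA256[path.name]
    with open(path, "rb") as fh:
        # iter(callable, sentinel) reads fixed-size chunks until EOF (b"").
        actual = sha256_of_chunks(iter(lambda: fh.read(chunk_size), b""))
    return actual == expected
```

pip can also enforce digests on its own through hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).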
