Data transformation framework for LinkML data models

Koza - a data transformation framework

Documentation: https://koza.monarchinitiative.org/

Disclaimer: Koza is in beta; we are looking for beta testers

Overview

  • Transform csv, json, yaml, jsonl, and xml files, converting them to a target csv, json, or jsonl format based on your dataclass model
  • Koza can also output data in the KGX format
  • Write data transforms in semi-declarative Python
  • Configure source files, expected columns/json properties and path filters, field filters, and metadata in yaml
  • Create or import mapping files to be used in ingests (e.g. id mappings, type mappings)
  • Create and use translation tables to map between source and target vocabularies
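
As a rough illustration of the last two bullets, a translation table can be thought of as a lookup from source terms to target terms, applied while building records from a dataclass model. The sketch below uses only the Python standard library and is not Koza's API: the `Association` class, the `translate` helper, the column names `protein1`/`protein2`, and the `EX:` identifier are all hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass

# Hypothetical translation table: source vocabulary term -> target vocabulary term.
TRANSLATION_TABLE = {"interacts with": "EX:0000001"}


@dataclass
class Association:
    """Hypothetical target model for one output record."""

    subject: str
    predicate: str
    object: str


def translate(term: str, table: dict[str, str]) -> str:
    """Return the target term for a source term, failing loudly if it is unknown."""
    try:
        return table[term]
    except KeyError:
        raise ValueError(f"No translation for source term: {term!r}") from None


def row_to_jsonl(row: dict[str, str], table: dict[str, str]) -> str:
    """Build one record from a source row and serialize it as a single JSON line."""
    record = Association(
        subject=row["protein1"],  # assumed source column name
        predicate=translate("interacts with", table),
        object=row["protein2"],  # assumed source column name
    )
    return json.dumps(asdict(record))


print(row_to_jsonl({"protein1": "A", "protein2": "B"}, TRANSLATION_TABLE))
```

Raising on an unknown term, rather than passing it through silently, is one possible design choice; it surfaces gaps in the table early in an ingest.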

Installation

Koza is available on PyPI and can be installed via pip:

pip install koza

Usage

NOTE: As of version 0.2.0, there is a new method for getting your ingest's KozaApp instance. Please see the updated documentation for details.

See the Koza documentation (https://koza.monarchinitiative.org/) for usage information.

Try the Examples

Validate

Give Koza a local or remote csv file, and get some basic information (headers, number of rows)

koza validate \
  --file https://raw.githubusercontent.com/monarch-initiative/koza/main/examples/data/string.tsv \
  --delimiter ' '
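
For readers who want to see what "headers and number of rows" amounts to, here is a minimal sketch of the same kind of summary using Python's standard `csv` module. It is not how Koza implements `validate`; the function name `summarize_delimited` and the sample data are assumptions made for illustration.

```python
import csv
import io


def summarize_delimited(text: str, delimiter: str = ",") -> dict:
    """Return the header row and the number of non-empty data rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    headers = next(reader, [])  # first line is treated as the header
    row_count = sum(1 for row in reader if row)  # skip blank lines
    return {"headers": headers, "rows": row_count}


# Hypothetical space-delimited sample, matching the --delimiter ' ' used above.
sample = "protein1 protein2 combined_score\nA B 900\nC D 150\n"
print(summarize_delimited(sample, delimiter=" "))
# {'headers': ['protein1', 'protein2', 'combined_score'], 'rows': 2}
```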

Sending a json or jsonl formatted file will confirm if the file is valid json or jsonl

koza validate \
  --file ./examples/data/ZFIN_PHENOTYPE_0.jsonl.gz \
  --format jsonl

koza validate \
  --file ./examples/data/ddpheno.json.gz \
  --format json \
  --compression gzip
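
The difference between the two formats is that a json file must parse as one document, while a jsonl file must parse line by line. The sketch below shows that check with the standard `json` and `gzip` modules. It is an illustration only, not Koza's implementation, and the `is_valid` function and its parameters are hypothetical.

```python
import gzip
import json
from typing import Optional


def is_valid(data: bytes, fmt: str = "json", compression: Optional[str] = None) -> bool:
    """Check whether the bytes parse as json (one document) or jsonl (one document per line)."""
    if compression == "gzip":
        data = gzip.decompress(data)
    text = data.decode("utf-8")
    try:
        if fmt == "jsonl":
            for line in text.splitlines():
                if line.strip():  # ignore blank lines
                    json.loads(line)
        else:
            json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


lines = b'{"a": 1}\n{"b": 2}\n'
print(is_valid(gzip.compress(lines), fmt="jsonl", compression="gzip"))
print(is_valid(lines, fmt="json"))
```

The second call reports the content as invalid json because two documents on separate lines do not form a single json value.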

Transform

Run the example ingest, "string/protein-links-detailed":

koza transform --source examples/string/protein-links-detailed.yaml --global-table examples/translation_table.yaml

koza transform --source examples/string-declarative/protein-links-detailed.yaml --global-table examples/translation_table.yaml

Note: Koza expects a directory structure as described in the above example (examples/ingest_name/ingest.yaml).
