
Data transformation framework for LinkML data models

Project description

Koza - Knowledge Graph Transformation and Operations Toolkit

Documentation

Overview

Koza is a Python library and CLI tool for transforming biomedical data and performing graph operations on Knowledge Graph Exchange (KGX) files. It provides two main capabilities:

📊 Graph Operations (New!)

Powerful DuckDB-based operations for KGX knowledge graphs:

  • Join multiple KGX files with schema harmonization
  • Split files by field values with format conversion
  • Prune dangling edges and handle singleton nodes
  • Append new data to existing databases with schema evolution
  • Multi-format support for TSV, JSONL, and Parquet files
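To make the pruning idea concrete, here is a minimal plain-Python sketch of what "dangling edges" and "singleton nodes" mean for a KGX graph. It is not Koza's DuckDB implementation or its CLI; the function name `prune_graph` and the sample identifiers are made up for illustration. Only the KGX field names (`id`, `subject`, `predicate`, `object`) are taken from the format itself.

```python
# Illustrative stand-in only: Koza performs these operations in DuckDB.
# The identifiers below are placeholders, not real records.

def prune_graph(nodes, edges):
    """Separate dangling edges and report singleton nodes.

    nodes: list of dicts with an "id" key (KGX node records).
    edges: list of dicts with "subject" and "object" keys (KGX edge records).
    """
    node_ids = {n["id"] for n in nodes}

    kept, dangling = [], []
    for edge in edges:
        # An edge is dangling when either endpoint is absent from the node set.
        if edge["subject"] in node_ids and edge["object"] in node_ids:
            kept.append(edge)
        else:
            dangling.append(edge)

    # Singleton nodes are nodes that no kept edge refers to.
    connected = {e["subject"] for e in kept} | {e["object"] for e in kept}
    singletons = [n for n in nodes if n["id"] not in connected]
    return kept, dangling, singletons


nodes = [
    {"id": "HGNC:1", "category": "biolink:Gene"},
    {"id": "MONDO:1", "category": "biolink:Disease"},
    {"id": "HGNC:2", "category": "biolink:Gene"},
]
edges = [
    {"subject": "HGNC:1", "predicate": "biolink:related_to", "object": "MONDO:1"},
    {"subject": "HGNC:1", "predicate": "biolink:related_to", "object": "MONDO:999"},
]

kept, dangling, singletons = prune_graph(nodes, edges)
print(len(kept), len(dangling), [n["id"] for n in singletons])
```

The dangling edges are returned rather than discarded, so the caller decides whether to archive or drop them.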

🔄 Data Transformation (Core)

Transform biomedical data sources into KGX format:

  • Transform CSV, JSON, YAML, JSONL, and XML sources into target formats
  • Output in KGX format
  • Write data transforms in semi-declarative Python
  • Configure source files, columns/properties, and metadata in YAML
  • Create mapping files and translation tables between vocabularies
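The shape of such a transform can be sketched with the standard library alone. The example below deliberately does not use Koza's API; the source columns (`gene_id`, `gene_symbol`, `disease_id`), the `transform_row` helper, and the sample rows are all hypothetical. It only shows the general pattern of mapping one source record to a KGX-style node and edge.

```python
import csv
import io

# Hypothetical source data; the column names are invented for this sketch.
SOURCE = """gene_id,gene_symbol,disease_id
HGNC:1,GENE1,MONDO:1
HGNC:2,GENE2,MONDO:1
"""


def transform_row(row):
    """Map one source row to a KGX-style node dict and edge dict."""
    node = {
        "id": row["gene_id"],
        "name": row["gene_symbol"],
        "category": "biolink:Gene",
    }
    edge = {
        "subject": row["gene_id"],
        "predicate": "biolink:related_to",
        "object": row["disease_id"],
    }
    return node, edge


nodes, edges = [], []
for row in csv.DictReader(io.StringIO(SOURCE)):
    node, edge = transform_row(row)
    nodes.append(node)
    edges.append(edge)

print(nodes)
print(edges)
```

In a real Koza project the reader configuration would live in YAML and the output would be written by Koza's own writers; see the Koza documentation for the actual interface.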

Installation

Koza is available on PyPI and can be installed with pip or pipx:

[pip|pipx] install koza

Usage

See the Koza documentation for complete usage information.

Key Features

🔧 Multi-Format Support

  • Native support for TSV, JSONL, and Parquet KGX files
  • Automatic format detection and conversion
  • Mixed-format operations in single commands
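As a rough illustration of format detection, the sketch below maps a file suffix to one of the three formats named above. The suffix table and the `detect_format` helper are assumptions for this example; Koza's own detection logic may use different rules.

```python
from pathlib import Path

# Assumed suffix-to-format mapping; Koza's own detection may differ.
FORMATS = {".tsv": "tsv", ".jsonl": "jsonl", ".parquet": "parquet"}


def detect_format(path):
    """Return the format name implied by the file suffix (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unrecognized KGX file format: {path}")
    return FORMATS[suffix]


for name in ["graph_nodes.tsv", "graph_edges.jsonl", "graph_nodes.PARQUET"]:
    print(name, "->", detect_format(name))
```

Raising on an unknown suffix, rather than guessing, keeps a mixed-format command from silently misreading a file.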

🛡️ Schema Flexibility

  • Automatic schema harmonization across heterogeneous files
  • Schema evolution with backward compatibility
  • Comprehensive schema reporting and validation
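One way to picture schema harmonization is as a union of columns across files, with missing values filled in. The following sketch works on in-memory rows and is only an analogy for what Koza does in DuckDB; the `harmonize` function and the sample records are hypothetical.

```python
def harmonize(*tables):
    """Give every row the union of all columns, filling gaps with None.

    Each table is a list of dict rows. Column order follows first appearance.
    """
    columns = []
    for rows in tables:
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)

    merged = [
        {col: row.get(col) for col in columns}
        for rows in tables
        for row in rows
    ]
    return columns, merged


# Two node files whose columns only partly overlap (placeholder data).
file_a = [{"id": "HGNC:1", "category": "biolink:Gene", "name": "GENE1"}]
file_b = [{"id": "HGNC:2", "category": "biolink:Gene", "in_taxon": "NCBITaxon:9606"}]

columns, merged = harmonize(file_a, file_b)
print(columns)
print(merged)
```

Because new columns are only ever appended, rows written under an older, narrower schema remain valid after the merge, which is the intuition behind backward-compatible schema evolution.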

⚡ High Performance

  • DuckDB-powered operations for fast bulk processing
  • Memory-efficient handling of large knowledge graphs
  • Parallel processing and streaming where possible

🔍 Rich CLI Experience

  • Progress indicators for long-running operations
  • Detailed statistics and operation summaries
  • Dry-run modes for safe operation preview

🧹 Data Integrity

  • Dangling edge detection and preservation
  • Duplicate detection and removal strategies
  • Non-destructive operations with data archiving
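A duplicate-removal strategy can be sketched as follows. This is a plain-Python illustration under assumed names (`deduplicate`, the `keep` parameter), not Koza's implementation. It returns the removed rows alongside the unique ones, mirroring the non-destructive approach described above.

```python
def deduplicate(rows, key="id", keep="first"):
    """Split rows into (unique, removed) by a key field.

    keep="first" retains the earliest row for each key; keep="last" the latest.
    Removed rows are returned, not discarded, so they can be archived.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")

    ordered = rows if keep == "first" else list(reversed(rows))
    seen, unique, removed = set(), [], []
    for row in ordered:
        if row[key] in seen:
            removed.append(row)
        else:
            seen.add(row[key])
            unique.append(row)

    if keep == "last":
        unique.reverse()  # restore original relative order
    return unique, removed


rows = [
    {"id": "A", "name": "old"},
    {"id": "B", "name": "b"},
    {"id": "A", "name": "new"},
]

first_unique, first_removed = deduplicate(rows, keep="first")
last_unique, last_removed = deduplicate(rows, keep="last")
print(first_unique, first_removed)
print(last_unique, last_removed)
```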

Download files

Source Distribution

  • koza-2.6.0.tar.gz (432.9 kB)

Built Distribution

  • koza-2.6.0-py3-none-any.whl (166.9 kB)

File details

Details for the file koza-2.6.0.tar.gz.

File metadata

  • Download URL: koza-2.6.0.tar.gz
  • Upload date:
  • Size: 432.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.22

File hashes

Hashes for koza-2.6.0.tar.gz

Algorithm     Hash digest
SHA256        fb270938de295bbf01a59b976a43719f8947ace9ece697ff7decf067d82a8318
MD5           4614dc8af37355ffab5d8a36222bda31
BLAKE2b-256   26d2fe686cb6602a2abb358073b72e2256aa42c1cc5570a4df3c8dfd9408983a
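
A downloaded file can be checked against its published SHA256 digest with Python's standard `hashlib` module. The sketch below is one way to do that. The helper names `sha256_of` and `verify` are made up for this example, and the local file path is whatever you saved the download as.

```python
import hashlib

# SHA256 digest listed above for koza-2.6.0.tar.gz.
EXPECTED_SHA256 = "fb270938de295bbf01a59b976a43719f8947ace9ece697ff7decf067d82a8318"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("koza-2.6.0.tar.gz")` returns `True` only if the local file matches the digest shown above. pip can also enforce hashes itself through its hash-checking mode.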

File details

Details for the file koza-2.6.0-py3-none-any.whl.

File metadata

  • Download URL: koza-2.6.0-py3-none-any.whl
  • Upload date:
  • Size: 166.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.22

File hashes

Hashes for koza-2.6.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        f07e208badc1a515bc32407bdd5e8086607f65239c647294dea573d0429ce913
MD5           46b4dbdded52f0997507aa8094a4b9d3
BLAKE2b-256   9e751e015ee7ba9c252d246a93c55572cef0ac4f47fc6501c0d2fb147504b225
