
kirin

Version controlled data storage with cloud support!

Made with ❤️ by Eric J. Ma (@ericmjl).

Features

  • 📦 Linear versioning for datasets - Simple, Git-like commits without branching complexity
  • 🔗 Content-addressed storage - Files stored by content hash for integrity and deduplication
  • ☁️ Cloud storage support - S3, GCS, Azure, Minio, Backblaze B2, etc.
  • 🔄 Automatic filesystem detection from URIs
  • 🔐 Easy authentication helpers
  • 🚀 Simple, intuitive API - Focus on ergonomic Python interface
  • 📊 File versioning - Track changes to individual files over time
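The content-addressed storage idea can be sketched in a few lines. Each file's bytes are hashed, and the hash becomes the storage key, so identical content is stored once and corruption shows up as a hash mismatch on read. This is a minimal in-memory illustration that assumes SHA-256 as the hash. `ContentStore` is a hypothetical name, not part of kirin's API, and kirin's actual hash function and on-disk layout may differ.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory content-addressed store (illustration only)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The SHA-256 hex digest of the bytes is the storage key
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-hashing on read detects corrupted content (integrity check)
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"corrupted blob {digest}")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
a = store.put(b"x,y\n1,2\n")
b = store.put(b"x,y\n1,2\n")  # same bytes, e.g. an unchanged file in a later commit
print(a == b, len(store))
```

Because the key is derived from the content, committing an unchanged file again costs no extra storage.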

Quick Start

from kirin import Catalog, Dataset, File, Commit
from pathlib import Path

# Create a catalog (works with local and cloud storage); pick ONE of these
catalog = Catalog(root_dir="/path/to/data")  # Local storage
# catalog = Catalog(root_dir="s3://my-bucket")  # S3 storage
# catalog = Catalog(root_dir="gs://my-bucket")  # GCS storage

# Get or create a dataset
ds = catalog.get_dataset("my_dataset")

# Commit files (paths refer to existing files on local disk)
commit_hash = ds.commit(message="Initial commit", add_files=["file1.csv"])

# Checkout the latest commit
ds.checkout()

# Access files from current commit
files = ds.files
print(f"Files in current commit: {list(files.keys())}")

# Work with files locally
with ds.local_files() as local_files:
    # Access files as local paths
    csv_path = local_files["file1.csv"]
    content = Path(csv_path).read_text()  # text mode
    binary_content = Path(csv_path).read_bytes()  # binary mode

# Checkout a specific commit
ds.checkout(commit_hash)

# Get commit history
history = ds.history(limit=10)
for commit in history:
    print(f"{commit.short_hash}: {commit.message}")
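Because versioning is linear, each commit can be modeled as pointing to at most one parent, and a history query becomes a walk back from the latest commit. The sketch below assumes that model. `CommitRecord` and `walk_history` are hypothetical names, not kirin's classes; they only mirror the behavior of `ds.history(limit=10)` conceptually.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommitRecord:
    """Hypothetical commit node: at most one parent, no branches."""

    commit_hash: str
    message: str
    parent: Optional[str]  # None for the first commit

    @property
    def short_hash(self) -> str:
        return self.commit_hash[:8]


def walk_history(
    commits: dict[str, CommitRecord], head: str, limit: Optional[int] = None
) -> list[CommitRecord]:
    """Return commits newest-first by following parent links from head."""
    out: list[CommitRecord] = []
    current: Optional[str] = head
    while current is not None and (limit is None or len(out) < limit):
        commit = commits[current]
        out.append(commit)
        current = commit.parent
    return out


# Three commits in a straight line: c1 <- c2 <- c3 (placeholder hashes)
c1 = CommitRecord("a1" * 20, "Initial commit", None)
c2 = CommitRecord("b2" * 20, "Add file2.csv", c1.commit_hash)
c3 = CommitRecord("c3" * 20, "Update file1.csv", c2.commit_hash)
commits = {c.commit_hash: c for c in (c1, c2, c3)}

for commit in walk_history(commits, head=c3.commit_hash, limit=2):
    print(f"{commit.short_hash}: {commit.message}")
```

With no branches there is no merge logic to handle, and a `limit` keeps the walk proportional to the number of commits shown rather than the length of the whole history.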

Cloud Authentication

If you get authentication errors, see the Cloud Storage Authentication Guide or use helper functions:

from kirin import Catalog, get_gcs_filesystem

# GCS with service account
fs = get_gcs_filesystem(token='/path/to/key.json')
catalog = Catalog(root_dir="gs://my-bucket", fs=fs)
ds = catalog.get_dataset("my_dataset")
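The automatic filesystem detection listed under Features comes down to reading the scheme of `root_dir`; passing `fs=` as above overrides it. A minimal sketch, assuming fsspec-style protocol names (`file`, `s3`, `gcs`, `abfs`). The mapping table and `detect_protocol` are hypothetical, not kirin's actual lookup.

```python
from urllib.parse import urlparse

# Hypothetical scheme -> protocol table (fsspec-style names assumed)
SCHEME_TO_PROTOCOL = {
    "": "file",  # plain POSIX paths like /path/to/data have no scheme
    "file": "file",
    "s3": "s3",
    "gs": "gcs",
    "gcs": "gcs",
    "az": "abfs",
    "abfs": "abfs",
}


def detect_protocol(root_dir: str) -> str:
    """Pick a storage protocol from the URI scheme of root_dir."""
    scheme = urlparse(root_dir).scheme.lower()
    try:
        return SCHEME_TO_PROTOCOL[scheme]
    except KeyError:
        raise ValueError(f"Unsupported storage URI: {root_dir!r}") from None


print(detect_protocol("gs://my-bucket"))  # gcs
```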

Advanced Usage

Working with Files

from kirin import Catalog
from pathlib import Path

# Create catalog and get dataset
catalog = Catalog(root_dir="s3://my-bucket")
ds = catalog.get_dataset("my_dataset")
ds.checkout()

# Work with files locally (recommended approach)
with ds.local_files() as local_files:
    # Access files as local paths
    csv_path = local_files["data.csv"]

    # Read file content
    content = Path(csv_path).read_text()

    # Get file info
    file_size = Path(csv_path).stat().st_size
    print(f"File size: {file_size} bytes")

    # Open as file handle
    with open(csv_path, "r") as f:
        data = f.read()
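`ds.local_files()` hands back ordinary local paths that are only valid inside the `with` block. One way such a context manager can work is to copy each file into a temporary directory and delete that directory on exit. The sketch below assumes that behavior and uses an in-memory `dict` of file contents in place of cloud storage; `local_copies` is a hypothetical helper, not kirin's implementation.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def local_copies(files: dict[str, bytes]) -> Iterator[dict[str, str]]:
    """Hypothetical stand-in: materialize file contents as temporary local paths."""
    with tempfile.TemporaryDirectory() as tmp:
        paths: dict[str, str] = {}
        for name, data in files.items():
            path = Path(tmp) / name
            path.parent.mkdir(parents=True, exist_ok=True)  # allow names like "raw/a.csv"
            path.write_bytes(data)
            paths[name] = str(path)
        yield paths
    # Leaving TemporaryDirectory deletes the copies, even if the caller raised


with local_copies({"data.csv": b"a,b\n1,2\n"}) as local_files:
    csv_path = local_files["data.csv"]
    print(Path(csv_path).read_text())

print(Path(csv_path).exists())  # False: paths are invalid after the block
```

Scoping the paths to a `with` block makes cleanup automatic, which is why code that keeps a path around after the block should copy the file elsewhere first.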

Working with Commits

from kirin import Catalog

# Create catalog and get dataset
catalog = Catalog(root_dir="gs://my-bucket")
ds = catalog.get_dataset("my_dataset")

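# Get a specific commit. commit_hash is a commit hash string,
# e.g. the value returned by ds.commit(...) in the Quick Start.
commit = ds.get_commit(commit_hash)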
if commit:
    print(f"Commit: {commit.short_hash}")
    print(f"Message: {commit.message}")
    print(f"Files: {commit.list_files()}")
    print(f"Total size: {commit.get_total_size()} bytes")
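A commit like the one inspected above can be thought of as a manifest mapping filenames to content hashes and sizes. The sketch below shows how a total size and a deterministic commit hash could be derived from such a manifest. The manifest layout and the exact bytes fed to SHA-256 are assumptions for illustration; kirin's real commit format is not described here.

```python
import hashlib
import json
from typing import Optional

# Hypothetical commit contents: filename -> bytes
contents = {"file1.csv": b"x,y\n1,2\n", "notes.txt": b"hello\n"}

# Manifest: filename -> (content hash, size in bytes)
manifest = {
    name: (hashlib.sha256(data).hexdigest(), len(data))
    for name, data in contents.items()
}


def total_size(manifest: dict[str, tuple[str, int]]) -> int:
    """Sum of file sizes, similar in spirit to commit.get_total_size()."""
    return sum(size for _, size in manifest.values())


def commit_hash(
    manifest: dict[str, tuple[str, int]], message: str, parent: Optional[str]
) -> str:
    """Deterministic hash over parent, message, and the name-sorted manifest."""
    payload = json.dumps(
        {"parent": parent, "message": message, "files": sorted(manifest.items())},
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


h = commit_hash(manifest, "Initial commit", None)
print(h[:8], total_size(manifest))
```

Sorting the manifest makes the hash independent of insertion order, and including the parent hash chains each commit to its predecessor.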

Local File Access

from kirin import Catalog
from pathlib import Path
import json
import pandas as pd

# Create catalog and get dataset
catalog = Catalog(root_dir="s3://my-bucket")
ds = catalog.get_dataset("my_dataset")
ds.checkout()

# Work with all files locally (recommended pattern)
with ds.local_files() as local_files:
    for filename, local_path in local_files.items():
        print(f"{filename} -> {local_path}")

        # Process files with standard Python libraries
        if filename.endswith(".csv"):
            df = pd.read_csv(local_path)
        elif filename.endswith(".txt"):
            text = Path(local_path).read_text()
        elif filename.endswith(".json"):
            data = json.loads(Path(local_path).read_text())
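The `if`/`elif` chain above works well for a few formats; with more, a lookup table keyed by file suffix keeps the loop flat. This sketch uses only the standard library (`csv` instead of pandas) so it runs without extra dependencies. `READERS` and `load` are illustrative names, and the `local_files` mapping here is a stand-in built in a temporary directory rather than a kirin dataset.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def read_csv_rows(path: Path) -> list[dict[str, str]]:
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


# Suffix -> reader function; new formats are registered in one place
READERS: dict[str, Callable[[Path], Any]] = {
    ".csv": read_csv_rows,
    ".txt": lambda p: p.read_text(),
    ".json": lambda p: json.loads(p.read_text()),
}


def load(local_path: str) -> Any:
    """Read a file with the reader registered for its suffix, or return None."""
    path = Path(local_path)
    reader = READERS.get(path.suffix.lower())
    return reader(path) if reader else None


# Stand-in for ds.local_files(): a dict of filename -> local path
with tempfile.TemporaryDirectory() as tmp:
    samples = {"a.csv": "x,y\n1,2\n", "b.json": '{"k": 1}', "c.bin": "?"}
    local_files = {}
    for name, text in samples.items():
        (Path(tmp) / name).write_text(text)
        local_files[name] = str(Path(tmp) / name)

    # Read everything while the temporary paths still exist
    loaded = {name: load(path) for name, path in local_files.items()}

print(loaded)
```

Unknown suffixes fall through to `None` instead of being silently skipped by the chain, which makes unhandled formats easy to spot.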

Documentation

Installation

Option 1: Pixi (Recommended for Development)

# Clone and install
git clone git@github.com:ericmjl/kirin
cd kirin
pixi install

# Set up SSL certificates for cloud storage (one-time setup)
pixi run setup-ssl

# Start the web UI
pixi run python -m kirin.web.app

Option 2: UV Tool (Recommended for Production)

# Install with uv
uv tool install kirin

# Set up SSL certificates (one-time setup)
uv run python -m kirin.setup_ssl

# Start the web UI
uv run kirin ui

Option 3: UVX (One-time Use)

# Run directly with uvx
uvx kirin ui

# If SSL issues occur, set up certificates
uvx python -m kirin.setup_ssl

Option 4: System Python

# Install with pip
pip install kirin

# No SSL setup needed - uses system certificates
kirin ui

Get started for development

To get started:

git clone git@github.com:ericmjl/kirin
cd kirin
pixi install

Development Commands

Once installed, you can use these common development commands:

# Run tests
pixi run -e tests pytest

# Run tests for a specific file
pixi run -e tests pytest tests/test_filename.py

# Run all tests with verbose output
pixi run -e tests pytest -v

# Run tests without coverage
pixi run -e tests pytest --no-cov

# Set up SSL certificates (if needed)
pixi run setup-ssl


Download files


Source Distribution

kirin-0.0.15.tar.gz (208.1 kB)

Uploaded Source

Built Distribution


kirin-0.0.15-py3-none-any.whl (267.4 kB)

Uploaded Python 3

File details

Details for the file kirin-0.0.15.tar.gz.

File metadata

  • Download URL: kirin-0.0.15.tar.gz
  • Upload date:
  • Size: 208.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.8

File hashes

Hashes for kirin-0.0.15.tar.gz
Algorithm Hash digest
SHA256 b83850e7e2936ae3085b7c73ad1fa0082b801aa5e27633bfafee19c081e78727
MD5 d0160e03e19ad2961d0345c30aba05f4
BLAKE2b-256 dced81d15c5f6a1fcbbed8e2253d37fed97cf6f47485050031cd4554b8894812


File details

Details for the file kirin-0.0.15-py3-none-any.whl.

File metadata

  • Download URL: kirin-0.0.15-py3-none-any.whl
  • Upload date:
  • Size: 267.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.8

File hashes

Hashes for kirin-0.0.15-py3-none-any.whl
Algorithm Hash digest
SHA256 aa8fac8f4b0601c097dea8b04bcb1c0d192aa2cfdceaa480efd37bb718e3d960
MD5 0c56daebbd83ef67a5548503eb2a36fa
BLAKE2b-256 ed92e34ab435829e5d119accfea5c4b6dbc9d62b0305607e11bb7c9a1002a974
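The SHA256 digests listed above can be used to check a downloaded archive before installing it. A minimal sketch with the standard `hashlib` module; the local filename is assumed to be wherever you saved the wheel, and `WHEEL_SHA256` is the published SHA256 for `kirin-0.0.15-py3-none-any.whl`.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the published hex digest."""
    return sha256_of(path) == expected_sha256.lower()


WHEEL_SHA256 = "aa8fac8f4b0601c097dea8b04bcb1c0d192aa2cfdceaa480efd37bb718e3d960"
wheel = Path("kirin-0.0.15-py3-none-any.whl")  # assumed download location
if wheel.exists():
    print("OK" if verify(wheel, WHEEL_SHA256) else "MISMATCH")
```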

