kirin

Version controlled data storage with cloud support!

Made with ❤️ by Eric J. Ma (@ericmjl).

Features

  • 📦 Linear versioning for datasets - Simple, Git-like commits without branching complexity
  • 🔗 Content-addressed storage - Files stored by content hash for integrity and deduplication
  • ☁️ Cloud storage support - S3, GCS, Azure, Minio, Backblaze B2, etc.
  • 🔄 Automatic filesystem detection from URIs
  • 🔐 Easy authentication helpers
  • 🚀 Simple, intuitive API - Focus on ergonomic Python interface
  • 📊 File versioning - Track changes to individual files over time
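
The content-addressed storage idea in the feature list can be illustrated with a small standalone sketch. It is not kirin's implementation: the `store_by_content` helper, the flat directory layout, and the choice of SHA-256 are all assumptions made for illustration. It only shows why keying files by a hash of their bytes gives both integrity checks and deduplication.

```python
import hashlib
import tempfile
from pathlib import Path


def store_by_content(data: bytes, store_dir: Path) -> str:
    """Write data under its SHA-256 digest and return that digest.

    Hypothetical helper for illustration; kirin's real layout may differ.
    """
    digest = hashlib.sha256(data).hexdigest()
    target = store_dir / digest
    if not target.exists():  # identical content is only written once
        target.write_bytes(data)
    return digest


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    first = store_by_content(b"a,b\n1,2\n", store)
    second = store_by_content(b"a,b\n1,2\n", store)  # same bytes, same key
    third = store_by_content(b"a,b\n1,3\n", store)  # different bytes, new key
    stored_count = len(list(store.iterdir()))
```

Because the key is derived from the bytes, re-hashing a stored file and comparing against its name is enough to detect corruption.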

Quick Start

from kirin import Catalog, Dataset, File, Commit
from pathlib import Path

# Create a catalog; pick ONE root_dir (local paths and cloud URIs both work)
catalog = Catalog(root_dir="/path/to/data")  # Local storage
# catalog = Catalog(root_dir="s3://my-bucket")  # S3 storage
# catalog = Catalog(root_dir="gs://my-bucket")  # GCS storage

# Get or create a dataset
ds = catalog.get_dataset("my_dataset")

# Commit files
commit_hash = ds.commit(message="Initial commit", add_files=["file1.csv"])

# Checkout the latest commit
ds.checkout()

# Access files from current commit
files = ds.files
print(f"Files in current commit: {list(files.keys())}")

# Work with files locally
with ds.local_files() as local_files:
    # Access files as local paths
    csv_path = local_files["file1.csv"]
    content = Path(csv_path).read_text()  # text mode
    binary_content = Path(csv_path).read_bytes()  # binary mode

# Checkout a specific commit
ds.checkout(commit_hash)

# Get commit history
history = ds.history(limit=10)
for commit in history:
    print(f"{commit.short_hash}: {commit.message}")
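
The Quick Start passes plain paths, `s3://` URIs, and `gs://` URIs to the same `root_dir` argument. As a rough sketch of how a backend could be chosen from the URI scheme, the snippet below uses only the standard library. The `detect_backend` function and the `SCHEME_TO_BACKEND` table are hypothetical names, not part of kirin, and the real detection logic may work differently.

```python
from urllib.parse import urlsplit

# Hypothetical scheme-to-backend table covering only the Quick Start examples.
SCHEME_TO_BACKEND = {
    "": "local",  # plain paths such as /path/to/data have no scheme
    "file": "local",
    "s3": "s3",
    "gs": "gcs",
}


def detect_backend(root_dir: str) -> str:
    """Return a backend label for root_dir based on its URI scheme."""
    scheme = urlsplit(root_dir).scheme.lower()
    try:
        return SCHEME_TO_BACKEND[scheme]
    except KeyError:
        raise ValueError(f"Unsupported URI scheme: {scheme!r}") from None


local_backend = detect_backend("/path/to/data")
s3_backend = detect_backend("s3://my-bucket")
gcs_backend = detect_backend("gs://my-bucket")
```

Note that this naive approach would misread a Windows drive path such as `C:\data` as having the scheme `c`, so a real implementation would need extra handling there.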

Cloud Authentication

If you hit authentication errors, see the Cloud Storage Authentication Guide, or build an authenticated filesystem with a helper function and pass it to Catalog:

from kirin import Catalog, get_gcs_filesystem

# GCS with service account
fs = get_gcs_filesystem(token='/path/to/key.json')
catalog = Catalog(root_dir="gs://my-bucket", fs=fs)
ds = catalog.get_dataset("my_dataset")

Advanced Usage

Working with Files

from kirin import Catalog
from pathlib import Path

# Create catalog and get dataset
catalog = Catalog(root_dir="s3://my-bucket")
ds = catalog.get_dataset("my_dataset")
ds.checkout()

# Work with files locally (recommended approach)
with ds.local_files() as local_files:
    # Access files as local paths
    csv_path = local_files["data.csv"]

    # Read file content
    content = Path(csv_path).read_text()

    # Get file info
    file_size = Path(csv_path).stat().st_size
    print(f"File size: {file_size} bytes")

    # Open as file handle
    with open(csv_path, "r") as f:
        data = f.read()

Working with Commits

from kirin import Catalog

# Create catalog and get dataset
catalog = Catalog(root_dir="gs://my-bucket")
ds = catalog.get_dataset("my_dataset")

# Get a specific commit; commit_hash is a hash returned by an earlier ds.commit(...) call
commit = ds.get_commit(commit_hash)
if commit:
    print(f"Commit: {commit.short_hash}")
    print(f"Message: {commit.message}")
    print(f"Files: {commit.list_files()}")
    print(f"Total size: {commit.get_total_size()} bytes")
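
To make the "linear versioning" model concrete, here is a toy sketch of a commit chain in which every commit has at most one parent, so history is a simple walk from the newest commit back to the first. The `SketchCommit` class and the `history` function are hypothetical and say nothing about how kirin stores or hashes commits internally; only the `short_hash` and `message` attribute names mirror the examples above.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchCommit:
    """Toy commit with a single optional parent (no branches, no merges)."""

    message: str
    file_hashes: tuple[str, ...]
    parent: Optional["SketchCommit"] = None

    @property
    def hash(self) -> str:
        # Assumed scheme: hash the parent hash, the message, and the file hashes.
        parent_hash = self.parent.hash if self.parent else ""
        payload = "\n".join([parent_hash, self.message, *sorted(self.file_hashes)])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    @property
    def short_hash(self) -> str:
        return self.hash[:8]


def history(head: Optional[SketchCommit], limit: Optional[int] = None) -> list[SketchCommit]:
    """Walk parent links from head, newest first, stopping after limit commits."""
    commits: list[SketchCommit] = []
    node = head
    while node is not None and (limit is None or len(commits) < limit):
        commits.append(node)
        node = node.parent
    return commits


first = SketchCommit("Initial commit", ("aaa",))
second = SketchCommit("Add file2", ("aaa", "bbb"), parent=first)
messages = [c.message for c in history(second)]
```

With a single parent per commit there is nothing to merge or rebase, which is the simplification the feature list describes.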

Local File Access

from kirin import Catalog
from pathlib import Path
import json
import pandas as pd

# Create catalog and get dataset
catalog = Catalog(root_dir="s3://my-bucket")
ds = catalog.get_dataset("my_dataset")
ds.checkout()

# Work with all files locally (recommended pattern)
with ds.local_files() as local_files:
    for filename, local_path in local_files.items():
        print(f"{filename} -> {local_path}")

        # Process files with standard Python libraries
        if filename.endswith('.csv'):
            df = pd.read_csv(local_path)
        elif filename.endswith('.txt'):
            text = Path(local_path).read_text()
        elif filename.endswith('.json'):
            data = json.loads(Path(local_path).read_text())
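
When the list of file types grows, the if/elif chain above can be swapped for a lookup table keyed by file suffix. The sketch below is one possible way to do that with the standard library only. `LOADERS` and `load_by_suffix` are hypothetical names, and it reads CSV with the `csv` module instead of pandas so that it runs without extra dependencies.

```python
import csv
import json
from pathlib import Path


def _read_csv(path: Path) -> list[dict[str, str]]:
    """Read a CSV file into a list of row dictionaries."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical dispatch table: file suffix -> loader function.
LOADERS = {
    ".csv": _read_csv,
    ".txt": lambda path: path.read_text(),
    ".json": lambda path: json.loads(path.read_text()),
}


def load_by_suffix(local_path):
    """Load a file with the loader registered for its suffix, or return None."""
    path = Path(local_path)
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        return None  # unknown file type: leave it to the caller
    return loader(path)
```

Inside the `with ds.local_files() as local_files:` block, each `local_path` could then be passed to `load_by_suffix(local_path)`.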

Documentation

Installation

Option 1: Pixi (Recommended for Development)

# Clone and install
git clone git@github.com:ericmjl/kirin
cd kirin
pixi install

# Set up SSL certificates for cloud storage (one-time setup)
pixi run setup-ssl

# Start the web UI
pixi run python -m kirin.web.app

Option 2: UV Tool (Recommended for Production)

# Install with uv
uv tool install kirin

# Set up SSL certificates (one-time setup)
uv run python -m kirin.setup_ssl

# Start the web UI
uv run kirin ui

Option 3: UVX (One-time Use)

# Run directly with uvx
uvx kirin ui

# If SSL issues occur, set up certificates
uvx python -m kirin.setup_ssl

Option 4: System Python

# Install with pip
pip install kirin

# No SSL setup needed - uses system certificates
kirin ui

Get started for development

To set up a development environment, clone the repository and install the dependencies with Pixi:

git clone git@github.com:ericmjl/kirin
cd kirin
pixi install

Development Commands

Once installed, you can use these common development commands:

# Run tests
pixi run -e tests pytest

# Run tests for a specific file
pixi run -e tests pytest tests/test_filename.py

# Run all tests with verbose output
pixi run -e tests pytest -v

# Run tests without coverage
pixi run -e tests pytest --no-cov

# Set up SSL certificates (if needed)
pixi run setup-ssl
