kirin

Version controlled data storage with cloud support!

Made with ❤️ by Eric J. Ma (@ericmjl).

Features

  • 📦 Linear versioning for datasets - Simple, Git-like commits without branching complexity
  • 🔗 Content-addressed storage - Files stored by content hash for integrity and deduplication
  • ☁️ Cloud storage support - S3, GCS, Azure, Minio, Backblaze B2, etc.
  • 🔄 Automatic filesystem detection from URIs
  • 🔐 Easy authentication helpers
  • 🚀 Simple, intuitive API - Focus on ergonomic Python interface
  • 📊 File versioning - Track changes to individual files over time
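
To make the content-addressed storage idea concrete, here is a minimal sketch using only the standard library. It is not kirin's implementation: the `store_blob` and `load_blob` helpers, the flat directory layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def store_blob(store_dir: Path, data: bytes) -> str:
    """Write data under its SHA-256 digest and return the digest (hypothetical helper)."""
    digest = hashlib.sha256(data).hexdigest()
    blob_path = store_dir / digest
    if not blob_path.exists():  # identical content is written only once
        blob_path.write_bytes(data)
    return digest


def load_blob(store_dir: Path, digest: str) -> bytes:
    """Read a blob back and check that it still matches its digest."""
    data = (store_dir / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"blob {digest} is corrupted")
    return data


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    first = store_blob(store, b"a,b\n1,2\n")
    second = store_blob(store, b"a,b\n1,2\n")  # same bytes, same digest
    third = store_blob(store, b"a,b\n3,4\n")  # different bytes, new blob
    blob_count = len(list(store.iterdir()))
    round_trip = load_blob(store, first)
```

Because the file name is derived from the bytes, storing the same content twice costs nothing extra (deduplication), and any later change to a stored blob is detectable on read (integrity).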

Quick Start

from kirin import Catalog, Dataset, File, Commit
from pathlib import Path

# Create a catalog (works with local and cloud storage).
# The three calls below are alternatives; keep the one that matches your storage.
catalog = Catalog(root_dir="/path/to/data")  # Local storage
catalog = Catalog(root_dir="s3://my-bucket")  # S3 storage
catalog = Catalog(root_dir="gs://my-bucket")  # GCS storage

# Get or create a dataset
ds = catalog.get_dataset("my_dataset")

# Commit files
commit_hash = ds.commit(message="Initial commit", add_files=["file1.csv"])

# Checkout the latest commit
ds.checkout()

# Access files from current commit
files = ds.files
print(f"Files in current commit: {list(files.keys())}")

# Work with files locally
with ds.local_files() as local_files:
    # Access files as local paths
    csv_path = local_files["file1.csv"]
    content = Path(csv_path).read_text()  # text mode
    binary_content = Path(csv_path).read_bytes()  # binary mode

# Checkout a specific commit
ds.checkout(commit_hash)

# Get commit history
history = ds.history(limit=10)
for commit in history:
    print(f"{commit.short_hash}: {commit.message}")
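
The quick start passes a local path, an `s3://` URI, and a `gs://` URI to the same `root_dir` parameter. The sketch below shows one way a storage backend could be chosen from the URI scheme. It is an illustration only: `detect_backend` and the `SCHEME_TO_BACKEND` table are hypothetical names, and kirin's own detection logic may work differently.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URI scheme to a backend label (not kirin's table).
SCHEME_TO_BACKEND = {
    "file": "local",
    "s3": "s3",
    "gs": "gcs",
}


def detect_backend(root_dir: str) -> str:
    """Pick a backend label from the scheme of root_dir."""
    scheme = urlsplit(root_dir).scheme.lower()
    if len(scheme) <= 1:
        # No scheme at all, or a single letter that is more likely a drive letter
        return "local"
    try:
        return SCHEME_TO_BACKEND[scheme]
    except KeyError:
        raise ValueError(f"unsupported URI scheme: {scheme!r}") from None


for root in ("/path/to/data", "s3://my-bucket", "gs://my-bucket"):
    print(f"{root} -> {detect_backend(root)}")
```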

Cloud Authentication

If you run into authentication errors, see the Cloud Storage Authentication Guide, or build the filesystem explicitly with one of the helper functions:

from kirin import Catalog, get_gcs_filesystem

# GCS with service account
fs = get_gcs_filesystem(token='/path/to/key.json')
catalog = Catalog(root_dir="gs://my-bucket", fs=fs)
ds = catalog.get_dataset("my_dataset")
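
Authentication errors are often caused by a key file that is missing or malformed. The sketch below is an optional pre-flight check you could run before passing the path to `get_gcs_filesystem`. It assumes the key is a standard Google service account JSON key; `key_file_problems` is a hypothetical helper and not part of kirin, and the key contents in the demo are placeholders.

```python
import json
import tempfile
from pathlib import Path

# Fields found in a standard Google service account JSON key.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def key_file_problems(key_path):
    """Return a list of problems; an empty list means the file looks usable."""
    path = Path(key_path)
    if not path.is_file():
        return [f"{key_path} does not exist or is not a file"]
    try:
        key = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"{key_path} is not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return [f"{key_path} does not contain a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if "type" in key and key["type"] != "service_account":
        problems.append(f"unexpected key type: {key['type']!r}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "key.json"
    good.write_text(json.dumps({
        "type": "service_account",
        "client_email": "svc@example.invalid",  # placeholder value
        "private_key": "placeholder-not-a-real-key",  # placeholder value
    }))
    bad = Path(tmp) / "broken.json"
    bad.write_text("{not json")

    good_problems = key_file_problems(good)
    bad_problems = key_file_problems(bad)
    missing_problems = key_file_problems(Path(tmp) / "absent.json")
```

The check only looks at the file's shape. It cannot tell whether the credentials are valid or have access to the bucket.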

Advanced Usage

Working with Files

from kirin import Catalog
from pathlib import Path

# Create catalog and get dataset
catalog = Catalog(root_dir="s3://my-bucket")
ds = catalog.get_dataset("my_dataset")
ds.checkout()

# Work with files locally (recommended approach)
with ds.local_files() as local_files:
    # Access files as local paths
    csv_path = local_files["data.csv"]

    # Read file content
    content = Path(csv_path).read_text()

    # Get file info
    file_size = Path(csv_path).stat().st_size
    print(f"File size: {file_size} bytes")

    # Open as file handle
    with open(csv_path, "r") as f:
        data = f.read()

Working with Commits

from kirin import Catalog

# Create catalog and get dataset
catalog = Catalog(root_dir="gs://my-bucket")
ds = catalog.get_dataset("my_dataset")

# Get a specific commit by hash
# (commit_hash is the value returned by an earlier ds.commit(...) call)
commit = ds.get_commit(commit_hash)
if commit:
    print(f"Commit: {commit.short_hash}")
    print(f"Message: {commit.message}")
    print(f"Files: {commit.list_files()}")
    print(f"Total size: {commit.get_total_size()} bytes")
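
As a mental model for linear versioning, each commit can be pictured as pointing at exactly one parent, so the history is a single chain walked backwards from the newest commit. The sketch below illustrates that model only. `SketchCommit`, `add_commit`, and `walk_history` are hypothetical names, and kirin's internal commit format is not shown here.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SketchCommit:
    """Illustrative commit record: a message, a file listing, and one parent."""
    message: str
    files: Tuple[Tuple[str, str], ...]  # (filename, content hash) pairs
    parent: Optional[str]  # digest of the previous commit, None for the first

    @property
    def digest(self) -> str:
        payload = repr((self.message, self.files, self.parent)).encode()
        return hashlib.sha256(payload).hexdigest()

    @property
    def short_hash(self) -> str:
        return self.digest[:8]


def add_commit(commits: Dict[str, SketchCommit], head: Optional[str],
               message: str, files: Tuple[Tuple[str, str], ...]) -> str:
    """Append a commit on top of head and return the new head digest."""
    commit = SketchCommit(message=message, files=files, parent=head)
    commits[commit.digest] = commit
    return commit.digest


def walk_history(commits: Dict[str, SketchCommit], head: Optional[str],
                 limit: int) -> List[SketchCommit]:
    """Follow parent links from head, newest first, up to limit commits."""
    history = []
    current = head
    while current is not None and len(history) < limit:
        commit = commits[current]
        history.append(commit)
        current = commit.parent
    return history


commits: Dict[str, SketchCommit] = {}
head = add_commit(commits, None, "Initial commit", (("file1.csv", "aaa"),))
head = add_commit(commits, head, "Add file2", (("file1.csv", "aaa"), ("file2.csv", "bbb")))

messages = [c.message for c in walk_history(commits, head, limit=10)]
latest_only = walk_history(commits, head, limit=1)
```

With a single parent per commit there is nothing to merge and no branch to pick, which is the simplification the feature list describes.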

Local File Access

import json
from pathlib import Path

import pandas as pd
from kirin import Catalog

# Create catalog and get dataset
catalog = Catalog(root_dir="s3://my-bucket")
ds = catalog.get_dataset("my_dataset")
ds.checkout()

# Work with all files locally (recommended pattern)
with ds.local_files() as local_files:
    for filename, local_path in local_files.items():
        print(f"{filename} -> {local_path}")

        # Process files with standard Python libraries
        if filename.endswith('.csv'):
            df = pd.read_csv(local_path)
        elif filename.endswith('.txt'):
            text = Path(local_path).read_text()
        elif filename.endswith('.json'):
            data = json.loads(Path(local_path).read_text())
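
When the number of file types grows, the `if`/`elif` chain above can be replaced with a lookup table keyed by file suffix. The sketch below uses only the standard library and assumes, as the loop above suggests, that `local_files` behaves like a mapping from filename to local path. `LOADERS` and `load_all` are hypothetical names, and a plain dictionary stands in for `local_files` so the example runs without a catalog.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_csv_rows(path):
    """Read a CSV file into a list of dicts, one per row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical dispatch table: file suffix -> loader function.
LOADERS = {
    ".csv": read_csv_rows,
    ".json": lambda path: json.loads(Path(path).read_text()),
    ".txt": lambda path: Path(path).read_text(),
}


def load_all(local_files):
    """Load every file with a known suffix; files of other types are skipped."""
    loaded = {}
    for filename, local_path in local_files.items():
        loader = LOADERS.get(Path(filename).suffix.lower())
        if loader is not None:
            loaded[filename] = loader(local_path)
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data.csv").write_text("a,b\n1,2\n")
    (root / "meta.json").write_text('{"k": 1}')
    (root / "image.png").write_bytes(b"\x89PNG")

    # Stand-in for the mapping yielded by ds.local_files()
    fake_local_files = {p.name: str(p) for p in root.iterdir()}
    loaded = load_all(fake_local_files)
```

Unlike the loop above, which overwrites `df`, `text`, and `data` on every iteration, this keeps each loaded object under its filename.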

Documentation

Installation

Option 1: Pixi (Recommended for Development)

# Clone and install
git clone git@github.com:ericmjl/kirin
cd kirin
pixi install

# Set up SSL certificates for cloud storage (one-time setup)
pixi run setup-ssl

# Start the web UI
pixi run python -m kirin.web.app

Option 2: UV Tool (Recommended for Production)

# Install with uv
uv tool install kirin

# Set up SSL certificates (one-time setup)
uv run python -m kirin.setup_ssl

# Start the web UI
uv run kirin ui

Option 3: UVX (One-time Use)

# Run directly with uvx
uvx kirin ui

# If SSL issues occur, set up certificates
uvx python -m kirin.setup_ssl

Option 4: System Python

# Install with pip
pip install kirin

# No SSL setup needed - uses system certificates
kirin ui

Get started for development

To get started:

git clone git@github.com:ericmjl/kirin
cd kirin
pixi install

Development Commands

Once installed, you can use these common development commands:

# Run tests
pixi run -e tests pytest

# Run tests for a specific file
pixi run -e tests pytest tests/test_filename.py

# Run all tests with verbose output
pixi run -e tests pytest -v

# Run tests without coverage
pixi run -e tests pytest --no-cov

# Set up SSL certificates (if needed)
pixi run setup-ssl
