A Python feature store library for offline/online feature storage, registry, validation, and serving

KiteFS

KiteFS is a Python feature store library for machine learning. It manages the full lifecycle of ML features — defining feature groups as Python code, registering them in a versioned registry, storing historical data in Parquet files, and serving the latest values for real-time predictions.

KiteFS is library-first: no running server, no Docker, no infrastructure to manage. Install it with pip, define your features, and start building.

  • SDK: from kitefs import FeatureStore
  • CLI: kitefs init, kitefs apply, kitefs list, kitefs describe, and more
  • Python 3.12+ required

Installation

KiteFS is currently in alpha. To install the latest pre-release version:

pip install --pre kitefs

Usage

Quick Start (CLI)

# 1. Initialize a new KiteFS project
kitefs init

# 2. Edit feature_store/definitions/example_features.py
#    to define your feature groups as Python code

# 3. Register definitions into the registry
kitefs apply

# 4. List registered feature groups
kitefs list

# 5. Inspect a specific feature group
kitefs describe <feature_group_name>

SDK

from kitefs import FeatureStore

# Initialize — finds kitefs.yaml by walking up from cwd
fs = FeatureStore()

# Register all feature group definitions
result = fs.apply()
print(f"Registered {result.group_count} group(s)")

# List all registered feature groups
groups = fs.list_feature_groups()

# Describe a specific feature group
details = fs.describe_feature_group("listing_features")

# Export as JSON
json_output = fs.list_feature_groups(format="json")

# Write JSON to a file
fs.describe_feature_group("listing_features", target="output.json")

Available Features

  • Project scaffolding (kitefs init) — initialize a new KiteFS project with config, directory structure, and example definitions
  • Feature group definitions — define feature groups as Python code using frozen dataclasses (FeatureGroup, Feature, EntityKey, EventTimestamp, Expect, etc.)
  • Registry (kitefs apply, kitefs list, kitefs describe) — register, list, and inspect feature groups via a deterministic JSON registry suitable for Git versioning
  • Local provider — local filesystem storage with Hive-style Parquet partitioning for the offline store
  • Configuration — YAML-based project configuration (kitefs.yaml) with environment variable overrides
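
To illustrate why frozen dataclasses pair well with a deterministic JSON registry, here is a minimal standard-library sketch. FeatureSpec, GroupSpec, to_registry_json, and every field name are hypothetical stand-ins; the real FeatureGroup and Feature classes and the actual registry layout may differ.

```python
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class FeatureSpec:  # hypothetical stand-in, not KiteFS's Feature class
    name: str
    dtype: str


@dataclass(frozen=True)
class GroupSpec:  # hypothetical stand-in, not KiteFS's FeatureGroup class
    name: str
    entity_key: str
    event_timestamp: str
    features: tuple[FeatureSpec, ...]


def to_registry_json(groups: list[GroupSpec]) -> str:
    """Serialize groups so that equal definitions always yield identical text."""
    # Sort groups by name and dict keys alphabetically to remove ordering noise.
    ordered = sorted(groups, key=lambda g: g.name)
    payload = {"feature_groups": [asdict(g) for g in ordered]}
    return json.dumps(payload, indent=2, sort_keys=True) + "\n"


listing = GroupSpec(
    name="listing_features",
    entity_key="listing_id",
    event_timestamp="event_time",
    features=(FeatureSpec("price", "float"), FeatureSpec("bedrooms", "int")),
)
host = GroupSpec("host_features", "host_id", "event_time", (FeatureSpec("rating", "float"),))

# Registration order does not change the output, so Git diffs stay minimal.
assert to_registry_json([listing, host]) == to_registry_json([host, listing])

# Frozen dataclasses reject mutation after construction.
try:
    listing.name = "renamed"
except FrozenInstanceError:
    print("definitions are immutable")
```

Because the serialized text depends only on the definitions themselves, re-running a registration step without changes produces a byte-identical file, which is what makes the registry practical to keep under version control.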

In Development

The following features are planned but not yet available:

  • Data validation — schema and data validation engine with configurable strictness modes (ERROR, FILTER, NONE) to enforce quality constraints on ingested data
  • Data ingestion (kitefs ingest) — ingest DataFrames, CSV, or Parquet files into the offline store with automatic partitioning and validation
  • Historical retrieval — query the offline store with column selection and time-range filtering for training dataset generation
  • Point-in-time joins — temporally correct joins across feature groups to prevent data leakage in training sets
  • Materialization (kitefs materialize) — sync the latest feature values from the offline store to a SQLite-backed online store
  • Online serving — single-key lookups for real-time feature serving from the online store
  • AWS provider — S3 + DynamoDB backend for production deployments, installable via pip install kitefs[aws]
  • Registry sync — push and pull the registry between local and remote storage
  • Mock data generation (kitefs mock) — generate synthetic test data that respects schema and expectations
  • Smart sampling (kitefs sample) — pull a representative data subset from a remote store for local development

Contributing

Prerequisites

  • Python 3.12+
  • uv — Python package manager
  • just — command runner

Setup

git clone https://github.com/fedaipaca/kitefs.git
cd kitefs
uv sync

Run checks

# Lint + format check + type check + tests
just check

Download files

Source Distribution

kitefs-0.2.0a2.tar.gz (19.7 kB)

Uploaded Source

Built Distribution

kitefs-0.2.0a2-py3-none-any.whl (26.7 kB)

Uploaded Python 3

File details

Details for the file kitefs-0.2.0a2.tar.gz.

File metadata

  • Download URL: kitefs-0.2.0a2.tar.gz
  • Upload date:
  • Size: 19.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.1

File hashes

Hashes for kitefs-0.2.0a2.tar.gz
Algorithm Hash digest
SHA256 62c7b8fe4b2cfb3853a3983490726412ae9c148eab02b86ba6cc8d7abe49a5a2
MD5 c8f1310cfb0f1d5dee5038afdf0bab7a
BLAKE2b-256 b44a46e29c36e500229be7387cd91f2a4eb611f9691ef5bf2b8b4b2ca227e7ae

File details

Details for the file kitefs-0.2.0a2-py3-none-any.whl.

File metadata

  • Download URL: kitefs-0.2.0a2-py3-none-any.whl
  • Upload date:
  • Size: 26.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.1

File hashes

Hashes for kitefs-0.2.0a2-py3-none-any.whl
Algorithm Hash digest
SHA256 6515c7fd31fb3a5010aa845e6982ceb1afff49fc126cdfe56e52863ff71259a8
MD5 1f8563b1bf9002bf96baf1b53f3f90ac
BLAKE2b-256 6a2450bc9be18a8c5b00ca325906a136c5d37ee6e4891923c4a04a00acc36f43
