# kissml

Keep It Simple Stupid Tools for Machine Learning

A Python library providing simple, powerful tools for ML workflows with minimal boilerplate.
I made this because:

- Most data science services are notebook-based, but notebooks are difficult to debug.
- Most frameworks (Flyte, Metaflow) focus on extending to the cloud. This is great, but for local iteration all we really need is reproducible pipeline steps.
## Installation

```bash
pip install kissml
```
## Steps

The `@step` decorator provides:

- execution tracking
- persistent disk-based caching for your functions
- post-run execution (i.e., after effects) on the return value -- useful for visualizing data or logging stats
### Basic Usage

```python
from kissml import step, CacheConfig
import logging

# Simple execution time logging
@step(log_level=logging.INFO)
def process_data(data):
    # Your processing logic here
    return result

# With persistent caching
@step(
    log_level=logging.INFO,
    cache=CacheConfig(version=1)
)
def expensive_computation(data):
    # This will only run once per unique input
    # Subsequent calls return cached results
    return result
```
## Key Features

**Execution Time Tracking**: Log how long your functions take to run.

```python
@step(log_level=logging.INFO)
def train_model(X, y):
    # Logs: "train_model completed in 45.2341 seconds"
    return model
```
**Persistent Disk Caching**: Cache results to disk and reuse them across runs.

```python
@step(cache=CacheConfig(version=1))
def load_and_preprocess(filepath):
    # Expensive preprocessing runs once
    # Subsequent calls load from cache in milliseconds
    return processed_data
```
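Conceptually, a persistent disk cache like this keys each call on the function, its version, and its arguments, then stores the pickled result on disk. The sketch below illustrates the idea in plain Python; it is not kissml's actual implementation, and the `cached` decorator and cache layout here are illustrative assumptions.

```python
# Illustrative sketch of hash-keyed disk caching (NOT kissml's code):
# arguments are hashed together with the function name and version,
# the result is pickled to disk, and later calls with the same
# arguments load the stored file instead of recomputing.
import hashlib
import pickle
import tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.mkdtemp())
calls = 0  # counts how many times the wrapped body actually runs

def cached(version):
    def wrap(func):
        def inner(*args):
            key = hashlib.sha256(
                pickle.dumps((func.__name__, version, args))
            ).hexdigest()
            path = CACHE_DIR / f"{key}.pkl"
            if path.exists():
                return pickle.loads(path.read_bytes())  # cache hit
            result = func(*args)
            path.write_bytes(pickle.dumps(result))  # cache store
            return result
        return inner
    return wrap

@cached(version=1)
def double(x):
    global calls
    calls += 1
    return x * 2

double(21)  # computes and stores
double(21)  # loads from disk; the function body does not run again
```

The same mechanism explains why caching survives process restarts: the key and the pickled value live on disk, not in memory.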
**Version-Based Invalidation**: Bump the version to invalidate old cache entries.

```python
# Old implementation
@step(cache=CacheConfig(version=1))
def feature_engineering(df):
    return old_features(df)

# Updated implementation - the old cache is automatically invalidated
@step(cache=CacheConfig(version=2))
def feature_engineering(df):
    return new_improved_features(df)
```
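One plausible mechanism for this (kissml's exact key derivation may differ): the version participates in the cache key, so bumping it produces a key that misses every entry written under the old version.

```python
# Sketch: the version is part of the cache key, so version=2 hashes
# to a different key than version=1 and misses the old entries.
import hashlib
import pickle

def cache_key(func_name, version, args):
    return hashlib.sha256(
        pickle.dumps((func_name, version, args))
    ).hexdigest()

k1 = cache_key("feature_engineering", 1, ("df",))
k2 = cache_key("feature_engineering", 2, ("df",))
assert k1 != k2  # different version -> different key -> cache miss
```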
**Smart Serialization**: Efficient storage for pandas DataFrames and nested collections.

```python
import pandas as pd

@step(cache=CacheConfig(version=1))
def analyze_data(df: pd.DataFrame) -> pd.DataFrame:
    # DataFrames are cached as Parquet files (requires pyarrow),
    # which is much more efficient than pickle
    return processed_df

@step(cache=CacheConfig(version=1))
def complex_pipeline(data) -> dict:
    # Returns a dict with DataFrames, lists, etc.
    # Each member type uses its optimal serialization
    return {
        "results": some_dataframe,
        "metrics": [metric1, metric2],
        "metadata": {"key": "value"},
    }
```
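Per-type serialization boils down to a registry that maps each Python type to a serializer, with a generic fallback for everything else. The sketch below shows the dispatch idea in miniature; the `SERIALIZERS` dict and `serialize` function are illustrative, not kissml's API (kissml's real registry is `settings.serialize_by_type`, shown under Custom Serialization below).

```python
# Sketch of per-type serializer dispatch: registered types get a
# tailored format, everything else falls back to pickle.
import json
import pickle

SERIALIZERS = {
    dict: lambda v: json.dumps(v).encode(),
    list: lambda v: json.dumps(v).encode(),
}

def serialize(value):
    # Fall back to pickle when no serializer is registered for the type
    fn = SERIALIZERS.get(type(value), pickle.dumps)
    return fn(value)

serialize({"key": "value"})  # stored as JSON bytes
serialize(3.5)               # no registered serializer -> pickled
```

Swapping the lambda for a Parquet writer gives the DataFrame behavior described above.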
## Cache Configuration

Control cache behavior with `CacheConfig`:

```python
from kissml import step, CacheConfig, EvictionPolicy

# No eviction (default) - cache grows forever
@step(cache=CacheConfig(version=1, eviction_policy=EvictionPolicy.NONE))
def permanent_cache(x):
    return x

# Least Recently Used - evicts the items accessed longest ago
@step(cache=CacheConfig(version=1, eviction_policy=EvictionPolicy.LEAST_RECENTLY_USED))
def lru_cache(x):
    return x

# Least Recently Stored - evicts the oldest stored items
@step(cache=CacheConfig(version=1, eviction_policy=EvictionPolicy.LEAST_RECENTLY_STORED))
def lrs_cache(x):
    return x

# Least Frequently Used - evicts the least-accessed items
@step(cache=CacheConfig(version=1, eviction_policy=EvictionPolicy.LEAST_FREQUENTLY_USED))
def lfu_cache(x):
    return x
```
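To make the LRU policy concrete, here is a minimal in-memory sketch of least-recently-used eviction (illustrative only; kissml applies its policies to the disk cache, not to an in-process dict):

```python
# LRU eviction with an OrderedDict: each access moves the key to the
# end, and eviction pops from the front (the least recently used key).
from collections import OrderedDict

class LRUCache:
    def __init__(self, max_items):
        self.max_items = max_items
        self.data = OrderedDict()

    def get(self, key):
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.max_items:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(max_items=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now most recently used
cache.put("c", 3)  # evicts "b", the least recently used
```

LEAST_RECENTLY_STORED would skip the `move_to_end` on access, and LEAST_FREQUENTLY_USED would track access counts instead of access order.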
## AfterEffects

AfterEffects let you automatically run side effects (such as visualization, logging, or validation) after a step completes, whether the result was cached or freshly computed.

```python
from typing import Annotated

import mlflow
import pandas as pd

from kissml import step, AfterEffect, CacheConfig

# Define a custom AfterEffect
class HTMLVisualizer(AfterEffect):
    def __init__(self, max_rows=100):
        self.max_rows = max_rows

    def __call__(self, result, was_cached, func_name, execution_time):
        # Create an HTML preview
        html = result.head(self.max_rows).to_html()
        html = f"<h3>{func_name} - {execution_time:.2f}s {'(cached)' if was_cached else ''}</h3>" + html
        # Log to MLflow
        with open(f"{func_name}.html", "w") as f:
            f.write(html)
        mlflow.log_artifact(f"{func_name}.html")

# Use it with type annotations
@step(cache=CacheConfig(version=1))
def load_data() -> Annotated[pd.DataFrame, HTMLVisualizer(max_rows=200)]:
    return pd.read_csv("data.csv")

# Multiple effects run left-to-right
class DatasetLogger(AfterEffect):
    def __call__(self, result, was_cached, func_name, execution_time):
        if not was_cached:  # Only log once
            mlflow.log_metric(f"{func_name}_rows", len(result))

@step(cache=CacheConfig(version=1))
def process() -> Annotated[pd.DataFrame, DatasetLogger(), HTMLVisualizer()]:
    # Both effects run automatically after the function completes
    return load_data()
```
**Error Handling**: Control whether AfterEffect failures stop execution:

```python
# Default: effect errors are logged but don't stop execution
@step(cache=CacheConfig(version=1))
def safe_pipeline() -> Annotated[pd.DataFrame, MyVisualizer()]:
    return data

# Strict mode: effect errors raise exceptions
@step(cache=CacheConfig(version=1), error_on_effect_failure=True)
def strict_pipeline() -> Annotated[pd.DataFrame, MyVisualizer()]:
    return data
```
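Under the hood, effects declared in an `Annotated` return type can be discovered via standard `typing` introspection. The sketch below shows that mechanism in isolation; `run_effects` and `Shout` are illustrative names, and kissml's real `@step` additionally passes cache status and timing to each effect.

```python
# Sketch: pull effect instances out of an Annotated return type and
# fire them, left to right, after the wrapped function returns.
import typing
from typing import Annotated, get_type_hints, get_args

ran = []

class Shout:
    def __call__(self, result):
        ran.append(f"got {result}")

def run_effects(func):
    def inner(*args, **kwargs):
        result = func(*args, **kwargs)
        hints = get_type_hints(func, include_extras=True)
        ret = hints.get("return")
        if ret is not None and typing.get_origin(ret) is Annotated:
            for effect in get_args(ret)[1:]:  # skip the base type
                effect(result)  # fire each declared effect in order
        return result
    return inner

@run_effects
def make_number() -> Annotated[int, Shout()]:
    return 7

value = make_number()  # Shout fires after the return
```

`include_extras=True` is what preserves the `Annotated` metadata; without it, `get_type_hints` strips the effect instances away.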
**Global AfterEffects**: Register an AfterEffect once and have it fire after every `@step` call -- no per-step annotation required. Useful for cross-cutting concerns like logging, persistence, or experiment tracking.

```python
import logging

import pandas as pd

from kissml import settings, step, AfterEffect

class StepTimingLogger(AfterEffect):
    """Log every step's name, runtime, and cache status."""

    def __call__(self, result, was_cached, func_name, execution_time):
        status = "cached" if was_cached else "fresh"
        logging.info(
            f"{func_name} finished in {execution_time:.3f}s ({status})"
        )

# Register once -- fires for every @step call from now on
settings.global_after_effects.append(StepTimingLogger())

@step()
def load_data() -> pd.DataFrame:
    return pd.read_csv("data.csv")  # StepTimingLogger runs after this returns

@step()
def transform(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna()  # StepTimingLogger runs here too
```
Per-step effects (declared in the return annotation) fire first, then global effects. Both honor the error_on_effect_failure flag on the step.
## Configuration

Configure the cache directory via settings or an environment variable:

```python
from pathlib import Path

from kissml import settings

# Set the cache directory
settings.cache_directory = Path("/path/to/cache")
```

```bash
# Or use an environment variable
export KISSML_CACHE_DIRECTORY=/path/to/cache
```
## Custom Serialization

Register custom serializers for your types:

```python
from typing import Any, BinaryIO

from kissml.settings import settings
from kissml.types import Serializer

class MyCustomSerializer(Serializer):
    def serialize(self, value: Any, out: BinaryIO) -> None:
        # Your serialization logic
        ...

    def deserialize(self, stream: BinaryIO) -> Any:
        # Your deserialization logic
        ...

# Register the serializer
settings.serialize_by_type[MyCustomType] = MyCustomSerializer()

# Register a hash function for cache keys
settings.hash_by_type[MyCustomType] = lambda obj: str(hash(obj))
```
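To make the skeleton above concrete, here is a filled-in serializer for a hypothetical `Point` type, writing to and reading from binary streams. The `Point` type and the `struct` packing are assumptions for illustration, not part of kissml.

```python
# Illustrative serializer following the Serializer shape: pack two
# doubles into the output stream, read them back on deserialize.
import io
import struct
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

class PointSerializer:
    def serialize(self, value, out):
        # Two little-endian doubles: 16 bytes total
        out.write(struct.pack("<dd", value.x, value.y))

    def deserialize(self, stream):
        x, y = struct.unpack("<dd", stream.read(16))
        return Point(x, y)

ser = PointSerializer()
buf = io.BytesIO()
ser.serialize(Point(1.5, -2.0), buf)
buf.seek(0)
restored = ser.deserialize(buf)
```

In real use you would subclass `kissml.types.Serializer` and register the instance in `settings.serialize_by_type[Point]` as shown above.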
## License

Licensed under CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives). This is a non-commercial license -- see the LICENSE file for full details.

For commercial use, please contact the author.