Registry functionality for Mindtrace

Registry Module

The Registry module provides a distributed, versioned object storage system with support for multiple backends. It stores, versions, and retrieves objects with automatic serialization, and on cloud backends it handles concurrent reads and writes without distributed locks.

Features

  • Multi-Backend Support: Local filesystem, S3-compatible (MinIO, AWS S3) and Google Cloud Storage
  • Lock-Free Concurrency: UUID-based MVCC ensures safe concurrent reads and writes without distributed locks
  • Versioning: Automatic version management with semantic versioning support
  • Caching: Local cache for remote backends with configurable staleness checks
  • Materializers: Pluggable serialization system for different object types
  • Batch Operations: All backend operations support batch mode for efficient bulk access
  • Dict-Like Interface: registry["name"] = obj, obj = registry["name"], del registry["name"]

Quick Start

from mindtrace.registry import Registry

# Create a registry (uses local backend by default)
registry = Registry()

# Save objects
registry.save("my:model", trained_model)
registry.save("my:data", dataset, version="1.0.0")

# Load objects
model = registry.load("my:model")
data = registry.load("my:data", version="1.0.0")

# Dict-like access
registry["my:config"] = config_dict
config = registry["my:config"]

# Check existence
exists = registry.has_object("my:model", "1.0.0")  # -> bool

# Get metadata
info = registry.info("my:model", "1.0.0")  # -> dict

# List objects and versions
print(registry.list_objects())
print(registry.list_versions("my:model"))

Backend Configuration

Local Backend

The local backend stores objects on the filesystem and is the default option.

from mindtrace.registry import Registry, LocalRegistryBackend

# Default local registry
registry = Registry()

# Custom local registry
local_backend = LocalRegistryBackend(uri="/path/to/registry")
registry = Registry(backend=local_backend)

S3-Compatible Backend (MinIO, AWS S3)

The S3 backend provides distributed storage for any S3-compatible service.

from mindtrace.registry import Registry, S3RegistryBackend

# MinIO / S3-compatible registry
s3_backend = S3RegistryBackend(
    endpoint="localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    bucket="minio-registry",
    secure=False,
)
registry = Registry(backend=s3_backend)

GCP Backend

The GCP backend uses Google Cloud Storage for distributed object storage.

from mindtrace.registry import Registry, GCPRegistryBackend

gcp_backend = GCPRegistryBackend(
    uri="gs://my-registry-bucket",
    project_id="my-project",
    bucket_name="my-registry-bucket",
    credentials_path="/path/to/service-account.json",
)
registry = Registry(backend=gcp_backend)

Concurrency Model

Cloud backends (GCP, S3) use lock-free MVCC (Multi-Version Concurrency Control):

  • Each push writes artifacts to a unique UUID folder: objects/{name}/{version}/{uuid}/
  • Metadata write is the atomic "commit point" — it references the active UUID
  • For immutable registries: first-write-wins via conditional creation (generation_match=0 on GCS, IfNoneMatch='*' on S3)
  • For mutable registries: last metadata write wins; orphaned UUID folders are cleaned up by the janitor

Locks are only used for register_materializer, which performs a read-modify-write on registry metadata.
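
The commit protocol above can be illustrated on a local filesystem. This is a minimal sketch, not the library's implementation: the helper names (push, load), the meta/ directory layout, and the data.bin file name are assumptions made for the example, and exclusive file creation stands in for the conditional-create calls on GCS and S3.

```python
import json
import tempfile
import uuid
from pathlib import Path


def push(root: Path, name: str, version: str, payload: bytes) -> bool:
    """Write artifacts to a fresh UUID folder, then try to commit metadata.

    Returns True if this writer won the commit, False if another writer
    had already committed this (name, version).
    """
    attempt = uuid.uuid4().hex
    artifact_dir = root / "objects" / name / version / attempt
    artifact_dir.mkdir(parents=True)
    (artifact_dir / "data.bin").write_bytes(payload)

    # Hypothetical metadata location, chosen for this sketch only.
    meta_path = root / "meta" / name / f"{version}.json"
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" fails if the file already exists: a local stand-in for
        # generation_match=0 (GCS) or IfNoneMatch='*' (S3).
        with open(meta_path, "x") as fh:
            json.dump({"active_uuid": attempt}, fh)
        return True
    except FileExistsError:
        # Lost the race: the folder written above is now an orphan.
        return False


def load(root: Path, name: str, version: str) -> bytes:
    """Follow the committed metadata to the active UUID folder."""
    meta = json.loads((root / "meta" / name / f"{version}.json").read_text())
    active = root / "objects" / name / version / meta["active_uuid"]
    return (active / "data.bin").read_bytes()


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    print(push(root, "model", "1", b"first"))   # first writer commits
    print(push(root, "model", "1", b"second"))  # second writer loses
    print(load(root, "model", "1"))
```

Readers only ever follow the committed metadata, so a half-written or losing UUID folder is never visible to them; it simply waits for cleanup.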

Caching

When using a remote backend, the Registry maintains a local cache (enabled by default):

# Caching is on by default for remote backends
registry = Registry(backend=gcp_backend, use_cache=True)

# Control verification level on load
obj = registry.load("my:model", verify="none")       # Trust cache, fastest
obj = registry.load("my:model", verify="integrity")   # Verify hash (default)
obj = registry.load("my:model", verify="full")         # Hash + staleness check

# Clear cache manually
registry.clear_cache()

Verification levels (VerifyLevel):

  • "none": Trust cache completely. Fastest.
  • "integrity": Verify loaded artifacts match the hash in metadata. Default.
  • "full": Integrity check + compare cache hash against remote. Detects stale cache entries.
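
As a rough illustration of how the three levels differ, the sketch below checks a cached payload against hashes. The function check_cache, its parameters, and the choice of SHA-256 are assumptions for this example; the library's actual hashing scheme is not described here.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Hex digest used as the content hash in this sketch."""
    return hashlib.sha256(data).hexdigest()


def check_cache(
    cached: bytes,
    metadata_hash: str,
    level: str = "integrity",
    remote_hash: Optional[str] = None,
) -> bool:
    """Return True if the cached payload may be used at the given level."""
    if level == "none":
        # Trust the cache without hashing anything.
        return True
    if level not in ("integrity", "full"):
        raise ValueError(f"unknown verify level: {level!r}")

    cache_hash = sha256_hex(cached)
    if cache_hash != metadata_hash:
        # Cached bytes do not match the hash recorded in metadata.
        return False
    if level == "full":
        # Also compare against the remote hash to detect a stale entry.
        return cache_hash == remote_hash
    return True
```

In this sketch "full" costs one extra remote lookup per load, which is the trade-off against "integrity".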

Version Management

# Versioned registry (auto-increments versions)
registry = Registry(version_objects=True)
registry.save("model", v1)                    # version = "1"
registry.save("model", v2)                    # version = "2"
registry.save("model", v3, version="2.1")     # version = "2.1"

# Load specific or latest version
model = registry.load("model", version="2.1")
latest = registry.load("model", version="latest")

# Unversioned registry (single version per name, default)
registry = Registry(version_objects=False)
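
One plausible way to resolve "latest" among version strings such as "1", "2", and "2.1" is to compare them as tuples of integers. The helpers below (version_key, resolve_latest) are hypothetical and only sketch that idea; the library may order versions differently.

```python
from __future__ import annotations


def version_key(version: str) -> tuple[int, ...]:
    """Turn "2.1" into (2, 1) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def resolve_latest(versions: list[str]) -> str:
    """Pick the highest version from a non-empty list of dotted versions."""
    return max(versions, key=version_key)


if __name__ == "__main__":
    print(resolve_latest(["1", "2", "2.1"]))
    print(resolve_latest(["9", "10"]))
```

Comparing as integers matters once versions reach two digits: as plain strings, "9" would sort after "10".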

Conflict Handling

Control behavior when saving to an existing version (OnConflict):

# Skip (default): raises RegistryVersionConflict for single ops
registry.save("model", obj, version="1.0.0", on_conflict="skip")

# Overwrite: replaces existing version (requires mutable=True)
registry = Registry(mutable=True)
registry.save("model", obj, version="1.0.0", on_conflict="overwrite")
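
The two conflict policies can be sketched with an in-memory stand-in. ToyRegistry and ToyVersionConflict are hypothetical names for this example, and raising ValueError when overwrite is requested on an immutable registry is an assumption; the real exception type is not stated above.

```python
class ToyVersionConflict(Exception):
    """Stand-in for RegistryVersionConflict in this sketch."""


class ToyRegistry:
    """In-memory illustration of skip vs. overwrite semantics."""

    def __init__(self, mutable: bool = False):
        self.mutable = mutable
        self._objects = {}

    def save(self, name, obj, version, on_conflict="skip"):
        key = (name, version)
        if key not in self._objects:
            self._objects[key] = obj
            return
        if on_conflict == "overwrite":
            if not self.mutable:
                # Assumed behavior: overwrite is rejected when immutable.
                raise ValueError("on_conflict='overwrite' requires mutable=True")
            self._objects[key] = obj
            return
        # "skip": keep the existing version and report the conflict.
        raise ToyVersionConflict(f"{name}@{version} already exists")

    def load(self, name, version):
        return self._objects[(name, version)]
```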

Custom Materializers

Register custom serialization handlers for your object types:

from mindtrace.registry import Registry

# my_module, MyClass and MyMaterializer are placeholders for your own code
from my_module import MyMaterializer

registry = Registry()

# Register a materializer for a custom class
registry.register_materializer("my_module.MyClass", "my_module.MyMaterializer")

# Save with an explicit materializer
registry.save("custom:obj", my_object, materializer=MyMaterializer)
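
Conceptually, a materializer pairs a save step with a load step for one object type, and the registry looks it up by the class's dotted path. The sketch below shows that idea only: the save/load method signatures, the MATERIALIZERS table, and materializer_for are assumptions for illustration, not the base class the library expects.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Point:
    x: int
    y: int


class PointMaterializer:
    """Hypothetical materializer: writes a Point as JSON and reads it back."""

    def save(self, obj: Point, directory: Path) -> None:
        (directory / "point.json").write_text(json.dumps({"x": obj.x, "y": obj.y}))

    def load(self, directory: Path) -> Point:
        return Point(**json.loads((directory / "point.json").read_text()))


def dotted_path(cls: type) -> str:
    """Build the "module.ClassName" key used for lookup in this sketch."""
    return f"{cls.__module__}.{cls.__qualname__}"


# Lookup table keyed by dotted class path, mirroring the string arguments
# passed to register_materializer above.
MATERIALIZERS = {dotted_path(Point): PointMaterializer}


def materializer_for(obj):
    """Instantiate the materializer registered for the object's class."""
    return MATERIALIZERS[dotted_path(type(obj))]()


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp())
    handler = materializer_for(Point(1, 2))
    handler.save(Point(1, 2), target)
    print(handler.load(target))
```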

Metadata and Information

# Get info for a specific object version
info = registry.info("my:model", "1.0.0")

# Get info for all versions of an object
info = registry.info("my:model")

# Get info for all objects
info = registry.info()

# Check existence
exists = registry.has_object("my:model", "1.0.0")  # -> bool

Error Handling

from mindtrace.registry.core.exceptions import (
    RegistryObjectNotFound,
    RegistryVersionConflict,
)

try:
    model = registry.load("nonexistent:model")
except RegistryObjectNotFound as e:
    print(f"Object not found: {e}")

try:
    registry.save("model", obj, version="1.0.0")  # already exists
except RegistryVersionConflict as e:
    print(f"Version conflict: {e}")

Batch Operations

The Registry facade exposes single-object methods. The same save and load methods also accept lists for batch operations:

# Batch save
result = registry.save(
    ["model:a", "model:b"],
    [obj_a, obj_b],
    version=["1.0.0", "1.0.0"],
)
# result is a BatchResult with .results, .errors, .succeeded, .failed

# Batch load
result = registry.load(["model:a", "model:b"], version=["1.0.0", "1.0.0"])
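
A batch call collects per-item outcomes instead of failing on the first error. The sketch below shows one way such a result could be assembled with a thread pool. ToyBatchResult and batch_load are hypothetical; only the attribute names (results, errors, succeeded, failed) come from the comment above, and their exact types in the library are an assumption here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ToyBatchResult:
    """Per-item outcomes keyed by (name, version)."""

    results: dict = field(default_factory=dict)
    errors: dict = field(default_factory=dict)

    @property
    def succeeded(self) -> list:
        return list(self.results)

    @property
    def failed(self) -> list:
        return list(self.errors)


def batch_load(load_one, names, versions) -> ToyBatchResult:
    """Run load_one(name, version) for each pair in parallel."""
    out = ToyBatchResult()
    with ThreadPoolExecutor() as pool:
        futures = {
            (name, version): pool.submit(load_one, name, version)
            for name, version in zip(names, versions)
        }
    # The pool has shut down here, so every future is already finished.
    for key, future in futures.items():
        try:
            out.results[key] = future.result()
        except Exception as exc:  # record the failure, keep going
            out.errors[key] = exc
    return out
```

Keeping errors alongside results lets a caller retry only the failed keys rather than the whole batch.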

Backend Comparison

Feature     | Local      | S3 / MinIO                    | GCP
Storage     | Filesystem | S3-compatible                 | Google Cloud Storage
Concurrency | File locks | Lock-free MVCC                | Lock-free MVCC
Caching     | N/A        | Local cache                   | Local cache
Batch Ops   | Sequential | Parallel (ThreadPoolExecutor) | Parallel (ThreadPoolExecutor)

Troubleshooting

Common Issues

  1. Permission Errors: Verify credentials and bucket access
  2. Network Issues: Check connectivity to remote backends

Debug Logging

import logging

from mindtrace.registry import Registry

logging.basicConfig(level=logging.DEBUG)

registry = Registry()
# Operations will now show detailed logs

Store (Multi-Registry Facade)

The Store class composes multiple Registry instances behind a single API. Where a Registry maps to exactly one backend, a Store lets you read and write across multiple physical stores with deterministic routing.

Mounts

A Store organises registries as named mounts. Every Store always has a temp mount (backed by a fresh temporary directory) and a configurable default_mount that controls where unqualified writes go.

from mindtrace.registry import Registry, Store

# A bare Store — just the temp mount
store = Store()

# Add named mounts
store.add_mount("models", Registry(backend=gcp_backend))
store.add_mount("datasets", Registry(backend=s3_backend), read_only=True)

# Change the default write target
store.set_default_mount("models")

Key Format

Keys can be qualified (routed to a specific mount) or unqualified (routed automatically):

# Qualified — targets the "models" mount explicitly
store.save("models/my_model", obj)
model = store.load("models/my_model@1.0.0")

# Unqualified — writes go to default_mount, reads discover across all mounts
store.save("my_model", obj)          # -> saves to default_mount
model = store.load("my_model")       # -> searches all mounts

Read and Write Routing

  • Writes: Qualified writes target the specified mount. Unqualified writes go to default_mount.
  • Reads: Qualified reads target the specified mount. Unqualified reads discover across all mounts — if the object exists in exactly one mount it loads; if found in multiple mounts a StoreAmbiguousObjectError is raised.
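
The routing rules above can be sketched as a key parser plus a read resolver. Everything here is illustrative: parse_key, resolve_read, and ToyAmbiguousObject are hypothetical names, the "mount/name@version" grammar is inferred from the examples in Key Format, and raising LookupError when no mount holds the object is an assumption.

```python
from typing import Dict, Optional, Set, Tuple


class ToyAmbiguousObject(Exception):
    """Stand-in for StoreAmbiguousObjectError in this sketch."""


def parse_key(key: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split "mount/name@version" into (mount, name, version).

    Mount and version are None when the key omits them.
    """
    body, _, version = key.partition("@")
    mount, sep, name = body.partition("/")
    if not sep:
        # No "/" present: the key is unqualified.
        mount, name = None, body
    return mount, name, version or None


def resolve_read(mounts: Dict[str, Set[str]], key: str) -> str:
    """Return the mount a read should target.

    `mounts` maps each mount name to the object names it holds.
    """
    mount, name, _ = parse_key(key)
    if mount is not None:
        return mount  # qualified read: use the named mount directly
    hits = [m for m, names in mounts.items() if name in names]
    if len(hits) == 1:
        return hits[0]
    if not hits:
        raise LookupError(f"{name!r} not found in any mount")
    raise ToyAmbiguousObject(f"{name!r} found in mounts: {sorted(hits)}")
```

Qualifying the key is the way out of an ambiguous read: resolve_read(mounts, "models/my_model") never searches other mounts.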

Default Mount Behaviour

  • default_mount always points to a configured mount (initially temp).
  • Removing the current default mount resets it back to temp.
  • The temp mount cannot be removed.

Store Errors

In addition to the standard Registry exceptions, Store introduces:

  • StoreLocationNotFound — unknown mount
  • StoreKeyFormatError — invalid key format
  • StoreAmbiguousObjectError — unqualified load matched multiple mounts
  • PermissionError — write to a read-only mount
