Registry functionality for Mindtrace

Registry Module

The Registry module provides a distributed, versioned object storage system with support for multiple backends. It stores, versions, and retrieves objects with automatic serialization, and cloud backends handle concurrent reads and writes without distributed locks.

Features

  • Multi-Backend Support: Local filesystem, S3-compatible (MinIO, AWS S3) and Google Cloud Storage
  • Lock-Free Concurrency: UUID-based MVCC ensures safe concurrent reads and writes without distributed locks
  • Versioning: Automatic version management with semantic versioning support
  • Caching: Local cache for remote backends with configurable staleness checks
  • Materializers: Pluggable serialization system for different object types
  • Batch Operations: All backend operations support batch mode for efficient bulk access
  • Dict-Like Interface: registry["name"] = obj, obj = registry["name"], del registry["name"]

Quick Start

from mindtrace.registry import Registry

# Create a registry (uses local backend by default)
registry = Registry()

# Save objects
registry.save("my:model", trained_model)
registry.save("my:data", dataset, version="1.0.0")

# Load objects
model = registry.load("my:model")
data = registry.load("my:data", version="1.0.0")

# Dict-like access
registry["my:config"] = config_dict
config = registry["my:config"]

# Check existence
exists = registry.has_object("my:model", "1.0.0")  # -> bool

# Get metadata
info = registry.info("my:model", "1.0.0")  # -> dict

# List objects and versions
print(registry.list_objects())
print(registry.list_versions("my:model"))

Backend Configuration

Local Backend

The local backend stores objects on the filesystem and is the default option.

from mindtrace.registry import Registry, LocalRegistryBackend

# Default local registry
registry = Registry()

# Custom local registry
local_backend = LocalRegistryBackend(uri="/path/to/registry")
registry = Registry(backend=local_backend)

S3-Compatible Backend (MinIO, AWS S3)

The S3 backend provides distributed storage for any S3-compatible service.

from mindtrace.registry import Registry, S3RegistryBackend

# MinIO / S3-compatible registry
s3_backend = S3RegistryBackend(
    endpoint="localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    bucket="minio-registry",
    secure=False,  # plain HTTP for a local MinIO instance
)
registry = Registry(backend=s3_backend)

GCP Backend

The GCP backend uses Google Cloud Storage for distributed object storage.

from mindtrace.registry import Registry, GCPRegistryBackend

gcp_backend = GCPRegistryBackend(
    uri="gs://my-registry-bucket",
    project_id="my-project",
    bucket_name="my-registry-bucket",
    credentials_path="/path/to/service-account.json",
)
registry = Registry(backend=gcp_backend)

Concurrency Model

Cloud backends (GCP, S3) use lock-free MVCC (Multi-Version Concurrency Control):

  • Each push writes artifacts to a unique UUID folder: objects/{name}/{version}/{uuid}/
  • Metadata write is the atomic "commit point" — it references the active UUID
  • For immutable registries: first-write-wins via conditional creation (generation_match=0 on GCS, IfNoneMatch='*' on S3)
  • For mutable registries: last metadata write wins; orphaned UUID folders are cleaned up by the janitor

Locks are only used for register_materializer, which performs a read-modify-write on registry metadata.
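The commit protocol above can be imitated on a local filesystem. The following is a minimal sketch, not the library's implementation: the `push` and `active_uuid` helpers, the `meta/` layout, and the `data.bin` file name are all hypothetical, and an exclusive file create stands in for the conditional-creation calls that GCS and S3 provide.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


def push(root: Path, name: str, version: str, payload: bytes, mutable: bool = False) -> str:
    """Write artifacts to a fresh UUID folder, then commit by writing metadata."""
    attempt = uuid.uuid4().hex
    artifact_dir = root / "objects" / name / version / attempt
    artifact_dir.mkdir(parents=True)
    (artifact_dir / "data.bin").write_bytes(payload)

    # Hypothetical metadata location; the metadata names the active UUID.
    meta_path = root / "meta" / name / f"{version}.json"
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    body = json.dumps({"active_uuid": attempt})
    if mutable:
        # Last metadata write wins: swap the file in atomically.
        tmp_path = meta_path.with_name(f"{meta_path.name}.{attempt}.tmp")
        tmp_path.write_text(body)
        os.replace(tmp_path, meta_path)
    else:
        # First write wins: mode "x" raises FileExistsError if metadata exists.
        with open(meta_path, "x") as handle:
            handle.write(body)
    return attempt


def active_uuid(root: Path, name: str, version: str) -> str:
    """Return the UUID that the committed metadata points at."""
    meta_path = root / "meta" / name / f"{version}.json"
    return json.loads(meta_path.read_text())["active_uuid"]


root = Path(tempfile.mkdtemp())
winner = push(root, "model", "1.0.0", b"first")
try:
    push(root, "model", "1.0.0", b"second")
except FileExistsError:
    print("second writer lost; its UUID folder is now an orphan")
print(active_uuid(root, "model", "1.0.0") == winner)
```

The losing writer still leaves a UUID folder behind, which is the kind of orphan a janitor process would later remove.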

Caching

When using a remote backend, the Registry maintains a local cache (enabled by default):

# Caching is on by default for remote backends
registry = Registry(backend=gcp_backend, use_cache=True)

# Control verification level on load
obj = registry.load("my:model", verify="none")       # Trust cache, fastest
obj = registry.load("my:model", verify="integrity")   # Verify hash (default)
obj = registry.load("my:model", verify="full")         # Hash + staleness check

# Clear cache manually
registry.clear_cache()

Verification levels (VerifyLevel):

  • "none": Trust cache completely. Fastest.
  • "integrity": Verify loaded artifacts match the hash in metadata. Default.
  • "full": Integrity check + compare cache hash against remote. Detects stale cache entries.
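To make the difference between the three levels concrete, here is a small sketch of the decision each one implies. It is illustrative only: `cache_entry_ok` and `sha256_hex` are hypothetical helpers, SHA-256 is an assumed hash algorithm, and the remote lookup is passed in as a callable so it is clear that only "full" needs to contact the backend.

```python
import hashlib
from typing import Callable

LEVELS = ("none", "integrity", "full")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def cache_entry_ok(
    cached_bytes: bytes,
    metadata_hash: str,
    fetch_remote_hash: Callable[[], str],
    verify: str = "integrity",
) -> bool:
    """Decide whether a cached artifact may be served at the given level."""
    if verify not in LEVELS:
        raise ValueError(f"unknown verify level: {verify!r}")
    if verify == "none":
        # Trust the cache without reading or hashing anything.
        return True
    if sha256_hex(cached_bytes) != metadata_hash:
        # The cached artifact does not match the hash stored in its metadata.
        return False
    if verify == "full":
        # Only this level pays for a round trip to the remote backend.
        return fetch_remote_hash() == metadata_hash
    return True
```

A stale but uncorrupted cache entry passes "integrity" and fails "full", which is the case the staleness check exists to catch.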

Version Management

# Versioned registry (auto-increments versions)
registry = Registry(version_objects=True)
registry.save("model", v1)                    # version = "1"
registry.save("model", v2)                    # version = "2"
registry.save("model", v3, version="2.1")     # version = "2.1"

# Load specific or latest version
model = registry.load("model", version="2.1")
latest = registry.load("model", version="latest")

# Unversioned registry (single version per name, default)
registry = Registry(version_objects=False)
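The README does not spell out how "latest" is chosen or how the next automatic version is derived, so the sketch below only shows one plausible scheme. It assumes versions are dot-separated integers compared numerically and that auto-increment bumps the leading component; `version_key`, `latest_version`, and `next_version` are hypothetical names.

```python
def version_key(version: str) -> tuple:
    """Turn "2.1" into (2, 1) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def latest_version(versions: list) -> str:
    return max(versions, key=version_key)


def next_version(versions: list) -> str:
    """Assumed rule: bump the leading component of the latest version."""
    if not versions:
        return "1"
    return str(version_key(latest_version(versions))[0] + 1)


print(latest_version(["1", "2", "2.1"]))
print(latest_version(["9", "10"]))
print(next_version(["1", "2"]))
```

Comparing integer tuples rather than strings matters once a component reaches two digits, since "10" sorts before "9" as text.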

Conflict Handling

Control behavior when saving to an existing version (OnConflict):

# Skip (default): raises RegistryVersionConflict for single-object saves
registry.save("model", obj, version="1.0.0", on_conflict="skip")

# Overwrite: replaces existing version (requires mutable=True)
registry = Registry(mutable=True)
registry.save("model", obj, version="1.0.0", on_conflict="overwrite")
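The two policies can be modelled with a small in-memory stand-in. This is a sketch of the behaviour described above, not the Registry code: `TinyStore` and `VersionConflict` are hypothetical, and raising `ValueError` for an overwrite on an immutable store is an assumption about how that case might be reported.

```python
class VersionConflict(Exception):
    """Stand-in for RegistryVersionConflict."""


class TinyStore:
    def __init__(self, mutable: bool = False):
        self.mutable = mutable
        self._items = {}

    def save(self, name, obj, version, on_conflict="skip"):
        if on_conflict not in ("skip", "overwrite"):
            raise ValueError(f"unknown on_conflict: {on_conflict!r}")
        key = (name, version)
        if key in self._items:
            if on_conflict == "skip":
                # A single-object save reports the clash instead of replacing.
                raise VersionConflict(f"{name}@{version} already exists")
            if not self.mutable:
                # Assumed error type for an overwrite on an immutable store.
                raise ValueError("overwrite requires mutable=True")
        self._items[key] = obj

    def load(self, name, version):
        return self._items[(name, version)]


store = TinyStore(mutable=True)
store.save("model", "v1", version="1.0.0")
store.save("model", "v2", version="1.0.0", on_conflict="overwrite")
print(store.load("model", "1.0.0"))
```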

Custom Materializers

Register custom serialization handlers for your object types:

from mindtrace.registry import Registry
from my_module import MyMaterializer  # your own materializer class

registry = Registry()

# Register a materializer for a custom class
registry.register_materializer("my_module.MyClass", "my_module.MyMaterializer")

# Save with explicit materializer
registry.save("custom:obj", my_object, materializer=MyMaterializer)
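The materializer base class is not shown in this README, so the following sketch only illustrates the general idea of pairing a type with a handler that writes it to a directory and reads it back. Everything here is hypothetical: the `save`/`load` method names, the `MATERIALIZERS` lookup table, and the `Point` example class.

```python
import json
import tempfile
from pathlib import Path


class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y


class PointMaterializer:
    """Hypothetical handler: serializes a Point as JSON inside a directory."""

    def save(self, obj: Point, directory: Path) -> None:
        (directory / "point.json").write_text(json.dumps({"x": obj.x, "y": obj.y}))

    def load(self, directory: Path) -> Point:
        data = json.loads((directory / "point.json").read_text())
        return Point(data["x"], data["y"])


def dotted_path(cls: type) -> str:
    """Build the "module.Class" string used as the lookup key."""
    return f"{cls.__module__}.{cls.__qualname__}"


# Lookup table keyed by the object's dotted class path.
MATERIALIZERS = {dotted_path(Point): PointMaterializer}


def materializer_for(obj):
    return MATERIALIZERS[dotted_path(type(obj))]()


workdir = Path(tempfile.mkdtemp())
handler = materializer_for(Point(1, 2))
handler.save(Point(1, 2), workdir)
restored = handler.load(workdir)
print(restored.x, restored.y)
```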

Metadata and Information

# Get info for a specific object version
info = registry.info("my:model", "1.0.0")

# Get info for all versions of an object
info = registry.info("my:model")

# Get info for all objects
info = registry.info()

# Check existence
exists = registry.has_object("my:model", "1.0.0")  # -> bool

Error Handling

from mindtrace.registry.core.exceptions import (
    RegistryObjectNotFound,
    RegistryVersionConflict,
)

try:
    model = registry.load("nonexistent:model")
except RegistryObjectNotFound as e:
    print(f"Object not found: {e}")

try:
    registry.save("model", obj, version="1.0.0")  # already exists
except RegistryVersionConflict as e:
    print(f"Version conflict: {e}")

Batch Operations

The Registry methods shown so far each take a single object. For batch operations, pass lists of names, objects, and versions to the same methods:

# Batch save
result = registry.save(
    ["model:a", "model:b"],
    [obj_a, obj_b],
    version=["1.0.0", "1.0.0"],
)
# result is a BatchResult with .results, .errors, .succeeded, .failed

# Batch load
result = registry.load(["model:a", "model:b"], version=["1.0.0", "1.0.0"])
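A batch call has to report partial success, which is why the result carries both results and errors. The sketch below shows one way to collect per-item outcomes from a thread pool. It is not the library's `BatchResult`: `BatchOutcome` and `run_batch` are hypothetical, and the attribute types (dicts keyed by name, lists of names) are assumptions that only mirror the attribute names listed above.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class BatchOutcome:
    results: dict = field(default_factory=dict)  # name -> returned value
    errors: dict = field(default_factory=dict)   # name -> raised exception

    @property
    def succeeded(self) -> list:
        return list(self.results)

    @property
    def failed(self) -> list:
        return list(self.errors)


def run_batch(names, operation, max_workers: int = 4) -> BatchOutcome:
    """Run operation(name) for every name in parallel and collect outcomes."""
    outcome = BatchOutcome()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(operation, name) for name in names}
    # Leaving the with-block waits for every future to finish.
    for name, future in futures.items():
        error = future.exception()
        if error is None:
            outcome.results[name] = future.result()
        else:
            outcome.errors[name] = error
    return outcome


catalog = {"model:a": "weights-a", "model:b": "weights-b"}
outcome = run_batch(["model:a", "model:b", "model:c"], lambda name: catalog[name])
print(outcome.succeeded, outcome.failed)
```

One failing item does not abort the others, so callers should check the failed list rather than rely on an exception.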

Backend Comparison

Feature       Local         S3 / MinIO                      GCP
Storage       Filesystem    S3-compatible                   Google Cloud Storage
Concurrency   File locks    Lock-free MVCC                  Lock-free MVCC
Caching       N/A           Local cache                     Local cache
Batch Ops     Sequential    Parallel (ThreadPoolExecutor)   Parallel (ThreadPoolExecutor)

Troubleshooting

Common Issues

  1. Permission Errors: Verify credentials and bucket access
  2. Network Issues: Check connectivity to remote backends

Debug Logging

import logging

from mindtrace.registry import Registry

logging.basicConfig(level=logging.DEBUG)

registry = Registry()
# Operations will now show detailed logs

Store (Multi-Registry Facade)

The Store class composes multiple Registry instances behind a single API. Where a Registry maps to exactly one backend, a Store lets you read and write across multiple physical stores with deterministic routing.

Mounts

A Store organises registries as named mounts. Every Store always has a temp mount (backed by a fresh temporary directory) and a configurable default_mount that controls where unqualified writes go.

from mindtrace.registry import Registry, Store

# A bare Store — just the temp mount
store = Store()

# Add named mounts
store.add_mount("models", Registry(backend=gcp_backend))
store.add_mount("datasets", Registry(backend=s3_backend), read_only=True)

# Change the default write target
store.set_default_mount("models")

Key Format

Keys can be qualified (routed to a specific mount) or unqualified (routed automatically):

# Qualified — targets the "models" mount explicitly
store.save("models/my_model", obj)
model = store.load("models/my_model@1.0.0")

# Unqualified — writes go to default_mount, reads discover across all mounts
store.save("my_model", obj)          # -> saves to default_mount
model = store.load("my_model")       # -> searches all mounts
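The key shapes used above (`mount/name`, `name`, and an optional `@version` suffix) can be split with a few string operations. This is a sketch under assumptions rather than the Store's parser: `parse_key` and `ParsedKey` are hypothetical, the first "/" is assumed to separate the mount from the name, and `ValueError` stands in for `StoreKeyFormatError`.

```python
from typing import NamedTuple, Optional


class ParsedKey(NamedTuple):
    mount: Optional[str]    # None for an unqualified key
    name: str
    version: Optional[str]  # None when no "@version" suffix is given


def parse_key(key: str) -> ParsedKey:
    if not key or key != key.strip():
        raise ValueError(f"invalid key: {key!r}")
    body, at_sign, version = key.partition("@")
    if at_sign and not version:
        raise ValueError(f"empty version in key: {key!r}")
    mount, slash, name = body.partition("/")
    if not slash:
        # No "/" at all: the whole body is an unqualified object name.
        mount, name = None, body
    if not name or mount == "":
        raise ValueError(f"invalid key: {key!r}")
    return ParsedKey(mount, name, version or None)


print(parse_key("models/my_model@1.0.0"))
print(parse_key("my_model"))
```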

Read and Write Routing

  • Writes: Qualified writes target the specified mount. Unqualified writes go to default_mount.
  • Reads: Qualified reads target the specified mount. Unqualified reads discover across all mounts — if the object exists in exactly one mount it loads; if found in multiple mounts a StoreAmbiguousObjectError is raised.
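The read rule can be expressed as a small resolver over a table of mounts. The sketch below models each mount as a set of object names and is only an illustration: `resolve_read` and the three exception classes are hypothetical stand-ins (for example, `AmbiguousObject` plays the role of `StoreAmbiguousObjectError`).

```python
from typing import Optional


class UnknownMount(Exception):
    """Stand-in for StoreLocationNotFound."""


class ObjectNotFound(Exception):
    """Stand-in for RegistryObjectNotFound."""


class AmbiguousObject(Exception):
    """Stand-in for StoreAmbiguousObjectError."""


def resolve_read(mounts: dict, name: str, mount: Optional[str] = None) -> str:
    """Return the name of the mount a read should be served from."""
    if mount is not None:
        # Qualified read: only the named mount is consulted.
        if mount not in mounts:
            raise UnknownMount(mount)
        if name not in mounts[mount]:
            raise ObjectNotFound(f"{mount}/{name}")
        return mount
    # Unqualified read: discover the object across every mount.
    hits = [label for label, names in mounts.items() if name in names]
    if len(hits) > 1:
        raise AmbiguousObject(f"{name} found in {hits}")
    if not hits:
        raise ObjectNotFound(name)
    return hits[0]


mounts = {"temp": set(), "models": {"my_model", "shared"}, "datasets": {"shared"}}
print(resolve_read(mounts, "my_model"))
print(resolve_read(mounts, "shared", mount="datasets"))
```

Qualifying the key is the way out of an ambiguous read, since a qualified read never looks at the other mounts.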

Default Mount Behaviour

  • default_mount always points to a configured mount (initially temp).
  • Removing the current default mount resets it back to temp.
  • The temp mount cannot be removed.
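These three rules amount to a small state machine around the default mount. The following sketch tracks only mount names and their read-only flags. `MountTable` is hypothetical, the `KeyError` and `ValueError` choices are assumptions, and whether the real Store lets a read-only mount become the default is not stated here, so the sketch simply refuses the write.

```python
from typing import Optional


class MountTable:
    def __init__(self):
        self._read_only = {"temp": False}  # mount name -> read_only flag
        self.default_mount = "temp"

    def add_mount(self, name: str, read_only: bool = False) -> None:
        self._read_only[name] = read_only

    def set_default_mount(self, name: str) -> None:
        if name not in self._read_only:
            raise KeyError(name)
        self.default_mount = name

    def remove_mount(self, name: str) -> None:
        if name == "temp":
            raise ValueError("the temp mount cannot be removed")
        del self._read_only[name]
        if self.default_mount == name:
            # Removing the current default falls back to temp.
            self.default_mount = "temp"

    def write_target(self, mount: Optional[str] = None) -> str:
        """Pick the mount for a write: the qualified one, else the default."""
        target = mount if mount is not None else self.default_mount
        if self._read_only[target]:
            raise PermissionError(f"mount {target!r} is read-only")
        return target


table = MountTable()
table.add_mount("models")
table.set_default_mount("models")
print(table.write_target())
table.remove_mount("models")
print(table.default_mount)
```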

Store Errors

In addition to the standard Registry exceptions, Store introduces:

  • StoreLocationNotFound — unknown mount
  • StoreKeyFormatError — invalid key format
  • StoreAmbiguousObjectError — unqualified load matched multiple mounts
  • PermissionError — write to a read-only mount
