Registry functionality for Mindtrace
Registry Module
The Registry module provides a distributed, versioned object storage system with support for multiple backends. It enables storing, versioning, and retrieving objects with automatic serialization and distributed concurrency control.
Features
- Multi-Backend Support: Local filesystem, MinIO (S3-compatible), and Google Cloud Storage
- Distributed Concurrency: Atomic operations with distributed locking
- Versioning: Automatic version management with semantic versioning support
- Materializers: Pluggable serialization system for different object types
- Thread-Safe: Built-in thread safety for concurrent access
- Metadata: Rich metadata storage and retrieval
Quick Start
from mindtrace.registry import Registry
# Create a registry (uses local backend by default)
registry = Registry()
# Save objects
registry.save("my:model", trained_model)
registry.save("my:data", dataset, version="1.0.0")
# Load objects
model = registry.load("my:model")
data = registry.load("my:data", version="1.0.0")
# List objects and versions
print(registry.list_objects())
print(registry.list_versions("my:model"))
Backend Configuration
Local Backend
The local backend stores objects on the filesystem and is the default option.
from mindtrace.registry import Registry, LocalRegistryBackend
# Default local registry
registry = Registry()
# Custom local registry
local_backend = LocalRegistryBackend(uri="/path/to/registry")
registry = Registry(backend=local_backend)
Features:
- File-based storage with atomic operations
- Cross-platform file locking (Windows/Unix)
- Automatic directory cleanup
- Local metadata storage
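One common way to get atomic file-based writes is to write to a temporary file in the target directory and then rename it over the destination. The sketch below illustrates that general technique with the standard library only. It is not taken from the Mindtrace source, and `atomic_write_bytes` is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, payload: bytes) -> None:
    """Write payload to target so readers never observe a partially written file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the final rename
    # never crosses a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # os.replace overwrites an existing target; the rename is atomic on POSIX.
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A reader that opens the target path sees either the previous contents or the new contents, never a mix of the two.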
MinIO Backend
The MinIO backend provides S3-compatible distributed storage.
from mindtrace.registry import Registry, MinioRegistryBackend
# MinIO registry
minio_backend = MinioRegistryBackend(
    uri="gs://my-registry",
    endpoint="localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    bucket="minio-registry",
    secure=False,
)
registry = Registry(backend=minio_backend)
Features:
- S3-compatible distributed storage
- Atomic operations using S3 object creation
- Distributed locking with S3 objects
- Metadata stored as JSON objects
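To make "distributed locking with S3 objects" concrete, the sketch below shows the general idea of a lock built on create-if-absent semantics: whoever creates the lock object first holds the lock, and only the holder may delete it. The object store here is an in-memory stand-in, and `InMemoryObjectStore`, `ObjectLock`, and the `_locks/` key prefix are hypothetical names. How the MinIO backend actually implements its locks is not described in this document.

```python
import threading
import time
import uuid


class InMemoryObjectStore:
    """Stand-in for a bucket; models only create-if-absent and conditional delete."""

    def __init__(self):
        self._objects = {}
        self._guard = threading.Lock()

    def put_if_absent(self, key, body):
        # Succeeds only when no object exists under this key.
        with self._guard:
            if key in self._objects:
                return False
            self._objects[key] = body
            return True

    def delete_if_equals(self, key, body):
        # Deletes only when the stored body matches the caller's token.
        with self._guard:
            if self._objects.get(key) == body:
                del self._objects[key]
                return True
            return False


class ObjectLock:
    """Mutual exclusion built on a single lock object per resource name."""

    def __init__(self, store, name):
        self.store = store
        self.key = f"_locks/{name}"
        self.token = uuid.uuid4().hex  # identifies this holder

    def acquire(self, timeout=5.0, poll=0.05):
        deadline = time.monotonic() + timeout
        while True:
            if self.store.put_if_absent(self.key, self.token):
                return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)

    def release(self):
        # Returns False if this instance does not hold the lock.
        return self.store.delete_if_equals(self.key, self.token)
```

Storing a per-holder token in the lock object prevents one client from releasing a lock that another client holds.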
GCP Backend
The GCP backend uses Google Cloud Storage for distributed object storage.
from mindtrace.registry import Registry, GCPRegistryBackend
# GCP registry
gcp_backend = GCPRegistryBackend(
    uri="gs://my-registry-bucket",
    project_id="my-project",
    bucket_name="my-registry-bucket",
    credentials_path="/path/to/service-account.json",
)
registry = Registry(backend=gcp_backend)
Features:
- Google Cloud Storage integration
- Distributed storage with global availability
- Atomic operations using GCS object generation numbers
- Automatic bucket creation and management
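Generation numbers allow optimistic concurrency: a writer reads an object together with its generation, then writes back only if the generation is unchanged. The sketch below models that pattern with an in-memory store. `GenerationStore`, `PreconditionFailed`, and `append_version` are hypothetical names for illustration, and this is not the GCP backend's actual code. Generation 0 is used to mean "the object does not exist yet", mirroring how a generation-match precondition of 0 behaves in Google Cloud Storage.

```python
import threading


class PreconditionFailed(Exception):
    """Raised when the stored generation differs from the expected one."""


class GenerationStore:
    """In-memory stand-in; every successful write bumps the key's generation."""

    def __init__(self):
        self._data = {}  # key -> (generation, value)
        self._guard = threading.Lock()

    def read(self, key):
        with self._guard:
            return self._data.get(key, (0, None))

    def write(self, key, value, if_generation_match):
        with self._guard:
            current, _ = self._data.get(key, (0, None))
            if current != if_generation_match:
                raise PreconditionFailed(
                    f"expected generation {if_generation_match}, found {current}"
                )
            self._data[key] = (current + 1, value)
            return current + 1


def append_version(store, key, version, retries=5):
    """Add a version to a shared list using read, modify, conditional write."""
    for _ in range(retries):
        generation, versions = store.read(key)
        updated = list(versions or [])
        if version not in updated:
            updated.append(version)
        try:
            store.write(key, updated, if_generation_match=generation)
            return updated
        except PreconditionFailed:
            continue  # another writer won the race; re-read and retry
    raise RuntimeError(f"could not update {key!r} after {retries} attempts")
```

No lock is held between the read and the write. A conflicting writer is detected at write time and the update is retried against fresh data.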
Advanced Usage
Custom Materializers
Register custom serialization handlers for your object types:
from mindtrace.registry import Registry
from my_module import MyMaterializer  # placeholder module that defines your materializer
registry = Registry()
# Register a materializer for a custom class
registry.register_materializer("my_module.MyClass", "my_module.MyMaterializer")
# Save with custom materializer
registry.save("custom:obj", my_object, materializer=MyMaterializer)
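As a rough picture of what a materializer does, the sketch below pairs a small class with a handler that writes it to a directory as JSON and reads it back. The interface shown (a constructor taking a directory plus `save` and `load` methods) is an assumption made for illustration. The base class and method signatures that Mindtrace expects are not given in this document, so check the library before adapting this.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class MyClass:
    """Example object type to be stored."""
    name: str
    threshold: float


class MyMaterializer:
    """Hypothetical materializer: serializes MyClass to a JSON file in a directory."""

    FILENAME = "data.json"

    def __init__(self, uri: str):
        self.uri = Path(uri)  # directory that holds the serialized object

    def save(self, obj: MyClass) -> None:
        self.uri.mkdir(parents=True, exist_ok=True)
        (self.uri / self.FILENAME).write_text(json.dumps(asdict(obj)))

    def load(self) -> MyClass:
        payload = json.loads((self.uri / self.FILENAME).read_text())
        return MyClass(**payload)
```

The useful property is the round trip: `load` must reconstruct an object equal to the one passed to `save`.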
Version Management
Control versioning behavior:
# Disable versioning (overwrites existing objects)
registry = Registry(version_objects=False)
# Save with specific version
registry.save("model", trained_model, version="2.1.0")
# Load specific version
model = registry.load("model", version="2.1.0")
# Load latest version
model = registry.load("model", version="latest")
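Resolving "latest" requires comparing versions numerically, because plain string comparison ranks "1.9.0" above "1.10.0". The sketch below shows one way to do that for dotted numeric versions. `parse_version` and `latest_version` are hypothetical helpers, and the registry's own resolution logic may differ.

```python
def parse_version(version: str) -> tuple:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_version(versions):
    """Return the highest version from a list of dotted numeric version strings."""
    if not versions:
        raise ValueError("no versions available")
    return max(versions, key=parse_version)


versions = ["1.2.3", "1.9.0", "1.10.0"]
print(latest_version(versions))  # numeric comparison
print(max(versions))             # string comparison, shown for contrast
```

This assumes purely numeric components. Pre-release suffixes such as "1.0.0-rc1" would need a fuller semantic-version parser.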
Metadata and Information
# Get object information
info = registry.info("my:model")
print(f"Class: {info['class']}")
print(f"Materializer: {info['materializer']}")
print(f"Path: {info['path']}")
# List all objects
objects = registry.list_objects()
print(f"Objects: {objects}")
# List versions for an object
versions = registry.list_versions("my:model")
print(f"Versions: {versions}")
# Check if object exists
exists = registry.has_object("my:model", "1.0.0")
Distributed Operations
The registry handles distributed concurrency automatically:
# These operations are automatically protected by distributed locks
registry.save("shared:resource", data) # Exclusive lock
data = registry.load("shared:resource") # Shared lock
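The difference between the two lock modes is that many shared holders may proceed together, while an exclusive holder runs alone. The single-process sketch below demonstrates those semantics with `threading`. `SharedExclusiveLock` is a hypothetical class for illustration only, and the registry's distributed locks work across processes and machines rather than threads.

```python
import threading
from contextlib import contextmanager


class SharedExclusiveLock:
    """Many concurrent shared holders, or exactly one exclusive holder."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def shared(self):
        with self._cond:
            while self._writer:  # wait until no exclusive holder is active
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        with self._cond:
            while self._writer or self._readers:  # wait for everyone to leave
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()
```

In registry terms, loads would take the shared mode and saves the exclusive mode, so concurrent readers do not block each other but never observe a half-finished write. This simple version does not prioritize waiting writers, so a steady stream of readers could delay a save.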
Backend Comparison
| Feature | Local | MinIO | GCP |
|---|---|---|---|
| Storage | Filesystem | S3-compatible | Google Cloud Storage |
| Distributed | ❌ | ✅ | ✅ |
| Locking | File locks | S3 objects | GCS generation numbers |
Error Handling
Loading an object that does not exist, or saving under an invalid name, raises a ValueError:
try:
    model = registry.load("nonexistent:model")
except ValueError as e:
    print(f"Object not found: {e}")
try:
    registry.save("invalid_name", data)
except ValueError as e:
    print(f"Invalid name: {e}")
Performance Considerations
- Local Backend: Fastest for single-machine use
- MinIO Backend: Good for distributed teams, moderate latency
- GCP Backend: Best for global distribution, higher latency but better availability
Security
- Local: File system permissions
- MinIO: Access keys and bucket policies
- GCP: Service account authentication and IAM
Troubleshooting
Common Issues
- Lock Acquisition Errors: Increase timeout or check for stuck locks
- Permission Errors: Verify credentials and bucket access
- Network Issues: Check connectivity to remote backends
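For transient lock contention or brief network failures, retrying with exponential backoff is often enough. The sketch below is a generic helper using the standard library. `retry_with_backoff` is a hypothetical name, and the exception type the registry raises on a lock timeout is not stated in this document, so the caller passes it in through `retry_on`.

```python
import random
import time


def retry_with_backoff(operation, retry_on, attempts=5, base_delay=0.1,
                       max_delay=2.0, sleep=time.sleep):
    """Call operation(), retrying on the given exception type(s) with backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the delay on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out clients that failed at the same moment.
            sleep(delay * random.uniform(0.5, 1.0))
```

A call might look like `retry_with_backoff(lambda: registry.save("shared:resource", data), retry_on=TimeoutError)`, where `TimeoutError` is a placeholder for whatever exception the backend raises. If retries keep failing, a stuck lock is the more likely cause and needs to be inspected directly.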
Debug Logging
Enable debug logging to troubleshoot issues:
import logging
from mindtrace.registry import Registry
logging.basicConfig(level=logging.DEBUG)
registry = Registry()
# Operations will now show detailed logs
Examples
Machine Learning Pipeline
from mindtrace.registry import Registry
registry = Registry()
# Save training data
registry.save("data:training", X_train, version="1.0.0")
registry.save("data:testing", X_test, version="1.0.0")
# Save trained model
registry.save("model:classifier", trained_model, version="1.0.0")
# Save preprocessing pipeline
registry.save("pipeline:preprocessing", preprocessing_pipeline, version="1.0.0")
# Load for inference
model = registry.load("model:classifier")
pipeline = registry.load("pipeline:preprocessing")
Data Versioning
# Save successive versions of the same dataset
registry.save("data:records", raw_data, version="1.0.0")
registry.save("data:records", processed_data, version="1.1.0")
registry.save("data:records", cleaned_data, version="1.2.0")
# Compare versions
for version in registry.list_versions("data:records"):
    data = registry.load("data:records", version=version)
    print(f"Version {version}: {len(data)} records")
This registry system provides a robust foundation for object storage and versioning across different deployment scenarios.