The Modular Autonomous Discovery for Science (MADSci) Data Manager.
MADSci Data Manager
Captures, stores, and queries data generated during experiments, both JSON values and files.
Features
- DataPoint storage: JSON values and files with metadata
- Flexible storage: Local filesystem or S3-compatible object storage (MinIO, AWS S3, GCS)
- Rich metadata: Ownership info, timestamps, custom labels
- Queryable: Search by value and metadata
- Cloud integration: Multi-provider cloud storage support
Installation
See the main README for installation options. This package is available as:
- PyPI: pip install madsci.data_manager
- Docker: Included in ghcr.io/ad-sdl/madsci
- Example configuration: See example_lab/managers/example_data.manager.yaml
Dependencies: MongoDB database, optional MinIO/S3 storage (see example_lab)
Usage
Quick Start
Use the example_lab as a starting point:
# Start with working example
docker compose up # From repo root
# Data Manager available at http://localhost:8004/docs
# Or run standalone
python src/madsci_data_manager/madsci/data_manager/data_server.py
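After starting the stack, it can be handy to wait until the server answers before submitting data. The following is a minimal sketch using only the Python standard library. The helper names `is_up` and `wait_until_up` are hypothetical and not part of MADSci; the URL is the docs endpoint mentioned above.

```python
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on url succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_up(url: str, attempts: int = 10, delay: float = 1.0, probe=is_up) -> bool:
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_up("http://localhost:8004/docs")` returns True once the Data Manager responds, or False after the attempts run out. The `probe` parameter exists only so the retry loop can be exercised without a running server.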
Manager Setup
For custom deployments, see example_data.manager.yaml for configuration options.
Data Client
Use DataClient to store and retrieve experimental data:
from madsci.client.data_client import DataClient
from madsci.common.types.datapoint_types import DataPoint, DataPointTypeEnum

# Connect to a running Data Manager
client = DataClient(data_server_url="http://localhost:8004")

# Store JSON data
value_dp = DataPoint(
    label="Temperature Reading",
    data_type=DataPointTypeEnum.JSON,
    value={"temperature": 23.5, "unit": "Celsius"},
)
submitted = client.submit_datapoint(value_dp)

# Store files
file_dp = DataPoint(
    label="Experiment Log",
    data_type=DataPointTypeEnum.FILE,
    path="/path/to/data.txt",
)
submitted_file = client.submit_datapoint(file_dp)

# Retrieve data
retrieved = client.get_datapoint(submitted.datapoint_id)

# Save file locally
client.save_datapoint_value(submitted_file.datapoint_id, "/local/save/path.txt")
Examples: See example_lab/notebooks/experiment_notebook.ipynb for data management workflows.
Storage Configuration
Local Storage (Default)
- Files stored on filesystem with date-based hierarchy
- Simple setup, no additional dependencies
- File paths stored in MongoDB database
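To illustrate what a date-based hierarchy can look like, here is a small sketch. The year/month/day layout and the `dated_path` helper are assumptions made for illustration; the exact directory structure used by the Data Manager is not specified here.

```python
from datetime import datetime
from pathlib import PurePosixPath


def dated_path(root: str, filename: str, when: datetime) -> PurePosixPath:
    """Build root/YYYY/MM/DD/filename (an assumed layout, for illustration only)."""
    return PurePosixPath(root) / f"{when:%Y}" / f"{when:%m}" / f"{when:%d}" / filename


example = dated_path("/data", "log.txt", datetime(2025, 3, 7))
print(example)  # /data/2025/03/07/log.txt
```

Grouping files by date keeps any single directory from growing without bound and makes it easy to locate or archive data from a given day.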
Object Storage (S3-Compatible)
Supports cloud and self-hosted storage providers:
- AWS S3
- Google Cloud Storage (with HMAC keys)
- MinIO (self-hosted or cloud)
- Any S3-compatible service
Benefits:
- Automatic upload with fallback to local storage
- Better for large files and distributed setups
- Built-in metadata and versioning support
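The upload-with-fallback behavior listed above can be pictured with a generic sketch. This is not the Data Manager's implementation: `store_with_fallback` and the `upload` callable are hypothetical, and a real implementation would likely catch narrower exception types.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def store_with_fallback(src: Path, upload: Callable[[Path], str], local_dir: Path) -> str:
    """Try upload(src); if it raises, copy src into local_dir instead.

    Returns the location string reported by the upload, or the local copy's path.
    """
    try:
        return upload(src)
    except Exception:
        # Broad catch for illustration only.
        local_dir.mkdir(parents=True, exist_ok=True)
        dest = local_dir / src.name
        shutil.copy2(src, dest)
        return str(dest)


def failing_upload(path: Path) -> str:
    """Stand-in uploader that simulates unreachable object storage."""
    raise ConnectionError("object storage unreachable")


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data.txt"
    src.write_text("hello")
    location = store_with_fallback(src, failing_upload, Path(tmp) / "fallback")
    fell_back = Path(location).read_text() == "hello"
```

Here the simulated upload fails, so the file ends up under the `fallback` directory and `fell_back` is True.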
Quick Setup
# Use example_lab with pre-configured MinIO
docker compose up # From repo root
# MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
Configuration Examples
AWS S3:
from madsci.client.data_client import DataClient
from madsci.common.types.datapoint_types import ObjectStorageSettings

aws_config = ObjectStorageSettings(
    endpoint="s3.amazonaws.com",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    secure=True,
    default_bucket="my-bucket",
    region="us-east-1",
)
client = DataClient(object_storage_settings=aws_config)
Google Cloud Storage:
gcs_config = ObjectStorageSettings(
    endpoint="storage.googleapis.com",
    access_key="YOUR_HMAC_ACCESS_KEY",
    secret_key="YOUR_HMAC_SECRET",
    secure=True,
    default_bucket="my-gcs-bucket",
)
Direct Object Storage DataPoints
from madsci.common.types.datapoint_types import DataPoint, DataPointTypeEnum

storage_dp = DataPoint(
    label="Large Dataset",
    data_type=DataPointTypeEnum.OBJECT_STORAGE,
    path="/path/to/data.parquet",
    bucket_name="my-bucket",
    object_name="datasets/data.parquet",
    custom_metadata={"version": "v2.1"},
)
uploaded = client.submit_datapoint(storage_dp)
Authentication: Use IAM users/service accounts with appropriate storage permissions. See cloud provider documentation for detailed setup.
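Rather than hard-coding keys as in the configuration examples, credentials can be read from the environment. The sketch below is one way to do that. The environment variable names and the `storage_settings_from_env` helper are hypothetical; only the dictionary keys mirror the `ObjectStorageSettings` fields shown above.

```python
import os
from typing import Mapping


def storage_settings_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Collect object storage settings from environment variables.

    Variable names are illustrative assumptions, not MADSci conventions.
    Raises KeyError if a required variable is missing.
    """
    return {
        "endpoint": env.get("OBJECT_STORAGE_ENDPOINT", "s3.amazonaws.com"),
        "access_key": env["OBJECT_STORAGE_ACCESS_KEY"],
        "secret_key": env["OBJECT_STORAGE_SECRET_KEY"],
        "secure": env.get("OBJECT_STORAGE_SECURE", "true").lower() == "true",
        "default_bucket": env["OBJECT_STORAGE_BUCKET"],
    }


# Demonstration with an in-memory mapping instead of the real environment.
demo_env = {
    "OBJECT_STORAGE_ENDPOINT": "storage.googleapis.com",
    "OBJECT_STORAGE_ACCESS_KEY": "YOUR_HMAC_ACCESS_KEY",
    "OBJECT_STORAGE_SECRET_KEY": "YOUR_HMAC_SECRET",
    "OBJECT_STORAGE_BUCKET": "my-gcs-bucket",
}
settings = storage_settings_from_env(demo_env)
```

The resulting dictionary could then be passed along as `ObjectStorageSettings(**settings)`, which keeps secrets out of source control.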