The Modular Autonomous Discovery for Science (MADSci) Data Manager.

MADSci Data Manager

The Data Manager captures, stores, and queries data generated during experiments, both JSON values and files.

[Figure: MADSci Data Manager diagram]

Features

  • DataPoint storage: JSON values and files with metadata
  • Flexible storage: Local filesystem or S3-compatible object storage (MinIO, AWS S3, GCS)
  • Rich metadata: Ownership info, timestamps, custom labels
  • Queryable: Search by value and metadata
  • Cloud integration: Multi-provider cloud storage support

Installation

See the main README for installation options. This package is available as:

Dependencies: MongoDB database, optional MinIO/S3 storage (see example_lab)

Usage

Quick Start

Use the example_lab as a starting point:

# Start with working example
docker compose up  # From repo root
# Data Manager available at http://localhost:8004/docs

# Or run standalone
python -m madsci.data_manager.data_server

Manager Setup

For custom deployments, see example_data.manager.yaml for configuration options.

Data Client

Use DataClient to store and retrieve experimental data:

from madsci.client.data_client import DataClient
from madsci.common.types.datapoint_types import ValueDataPoint, FileDataPoint
from datetime import datetime

client = DataClient(url="http://localhost:8004")

# Store JSON data
value_dp = ValueDataPoint(
    label="Temperature Reading",
    value={"temperature": 23.5, "unit": "Celsius"},
    data_timestamp=datetime.now()
)
submitted = client.submit_datapoint(value_dp)

# Store files
file_dp = FileDataPoint(
    label="Experiment Log",
    path="/path/to/data.txt",
    data_timestamp=datetime.now()
)
submitted_file = client.submit_datapoint(file_dp)

# Retrieve data
retrieved = client.get_datapoint(submitted.datapoint_id)

# Save file locally
client.save_datapoint_value(submitted_file.datapoint_id, "/local/save/path.txt")

Examples: See example_lab/notebooks/experiment_notebook.ipynb for data management workflows.

Storage Options

Local Storage (Default)

  • Files stored on filesystem
  • Simple setup, no additional dependencies
  • File paths stored in database

Object Storage (Optional)

Supports S3-compatible storage (MinIO, AWS S3, Google Cloud Storage):

  • Automatic upload to object storage
  • Fallback to local storage if upload fails
  • Better for large files and distributed setups
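The fallback behavior listed above can be pictured with a small standalone sketch. It is not the Data Manager's actual implementation: `store_with_fallback`, `failing_upload`, and the returned dictionary keys are hypothetical names chosen for illustration, and the upload step is passed in as a plain callable so the example runs without MinIO or S3.

```python
import shutil
from pathlib import Path
from typing import Callable


def store_with_fallback(
    source: Path,
    upload: Callable[[Path], str],
    local_dir: Path,
) -> dict:
    """Try object storage first; if the upload raises, keep a local copy instead.

    `upload` is a hypothetical callable that returns the object's location.
    """
    try:
        location = upload(source)
        return {"backend": "object_storage", "location": location}
    except Exception as exc:  # a real implementation would catch narrower errors
        # Fallback: copy the file into the local storage directory.
        local_dir.mkdir(parents=True, exist_ok=True)
        target = local_dir / source.name
        shutil.copy2(source, target)
        return {"backend": "local", "location": str(target), "error": str(exc)}


def failing_upload(path: Path) -> str:
    """Stand-in for an upload to an unreachable object store."""
    raise ConnectionError("object storage unreachable")
```

Calling `store_with_fallback(path, failing_upload, Path("local_store"))` returns a record whose `backend` is `"local"`, which mirrors the idea that a failed upload does not lose the file.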

Object Storage Configuration

See example_data.manager.yaml for MinIO configuration.

Quick setup with example_lab:

docker compose up  # Includes pre-configured MinIO
# MinIO Console: http://localhost:9001 (minioadmin/minioadmin)

Cloud Storage Integration

Supports S3-compatible storage providers for large file handling:

Supported Providers

  • AWS S3
  • Google Cloud Storage (with HMAC keys)
  • MinIO (self-hosted or cloud)
  • Any S3-compatible service

Configuration Examples

AWS S3:

from madsci.client.data_client import DataClient
from madsci.common.types.datapoint_types import ObjectStorageSettings

aws_config = ObjectStorageSettings(
    endpoint="s3.amazonaws.com",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    secure=True,
    default_bucket="my-bucket",
    region="us-east-1"
)
client = DataClient(object_storage_settings=aws_config)

Google Cloud Storage:

gcs_config = ObjectStorageSettings(
    endpoint="storage.googleapis.com",
    access_key="YOUR_HMAC_ACCESS_KEY",
    secret_key="YOUR_HMAC_SECRET",
    secure=True,
    default_bucket="my-gcs-bucket"
)

Direct Object Storage DataPoints

from madsci.common.types.datapoint_types import ObjectStorageDataPoint

storage_dp = ObjectStorageDataPoint(
    label="Large Dataset",
    path="/path/to/data.parquet",
    bucket_name="my-bucket",
    object_name="datasets/data.parquet",
    custom_metadata={"version": "v2.1"}
)
uploaded = client.submit_datapoint(storage_dp)

Authentication: Use IAM users/service accounts with appropriate storage permissions. See cloud provider documentation for detailed setup.
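One way to keep access keys out of source code is to read them from environment variables and pass the result to ObjectStorageSettings. The sketch below is only an illustration: the helper `object_storage_kwargs_from_env` and the `OBJECT_STORAGE_*` variable names are assumptions made for this example, not settings that MADSci reads on its own. The dictionary keys match the ObjectStorageSettings fields used in the configuration examples above.

```python
import os
from typing import Mapping, Optional


def object_storage_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for ObjectStorageSettings from environment variables.

    The OBJECT_STORAGE_* names are hypothetical; rename them to suit your deployment.
    """
    env = os.environ if env is None else env
    required = ("OBJECT_STORAGE_ACCESS_KEY", "OBJECT_STORAGE_SECRET_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "endpoint": env.get("OBJECT_STORAGE_ENDPOINT", "s3.amazonaws.com"),
        "access_key": env["OBJECT_STORAGE_ACCESS_KEY"],
        "secret_key": env["OBJECT_STORAGE_SECRET_KEY"],
        # Any value other than "true" (case-insensitive) disables TLS.
        "secure": env.get("OBJECT_STORAGE_SECURE", "true").lower() == "true",
        "default_bucket": env.get("OBJECT_STORAGE_BUCKET", "my-bucket"),
    }
```

With the variables exported, the settings object could then be created as `ObjectStorageSettings(**object_storage_kwargs_from_env())`.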
