The Modular Autonomous Discovery for Science (MADSci) Data Manager.


MADSci Data Manager

Handles capturing, storing, and querying data, either as JSON values or as files, created during the course of an experiment (either collected by instruments or synthesized during analysis).

[Figure: MADSci Data Manager diagram]

Notable Features

- Collects and stores data generated in the course of an experiment as "datapoints"
- Currently supported datapoint types:
  - Values, as JSON-serializable data
  - Files, stored as-is
- Datapoints include metadata such as ownership info and timestamps
- Datapoints are queryable and searchable by both value and metadata
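Because value datapoints must hold JSON-serializable data, it can help to check a value before submitting it. The following is a minimal standard-library sketch; `is_json_serializable` is a hypothetical helper, not part of MADSci, and MADSci's own validation may accept or reject some values differently.

```python
import json
from datetime import datetime
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Return True if `value` can be encoded by the standard-library json module."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        # TypeError: unsupported type such as datetime or set
        # ValueError: e.g. a circular reference
        return False
    return True


reading = {"temperature": 23.5, "unit": "Celsius"}
print(is_json_serializable(reading))  # True
print(is_json_serializable({"taken_at": datetime.now()}))  # False
```

Note that a `datetime` object is not JSON-native, which is why timestamps are better kept in datapoint metadata (such as `data_timestamp`) than inside the value itself.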

Installation

The MADSci Data Manager is available via the Python Package Index and can be installed with:

pip install madsci.data_manager

This Python package is also included in the madsci Docker image; the example compose file shows how that image is used.

Note that you will also need a MongoDB database (one is included in the example compose file).

Usage

Manager

To create and run a new MADSci Data Manager, do the following in your MADSci lab directory:

- If you're not using Docker Compose, provision and configure a MongoDB instance.
- If you're using Docker Compose, define your data manager and MongoDB services based on the example compose file.

# Create a Data Manager Definition
madsci manager add -t data_manager
# Start the database and Data Manager Server
docker compose up
# OR
python -m madsci.data_manager.data_server

You should see a REST server started on the configured host and port. Navigate in your browser to the URL you configured (default: http://localhost:8004/) to see if it's working.
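As a scripted alternative to the browser check, the standard-library sketch below reports whether anything answers HTTP at the Data Manager URL. The function name `data_manager_is_up` is hypothetical, the default URL is the documented default port, and a True result only means some server responded, not that it is specifically a MADSci Data Manager.

```python
import urllib.error
import urllib.request


def data_manager_is_up(base_url: str = "http://localhost:8004/", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url (any status code counts)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, but it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, etc.
        return False


if __name__ == "__main__":
    print("Data Manager reachable:", data_manager_is_up())
```

`HTTPError` is caught before `URLError` because it is a subclass of it; swapping the order would misreport a running server as down.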

You can view up-to-date documentation on the endpoints provided by your data manager, and try them out, via the Swagger page served at http://your-data-manager-url-here/docs.

Client

You can use MADSci's DataClient (madsci.client.data_client.DataClient) in your python code to save, get, or query datapoints.

Here are some examples of using the DataClient to interact with the Data Manager:

from madsci.client.data_client import DataClient
from madsci.common.types.datapoint_types import ValueDataPoint, FileDataPoint
from datetime import datetime

# Initialize the DataClient
client = DataClient(url="http://localhost:8004")

# Create a ValueDataPoint
value_datapoint = ValueDataPoint(
    label="Temperature Reading",
    value={"temperature": 23.5, "unit": "Celsius"},
    data_timestamp=datetime.now()
)

# Submit the ValueDataPoint
submitted_value_datapoint = client.submit_datapoint(value_datapoint)
print(f"Submitted ValueDataPoint: {submitted_value_datapoint}")

# Retrieve the ValueDataPoint by ID
retrieved_value_datapoint = client.get_datapoint(submitted_value_datapoint.datapoint_id)
print(f"Retrieved ValueDataPoint: {retrieved_value_datapoint}")

# Create a FileDataPoint
file_datapoint = FileDataPoint(
    label="Experiment Log",
    path="/path/to/experiment_log.txt",
    data_timestamp=datetime.now()
)

# Submit the FileDataPoint
submitted_file_datapoint = client.submit_datapoint(file_datapoint)
print(f"Submitted FileDataPoint: {submitted_file_datapoint}")

# Retrieve the FileDataPoint by ID
retrieved_file_datapoint = client.get_datapoint(submitted_file_datapoint.datapoint_id)
print(f"Retrieved FileDataPoint: {retrieved_file_datapoint}")

# Save the file from the FileDataPoint to a local path
client.save_datapoint_value(submitted_file_datapoint.datapoint_id, "/local/path/to/save/experiment_log.txt")
print("File saved successfully.")

Object Storage Integration

The MADSci Data Manager supports optional MinIO object storage for efficient handling of large files. When it is configured, file datapoints are automatically stored in object storage instead of on the local filesystem. See the MinIO documentation for more information about MinIO itself.

How It Works

With Object Storage Configured:

- File datapoints are uploaded to MinIO object storage during submission.
- Object storage metadata (bucket name, object name, public URL, etc.) is stored in the database.
- The datapoint type automatically changes from file to object_storage.
- If the object storage upload fails, storage automatically falls back to the local filesystem.

Without Object Storage (Default Behavior):

- File datapoints are stored locally on the filesystem.
- File paths are stored in the database.
- Existing behavior is preserved, with no changes required.
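The decision described above can be sketched as follows. This is a hypothetical illustration, not MADSci's implementation: datapoints are plain dicts, the `data_type` key and the `upload_to_object_storage` / `store_locally` callables are assumptions, and the fake backends exist only to exercise each branch.

```python
from typing import Callable, Dict, Optional


def store_file_datapoint(
    datapoint: Dict,
    store_locally: Callable[[str], str],
    upload_to_object_storage: Optional[Callable[[str], Dict]] = None,
) -> Dict:
    """Hypothetical sketch of the storage decision for a file datapoint."""
    record = dict(datapoint)  # copy so the caller's dict is not mutated
    if upload_to_object_storage is not None:
        try:
            storage_info = upload_to_object_storage(record["path"])
        except Exception:
            pass  # upload failed: fall through to local storage
        else:
            # Upload succeeded: keep the storage metadata and switch the type
            record.update(storage_info)
            record["data_type"] = "object_storage"
            return record
    # No object storage configured, or the upload failed: store on disk
    record["path"] = store_locally(record["path"])
    return record


# Fake backends for illustration only
def fake_upload(path: str) -> Dict:
    return {"bucket_name": "madsci-data", "object_name": "experiment_log.txt"}


def failing_upload(path: str) -> Dict:
    raise ConnectionError("object storage unavailable")


def fake_local_store(path: str) -> str:
    return "./data/experiment_log.txt"


datapoint = {"data_type": "file", "path": "/path/to/experiment_log.txt"}
uploaded = store_file_datapoint(datapoint, fake_local_store, fake_upload)
fallback = store_file_datapoint(datapoint, fake_local_store, failing_upload)
local_only = store_file_datapoint(datapoint, fake_local_store)
```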

Configuration

Enable object storage by adding MinIO configuration to your Data Manager definition:

# example_data_manager.manager.yaml
name: example_data_manager
db_url: mongodb://localhost:27017
host: localhost
port: 8004
file_storage_path: ./data

# Add MinIO object storage configuration
minio_client_config:
  endpoint: "localhost:9000"
  access_key: "minioadmin"
  secret_key: "minioadmin"
  secure: false
  default_bucket: "madsci-data"
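In this configuration `endpoint` is a host and port without a scheme, and `secure` selects TLS, mirroring the MinIO Python SDK's `Minio(endpoint, access_key=..., secret_key=..., secure=...)` constructor; exactly how MADSci passes these values through is an assumption here. The hypothetical helper below shows how `secure: false` corresponds to the plain-HTTP API endpoint listed for the compose setup.

```python
def object_storage_url(endpoint: str, secure: bool) -> str:
    """Return the base URL implied by an endpoint ("host:port") and the secure flag."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{endpoint}"


# Values copied from the example minio_client_config above
minio_client_config = {
    "endpoint": "localhost:9000",
    "access_key": "minioadmin",
    "secret_key": "minioadmin",
    "secure": False,
    "default_bucket": "madsci-data",
}

print(object_storage_url(minio_client_config["endpoint"], minio_client_config["secure"]))
# http://localhost:9000
```

For a hosted service such as AWS S3, the same pair would typically be `endpoint="s3.amazonaws.com"` with `secure=True`.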

Docker Compose Setup

The /MADSci/compose.yaml file includes a pre-configured MinIO service:

# Start all services including MinIO
docker compose up

# Access the MinIO Console (`open` is macOS; on Linux use xdg-open or a browser)
open http://localhost:9001
# Login: minioadmin / minioadmin

MinIO will be available at:

- API endpoint: http://localhost:9000
- Web console: http://localhost:9001

Cloud Storage Integration

The MADSci Data Client supports multiple cloud storage providers through S3-compatible APIs, so large files can be stored efficiently on different cloud platforms.

Supported Providers

- Amazon Web Services (AWS) S3
- Google Cloud Storage (GCS), using S3-compatible HMAC authentication
- MinIO (self-hosted or cloud)
- Any S3-compatible storage service

Configuration

AWS S3

from madsci.common.types.datapoint_types import ObjectStorageDefinition
from madsci.client.data_client import DataClient

aws_config = ObjectStorageDefinition(
    endpoint="s3.amazonaws.com",
    access_key="AKIAIOSFODNN7EXAMPLE",  # Your AWS Access Key ID
    secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",  # Your AWS Secret Access Key
    secure=True,
    default_bucket="my-madsci-bucket",
    region="us-east-1"  # Specify your AWS region
)

client = DataClient(object_storage_config=aws_config)

Google Cloud Storage (GCS)

GCS requires HMAC keys for S3-compatible access:

from madsci.common.types.datapoint_types import ObjectStorageDefinition
from madsci.client.data_client import DataClient

gcs_config = ObjectStorageDefinition(
    endpoint="storage.googleapis.com",
    access_key="GOOGTS7C7FIS2E4U4RBGEXAMPLE",  # Your GCS HMAC Access Key
    secret_key="bGoa+V7g/yqDXvKRqq+JTFn4uQZbPiQJo8rkEXAMPLE",  # Your GCS HMAC Secret
    secure=True,
    default_bucket="my-gcs-bucket"
)

client = DataClient(object_storage_config=gcs_config)

Authentication Setup

AWS S3 Authentication

IAM User Method (Recommended):

# Create IAM user with S3 permissions
# Get Access Key ID and Secret Access Key from AWS Console

Environment Variables:

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
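Whether the Data Client reads these variables automatically is not stated here, so a cautious option is to read them explicitly and pass them as the `access_key` and `secret_key` fields from the AWS S3 example, keeping secrets out of source code. The helper name `s3_credentials_from_env` is hypothetical.

```python
import os
from typing import Dict


def s3_credentials_from_env() -> Dict[str, str]:
    """Read S3 credentials from the standard AWS environment variables."""
    names = {"access_key": "AWS_ACCESS_KEY_ID", "secret_key": "AWS_SECRET_ACCESS_KEY"}
    missing = [var for var in names.values() if not os.environ.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {field: os.environ[var] for field, var in names.items()}


# Hypothetical usage with the ObjectStorageDefinition shown earlier:
# aws_config = ObjectStorageDefinition(
#     endpoint="s3.amazonaws.com",
#     secure=True,
#     default_bucket="my-madsci-bucket",
#     region="us-east-1",
#     **s3_credentials_from_env(),
# )
```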

AWS CLI Profile:

aws configure --profile madsci
# Then reference the profile in your application

Google Cloud Storage Authentication

Generate HMAC Keys:

# In Google Cloud Console:
# Storage > Settings > Interoperability > Create Key

Service Account Method:

# Create service account with Storage Admin role
# Generate HMAC key for the service account

Usage Examples

from madsci.common.types.datapoint_types import ObjectStorageDataPoint

# Create object storage datapoint directly
storage_datapoint = ObjectStorageDataPoint(
    label="Preprocessed Data",
    path="/path/to/local-file.parquet",
    bucket_name="my-bucket",
    object_name="datasets/processed_data.parquet",
    storage_endpoint="s3.amazonaws.com",
    public_endpoint="s3.amazonaws.com",
    content_type="application/octet-stream",
    custom_metadata={
        "dataset_version": "v2.1",
        "processing_date": "2024-01-15"
    }
)

# `client` is a DataClient created as in the examples above
uploaded = client.submit_datapoint(storage_datapoint)
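When building an `ObjectStorageDataPoint` by hand, the `object_name` and `content_type` fields can be derived from the local file. The standard-library sketch below is one possible convention, not MADSci behavior: `object_storage_fields` is a hypothetical helper that names the object after the local file under an optional prefix and guesses the MIME type from the extension, falling back to `application/octet-stream` as in the example above.

```python
import mimetypes
import posixpath
from pathlib import PurePath
from typing import Dict


def object_storage_fields(local_path: str, prefix: str = "") -> Dict[str, str]:
    """Suggest object_name and content_type values for a local file."""
    filename = PurePath(local_path).name
    # Object names use "/" separators, so join with posixpath
    object_name = posixpath.join(prefix, filename) if prefix else filename
    guessed_type, _encoding = mimetypes.guess_type(filename)
    return {
        "object_name": object_name,
        # Generic binary type when the extension is not recognized
        "content_type": guessed_type or "application/octet-stream",
    }


print(object_storage_fields("/path/to/local-file.parquet", prefix="datasets"))
```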

Regional Endpoints

AWS S3 Regional Endpoints

# US East (N. Virginia) - Default
endpoint="s3.amazonaws.com"

# US West (Oregon)
endpoint="s3.us-west-2.amazonaws.com"

# Europe (Ireland)
endpoint="s3.eu-west-1.amazonaws.com"

# Asia Pacific (Tokyo)
endpoint="s3.ap-northeast-1.amazonaws.com"
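The endpoints above follow a simple pattern, sketched below as a hypothetical helper: US East (N. Virginia) uses the global `s3.amazonaws.com` host, and other regions use `s3.<region>.amazonaws.com`. This sketch does not cover special AWS partitions, such as the China regions, which use a different domain.

```python
def s3_endpoint(region: str) -> str:
    """Return the S3 endpoint hostname for a standard AWS region."""
    if region == "us-east-1":
        # US East (N. Virginia) uses the global endpoint, as listed above
        return "s3.amazonaws.com"
    return f"s3.{region}.amazonaws.com"


print(s3_endpoint("eu-west-1"))  # s3.eu-west-1.amazonaws.com
```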

Project details

Maintainer: luckierdodge

Release history

- 0.7.1: Mar 10, 2026
- 0.7.0: Mar 5, 2026
- 0.6.1: Jan 1, 2026
- 0.6.0: Dec 15, 2025
- 0.6.0rc7 (pre-release): Dec 15, 2025
- 0.6.0rc6 (pre-release): Dec 15, 2025
- 0.6.0rc5 (pre-release): Dec 8, 2025
- 0.6.0rc4 (pre-release): Dec 8, 2025
- 0.6.0rc3 (pre-release): Dec 5, 2025
- 0.6.0rc2 (pre-release): Dec 5, 2025
- 0.6.0rc1 (pre-release): Dec 5, 2025
- 0.5.4: Nov 19, 2025
- 0.5.3: Nov 11, 2025
- 0.5.2: Nov 11, 2025
- 0.5.1: Nov 7, 2025
- 0.5.0: Oct 27, 2025
- 0.5.0rc3 (pre-release): Oct 27, 2025
- 0.5.0rc2 (pre-release): Oct 17, 2025
- 0.5.0rc1 (pre-release): Oct 17, 2025
- 0.4.7: Aug 18, 2025
- 0.4.6: Aug 15, 2025
- 0.4.5: Aug 14, 2025
- 0.4.4: Aug 8, 2025
- 0.4.3: Jun 25, 2025
- 0.4.2: Jun 19, 2025
- 0.4.1: Jun 17, 2025
- 0.4.0: Jun 7, 2025
- 0.3.1 (this version): May 28, 2025
- 0.3.0: May 12, 2025
- 0.2.1: Apr 25, 2025
- 0.2.0: Apr 11, 2025
- 0.1.9: Apr 2, 2025
- 0.1.8: Mar 31, 2025
- 0.1.7: Mar 25, 2025
- 0.1.6: Mar 21, 2025
- 0.1.5: Mar 20, 2025
- 0.1.4: Mar 19, 2025
- 0.1.3: Mar 18, 2025
- 0.1.2: Mar 18, 2025
- 0.1.1: Mar 18, 2025
- 0.1.0: Mar 11, 2025
- 0.0.4: Mar 3, 2025
- 0.0.3: Mar 3, 2025

Download files

- Source distribution: madsci_data_manager-0.3.1.tar.gz (16.4 kB, uploaded May 28, 2025)
- Built distribution: madsci_data_manager-0.3.1-py3-none-any.whl (7.2 kB, uploaded May 28, 2025, Python 3)

Both files were uploaded using Trusted Publishing, via pdm/2.24.2 CPython/3.9.22 Linux/6.11.0-1014-azure.

Hashes for madsci_data_manager-0.3.1.tar.gz:

- SHA256: `3a09d80a8b134cdaa0444703bd54213532d2070adefdd48f8f7ccc89337e1227`
- MD5: `7ee1d675aabd920c0151d2df6ed9c223`
- BLAKE2b-256: `f37511d0e649ae7b455c8a2cf209461a21284e990d1c190d2bea63691ef7e46f`

Hashes for madsci_data_manager-0.3.1-py3-none-any.whl:

- SHA256: `8ccf25337794fed42e9d132e7105aeaa453ae98c4ccecdb60b9590edb8344785`
- MD5: `629307a2ec33bf5a46147f79300022c1`
- BLAKE2b-256: `46767918008180dc3319e8af36068ae7c4f1dcc87814b4a6fbd00c4302f56bfe`
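The SHA256 digests listed above can be checked against a locally downloaded file (for example, one fetched with `pip download`). The sketch below uses only `hashlib`; `sha256_of_file` and `verify_download` are hypothetical helper names, and the expected digests are copied from this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for the 0.3.1 release files
EXPECTED_SHA256 = {
    "madsci_data_manager-0.3.1.tar.gz": "3a09d80a8b134cdaa0444703bd54213532d2070adefdd48f8f7ccc89337e1227",
    "madsci_data_manager-0.3.1-py3-none-any.whl": "8ccf25337794fed42e9d132e7105aeaa453ae98c4ccecdb60b9590edb8344785",
}


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of_file(path) == expected


# Example, after downloading the file into the current directory:
# print(verify_download(Path("madsci_data_manager-0.3.1.tar.gz")))
```

Files whose names are not in the table are reported as unverified (False) rather than silently accepted.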
