A database for results collected from the SHIELD permeation rig
SHIELD-Data
A repository for storing and managing raw experimental data produced by the SHIELD permeation rig.
Overview
This repository provides an automated data management system for SHIELD experimental runs. It includes:
- Automated Data Upload: Watchdog-based monitoring system that detects new experimental data and automatically creates GitHub pull requests
- Data Cataloging: Automatic generation of a searchable catalogue (CSV + README) containing metadata for all experimental runs
- Structured Storage: Organized folder structure with run metadata, pressure gauge data, and backups
- PR-based Workflow: All data additions are tracked through GitHub pull requests with detailed metadata
Repository Structure
SHIELD-Data/
├── run_data/                          # Main data storage folder
│   ├── YY.MM.DD_run_X_HHhMM/          # Individual run folders
│   │   ├── pressure_gauge_data.csv    # Experimental measurements
│   │   ├── run_metadata.json          # Run configuration and metadata
│   │   └── backup/                    # Backup data files
│   ├── runs_catalogue.csv             # Auto-generated catalogue
│   └── README.md                      # Auto-generated table view of catalogue
└── src/shield_data/                   # Python package
    ├── data_upload_handler.py         # Watchdog monitoring and PR creation
    ├── build_catalogue.py             # Catalogue generation
    └── pr_template.md                 # PR body template
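The run folder naming convention can be unpacked programmatically. The following is a minimal sketch, not part of the shield_data package: the helper name parse_run_folder_name is hypothetical, and it assumes that X is a run number and that HHhMM is the time of day at which the run started.

```python
import re
from datetime import datetime

# Hypothetical helper, not part of shield_data. It assumes the
# YY.MM.DD_run_X_HHhMM convention encodes a start date, a run number (X)
# and a start time (HHhMM).
RUN_FOLDER_PATTERN = re.compile(
    r"^(?P<date>\d{2}\.\d{2}\.\d{2})_run_(?P<number>\d+)"
    r"_(?P<hour>\d{2})h(?P<minute>\d{2})$"
)


def parse_run_folder_name(name: str) -> dict:
    """Split a run folder name into a start timestamp and a run number."""
    match = RUN_FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a valid run folder name: {name!r}")
    started = datetime.strptime(
        f"{match['date']} {match['hour']}:{match['minute']}",
        "%y.%m.%d %H:%M",
    )
    return {"run_number": int(match["number"]), "started": started}


info = parse_run_folder_name("25.10.06_run_1_10h41")
print(info["started"].isoformat(), info["run_number"])
```

Rejecting names that do not match the pattern keeps stray files and folders in run_data/ from being mistaken for runs.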
Features
Automated Data Upload
The upload_data_from_folder() function monitors a specified folder for new experimental data and automatically:
- Detects new or modified run data
- Validates folder structure and metadata
- Creates a git branch and commits changes
- Regenerates the data catalogue
- Opens a pull request with detailed run information
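The validation step listed above could look roughly like the sketch below. This is an illustration only: missing_run_files is a hypothetical helper, and the list of required files is an assumption taken from the repository structure, not the package's actual validation logic.

```python
from pathlib import Path

# Assumption based on the repository structure: every run folder should
# contain these two files. The real validation may check more than this.
REQUIRED_FILES = ("pressure_gauge_data.csv", "run_metadata.json")


def missing_run_files(run_folder: Path) -> list[str]:
    """Return the required files that are absent from a run folder."""
    return [
        name for name in REQUIRED_FILES
        if not (run_folder / name).is_file()
    ]


problems = missing_run_files(Path("run_data/25.10.06_run_1_10h41"))
if problems:
    print("Run folder is incomplete, missing:", ", ".join(problems))
else:
    print("Run folder looks complete")
```

Returning the list of missing files, as opposed to a plain boolean, makes it easy to report exactly what is wrong before any branch or pull request is created.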
Data Catalogue
Each time data is added, the catalogue is regenerated automatically, with one entry per run recording:
- Run ID (folder name)
- Relative path to data
- Run type (e.g., permeation_exp)
- Date
- Furnace setpoint
- Material (if available)
- Coating (if available)
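To make the fields above concrete, here is a sketch of how one catalogue entry might be assembled from a run's metadata and written as CSV. The column names, the path layout, the "date", "material" and "coating" keys, and both helper functions are assumptions for illustration; the real runs_catalogue.csv produced by build_catalogue may differ.

```python
import csv
import io

# Illustrative column names; the real catalogue headers may differ.
CATALOGUE_COLUMNS = [
    "run_id", "path", "run_type", "date",
    "furnace_setpoint", "material", "coating",
]


def catalogue_row(run_id: str, metadata: dict) -> dict:
    """Build one catalogue entry from a run's metadata (hypothetical helper)."""
    run_info = metadata.get("run_info", {})
    return {
        "run_id": run_id,
        "path": f"run_data/{run_id}",  # assumed to be relative to the repo root
        "run_type": run_info.get("run_type", ""),
        "date": run_info.get("date", ""),
        "furnace_setpoint": run_info.get("furnace_setpoint", ""),
        # Material and coating are optional, so fall back to an empty cell.
        "material": run_info.get("material", ""),
        "coating": run_info.get("coating", ""),
    }


def rows_to_csv(rows: list[dict]) -> str:
    """Serialise catalogue rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CATALOGUE_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example_metadata = {
    "run_info": {
        "run_type": "permeation_exp",
        "date": "2025-10-06",
        "furnace_setpoint": 500,
    }
}
print(rows_to_csv([catalogue_row("25.10.06_run_1_10h41", example_metadata)]))
```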
Run Metadata
Each experimental run includes a run_metadata.json file containing:
- Run information (type, date, furnace setpoint, etc.)
- Gauge configurations
- Valve timing information
- Recording parameters
Usage
Installing the Package
pip install -e .
Monitoring for New Data
from shield_data import upload_data_from_folder
# Monitor the run_data folder with default settings
upload_data_from_folder("run_data")
# Custom monitoring intervals
upload_data_from_folder(
    "run_data",
    check_interval=5,  # Check every 5 seconds
    batch_delay=2,     # Wait 2 seconds after last change before processing
)
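The batch_delay comment above describes a debounce: changes are only processed once the folder has been quiet for a set time, so a run that is still being written is not uploaded half-finished. The sketch below illustrates that idea using only the standard library. ChangeBatcher is a hypothetical class written for this example; it is not how shield_data is known to implement the behaviour, and the assumption is that check_interval corresponds to how often something like pop_ready() is polled.

```python
import time


class ChangeBatcher:
    """Collect changed paths and release them after a quiet period."""

    def __init__(self, batch_delay: float, clock=time.monotonic):
        self.batch_delay = batch_delay
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._pending: set[str] = set()
        self._last_change = None

    def record(self, path: str) -> None:
        """Note a changed path and restart the quiet-period timer."""
        self._pending.add(path)
        self._last_change = self._clock()

    def pop_ready(self) -> list[str]:
        """Return the batch if the quiet period has elapsed, else an empty list."""
        if not self._pending:
            return []
        if self._clock() - self._last_change < self.batch_delay:
            return []
        batch = sorted(self._pending)
        self._pending.clear()
        return batch


# Demonstration with a fake clock instead of real waiting.
now = [0.0]
batcher = ChangeBatcher(batch_delay=2, clock=lambda: now[0])
batcher.record("run_data/25.10.06_run_1_10h41/pressure_gauge_data.csv")
now[0] = 1.0
print(batcher.pop_ready())  # still inside the quiet period
now[0] = 3.5
print(batcher.pop_ready())  # quiet period has elapsed
```

Every new change resets the timer, so a larger batch_delay trades upload latency for a lower chance of picking up a run that is still being written.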
Building the Catalogue
from shield_data import build_catalogue
# Regenerate the catalogue manually
build_catalogue("run_data")
Loading and Analyzing Data
The package provides simple functions to load and filter experimental data:
View the Catalogue
from shield_data import catalogue
# Load the catalogue as a pandas DataFrame
cat = catalogue()
print(cat)
Load a Specific Run
from shield_data import load
# Load pressure gauge data for a specific run
df = load("25.10.06_run_1_10h41")
# The DataFrame includes all measurement data plus a 'run_id' column
print(df.head())
Load Run Metadata
from shield_data import load_metadata
# Load the metadata JSON as a dictionary
metadata = load_metadata("25.10.06_run_1_10h41")
# Access specific metadata fields
run_info = metadata["run_info"]
print(f"Run type: {run_info['run_type']}")
print(f"Furnace setpoint: {run_info['furnace_setpoint']} K")
print(f"Start time: {run_info['start_time']}")
Filter and Load Multiple Runs
from shield_data import load_filtered
# Load all runs at a specific temperature
df_500k = load_filtered(furnace_setpoint=500)
# Load runs by type and date
df_oct6 = load_filtered(run_type="permeation_exp", date="2025-10-06")
# Filter by material (when available)
df_material = load_filtered(material="stainless_steel")
# The result is a combined DataFrame with data from all matching runs
print(f"Loaded {len(df_500k)} data points from {df_500k['run_id'].nunique()} runs")
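The filters above appear to combine with a logical AND, as in the "type and date" example. The sketch below shows that matching rule on plain dictionaries. It is an assumption about the behaviour, not the implementation of load_filtered: filter_catalogue is a hypothetical function, and the second run ID in the sample rows is made up for illustration.

```python
def filter_catalogue(rows: list[dict], **criteria) -> list[dict]:
    """Keep catalogue rows whose values equal every given criterion."""
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in criteria.items())
    ]


# Sample rows; the second run ID is invented for this example.
rows = [
    {"run_id": "25.10.06_run_1_10h41", "run_type": "permeation_exp",
     "date": "2025-10-06", "furnace_setpoint": 500},
    {"run_id": "25.10.07_run_1_09h15", "run_type": "permeation_exp",
     "date": "2025-10-07", "furnace_setpoint": 600},
]

matches = filter_catalogue(rows, run_type="permeation_exp", date="2025-10-06")
print([row["run_id"] for row in matches])
```

Under this rule, a row that lacks a requested field (for example a run with no material recorded) never matches a filter on that field.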
Example Analysis Workflow
from shield_data import catalogue, load_filtered
import matplotlib.pyplot as plt
# View available runs
cat = catalogue()
print(cat[["run_id", "date", "furnace_setpoint"]])
# Load all 500K experiments
df = load_filtered(furnace_setpoint=500)
# Group by run and plot
for run_id in df["run_id"].unique():
    run_data = df[df["run_id"] == run_id]
    plt.plot(run_data["time"], run_data["pressure"], label=run_id)
plt.xlabel("Time (s)")
plt.ylabel("Pressure")
plt.legend()
plt.show()
Requirements
- Python >= 3.9
- watchdog
- jinja2
- pandas
- Git
- GitHub CLI (gh) configured with authentication
File details
Details for the file shield_data-0.1a0.tar.gz.
File metadata
- Download URL: shield_data-0.1a0.tar.gz
- Upload date:
- Size: 16.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc6931f2f2993069ea65bf2da0e899cf29c63a9b53517a5bba59a257a1bdf2a5 |
| MD5 | 788edec01dfd249e8281c861e66343e9 |
| BLAKE2b-256 | d6912a6f2efb519fd502a04da38f07ab11e9d9064fddadd1aba626a985cd1368 |
Provenance
The following attestation bundles were made for shield_data-0.1a0.tar.gz:
- Publisher: python-publish.yml on PTTEPxMIT/SHIELD-Data
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: shield_data-0.1a0.tar.gz
- Subject digest: fc6931f2f2993069ea65bf2da0e899cf29c63a9b53517a5bba59a257a1bdf2a5
- Sigstore transparency entry: 673054585
- Permalink: PTTEPxMIT/SHIELD-Data@6b0b5b99bed8c397c6d99ce3dd788722d45510cc
- Branch / Tag: refs/tags/v0.1a
- Owner: https://github.com/PTTEPxMIT
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@6b0b5b99bed8c397c6d99ce3dd788722d45510cc
- Trigger Event: release
File details
Details for the file shield_data-0.1a0-py3-none-any.whl.
File metadata
- Download URL: shield_data-0.1a0-py3-none-any.whl
- Upload date:
- Size: 11.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 110ee1343c0b66e5dd5f99843585a4a0003063783fb55988ea65320f1927f3e8 |
| MD5 | 71faa793219901ac8c8c5c14ae743d92 |
| BLAKE2b-256 | df72da2e295464b47821a8b3fa8edb9b6ca98ea460dacd1e1dff7981454df8ef |
Provenance
The following attestation bundles were made for shield_data-0.1a0-py3-none-any.whl:
- Publisher: python-publish.yml on PTTEPxMIT/SHIELD-Data
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: shield_data-0.1a0-py3-none-any.whl
- Subject digest: 110ee1343c0b66e5dd5f99843585a4a0003063783fb55988ea65320f1927f3e8
- Sigstore transparency entry: 673054590
- Permalink: PTTEPxMIT/SHIELD-Data@6b0b5b99bed8c397c6d99ce3dd788722d45510cc
- Branch / Tag: refs/tags/v0.1a
- Owner: https://github.com/PTTEPxMIT
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@6b0b5b99bed8c397c6d99ce3dd788722d45510cc
- Trigger Event: release