Merrypopins: Automated pop-in detection tooling for nano-indentation experiments (load_datasets, preprocess, locate, statistics and make_dataset)
Merrypopins
merrypopins is a Python library to streamline the workflow of nano‑indentation experiment data processing, automated pop-in detection and analysis. It provides five core modules:
- load_datasets: Load and parse .txt measurement files and .tdm/.tdx metadata files into structured pandas DataFrames. Automatically detects headers, timestamps, and measurement channels.
- preprocess: Clean and normalize indentation data with filtering, baseline correction, and contact point detection.
- locate: Identify and extract pop‑in events within indentation curves using advanced detection algorithms, including:
  - Isolation Forest anomaly detection
  - CNN Autoencoder reconstruction error
  - Fourier-based derivative outlier detection
  - Savitzky-Golay smoothed gradient thresholds
  - Majority-vote fusion with confidence scoring
- statistics: Perform statistical analysis and model fitting on located pop‑in events (e.g., frequency, magnitude, distribution).
- make_dataset: Combine raw measurements, metadata, and analysis outputs into a machine‑learning‑ready dataset.
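To give a feel for what "derivative outlier detection" means in practice, here is a minimal sketch. It is not the library's implementation: it swaps in plain finite differences (np.diff) and a simple z-score rule instead of the Fourier-based or Savitzky-Golay detectors listed above. The function name flag_depth_jumps, the parameter k, and the synthetic curve are all assumptions made for illustration.

```python
import numpy as np

def flag_depth_jumps(depth, k=3.0):
    """Return indices of samples that follow an unusually large depth increment.

    Illustrative only: plain finite differences plus a z-score rule,
    not any of the merrypopins detectors.
    """
    depth = np.asarray(depth, dtype=float)
    steps = np.diff(depth)                      # increment between neighbouring samples
    z = (steps - steps.mean()) / steps.std()    # standardised increments
    return np.flatnonzero(z > k) + 1            # first sample after each large jump

# Synthetic curve (hypothetical data): smooth indentation plus a 5 nm burst at sample 120
t = np.linspace(0.0, 1.0, 200)
depth = 100.0 * t**1.5
depth[120:] += 5.0

jump_indices = flag_depth_jumps(depth)
print(jump_indices)
```

On real curves a single global z-score is usually too crude, which is presumably why the library combines several detectors and fuses their votes.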
Installation
# From PyPI
pip install merrypopins
# For development
git clone https://github.com/SerpRateAI/merrypopins.git
cd merrypopins
pip install -e .
merrypopins supports Python 3.10+ and depends on:
- numpy
- pandas
- scipy
- scikit-learn
- tensorflow
These are installed automatically via pip.
Quickstart
Importing merrypopins Modules
from pathlib import Path
from merrypopins.load_datasets import load_txt, load_tdm
from merrypopins.preprocess import default_preprocess, remove_pre_min_load, rescale_data, finalise_contact_index
Load Indentation Data and Metadata
# 1) Load indentation data:
data_file = Path("data/experiment1.txt")
df = load_txt(data_file)
print(df.head())
print("Timestamp:", df.attrs['timestamp'])
print("Number of Points:", df.attrs['num_points'])
# 2) Load tdm metadata:
tdm_meta_file = Path("data/experiment1.tdm")
# load_tdm returns two DataFrames: one for the root metadata and one for the channels
df_tdm_meta_root, df_tdm_meta_channels = load_tdm(tdm_meta_file)
# The root metadata is stored as a single row, with one column per attribute
print(df_tdm_meta_root.head())
# Transpose it so that every root attribute can be read as an (attribute, value) row
df_tdm_meta_root = df_tdm_meta_root.T.reset_index()
df_tdm_meta_root.columns = ['attribute', 'value']
print(df_tdm_meta_root.head(50))
# The channel metadata is stored as multiple rows, each with its own columns
print(df_tdm_meta_channels.head(50))
Preprocess Data
Option 1: Use default pipeline
# This applies:
# 1. Removes all rows before minimum Load
# 2. Detects contact point and shifts Depth so contact = 0
# 3. Removes Depth < 0 rows and adds a flag for the contact point
df_processed = default_preprocess(df)
print(df_processed.head())
print("Contact point index:", df_processed[df_processed["contact_point"]].index[0])
Option 2: Customize each step (with optional arguments)
# Step 1: Remove initial noise based on minimum Load
df_clean = remove_pre_min_load(df, load_col="Load (µN)")
# Step 2: Automatically detect contact point and zero the depth
df_rescaled = rescale_data(
    df_clean,
    depth_col="Depth (nm)",
    load_col="Load (µN)",
    N_baseline=30,      # number of points for baseline noise estimation
    k=5.0,              # noise threshold multiplier
    window_length=7,    # Savitzky-Golay smoothing window (must be odd)
    polyorder=2         # polynomial order for smoothing
)
# Step 3: Trim rows before contact and/or flag the point
df_final = finalise_contact_index(
    df_rescaled,
    depth_col="Depth (nm)",
    remove_pre_contact=True,      # remove rows where depth < 0
    add_flag_column=True,         # add a boolean column marking the contact point
    flag_column="contact_point"   # customize the column name if needed
)
print(df_final[df_final["contact_point"]]) # display contact row
print("Contact point index:", df_final[df_final["contact_point"]].index[0])
🧪 Tip: You can omit or modify any step depending on your data:
- Skip remove_pre_min_load() if your data is already clean.
- Set remove_pre_contact=False if you want to retain all data.
- Customize flag_column to integrate with your own schema.
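The N_baseline and k arguments above suggest a noise-threshold rule for finding the contact point. The sketch below shows one plausible reading of that idea: estimate the noise from the first points, then take the first sample whose load exceeds the baseline mean by k standard deviations. This is an assumption about the general technique, not the code inside rescale_data, and it skips the Savitzky-Golay smoothing step. The function find_contact_index and the synthetic arrays are hypothetical.

```python
import numpy as np

def find_contact_index(load, n_baseline=30, k=5.0):
    """Return the first index where load rises above baseline mean + k * std.

    Hypothetical helper for illustration; returns None if no sample qualifies.
    """
    load = np.asarray(load, dtype=float)
    baseline = load[:n_baseline]                        # assumed pre-contact noise
    threshold = baseline.mean() + k * baseline.std()
    above = np.flatnonzero(load > threshold)
    return int(above[0]) if above.size else None

# Synthetic signal (hypothetical data): 50 noisy pre-contact samples, then a loading ramp
noise = np.where(np.arange(50) % 2 == 0, 0.1, -0.1)
ramp = 2.0 * np.arange(1, 51)
load = np.concatenate([noise, ramp])
depth = np.arange(100, dtype=float)

contact = find_contact_index(load)
depth_zeroed = depth - depth[contact]   # shift depth so that contact sits at 0
print(contact, depth_zeroed[contact])
```

A larger k makes the rule more conservative (later contact), while a larger n_baseline gives a steadier noise estimate as long as all of those points really are pre-contact.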
Locate Pop-in Events
Detect Pop-ins using Default Method
from merrypopins.locate import default_locate
# Detect pop-ins using all methods
results = default_locate(df_processed)
print(results[results.popin])
Customize Detection Thresholds
results_tuned = default_locate(
    df_processed,
    iforest_contamination=0.002,
    cnn_threshold_multiplier=4.0,
    fd_threshold=2.5,
    savgol_threshold=2.0
)
Visualize Detections
import matplotlib.pyplot as plt
plt.figure(figsize=(8,6))
plt.plot(results_tuned["Depth (nm)"], results_tuned["Load (µN)"], label="Preprocessed", alpha=0.4, color='orange')
colors = {
    "popin_iforest": 'red',
    "popin_cnn": 'purple',
    "popin_fd": 'darkorange',
    "popin_savgol": 'green'
}
markers = {
    "popin_iforest": '^',
    "popin_cnn": 'v',
    "popin_fd": 'x',
    "popin_savgol": 'D'
}
# Plot the detections of each individual method
for method, color in colors.items():
    mdf = results_tuned[results_tuned[method]]
    plt.scatter(mdf["Depth (nm)"], mdf["Load (µN)"],
                c=color, label=method.replace("popin_", "").capitalize(),
                marker=markers[method], alpha=0.7)
# Circle the points flagged by the majority vote
confident = results_tuned[results_tuned["popin_confident"]]
plt.scatter(confident["Depth (nm)"], confident["Load (µN)"],
            edgecolors='k', facecolors='none', label="Majority Vote (2+)", s=100, linewidths=1.5)
plt.xlabel("Depth (nm)"); plt.ylabel("Load (µN)")
plt.title("Pop-in Detections by All Methods")
plt.legend(); plt.grid(True); plt.tight_layout(); plt.show()
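The "Majority Vote (2+)" label refers to points flagged by at least two methods. As a rough sketch of how such a fusion can be computed from the per-method boolean columns, the example below counts votes per row with pandas. The per-method column names match those used in the plot above; the votes and confident columns, the majority_vote function, and the demo data are assumptions for illustration and may differ from what default_locate actually returns.

```python
import pandas as pd

METHOD_COLUMNS = ["popin_iforest", "popin_cnn", "popin_fd", "popin_savgol"]

def majority_vote(flags: pd.DataFrame, min_votes: int = 2) -> pd.DataFrame:
    """Add a per-row vote count and a flag for rows with at least min_votes votes.

    Hypothetical helper; the output column names are illustrative only.
    """
    out = flags.copy()
    out["votes"] = out[METHOD_COLUMNS].sum(axis=1)   # True counts as 1
    out["confident"] = out["votes"] >= min_votes
    return out

# Hypothetical detection flags for four samples
demo = pd.DataFrame({
    "popin_iforest": [False, True, True, False],
    "popin_cnn":     [False, False, True, False],
    "popin_fd":      [False, True, True, True],
    "popin_savgol":  [False, False, True, False],
})

voted = majority_vote(demo)
print(voted[["votes", "confident"]])
```

Raising min_votes trades recall for precision: fewer points are kept, but each has more methods agreeing on it.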
Development & Testing
- Install development requirements:
pip install -e '.[dev]'
🔧 Pre-commit Hooks
We use pre-commit to automatically check code formatting and linting before each commit. This helps ensure consistent code quality across the project.
Setup (Run Once)
# After installing the development dependencies, set up pre-commit hooks:
# This will install the hooks defined in .pre-commit-config.yaml
pre-commit install
This sets up a Git hook that will run ruff and black automatically before each commit.
Run Manually
To run all checks on all files:
pre-commit run --all-files
Notes:
- Hooks are defined in .pre-commit-config.yaml.
- You can exclude specific files or directories (e.g., tutorials/) by modifying that config file.
🧪 Running Tests
- Run tests with coverage:
pytest --cov=merrypopins --cov-report=term-missing
- Generate HTML coverage report:
pytest --cov=merrypopins --cov-report=html # open htmlcov/index.html in browser
Contributing
Contributions are welcome! Please file issues and submit pull requests on GitHub.
License
This project is licensed under the GNU General Public License v3.0. See LICENSE for details.
File details
Details for the file merrypopins-0.1.0.tar.gz.
File metadata
- Download URL: merrypopins-0.1.0.tar.gz
- Upload date:
- Size: 35.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4be26d07c5d7d569b2b65b490582abd33baa2ca83327dbd4301461a1730cc832 |
| MD5 | e83a258521380fa8c869258498a38b1e |
| BLAKE2b-256 | 70a2d4f45786775bbc9d2cdeb48909a75931adfc52bdab0dd72aa2a7cade4750 |
File details
Details for the file merrypopins-0.1.0-py3-none-any.whl.
File metadata
- Download URL: merrypopins-0.1.0-py3-none-any.whl
- Upload date:
- Size: 28.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e1ef82748b8b784591342879f1d49a9b18b9806d917c24bea3e626c5aef8880 |
| MD5 | 8cf115c1d847a3bf22c37f898367eae5 |
| BLAKE2b-256 | 129220cd0640276eacf7aba645a99ec7b4361cddda1f954d1d014fb90ddf917c |