Merrypopins: automated pop-in detection tooling for nano-indentation experiments, with five modules: load_datasets, preprocess, locate, statistics & make_dataset

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Merrypopins

merrypopins is a Python library that streamlines nano‑indentation data processing, automated pop-in detection, and analysis. It provides five core modules:

- load_datasets: Load and parse .txt measurement files and .tdm/.tdx metadata files into structured pandas DataFrames. Automatically detects headers, timestamps, and measurement channels.
- preprocess: Clean and normalize indentation data with filtering, baseline correction, and contact point detection.
- locate: Identify and extract pop‑in events within indentation curves using advanced detection algorithms, including:
  - Isolation Forest anomaly detection
  - CNN Autoencoder reconstruction error
  - Fourier-based derivative outlier detection
  - Savitzky-Golay smoothed gradient thresholds
- statistics: Perform statistical analysis and model fitting on located pop‑in events (e.g., frequency, magnitude, distribution).
- make_dataset: Construct enriched datasets by running the full merrypopins pipeline and exporting annotated results and visualizations.
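
To make the idea of a pop-in concrete, here is a minimal, dependency-free sketch of derivative-style outlier detection. It is a simplified stand-in and not the code behind any of the four methods listed above: the function name `depth_jump_flags`, the robust z-score rule, and the default `k=5.0` are all assumptions chosen for illustration.

```python
from statistics import median

def depth_jump_flags(depth, k=5.0):
    """Flag samples whose depth increment is an outlier (hypothetical helper).

    Uses a robust z-score based on the median absolute deviation (MAD).
    This is an illustration, not merrypopins' implementation.
    """
    # Depth increment between consecutive samples.
    steps = [b - a for a, b in zip(depth, depth[1:])]
    med = median(steps)
    # MAD is a noise scale that is barely affected by the jumps themselves.
    mad = median(abs(s - med) for s in steps) or 1e-12
    # The first sample has no preceding increment, so it is never flagged.
    return [False] + [(s - med) / mad > k for s in steps]

# Synthetic curve: depth grows about 1 nm per sample, with one 12 nm jump at index 6.
depth = [0.0, 1.0, 2.1, 3.0, 4.1, 5.0, 17.0, 18.1, 19.0, 20.1]
flags = depth_jump_flags(depth)
print([i for i, f in enumerate(flags) if f])  # [6]
```

A pop-in shows up as a sudden burst of displacement, which is why the sketch looks at increments of depth rather than at depth itself.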

Merrypopins is developed by Cahit Acar, Anna Marcelissen, Hugo van Schrojenstein Lantman, and John M. Aiken.

Installation

# From PyPI
pip install merrypopins

# For development
git clone https://github.com/SerpRateAI/merrypopins.git
cd merrypopins
pip install -e .

merrypopins supports Python 3.10+ and depends on:

- matplotlib
- numpy
- pandas
- scipy
- scikit-learn
- tensorflow

These are installed automatically via pip.
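
If an import fails after installation, a quick environment check can help narrow it down. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and the module list simply mirrors the dependencies above (scikit-learn is imported under the name `sklearn`).

```python
import sys
from importlib.util import find_spec

# Import names for the dependencies listed above
# (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ("matplotlib", "numpy", "pandas", "scipy", "sklearn", "tensorflow")

def missing_modules(names=REQUIRED_MODULES):
    """Return the modules from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

if sys.version_info < (3, 10):
    print("merrypopins needs Python 3.10 or newer; found", sys.version.split()[0])
print("Missing modules:", missing_modules() or "none")
```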

Quickstart

Importing merrypopins Modules

from pathlib import Path
from merrypopins.load_datasets import load_txt, load_tdm
from merrypopins.preprocess import default_preprocess, remove_pre_min_load, rescale_data, finalise_contact_index
from merrypopins.locate import default_locate
from merrypopins.make_dataset import merrypopins_pipeline

Load Indentation Data and Metadata

# 1) Load indentation data:
data_file = Path("data/experiment1.txt")
df = load_txt(data_file)
print(df.head())
print("Timestamp:", df.attrs['timestamp'])
print("Number of Points:", df.attrs['num_points'])

# 2) Load tdm metadata:
tdm_meta_file = Path("data/experiment1.tdm")
# Load the tdm metadata; this returns one DataFrame for the root and one for the channels
df_tdm_meta_root, df_tdm_meta_channels = load_tdm(tdm_meta_file)
# The root metadata is stored as a single row, with one column per attribute
print(df_tdm_meta_root.head())
# Transpose the root metadata so that every attribute can be read as a row
df_tdm_meta_root = df_tdm_meta_root.T.reset_index()
df_tdm_meta_root.columns = ['attribute', 'value']
print(df_tdm_meta_root.head(50))
# The channel metadata spans multiple rows, each with its own columns
print(df_tdm_meta_channels.head(50))

Preprocess Data

Option 1: Use default pipeline

# default_preprocess applies three steps:
# 1. Removes all rows before the minimum Load
# 2. Detects the contact point and shifts Depth so that contact = 0
# 3. Removes rows with Depth < 0 and adds a flag column for the contact point

df_processed = default_preprocess(df)

print(df_processed.head())
print("Contact point index:", df_processed[df_processed["contact_point"]].index[0])
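
The three default steps can be pictured on plain Python lists. The sketch below is only an illustration of the sequence described in the comments above, not the library's code: `preprocess_sketch` is a hypothetical name, and the contact index is passed in by hand instead of being detected.

```python
def preprocess_sketch(load, depth, contact_idx):
    """Mimic the three default steps on plain lists (illustration only).

    contact_idx is the index of the contact sample in the original data.
    """
    # 1. Drop everything before the sample with the minimum load.
    start = load.index(min(load))
    load, depth = load[start:], depth[start:]
    contact = contact_idx - start

    # 2. Shift depth so that the contact sample sits at depth 0.
    offset = depth[contact]
    depth = [d - offset for d in depth]

    # 3. Keep rows with depth >= 0 and flag the contact sample.
    return [
        {"load": l, "depth": d, "contact_point": i == contact}
        for i, (l, d) in enumerate(zip(load, depth))
        if d >= 0
    ]

load = [5.0, 2.0, 0.5, 0.6, 1.5, 3.0, 6.0]
depth = [-3.0, -2.5, -2.0, -1.0, 0.5, 1.5, 3.0]
rows = preprocess_sketch(load, depth, contact_idx=4)
for row in rows:
    print(row)
```

With this toy input, three rows survive and the first one carries the contact flag at depth 0.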

Option 2: Customize each step (with optional arguments)

# Step 1: Remove initial noise based on minimum Load
df_clean = remove_pre_min_load(df, load_col="Load (µN)")

# Step 2: Automatically detect contact point and zero the depth
df_rescaled = rescale_data(
    df_clean,
    depth_col="Depth (nm)",
    load_col="Load (µN)",
    N_baseline=30,     # number of points for baseline noise estimation
    k=5.0,             # noise threshold multiplier
    window_length=7,   # Savitzky-Golay smoothing window (must be odd)
    polyorder=2        # Polynomial order for smoothing
)

# Step 3: Trim rows before contact and/or flag the point
df_final = finalise_contact_index(
    df_rescaled,
    depth_col="Depth (nm)",
    remove_pre_contact=True,       # remove rows where depth < 0
    add_flag_column=True,          # add a boolean column marking the contact point
    flag_column="contact_point"    # customize the column name if needed
)

print(df_final[df_final["contact_point"]])  # display contact row
print("Contact point index:", df_final[df_final["contact_point"]].index[0])

🧪 Tip: You can omit or modify any step depending on your data:

- Skip remove_pre_min_load() if your data is already clean.
- Set remove_pre_contact=False if you want to retain all data.
- Customize flag_column to integrate with your own schema.
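
To build intuition for `N_baseline` and `k`, the following sketch shows one plausible way a noise-based contact threshold could work. The rule "mean plus k standard deviations of the first N_baseline points" is an assumption made for illustration; the source only describes these parameters as the baseline size and the noise threshold multiplier. The function name `find_contact_index` is hypothetical, and the Savitzky-Golay smoothing step is left out.

```python
from statistics import mean, stdev

def find_contact_index(load, n_baseline=30, k=5.0):
    """Return the first index where load rises above baseline noise, or None.

    Assumed rule (not taken from merrypopins): threshold = mean + k * std
    of the first n_baseline samples.
    """
    baseline = load[:n_baseline]
    threshold = mean(baseline) + k * stdev(baseline)
    for i in range(n_baseline, len(load)):
        if load[i] > threshold:
            return i
    return None

# 30 noisy baseline samples followed by a rising load.
load = [0.0, 0.1] * 15 + [0.05, 0.2, 0.6, 1.2]
print(find_contact_index(load, n_baseline=30, k=5.0))  # 32
print(find_contact_index(load, n_baseline=30, k=1.0))  # 31
```

A smaller `k` makes the detector more sensitive, so contact is declared earlier; a larger `k` waits for a clearer rise above the noise.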

Locate Pop-in Events

Detect Pop-ins using Default Method

# Detect pop-ins using all methods
results = default_locate(df_processed)
print(results[results.popin])

Customize Detection Thresholds

results_tuned = default_locate(
    df_processed,
    iforest_contamination=0.002,
    cnn_threshold_multiplier=4.0,
    fd_threshold=2.5,
    savgol_threshold=2.0
)
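
As background for `iforest_contamination`: in scikit-learn's IsolationForest, `contamination` is the expected proportion of outliers and sets the cutoff on the anomaly scores. Whether merrypopins passes the value through unchanged is an assumption here. The sketch below imitates that kind of cutoff on made-up scores; `flag_top_fraction` is a hypothetical helper.

```python
def flag_top_fraction(scores, contamination):
    """Flag the highest-scoring samples so that about `contamination` of them are marked."""
    n_flag = max(1, round(contamination * len(scores)))
    # Indices ordered from the highest anomaly score to the lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [i in flagged for i in range(len(scores))]

# 1000 made-up anomaly scores with two clear peaks.
scores = [0.1] * 1000
scores[200] = 0.9
scores[700] = 0.8

strict = flag_top_fraction(scores, contamination=0.002)
loose = flag_top_fraction(scores, contamination=0.01)
print(sum(strict), sum(loose))  # 2 10
```

With 1000 points, a value of 0.002 allows about two detections, so raising it admits more candidates, including weaker ones.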

Visualize Detections

import matplotlib.pyplot as plt

plt.figure(figsize=(8,6))
plt.plot(results_tuned["Depth (nm)"], results_tuned["Load (µN)"], label="Preprocessed", alpha=0.4, color='orange')

colors = {
    "popin_iforest": 'red',
    "popin_cnn": 'purple',
    "popin_fd": 'darkorange',
    "popin_savgol": 'green'
}
markers = {
    "popin_iforest": '^',
    "popin_cnn": 'v',
    "popin_fd": 'x',
    "popin_savgol": 'D'
}

for method, color in colors.items():
    mdf = results_tuned[results_tuned[method]]
    plt.scatter(mdf["Depth (nm)"], mdf["Load (µN)"],
                c=color, label=method.replace("popin_", "").capitalize(),
                marker=markers[method], alpha=0.7)

confident = results_tuned[results_tuned["popin_confident"]]
plt.scatter(confident["Depth (nm)"], confident["Load (µN)"],
            edgecolors='k', facecolors='none', label="Majority Vote (2+)", s=100, linewidths=1.5)

plt.xlabel("Depth (nm)"); plt.ylabel("Load (µN)")
plt.title("Pop-in Detections by All Methods")
plt.legend(); plt.grid(True); plt.tight_layout(); plt.show()
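
The "Majority Vote (2+)" label suggests that popin_confident marks samples flagged by at least two methods. The exact rule used by the library is not stated here, so the sketch below is an assumption that shows such a vote on plain lists; `majority_vote` is a hypothetical name.

```python
def majority_vote(flags_by_method, min_votes=2):
    """Mark a sample as confident when at least `min_votes` methods flag it."""
    columns = list(flags_by_method.values())
    # zip(*columns) walks through the samples, yielding one vote per method.
    return [sum(votes) >= min_votes for votes in zip(*columns)]

flags = {
    "popin_iforest": [False, True, False, True],
    "popin_cnn":     [False, True, False, False],
    "popin_fd":      [False, False, False, True],
    "popin_savgol":  [False, True, True, False],
}
confident = majority_vote(flags)
print(confident)  # [False, True, False, True]
```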

Run Full Pipeline with merrypopins_pipeline

This function runs the entire merrypopins workflow, from loading data to locating pop-ins and generating visualizations.

Define Input and Output Paths

# Define the text file to process and the output directory that will hold the visualization
text_file = Path("datasets/6microntip_slowloading/grain9_6um_indent03_HL_QS_LC.txt")
output_dir = Path("visualisations/6microntip_slowloading/grain9_6um_indent03_HL_QS_LC")

# Make sure output directory exists
output_dir.mkdir(parents=True, exist_ok=True)

Run The merrypopins Pipeline

df_pipeline = merrypopins_pipeline(
    text_file,
    save_plot_dir=output_dir,
    trim_margin=30
)
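
When several files need processing, the output directory can be derived from the input path so that plots mirror the dataset layout used above. This is a sketch under assumptions: `plot_dir_for` is a hypothetical helper, and the `datasets` and `visualisations` root names are taken from the example paths only.

```python
from pathlib import Path

def plot_dir_for(text_file, datasets_root="datasets", plots_root="visualisations"):
    """Map datasets/<group>/<name>.txt to visualisations/<group>/<name>."""
    relative = Path(text_file).relative_to(datasets_root)
    return Path(plots_root) / relative.with_suffix("")

text_file = Path("datasets/6microntip_slowloading/grain9_6um_indent03_HL_QS_LC.txt")
print(plot_dir_for(text_file).as_posix())

# Batch use (needs merrypopins and real data, so it is left commented out):
# for text_file in sorted(Path("datasets").rglob("*.txt")):
#     out_dir = plot_dir_for(text_file)
#     out_dir.mkdir(parents=True, exist_ok=True)
#     merrypopins_pipeline(text_file, save_plot_dir=out_dir, trim_margin=30)
```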

View Result DataFrame

df_pipeline.head()

View Result Visualizations

# The pipeline writes a plot for the provided text file to the specified output directory.
from PIL import Image
import matplotlib.pyplot as plt

# Load all PNGs from output folder
image_paths = sorted(output_dir.glob("*.png"))

# Only proceed if there are images
if image_paths:
    img = Image.open(image_paths[0])
    plt.figure(figsize=(12, 6))
    plt.imshow(img)
    plt.title(image_paths[0].stem)
    plt.axis('off')
    plt.show()
else:
    print("No plots found in output folder.")

Development & Testing

Install development requirements:
```
pip install -e '.[dev]'
```

🔧 Pre-commit Hooks

We use pre-commit to automatically check code formatting and linting before each commit. This helps ensure consistent code quality across the project.

Setup (Run Once)

# After installing the development dependencies, set up pre-commit hooks:
# This will install the hooks defined in .pre-commit-config.yaml
pre-commit install

This sets up a Git hook that will run ruff and black automatically before each commit.

Run Manually

To run all checks on all files:

pre-commit run --all-files

Notes:

- Hooks are defined in .pre-commit-config.yaml.
- You can exclude specific files or directories (e.g., tutorials/) by modifying that config file.

🧪 Running Tests

Run tests with coverage:

pytest --cov=merrypopins --cov-report=term-missing

Generate HTML coverage report:

pytest --cov=merrypopins --cov-report=html
# open htmlcov/index.html in browser

Contributing

Contributions are welcome! Please file issues and submit pull requests on GitHub.

Before submitting a PR:

1. Fork the repository.
2. Create a feature branch (`git checkout -b feature/foo`).
3. Commit your changes (`git commit -m "feat: add bar"`).
4. Push to the branch (`git push origin feature/foo`).
5. Open a pull request.

License

This project is licensed under the GNU General Public License v3.0. See LICENSE for details.


Release history

- 1.0.4: Aug 29, 2025
- 1.0.3: Jul 11, 2025
- 1.0.2: Jul 9, 2025
- 1.0.1: Jun 26, 2025
- 1.0.0: Jun 15, 2025
- 0.2.3: Jun 6, 2025 (this version)
- 0.2.2: Jun 6, 2025
- 0.2.1: Jun 6, 2025
- 0.2.0: Jun 6, 2025
- 0.1.1: Jun 4, 2025
- 0.1.0: Jun 4, 2025

Download files

Source Distribution

merrypopins-0.2.3.tar.gz (38.7 kB)

Uploaded Jun 6, 2025 Source

Built Distribution

merrypopins-0.2.3-py3-none-any.whl (31.5 kB)

Uploaded Jun 6, 2025 Python 3

File details

Details for the file merrypopins-0.2.3.tar.gz.

File metadata

Download URL: merrypopins-0.2.3.tar.gz
Upload date: Jun 6, 2025
Size: 38.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.17

File hashes

Hashes for merrypopins-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`2e28b3ac79400e0c2740d2e27bf676e930742f62f5a58dc7717703d86a32d7ce`
MD5	`c580765c13377824f14065d51f3caf78`
BLAKE2b-256	`336fc9bac5803b55371901e3571b35ce66d276e6f2205a7bb75f3f02432ada1e`

File details

Details for the file merrypopins-0.2.3-py3-none-any.whl.

File metadata

Download URL: merrypopins-0.2.3-py3-none-any.whl
Upload date: Jun 6, 2025
Size: 31.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.17

File hashes

Hashes for merrypopins-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`baa38c09ddea3c2344ba396792994947f3d854b00ecfd8315a26243864692258`
MD5	`f82555f78c865b1aee48c942360319f2`
BLAKE2b-256	`0a093b64d146d848846046c8603822177257cc378329e49201d1b4a1576d1287`
