Analyze, visualize, and interpret root traits output from sleap-roots.

SLEAP Roots Analyze

Python: 3.11+ | License: GPL v3 | Tests: 1900+

Statistical analysis tools for root trait data from SLEAP Roots.

Installation

pip install sleap-roots-analyze

Or with uv:

uv add sleap-roots-analyze

Optional: input-contract validation

To validate analysis input against the sleap-roots-contracts schema at the data-load boundary, install the optional contracts extra:

pip install "sleap-roots-analyze[contracts]"

This enables the data.validate_input config flag (one of off, warn, or strict) for the QC pipeline, and the matching validate_input flag on CrossPlatformConfig for the cross-platform pipeline. When the extra is not installed, validation is skipped with a log message and never raises an error.
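
The off | warn | strict behavior and the "logged no-op" fallback can be pictured with a small standalone sketch. This is not the package's implementation: the module name sleap_roots_contracts, the function names, and the require_geno validator are all assumptions made for illustration.

```python
import importlib.util
import logging

logger = logging.getLogger("validate_input_sketch")


def contracts_installed(module_name="sleap_roots_contracts"):
    """Return True if the (assumed) contracts module is importable."""
    return importlib.util.find_spec(module_name) is not None


def validate_input(rows, mode="off", validator=None):
    """Apply an off | warn | strict policy to a list of row dicts.

    `validator` stands in for the schema check. Passing None models the
    case where the optional extra is not installed.
    """
    if mode not in ("off", "warn", "strict"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "off":
        return "off"
    if validator is None:
        # Degrade to a logged no-op instead of failing.
        logger.info("contracts extra not installed; skipping validation")
        return "skipped"
    problems = validator(rows)
    if not problems:
        return "ok"
    if mode == "strict":
        raise ValueError("; ".join(problems))
    for problem in problems:
        logger.warning(problem)
    return "warned"


def require_geno(rows):
    """Hypothetical validator: every row needs a non-empty 'geno' value."""
    return [
        f"row {i}: missing geno"
        for i, row in enumerate(rows)
        if not row.get("geno")
    ]
```

In this sketch, strict mode is the only path that raises, and a missing validator is never an error regardless of mode.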

For development:

git clone https://github.com/talmolab/sleap-roots-analyze.git
cd sleap-roots-analyze
uv sync --group dev

Quick Start

New Analysis? Start here.

Use the interactive /configure-run-all slash command (in Claude Code) to create a complete, scientifically validated set of configs for a new analysis:

/configure-run-all

It inspects your CSV, walks you through every parameter with statistical guardrails, writes QC + Viz + run manifest configs, and commits them to git as a reproducibility anchor.

Optionally validate a specific config before running:

/validate-config configs/active/qc/<your_analysis>.yaml

Then run:

/run-pipelines --manifest configs/active/run_manifest_<your_analysis>.yaml

See configs/templates/README.md for manual config authoring.

Command-Line Interface

The package provides a CLI for running QC and visualization pipelines:

# Run QC pipeline
sleap-roots-analyze qc configs/qc_turface_150genotypes.yaml

# Run with custom output directory
sleap-roots-analyze qc configs/qc_turface_150genotypes.yaml -o ./my_results

# Validate configuration
sleap-roots-analyze config validate configs/qc_turface_150genotypes.yaml

# List example configs
sleap-roots-analyze config list

# Run all pipelines from a manifest
sleap-roots-analyze run-all configs/active/run_manifest.yaml

# Get help
sleap-roots-analyze --help
sleap-roots-analyze qc --help

See docs/QC_PIPELINE_GUIDE.md for a complete guide to using the QC pipeline.

Python API

Load and Clean Data

from sleap_roots_analyze.data_cleanup import (
    load_trait_data,
    get_trait_columns,
    remove_nan_samples,
)

# Load data
df = load_trait_data("path/to/traits.csv")

# Get trait columns (excludes metadata automatically)
trait_cols = get_trait_columns(df)

# Remove samples with >20% missing data
df_clean, df_removed, stats = remove_nan_samples(
    df, trait_cols, max_nan_fraction=0.2
)
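
To make the max_nan_fraction threshold concrete, here is a minimal standard-library sketch of the same idea. It is an illustration only, not the code behind remove_nan_samples; the helper names and the strict "greater than" comparison are assumptions based on the ">20%" comment above.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def split_by_nan_fraction(rows, trait_cols, max_nan_fraction=0.2):
    """Split row dicts into (kept, removed) by their share of missing traits."""
    kept, removed = [], []
    for row in rows:
        n_missing = sum(is_missing(row.get(col)) for col in trait_cols)
        fraction = n_missing / len(trait_cols)
        # A row is dropped only when it exceeds the threshold.
        if fraction > max_nan_fraction:
            removed.append(row)
        else:
            kept.append(row)
    return kept, removed
```

With five trait columns, a row missing one value (20%) is kept and a row missing two (40%) is removed.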

Calculate Heritability

from sleap_roots_analyze.statistics import calculate_heritability_estimates

# Calculate heritability for all traits
h2_results = calculate_heritability_estimates(
    df_clean,
    trait_cols,
    genotype_col="geno",
    replicate_col="rep"
)

# Filter low heritability traits
h2_results, df_filtered, removed, details = calculate_heritability_estimates(
    df_clean,
    trait_cols,
    remove_low_h2=True,
    h2_threshold=0.3
)
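
For readers unfamiliar with broad-sense heritability, the sketch below shows one common textbook estimator: a balanced one-way ANOVA on genotype, with H² = σ²_G / (σ²_G + σ²_E) on a single-plot basis. Whether calculate_heritability_estimates uses this estimator, a mixed model, or an entry-mean basis is not stated here, so treat this purely as an illustration with hypothetical names.

```python
from collections import defaultdict


def broad_sense_h2(records):
    """Estimate H^2 from (genotype, value) pairs in a balanced design."""
    groups = defaultdict(list)
    for geno, value in records:
        groups[geno].append(value)

    sizes = {len(values) for values in groups.values()}
    if len(groups) < 2 or len(sizes) != 1 or min(sizes) < 2:
        raise ValueError("need >= 2 genotypes with equal replicate counts >= 2")

    r = sizes.pop()   # replicates per genotype
    g = len(groups)   # number of genotypes
    means = {geno: sum(values) / r for geno, values in groups.items()}
    grand = sum(means.values()) / g

    # Mean squares between genotypes and within genotypes.
    ms_g = r * sum((m - grand) ** 2 for m in means.values()) / (g - 1)
    ms_e = sum(
        (y - means[geno]) ** 2 for geno, values in groups.items() for y in values
    ) / (g * (r - 1))

    # Method-of-moments variance components; clamp negative estimates to 0.
    var_g = max((ms_g - ms_e) / r, 0.0)
    total = var_g + ms_e
    return var_g / total if total > 0 else 0.0
```

A trait whose genotype means differ widely relative to replicate noise yields a value near 1, while a trait with no between-genotype signal yields 0.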

PCA Analysis

from sleap_roots_analyze.pca import perform_pca_analysis

# Run PCA with automatic component selection
# (named pca_result so the visualization examples below can reuse it)
pca_result = perform_pca_analysis(
    df_filtered,
    standardize=True,
    explained_variance_threshold=0.95
)

# Access results
pca_model = pca_result['pca']
transformed_data = pca_result['transformed_data']
loadings = pca_result['loadings']
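
"Automatic component selection" with explained_variance_threshold=0.95 is usually understood as keeping the fewest components whose cumulative explained variance reaches the threshold. The sketch below shows that rule on a list of eigenvalues. It is an assumption about the selection rule, not the package's code, and the function name is hypothetical.

```python
def n_components_for_threshold(eigenvalues, threshold=0.95):
    """Return the smallest k whose top-k eigenvalues reach `threshold`."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")

    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        # Compare the cumulative share of variance against the threshold.
        if running / total >= threshold:
            return k
    return len(ordered)
```

For eigenvalues 5, 3, 1, 1 the cumulative shares are 0.5, 0.8, 0.9, 1.0, so a 0.95 threshold keeps all four components.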

Outlier Detection

from sleap_roots_analyze.outlier_detection import (
    detect_outliers_mahalanobis,
    detect_outliers_isolation_forest,
    remove_outliers_from_data
)

# Detect outliers using Mahalanobis distance
outliers_maha = detect_outliers_mahalanobis(
    df_filtered[trait_cols],
    use_robust=True
)

# Or use Isolation Forest for complex patterns
outliers_iso = detect_outliers_isolation_forest(
    df_filtered[trait_cols],
    contamination=0.1
)

# Remove outliers from data
df_clean, df_outliers = remove_outliers_from_data(
    df_filtered,
    outliers_maha['outlier_indices'],
    return_outliers=True
)
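
As background on the Mahalanobis method, here is a small two-trait sketch using only the standard library. It uses the classical sample covariance, not the robust estimator that use_robust=True suggests, and the function name, return shape, and 0.975 quantile are assumptions. For two traits the chi-square cutoff has the closed form -2 ln(1 - q).

```python
import math


def mahalanobis_outliers_2d(points, quantile=0.975):
    """Flag 2-D points whose squared Mahalanobis distance exceeds the cutoff.

    Returns (flagged_indices, squared_distances).
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points")
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n

    # Classical sample covariance (n - 1 denominator).
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")

    # Chi-square quantile with 2 degrees of freedom.
    cutoff = -2.0 * math.log(1.0 - quantile)

    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(d2)

    flagged = [i for i, d2 in enumerate(distances) if d2 > cutoff]
    return flagged, distances
```

Because a single extreme point also inflates the classical covariance, small samples can mask outliers, which is one motivation for robust covariance estimates.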

Visualization

from sleap_roots_analyze.visualization import (
    create_heritability_plot,
    create_pca_biplot,
    create_feature_contribution_heatmap,
    create_phenotype_variation_plot,
    save_publication_figure
)

# Create heritability plot
fig = create_heritability_plot(h2_results, threshold=0.3)

# Create PCA biplot
fig_biplot = create_pca_biplot(
    pca_result,
    color_by="geno",
    metadata_df=df_filtered[["Barcode", "geno"]]
)

# Create feature contribution heatmap
fig_heatmap = create_feature_contribution_heatmap(
    pca_result['feature_contributions'],
    n_components=5
)

# Save in publication format
save_publication_figure(fig, "heritability", formats=["pdf", "png"])

Comprehensive PCA Analysis with Export

from sleap_roots_analyze.pca import run_pca_and_export_artifacts

# Run comprehensive PCA analysis with CSV exports
results = run_pca_and_export_artifacts(
    df_filtered,
    trait_cols=trait_cols,
    analysis_dir="pca_results",
    n_components=10,
    save_csv=True,
    save_prefix="experiment1_"
)

# Access results DataFrames
loadings_df = results['loadings_df']
pc_scores_df = results['pc_scores_df']
variance_df = results['variance_explained_df']
contributions_df = results['trait_variance_contributions_df']

Interactive Visualization

from sleap_roots_analyze.interactive_visualization import (
    create_interactive_pca_with_images,
    create_interactive_umap_with_hover_highlight,
    create_trait_explorer_dashboard,
    create_interactive_image_gallery
)

# Create interactive PCA with sample images
fig = create_interactive_pca_with_images(
    pca_result,
    image_paths,  # Dict mapping sample IDs to image paths
    show_images=True,
    metadata_df=df_filtered[["Barcode", "geno"]]
)

# Interactive UMAP with hover highlights
# (umap_result is assumed to come from an earlier UMAP run, not shown here)
fig_umap = create_interactive_umap_with_hover_highlight(
    umap_result,
    highlight_on_hover=True,
    size=8
)

# Create comprehensive trait explorer dashboard
dashboard = create_trait_explorer_dashboard(
    df_filtered,
    trait_cols,
    groupby_col="geno"
)

# Generate interactive HTML gallery with images
html = create_interactive_image_gallery(
    image_paths,
    metadata_df=df_filtered[["Barcode", "geno", "trait1"]],
    images_per_row=4,
    image_width=200
)

Features

  • Data Cleaning: Automatic metadata detection, NaN handling, zero-inflated trait removal
  • Statistical Analysis: Broad-sense heritability (H²), ANOVA, trait statistics
  • PCA Analysis: Dimensionality reduction with automatic component selection, comprehensive export artifacts
  • Outlier Detection: Mahalanobis, PCA reconstruction, and Isolation Forest methods
  • Visualization: Publication-ready plots for heritability, PCA, outliers, and phenotype variation
  • Interactive Visualization: Plotly-based interactive plots with image integration and hover effects
  • UMAP Analysis: Non-linear dimensionality reduction for complex trait relationships
  • Cross-Experiment Analysis: Compare and correlate data across multiple experiments

Data Format

Expected CSV structure:

Barcode,geno,rep,trait1,trait2,trait3,...
BC001,Genotype1,1,100.5,200.3,50.2,...
BC002,Genotype1,2,102.3,195.8,48.9,...

Required columns:

  • Genotype: geno (configurable)
  • Replicate: rep (configurable)
  • Sample ID: Barcode (configurable)
  • Traits: Any numeric columns
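
Before running a pipeline it can help to confirm that a CSV matches this layout. The following is a minimal standard-library sketch, independent of the package: the function names are hypothetical, and the default column names simply mirror the ones listed above.

```python
import csv
import io


def _is_number(text):
    """Return True if `text` parses as a float."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True


def check_columns(csv_text, genotype_col="geno", replicate_col="rep",
                  sample_id_col="Barcode"):
    """Return (missing_required_columns, numeric_trait_columns)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    required = [sample_id_col, genotype_col, replicate_col]
    missing = [col for col in required if col not in header]

    rows = list(reader)
    candidates = [col for col in header if col not in required]
    # A column counts as a trait only if every value parses as a number.
    numeric = [
        col for col in candidates
        if rows and all(_is_number(row[col]) for row in rows)
    ]
    return missing, numeric
```

An empty `missing` list together with a non-empty `numeric` list indicates the file has the identifier columns and at least one usable trait.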

Development

# Run tests
uv run pytest

# Format code
uv run black src tests

# Lint code
uv run ruff check src tests

# Coverage report
uv run pytest --cov --cov-branch

Project Structure

sleap-roots-analyze/
├── src/sleap_roots_analyze/
│   ├── cli.py                        # Command-line interface
│   ├── data_cleanup.py               # Data loading and cleaning
│   ├── statistics.py                 # Statistical analysis
│   ├── pca.py                        # PCA analysis
│   ├── outlier_detection.py          # Outlier detection
│   ├── visualization.py              # Plotting and visualization
│   ├── outlier_visualization.py      # Outlier-specific plots
│   ├── interactive_visualization.py  # Interactive Plotly visualizations
│   ├── cross_experiment_analysis.py  # Cross-experiment comparisons
│   ├── depth_profile_plots.py        # Depth profile visualizations
│   ├── pipeline_runner.py            # Pipeline orchestration (run-all)
│   ├── umap.py                       # UMAP dimensionality reduction
│   ├── data_utils.py                 # Utility functions
│   └── pipeline/                     # QC/Viz pipeline steps
├── configs/                     # Pipeline configurations
│   ├── active/                  # Active run manifests
│   └── examples/                # Example configs for different use cases
├── tests/                       # Test suite (1900+ tests)
├── docs/                        # Documentation
└── pyproject.toml              # Project configuration

License

GNU General Public License v3.0 - see LICENSE file.

Citation

@software{sleap_roots_analyze,
  title = {SLEAP Roots Analyze},
  author = {Elizabeth Berrigan},
  year = {2026},
  url = {https://github.com/talmolab/sleap-roots-analyze}
}
