CLI and helpers for scoring Drosophila proboscis responses from envelope or raw coordinate data.
FlyPCA
FlyPCA provides a reproducible, event-aligned lag-embedded PCA workflow for Drosophila proboscis-distance time series. The package smooths and baseline-normalizes traces, performs Hankel (time-delay) embedding, learns compact principal components, derives interpretable behavioral features, and clusters trials into reaction vs. non-reaction cohorts.
Pipeline Overview
- Ingest trial CSVs or manifests (trial_id, fly_id, distance, odor indices).
- Preprocess each trial with Savitzky–Golay smoothing, optional low-pass filtering, and pre-odor z-scoring.
- Lag Embed & PCA using Hankel matrices to preserve local temporal structure; fit PCA or IncrementalPCA.
- Project trials into PC trajectories aligned to odor onset.
- Engineer Features capturing temporal dynamics, velocity, Hilbert envelope, frequency bands, and PC-space summaries.
- Cluster & Evaluate with GMM or HDBSCAN and compute silhouette, Calinski–Harabasz, AUROC, and AUPRC (leave-one-fly-out).
- Visualize & Report scree plots, loadings, trajectories, cluster scatter, violin plots, and markdown reports.
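The preprocessing, lag-embedding, and projection steps listed above can be sketched with NumPy alone. This is a minimal illustration, not FlyPCA's implementation: the function names (baseline_zscore, hankel_embed, fit_pca), the synthetic traces, and the lag count are all assumptions. Savitzky–Golay smoothing is omitted, and PCA is done with a plain SVD rather than the PCA or IncrementalPCA estimators the package fits.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def baseline_zscore(trace, odor_on_idx):
    """Z-score a trace using the mean and std of the pre-odor frames."""
    baseline = trace[:odor_on_idx]
    std = baseline.std()
    return (trace - baseline.mean()) / (std if std > 0 else 1.0)


def hankel_embed(trace, n_lags):
    """Stack overlapping windows of length n_lags as rows (a Hankel matrix)."""
    return sliding_window_view(trace, n_lags)


def fit_pca(matrix, n_components):
    """PCA via SVD of the mean-centred matrix."""
    mean = matrix.mean(axis=0)
    _, s, vt = np.linalg.svd(matrix - mean, full_matrices=False)
    var = s ** 2
    return mean, vt[:n_components], (var / var.sum())[:n_components]


rng = np.random.default_rng(0)
n_frames, odor_on_idx, n_lags = 200, 80, 10   # hypothetical values
t = np.arange(n_frames)
# Hypothetical trials: noise before odor onset, a decaying bump after it.
trials = [
    rng.normal(0, 0.1, n_frames)
    + amp * (t >= odor_on_idx) * np.exp(-(t - odor_on_idx) / 30.0)
    for amp in (0.0, 1.0, 2.0)
]

# Embed every trial, fit one shared PCA, then project each trial separately.
embedded = [hankel_embed(baseline_zscore(tr, odor_on_idx), n_lags) for tr in trials]
mean, components, explained = fit_pca(np.vstack(embedded), n_components=3)
trajectories = [(emb - mean) @ components.T for emb in embedded]
print(trajectories[0].shape)  # (n_frames - n_lags + 1, n_components)
```

Each row of the embedded matrix is a short window of the trace, so the principal components describe local temporal shapes rather than single-frame values.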
Quickstart
make venv
source .venv/bin/activate
make install
make test
Generate a synthetic demo dataset and full report:
make demo
Running on Real Data
- Assemble a manifest or wide CSV describing each trial.
  - Stacked format: one row per timepoint with columns trial_id, fly_id, distance, odor_on_idx, optional odor_off_idx, optional time, and optional fps.
  - Wide format: one row per trial where the time series samples occupy columns with a consistent prefix (e.g., dir_val_0, dir_val_1, …). Provide metadata columns for trial identity, fly identity, odor indices, and fps.
- Map column names in the config. Copy configs/default.yaml and update the io section to match your data. Example for the wide file shown in the error transcript:
    io:
      format: wide
      read_csv:
        low_memory: false
        dtype:
          trial_label: str
      wide:
        trial_id_column: trial_label
        trial_id_template: "{fly}_{trial_label}"
        fly_id_column: fly
        fps_column: fps
        odor_on_value: 1230
        odor_off_value: 2430
        time_columns:
          prefix: dir_val_
  Setting dtype ensures pandas does not emit mixed-type warnings. For stacked data, adjust io.stacked.distance_column, io.stacked.time_column, etc., instead.
- Verify indices: odor_on_idx and odor_off_idx are frame indices (0-based). They must be within [0, n_frames) and odor_on_idx < odor_off_idx. Ensure the time column is strictly increasing if present; for wide data the loader generates timestamps using fps.
- Run the CLI pipeline. The commands below fit the lag-embedded PCA model, project each trial, engineer features, cluster reactions, and generate a Markdown report with key plots.
flypca fit-lag-pca \
--data data/manifest.csv \
--config configs/default.yaml \
--out artifacts/models/lagpca.joblib
flypca project \
--model artifacts/models/lagpca.joblib \
--data data/manifest.csv \
--out artifacts/projections/
flypca features \
--data data/manifest.csv \
--config configs/default.yaml \
--model artifacts/models/lagpca.joblib \
--projections artifacts/projections/ \
--out artifacts/features.parquet
flypca cluster \
--features artifacts/features.parquet \
--config configs/default.yaml \
--projections-dir artifacts/projections/ \
--method gmm \
--out artifacts/cluster.csv \
--labels-path data/labels.csv \
--labels-column-name user_score_odor \
--label-column user_score_odor
flypca report \
--features artifacts/features.parquet \
--clusters artifacts/cluster.csv \
--model artifacts/models/lagpca.joblib \
--projections artifacts/projections/ \
--out-dir artifacts/
Outputs are written under artifacts/ by default: the trained PCA model (models/), projected PC trajectories (projections/), engineered features (features.parquet), clustering assignments, summary figures (figures/), and a Markdown report describing variance explained, cluster metrics, and representative trajectories.
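Before running the commands above, the index constraints from the "Verify indices" step can be checked ahead of time. The sketch below is a hypothetical helper, not part of the flypca CLI: check_trial and timestamps_from_fps are illustrative names, and starting generated timestamps at 0 seconds is an assumption about how the loader behaves.

```python
import numpy as np


def check_trial(n_frames, odor_on_idx, odor_off_idx=None, time=None):
    """Return a list of problems found for one trial; empty means it passed."""
    problems = []
    if not 0 <= odor_on_idx < n_frames:
        problems.append("odor_on_idx outside [0, n_frames)")
    if odor_off_idx is not None:
        if not 0 <= odor_off_idx < n_frames:
            problems.append("odor_off_idx outside [0, n_frames)")
        if odor_on_idx >= odor_off_idx:
            problems.append("odor_on_idx must be < odor_off_idx")
    if time is not None and np.any(np.diff(np.asarray(time, dtype=float)) <= 0):
        problems.append("time column is not strictly increasing")
    return problems


def timestamps_from_fps(n_frames, fps):
    """Evenly spaced timestamps in seconds, starting at 0 (assumed convention)."""
    return np.arange(n_frames) / float(fps)


# Hypothetical trial of 3600 frames using the odor values from the config example.
print(check_trial(3600, 1230, 2430))
print(check_trial(100, 120, 80))
```

The first call passes with no problems; the second reports both an out-of-range onset and an onset that does not precede the offset.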
CLI entry points (Typer-based):
flypca fit-lag-pca --data data/manifest.csv --config configs/default.yaml --out artifacts/models/lagpca.joblib
flypca project --model artifacts/models/lagpca.joblib --data data/manifest.csv --out artifacts/projections/
flypca features --data data/manifest.csv --config configs/default.yaml --model artifacts/models/lagpca.joblib --projections artifacts/projections/ --out artifacts/features.parquet
flypca cluster --features artifacts/features.parquet --config configs/default.yaml --projections-dir artifacts/projections/ --method gmm --out artifacts/cluster.csv --label-column reaction
# cluster with label CSV
flypca cluster \
--features artifacts/features.parquet \
--config configs/default.yaml \
--projections-dir artifacts/projections/ \
--labels-path data/labels.csv \
--labels-column-name user_score_odor \
--out artifacts/cluster.csv
flypca report --features artifacts/features.parquet --clusters artifacts/cluster.csv --model artifacts/models/lagpca.joblib --projections artifacts/projections/ --out-dir artifacts/
Clustering configuration
- standardize: z-score the feature/projection matrix before fitting the mixture model (enabled by default).
- min_variance: drop near-constant columns prior to clustering to prevent degeneracy.
- component_range: sweep a range of Gaussian mixture sizes (inclusive) and pick the lowest-BIC model with a valid silhouette.
- covariance_types: evaluate multiple covariance structures (full, diag, etc.) during the sweep.
- use_projections: auto by default; if projections are supplied they are incorporated automatically, otherwise the feature table alone is clustered. Set to true or false to force behaviour.
- combine_with_features: auto by default; when projections are used they are concatenated with engineered features unless explicitly disabled.
- projection_components / projection_timepoints: cap how many PCs and aligned samples are flattened from the NPZ files.
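The standardize, min_variance, component_range, and covariance_types options describe a model sweep that can be approximated with scikit-learn. The following is a sketch under stated assumptions rather than FlyPCA's code: sweep_gmm is a hypothetical function, the default values are illustrative, and "valid silhouette" is interpreted here as a silhouette score above zero with at least two non-empty clusters.

```python
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def sweep_gmm(X, component_range=(2, 4), covariance_types=("full", "diag"),
              min_variance=1e-8, seed=0):
    """Return (bic, n_components, covariance_type, labels) for the lowest-BIC fit."""
    X = np.asarray(X, dtype=float)
    X = X[:, X.var(axis=0) > min_variance]   # drop near-constant columns
    X = StandardScaler().fit_transform(X)    # z-score the remaining columns
    best = None
    lo, hi = component_range
    for k in range(lo, hi + 1):              # the range is inclusive
        for cov in covariance_types:
            gmm = GaussianMixture(n_components=k, covariance_type=cov,
                                  random_state=seed)
            labels = gmm.fit_predict(X)
            if len(np.unique(labels)) < 2:
                continue                     # silhouette is undefined
            if silhouette_score(X, labels) <= 0:
                continue                     # assumed validity threshold
            bic = gmm.bic(X)
            if best is None or bic < best[0]:
                best = (bic, k, cov, labels)
    return best


# Hypothetical feature table: two separated groups plus one constant column.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 3)), rng.normal(8, 1, (60, 3))])
X = np.hstack([X, np.ones((120, 1))])
bic, k, cov, labels = sweep_gmm(X)
print(k, cov, labels.shape)
```

Dropping the constant column before standardizing avoids a zero-variance feature, which is the kind of degeneracy min_variance is described as preventing.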
Label CSVs can be merged on the fly using --labels-path and --labels-column-name. The helper derives trial_id values by applying the configured template (e.g. {fly}_{trial_label}) or, if no template is configured, by combining the fly and trial_label columns. The merged column is available for clustering diagnostics and supervised AUROC/AUPRC evaluation.
When use_projections is enabled the CLI expects projections/manifest.csv (written by flypca project) so trial IDs can be matched automatically.
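The label merge described above amounts to deriving a trial_id for each label row and joining on it. Here is a minimal pandas sketch of that idea; attach_labels is a hypothetical name, the example tables are made up, and the real helper may handle missing templates and unmatched rows differently.

```python
import pandas as pd


def attach_labels(features, labels, label_column, template="{fly}_{trial_label}"):
    """Left-join a label column onto a feature table via a derived trial_id."""
    labels = labels.copy()
    # Fill the template from each label row's columns (values cast to str).
    labels["trial_id"] = [
        template.format(**row) for row in labels.astype(str).to_dict("records")
    ]
    return features.merge(labels[["trial_id", label_column]],
                          on="trial_id", how="left")


# Hypothetical feature table and label CSV contents.
features = pd.DataFrame({"trial_id": ["flyA_1", "flyA_2", "flyB_1"],
                         "peak": [0.2, 1.4, 0.9]})
labels = pd.DataFrame({"fly": ["flyA", "flyA", "flyB"],
                       "trial_label": [1, 2, 1],
                       "user_score_odor": [0, 1, 1]})
merged = attach_labels(features, labels, "user_score_odor")
print(merged)
```

A left join keeps every feature row, so trials without a label end up with a missing value instead of being dropped.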
Expected data layout for manifests:
manifest.csv:
path,trial_id,fly_id,odor_on_idx,odor_off_idx,fps
trial001.csv,tr1,flyA,80,120,40
...
trial001.csv:
frame,time,distance
0,0.00,1.23
...
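For orientation, the layout above can be read into a single stacked table with pandas. This is a sketch, not the package's loader: load_manifest is a hypothetical name, and resolving each trial path relative to the manifest's directory is an assumption.

```python
from pathlib import Path

import pandas as pd


def load_manifest(manifest_path):
    """Read a manifest and return one stacked DataFrame with a row per timepoint."""
    manifest_path = Path(manifest_path)
    manifest = pd.read_csv(manifest_path)
    frames = []
    for row in manifest.itertuples(index=False):
        # Assumption: trial paths are relative to the manifest's directory.
        trial = pd.read_csv(manifest_path.parent / row.path)
        # Copy the per-trial metadata onto every timepoint row.
        for col in ("trial_id", "fly_id", "odor_on_idx", "odor_off_idx", "fps"):
            trial[col] = getattr(row, col)
        frames.append(trial)
    return pd.concat(frames, ignore_index=True)
```

Calling load_manifest("data/manifest.csv") would return the columns frame, time, and distance from each trial file followed by the metadata columns from the manifest.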
Testing & Quality
- Type-annotated, vectorized preprocessing and feature routines.
- Deterministic seeds; logging records parameter settings and array shapes.
- Pytest suite covers preprocessing, PCA embedding, feature extraction, and end-to-end synthetic performance (AUROC > 0.8).
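The leave-one-fly-out AUROC and AUPRC mentioned in this README can be illustrated with scikit-learn's LeaveOneGroupOut splitter. The source does not say which scorer produces the predictions, so the logistic regression below is a stand-in, and the function name, synthetic data, and pooling of held-out predictions across flies are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut


def leave_one_fly_out_scores(X, y, fly_ids):
    """Pool held-out probabilities across flies, then score them once."""
    proba = np.zeros(len(y), dtype=float)
    for train, test in LeaveOneGroupOut().split(X, y, groups=fly_ids):
        # Stand-in classifier; trained without any trials from the held-out fly.
        clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        proba[test] = clf.predict_proba(X[test])[:, 1]
    return roc_auc_score(y, proba), average_precision_score(y, proba)


# Hypothetical data: 6 flies with 10 trials each, reactions shifted by 3 SD.
rng = np.random.default_rng(0)
y = np.tile([0, 1], 30)
fly_ids = np.repeat(np.arange(6), 10)
X = rng.normal(0, 1, (60, 4)) + 3.0 * y[:, None]
auroc, auprc = leave_one_fly_out_scores(X, y, fly_ids)
print(round(auroc, 3), round(auprc, 3))
```

Holding out whole flies keeps trials from the same animal out of both the training and test folds, which would otherwise inflate the scores.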
Interpreting PCs
- PC1 typically correlates with response amplitude and integrates the rising phase post-odor.
- PC2 captures latency and decay kinetics when present.
- Time-aligned PC trajectories and feature table outputs (parquet) enable downstream classifiers or visualization in standard tools.
Make Targets
- make venv: create .venv using Python 3.11.
- make install: install flypca in editable mode with requirements.
- make test: run unit tests (pytest -q).
- make demo: synthesize data, run the full CLI pipeline, and emit artifacts (models, projections, features, clusters, figures, report).
Refer to examples/01_synthetic_demo.ipynb for a notebook walkthrough replicating the pipeline with code and inline commentary.