CLI and helpers for scoring Drosophila proboscis responses from envelope or raw coordinate data.

FlyPCA

FlyPCA provides a reproducible, event-aligned lag-embedded PCA workflow for Drosophila proboscis-distance time series. The package smooths and baseline-normalizes traces, performs Hankel (time-delay) embedding, learns compact principal components, derives interpretable behavioral features, and clusters trials into reaction vs. non-reaction cohorts.

Pipeline Overview

Ingest: read trial CSVs or manifests (trial_id, fly_id, distance, odor indices).
Preprocess: smooth each trial with a Savitzky–Golay filter, optionally low-pass filter it, and z-score it against the pre-odor baseline.
Lag Embed & PCA: build Hankel matrices to preserve local temporal structure, then fit PCA or IncrementalPCA.
Project: map trials into PC trajectories aligned to odor onset.
Engineer Features: capture temporal dynamics, velocity, Hilbert envelope, frequency bands, and PC-space summaries.
Cluster & Evaluate: cluster with GMM or HDBSCAN and compute silhouette, Calinski–Harabasz, AUROC, and AUPRC (leave-one-fly-out).
Visualize & Report: produce scree plots, loadings, trajectories, cluster scatter, violin plots, and Markdown reports.
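
To make the preprocessing and lag-embedding steps concrete, here is a minimal standard-library sketch of pre-odor z-scoring followed by Hankel (time-delay) embedding. The function names zscore_to_baseline and hankel_embed are illustrative assumptions, not part of the flypca API, and the sketch omits smoothing and PCA.

```python
from statistics import mean, pstdev


def zscore_to_baseline(trace, odor_on_idx):
    """Z-score a trace against its pre-odor frames (indices < odor_on_idx)."""
    if odor_on_idx < 1:
        raise ValueError("need at least one pre-odor frame for a baseline")
    baseline = trace[:odor_on_idx]
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1.0  # guard against a perfectly flat baseline
    return [(x - mu) / sigma for x in trace]


def hankel_embed(trace, n_lags):
    """Stack overlapping windows: row t holds trace[t : t + n_lags]."""
    n_rows = len(trace) - n_lags + 1
    if n_rows < 1:
        raise ValueError("trace is shorter than the lag window")
    return [trace[t:t + n_lags] for t in range(n_rows)]


# Toy trace (made-up values): four baseline frames, then a response.
trace = [1.0, 1.0, 3.0, 1.0, 5.0, 9.0, 5.0]
z = zscore_to_baseline(trace, odor_on_idx=4)
rows = hankel_embed(z, n_lags=3)
```

Each row of the embedded matrix is a short window of the trace, so consecutive rows share all but one sample. This overlap is what lets a PCA fitted on the rows capture local temporal structure rather than single-frame amplitude alone.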

Quickstart

make venv
source .venv/bin/activate
make install
make test

Generate a synthetic demo dataset and full report:

make demo

Running on Real Data

Assemble a manifest or wide CSV describing each trial.
- Stacked format: one row per timepoint with columns trial_id, fly_id, distance, odor_on_idx, optional odor_off_idx, optional time, and optional fps.
- Wide format: one row per trial where the time series samples occupy columns with a consistent prefix (e.g., dir_val_0, dir_val_1, …). Provide metadata columns for trial identity, fly identity, odor indices, and fps.
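
The relationship between the two layouts can be sketched as a conversion from one wide row to stacked rows. This is an illustration only, not the flypca loader: the helper name wide_row_to_stacked is hypothetical, the column names fly, trial_label, and fps follow the example config in this README, and the sample values are made up.

```python
def wide_row_to_stacked(row, prefix="dir_val_"):
    """Expand one wide-format trial row into stacked per-timepoint rows."""
    # Collect the sample columns and order them by numeric suffix, so that
    # dir_val_10 sorts after dir_val_2 rather than lexicographically.
    samples = sorted(
        (int(name[len(prefix):]), float(value))
        for name, value in row.items()
        if name.startswith(prefix) and name[len(prefix):].isdigit()
    )
    fps = float(row["fps"])
    trial_id = f"{row['fly']}_{row['trial_label']}"
    return [
        {
            "trial_id": trial_id,
            "fly_id": row["fly"],
            "time": idx / fps,  # timestamps derived from the frame rate
            "distance": dist,
        }
        for idx, dist in samples
    ]


wide_row = {
    "fly": "flyA", "trial_label": "3", "fps": "40",
    "dir_val_0": "1.2", "dir_val_1": "1.5",
    "dir_val_10": "2.0", "dir_val_2": "1.1",
}
stacked = wide_row_to_stacked(wide_row)
```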

Map column names in the config. Copy configs/default.yaml and update the io section to match your data. Example for a wide file whose samples are stored in dir_val_* columns:

io:
  format: wide
  read_csv:
    low_memory: false
    dtype:
      trial_label: str
  wide:
    trial_id_column: trial_label
    trial_id_template: "{fly}_{trial_label}"
    fly_id_column: fly
    fps_column: fps
    odor_on_value: 1230
    odor_off_value: 2430
    time_columns:
      prefix: dir_val_

Setting dtype for trial_label tells pandas to read that column as strings instead of inferring a type, which avoids mixed-type warnings. For stacked data, adjust io.stacked.distance_column, io.stacked.time_column, etc., instead.

Verify indices: odor_on_idx and odor_off_idx are 0-based frame indices. Both must lie in [0, n_frames), with odor_on_idx < odor_off_idx. If a time column is present, ensure it is strictly increasing; for wide data the loader generates timestamps from fps.
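
A pre-flight check along these lines can catch bad indices before the pipeline runs. The helper check_trial_indices is a hypothetical name used for illustration and is not part of flypca; it only encodes the constraints stated above.

```python
def check_trial_indices(n_frames, odor_on_idx, odor_off_idx=None, time=None):
    """Return a list of human-readable problems (empty if the trial is valid)."""
    problems = []
    if not 0 <= odor_on_idx < n_frames:
        problems.append(f"odor_on_idx {odor_on_idx} outside [0, {n_frames})")
    if odor_off_idx is not None:
        if not 0 <= odor_off_idx < n_frames:
            problems.append(f"odor_off_idx {odor_off_idx} outside [0, {n_frames})")
        if odor_on_idx >= odor_off_idx:
            problems.append("odor_on_idx must be smaller than odor_off_idx")
    # Compare each timestamp with its successor; equal values also fail.
    if time is not None and any(b <= a for a, b in zip(time, time[1:])):
        problems.append("time column is not strictly increasing")
    return problems


ok = check_trial_indices(200, 80, 120)
too_short = check_trial_indices(100, 80, 120)
bad_time = check_trial_indices(3, 0, 2, time=[0.0, 0.025, 0.025])
```
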
Run the CLI pipeline. The commands below fit the lag-embedded PCA model, project each trial, engineer features, cluster reactions, and generate a Markdown report with key plots.

flypca fit-lag-pca \
  --data data/manifest.csv \
  --config configs/default.yaml \
  --out artifacts/models/lagpca.joblib

flypca project \
  --model artifacts/models/lagpca.joblib \
  --data data/manifest.csv \
  --out artifacts/projections/

flypca features \
  --data data/manifest.csv \
  --config configs/default.yaml \
  --model artifacts/models/lagpca.joblib \
  --projections artifacts/projections/ \
  --out artifacts/features.parquet

flypca cluster \
  --features artifacts/features.parquet \
  --config configs/default.yaml \
  --projections-dir artifacts/projections/ \
  --method gmm \
  --out artifacts/cluster.csv \
  --labels-path data/labels.csv \
  --labels-column-name user_score_odor \
  --label-column user_score_odor

flypca report \
  --features artifacts/features.parquet \
  --clusters artifacts/cluster.csv \
  --model artifacts/models/lagpca.joblib \
  --projections artifacts/projections/ \
  --out-dir artifacts/

Outputs are written under artifacts/ by default: the trained PCA model (models/), projected PC trajectories (projections/), engineered features (features.parquet), clustering assignments, summary figures (figures/), and a Markdown report describing variance explained, cluster metrics, and representative trajectories.

CLI entry points (Typer-based):

flypca fit-lag-pca --data data/manifest.csv --config configs/default.yaml --out artifacts/models/lagpca.joblib
flypca project --model artifacts/models/lagpca.joblib --data data/manifest.csv --out artifacts/projections/
flypca features --data data/manifest.csv --config configs/default.yaml --model artifacts/models/lagpca.joblib --projections artifacts/projections/ --out artifacts/features.parquet
flypca cluster --features artifacts/features.parquet --config configs/default.yaml --projections-dir artifacts/projections/ --method gmm --out artifacts/cluster.csv --label-column reaction

# cluster with label CSV
flypca cluster \
  --features artifacts/features.parquet \
  --config configs/default.yaml \
  --projections-dir artifacts/projections/ \
  --labels-path data/labels.csv \
  --labels-column-name user_score_odor \
  --out artifacts/cluster.csv

flypca report --features artifacts/features.parquet --clusters artifacts/cluster.csv --model artifacts/models/lagpca.joblib --projections artifacts/projections/ --out-dir artifacts/

Clustering configuration

standardize: z-score the feature/projection matrix before fitting the mixture model (enabled by default).
min_variance: drop near-constant columns prior to clustering to prevent degeneracy.
component_range: sweep a range of Gaussian mixture sizes (inclusive) and pick the lowest-BIC model with a valid silhouette.
covariance_types: evaluate multiple covariance structures (full, diag, etc.) during the sweep.
use_projections: auto by default; if projections are supplied they are incorporated automatically, otherwise the feature table alone is clustered. Set to true or false to override the automatic choice.
combine_with_features: auto by default; when projections are used they are concatenated with engineered features unless explicitly disabled.
projection_components / projection_timepoints: cap how many PCs and aligned samples are flattened from the NPZ files.
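
The component_range and covariance_types options describe a grid sweep followed by a model pick. The selection logic can be sketched as below, under two assumptions that this README does not confirm: that a "valid" silhouette means a finite number, and that each fitted candidate is summarized as a dict with bic and silhouette entries. The function names and the scores are invented for illustration.

```python
import math


def sweep_grid(component_range, covariance_types):
    """Enumerate (n_components, covariance_type) pairs; the range is inclusive."""
    lo, hi = component_range
    return [(k, cov) for k in range(lo, hi + 1) for cov in covariance_types]


def pick_model(candidates):
    """Pick the lowest-BIC candidate among those with a finite silhouette."""
    usable = [
        c for c in candidates
        if c["silhouette"] is not None and math.isfinite(c["silhouette"])
    ]
    if not usable:
        raise ValueError("no candidate has a valid silhouette score")
    return min(usable, key=lambda c: c["bic"])


grid = sweep_grid((2, 4), ["full", "diag"])

# Made-up scores: the lowest-BIC fit has an undefined silhouette and is skipped.
candidates = [
    {"n_components": 2, "covariance_type": "full", "bic": 410.0, "silhouette": 0.42},
    {"n_components": 3, "covariance_type": "diag", "bic": 395.0, "silhouette": float("nan")},
    {"n_components": 4, "covariance_type": "full", "bic": 402.5, "silhouette": 0.31},
]
best = pick_model(candidates)
```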

Label CSVs can be merged on-the-fly using --labels-path and --labels-column-name. The helper derives trial_id values by applying the configured template (e.g. {fly}_{trial_label}) or, if absent, by combining fly and trial_label columns. The merged column is available for clustering diagnostics and supervised AUROC/AUPRC evaluation.
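
The trial_id derivation and label merge can be pictured roughly as follows. This is a sketch, not the flypca helper: derive_trial_id and merge_labels are hypothetical names, and the underscore join used when no template is configured is an assumption, since the README only says the two columns are combined.

```python
def derive_trial_id(row, template=None):
    """Build a trial_id from a label row, preferring the configured template."""
    if template:
        return template.format(**row)
    # Assumed fallback: join the fly and trial_label columns with an underscore.
    return f"{row['fly']}_{row['trial_label']}"


def merge_labels(feature_rows, label_rows, label_column, template=None):
    """Attach label_column to each feature row whose trial_id has a label."""
    labels = {
        derive_trial_id(row, template): row[label_column] for row in label_rows
    }
    return [
        {**feat, label_column: labels.get(feat["trial_id"])}
        for feat in feature_rows
    ]


label_rows = [
    {"fly": "flyA", "trial_label": "1", "user_score_odor": "1"},
    {"fly": "flyA", "trial_label": "2", "user_score_odor": "0"},
]
feature_rows = [{"trial_id": "flyA_1"}, {"trial_id": "flyB_7"}]
merged = merge_labels(
    feature_rows, label_rows, "user_score_odor", template="{fly}_{trial_label}"
)
```

Trials without a matching label keep a None entry here, which makes unmatched IDs easy to spot before running supervised evaluation.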

When use_projections is enabled the CLI expects projections/manifest.csv (written by flypca project) so trial IDs can be matched automatically.

Expected data layout for manifests:

manifest.csv:
path,trial_id,fly_id,odor_on_idx,odor_off_idx,fps
trial001.csv,tr1,flyA,80,120,40
...

trial001.csv:
frame,time,distance
0,0.00,1.23
...
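
As a rough illustration of how this layout fits together, the snippet below parses an in-memory copy of the two files with the standard csv module and derives the odor timing in seconds. It does not use the flypca loader. The second data row of the trial file is invented for the example, and the parse helper is a hypothetical name.

```python
import csv
import io

MANIFEST = """path,trial_id,fly_id,odor_on_idx,odor_off_idx,fps
trial001.csv,tr1,flyA,80,120,40
"""

# The second sample row is made up to keep the example non-trivial.
TRIAL = """frame,time,distance
0,0.00,1.23
1,0.025,1.30
"""


def parse(text):
    """Read CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


entry = parse(MANIFEST)[0]
fps = float(entry["fps"])
odor_on_s = int(entry["odor_on_idx"]) / fps
odor_duration_s = (int(entry["odor_off_idx"]) - int(entry["odor_on_idx"])) / fps

# Each trial file referenced by `path` holds the per-frame samples.
distance = [float(row["distance"]) for row in parse(TRIAL)]
```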

Testing & Quality

Type-annotated, vectorized preprocessing and feature routines.
Deterministic seeds; logging records parameter settings and array shapes.
Pytest suite covers preprocessing, PCA embedding, feature extraction, and end-to-end synthetic performance (AUROC > 0.8).
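
For readers unfamiliar with the evaluation terms, here is a small standard-library sketch of a leave-one-fly-out split and a pairwise AUROC. It illustrates the ideas only and is not the code flypca or its test suite runs; both function names are assumptions.

```python
def leave_one_fly_out(fly_ids):
    """Yield (held_out_fly, train_indices, test_indices) for each fly."""
    for held_out in sorted(set(fly_ids)):
        train = [i for i, f in enumerate(fly_ids) if f != held_out]
        test = [i for i, f in enumerate(fly_ids) if f == held_out]
        yield held_out, train, test


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    # Count pairwise wins; ties contribute half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


splits = list(leave_one_fly_out(["a", "a", "b"]))
score = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Holding out every trial from one fly at a time keeps trials from the same animal out of both the training and test sets, so the score reflects generalization across flies.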

Interpreting PCs

PC1 typically correlates with response amplitude and integrates the rising phase post-odor.
PC2 captures latency and decay kinetics when present.
Time-aligned PC trajectories and feature table outputs (parquet) enable downstream classifiers or visualization in standard tools.

Make Targets

make venv: create .venv using Python 3.11.
make install: install flypca in editable mode with requirements.
make test: run unit tests (pytest -q).
make demo: synthesize data, run the full CLI pipeline, and emit artifacts (models, projections, features, clusters, figures, report).

Refer to examples/01_synthetic_demo.ipynb for a notebook walkthrough replicating the pipeline with code and inline commentary.

This version

0.1.0

Oct 17, 2025

flybehavior-response 0.1.0
