Analysis of brain structural and functional connectomes from the Human Connectome Project
Connectopy
A Python package for analyzing brain structural and functional connectomes from the Human Connectome Project (HCP).
Features
- Data Loading: Load and merge HCP structural/functional connectome data with traits
- Dimensionality Reduction: PCA and VAE for connectome feature extraction
- Statistical Analysis: Sexual dimorphism analysis with effect sizes and FDR correction
- Mediation Analysis: Test brain network mediation of cognitive-alcohol relationships by sex
- Machine Learning: Multiple classifier options with unified interface
  - Random Forest, XGBoost, EBM (Explainable Boosting), SVM, Logistic Regression
  - Cross-validation with hyperparameter tuning (GridSearchCV)
  - Class imbalance handling (sample weights, SMOTE, undersampling)
  - Feature selection (SelectKBest)
  - Optimal threshold finding (F1-based)
  - Comprehensive metrics (AUC, balanced accuracy, precision, recall, F1)
- Alcohol Classification: Sex-stratified prediction of alcohol use disorder from brain + cognitive features
- Visualization: Publication-ready plots for connectome analysis (ROC curves, feature importance)
- Reproducibility: Docker container and automated pipelines
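The F1-based threshold search listed under Machine Learning can be illustrated with a small standalone sketch. This is not the package's implementation; the function names `f1_at_threshold` and `best_f1_threshold` are hypothetical, and the approach (scanning every observed probability as a candidate cut-off) is one simple way to do it.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score when predicting positive for probabilities >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, y_prob):
    """Try each observed probability as a cut-off; return (threshold, f1)."""
    candidates = sorted(set(y_prob))
    scored = [(f1_at_threshold(y_true, y_prob, t), t) for t in candidates]
    # On ties in F1, prefer the lower threshold.
    best_f1, best_t = max(scored, key=lambda s: (s[0], -s[1]))
    return best_t, best_f1


# Toy labels and predicted probabilities (made up for illustration).
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.10, 0.35, 0.40, 0.45, 0.80, 0.90]
threshold, f1 = best_f1_threshold(y_true, y_prob)
print(f"threshold={threshold:.2f}, F1={f1:.3f}")
```

In practice the threshold should be chosen on training or validation folds and then applied unchanged to the test set, otherwise the reported F1 is optimistic.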
One-Click Demo (No Setup Required)
Try the analysis instantly in Google Colab - no installation needed!
Click the Colab badge in the repository README, then choose Runtime → Run all to execute the entire analysis.
Quick Start with Docker
The easiest way to run the analysis pipeline locally:
# Pull the latest image
docker pull ghcr.io/sean0418/connectopy:latest
# Run the pipeline (mount your data and output directories)
docker run -v /path/to/your/data:/app/data \
-v /path/to/output:/app/output \
ghcr.io/sean0418/connectopy:latest
# Run with options
docker run -v /path/to/data:/app/data \
-v /path/to/output:/app/output \
ghcr.io/sean0418/connectopy:latest --quick
# See all options
docker run ghcr.io/sean0418/connectopy:latest --help
Installation (Development)
# Clone the repository
git clone https://github.com/Sean0418/connectopy.git
cd connectopy
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in development mode with all dependencies
pip install -e ".[dev,docs]"
# Install pre-commit hooks
pre-commit install
Running the Pipeline
# Run the full analysis pipeline
python Runners/run_pipeline.py
# Quick mode (skip PCA, VAE, and plots)
python Runners/run_pipeline.py --quick
# Skip specific steps
python Runners/run_pipeline.py --skip-vae --skip-plots
Pipeline Steps
| Step | Analysis | Output |
|---|---|---|
| 1 | Data Loading | Merged dataset |
| 2 | PCA Analysis | pca_variance.csv, pca_scores.csv |
| 3 | VAE Analysis | vae_latent.csv, vae_training_history.csv |
| 4 | Dimorphism Analysis | dimorphism_results.csv |
| 5 | ML Classification | ml_results.csv, ebm_results.csv |
| 6 | Mediation Analysis | mediation_results.csv |
| 7 | Visualization | output/plots/*.png |
Additional standalone analyses:
| Analysis | Runner | Output |
|---|---|---|
| Alcohol Classification | run_alcohol_analysis.py | output/alcohol_analysis/ |
| Mediation (Extended) | run_mediation_hcp.py | output/mediation_*.csv |
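To make the relationship between the command-line flags and the seven pipeline steps concrete, here is a minimal sketch of how such flags could map to a list of steps. It only mirrors the flags documented above (`--quick`, `--skip-vae`, `--skip-plots`); the step identifiers and the functions `build_parser` and `select_steps` are hypothetical and are not taken from `Runners/run_pipeline.py`.

```python
import argparse

# Step names follow the pipeline table; the identifiers are illustrative.
PIPELINE_STEPS = [
    "data_loading",
    "pca",
    "vae",
    "dimorphism",
    "ml_classification",
    "mediation",
    "visualization",
]


def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative pipeline runner")
    parser.add_argument("--quick", action="store_true", help="skip PCA, VAE, and plots")
    parser.add_argument("--skip-vae", action="store_true", help="skip the VAE step")
    parser.add_argument("--skip-plots", action="store_true", help="skip visualization")
    return parser


def select_steps(args):
    """Return the steps that remain after applying the skip flags."""
    skipped = set()
    if args.quick:
        skipped |= {"pca", "vae", "visualization"}
    if args.skip_vae:
        skipped.add("vae")
    if args.skip_plots:
        skipped.add("visualization")
    return [step for step in PIPELINE_STEPS if step not in skipped]


args = build_parser().parse_args(["--skip-vae", "--skip-plots"])
print(select_steps(args))
```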
Mediation Analysis on HCP Data
For a comprehensive sex-stratified mediation analysis:
# Run mediation analysis on HCP data
python Runners/run_mediation_hcp.py
# Outputs:
# - output/mediation_results_full.csv
# - output/mediation_sex_comparison.csv
# - output/mediation_results_significant.csv
# - output/mediation_*.png (visualizations)
# - output/MEDIATION_ANALYSIS_RESULTS.pdf (report)
Alcohol Use Disorder Classification
Train Random Forest and Explainable Boosting Machine (EBM) classifiers to predict alcohol use disorder from brain connectome and cognitive features, stratified by sex:
# Run full analysis with all variants and models
python Runners/run_alcohol_analysis.py
# Run with specific variants only
python Runners/run_alcohol_analysis.py --variants tnpca
# Run Random Forest only
python Runners/run_alcohol_analysis.py --model-types rf
# Outputs:
# - output/alcohol_analysis/alcohol_classification_summary.csv
# - output/alcohol_analysis/models/ (trained model files)
# - output/alcohol_analysis/plots/roc/ (ROC curves)
# - output/alcohol_analysis/plots/importance/ (feature importance)
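Sex-stratified evaluation amounts to computing each metric within each sex group separately. The sketch below shows this for AUC using the rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative). It is an illustration only: `roc_auc` and `auc_by_group` are hypothetical helpers, the toy data are made up, and the runner itself may compute its metrics differently.

```python
def roc_auc(y_true, y_score):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        (1.0 if p > n else 0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_group(groups, y_true, y_score):
    """Compute AUC separately for each distinct group label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, value in enumerate(groups) if value == g]
        result[g] = roc_auc([y_true[i] for i in idx], [y_score[i] for i in idx])
    return result


# Toy example: four male and four female subjects.
sex = ["M", "M", "M", "M", "F", "F", "F", "F"]
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.6, 0.7, 0.8, 0.3, 0.4, 0.45]
print(auc_by_group(sex, y_true, y_score))
```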
Python API
from connectopy import (
ConnectomeDataLoader,
DimorphismAnalysis,
ConnectomeRandomForest,
ConnectomeEBM,
)
from connectopy.models import get_cognitive_features, get_connectome_features
# Load data
loader = ConnectomeDataLoader("data/")
data = loader.load_merged_dataset()
# Analyze sexual dimorphism
analysis = DimorphismAnalysis(data)
features = [f"Struct_PC{i}" for i in range(1, 61)]
results = analysis.analyze(feature_columns=features)
# Get significant features
print(analysis.get_top_features(10))
# Train a classifier with CV and class imbalance handling
X = data[features].values
y = (data["Gender"] == "M").astype(int).values
clf = ConnectomeRandomForest()
metrics = clf.fit_with_cv(X, y, feature_names=features, handle_imbalance=True)
print(f"Test AUC: {metrics['test_auc']:.3f}")
print(f"Top biomarkers:\n{clf.get_top_features(5)}")
# Get feature sets for analysis
cog_features = get_cognitive_features(data) # HCP cognitive measures
conn_features = get_connectome_features(data, "tnpca") # TNPCA connectome features
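For readers unfamiliar with the statistics behind the dimorphism analysis, the following sketch shows one common way to compute an effect size (Cohen's d with a pooled standard deviation) and to apply Benjamini-Hochberg FDR correction. It assumes these are the variants in use, which the README does not state; `cohens_d` and `benjamini_hochberg` are hypothetical helpers, not part of the `connectopy` API.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: one feature measured in two groups, plus four raw p-values.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
q_values = benjamini_hochberg([0.001, 0.04, 0.03, 0.20])
print(f"d={d:.2f}", [round(q, 4) for q in q_values])
```

A feature would then be called significant when its q-value falls below the chosen FDR level, for example 0.05.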
Mediation Analysis
Test whether brain networks mediate the relationship between cognitive traits and alcohol outcomes:
from connectopy.analysis import SexStratifiedMediation
# Run sex-stratified mediation analysis
# (df is a merged DataFrame, such as the one returned by loader.load_merged_dataset())
# Model: Cognitive → Brain Network → Alcohol Dependence
mediation = SexStratifiedMediation(n_bootstrap=1000)
result = mediation.fit(
data=df,
cognitive_col="FluidIntelligence",
brain_col="SC_PC1",
alcohol_col="AlcoholSeverity",
sex_col="Gender",
)
print(f"Male indirect effect: {result.male.indirect_effect:.4f}")
print(f"Female indirect effect: {result.female.indirect_effect:.4f}")
print(f"Sex difference significant: {result.diff_significant}")
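The indirect effect reported above is conventionally the product a·b, where a is the slope of the mediator on the predictor and b is the mediator's coefficient when the outcome is regressed on both predictor and mediator. The sketch below computes that product and a percentile bootstrap interval on synthetic data. It is a generic illustration under those assumptions, not the internals of `SexStratifiedMediation`; all function names and the simulated data are hypothetical.

```python
import random


def _cov(u, v):
    """Sample covariance (n - 1 denominator)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """a*b: a is the slope of M ~ X, b the M coefficient in Y ~ X + M."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx
    # Closed-form OLS coefficient for M in the two-predictor regression.
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_bootstrap=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_bootstrap):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        estimates.append(
            indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        )
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_bootstrap)]
    upper = estimates[int((1 - alpha / 2) * n_bootstrap) - 1]
    return lower, upper


# Synthetic data with a true indirect effect of 0.5 * 0.4 = 0.2.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
point = indirect_effect(x, m, y)
lo, hi = bootstrap_ci(x, m, y, n_bootstrap=500)
print(f"indirect={point:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

A sex-stratified version would run the same computation separately on the male and female subsets and compare the two bootstrap distributions.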
Project Structure
connectopy/
├── src/
│ └── connectopy/ # Python package (src layout)
│ ├── data/ # Data loading (HCPLoader, preprocessing)
│ ├── analysis/ # PCA, VAE, dimorphism, mediation analysis
│ ├── models/ # ML classifiers (RF, XGBoost, EBM)
│ └── visualization/ # Plotting functions
├── Runners/ # Pipeline execution scripts
├── tests/ # Unit tests
├── docs/ # Sphinx documentation
├── .github/workflows/ # CI/CD pipelines
├── data/ # Data directory
│ ├── raw/ # Raw HCP data files
│ └── processed/ # Generated datasets
├── output/ # Analysis outputs
│ └── plots/ # Generated visualizations
├── Dockerfile # Container definition
└── pyproject.toml # Package configuration
Data
The package expects HCP data in the following structure:
data/
├── raw/
│ ├── SC/ # Structural Connectome .mat files
│ ├── FC/ # Functional Connectome .mat files
│ ├── TNPCA_Result/ # Tensor Network PCA coefficients
│ └── traits/ # Subject trait CSV files
└── processed/ # Generated datasets
Data Access: Raw data must be downloaded from ConnectomeDB after agreeing to HCP data usage terms.
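Before running the pipeline, it can help to confirm that the downloaded files are laid out as shown above. The following sketch checks for the four expected subdirectories under `raw/`. The helper `missing_data_dirs` is hypothetical and not part of the package; the demo builds a temporary directory so it can run anywhere.

```python
import tempfile
from pathlib import Path

# Subdirectory names taken from the layout described in this section.
EXPECTED_RAW_SUBDIRS = ("SC", "FC", "TNPCA_Result", "traits")


def missing_data_dirs(data_root):
    """Return the expected raw-data subdirectories absent under data_root."""
    raw = Path(data_root) / "raw"
    return [name for name in EXPECTED_RAW_SUBDIRS if not (raw / name).is_dir()]


# Demo: a partially populated data directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("SC", "traits"):
        (Path(tmp) / "raw" / name).mkdir(parents=True)
    missing = missing_data_dirs(tmp)

print("Missing:", missing)
```

For a real checkout, calling `missing_data_dirs("data")` from the repository root should return an empty list once all HCP files are in place.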
Development
Running Tests
pytest
Linting
ruff check .
ruff format .
mypy src/connectopy/
Building Documentation
cd docs
make html
# View the docs:
# macOS
open _build/html/index.html
# Linux
xdg-open _build/html/index.html
# Windows
start _build/html/index.html
Building Docker Image Locally
docker build -t connectopy .
docker run -v "$(pwd)/data:/app/data" -v "$(pwd)/output:/app/output" connectopy
Reproducibility Checklist
| Feature | Status |
|---|---|
| Python package with pyproject.toml | ✅ |
| 7-step automated analysis pipeline | ✅ |
| CI (linting, type checking, tests) | ✅ |
| Docker container (multi-arch: amd64 + arm64) | ✅ |
| GitHub Container Registry hosting | ✅ |
| Pre-commit hooks | ✅ |
| Sphinx documentation | ✅ |
| Reproducibility documentation | ✅ |
CI/CD
This project uses GitHub Actions for:
- CI (on every push/PR): Linting, type checking, tests across Python 3.10-3.12
- Docker (on push to main): Builds and pushes multi-arch images to GitHub Container Registry
Legacy R Code
The original R analysis is preserved in the code/ directory. The jasa-template git tag marks the repository state before the Python refactoring:
git checkout jasa-template
Contributors
- Riley Harper
- Sean Shen
- Yinyu Yao
License
MIT License - see LICENSE file for details.
References
- Van Essen, D. C., et al. (2013). The WU-Minn Human Connectome Project: An overview. NeuroImage.
- Zhu, H., et al. (2019). Tensor Network Factorizations. NeuroImage.