Python library for longitudinal depression-trajectory workflows.
Longitudinal Depression Trajectories Toolkit
About The Project
The Longitudinal Depression Trajectories Toolkit (LDT-Toolkit) initiative is designed for social, medical, and clinical researchers who work with repeated-measures data and need a step-by-step path from raw cohort files to downstream modelling.
LDT-Toolkit is intended as a general toolkit for exploring longitudinal depression trajectories. It currently uses the Millennium Cohort Study (MCS) from CLS/UCL as a proof-of-concept case study.
The initiative delivers two interconnected components. First, ldt-toolkit (this repository) is the Python engine of tools and reproducible pipelines to accelerate exploration of longitudinal studies toward downstream modelling, while remaining fully usable in Python scripts or notebooks. Second, ldt is a fully interactive Go CLI with a no-code terminal interface for running and orchestrating the toolkit from start to finish.
The toolset supports two broad lines of exploration. Playground methods allow researchers to quickly iterate on their own datasets by running operations across data preparation, data preprocessing, and machine learning phases. With Presets, researchers can run stage-level reproducible pipelines for a given longitudinal study, and community contributions are encouraged so this can grow into a reusable collection of presets for preparing, preprocessing, and modelling longitudinal datasets.
Finally, within the data preprocessing stage, the toolkit offers a Trajectories Builder Playground: apply existing trajectory-building algorithms to your longitudinal datasets, or design and submit your own builders to help shape a community consensus around reproducible depression-trajectory construction.
Setup And Launch
[!IMPORTANT] Use both components: ldt-toolkit (Python toolkit) and ldt (Go CLI). The CLI is the primary no-code interface and is highly recommended.
[!WARNING] Windows is not supported by default yet. As a workaround, run the toolkit inside a Docker Linux environment, or use a macOS-based machine. Please open an issue if you want to contribute Windows support.
Machine Requirements (R, Python, uv, Go)
Install the following first:
- Python 3.10 to 3.12
- uv
- Go (for the ldt CLI)
- R + Rscript (required only when running LCGA/GMM tools through lcmm)
Quick checks:
python --version || python3 --version
uv --version
go version
Rscript --version
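The same checks can be scripted. The sketch below is not part of ldt-toolkit; it uses only the Python standard library, and the helper names (python_version_ok, missing_tools) are illustrative. It assumes the supported range is Python 3.10 to 3.12 as listed above and treats Rscript as optional, since it is needed only for LCGA/GMM.

```python
import shutil
import sys

# Supported interpreter range, taken from the requirements listed above.
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 12)

REQUIRED_TOOLS = ("uv", "go")
OPTIONAL_TOOLS = ("Rscript",)  # only needed for LCGA/GMM through lcmm


def python_version_ok(version_info=sys.version_info):
    """Return True if the (major, minor) version is within the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def missing_tools(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_version_ok())
    print("Missing required tools:", missing_tools(REQUIRED_TOOLS))
    print("Missing optional tools:", missing_tools(OPTIONAL_TOOLS))
```

An empty list for the required tools means the machine is ready for the install steps that follow.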
Example installs by OS:
# macOS (Homebrew)
brew update
brew install python@3.12 uv go r
# Linux (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install -y python3 python3-pip golang r-base
curl -LsSf https://astral.sh/uv/install.sh | sh
1) Install ldt-toolkit (Python)
git clone https://github.com/OWNER/ldt-toolkit.git
cd ldt-toolkit
uv python pin 3.10
uv sync
Validate the install:
uv run python -c "import ldt; print(ldt.__file__)"
If you plan to run LCGA/GMM, install R-side dependencies:
Rscript --vanilla setup_R/install_requirements.R
2) Install ldt (Go CLI, essential no-code interface)
git clone https://github.com/OWNER/ldt.git
cd ldt
make build
Install it for your shell:
# bash/zsh users
make install-bash
# fish users
make install-fish
If your CLI repo uses short aliases, make bash / make fish are equivalent.
Reload your shell config, then run:
ldt
[!TIP] The ldt CLI is the fastest way to explore the toolkit with no code. Use Python directly when you want low-level control.
Getting Started with the Toolset
(1) Generate Synthetic Data (Multi-Technique)
from pathlib import Path

from ldt.data_preparation import EventShockRecovery, MissingDataScenarios, TrendPatterns

# Directory that will hold the generated CSV files.
out = Path("/path/to/your/project/data")
out.mkdir(parents=True, exist_ok=True)

# Technique 1: trend patterns.
TrendPatterns(n_samples=400, n_waves=5, random_state=7).prepare().to_csv(
    out / "synthetic_trend_patterns.csv", index=False
)

# Technique 2: a shock at wave 3 followed by recovery.
EventShockRecovery().prepare(
    n_samples=400,
    n_waves=5,
    random_state=7,
    feature_cols=["depressive_score"],
    shock_wave=3,
    shock_mean=3.5,
    recovery_rate=0.9,
    noise_sd=0.8,
).to_csv(out / "synthetic_event_shock.csv", index=False)

# Technique 3: two features with a mixed missingness mechanism and dropout.
MissingDataScenarios().prepare(
    n_samples=400,
    n_waves=5,
    random_state=7,
    feature_cols=["depressive_score", "sleep_score"],
    mechanism="mixed",
    missing_rate=0.20,
    dropout_rate=0.10,
    mar_strength=1.0,
).to_csv(out / "synthetic_with_missing.csv", index=False)
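Before moving on, it can help to confirm how much missingness a generated file actually contains. The sketch below is independent of ldt-toolkit and uses only the standard library. It assumes a long-format CSV with one row per subject and wave, a wave column, and empty cells (or NaN/NA) for missing values. The column names mirror those used in the examples in this README but are assumptions about the generated output, and missing_rate_by_wave is an illustrative helper.

```python
import csv
import io
from collections import defaultdict

# Cell contents treated as missing (compared case-insensitively).
MISSING_TOKENS = {"", "nan", "na"}


def missing_rate_by_wave(rows, value_col, wave_col="wave"):
    """Return {wave: fraction of rows where value_col is missing}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        wave = row[wave_col]
        total[wave] += 1
        if (row.get(value_col) or "").strip().lower() in MISSING_TOKENS:
            missing[wave] += 1
    return {wave: missing[wave] / total[wave] for wave in sorted(total)}


# Tiny inline sample standing in for a generated file.
sample = io.StringIO(
    "subject_id,wave,depressive_score\n"
    "1,1,4.0\n"
    "1,2,\n"
    "2,1,3.5\n"
    "2,2,5.0\n"
)
rates = missing_rate_by_wave(csv.DictReader(sample), "depressive_score")
print(rates)
```

To run it on a real file, open the CSV with newline="" and pass csv.DictReader(handle) in place of the inline sample.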
(2) Build Trajectories (Two Techniques) + ShowTable
from pathlib import Path

from ldt.data_preprocessing import BuildTrajectories, ShowTable

# Long-format input produced in step (1) and a directory for the outputs.
input_long = Path("/path/to/your/project/data/synthetic_trend_patterns.csv")
out = Path("/path/to/your/project/outputs")
out.mkdir(parents=True, exist_ok=True)

# Technique 1: the dtw_kmeans builder.
BuildTrajectories().fit_preprocess(
    mode="from_scratch",
    input_path=input_long,
    output_path=out / "trajectories_dtw_kmeans.csv",
    id_col="subject_id",
    time_col="wave",
    value_cols=["depressive_score"],
    builder="dtw_kmeans",
    n_trajectories=4,
)

# Technique 2: the clusterMLD builder, same input and number of trajectories.
BuildTrajectories().fit_preprocess(
    mode="from_scratch",
    input_path=input_long,
    output_path=out / "trajectories_clusterMLD.csv",
    id_col="subject_id",
    time_col="wave",
    value_cols=["depressive_score"],
    builder="clusterMLD",
    n_trajectories=4,
)

# Write an HTML report of the clusterMLD result without opening a browser.
ShowTable().fit_preprocess(
    input_path=out / "trajectories_clusterMLD.csv",
    output_html=out / "trajectories_clusterMLD_report.html",
    open_browser=False,
)
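The builder name dtw_kmeans suggests clustering on dynamic time warping (DTW) distances. For intuition only, here is a textbook DTW distance between two score sequences in pure Python. It is a sketch of the general technique, not the toolkit's implementation, and dtw_distance is an illustrative name.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Same shape, shifted by one wave, versus a constant offset of 1.
same_shape_shifted = dtw_distance([0, 0, 1, 2], [0, 1, 2, 2])
constant_offset = dtw_distance([0, 0, 0], [1, 1, 1])
print(same_shape_shifted, constant_offset)
```

A time-shifted copy of the same shape gets a distance of 0, while a constant offset does not. This is why DTW-based clustering tends to group trajectories by shape rather than by exact wave-by-wave alignment.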
(3) End-to-End: Synthetic Data to Standard ML
from pathlib import Path

from ldt.data_preparation import MissingDataScenarios
from ldt.data_preprocessing import (
    AggregateLongToCrossSectional,
    BuildTrajectories,
    CleanDataset,
    CombineDatasetWithTrajectories,
    MissingImputation,
)
from ldt.machine_learning import StandardMachineLearning

# File layout for every intermediate artefact of the pipeline.
root = Path("/path/to/your/project")
raw_long = root / "data/synthetic_long_with_missing.csv"
clean_long = root / "outputs/long_clean.csv"
imputed_long = root / "outputs/long_imputed.csv"
trajectories = root / "outputs/trajectories.csv"
cross_sectional = root / "outputs/cross_sectional.csv"
model_ready = root / "outputs/model_ready.csv"

# Step 1: generate synthetic long-format data with missing values.
long_df = MissingDataScenarios().prepare(
    n_samples=1200,
    n_waves=6,
    random_state=42,
    feature_cols=["depressive_score", "sleep_score", "anxiety_score"],
    mechanism="mixed",
    missing_rate=0.20,
    dropout_rate=0.15,
    mar_strength=1.10,
)
raw_long.parent.mkdir(parents=True, exist_ok=True)
long_df.to_csv(raw_long, index=False)

# Step 2: clean the raw file, then impute missing values.
CleanDataset().fit_preprocess(input_path=raw_long, output_path=clean_long)
MissingImputation().fit_preprocess(
    technique="mice_imputation",
    input_path=clean_long,
    output_path=imputed_long,
    random_state=42,
)

# Step 3: build trajectories from the imputed depressive scores.
BuildTrajectories().fit_preprocess(
    mode="from_scratch",
    input_path=imputed_long,
    output_path=trajectories,
    id_col="subject_id",
    time_col="wave",
    value_cols=["depressive_score"],
    builder="clusterMLD",
    n_trajectories=4,
)

# Step 4: aggregate the long data to one row per subject.
AggregateLongToCrossSectional().fit_preprocess(
    input_path=imputed_long,
    output_path=cross_sectional,
    subject_id_col="subject_id",
    numeric_columns=["depressive_score", "sleep_score", "anxiety_score", "age_baseline"],
    numeric_agg="mean",
)

# Step 5: attach the trajectory labels to the cross-sectional data.
CombineDatasetWithTrajectories().fit_preprocess(
    input_original_data_path=cross_sectional,
    input_trajectories_data_path=trajectories,
    output_path=model_ready,
    original_id_col="subject_id",
    trajectory_id_col="subject_id",
    merge_type="left",
    trajectory_columns=["trajectory_id", "trajectory_name"],
)

# Step 6: run a cross-validated random forest on the model-ready table.
ml_result = StandardMachineLearning().fit_predict(
    technique="run_experiment",
    input_path=model_ready,
    target_column="trajectory_id",
    feature_columns="depressive_score,sleep_score,anxiety_score,age_baseline",
    estimator_key="random_forest",
    metric_keys="accuracy,f1_macro",
    cv_folds=5,
    validation_split="none",
    multiclass_mode="multiclass",
    random_seed=42,
    output_dir=str(root / "outputs/standard_ml"),
)
print(ml_result["mean_score"], ml_result["report_path"])
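Conceptually, the aggregation and combination steps above collapse repeated measures to one row per subject and then attach each subject's trajectory label. The plain-Python sketch below illustrates that idea on toy rows. It is an assumption about what numeric_agg="mean" and merge_type="left" amount to, not the toolkit's internals, and the data are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Toy long-format data: one row per subject and wave.
long_rows = [
    {"subject_id": 1, "wave": 1, "depressive_score": 2.0},
    {"subject_id": 1, "wave": 2, "depressive_score": 4.0},
    {"subject_id": 2, "wave": 1, "depressive_score": 5.0},
    {"subject_id": 2, "wave": 2, "depressive_score": 7.0},
    {"subject_id": 3, "wave": 1, "depressive_score": 1.0},
    {"subject_id": 3, "wave": 2, "depressive_score": 1.0},
]
trajectory_by_subject = {1: 0, 2: 1}  # subject 3 has no trajectory label

# Aggregate: one row per subject, mean score across waves.
scores = defaultdict(list)
for row in long_rows:
    scores[row["subject_id"]].append(row["depressive_score"])
cross_sectional = [
    {"subject_id": sid, "depressive_score": mean(vals)}
    for sid, vals in sorted(scores.items())
]

# Left merge: keep every subject; a missing label becomes None.
model_ready = [
    {**row, "trajectory_id": trajectory_by_subject.get(row["subject_id"])}
    for row in cross_sectional
]
for row in model_ready:
    print(row)
```

With pandas, the same idea is a groupby mean followed by a merge with how="left". Subjects without a trajectory label end up with a missing target after a left merge, so they need to be dropped or otherwise handled before model fitting.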
See the full documentation for the complete catalogue of tools and presets.
Citation
@software{Provost_Longitudinal_Depression_Trajectories_Toolkit,
author = {Provost, Simon and Branco, Bianca and Kwong, Alex},
title = {{Longitudinal Depression Trajectories Toolkit: Machine Learning for Longitudinal Studies}},
version = {0.0.1}
}
Use GitHub's "Cite this repository" for citation metadata updates.
License, Data, Security
- Access to Millennium Cohort Study data is governed by CLS/UCL data access rules.
- Review SECURITY.md for handling expectations.
- Licensed under the MIT License.