ldt-toolkit

Python library for longitudinal depression trajectories workflows.

Project description

Longitudinal Depression Trajectories Toolkit

Documentation (placeholder) - LEAP Group

About The Project

The Longitudinal Depression Trajectories Toolkit (LDT-Toolkit) initiative is designed for social, medical, and clinical researchers who work with repeated-measures data and need a step-by-step path from raw cohort files to downstream modelling.

LDT-Toolkit is intended as a general toolkit for exploring longitudinal depression trajectories. It currently uses the Millennium Cohort Study (MCS) from CLS/UCL as a proof-of-concept case study.

The initiative delivers two interconnected components. First, ldt-toolkit (this repository) is the Python engine of tools and reproducible pipelines to accelerate exploration of longitudinal studies toward downstream modelling, while remaining fully usable in Python scripts or notebooks. Second, ldt is a fully interactive Go CLI with a no-code terminal interface for running and orchestrating the toolkit from start to finish.

The toolset supports two broad lines of exploration. Playground methods let researchers iterate quickly on their own datasets by running operations across the data preparation, data preprocessing, and machine learning stages. Presets let researchers run reproducible, stage-level pipelines for a given longitudinal study; community contributions are encouraged so that the presets can grow into a reusable collection for preparing, preprocessing, and modelling longitudinal datasets.

Finally, within the data preprocessing stage, we offer a new Trajectories Builder Playground: apply existing trajectory-building algorithms to your longitudinal datasets, or design and submit your own builders to help shape a community consensus around reproducible depression-trajectory construction.
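To make the idea of a trajectory builder concrete, here is a minimal, library-independent sketch of what a builder does conceptually: pivot long-format rows (one row per subject and wave) into one vector per subject, cluster those vectors, and return a trajectory label per subject. The function names (`to_wide`, `build_trajectories`), the plain Euclidean k-means, and the assumption of complete data (every subject observed at every wave) are illustrative only; this is not the ldt-toolkit builder interface.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical long-format row: (subject_id, wave, score).
Row = Tuple[str, int, float]


def to_wide(rows: List[Row]) -> Dict[str, List[float]]:
    """Pivot long rows into one score vector per subject, ordered by wave.

    Assumes every subject is observed at every wave (no missing values).
    """
    by_subject: Dict[str, Dict[int, float]] = defaultdict(dict)
    for subject_id, wave, score in rows:
        by_subject[subject_id][wave] = score
    waves = sorted({wave for _, wave, _ in rows})
    return {sid: [values[w] for w in waves] for sid, values in by_subject.items()}


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_trajectories(rows: List[Row], n_trajectories: int, n_iter: int = 100) -> Dict[str, int]:
    """Assign each subject a trajectory id with a simple k-means on wave vectors."""
    wide = to_wide(rows)
    subjects = sorted(wide)
    vectors = [wide[s] for s in subjects]

    # Deterministic farthest-first initialisation: start from the first subject,
    # then repeatedly add the vector farthest from all chosen centroids.
    centroids = [list(vectors[0])]
    while len(centroids) < n_trajectories:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every subject.
        new_labels = [
            min(range(n_trajectories), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(n_trajectories):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return dict(zip(subjects, labels))


if __name__ == "__main__":
    rows = [
        ("a", 1, 1.0), ("a", 2, 1.0),
        ("b", 1, 1.5), ("b", 2, 1.0),
        ("c", 1, 9.0), ("c", 2, 8.0),
        ("d", 1, 9.0), ("d", 2, 9.5),
    ]
    print(build_trajectories(rows, n_trajectories=2))
    # {'a': 0, 'b': 0, 'c': 1, 'd': 1}
```

Real builders typically add missing-data handling, model selection for the number of trajectories, and more robust distances; the sketch only shows the input/output shape of the task.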

Setup And Launch

[!IMPORTANT] Use both components: ldt-toolkit (Python toolkit) and ldt (Go CLI). The CLI is the primary no-code interface and is highly recommended.

Machine Requirements (R, Python, uv, Go)

Install the following first:

Python 3.10 to 3.12
uv
Go (for ldt CLI)
R + Rscript (required only when running LCGA/GMM tools through lcmm)

Quick checks:

python --version || python3 --version
uv --version
go version
Rscript --version
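If you prefer to run the same checks from Python (for example at the top of a notebook), a small standard-library sketch could look like the following. It only checks that each executable is on your `PATH` and reports the first line of its version output; it also warns when the interpreter is outside the supported Python 3.10 to 3.12 range or when running on Windows, where the full workflow is not supported by default. The names `tool_version` and `preflight` are hypothetical and not part of ldt-toolkit.

```python
import shutil
import subprocess
import sys
from typing import Dict, List, Optional

# Tool name -> command that prints its version (mirrors the shell checks above).
CHECKS: Dict[str, List[str]] = {
    "uv": ["uv", "--version"],
    "go": ["go", "version"],
    "Rscript": ["Rscript", "--version"],  # only needed for LCGA/GMM via lcmm
}


def tool_version(cmd: List[str]) -> Optional[str]:
    """Return the first line of the tool's version output, or None if not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Some tools (e.g. some Rscript builds) may print their version to stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


def preflight() -> Dict[str, Optional[str]]:
    """Collect the running Python version plus the version of each external tool."""
    report: Dict[str, Optional[str]] = {"python": sys.version.split()[0]}
    for name, cmd in CHECKS.items():
        report[name] = tool_version(cmd)
    return report


if __name__ == "__main__":
    if sys.platform.startswith("win"):
        print("Warning: the full ldt-toolkit workflow is not supported on Windows by default.")
    if not (3, 10) <= sys.version_info[:2] <= (3, 12):
        print("Warning: ldt-toolkit expects Python 3.10 to 3.12.")
    for name, found in preflight().items():
        print(f"{name:8} {found if found is not None else 'NOT FOUND'}")
```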

Example installs by OS:

# macOS (Homebrew)
brew update
brew install python@3.12 uv go r

# Linux (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install -y python3 python3-pip golang r-base
curl -LsSf https://astral.sh/uv/install.sh | sh

Install `ldt-toolkit` (PyPI Index)

uv add ldt-toolkit
# 👆 We recommend `uv` as a state-of-the-art Python package manager (run `uv init` first if you are not already inside a uv project).
# 👆 Make sure `uv` uses a supported Python version (3.10 to 3.12); for example, `uv python pin 3.10` does the trick.
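Once installed, you can confirm from Python which version of the distribution your environment resolved, using the standard `importlib.metadata` module. The helper name `installed_version` is illustrative; note that `ldt-toolkit` is the distribution name, while the import package is `ldt` (as in the examples below).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("ldt-toolkit")
    if found is None:
        print("ldt-toolkit is not installed in this environment; run `uv add ldt-toolkit`.")
    else:
        print(f"ldt-toolkit {found} is installed.")
```

Run it with `uv run python <script>.py` so it executes inside the project environment that `uv` manages.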

If you plan to run LCGA/GMM, install R-side dependencies:

Rscript --vanilla -e "repos <- 'https://cloud.r-project.org'; required_packages <- c('lcmm'); missing <- setdiff(required_packages, rownames(installed.packages())); if (length(missing)) install.packages(missing, repos = repos) else message('All required R packages are already installed.')"
# 👇 If you have cloned the repo, you can also run:
Rscript --vanilla setup_R/install_requirements.R
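To confirm from Python that `lcmm` is loadable by `Rscript` before launching LCGA/GMM tools, you could call R's `requireNamespace()` through `subprocess`. This is a sketch; the helper name `r_package_available` is hypothetical and not part of ldt-toolkit.

```python
import shutil
import subprocess


def r_package_available(package: str) -> bool:
    """Return True if Rscript is on PATH and can load `package`."""
    if shutil.which("Rscript") is None:
        return False
    # requireNamespace() returns TRUE/FALSE without attaching the package;
    # quit(status = ...) turns that result into the process exit code.
    expr = (
        f"quit(save = 'no', status = "
        f"if (requireNamespace('{package}', quietly = TRUE)) 0 else 1)"
    )
    result = subprocess.run(
        ["Rscript", "--vanilla", "-e", expr],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("lcmm available:", r_package_available("lcmm"))
```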

Install the `ldt` Go CLI (No-code Terminal Interface)

Homebrew (macOS/Linux):

brew tap Longitudinal-Depression-Toolkit/homebrew-tap
brew install ldt

Voila! Now, you can run:

ldt

[!WARNING] The full ldt-toolkit workflow is not supported by default on Windows yet. As a workaround, run the toolkit inside a Docker Linux environment, or use a macOS-based machine. Please open an issue if you want to contribute Windows support.

Getting Started W/ the Toolset

(1) Generate Synthetic Data (Multi-Technique)

from pathlib import Path

from ldt.data_preparation import EventShockRecovery, MissingDataScenarios, TrendPatterns

out = Path("/path/to/your/project/data")
out.mkdir(parents=True, exist_ok=True)

TrendPatterns(n_samples=400, n_waves=5, random_state=7).prepare().to_csv(
    out / "synthetic_trend_patterns.csv", index=False
)

EventShockRecovery().prepare(
    n_samples=400,
    n_waves=5,
    random_state=7,
    feature_cols=["depressive_score"],
    shock_wave=3,
    shock_mean=3.5,
    recovery_rate=0.9,
    noise_sd=0.8,
).to_csv(out / "synthetic_event_shock.csv", index=False)

MissingDataScenarios().prepare(
    n_samples=400,
    n_waves=5,
    random_state=7,
    feature_cols=["depressive_score", "sleep_score"],
    mechanism="mixed",
    missing_rate=0.20,
    dropout_rate=0.10,
    mar_strength=1.0,
).to_csv(out / "synthetic_with_missing.csv", index=False)
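To build intuition for what an event-shock-and-recovery trajectory looks like, here is a minimal, standard-library sketch of one plausible generative model: a stable baseline, a shock of size `shock_mean` added at `shock_wave`, a shock that then decays geometrically, and optional Gaussian noise. This is an assumption for illustration, not the `EventShockRecovery` implementation; in particular, the `persistence` parameter here (the share of the shock carried into the next wave) is hypothetical and may not match how the library defines `recovery_rate`.

```python
import random
from typing import List, Optional


def shock_trajectory(
    n_waves: int,
    shock_wave: int,
    shock_mean: float,
    persistence: float,
    baseline: float = 0.0,
    noise_sd: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Scores for waves 1..n_waves with a geometrically decaying shock.

    Hypothetical model: score_t = baseline + shock_mean * persistence**(t - shock_wave)
    for t >= shock_wave (no shock before it), plus N(0, noise_sd) noise.
    """
    rng = rng or random.Random(7)
    scores = []
    for wave in range(1, n_waves + 1):
        shock = shock_mean * persistence ** (wave - shock_wave) if wave >= shock_wave else 0.0
        noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        scores.append(baseline + shock + noise)
    return scores


if __name__ == "__main__":
    # Noise-free, with shock_wave=3 and shock_mean=3.5 as in the example above.
    print([round(s, 3) for s in shock_trajectory(5, 3, 3.5, 0.9)])
    # [0.0, 0.0, 3.5, 3.15, 2.835]
```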

(2) Build Trajectories (Two Techniques) + ShowTable

from pathlib import Path

from ldt.data_preprocessing import BuildTrajectories, ShowTable

input_long = Path("/path/to/your/project/data/synthetic_trend_patterns.csv")
out = Path("/path/to/your/project/outputs")
out.mkdir(parents=True, exist_ok=True)

BuildTrajectories().fit_preprocess(
    mode="from_scratch",
    input_path=input_long,
    output_path=out / "trajectories_dtw_kmeans.csv",
    id_col="subject_id",
    time_col="wave",
    value_cols=["depressive_score"],
    builder="dtw_kmeans",
    n_trajectories=4,
)
BuildTrajectories().fit_preprocess(
    mode="from_scratch",
    input_path=input_long,
    output_path=out / "trajectories_clusterMLD.csv",
    id_col="subject_id",
    time_col="wave",
    value_cols=["depressive_score"],
    builder="clusterMLD",
    n_trajectories=4,
)

ShowTable().fit_preprocess(
    input_path=out / "trajectories_clusterMLD.csv",
    output_html=out / "trajectories_clusterMLD_report.html",
    open_browser=False,
)
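The `dtw_kmeans` builder name suggests k-means clustering under a dynamic time warping (DTW) distance. As a reference point, here is the classic O(n*m) dynamic-programming DTW distance between two score sequences, using the absolute difference as the local cost. It is a textbook sketch, not the toolkit's implementation, which may use a different local cost, a warping window, or a dedicated library.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its value
                cost[i][j - 1],      # b advances, a repeats its value
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    early_peak = [1, 5, 2, 1, 1]
    late_peak = [1, 1, 5, 2, 1]
    print(dtw_distance(early_peak, late_peak))  # 0.0: same shape, shifted by one wave
    print(sum(abs(x - y) for x, y in zip(early_peak, late_peak)))  # 8 when compared wave by wave
```

This is why DTW-based clustering can group trajectories that share a shape (for example, a peak) even when it occurs at different waves, whereas a wave-by-wave distance would separate them.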

(3) End-to-End: Synthetic Data to Standard ML

from pathlib import Path

from ldt.data_preparation import MissingDataScenarios
from ldt.data_preprocessing import (
    AggregateLongToCrossSectional,
    BuildTrajectories,
    CleanDataset,
    CombineDatasetWithTrajectories,
    MissingImputation,
)
from ldt.machine_learning import StandardMachineLearning

root = Path("/path/to/your/project")
raw_long = root / "data/synthetic_long_with_missing.csv"
clean_long = root / "outputs/long_clean.csv"
imputed_long = root / "outputs/long_imputed.csv"
trajectories = root / "outputs/trajectories.csv"
cross_sectional = root / "outputs/cross_sectional.csv"
model_ready = root / "outputs/model_ready.csv"

long_df = MissingDataScenarios().prepare(
    n_samples=1200,
    n_waves=6,
    random_state=42,
    feature_cols=["depressive_score", "sleep_score", "anxiety_score"],
    mechanism="mixed",
    missing_rate=0.20,
    dropout_rate=0.15,
    mar_strength=1.10,
)
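raw_long.parent.mkdir(parents=True, exist_ok=True)
model_ready.parent.mkdir(parents=True, exist_ok=True)  # create outputs/ before the steps below write into it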
long_df.to_csv(raw_long, index=False)

CleanDataset().fit_preprocess(input_path=raw_long, output_path=clean_long)
MissingImputation().fit_preprocess(
    technique="mice_imputation",
    input_path=clean_long,
    output_path=imputed_long,
    random_state=42,
)
BuildTrajectories().fit_preprocess(
    mode="from_scratch",
    input_path=imputed_long,
    output_path=trajectories,
    id_col="subject_id",
    time_col="wave",
    value_cols=["depressive_score"],
    builder="clusterMLD",
    n_trajectories=4,
)
AggregateLongToCrossSectional().fit_preprocess(
    input_path=imputed_long,
    output_path=cross_sectional,
    subject_id_col="subject_id",
    numeric_columns=["depressive_score", "sleep_score", "anxiety_score", "age_baseline"],
    numeric_agg="mean",
)
CombineDatasetWithTrajectories().fit_preprocess(
    input_original_data_path=cross_sectional,
    input_trajectories_data_path=trajectories,
    output_path=model_ready,
    original_id_col="subject_id",
    trajectory_id_col="subject_id",
    merge_type="left",
    trajectory_columns=["trajectory_id", "trajectory_name"],
)

ml_result = StandardMachineLearning().fit_predict(
    technique="run_experiment",
    input_path=model_ready,
    target_column="trajectory_id",
    feature_columns="depressive_score,sleep_score,anxiety_score,age_baseline",
    estimator_key="random_forest",
    metric_keys="accuracy,f1_macro",
    cv_folds=5,
    validation_split="none",
    multiclass_mode="multiclass",
    random_seed=42,
    output_dir=str(root / "outputs/standard_ml"),
)
print(ml_result["mean_score"], ml_result["report_path"])
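The aggregation and merge steps above can be pictured with a small standard-library sketch: average each numeric column per subject (what `numeric_agg="mean"` suggests for `AggregateLongToCrossSectional`), then left-join trajectory labels onto that table by subject id (what `merge_type="left"` suggests for `CombineDatasetWithTrajectories`). It assumes rows are plain dictionaries and skips missing values (`None`) when averaging; the helper names `aggregate_mean` and `left_merge` are hypothetical, and the toolkit's handling of edge cases may differ.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_mean(rows: List[dict], id_col: str, numeric_cols: List[str]) -> List[dict]:
    """Collapse long rows to one row per subject, averaging each numeric column."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        grouped[row[id_col]].append(row)
    out = []
    for subject_id, subject_rows in grouped.items():
        record = {id_col: subject_id}
        for col in numeric_cols:
            values = [r[col] for r in subject_rows if r.get(col) is not None]
            record[col] = sum(values) / len(values) if values else None
        out.append(record)
    return out


def left_merge(left: List[dict], right: List[dict], id_col: str, columns: List[str]) -> List[dict]:
    """Keep every left row; attach `columns` from the matching right row (None if absent)."""
    lookup = {r[id_col]: r for r in right}
    merged = []
    for row in left:
        match = lookup.get(row[id_col], {})
        merged.append({**row, **{c: match.get(c) for c in columns}})
    return merged


if __name__ == "__main__":
    long_rows = [
        {"subject_id": "s1", "wave": 1, "depressive_score": 4.0},
        {"subject_id": "s1", "wave": 2, "depressive_score": None},
        {"subject_id": "s1", "wave": 3, "depressive_score": 6.0},
        {"subject_id": "s2", "wave": 1, "depressive_score": 2.0},
    ]
    trajectories = [{"subject_id": "s1", "trajectory_id": 0}]
    cross = aggregate_mean(long_rows, "subject_id", ["depressive_score"])
    print(left_merge(cross, trajectories, "subject_id", ["trajectory_id"]))
    # [{'subject_id': 's1', 'depressive_score': 5.0, 'trajectory_id': 0},
    #  {'subject_id': 's2', 'depressive_score': 2.0, 'trajectory_id': None}]
```

A left merge keeps subjects without a trajectory label (here `s2`), so it is worth checking for missing `trajectory_id` values before using that column as a classification target.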

There is more: explore the full documentation for the complete catalogue of tools and presets.

Citation

@software{Provost_Longitudinal_Depression_Trajectories_Toolkit,
    author = {Provost, Simon and Branco, Bianca and Kwong, Alex},
    title = {{Longitudinal Depression Trajectories Toolkit: Machine Learning for Longitudinal Studies}},
    version = {0.0.1}
}

Use GitHub's "Cite this repository" for citation metadata updates.

License, Data, Security

Access to Millennium Cohort Study data is governed by CLS/UCL data access rules.
Review SECURITY.md for handling expectations.
Licensed under the MIT License.

Special thanks to @charm.land for their amazing TUI framework!

Project details

Release history

0.0.4 (Mar 6, 2026)
0.0.3 (Feb 25, 2026, this version)
0.0.2 (Feb 25, 2026)
0.0.1 (Feb 25, 2026)

Download files

Source Distribution

ldt_toolkit-0.0.3.tar.gz (204.1 kB)

Uploaded Feb 25, 2026 Source

Built Distribution

ldt_toolkit-0.0.3-py3-none-any.whl (320.3 kB)

Uploaded Feb 25, 2026 Python 3

File details

Details for the file ldt_toolkit-0.0.3.tar.gz.

File metadata

Download URL: ldt_toolkit-0.0.3.tar.gz
Upload date: Feb 25, 2026
Size: 204.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.11

File hashes

Hashes for ldt_toolkit-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`0bceeaa9d9d015380c807f80c6357d92e2b3d32581f24c440933c8701b44d947`
MD5	`b8f6fcecacc184ee9ae8bdad3851ad8b`
BLAKE2b-256	`98181edd52c53f49293361105ad3753c6bcba7b27d0f27cd335ba4fd3e9a70d8`

File details

Details for the file ldt_toolkit-0.0.3-py3-none-any.whl.

File metadata

Download URL: ldt_toolkit-0.0.3-py3-none-any.whl
Upload date: Feb 25, 2026
Size: 320.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.11

File hashes

Hashes for ldt_toolkit-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c4d035fe3deb953a12bcea0befafef7408c5de81b93ceb8724a6b3b9001f1d8f`
MD5	`fc63fef4711444b3a4ba1519f919080d`
BLAKE2b-256	`be1f1cc47c302fe372858e1e8d93ce5aa59ccfcc6b291772517c5cfc4ff9c46d`
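To check a downloaded file against the SHA256 digests listed above, you can hash it locally with Python's standard `hashlib` module and compare. The helper names `sha256_of` and `verify` are illustrative; the expected digest in the usage section is the source-distribution SHA256 from the first table, and the download location is an assumption you should adjust.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    sdist = Path("ldt_toolkit-0.0.3.tar.gz")  # adjust to where you downloaded it
    expected = "0bceeaa9d9d015380c807f80c6357d92e2b3d32581f24c440933c8701b44d947"
    if sdist.exists():
        print("SHA256 OK" if verify(sdist, expected) else "SHA256 MISMATCH")
    else:
        print(f"{sdist} not found")
```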

ldt-toolkit 0.0.3