
A framework for clustering longitudinal data -- Python port of the R package latrend.


latrend (Python)

Requires Python 3.9+. License: GPL v2+.

Python port of the R package latrend for longitudinal trajectory clustering.

This project is an explicit effort to give Python users the core functionality and workflow of the R latrend package, while keeping a familiar Python API.

latrend provides a standardised framework for clustering longitudinal (trajectory) data; the name is short for latent-class trend analysis. This Python port reproduces the core pipeline, plotting theme, and API conventions of the upstream R package, so that an analysis written in one language carries over to the other.


Installation

# Install from PyPI (recommended for users)
pip install latrend
# Install the latest version from GitHub (e.g. ahead of a PyPI release)
pip install "git+https://github.com/s-rani1/latrend-py.git"
# Editable install for development (run from a local clone of the repository)
pip install -e ".[dev,plot]"

Quickstart

import latrend as lt

# Built-in demo dataset (mirrors R's data(latrendData))
data = lt.latrendData()

# Or generate synthetic trajectories
data = lt.generateLongData(nIndividuals=200, nClusters=3, seed=1)

# Cluster with Linear-Mixed K-Means
method = lt.lcMethodLMKM(formula="Y ~ Time", nClusters=3, seed=1)
model  = lt.latrendCluster(method, data)

# Visualise cluster trajectories (ggplot2-style if plotnine installed)
p = lt.plotClusterTrajectories(model, ci=True)

# Save the plot
try:
    p.save("cluster_trajectories.png", dpi=150)          # plotnine
except AttributeError:
    p.figure.savefig("cluster_trajectories.png", dpi=150) # matplotlib

Features

Clustering methods

| Method | Class | Description |
|---|---|---|
| Random baseline | lcMethodRandom | Assigns trajectories to clusters uniformly at random |
| Linear-mixed K-means | lcMethodLMKM | Per-individual linear regression + KMeans on coefficients |
| Feature-based | lcMethodFeatures | 20+ trajectory features + KMeans |
| R backend (any) | lcMethodR / dynamic lcMethod* | Delegates to the upstream R package via rpy2 |
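
To make the "per-individual linear regression + KMeans on coefficients" idea concrete, here is a minimal, dependency-free sketch. It is not the package's implementation: the helper names (fit_line, sq_dist, kmeans), the toy data, and the deterministic initialisation are all assumptions chosen for illustration.

```python
from statistics import mean

def fit_line(times, values):
    """Ordinary least squares for y = a + b*t; returns (intercept, slope)."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return (y_bar - slope * t_bar, slope)

def sq_dist(p, q):
    """Squared Euclidean distance between two coefficient tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm, initialised from the first k distinct points."""
    centers = list(dict.fromkeys(points))[:k]
    if len(centers) < k:
        raise ValueError("need at least k distinct points")
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centre for every point
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each centre becomes the mean of its members
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(tuple(mean(d) for d in zip(*members)) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

# Toy data: three rising and three flat trajectories (hypothetical values)
wide = {
    "a": [0.0, 1.1, 1.9, 3.0], "b": [0.2, 1.2, 2.3, 3.1], "c": [-0.1, 1.0, 2.1, 3.2],
    "d": [5.0, 5.1, 4.9, 5.0], "e": [5.2, 5.1, 5.3, 5.2], "f": [4.9, 5.0, 5.0, 5.1],
}
rows = [(i, t, y) for i, ys in wide.items() for t, y in enumerate(ys)]  # long format: (Id, Time, Y)

# Step 1: one (intercept, slope) pair per individual
by_id = {}
for i, t, y in rows:
    ts, ys = by_id.setdefault(i, ([], []))
    ts.append(t)
    ys.append(y)
coefs = [fit_line(ts, ys) for ts, ys in by_id.values()]

# Step 2: cluster the coefficient pairs
labels = kmeans(coefs, k=2)
print(dict(zip(by_id, labels)))
```

A production implementation would typically standardise the coefficients and use several random restarts; the sketch skips both to stay short.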

Pipeline

# Single model
model = lt.latrendCluster(method, data)

# Batch: sweep over k = 1..6
models = lt.latrendBatchCluster(method, data, nClusters=range(1, 7))

# Repeated runs (different seeds) for stability
models = lt.latrendRepCluster(method, data, nRep=10)

# Model selection
best = models.bestModel(key="silhouette", maximize=True)
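
As a rough illustration of silhouette-based model selection, the sketch below scores a few candidate partitions and keeps the one with the highest mean silhouette width. It is a generic, dependency-free version, not the package's metric code; the silhouette function, the toy points, and the candidate labelings are assumptions. Note that the silhouette is undefined for a single cluster, so the k = 1 candidate is excluded from the comparison here.

```python
import math
from statistics import mean

def silhouette(points, labels):
    """Mean silhouette width over all points (NaN if fewer than 2 clusters)."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    clusters = set(labels)
    if len(clusters) < 2:
        return float("nan")
    widths = []
    for i, (p, li) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
                if lj == li and j != i]
        if not same:
            widths.append(0.0)  # common convention for singleton clusters
            continue
        a = mean(same)  # mean distance to own cluster
        b = min(        # mean distance to the nearest other cluster
            mean(dist(p, q) for q, lj in zip(points, labels) if lj == c)
            for c in clusters - {li}
        )
        widths.append((b - a) / max(a, b))
    return mean(widths)

# Hypothetical 1-D feature per trajectory and three candidate partitions
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
candidates = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: silhouette(points, labs) for k, labs in candidates.items()}
valid = {k: s for k, s in scores.items() if not math.isnan(s)}
best_k = max(valid, key=valid.get)
print(best_k)
```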

Plotting (R ggplot2-matching theme)

All plots use theme_light() styling and ggplot2's default discrete colour palette (#F8766D, #00BA38, #619CFF, ...), so output matches the look of the R package.

lt.plotTrajectories(data)                          # Spaghetti plot
lt.plotTrajectories(model, facet=True)              # Faceted by cluster
lt.plotClusterTrajectories(model, ci=True)          # Mean + 95% CI ribbon
lt.plotClusterTrajectories(model, trajectories=True) # With individual overlay
lt.plotMetric(models)                               # Elbow / silhouette plot
lt.plotClassProportions(model)                      # Cluster size bar chart
lt.plotClassProbabilities(model)                    # Posterior histograms
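
For readers wondering what goes into a "mean + 95% CI" ribbon, here is one common way to compute it: a normal-approximation interval (mean ± 1.96 standard errors) per cluster and time point. Whether the package uses exactly this interval is an assumption; the function name cluster_mean_ci and the sample rows are made up for the example.

```python
from collections import defaultdict
from statistics import mean, stdev

def cluster_mean_ci(rows, z=1.96):
    """rows: (cluster, time, y) tuples -> {(cluster, time): (mean, lower, upper)}."""
    groups = defaultdict(list)
    for cluster, time, y in rows:
        groups[(cluster, time)].append(y)
    out = {}
    for key, ys in sorted(groups.items()):
        m = mean(ys)
        # Standard error of the mean; no interval width with a single observation
        half = z * stdev(ys) / len(ys) ** 0.5 if len(ys) > 1 else 0.0
        out[key] = (m, m - half, m + half)
    return out

rows = [
    ("A", 0, 1.0), ("A", 0, 2.0), ("A", 0, 3.0),
    ("A", 1, 2.0), ("A", 1, 2.5), ("A", 1, 3.0),
    ("B", 0, 9.0),
]
ribbon = cluster_mean_ci(rows)
for key, (m, lo, hi) in ribbon.items():
    print(key, round(m, 3), round(lo, 3), round(hi, 3))
```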

Backends: Uses plotnine (ggplot2-like) when installed; falls back to matplotlib otherwise.
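
A fallback like this can be implemented by checking which plotting library is importable. The sketch below shows one way to do that with the standard library; pick_backend is a hypothetical helper, and how latrend actually chooses its backend is not shown in this document.

```python
from importlib.util import find_spec

def pick_backend(preferred=("plotnine", "matplotlib")):
    """Return the first importable backend name, or None if none is installed."""
    for name in preferred:
        if find_spec(name) is not None:
            return name
    return None

backend = pick_backend()
print(backend)  # depends on what is installed in the current environment
```

Checking with find_spec avoids importing a heavy library just to find out whether it exists.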

Reproducing R plot(kmlModel4)

from pathlib import Path
import pandas as pd
import latrend as lt
from plotnine import labs

# Example paths (repo-local)
repo = Path(".")
df = pd.read_csv(repo / "tests" / "data" / "latrend_data.csv").drop(
    columns=["Unnamed: 0"], errors="ignore"
)
assign = pd.read_csv(repo / "tests" / "data" / "kml_model4_assignments.csv")

# Build LCModel from fixed assignments
clusters = assign.set_index("Id")["Cluster"]
method = lt.LCMethod(id="Id", time="Time", outcome="Y", name="KML")
model = lt.LCModel(method=method, data=df[["Id", "Time", "Y"]], clusters=clusters)

# Equivalent of R's plot(kmlModel4): faceted assigned trajectories + black mean line
p = lt.plotClusterTrajectories(
    model,
    trajectories=True,
    backend="plotnine",
    figure_size=(7, 5.8),
    base_size=11,
)
p = p + labs(
    subtitle="Cluster trajectories for KML model with 4 clusters, along with the assigned trajectories."
)

p.save("kml_model4_like_R.png", dpi=150)

Data utilities

lt.latrendData()                  # Built-in 200-trajectory dataset
lt.generateLongData(...)          # Custom synthetic data
lt.tsmatrix(data)                 # Long -> wide format
lt.tsframe(wide_matrix)           # Wide -> long format
lt.trajectories(method, data)     # Per-individual trajectory dict
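
The long and wide layouts hold the same information in different shapes. The following dependency-free sketch shows the round trip on plain tuples; to_wide and to_long are illustrative names, not the package's functions, and missing observations are represented as None by assumption. With pandas, DataFrame.pivot does the long-to-wide step.

```python
def to_wide(rows):
    """rows: (id, time, y) tuples -> (ids, times, matrix), None marks a missing cell."""
    ids = sorted({r[0] for r in rows})
    times = sorted({r[1] for r in rows})
    lookup = {(i, t): y for i, t, y in rows}
    matrix = [[lookup.get((i, t)) for t in times] for i in ids]
    return ids, times, matrix

def to_long(ids, times, matrix):
    """Inverse of to_wide: one (id, time, y) tuple per non-missing cell."""
    return [
        (i, t, y)
        for i, row in zip(ids, matrix)
        for t, y in zip(times, row)
        if y is not None
    ]

long_rows = [(1, 0, 0.5), (1, 1, 0.7), (2, 0, 1.5), (2, 2, 1.9)]
ids, times, matrix = to_wide(long_rows)
print(ids, times, matrix)
round_trip = to_long(ids, times, matrix)
```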

Reporting

lt.lcModelReport(model, "output/")   # Markdown report + PNG plots

Optional R backend

If R and the R package latrend are installed, any lcMethod* constructor that has no native Python implementation is delegated to R via rpy2:

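# Shell: install the optional R extras
pip install -e ".[r]"

# Python: lcMethodLcmmGMM has no native implementation, so it is delegated to R
method = lt.lcMethodLcmmGMM(formula="Y ~ Time", nClusters=3)
model  = lt.latrendCluster(method, data)  # runs in R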
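
One way such dynamic delegation can work in Python is through __getattr__, which is called only when normal attribute lookup fails (modules support the same hook via PEP 562). The sketch below is a hypothetical stand-in, not the package's code: MethodNamespace, its arguments, and the returned dictionaries are invented for illustration, and no R call is made.

```python
class MethodNamespace:
    """Hypothetical namespace that falls back to an R-backed factory."""

    def __init__(self, native, r_available):
        self._native = native            # name -> native Python constructor
        self._r_available = r_available  # whether an R backend could be used

    def __getattr__(self, name):
        # Reached only when normal lookup fails; guard private names
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._native:
            return self._native[name]
        if name.startswith("lcMethod") and self._r_available:
            def r_method(**kwargs):
                # A real backend would hand these arguments to R (e.g. through rpy2)
                return {"backend": "R", "method": name, "args": kwargs}
            return r_method
        raise AttributeError(f"no method named {name!r}")

def native_lmkm(**kwargs):
    return {"backend": "python", "method": "lcMethodLMKM", "args": kwargs}

ns = MethodNamespace({"lcMethodLMKM": native_lmkm}, r_available=True)
native_spec = ns.lcMethodLMKM(nClusters=3)
r_spec = ns.lcMethodLcmmGMM(formula="Y ~ Time", nClusters=3)
print(native_spec["backend"], r_spec["backend"])
```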

Project structure

latrend_py/
  src/latrend/
    __init__.py          # Public API
    core/                # LCMethod, LCModel, pipeline, matrix converters
    data/                # Data generation + built-in latrendData
    methods/             # lcMethodRandom, lcMethodLMKM, lcMethodFeatures, lcMethodR
    metrics/             # Silhouette score
    plots/               # All plotting functions + theme
    backends/            # rpy2-based R integration
    report.py            # Markdown report generator
  tests/
  .github/workflows/    # CI (Python 3.9-3.12)

Running tests

pytest -q

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Citation

If you use latrend (Python) in academic work, please cite this repository. Citation metadata is provided in CITATION.cff (GitHub will expose this via "Cite this repository").

License

GPL-2.0-or-later (aligned with the upstream R package).

Acknowledgements

This package is a Python port of the latrend R package by Niek Den Teuling (Philips Research).
