
A framework for clustering longitudinal data -- Python port of the R package latrend.


latrend (Python)

Supports Python 3.9+. License: GPL v2+.

Python port of the R package latrend for longitudinal trajectory clustering.

This project aims to give Python users the core functionality and workflow of the R latrend package through a familiar Python API.

latrend provides a standardised framework for clustering longitudinal (trajectory) data. The name is short for latent-class trend analysis. This Python port reproduces the core pipeline, plotting theme, and API conventions of the upstream R package, so that an analysis written for one language can be carried over to the other with minimal changes.


Installation

# Install from PyPI (recommended for users)
pip install latrend
# Install the latest development version from GitHub (for changes not yet on PyPI)
pip install "git+https://github.com/s-rani1/latrend-py.git"
# Editable install for development (run from a clone of the repository)
pip install -e ".[dev,plot]"

Quickstart

import latrend as lt

# Built-in demo dataset (mirrors R's data(latrendData))
data = lt.latrendData()

# Or generate synthetic trajectories
data = lt.generateLongData(nIndividuals=200, nClusters=3, seed=1)

# Cluster with Linear-Mixed K-Means
method = lt.lcMethodLMKM(formula="Y ~ Time", nClusters=3, seed=1)
model  = lt.latrendCluster(method, data)

# Visualise cluster trajectories (ggplot2-style if plotnine installed)
p = lt.plotClusterTrajectories(model, ci=True)

# Save the plot
try:
    p.save("cluster_trajectories.png", dpi=150)          # plotnine
except AttributeError:
    p.figure.savefig("cluster_trajectories.png", dpi=150) # matplotlib

Features

Clustering methods

Method               | Class                          | Description
---------------------|--------------------------------|------------------------------------------------------------
Random baseline      | lcMethodRandom                 | Assigns trajectories to clusters uniformly at random
KML-style            | lcMethodKML                    | KMeans clustering on trajectory vectors (kml_fast/kml_strict)
Linear-mixed K-means | lcMethodLMKM                   | Per-individual linear regression + KMeans on coefficients
Feature-based        | lcMethodFeatures               | 20+ trajectory features + KMeans
R backend (any)      | lcMethodR / dynamic lcMethod*  | Delegates to the upstream R package via rpy2
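
To make the "per-individual linear regression + KMeans on coefficients" idea concrete, here is a minimal pure-Python sketch of that two-step approach. It is not latrend's implementation: the helper names (fit_line, kmeans, lmkm_sketch) are hypothetical, the k-means initialisation simply uses the first k individuals, and no coefficient scaling is applied.

```python
from collections import defaultdict


def fit_line(times, values):
    """Ordinary least squares for y = intercept + slope * t (needs >= 2 distinct times)."""
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    slope = sxy / sxx
    return (y_mean - slope * t_mean, slope)


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm; the first k points serve as initial centres."""
    centres = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        # Update step: each centre becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centres[c])
        if updated == centres:
            break
        centres = updated
    return labels


def lmkm_sketch(rows, k):
    """rows: (id, time, value) tuples in long format -> {id: cluster label}."""
    by_id = defaultdict(list)
    for ident, t, y in rows:
        by_id[ident].append((t, y))
    ids = sorted(by_id)
    coefs = [fit_line(*zip(*by_id[i])) for i in ids]  # (intercept, slope) per individual
    return dict(zip(ids, kmeans(coefs, k)))


# Four toy individuals: two rising trajectories, two falling ones.
true_lines = {"a": (0.0, 1.0), "b": (10.0, -1.0), "c": (0.5, 1.1), "d": (9.5, -0.9)}
rows = [(i, t, b0 + b1 * t) for i, (b0, b1) in true_lines.items() for t in range(4)]
assignments = lmkm_sketch(rows, k=2)
```

In this toy data the rising trajectories ("a", "c") and the falling ones ("b", "d") end up in separate clusters because their fitted slopes and intercepts differ clearly.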

Pipeline

# Single model
model = lt.latrendCluster(method, data)

# Batch: sweep over k = 1..6
models = lt.latrendBatchCluster(method, data, nClusters=range(1, 7))

# Repeated runs (different seeds) for stability
models = lt.latrendRepCluster(method, data, nRep=10)

# Model selection
best = models.bestModel(key="silhouette", maximize=True)
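
As an illustration of what selecting by silhouette with maximize=True means, the following sketch computes the mean silhouette width for two hand-made partitions and keeps the number of clusters with the higher score. It is independent of latrend's own metric code; the silhouette function and the toy data are assumptions made for the example. Note that the silhouette is undefined for a single cluster, so the sketch raises an error in that case; how latrend treats k = 1 in a sweep is not stated here.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width; requires at least two clusters."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i, p in enumerate(points):
        # Mean distance from point i to the members of each cluster (excluding itself).
        mean_dist = {}
        for c in clusters:
            others = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(p, q) for q in others) / len(others)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster: width is taken as 0
            widths.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Three well-separated pairs of one-dimensional points.
points = [(0.0,), (0.2,), (5.0,), (5.2,), (10.0,), (10.2,)]
candidates = {
    2: [0, 0, 1, 1, 1, 1],  # merges the two upper pairs
    3: [0, 0, 1, 1, 2, 2],  # one cluster per pair
}
scores = {k: silhouette(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)  # "maximize=True": higher is better
```

Here the three-cluster partition scores higher, because merging the two upper pairs increases within-cluster distances for four of the six points.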

KML Parity Mode

Use kml_strict to better match R KML behavior via multi-start selection:

method = lt.lcMethodKML(
    nClusters=4,
    mode="kml_strict",      # or: "kml_fast"
    nStarts=20,
    nInit=100,
    maxIter=500,
    center=True,
    scale=False,
    distance="euclidean",
    seed=265368763,
)
model = lt.latrendCluster(method, data)
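
The general idea behind multi-start selection can be sketched as follows: run k-means from several random initialisations and keep the run with the best criterion value. This is a simplified illustration, not latrend's code. It assumes the criterion is the within-cluster sum of squares (the criterion kml_strict actually uses is not stated here), works on scalars rather than trajectory vectors for brevity, and the names lloyd and multi_start are hypothetical.

```python
import random


def lloyd(points, centres, n_iter=100):
    """One k-means run on scalars from the given initial centres."""
    k = len(centres)

    def assign(cs):
        return [min(range(k), key=lambda c: (p - cs[c]) ** 2) for p in points]

    for _ in range(n_iter):
        labels = assign(centres)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    labels = assign(centres)
    # Within-cluster sum of squares for the final partition.
    inertia = sum((p - centres[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def multi_start(points, k, n_starts, seed):
    """Run k-means from n_starts random initialisations; keep the lowest inertia."""
    rng = random.Random(seed)
    runs = [lloyd(points, rng.sample(points, k)) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[1])


points = [0.0, 0.2, 5.0, 5.2, 10.0, 10.2]
best_labels, best_inertia = multi_start(points, k=3, n_starts=20, seed=1)
```

A single start can get stuck: if the initial centres are 0.0, 0.2 and 5.0, the last four points stay in one cluster. Repeating from many starts makes it very likely that at least one run finds the partition with one cluster per pair.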

Plotting (R ggplot2-matching theme)

All plots use theme_light() styling and the ggplot2 default discrete colour palette (#F8766D, #00BA38, #619CFF, ...), so that output closely matches the look of the R package.

lt.plotTrajectories(data)                          # Spaghetti plot
lt.plotTrajectories(model, facet=True)              # Faceted by cluster
lt.plotClusterTrajectories(model, ci=True)          # Mean + 95% CI ribbon
lt.plotClusterTrajectories(model, trajectories=True) # With individual overlay
lt.plotMetric(models)                               # Elbow / silhouette plot
lt.plotClassProportions(model)                      # Cluster size bar chart
lt.plotClassProbabilities(model)                    # Posterior histograms

Backends: Uses plotnine (ggplot2-like) when installed; falls back to matplotlib otherwise.
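
If you want to know in advance which backend will be available, a small standard-library check is enough. The sketch below is not part of latrend; pick_backend is a hypothetical helper that only tests whether the packages can be imported, without importing them.

```python
import importlib.util


def pick_backend(preferred=("plotnine", "matplotlib")):
    """Return the first importable package name from `preferred`, or None."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


backend = pick_backend()
print(f"Plotting backend available: {backend}")
```

This avoids relying on try/except around the returned plot object when deciding how to save a figure.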

Reproducing R plot(kmlModel4)

from pathlib import Path
import pandas as pd
import latrend as lt
from plotnine import labs

# Example paths (repo-local)
repo = Path(".")
df = pd.read_csv(repo / "tests" / "data" / "latrend_data.csv").drop(
    columns=["Unnamed: 0"], errors="ignore"
)
assign = pd.read_csv(repo / "tests" / "data" / "kml_model4_assignments.csv")

# Build LCModel from fixed assignments
clusters = assign.set_index("Id")["Cluster"]
method = lt.LCMethod(id="Id", time="Time", outcome="Y", name="KML")
model = lt.LCModel(method=method, data=df[["Id", "Time", "Y"]], clusters=clusters)

# Equivalent of R's plot(kmlModel4): faceted assigned trajectories + black mean line
p = lt.plotClusterTrajectories(
    model,
    trajectories=True,
    backend="plotnine",
    figure_size=(7, 5.8),
    base_size=11,
)
p = p + labs(
    subtitle="Cluster trajectories for KML model with 4 clusters, along with the assigned trajectories."
)

p.save("docs/images/kml_model4_python_generated.png", dpi=150)

Figure: R plot(kmlModel4) equivalent generated by the Python code above.

Data utilities

lt.latrendData()                  # Built-in 200-trajectory dataset
lt.generateLongData(...)          # Custom synthetic data
lt.tsmatrix(data)                 # Long -> wide format
lt.tsframe(wide_matrix)           # Wide -> long format
lt.trajectories(method, data)     # Per-individual trajectory dict
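
For readers unfamiliar with the long and wide layouts, the following dependency-free sketch shows the conversion in both directions. It is only an illustration of the data shapes, not the implementation of tsmatrix or tsframe; the function names to_wide and to_long are hypothetical. With pandas, DataFrame.pivot and DataFrame.melt perform the same reshaping.

```python
def to_wide(rows):
    """Long -> wide: (id, time, value) rows -> (ids, times, matrix).

    Each matrix row is one individual, each column one time point;
    missing observations are stored as None.
    """
    ids = sorted({r[0] for r in rows})
    times = sorted({r[1] for r in rows})
    lookup = {(i, t): y for i, t, y in rows}
    matrix = [[lookup.get((i, t)) for t in times] for i in ids]
    return ids, times, matrix


def to_long(ids, times, matrix):
    """Wide -> long: drop missing cells and emit (id, time, value) rows."""
    return [
        (i, t, y)
        for i, row in zip(ids, matrix)
        for t, y in zip(times, row)
        if y is not None
    ]


long_rows = [
    ("a", 0, 1.0), ("a", 1, 1.5), ("a", 2, 2.0),
    ("b", 0, 3.0), ("b", 2, 2.0),  # individual "b" has no observation at time 1
]
ids, times, matrix = to_wide(long_rows)
round_trip = to_long(ids, times, matrix)
```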

Reporting

lt.lcModelReport(model, "output/")   # Markdown report + PNG plots

Optional R backend

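If R and the R package latrend are installed, any lcMethod* constructor that has no native Python implementation is automatically delegated to R via rpy2:

# Shell: install the optional R extras
pip install -e ".[r]"

# Python: lcMethodLcmmGMM has no native implementation, so it is delegated to R
method = lt.lcMethodLcmmGMM(formula="Y ~ Time", nClusters=3)
model  = lt.latrendCluster(method, data)  # runs in R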
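
One common way to implement this kind of dynamic delegation in Python is an attribute hook that is consulted only when normal lookup fails. The sketch below shows the pattern with a class-level __getattr__; a module-level __getattr__ (PEP 562) works the same way for a package namespace. This is an assumption about the mechanism, not a description of latrend's internals: MethodNamespace, python_lmkm and r_stub are hypothetical, and the R side is replaced by a stub so that the example runs without rpy2.

```python
class MethodNamespace:
    """Serve native constructors first; delegate unknown lcMethod* names."""

    def __init__(self, native, fallback_factory):
        self._native = dict(native)
        self._fallback_factory = fallback_factory

    def __getattr__(self, name):
        # Called only when regular attribute lookup has already failed.
        if name in self._native:
            return self._native[name]
        if name.startswith("lcMethod"):
            return self._fallback_factory(name)
        raise AttributeError(name)


def python_lmkm(**kwargs):
    """Stand-in for a natively implemented method constructor."""
    return {"backend": "python", "method": "lcMethodLMKM", "args": kwargs}


def r_stub(name):
    """Stand-in for the R bridge: returns a constructor that records the call."""
    def constructor(**kwargs):
        return {"backend": "R", "method": name, "args": kwargs}
    return constructor


ns = MethodNamespace({"lcMethodLMKM": python_lmkm}, r_stub)
native = ns.lcMethodLMKM(nClusters=3)        # resolved natively
delegated = ns.lcMethodLcmmGMM(nClusters=3)  # no native version: delegated
```

Names that do not start with lcMethod still raise AttributeError, so ordinary typos are not silently forwarded to the fallback.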

Project structure

latrend_py/
  src/latrend/
    __init__.py          # Public API
    core/                # LCMethod, LCModel, pipeline, matrix converters
    data/                # Data generation + built-in latrendData
    methods/             # lcMethodRandom, lcMethodKML, lcMethodLMKM, lcMethodFeatures, lcMethodR
    metrics/             # Silhouette score
    plots/               # All plotting functions + theme
    backends/            # rpy2-based R integration
    report.py            # Markdown report generator
  tests/
  .github/workflows/    # CI (Python 3.9-3.12)

Running tests

pytest -q

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Citation

If you use latrend (Python) in academic work, please cite this repository. Citation metadata is provided in CITATION.cff (GitHub will expose this via "Cite this repository").

License

GPL-2.0-or-later (aligned with the upstream R package).

Acknowledgements

This package is a Python port of the latrend R package by Niek Den Teuling (Philips Research).
