A package for analyzing care trajectories through clustering

Trajectory Clustering Analysis (TCA)

🚀 Description

TrajectoryClusteringAnalysis (TCA) is a Python package for analyzing and visualizing individual trajectories over time using sequence clustering techniques. Although it was initially developed to model healthcare trajectories (e.g., treatment sequences for cancer patients), TCA can be applied to a wide range of life course data, such as employment histories, education paths, or any other longitudinal sequence of individual states.

🔍 Main Features

  • Unidimensional Analysis:
    • Modeling Care Trajectories: Representation of patients through chronological sequences of treatments.
  • Multidimensional Analysis:
    • Tensor Decomposition using the SWoTTeD model to identify and analyze complex, multi-event trajectories.
  • Flexible Distance Metrics: Includes Hamming, Levenshtein, Dynamic Time Warping (DTW), Optimal Matching (OM), and Global Alignment Kernel (GAK).
  • Clustering Algorithms:
    • Hierarchical agglomerative clustering (CAH, from the French "Classification Ascendante Hiérarchique").
    • K-Medoids clustering (for robustness against noise): Clustering based on a precomputed distance matrix.
    • K-Means Clustering: Two methods available:
      • Clustering based on the frequency of states.
      • Clustering directly on the wide-format encoded sequences.
  • Visualization Tools: Heatmaps, dendrograms, cluster plots, etc.
  • Notebook Examples: Provided for quick experimentation.
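
To make the difference between two of the listed metrics concrete, here is a minimal pure-Python sketch of Hamming and Levenshtein distances on state sequences. It does not use TCA's own implementation; the function names and the padding symbol `"-"` are assumptions made for illustration (strict Hamming distance is only defined for sequences of equal length, so the shorter sequence is padded here).

```python
def hamming(a, b, pad="-"):
    """Count positions where two sequences differ, padding the shorter one."""
    n = max(len(a), len(b))
    a = list(a) + [pad] * (n - len(a))
    b = list(b) + [pad] * (n - len(b))
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete x
                curr[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


s1 = ["Surgery", "Chemotherapy", "Radiotherapy"]
s2 = ["Chemotherapy", "Radiotherapy"]

print(hamming(s1, s2))      # 3: every aligned position differs
print(levenshtein(s1, s2))  # 1: deleting "Surgery" is enough
```

The two sequences differ only by a leading treatment, which a position-wise metric such as Hamming penalizes at every time step, while an edit-based metric such as Levenshtein treats it as a single deletion. This is the kind of trade-off to consider when choosing a metric.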

📦 Installation

  1. Clone the repository:

    git clone https://github.com/QuanTIMLab/TrajectoryClusteringAnalysis.git
    cd TrajectoryClusteringAnalysis
    
  2. Create a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Install the package:

    pip install .
    

⚙️ Basic Usage

from trajectoryclusteringanalysis.tca import TCA

# Example data
trajectories = [
    ["Surgery", "Chemotherapy", "Radiotherapy"],
    ["Chemotherapy", "Radiotherapy"],
    ["Surgery", "Radiotherapy"]
]

# Preprocessing data: convert the raw trajectories into a wide-format
# DataFrame (df_wide_format) before initializing the model

# Initialization and clustering
# Example for DataFrame input (ensure df_wide_format is defined, e.g., from pivoted data)
model = TCA(data=df_wide_format,
            index_col='id',
            time_col=None,  # Not used in unidimensional analysis
            event_col=None,  # Not used in unidimensional analysis
            alphabet=["Surgery", "Chemotherapy", "Radiotherapy"],
            states=["Surgery State", "Chemotherapy State", "Radiotherapy State"],
            mode='unidimensional')

# Compute distance matrix (e.g., Hamming or Optimal Matching)
distance_matrix = model.compute_distance_matrix(metric='hamming')
# OR with optimal matching and custom costs:
# custom_costs = {'Surgery:Chemotherapy': 1, 'Surgery:Radiotherapy': 2, 'Chemotherapy:Radiotherapy': 3}
# sub_matrix = model.compute_substitution_cost_matrix(method='custom', custom_costs=custom_costs)
# distance_matrix = model.compute_distance_matrix(metric='optimal_matching', substitution_cost_matrix=sub_matrix, indel_cost=1.5)

# Hierarchical Clustering (CAH)
linkage_matrix = model.hierarchical_clustering(distance_matrix)
model.plot_dendrogram(linkage_matrix)

# Visualization
model.plot_clustermap(model.data, linkage_matrix, title="Clustermap of individuals")

# Assign clusters
clusters = model.assign_clusters(linkage_matrix, num_clusters=4)
model.plot_cluster_heatmaps(model.data, clusters, title='Heatmaps of Treatment Sequences by Cluster')

🔬 Applications

TCA is suitable for analyzing sequential data in various domains, such as:

  • Healthcare: Patient treatment pathways, diagnosis sequences

  • Social Sciences: Employment trajectories, education paths

  • Marketing: Customer journey modeling

  • Sociology/Demography: Life course studies

📁 Repository Structure

TrajectoryClusteringAnalysis/
├── data/                   # Example and demo datasets
├── Notebooks/               # Jupyter notebooks (examples)
├── src/
│   └── trajectoryclusteringanalysis/
│       ├── tca.py
│       ├── plotting.py
│       ├── utils.py
│       ├── logger.py
│       ├── images/                  # Visuals for documentation
│       ├── optimal_matching.pyx
│       ├── unidimensional/
│       └── multidimensional/
├── tests/                  # Unit tests
├── requirements.txt
├── setup.py
├── pyproject.toml
├── MANIFEST.in
├── LICENSE
└── README.md

🧪 Examples

Example notebooks are available in the Notebooks folder to illustrate different trajectory analyses.

🧪 Running Tests

To run the tests, use the following command:

python -m unittest discover -s tests
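
By default, `unittest` discovery collects files whose names match `test*.py` in the start directory. As a rough sketch of what such a file could look like, the example below tests a small helper that counts state frequencies (the kind of representation used for frequency-based K-Means). Both the file name `tests/test_state_frequencies.py` and the `state_frequencies` helper are hypothetical and are not part of the TCA API.

```python
# Hypothetical file: tests/test_state_frequencies.py
import unittest
from collections import Counter


def state_frequencies(sequence, alphabet):
    """Return how often each state of `alphabet` occurs in `sequence`."""
    counts = Counter(sequence)
    return [counts[state] for state in alphabet]


class TestStateFrequencies(unittest.TestCase):
    ALPHABET = ["Surgery", "Chemotherapy", "Radiotherapy"]

    def test_counts_each_state(self):
        seq = ["Surgery", "Chemotherapy", "Chemotherapy"]
        self.assertEqual(state_frequencies(seq, self.ALPHABET), [1, 2, 0])

    def test_empty_sequence_gives_zeros(self):
        self.assertEqual(state_frequencies([], self.ALPHABET), [0, 0, 0])
```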

🤝 Contributing

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📧 Contact

Authors: DIENG Ndiaga & GREVET Nicolas
Email: ndiaga.dieng@univ-amu.fr, nicolas.GREVET@univ-amu.fr


© 2024 - Trajectory Clustering Analysis (TCA). All rights reserved.

Download files

Source Distribution

trajectoryclusteringanalysis-0.0.2a1.tar.gz (171.0 kB)

Built Distribution

trajectoryclusteringanalysis-0.0.2a1-cp311-cp311-win_amd64.whl (235.3 kB), CPython 3.11, Windows x86-64
