
Gaussian Mixture Model-based thresholding for single-cell gene expression analysis

Project description

cc_mapping (Cell Cycle Mapping Package)

Python 3.10+

License: MIT

cc_mapping provides statistical methods for categorizing cells by gene expression level using Gaussian Mixture Models (GMMs). It was originally developed for cell cycle analysis, but it can be applied to other single-cell RNA-seq thresholding tasks.

Features

  • 🎯 Automatic thresholding using GMM-based statistical inference
  • 📊 Single & sequential thresholding for simple or complex categorization schemes
  • 🔄 Sequential refinement to progressively narrow down cell populations
  • 📈 Built-in visualization with density plots and QC metrics
  • 🧪 AnnData integration for seamless single-cell workflow compatibility
  • ⚙️ Flexible configuration with manual threshold overrides when needed

Installation

Install from PyPI using pip:

```bash
pip install cc-mapping
```

Or using Poetry:

```bash
poetry add cc-mapping
```

To work from a clone of the repository instead of the installed package, follow the steps below.

## Step 1: Install Environment

From the root directory of this repository, create the conda environment:

```
conda env create -f environments/cc_mapping.yml
```

## Step 2: Update Global Variables

A cloned copy of the repository is not installed as a package, so Python has to be told where to find it whenever you use it. Update these two files:

  • cc_mapping/GLOBAL_VARIABLES/GLOBAL_VARIABLES.py
  • notebooks/GLOBAL_VARIABLES/GLOBAL_VARIABLES.py

In each file, set the variable 'cc_mapping_package_dir' to the path of the root directory of the cc_mapping repository.

To use the cc_mapping package from another folder, copy the GLOBAL_VARIABLES folder into that folder and add the following to the imports of your Python scripts:

```python
import sys

from GLOBAL_VARIABLES.GLOBAL_VARIABLES import cc_mapping_package_dir

sys.path.append(cc_mapping_package_dir)
```

## Step 3: Use the cc_mapping package!

The notebooks directory of this repository contains a test notebook that can be used as an example.

## Quick Start

```python
import anndata as ad
from cc_mapping import GMMThresholding

# Load your AnnData object
adata = ad.read_h5ad('your_data.h5ad')

# Initialize thresholding for a gene
gmm = GMMThresholding(
    adata=adata,
    feature='PCNA',  # Gene name
    label_obs_save_str='PCNA_categories'
)

# Fit a two-component GMM (the number of components is set explicitly)
gmm.fit(n_components=2)

# Categorize cells
gmm.categorize_samples(ordered_labels=['Low', 'High'])

# Get updated AnnData with new categories
adata = gmm.return_adata()

# Visualize results
fig = gmm.plot_density()
fig.savefig('pcna_thresholding.png')
```
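The single-feature workflow fits a GMM to one gene's expression values and maps the fitted components to ordered labels. As a rough illustration of that idea (not cc_mapping's actual implementation), the sketch below uses scikit-learn's GaussianMixture directly on a 1-D array. The function name gmm_threshold and the grid search used to locate thresholds are assumptions made for this example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_threshold(values, ordered_labels=("Low", "High"), random_state=0):
    """Label 1-D expression values with a GMM (hypothetical helper, not part of cc_mapping).

    One component is fitted per label; components are ranked by their means,
    so ordered_labels[0] goes to the lowest-expressing population.
    Returns (labels, thresholds).
    """
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    n_components = len(ordered_labels)
    gmm = GaussianMixture(n_components=n_components, random_state=random_state).fit(x)

    # rank[k] is the position of component k when components are sorted by mean.
    order = np.argsort(gmm.means_.ravel())
    rank = np.empty_like(order)
    rank[order] = np.arange(n_components)

    # Assign each cell to its most probable component, then to the matching label.
    names = np.array(ordered_labels, dtype=object)
    labels = names[rank[gmm.predict(x)]]

    # Approximate thresholds: midpoints on a fine grid where the winning component changes.
    # With unequal variances there can be more than n_components - 1 such points.
    grid = np.linspace(x.min(), x.max(), 2000).reshape(-1, 1)
    grid_rank = rank[gmm.predict(grid)]
    idx = np.nonzero(np.diff(grid_rank))[0]
    thresholds = (grid[idx, 0] + grid[idx + 1, 0]) / 2
    return labels, thresholds


# Usage on synthetic data: two well-separated expression populations.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(1.0, 0.3, 500), rng.normal(5.0, 0.5, 500)])
labels, thresholds = gmm_threshold(values)
print(thresholds)
```

Ranking components by mean is what makes the label order meaningful: the GMM itself assigns component indices arbitrarily, so without this step "Low" could end up on the high-expressing cells.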

Sequential Thresholding

For more complex categorization schemes (e.g., cell cycle phases):

```python
from cc_mapping import SequentialGMM

# Initialize sequential thresholding
seq_gmm = SequentialGMM(
    adata=adata,
    features=['PCNA', 'CDK1'],
    parent_labels=['All'],
    ordered_labels_list=[
        ['PCNA-', 'PCNA+'],
        ['CDK1-', 'CDK1+']
    ]
)

# Run sequential refinement
seq_gmm.fit_all(n_components_list=[2, 2])

# Collapse labels to final categories
seq_gmm.collapse_labels(
    final_labels=['G0', 'G1', 'S', 'G2M'],
    collapse_map={
        'PCNA-_CDK1-': 'G0',
        'PCNA+_CDK1-': 'G1',
        'PCNA+_CDK1+': ['S', 'G2M']
    }
)

# Get the updated AnnData object once all labels are assigned
adata = seq_gmm.return_adata()
```
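Sequential thresholding refits the GMM inside each population produced by the previous feature and joins the labels (for example PCNA+_CDK1-). A minimal sketch of that refinement loop follows, assuming a plain dict of per-cell arrays instead of an AnnData object. The helper names split_two_populations, sequential_threshold, and collapse are hypothetical, and unlike the SequentialGMM example, collapse here maps each combined label to exactly one final label.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def split_two_populations(values, labels, random_state=0):
    """Hypothetical helper: 2-component GMM; the higher-mean component gets labels[1]."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=random_state).fit(x)
    high = int(np.argmax(gmm.means_.ravel()))
    return np.where(gmm.predict(x) == high, labels[1], labels[0])


def sequential_threshold(data, features, ordered_labels_list, min_cells=10):
    """Threshold features one after another, refitting within each parent population.

    data maps feature name -> 1-D array with one value per cell (an assumption;
    cc_mapping works on AnnData). Populations smaller than min_cells are not
    refitted and receive the label 'unassigned'.
    """
    n_cells = len(data[features[0]])
    combined = np.full(n_cells, "", dtype=object)
    for feature, labels in zip(features, ordered_labels_list):
        values = np.asarray(data[feature], dtype=float)
        new = np.empty(n_cells, dtype=object)
        for parent in np.unique(combined):
            mask = combined == parent
            if mask.sum() < min_cells:
                new[mask] = "unassigned"
            else:
                new[mask] = split_two_populations(values[mask], labels)
        # Join parent and child labels, e.g. "PCNA+" + "CDK1-" -> "PCNA+_CDK1-".
        combined = np.array(
            [f"{p}_{c}" if p else c for p, c in zip(combined, new)], dtype=object
        )
    return combined


def collapse(combined, collapse_map, default="unassigned"):
    """Map each combined label to one final category; unmapped labels get default."""
    return [collapse_map.get(label, default) for label in combined]


# Usage on synthetic data with four well-separated PCNA/CDK1 populations.
rng = np.random.default_rng(0)


def two_groups(n_low, n_high):
    return np.concatenate([rng.normal(1.0, 0.2, n_low), rng.normal(6.0, 0.2, n_high)])


pcna = two_groups(300, 300)
cdk1 = np.concatenate([two_groups(150, 150), two_groups(150, 150)])
combined = sequential_threshold(
    {"PCNA": pcna, "CDK1": cdk1},
    features=["PCNA", "CDK1"],
    ordered_labels_list=[("PCNA-", "PCNA+"), ("CDK1-", "CDK1+")],
)
phases = collapse(combined, {"PCNA-_CDK1-": "G0", "PCNA+_CDK1-": "G1"})
```

Refitting per parent population lets the second threshold adapt to each subpopulation's own CDK1 distribution rather than using one global cutoff.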

Boolean Label Operations

Combine categorical observations with boolean logic:

```python
from cc_mapping import create_boolean_label_combination

adata = create_boolean_label_combination(
    adata=adata,
    obs_key_1='treatment',
    match_values_1=['control'],
    obs_key_2='cell_cycle',
    match_values_2=['G0'],
    operator='AND',
    output_obs_key='control_G0',
    true_label='control_G0',
    false_label='other'
)
```
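The boolean combination can be pictured as two membership masks joined by a logical operator. The following pandas sketch applies that logic to an obs-style DataFrame (AnnData stores per-cell metadata in `adata.obs`). The function name boolean_label_combination and its support for AND, OR, and XOR are assumptions for illustration, not the signature or behavior of create_boolean_label_combination.

```python
import numpy as np
import pandas as pd


def boolean_label_combination(obs, key_1, values_1, key_2, values_2,
                              operator="AND", true_label="True", false_label="False"):
    """Hypothetical helper: label rows by a boolean combination of two obs columns."""
    mask_1 = obs[key_1].isin(values_1)  # True where column 1 matches any listed value
    mask_2 = obs[key_2].isin(values_2)  # True where column 2 matches any listed value
    op = operator.upper()
    if op == "AND":
        combined = mask_1 & mask_2
    elif op == "OR":
        combined = mask_1 | mask_2
    elif op == "XOR":
        combined = mask_1 ^ mask_2
    else:
        raise ValueError(f"unsupported operator: {operator!r}")
    return pd.Series(np.where(combined, true_label, false_label), index=obs.index)


obs = pd.DataFrame(
    {"treatment": ["control", "control", "drug", "drug"],
     "cell_cycle": ["G0", "G1", "G0", "G1"]},
    index=["c1", "c2", "c3", "c4"],
)
obs["control_G0"] = boolean_label_combination(
    obs, "treatment", ["control"], "cell_cycle", ["G0"],
    operator="AND", true_label="control_G0", false_label="other",
)
print(obs)
```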

Documentation

For detailed documentation, tutorials, and API reference, visit our documentation.

Examples

  • Single thresholding: See notebooks/Single_Thresholding_Workflow.ipynb
  • Sequential thresholding: See notebooks/Sequential_Thresholding_Workflow.ipynb
  • CSV to AnnData: See notebooks/CSV_to_Anndata.ipynb

Requirements

  • Python ≥ 3.10
  • AnnData ≥ 0.10.0
  • NumPy < 2.0.0
  • scikit-learn ≥ 1.3.0
  • pandas ≥ 2.0.0
  • matplotlib ≥ 3.7.0
  • scipy ≥ 1.11.0
  • pydantic ≥ 2.10.0
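To confirm that an environment meets the version bounds listed above, a small standard-library check can compare installed distribution versions against them. This is a rough sketch under simplifying assumptions: it treats versions as plain numeric X.Y.Z triples (pre-release and local suffixes are dropped) and is not a replacement for pip's own resolver. The helper names are hypothetical.

```python
import itertools
import sys
from importlib.metadata import PackageNotFoundError, version

# Version bounds from the Requirements list: name -> (operator, bound).
REQUIREMENTS = {
    "anndata": (">=", (0, 10, 0)),
    "numpy": ("<", (2, 0, 0)),
    "scikit-learn": (">=", (1, 3, 0)),
    "pandas": (">=", (2, 0, 0)),
    "matplotlib": (">=", (3, 7, 0)),
    "scipy": (">=", (1, 11, 0)),
    "pydantic": (">=", (2, 10, 0)),
}


def parse_version(text):
    """Turn '1.26.4' into (1, 26, 4); non-numeric suffixes such as 'rc1' are dropped."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_requirements(installed):
    """Return a list of problems given a {distribution name: version string} mapping."""
    problems = []
    for name, (op, bound) in REQUIREMENTS.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed")
            continue
        current = parse_version(found)
        ok = current >= bound if op == ">=" else current < bound
        if not ok:
            wanted = ".".join(str(n) for n in bound)
            problems.append(f"{name} {found} does not satisfy {op}{wanted}")
    return problems


def installed_versions():
    """Read installed versions with importlib.metadata, skipping missing packages."""
    found = {}
    for name in REQUIREMENTS:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    if sys.version_info < (3, 10):
        print("Python >= 3.10 is required")
    for problem in check_requirements(installed_versions()):
        print(problem)
```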

Citation

If you use cc_mapping in your research, please cite:

```bibtex
@software{cc_mapping,
  author = {Your Name},
  title = {cc_mapping: GMM-based thresholding for single-cell analysis},
  year = {2025},
  url = {https://github.com/StallaertLab/cc_mapping}
}
```

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Project details


Download files

Download the file for your platform.

Source Distribution

cc_mapping-0.2.3.tar.gz (64.5 kB)

Uploaded Source

Built Distribution


cc_mapping-0.2.3-py3-none-any.whl (74.6 kB)

Uploaded Python 3

File details

Details for the file cc_mapping-0.2.3.tar.gz.

File metadata

  • Download URL: cc_mapping-0.2.3.tar.gz
  • Upload date:
  • Size: 64.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.9 Windows/10

File hashes

Hashes for cc_mapping-0.2.3.tar.gz
Algorithm Hash digest
SHA256 a664f0742a368b11fcc2824e19ec31d870eddce7ec07285246891a07d3c39a02
MD5 4e92fe61a6b5e75b1be266e8f31ef864
BLAKE2b-256 6ed4855851d3f23f09d11eb51f503bc7d86efc46a8c8c903f6a1d49f666ec68a
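The SHA256 digests listed for each file can be used to confirm that a download is intact. A minimal standard-library sketch follows; the helper names sha256_of and verify_download are hypothetical, and only SHA256 is handled.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details on this page.
EXPECTED_SHA256 = {
    "cc_mapping-0.2.3.tar.gz": "a664f0742a368b11fcc2824e19ec31d870eddce7ec07285246891a07d3c39a02",
    "cc_mapping-0.2.3-py3-none-any.whl": "25eee663eb5f9be69fea9dbd3b8cd829c706d63c3e6ee385f96e290a3cb0f873",
}


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=None):
    """Return True if the file's SHA256 matches the expected digest.

    When expected is omitted, it is looked up by file name in EXPECTED_SHA256.
    """
    path = Path(path)
    if expected is None:
        expected = EXPECTED_SHA256[path.name]
    return sha256_of(path) == expected.strip().lower()
```

For example, calling verify_download("cc_mapping-0.2.3.tar.gz") in the directory containing the downloaded source distribution returns True only if its SHA256 digest matches the one listed above.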


File details

Details for the file cc_mapping-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: cc_mapping-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 74.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.9 Windows/10

File hashes

Hashes for cc_mapping-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 25eee663eb5f9be69fea9dbd3b8cd829c706d63c3e6ee385f96e290a3cb0f873
MD5 d68b689831a8a7065b2ebf1365eb5345
BLAKE2b-256 9c35dd1b833041f38f99b20384ee6efb61064f2caac25fcb58a7c4495b458b0d

