CONCORD: Contrastive Learning for Cross-domain Reconciliation and Discovery

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

CONCORD: COntrastive learNing for Cross-dOmain Reconciliation and Discovery

Qin Zhu, Gartner Lab, UCSF

Description

Batch integration, denoising, and dimensionality reduction remain fundamental challenges in single-cell data analysis. While many machine learning tools aim to overcome these challenges by engineering model architectures, we use a different strategy, building on the insight that optimized mini-batch sampling during training can profoundly influence learning outcomes. We present CONCORD (COntrastive learNing for Cross-dOmain Reconciliation and Discovery), a self-supervised learning approach that implements a unified, probabilistic data sampling scheme combining neighborhood-aware and dataset-aware sampling: the former enhances resolution, while the latter removes batch effects. Using only a minimalist one-hidden-layer neural network and contrastive learning, CONCORD achieves state-of-the-art performance without deep architectures, auxiliary losses, or supervision. It generates high-resolution cell atlases that seamlessly integrate data across batches, technologies, and species, without prior assumptions about data structure. The resulting latent representations are denoised, interpretable, and biologically meaningful: they capture gene co-expression programs, resolve subtle cellular states, and preserve both local geometric relationships and global topological organization. We demonstrate CONCORD’s broad applicability across diverse datasets, establishing it as a general-purpose framework for learning unified, high-fidelity representations of cellular identity and dynamics.
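To make the sampling idea in the description concrete, here is a minimal sketch of a probabilistic mini-batch sampler that mixes neighborhood-aware and dataset-aware draws. It is an illustration only, not CONCORD's implementation: the function name `sample_minibatch`, the parameters `p_same_domain` and `p_knn`, and the precomputed `knn` lookup are all hypothetical.

```python
import random

def sample_minibatch(domains, knn, batch_size, p_same_domain=0.95, p_knn=0.3, rng=None):
    """Draw one mini-batch of cell indices (hypothetical sketch, with replacement).

    domains: one dataset/batch label per cell
    knn:     mapping from a cell index to the indices of its nearest neighbors
    """
    rng = rng or random.Random()
    n_cells = len(domains)
    anchor = rng.randrange(n_cells)
    # Dataset-aware pool: cells from the same dataset as the anchor.
    same_domain = [i for i in range(n_cells) if domains[i] == domains[anchor]]
    # Neighborhood-aware pool: the anchor's neighbors within that dataset.
    neighbors = [i for i in knn[anchor] if domains[i] == domains[anchor]]

    batch = [anchor]
    while len(batch) < batch_size:
        if neighbors and rng.random() < p_knn:
            pool = neighbors          # local draw: similar cells must be told apart
        elif rng.random() < p_same_domain:
            pool = same_domain        # within-dataset draw: no batch effect to exploit
        else:
            pool = range(n_cells)     # occasional global draw across datasets
        batch.append(rng.choice(pool))
    return batch
```

The intuition in this sketch is that a batch drawn mostly from one dataset gives the contrastive objective no batch signal to latch onto, while a batch enriched for neighbors forces the model to separate closely related cells.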

Full documentation is available at https://qinzhu.github.io/Concord_documentation/.

Installation

It is recommended to use conda to create and set up a clean virtual environment for CONCORD.

1. Install PyTorch

You must install the correct version of PyTorch based on your system's CUDA setup. Follow the instructions on the official PyTorch website.

For CPU:

pip install torch torchvision torchaudio

For CUDA (adjust the index URL to match your CUDA version; cu118 targets CUDA 11.8):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
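After installing, it can help to confirm which accelerator backends PyTorch actually sees. The helper below is a small sketch; `torch_report` is a hypothetical name, and it only relies on `torch.__version__`, `torch.cuda.is_available()`, and `torch.backends.mps.is_available()` (the last one is guarded because older PyTorch builds lack it).

```python
import importlib.util

def torch_report():
    """Summarize the local PyTorch install without failing when torch is absent."""
    report = {"installed": False, "version": None, "cuda": False, "mps": False}
    if importlib.util.find_spec("torch") is None:
        return report
    import torch
    report["installed"] = True
    report["version"] = str(torch.__version__)
    report["cuda"] = bool(torch.cuda.is_available())
    # The MPS backend (Apple silicon) only exists in newer PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    report["mps"] = bool(mps is not None and mps.is_available())
    return report

print(torch_report())
```

If `cuda` is reported as `False` on a machine with an NVIDIA GPU, the installed wheel probably does not match the local CUDA setup.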

2. Install CONCORD (Stable or Development)

Stable Version (PyPI)

pip install concord-sc

Development Version (GitHub)

pip install git+https://github.com/Gartner-Lab/Concord.git

Optional Installations

(Recommended) Enable Additional Functionalities

For GO enrichment, benchmarking, and R integration, install:

pip install "concord-sc[optional]"

(Recommended) Install FAISS for Accelerated KNN Search

Note: If using Mac, you may need to disable FAISS when running Concord:
cur_ccd = ccd.Concord(adata=adata, input_feature=feature_list, use_faiss=False, device=device)

FAISS with GPU:
```
pip install faiss-gpu
```
FAISS with CPU:
```
pip install faiss-cpu
```
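Because FAISS is optional and may need to be disabled on a Mac, one option is to decide the `use_faiss` flag at runtime. This is a sketch under assumptions: `should_use_faiss` is a hypothetical helper, and switching FAISS off on every Mac is a conservative choice rather than a documented requirement.

```python
import importlib.util
import platform

def should_use_faiss():
    """Return True only when faiss is importable and the OS is not macOS."""
    has_faiss = importlib.util.find_spec("faiss") is not None
    on_mac = platform.system() == "Darwin"
    return has_faiss and not on_mac

# Intended use, with the constructor call shown in the note above:
# cur_ccd = ccd.Concord(adata=adata, input_feature=feature_list,
#                       use_faiss=should_use_faiss(), device=device)
```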

(Optional) Integration with VisCello

CONCORD integrates with the R package VisCello, a tool for interactive visualization.
To explore results interactively, see the VisCello GitHub repository.

Getting Started

Concord integrates seamlessly with AnnData objects. Single-cell datasets, such as 10x Genomics outputs, can easily be loaded into an AnnData object using the Scanpy package. If you're using R and have data in a Seurat object, you can convert it to AnnData format by following this tutorial. In this quick-start example, we'll demonstrate CONCORD using the pbmc3k dataset provided by the Scanpy package.
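For your own 10x Genomics data, loading might look like the sketch below. `load_10x` is a hypothetical wrapper; it uses Scanpy's `read_10x_mtx` for a matrix directory and `read_10x_h5` for an HDF5 file, and assumes any non-directory path is a Cell Ranger `.h5` file.

```python
from pathlib import Path

def load_10x(path):
    """Load 10x Genomics output into an AnnData object (requires scanpy)."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"No such file or directory: {path}")
    import scanpy as sc  # imported lazily so the path check runs without scanpy
    if path.is_dir():
        # Directory holding the matrix, barcodes, and features/genes files
        return sc.read_10x_mtx(path)
    # Otherwise assume a Cell Ranger HDF5 file
    return sc.read_10x_h5(path)
```

The returned object can then be passed to `ccd.Concord(adata=...)` in the same way as the pbmc3k example.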

Load package and data

# Load required packages
import concord as ccd
import scanpy as sc
import torch
# Load and prepare example data
adata = sc.datasets.pbmc3k_processed()
adata = adata.raw.to_adata()  # Store raw counts in adata.X
# By default, Concord runs standard total-count normalization and log transformation internally.
# To use your own normalized data in adata.X instead, specify the input layer when encoding:
# cur_ccd.encode_adata(input_layer_key='X', output_key='Concord')
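For readers unfamiliar with the preprocessing mentioned above, total-count normalization followed by a log transformation can be sketched in plain Python as follows. This is a generic illustration, not Concord's internal code: `normalize_log1p` is a hypothetical name, and the target sum of 10,000 is an assumed, commonly used value.

```python
import math

def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        # Cells with no counts stay all-zero instead of dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in row])
    return normalized

cells = [[1, 3], [0, 0]]  # two cells, two genes
print(normalize_log1p(cells))
```

Scaling removes differences in sequencing depth between cells, and the log transformation compresses the dynamic range of highly expressed genes.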

Run CONCORD:

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU.
# On a Mac, use either torch.device('mps') or torch.device('cpu').
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# (Optional) Select the most variably expressed/accessible features for analysis
# (flavors other than seurat_v3 are available)
feature_list = ccd.ul.select_features(adata, n_top_features=5000, flavor='seurat_v3')

# Initialize Concord with an AnnData object; omit input_feature to use all features
cur_ccd = ccd.Concord(adata=adata, input_feature=feature_list, device=device)

# To integrate data across batches, add the domain_key argument to name the batch column in adata.obs
# cur_ccd = ccd.Concord(adata=adata, input_feature=feature_list, domain_key='batch', device=device)

# Encode data, saving the latent embedding in adata.obsm['Concord']
cur_ccd.encode_adata(output_key='Concord')
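Before passing `domain_key`, it may be worth checking that the column exists in `adata.obs` and actually distinguishes more than one batch. The helper below is a hypothetical convenience, not part of the Concord API; it only assumes that `adata.obs` supports `in` and item access by column name, as a pandas DataFrame does.

```python
def domain_labels(adata, domain_key):
    """Return the sorted, de-duplicated labels found in adata.obs[domain_key]."""
    if domain_key not in adata.obs:
        raise KeyError(f"'{domain_key}' is not a column of adata.obs")
    return sorted(set(map(str, adata.obs[domain_key])))

# Intended use: only request integration when there is more than one domain.
# labels = domain_labels(adata, 'batch')
# key = 'batch' if len(labels) > 1 else None
```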

Visualization:

CONCORD latent embeddings can be used directly for downstream analyses such as visualization with UMAP and t-SNE or constructing k-nearest neighbor (kNN) graphs. Unlike with PCA, where the leading components are often used alone, it is important to use the full CONCORD latent embedding in downstream analyses, as each dimension is designed to capture meaningful and complementary aspects of the underlying data structure.

ccd.ul.run_umap(adata, source_key='Concord', result_key='Concord_UMAP', n_components=2, n_neighbors=15, min_dist=0.1, metric='euclidean')

# Plot the UMAP embeddings
color_by = ['n_genes', 'louvain'] # Choose which variables you want to visualize
ccd.pl.plot_embedding(
    adata, basis='Concord_UMAP', color_by=color_by, figsize=(10, 5), dpi=600, ncols=2, font_size=6, point_size=10, legend_loc='on data',
    save_path='Concord_UMAP.png'
)
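As noted above, the embedding can also feed a kNN graph. One way to do this with Scanpy is to pass the full embedding through `use_rep` in `sc.pp.neighbors`, sketched below. `neighbors_from_embedding` is a hypothetical wrapper, and the default of 15 neighbors simply mirrors the UMAP call above.

```python
def neighbors_from_embedding(adata, key="Concord", n_neighbors=15):
    """Build a kNN graph on every dimension of adata.obsm[key] (requires scanpy)."""
    if key not in adata.obsm:
        raise KeyError(f"adata.obsm has no '{key}' entry; run encode_adata first")
    import scanpy as sc  # imported lazily so the key check runs without scanpy
    # use_rep selects the stored embedding; no dimensions are truncated.
    sc.pp.neighbors(adata, use_rep=key, n_neighbors=n_neighbors)
    return adata
```

The resulting graph is stored on the AnnData object and can be used by graph-based clustering tools.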

The latent space produced by CONCORD often captures complex biological structures that may not be fully visible in 2D projections. We recommend exploring the latent space using a 3D UMAP to examine the intricacies of the data more effectively. For example:

ccd.ul.run_umap(adata, source_key='Concord', result_key='Concord_UMAP_3D', n_components=3, n_neighbors=15, min_dist=0.1, metric='euclidean')

# Plot the 3D UMAP embeddings
col = 'louvain'
fig = ccd.pl.plot_embedding_3d(
    adata, basis='Concord_UMAP_3D', color_by=col, 
    save_path='Concord_UMAP_3D.html',
    point_size=3, opacity=0.8, width=1500, height=1000
)

License

This project is licensed under the MIT License.
See the LICENSE file for details.

Citation

If you use CONCORD in your research, please cite the following preprint:

"Revealing a coherent cell state landscape across single-cell datasets with CONCORD"
bioRxiv, 2025


Release history

- 1.0.13 (Jan 26, 2026)
- 1.0.12 (Nov 19, 2025)
- 1.0.11 (Nov 15, 2025)
- 1.0.10 (Oct 7, 2025)
- 1.0.9 (Sep 11, 2025)
- 1.0.8 (Sep 1, 2025)
- 1.0.7 (Jul 17, 2025)
- 1.0.6 (Jul 14, 2025)
- 1.0.4 (Jul 4, 2025)
- 1.0.0 (Jun 25, 2025)
- 0.9.7 (Apr 30, 2025), this version
- 0.9.6 (Apr 27, 2025)
- 0.9.5 (Apr 18, 2025)
- 0.9.4 (Apr 17, 2025)
- 0.9.3 (Mar 17, 2025)
- 0.9.2 (Mar 17, 2025)
- 0.9.1 (Mar 17, 2025)
- 0.9.0 (Mar 16, 2025)

Download files

Source Distribution

concord_sc-0.9.7.tar.gz (123.5 kB), uploaded Apr 30, 2025, Source

Built Distribution

concord_sc-0.9.7-py3-none-any.whl (140.4 kB), uploaded Apr 30, 2025, Python 3

File details

Details for the file concord_sc-0.9.7.tar.gz.

File metadata

Download URL: concord_sc-0.9.7.tar.gz
Upload date: Apr 30, 2025
Size: 123.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for concord_sc-0.9.7.tar.gz:
SHA256: `bfda46777f3b6ec541d84a9670620a55bdbd99fbfab03674475c496ac9817fc8`
MD5: `c08fa782c0e979627f47f8c88b003968`
BLAKE2b-256: `7b7eafc7826dc853c736974af065a5092ff6256f1718222a4eb91d403cc4d52c`


File details

Details for the file concord_sc-0.9.7-py3-none-any.whl.

File metadata

Download URL: concord_sc-0.9.7-py3-none-any.whl
Upload date: Apr 30, 2025
Size: 140.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for concord_sc-0.9.7-py3-none-any.whl:
SHA256: `e4bb076df103e0b8924b194311514e6adb7ad18a5e2dde31009da960bf4c3ba8`
MD5: `0acac4307e359f5038ac1f02b75d0538`
BLAKE2b-256: `7fc8887aee9b43aca00279f9a2c2c6944b32f6341d8ac1ef297a04773e3d5061`
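A downloaded file can be checked against the SHA256 digests listed above using Python's standard `hashlib` module. The sketch below is one way to do it; the helper names `sha256_of` and `matches_digest` are hypothetical, and the constant holds the source distribution digest from this page.

```python
import hashlib

# SHA256 digest of concord_sc-0.9.7.tar.gz, as listed on this page.
SDIST_SHA256 = "bfda46777f3b6ec541d84a9670620a55bdbd99fbfab03674475c496ac9817fc8"

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_digest(path, expected=SDIST_SHA256):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()

# Intended use after downloading the archive:
# matches_digest("concord_sc-0.9.7.tar.gz")
```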

