MutClust: Mutual rank-based coexpression, clustering and GO term enrichment analysis.
MutClust: Mutual Rank-Based Clustering and GO Enrichment Analysis
MutClust is a Python package for RNA-seq gene coexpression analysis. It performs mutual rank (MR)-based clustering of coexpressed genes and identifies enriched Gene Ontology (GO) terms for the resulting clusters. The package is optimized for speed and can run a whole-genome coexpression analysis in minutes.
Features
- Mutual Rank Analysis: Calculates MR from Pearson correlation coefficients to identify coexpressed genes.
- Leiden Clustering: Groups genes into clusters based on mutual rank and exponential decay weights.
- Gene Annotations: Merges cluster members with gene annotations, if provided.
- GO Enrichment Analysis: Identifies enriched GO terms for each cluster using GOATOOLS.
- Highly Configurable: Supports adjustable thresholds, resolution parameters, and multi-threading for performance optimization.
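The mutual rank step listed above can be illustrated with a small, dependency-free sketch. It assumes the usual definition of MR as the geometric mean of the two directional correlation ranks, with ranks starting at 1 and each gene excluded from its own ranking. MutClust lists pynetcor as a dependency, presumably for this step, so the code below is not the package's implementation, and the names `pearson` and `mutual_ranks` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mutual_ranks(expr):
    """Return {(gene1, gene2): MR} for every unordered gene pair.

    expr maps a gene ID to its list of expression values.
    Assumption: rank 1 is the most strongly correlated partner.
    """
    genes = list(expr)
    rank = {}
    for a in genes:
        # Sort all other genes by decreasing correlation with gene a.
        others = [g for g in genes if g != a]
        others.sort(key=lambda g: pearson(expr[a], expr[g]), reverse=True)
        rank[a] = {g: i + 1 for i, g in enumerate(others)}
    # MR is the geometric mean of the rank of b for a and the rank of a for b.
    return {
        (a, b): math.sqrt(rank[a][b] * rank[b][a])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }


expr = {
    "GeneA": [1, 2, 3, 4],
    "GeneB": [2, 4, 6, 8.5],
    "GeneC": [4, 3, 2, 1],
    "GeneD": [1, 3, 2, 4],
}
mr = mutual_ranks(expr)
for pair, value in sorted(mr.items(), key=lambda item: item[1]):
    print(pair, round(value, 3))
```

A low MR means that each gene ranks near the top of the other's correlation list, which makes MR less sensitive to hub genes than a raw correlation cutoff. The quadratic loop here is only suitable for toy inputs.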
Installation
You can install MutClust directly from PyPI:
pip install mutclust
Alternatively, you can clone the repository and install it locally:
git clone https://github.com/eporetsky/mutclust.git
cd mutclust
pip install .
Usage
MutClust provides a command-line interface (CLI) for running the full pipeline. After installation, you can use the mutclust command.
Command-Line Arguments
| Argument | Short | Description | Default |
|---|---|---|---|
| --expression | -ex | Path to the RNA-seq dataset (TSV format). | -ex or -mr required |
| --mutual_rank | -mr | Path to Mutual Rank file (TSV format). | -ex or -mr required |
| --annotations | -a | Path to the gene annotation file. | Optional |
| --go_obo | -go | Path to the Gene Ontology (GO) OBO file. | Optional |
| --go_gaf | -gf | Path to the GO annotation file (GAF format). | Optional |
| --output | -o | Output prefix for the results. | Required |
| --mr_threshold | -m | Mutual rank threshold for filtering. | 100 |
| --e_value | -e | Exponential decay constant. | 10 |
| --resolution | -r | Resolution parameter for Leiden clustering. | 0.1 |
| --threads | -t | Number of threads for correlation calculation. | 4 |
| --save_intermediate | | Save intermediate files. | Optional |
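To make the roles of `--mr_threshold` and `--e_value` concrete, here is a minimal sketch of how MR pairs might be filtered and converted into edge weights. The ED values in the output example (0.39 for MR 10.2 and 0.6 for MR 6) are close to exp(-(MR - 1) / e) with e = 10, so the sketch assumes that formula. Both the formula and the inclusive threshold are assumptions, not confirmed behavior, and `weighted_edges` is a hypothetical name.

```python
import math


def weighted_edges(mr_pairs, mr_threshold=100, e_value=10):
    """Keep pairs with MR <= mr_threshold and attach a decay weight.

    mr_pairs is an iterable of (gene1, gene2, mr) tuples.
    Assumed weight: exp(-(MR - 1) / e_value), so MR = 1 gives weight 1.0.
    """
    edges = []
    for gene1, gene2, mr in mr_pairs:
        if mr > mr_threshold:
            continue  # weakly coexpressed pair, dropped before clustering
        weight = math.exp(-(mr - 1) / e_value)
        edges.append((gene1, gene2, mr, weight))
    return edges


pairs = [
    ("GeneA", "GeneB", 10.2),
    ("GeneB", "GeneC", 6),
    ("GeneA", "GeneC", 250),  # above the default threshold of 100
]
edges = weighted_edges(pairs)
for gene1, gene2, mr, weight in edges:
    print(gene1, gene2, mr, round(weight, 4))
```

Under this assumption, a smaller decay constant makes weights fall off faster with MR, so only the tightest pairs contribute noticeably to the clustering.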
Example Command
mutclust --expression data/AtCol-0.cpm.tsv \
--annotations annotations/AtCol-0.annot.tsv \
--go_obo go-basic.obo \
--go_gaf tair.gaf \
--output results/mutclust_output \
--mr_threshold 100 \
--e_value 10 \
--resolution 0.1 \
--threads 8
Input File Formats
RNA-seq Dataset
- Format: Tab-separated values (TSV).
- Columns: Gene IDs as row indices and samples as columns.
- Example:
geneID Sample1 Sample2 Sample3
GeneA 1.23 2.34 3.45
GeneB 4.56 5.67 6.78
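Before running the pipeline, it can help to check that an expression file matches this layout. The following sketch uses only the standard library and is not part of MutClust; `load_expression` is a hypothetical helper, and the checks it performs (unique gene IDs, one numeric value per sample) are assumptions about what a well-formed input looks like.

```python
import csv
import io


def load_expression(handle):
    """Parse a gene-by-sample TSV into (samples, {geneID: [values]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds the gene IDs
    expr = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if gene in expr:
            raise ValueError(f"duplicate gene ID: {gene}")
        if len(values) != len(samples):
            raise ValueError(
                f"{gene}: expected {len(samples)} values, got {len(values)}"
            )
        # float() raises ValueError on non-numeric entries.
        expr[gene] = [float(v) for v in values]
    return samples, expr


example = (
    "geneID\tSample1\tSample2\tSample3\n"
    "GeneA\t1.23\t2.34\t3.45\n"
    "GeneB\t4.56\t5.67\t6.78\n"
)
samples, expr = load_expression(io.StringIO(example))
print(samples)
print(expr)
```

With a real file, the same function can be called as `load_expression(open(path, newline=""))`.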
Gene Annotation File
- Format: Tab-separated values (TSV).
- Columns: geneID and additional annotation fields.
- Example:
geneID description
GeneA Photosynthesis-related protein
GeneB Transcription factor
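Merging cluster members with this file amounts to a left join on geneID. The sketch below shows one way to do that with the standard library. It is an illustration rather than MutClust's own code: `annotate_clusters` is a hypothetical name, and leaving the field empty for unannotated genes is an assumption.

```python
import csv
import io


def annotate_clusters(clusters, annotation_handle):
    """Left-join (cluster_id, geneID) pairs with annotation fields.

    Genes missing from the annotation file keep empty annotation fields.
    """
    reader = csv.DictReader(annotation_handle, delimiter="\t")
    extra_cols = [c for c in reader.fieldnames if c != "geneID"]
    annot = {row["geneID"]: row for row in reader}
    merged = []
    for cluster_id, gene in clusters:
        row = {"cluster_id": cluster_id, "geneID": gene}
        for col in extra_cols:
            row[col] = annot.get(gene, {}).get(col, "")
        merged.append(row)
    return merged


annotations = (
    "geneID\tdescription\n"
    "GeneA\tPhotosynthesis-related protein\n"
    "GeneB\tTranscription factor\n"
)
clusters = [(1, "GeneA"), (1, "GeneB"), (2, "GeneC")]
rows = annotate_clusters(clusters, io.StringIO(annotations))
for row in rows:
    print(row)
```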
GO OBO File
- Description: The Gene Ontology (GO) OBO file contains the ontology structure.
- Source: Download from Gene Ontology.
GO GAF File
- Description: The GO Annotation File (GAF) maps genes to GO terms.
- Source: Download from Gene Ontology.
Output Files
- Filtered MR and e-values (<output_prefix>.mrs.tsv):
  - Lists pairs of coexpressed genes with their MR and exponential decay (ED) values.
  - Columns: Gene1, Gene2, MR, ED.
  - Example:
    Gene1 Gene2 MR ED
    GeneA GeneB 10.2 0.39
    GeneB GeneC 6 0.6
- Clustered Genes (<output_prefix>.clusters.tsv):
  - Lists genes in each cluster.
  - Annotation columns if provided.
  - Columns: cluster_id, geneID.
  - Example:
    cluster_id geneID Annotations
    1 GeneA ...
    1 GeneB ...
- GO Enrichment Results (<output_prefix>_go_enrichment_results.tsv):
  - Contains enriched GO terms for each cluster.
  - Columns: cluster, type, size, term, p-val, FC, desc.
  - Example:
    cluster type size term p-val FC desc
    1 BP 25 GO:0008150 0.00123 3.5 Biological Process
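To give a feel for what the p-val and FC columns express, here is a rough sketch of a one-sided over-representation test for a single GO term in a single cluster. MutClust delegates enrichment to GOATOOLS, whose test and multiple-testing handling may differ. The sketch assumes that FC denotes fold enrichment (the fraction of cluster genes with the term divided by the fraction of background genes with the term), and `enrichment` is a hypothetical function.

```python
from math import comb


def enrichment(cluster_genes, term_genes, background_genes):
    """Return (overlap, p_value, fold_change) for one GO term.

    p_value is the hypergeometric probability of seeing at least
    `overlap` term-annotated genes in a random draw of the cluster's size.
    """
    background = set(background_genes)
    cluster = set(cluster_genes) & background
    term = set(term_genes) & background
    N, n, K = len(background), len(cluster), len(term)
    if n == 0 or K == 0:
        raise ValueError("cluster and term must overlap the background")
    k = len(cluster & term)
    # Upper tail: sum P(X = i) for i = k .. min(K, n).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    p_value = tail / comb(N, n)
    fold_change = (k / n) / (K / N)
    return k, p_value, fold_change


background = [f"G{i}" for i in range(1, 21)]  # 20 genes in total
term = ["G1", "G2", "G3", "G4", "G5"]         # 5 genes carry the term
cluster = ["G1", "G2", "G3", "G6"]            # 3 of 4 cluster genes carry it
k, p_value, fold_change = enrichment(cluster, term, background)
print(k, round(p_value, 4), fold_change)
```

In a real analysis the p-values would also need correction for the number of terms and clusters tested.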
Dependencies
The following Python libraries are required and will be installed automatically:
- numpy
- pandas
- pynetcor
- python-igraph
- goatools
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contributing
Contributions, suggestions and issues are welcome!