
MutClust: Mutual rank-based coexpression, clustering and GO term enrichment analysis.


MutClust: Mutual Rank-Based Clustering and GO Enrichment Analysis

MutClust is a Python package for RNA-seq gene coexpression analysis. It performs mutual rank (MR)-based clustering of coexpressed genes and identifies enriched Gene Ontology (GO) terms for the resulting clusters. The package is optimized for speed and can run a whole-genome coexpression analysis in minutes.


Features

  • Mutual Rank Analysis: Calculates MR from Pearson correlation coefficients to identify coexpressed genes.
  • Leiden Clustering: Groups genes into clusters based on mutual rank and exponential decay weights.
  • Gene Annotations: Merges cluster members with gene annotations, if provided.
  • GO Enrichment Analysis: Identifies enriched GO terms for each cluster using GOATOOLS.
  • Highly Configurable: Supports adjustable thresholds, resolution parameters, and multi-threading for performance optimization.
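
The first two features can be illustrated with a small pure-Python sketch. It is not MutClust's implementation (the package computes correlations with its own dependencies). It assumes that the rank of gene B for gene A is B's position when all other genes are sorted by decreasing signed Pearson correlation with A, that MR is the geometric mean of the two ranks, and that the decay weight is exp(-(MR - 1) / e). The decay formula is an assumption; it is roughly consistent with the MR and ED values in the output example later in this document. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mutual_ranks(expression):
    """Return {(gene_a, gene_b): MR} for every unordered gene pair.

    rank[a][b] is the position of b when the other genes are sorted by
    decreasing correlation with a (1 = most correlated). Ranking by signed
    correlation is an assumption of this sketch.
    """
    genes = sorted(expression)
    rank = {}
    for a in genes:
        others = [g for g in genes if g != a]
        others.sort(key=lambda g: pearson(expression[a], expression[g]), reverse=True)
        rank[a] = {g: i + 1 for i, g in enumerate(others)}
    return {
        (a, b): math.sqrt(rank[a][b] * rank[b][a])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }

def decay_weight(mr, e_value=10.0):
    """Exponential decay weight, assumed here to be exp(-(MR - 1) / e)."""
    return math.exp(-(mr - 1.0) / e_value)

# Toy expression matrix: gene -> expression values across four samples.
expression = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.0, 4.0, 6.0, 8.5],
    "GeneC": [4.0, 3.0, 2.0, 1.0],
    "GeneD": [1.0, 3.0, 2.0, 4.0],
}
mr = mutual_ranks(expression)
for (a, b), value in sorted(mr.items()):
    print(a, b, round(value, 2), round(decay_weight(value), 2))
```

A lower MR means two genes rank each other highly, so the decay weight turns small MR values into strong edges for clustering.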

Installation

You can install MutClust directly from PyPI:

pip install mutclust

Alternatively, you can clone the repository and install it locally:

git clone https://github.com/eporetsky/mutclust.git
cd mutclust
pip install .

Usage

MutClust provides a command-line interface (CLI) for running the full pipeline. After installation, you can use the mutclust command.

Command-Line Arguments

Argument             Short       Description                                      Default
--expression         -ex         Path to the RNA-seq dataset (TSV format).        -ex or -mr required
--mutual_rank        -mr         Path to Mutual Rank file (TSV format).           -ex or -mr required
--annotations        -a          Path to the gene annotation file.                Optional
--go_obo             -go         Path to the Gene Ontology (GO) OBO file.         Optional
--go_gaf             -gf         Path to the GO annotation file (GAF format).     Optional
--output             -o          Output prefix for the results.                   Required
--mr_threshold       -m          Mutual rank threshold for filtering.             100
--e_value            -e          Exponential decay constant.                      10
--resolution         -r          Resolution parameter for Leiden clustering.      0.1
--threads            -t          Number of threads for correlation calculation.   4
--save_intermediate  (unlisted)  Save intermediate files.                         Optional

Example Command

mutclust --expression data/AtCol-0.cpm.tsv \
         --annotations annotations/AtCol-0.annot.tsv \
         --go_obo go-basic.obo \
         --go_gaf tair.gaf \
         --output results/mutclust_output \
         --mr_threshold 100 \
         --e_value 10 \
         --resolution 0.1 \
         --threads 8
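
When the pipeline is driven from a script, the same command can be assembled in Python. The sketch below is only a convenience wrapper around the flags documented above; `build_mutclust_command` is a hypothetical helper, not part of MutClust, and it covers only the `--expression` input mode.

```python
import shlex

def build_mutclust_command(expression, output, annotations=None, go_obo=None,
                           go_gaf=None, mr_threshold=100, e_value=10,
                           resolution=0.1, threads=4):
    """Return the mutclust CLI call as an argument list for subprocess.run."""
    cmd = ["mutclust", "--expression", expression, "--output", output]
    # Optional inputs are only added when a path was given.
    optional = {"--annotations": annotations, "--go_obo": go_obo, "--go_gaf": go_gaf}
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, value]
    cmd += ["--mr_threshold", str(mr_threshold), "--e_value", str(e_value),
            "--resolution", str(resolution), "--threads", str(threads)]
    return cmd

cmd = build_mutclust_command("data/AtCol-0.cpm.tsv", "results/mutclust_output",
                             go_obo="go-basic.obo", go_gaf="tair.gaf", threads=8)
print(shlex.join(cmd))
# To execute it (requires mutclust on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.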

Input File Formats

RNA-seq Dataset

  • Format: Tab-separated values (TSV).
  • Columns: Gene IDs as row indices and samples as columns.
  • Example:
geneID    Sample1    Sample2    Sample3
GeneA     1.23       2.34       3.45
GeneB     4.56       5.67       6.78
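
Before a long run, it can help to confirm that the expression file matches this layout. The following is a minimal sketch using only the standard library; `read_expression_tsv` is a hypothetical helper and the specific checks (row length, duplicate IDs, numeric values) are assumptions about what a well-formed input looks like, not checks MutClust is known to perform.

```python
import csv
import io

def read_expression_tsv(handle):
    """Parse a gene-by-sample TSV into (sample_names, {geneID: [values]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first column holds the gene IDs
    expression = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(f"{gene}: expected {len(samples)} values, got {len(values)}")
        if gene in expression:
            raise ValueError(f"duplicate gene ID: {gene}")
        expression[gene] = [float(v) for v in values]  # raises on non-numeric cells
    return samples, expression

example = (
    "geneID\tSample1\tSample2\tSample3\n"
    "GeneA\t1.23\t2.34\t3.45\n"
    "GeneB\t4.56\t5.67\t6.78\n"
)
samples, expression = read_expression_tsv(io.StringIO(example))
print(samples, expression)
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`.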

Gene Annotation File

  • Format: Tab-separated values (TSV).
  • Columns: geneID and additional annotation fields.
  • Example:
geneID    description
GeneA     Photosynthesis-related protein
GeneB     Transcription factor
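
The annotation merge amounts to a left join of cluster members on geneID. The sketch below shows that idea with plain Python data structures; `annotate_clusters` is a hypothetical helper, and leaving the annotation empty for genes missing from the file is an assumption, not documented MutClust behavior.

```python
def annotate_clusters(clusters, annotations):
    """Attach a description to each (cluster_id, geneID) pair.

    clusters: list of (cluster_id, geneID) tuples.
    annotations: dict mapping geneID to a description string.
    Genes without an annotation keep an empty description.
    """
    return [(cid, gene, annotations.get(gene, "")) for cid, gene in clusters]

annotations = {
    "GeneA": "Photosynthesis-related protein",
    "GeneB": "Transcription factor",
}
clusters = [(1, "GeneA"), (1, "GeneB"), (2, "GeneC")]
annotated = annotate_clusters(clusters, annotations)
for row in annotated:
    print(*row, sep="\t")
```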

GO OBO File

  • Description: The Gene Ontology (GO) OBO file contains the ontology structure.
  • Source: Download from Gene Ontology.

GO GAF File

  • Description: The Gene Annotation File (GAF) maps genes to GO terms.
  • Source: Download from Gene Ontology.

Output Files

  1. Filtered MR and e-values (<output_prefix>.mrs.tsv):

    • Lists coexpressed gene pairs with their MR and exponential decay (ED) values.
    • Columns: Gene1, Gene2, MR, ED.

    Example:

    Gene1    Gene2    MR    ED
    GeneA    GeneB    10.2  0.39
    GeneB    GeneC    6     0.6
    
  2. Clustered Genes (<output_prefix>.clusters.tsv):

    • Lists the genes in each cluster.
    • Columns: cluster_id, geneID, followed by annotation columns if an annotation file is provided.

    Example:

    cluster_id    geneID    Annotations
    1             GeneA     ...
    1             GeneB     ...
    
  3. GO Enrichment Results (<output_prefix>_go_enrichment_results.tsv):

    • Contains enriched GO terms for each cluster.
    • Columns: cluster, type, size, term, p-val, FC, desc.

    Example:

    cluster    type    size    term       p-val       FC    desc
    1          BP      25      GO:0008150 0.00123     3.5   Biological Process
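
To make the p-val and FC columns more concrete, the sketch below computes a fold change and a one-sided hypergeometric p-value for a single GO term in a single cluster. It is an illustration only: MutClust delegates enrichment to GOATOOLS, which has its own statistics and multiple-testing correction. The function and parameter names are hypothetical, and defining FC as the ratio of the term's frequency in the cluster to its frequency in the background is an assumption.

```python
from math import comb

def enrichment(cluster_size, cluster_hits, background_size, background_hits):
    """Return (fold_change, p_value) for one GO term in one cluster.

    cluster_hits: genes in the cluster annotated with the term.
    background_hits: genes in the whole background annotated with the term.
    The p-value is P(X >= cluster_hits) under a hypergeometric model.
    """
    fold_change = (cluster_hits / cluster_size) / (background_hits / background_size)
    upper = min(cluster_size, background_hits)
    favourable = sum(
        comb(background_hits, k) * comb(background_size - background_hits, cluster_size - k)
        for k in range(cluster_hits, upper + 1)
    )
    return fold_change, favourable / comb(background_size, cluster_size)

# A cluster of 25 genes with 7 hits, against 20000 genes with 800 hits.
fc, p = enrichment(25, 7, 20000, 800)
print(f"FC={fc:.1f}  p={p:.2e}")
```

With many clusters and terms tested at once, raw p-values like this one need multiple-testing correction before they are interpreted.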
    

Dependencies

The following Python libraries are required and will be installed automatically:

  • numpy
  • pandas
  • pynetcor
  • python-igraph
  • goatools

License

This project is licensed under the MIT License. See the LICENSE file for details.


Contributing

Contributions, suggestions and issues are welcome!
