MutClust: Mutual rank-based coexpression, clustering and GO term enrichment analysis.

MutClust: Mutual Rank-Based Clustering and GO Enrichment Analysis

MutClust is a Python package for RNA-seq gene coexpression analysis. It performs mutual rank (MR)-based clustering of coexpressed genes and identifies enriched Gene Ontology (GO) terms for the resulting clusters. The package is optimized for speed and can run a whole-genome coexpression analysis in minutes.


Features

  • Mutual Rank Analysis: Calculates MR from Pearson correlation coefficients to identify coexpressed genes.
  • Leiden Clustering: Groups genes into clusters based on mutual rank and exponential decay weights.
  • Gene Annotations: Merges cluster members with gene annotations, if provided.
  • GO Enrichment Analysis: Identifies enriched GO terms for each cluster using GOATOOLS.
  • Highly Configurable: Supports adjustable thresholds, resolution parameters, and multi-threading for performance optimization.
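
The mutual rank step listed above can be sketched in a few lines. This is a minimal illustration, not MutClust's implementation (which relies on PyNetCor for speed). It assumes the commonly used definition in which the MR of two genes is the geometric mean of their ranks in each other's correlation-sorted partner lists. The function names and the toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mutual_ranks(expression):
    """Return {(gene_a, gene_b): MR} for every unordered gene pair.

    `expression` maps a gene ID to its expression values (one per sample).
    Assumption: the best-correlated partner of a gene gets rank 1 and the
    gene itself is excluded from its own ranking.
    """
    genes = list(expression)
    rank = {}
    for a in genes:
        partners = [g for g in genes if g != a]
        # Sort partners of `a` by descending correlation with `a`.
        partners.sort(
            key=lambda g: pearson(expression[a], expression[g]), reverse=True
        )
        rank[a] = {g: i + 1 for i, g in enumerate(partners)}

    mr = {}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            # Geometric mean of the two directional ranks.
            mr[(a, b)] = math.sqrt(rank[a][b] * rank[b][a])
    return mr


# Toy data (hypothetical values, four samples per gene).
toy_expression = {
    "GeneA": [1, 2, 3, 4],
    "GeneB": [2, 4, 6, 8.5],
    "GeneC": [4, 3, 2, 1],
    "GeneD": [1, 3, 2, 4],
}
toy_mr = mutual_ranks(toy_expression)
```

A low MR means each gene ranks near the top of the other's list, so MR is less sensitive than a raw correlation cutoff to genes that correlate moderately with everything. The pure-Python loops are quadratic in the number of genes and are only meant for reading, not for whole-genome data.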

Installation

You can install MutClust directly from PyPI:

pip install mutclust

Note: Because of a known dependency issue with PyNetCor, MutClust cannot currently be installed from PyPI on macOS. Installation on Linux works as expected.

Alternatively, you can clone the repository and install it locally:

git clone https://github.com/eporetsky/mutclust.git
cd mutclust
pip install .

Docker Installation

For users who prefer containerized deployment, MutClust is available as a Docker container:

# Build the container (run from the directory that contains the Dockerfile)
docker build -t mutclust .

# Run MutClust with your data
docker run -v /path/to/your/data:/data mutclust --expression /data/your_expression.tsv --output /data/results

The container uses Ubuntu 20.04 and includes all necessary dependencies. Mount your data directory to /data inside the container to access your files.


Usage

MutClust provides a command-line interface (CLI) for running the full pipeline. After installation, you can use the mutclust command.

Command-Line Arguments

Argument             Short  Description                                      Default / Required
--expression         -ex    Path to the RNA-seq dataset (TSV format).        -ex or -mr required
--mutual_rank        -mr    Path to Mutual Rank file (TSV format).           -ex or -mr required
--annotations        -a     Path to the gene annotation file.                Optional
--go_obo             -go    Path to the Gene Ontology (GO) OBO file.         Optional
--go_gaf             -gf    Path to the GO annotation file (GAF format).     Optional
--output             -o     Output prefix for the results.                   Required
--mr_threshold       -m     Mutual rank threshold for filtering.             100
--e_value            -e     Exponential decay constant.                      10
--resolution         -r     Resolution parameter for Leiden clustering.      0.1
--threads            -t     Number of threads for correlation calculation.   4
--save_intermediate         Save intermediate files.                         Optional
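
To make the roles of --mr_threshold and --e_value concrete, here is a small sketch of filtering gene pairs by MR and converting the survivors to edge weights. The formula exp(-(MR - 1) / e) is an assumption: it is not stated in this document, but it is a common transform for MR networks and roughly reproduces the ED values in the example output under Output Files. The function names, and the choice to keep pairs with MR equal to the threshold, are hypothetical.

```python
import math


def decay_weight(mr, e_value=10.0):
    """Exponential decay weight for a mutual rank (assumed formula)."""
    return math.exp(-(mr - 1.0) / e_value)


def filter_and_weight(pairs, mr_threshold=100.0, e_value=10.0):
    """Keep (gene1, gene2, mr) pairs at or below the MR threshold and
    append the decay weight to each surviving pair."""
    return [
        (gene1, gene2, mr, decay_weight(mr, e_value))
        for gene1, gene2, mr in pairs
        if mr <= mr_threshold
    ]


pairs = [("GeneA", "GeneB", 10.2), ("GeneB", "GeneC", 6.0), ("GeneA", "GeneC", 250.0)]
edges = filter_and_weight(pairs)  # the MR 250 pair is dropped
```

Under this assumption, a smaller e_value makes weights fall off faster with MR, so only the very closest partners keep substantial weight in the clustering graph.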

Example Command

mutclust --expression data/AtCol-0.cpm.tsv \
         --annotations annotations/AtCol-0.annot.tsv \
         --go_obo go-basic.obo \
         --go_gaf tair.gaf \
         --output results/mutclust_output \
         --mr_threshold 100 \
         --e_value 10 \
         --resolution 0.1 \
         --threads 8
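
When MutClust is run from a Python workflow rather than a shell, the same command can be assembled as an argument list. The helper below is a hypothetical convenience wrapper, not part of MutClust. It uses only the flags documented in the table above and assumes that at least one of --expression or --mutual_rank must be given.

```python
import shlex


def build_mutclust_command(output, expression=None, mutual_rank=None,
                           annotations=None, go_obo=None, go_gaf=None,
                           mr_threshold=100, e_value=10, resolution=0.1,
                           threads=4):
    """Return the mutclust CLI call as a list suitable for subprocess.run."""
    if expression is None and mutual_rank is None:
        raise ValueError("provide --expression or --mutual_rank")

    cmd = ["mutclust"]
    optional = {
        "--expression": expression,
        "--mutual_rank": mutual_rank,
        "--annotations": annotations,
        "--go_obo": go_obo,
        "--go_gaf": go_gaf,
    }
    # Only pass file arguments that were actually supplied.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]

    cmd += [
        "--output", str(output),
        "--mr_threshold", str(mr_threshold),
        "--e_value", str(e_value),
        "--resolution", str(resolution),
        "--threads", str(threads),
    ]
    return cmd


cmd = build_mutclust_command(
    output="results/mutclust_output",
    expression="data/AtCol-0.cpm.tsv",
    threads=8,
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
# To execute: subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids shell quoting problems with paths that contain spaces.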

Input File Formats

RNA-seq Dataset

  • Format: Tab-separated values (TSV).
  • Layout: Gene IDs in the first column (used as the row index), followed by one column per sample.
  • Example:
geneID    Sample1    Sample2    Sample3
GeneA     1.23       2.34       3.45
GeneB     4.56       5.67       6.78
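
As a quick check that a file matches this layout, the table can be parsed with the standard library. This is a sketch for validating input, not the loader MutClust uses. The function name is hypothetical, and the sample text repeats the example above with real tab characters.

```python
import csv
import io


def read_expression(handle):
    """Parse a tab-separated expression table.

    Returns (sample_names, {gene_id: [values...]}). The first column is
    taken as the gene ID and every other column as a numeric sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    values = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"row for {row[0]!r} has {len(row)} fields")
        values[row[0]] = [float(v) for v in row[1:]]
    return samples, values


example = (
    "geneID\tSample1\tSample2\tSample3\n"
    "GeneA\t1.23\t2.34\t3.45\n"
    "GeneB\t4.56\t5.67\t6.78\n"
)
samples, expression = read_expression(io.StringIO(example))
```

A ValueError from float() or from the field-count check usually points to space-separated rather than tab-separated input.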

Gene Annotation File

  • Format: Tab-separated values (TSV).
  • Columns: geneID and additional annotation fields.
  • Example:
geneID    description
GeneA     Photosynthesis-related protein
GeneB     Transcription factor
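
Conceptually, attaching annotations to cluster members is a left join on geneID. The sketch below shows that idea with plain dictionaries. It is an illustration under that assumption, not MutClust's code, and the function name and the empty-string fallback for unannotated genes are hypothetical.

```python
def annotate_clusters(clusters, annotations):
    """Left-join (cluster_id, geneID) rows with a geneID -> description map.

    Genes without an annotation are kept and get an empty description.
    """
    return [
        (cluster_id, gene, annotations.get(gene, ""))
        for cluster_id, gene in clusters
    ]


annotations = {
    "GeneA": "Photosynthesis-related protein",
    "GeneB": "Transcription factor",
}
clusters = [(1, "GeneA"), (1, "GeneB"), (2, "GeneC")]
annotated = annotate_clusters(clusters, annotations)
```

For the join to work, the gene IDs in the annotation file must match those in the expression table exactly, including case and any version suffixes.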

GO OBO File

  • Description: The Gene Ontology (GO) OBO file contains the ontology structure.
  • Source: Download from Gene Ontology.

GO GAF File

  • Description: The Gene Annotation File (GAF) maps genes to GO terms.
  • Source: Download from Gene Ontology.
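
To see what a GAF file contributes, the sketch below collects the GO terms assigned to each gene. It relies on the standard GAF layout: tab-separated fields, comment lines starting with "!", the object ID in column 2 and the GO ID in column 5. Which column MutClust matches against your gene IDs is not stated here, so the id_column parameter is an assumption, as are the function name and the shortened sample rows.

```python
import io
from collections import defaultdict


def read_gaf(handle, id_column=1):
    """Map each gene to the set of GO terms annotated to it.

    `id_column` is the zero-based column holding the gene identifier
    (1 = DB Object ID, 2 = DB Object Symbol in the GAF specification).
    """
    gene_to_terms = defaultdict(set)
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue  # header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        gene_to_terms[fields[id_column]].add(fields[4])
    return dict(gene_to_terms)


# Illustrative rows only; real GAF lines have 17 columns.
sample_gaf = (
    "!gaf-version: 2.2\n"
    "ExampleDB\tGeneA\tsymA\t\tGO:0015979\n"
    "ExampleDB\tGeneA\tsymA\t\tGO:0006355\n"
    "ExampleDB\tGeneB\tsymB\t\tGO:0006355\n"
)
gene_to_terms = read_gaf(io.StringIO(sample_gaf))
```

If few or no genes receive GO terms, the identifiers in the GAF file probably differ from those in the expression table, and a different id_column may be needed.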

Output Files

  1. Filtered MR and exponential decay (ED) values (<output_prefix>.mrs.tsv):

    • Lists coexpressed gene pairs with their MR and ED values.
    • Columns: Gene1, Gene2, MR, ED.

    Example:

    Gene1    Gene2    MR    ED
    GeneA    GeneB    10.2  0.39
    GeneB    GeneC    6     0.6
    
  2. Clustered Genes (<output_prefix>.clusters.tsv):

    • Lists the genes in each cluster.
    • Columns: cluster_id, geneID, followed by annotation columns if an annotation file was provided.

    Example:

    cluster_id    geneID    Annotations
    1             GeneA     ...
    1             GeneB     ...
    
  3. GO Enrichment Results (<output_prefix>_go_enrichment_results.tsv):

    • Contains enriched GO terms for each cluster.
    • Columns: cluster, type, size, term, p-val, FC, desc.

    Example:

    cluster    type    size    term       p-val       FC    desc
    1          BP      25      GO:0008150 0.00123     3.5   Biological Process
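
    To help interpret the p-val and FC columns, the following sketch computes a one-sided hypergeometric p-value and a fold enrichment for a single cluster and term. MutClust delegates the actual test to GOATOOLS, so this is only an illustration of the kind of over-representation statistic involved. Reading FC as fold enrichment is an assumption, and the function names and toy numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(k, population, annotated, cluster_size):
    """P(X >= k): chance of seeing at least k annotated genes in a cluster.

    population   - number of genes in the background
    annotated    - background genes carrying the GO term
    cluster_size - genes in the cluster
    k            - cluster genes carrying the GO term
    """
    upper = min(annotated, cluster_size)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, cluster_size)


def fold_enrichment(k, population, annotated, cluster_size):
    """Observed fraction in the cluster divided by the background fraction."""
    return (k / cluster_size) / (annotated / population)


# Toy numbers: 10 genes in total, 3 carry the term, cluster of 4 contains 2.
p_value = enrichment_p_value(2, 10, 3, 4)
fold = fold_enrichment(2, 10, 3, 4)
```

    Because many terms are tested per cluster, raw p-values from a test like this are normally corrected for multiple testing before terms are reported as enriched.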
    

Dependencies

The following Python libraries are required and will be installed automatically:

  • numpy
  • pandas
  • pynetcor
  • python-igraph
  • goatools

License

This project is licensed under the MIT License. See the LICENSE file for details.


Contributing

Contributions, suggestions and issues are welcome!
