Python wrapper for DESeq2 RNA-seq differential expression analysis in Arabidopsis

DESeq2 Python Wrapper

A Python interface for RNA-seq differential expression analysis using DESeq2 and ClusterProfiler, specifically designed for Arabidopsis thaliana research.

Overview

This package provides a Python interface to R's DESeq2 and ClusterProfiler packages, so that differential expression analysis and GO enrichment analysis can run inside Python workflows. The wrapper handles the data type conversions between Python and R.

Installation

Prerequisites

Before installing this package, ensure you have R installed with the following Bioconductor packages:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("DESeq2", "clusterProfiler", "org.At.tair.db", "biomaRt"))

Install from PyPI

pip install deseq2-wrapper

Install from Source

git clone https://github.com/yourusername/deseq2-wrapper.git
cd deseq2-wrapper
pip install .

Usage

Basic Workflow

import pandas as pd
from deseq2_wrapper import initialize, make_dds, compare_filter_annot, GO_from_DEGs
# NOTE: On some systems the first import of deseq2_wrapper fails, apparently because of an Rtools issue; the cause is not understood.
# Re-running the import once or twice usually resolves it.

# Initialize the package (required before first use)
initialize()

# Load your data
df_counts = pd.read_excel("normalized_count.xlsx")
meta_tags = pd.read_excel("metadata.xlsx")

# Create DESeq2 dataset
dds = make_dds(df_counts, meta_tags)

# Perform differential expression analysis
df_deg = compare_filter_annot(
    dds, 
    grouping_var_name="group",
    group_test="treated", 
    group_base="control", 
    treatment="drug_treatment",
    min_baseMean_threshold=10,
    max_padj_threshold=0.05,
    min_log2FC_threshold=1,
    write_df=True
)

# Perform GO enrichment analysis
GO_results = GO_from_DEGs(df_deg, write_df=True)
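
The import note in the workflow above suggests repeating the import when it fails the first time. One way to automate that is a small retry helper. This is a minimal sketch, not part of the package: import_with_retry and its parameters are hypothetical names, and it assumes the failure surfaces as an ordinary Python exception rather than a crash of the interpreter.

```python
import importlib
import time


def import_with_retry(module_name, attempts=3, delay_seconds=1.0):
    """Import a module by name, retrying a few times if the import raises."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return importlib.import_module(module_name)
        except Exception as error:
            # Start-up failures may surface as exceptions other than ImportError,
            # so every ordinary exception is treated as "try again".
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise ImportError(
        f"Could not import {module_name!r} after {attempts} attempts"
    ) from last_error


# Hypothetical usage with this package:
# wrapper = import_with_retry("deseq2_wrapper")
# wrapper.initialize()
```

If the import never succeeds, the helper raises ImportError with the last underlying error attached, which is usually the most useful thing to include in a bug report.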

Data Format Requirements

Count Matrix Format:

  • First column: Gene IDs
  • Subsequent columns: Sample expression counts
  • Column names must match sample names in metadata

Metadata Format:

  • Must contain a 'sample' column with sample names matching count matrix columns
  • Must contain a 'group' column defining experimental conditions
  • Additional columns can include other experimental factors
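
The format requirements above can be checked with pandas before calling make_dds. The following is a minimal sketch under those stated requirements; check_inputs is a hypothetical helper, not a function of this package, and the duplicate gene ID check is an extra assumption that the requirements do not state.

```python
import pandas as pd


def check_inputs(df_counts, meta_tags):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []

    # The metadata must provide both required columns.
    for column in ("sample", "group"):
        if column not in meta_tags.columns:
            problems.append(f"metadata is missing the '{column}' column")

    # Sample names must agree between the two tables.
    if "sample" in meta_tags.columns:
        count_samples = set(df_counts.columns[1:])  # first column holds gene IDs
        meta_samples = set(meta_tags["sample"])
        missing_in_meta = sorted(str(s) for s in count_samples - meta_samples)
        missing_in_counts = sorted(str(s) for s in meta_samples - count_samples)
        if missing_in_meta:
            problems.append(f"count columns not in metadata: {missing_in_meta}")
        if missing_in_counts:
            problems.append(f"metadata samples not in counts: {missing_in_counts}")

    # Assumption: gene IDs should be unique (not stated in the requirements).
    if df_counts.iloc[:, 0].duplicated().any():
        problems.append("gene IDs in the first column are not unique")

    return problems
```

Calling check_inputs(df_counts, meta_tags) right after loading the Excel files and stopping when the list is non-empty tends to give clearer messages than an error raised later on the R side.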

Features

  • Differential Expression Analysis: Identifies significantly regulated genes using DESeq2
  • GO Enrichment Analysis: Performs Gene Ontology enrichment with EPRN/EPRI metrics
  • BioMart Integration: Automatically annotates genes with descriptions and symbols
  • Excel Integration: Reads input from and writes results to Excel files
  • Comprehensive Error Handling: Provides clear error messages for troubleshooting

Functions

check_r_libraries()

Verifies that all required R packages are installed and provides installation instructions if needed.

make_dds(df_counts, meta_tags)

Creates a DESeq2 dataset from count matrix and metadata.

compare_filter_annot(dds, grouping_var_name, group_test, group_base, treatment, ...)

Performs differential expression analysis with customizable filtering thresholds.
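
To illustrate what threshold filtering of this kind usually means, here is a pandas sketch applied to a table with the standard DESeq2 result columns baseMean, padj and log2FoldChange. It is an assumption about the general technique, not the package's actual implementation: filter_degs is a hypothetical name, and whether the package uses inclusive comparisons or the absolute value of the fold change is not documented here.

```python
import pandas as pd


def filter_degs(results, min_base_mean=10, max_padj=0.05, min_abs_log2fc=1):
    """Keep rows that pass expression, significance and effect-size thresholds."""
    keep = (
        (results["baseMean"] >= min_base_mean)
        & (results["padj"] <= max_padj)  # rows with a missing padj compare as False
        & (results["log2FoldChange"].abs() >= min_abs_log2fc)
    )
    return results[keep].reset_index(drop=True)
```

Taking the absolute value keeps both up- and down-regulated genes. Rows with a missing padj are dropped because a comparison against NaN is False.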

GO_from_DEGs(df_deg, write_df=False)

Performs GO enrichment analysis on differentially expressed genes.

Output Files

The package generates Excel files with standardized naming conventions:

  • Differential expression results: df_deg_{group_test}_vs_{group_base}.xlsx
  • GO enrichment results: GO_df_deg_{group_test}_vs_{group_base}.xlsx
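
The naming patterns above can be reproduced in Python, for example to locate the result files after a run. This is a sketch that only mirrors the two documented patterns; output_file_names is a hypothetical helper, and the directory the package writes to is not specified here.

```python
def output_file_names(group_test, group_base):
    """Build the documented file names for one comparison."""
    deg_name = f"df_deg_{group_test}_vs_{group_base}.xlsx"
    go_name = f"GO_{deg_name}"
    return deg_name, go_name


print(output_file_names("treated", "control"))
# ('df_deg_treated_vs_control.xlsx', 'GO_df_deg_treated_vs_control.xlsx')
```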

Citation

If you use this package in your research, please cite:

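  • DESeq2: Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15:550
  • clusterProfiler: Yu G, Wang LG, Han Y, He QY (2012). clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology 16(5):284-287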

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Known Issues and Troubleshooting

Windows Users

On Windows systems with Rtools installed, you may see warnings about PATH being redefined. The package tries to suppress these warnings; any that still appear are harmless and can be ignored.

If you encounter "access violation" errors on import, call initialize() explicitly after the import instead of relying on automatic initialization.

Initialization Required

Starting from version 0.1.0, you must call initialize() before using any analysis functions. This prevents conflicts on some systems:

from deseq2_wrapper import initialize
initialize()  # Call this once before using other functions

Support

For issues and questions, please use the GitHub issue tracker.
