A package to generate a matrix for chord diagrams

Project description

chord-matrix

Overview

chord-matrix is a Python package that processes gene expression data and generates a co-occurrence matrix. The matrix can then be used to draw chord diagrams that show how often mutations in different genes occur together.

Features

  • Data Processing: Reads gene expression data from a TSV file and filters it based on specific criteria.
  • Co-occurrence Matrix: Creates a matrix showing the co-occurrence of gene mutations across samples.
  • Details Calculation: Computes various statistics, including odds ratios and tendencies for mutual exclusivity or co-occurrence.
  • Output: Generates a detailed output file including the co-occurrence matrix and the computed statistics.

Installation

Prerequisites

Ensure you have Python 3.6 or higher installed on your system.

Using pip

You can install chord-matrix using pip:

pip install chord-matrix

The logic behind the package

Data Reading and Filtering

The input data file is read and filtered based on a specific status (RMG_53). Only the necessary columns (UPN, AF, and SYMBOL) are retained, and duplicates are removed. Customize the filtering according to your requirements.
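The reading and filtering step could look roughly like the sketch below. It uses pandas and is not taken from the package source: the function name `load_and_filter`, the `status_col` parameter, and the `STATUS` column in the sample data are hypothetical, since the text does not say which column holds the RMG_53 status.

```python
import io

import pandas as pd


def load_and_filter(source, status_col, status_value):
    """Read a TSV, keep rows matching a status, and retain UPN, AF, SYMBOL."""
    df = pd.read_csv(source, sep="\t")
    # Keep only rows whose status column equals the requested value.
    df = df[df[status_col] == status_value]
    # Retain the three columns named in the text and drop exact duplicates.
    return df[["UPN", "AF", "SYMBOL"]].drop_duplicates().reset_index(drop=True)


# Tiny in-memory TSV; the "STATUS" column name is a placeholder (assumption).
example_tsv = (
    "UPN\tAF\tSYMBOL\tSTATUS\n"
    "P1\t0.40\tTP53\tRMG_53\n"
    "P1\t0.40\tTP53\tRMG_53\n"
    "P2\t0.10\tDNMT3A\tRMG_53\n"
    "P3\t0.25\tTP53\tOTHER\n"
)
filtered = load_and_filter(io.StringIO(example_tsv), "STATUS", "RMG_53")
print(filtered)
```

In this sample the duplicated P1 row collapses to one and the P3 row is dropped because its status differs, leaving two rows.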

Matrix Creation

The filtered data is used to create a pivot table in which rows represent gene symbols and columns represent patient IDs (UPN). A 1 indicates that the gene is mutated in that patient, and a 0 indicates that it is not.
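One possible way to build such a binary gene-by-patient table with pandas is sketched here. The helper name `build_gene_patient_matrix` is hypothetical, and `pd.crosstab` is only one of several ways to produce the pivot; the package may do it differently.

```python
import pandas as pd


def build_gene_patient_matrix(df):
    """Return a 0/1 table with gene symbols as rows and patient IDs as columns."""
    # One row per (gene, patient) pair so repeated mutations count once.
    pairs = df[["SYMBOL", "UPN"]].drop_duplicates()
    counts = pd.crosstab(pairs["SYMBOL"], pairs["UPN"])
    # Convert counts to presence (1) / absence (0).
    return (counts > 0).astype(int)


# Illustrative data only.
df = pd.DataFrame(
    {
        "UPN": ["P1", "P1", "P2", "P3"],
        "SYMBOL": ["TP53", "DNMT3A", "TP53", "NPM1"],
    }
)
matrix = build_gene_patient_matrix(df)
print(matrix)
```

Here the TP53 row holds 1 for P1 and P2 and 0 for P3, because TP53 is mutated in the first two patients only.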

Why use the dot product?

To generate the co-occurrence matrix, we multiply the gene-patient matrix by its transpose (a matrix dot product). Each entry of the result counts how many patients carry mutations in both genes of a pair.

The dot product of two binary vectors (gene presence/absence vectors) results in a scalar value representing the number of times both genes are mutated in the same patient. By performing this operation for each pair of genes, we can construct a full co-occurrence matrix that quantifies the co-occurrences of all gene pairs across the dataset.
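The multiplication described above can be illustrated with a small NumPy sketch. The gene names and the 0/1 values are made-up example data, not output of the package.

```python
import numpy as np

genes = ["TP53", "DNMT3A", "NPM1"]

# Rows are genes, columns are four patients; 1 means the gene is mutated.
gene_patient = np.array(
    [
        [1, 1, 0, 1],  # TP53
        [1, 0, 0, 1],  # DNMT3A
        [0, 1, 1, 0],  # NPM1
    ]
)

# Entry (i, j) is the dot product of row i and row j, i.e. the number of
# patients in which both gene i and gene j are mutated.
cooccurrence = gene_patient @ gene_patient.T
print(cooccurrence)
```

The result is symmetric. Its diagonal holds the number of patients with each gene mutated (3, 2, and 2 here), and the off-diagonal entry for TP53 and DNMT3A is 2 because both are mutated in the first and fourth patients.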

Details Calculation

The package calculates several statistics for each pair of genes, including:

  • Both: Number of samples with both genes mutated.
  • A Not B: Number of samples with only gene A mutated.
  • B Not A: Number of samples with only gene B mutated.
  • Neither: Number of samples with neither gene mutated.
  • Log2 Odds Ratio: Log2 of the odds ratio indicating the tendency for co-occurrence or mutual exclusivity.
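The statistics listed above can be sketched for a single gene pair as follows. This is an illustration under stated assumptions, not the package's implementation: the function name `pair_statistics` is hypothetical, the odds ratio is taken as (Both × Neither) / (A Not B × B Not A), and the `pseudocount` parameter (added to every cell to avoid division by zero) is an assumption that the source does not mention.

```python
import math


def pair_statistics(a, b, pseudocount=0.5):
    """Contingency counts and log2 odds ratio for two 0/1 mutation vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    a_not_b = sum(1 for x, y in zip(a, b) if x and not y)
    b_not_a = sum(1 for x, y in zip(a, b) if y and not x)
    neither = sum(1 for x, y in zip(a, b) if not x and not y)

    # Assumed formula; the pseudocount is a hypothetical safeguard.
    # With pseudocount=0 an empty off-diagonal cell raises ZeroDivisionError.
    odds_ratio = ((both + pseudocount) * (neither + pseudocount)) / (
        (a_not_b + pseudocount) * (b_not_a + pseudocount)
    )
    return {
        "Both": both,
        "A Not B": a_not_b,
        "B Not A": b_not_a,
        "Neither": neither,
        "Log2 Odds Ratio": math.log2(odds_ratio),
    }


# Six illustrative patients; no cell is empty, so no pseudocount is needed.
gene_a = [1, 1, 1, 0, 0, 0]
gene_b = [1, 1, 0, 1, 0, 0]
stats = pair_statistics(gene_a, gene_b, pseudocount=0.0)
print(stats)
```

With this formula a positive log2 odds ratio points toward co-occurrence and a negative one toward mutual exclusivity. In the example the counts are 2, 1, 1, and 2, so the odds ratio is 4 and its log2 is 2.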

Download files

Source Distribution

chord_matrix-0.1.1.tar.gz (11.6 kB view details)

Uploaded Source

Built Distribution

chord_matrix-0.1.1-py3-none-any.whl (5.1 kB view details)

Uploaded Python 3

File details

Details for the file chord_matrix-0.1.1.tar.gz.

File metadata

  • Download URL: chord_matrix-0.1.1.tar.gz
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for chord_matrix-0.1.1.tar.gz
  • SHA256: 2b4355d6ab2cb735d7cc68e43691d9c7c9688cadfa82515add488ea73884b79f
  • MD5: b105bbd14c3260ac13b4868ec91d8250
  • BLAKE2b-256: 60d5fb5e769e4e8598fb8224c4376ac2d3924c866940f6439a17610695f2af34

File details

Details for the file chord_matrix-0.1.1-py3-none-any.whl.

File hashes

Hashes for chord_matrix-0.1.1-py3-none-any.whl
  • SHA256: 5baac855c9fb319811f70d48513eda5d5a359a2c7faa3a0d5d869572fc83107d
  • MD5: f4d70b2630e7fce4cebcd3aa414a71de
  • BLAKE2b-256: fdcc7161cb20422cdd0f046ac7c2a3b3d1c3ed62316beb9817fbb4cb265a0822
