SeqData: a new data type for RNA-seq data analysis

ernav2_seqdata

SeqData is a new data type designed for RNA-seq data analysis. It integrates data from various sources and dimensions:

RNA expression is measured by the read counts of transcripts. A typical mRNA-seq bioinformatics pipeline determines the read counts (RC) of transcripts. The RCs typically form a 2-D table in which samples are in rows and transcripts (or genes) are in columns, or the reverse. The RC table is then normalized, for example as FPM or FPKM, using a chosen normalization method. Next, confounding factors among samples are removed using a method such as DESeq2 or edgeR. The data may also be transformed into other tables (for example, log-transformed) or partitioned into subsets. A bioinformatician has to manage all of these data sets during statistical analysis.
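The normalization and transformation steps described above can be sketched with pandas. This is only an illustration and does not use the rnaseqdata API: the count table, the gene lengths, and all variable names are made-up assumptions, and FPM/FPKM are computed with the usual per-million and per-kilobase scaling.

```python
import numpy as np
import pandas as pd

# Hypothetical read counts: samples in rows, genes in columns.
rc = pd.DataFrame(
    [[10, 0, 90], [20, 30, 50]],
    index=["sample1", "sample2"],
    columns=["geneA", "geneB", "geneC"],
)
# Hypothetical gene lengths in base pairs (needed for FPKM only).
gene_length_bp = pd.Series({"geneA": 1000, "geneB": 2000, "geneC": 500})

# FPM: scale each sample (row) so that its counts sum to one million.
library_size = rc.sum(axis=1)
fpm = rc.div(library_size, axis=0) * 1e6

# FPKM: additionally divide each gene (column) by its length in kilobases.
fpkm = fpm.div(gene_length_bp / 1e3, axis=1)

# Log transform with a pseudocount of 1 so that zero counts stay finite.
log_fpm = np.log2(fpm + 1)
```

Each of `rc`, `fpm`, `fpkm`, and `log_fpm` is a separate table with the same shape, which is the kind of bookkeeping a bioinformatician otherwise has to do by hand.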

Biological scientists may care more about the significant results of mRNA-seq data analysis and what they reveal. In that case, sample information, patient information, or features of samples (for example, single cells) must be considered. Moreover, aside from the transcript ID or gene ID, other annotations are integrated as well, for example genomic annotations such as chromosome locus and protein annotations such as domain identification. These annotation data may not be used in the statistical process, but they are needed for further study.
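As a rough illustration of this kind of integration, the sketch below joins sample information and gene annotations onto an expression table with plain pandas. It does not show how SeqData stores these data; the tables, column names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical normalized expression: samples in rows, genes in columns.
expr = pd.DataFrame(
    [[1.5, 0.0], [2.5, 4.0]],
    index=["sample1", "sample2"],
    columns=["geneA", "geneB"],
)
# Hypothetical sample (phenotype) information, indexed by sample ID.
samples = pd.DataFrame(
    {"group": ["control", "treated"]},
    index=["sample1", "sample2"],
)
# Hypothetical gene annotations, indexed by gene ID.
genes = pd.DataFrame(
    {"chromosome": ["chr1", "chrX"]},
    index=["geneA", "geneB"],
)

# Attach sample information to each row of the expression table.
by_sample = samples.join(expr)

# Attach gene annotations to each gene; transpose so genes become rows.
by_gene = genes.join(expr.T)
```

The annotation columns play no part in the statistics, but keeping them aligned with the expression values makes follow-up questions (for example, "which chromosome are the top genes on?") easy to answer.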

SeqData is a tree structure. The root contains phenotype and annotation data. Each node contains various attributes, including X (an m x n matrix) and var (statistical aggregations). Nodes inherit the attributes of the root node. The data of a child node is determined by the data of its parent node.
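The tree idea can be pictured with a small stand-alone sketch. The classes below (`Root`, `Node`, `derive`) are hypothetical names chosen for illustration and are not the actual rnaseqdata classes; they only show a root holding shared phenotypes and annotations, and child nodes whose X is computed from the parent's X.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

import numpy as np


@dataclass(eq=False)
class Root:
    """Holds data shared by every node (hypothetical layout)."""
    phenotypes: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)


@dataclass(eq=False)
class Node:
    """One node of the tree with its own m x n matrix X (hypothetical layout)."""
    name: str
    X: np.ndarray
    root: Root
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def derive(self, name: str, func: Callable[[np.ndarray], np.ndarray]) -> "Node":
        """Create a child whose X is computed from this node's X."""
        child = Node(name, func(self.X), self.root, parent=self)
        self.children.append(child)
        return child


root = Root(phenotypes={"sample1": "control"}, annotations={"geneA": "chr1"})
raw = Node("raw", np.array([[3.0, 1.0], [0.0, 7.0]]), root)
# The child node's data is determined by the parent's data.
logged = raw.derive("log2", lambda x: np.log2(x + 1))
```

Because every node keeps a reference to the same root object, phenotypes and annotations are stored once and remain available from any derived table.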

Installation

It is convenient to install the package using pip. The package can be found on the Python Package Index.

pip install --upgrade rnaseqdata

Development

git clone git@github.com:Tiezhengyuan/ernav2_seqdata.git
cd ernav2_seqdata

Create a virtual environment

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Unit testing

pytest tests/unittests

Quick tutorial

In Python 3:

from rnaseqdata import RootData, SeqData
import numpy as np
import pandas as pd

Create SeqData

root = RootData()
c = SeqData(root)
c.put_data('test', np.eye(3), root)
c.to_df('test')
      0    1    2
 0  1.0  0.0  0.0
 1  0.0  1.0  0.0
 2  0.0  0.0  1.0
