Package for estimating UMI counts in Transcript Tag Counting data.
# umis
**umis** provides tools for estimating expression in RNA-Seq data from protocols
that sequence the end tags of transcripts and incorporate molecular tags to
correct for amplification bias.
There are three steps in this process.
1. Formatting reads
2. Pseudomapping to cDNAs
3. Counting molecular identifiers
## 1. Formatting reads
We want to strip all non-biological segments from the sequenced reads for
the sake of mapping, while keeping this information for later use. Here,
non-biological information means the Cellular Barcode (CB) and the Molecular
Barcode (MB). So that the optional CB and the MB can be extracted later, they
are put in the read header, in the following format.

    @HWI-ST808:130:H0B8YADXX:1:1101:2088:2222:CELL_GGTCCA:UMI_CCCT
    AGGAAGATGGAGGAGAGAAGGCGGTGAAAGAGACCTGTAAAAAGCCACCGN
    +
    @@@DDBD>=AFCF+<CAFHDECII:DGGGHGIGGIIIEHGIIIGIIDHII#

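
To illustrate how the tags can be recovered later, here is a minimal sketch that parses a header in the layout shown above. It assumes the `CELL_` and `UMI_` tags always sit at the end of the header in that order; the function name `parse_header` is made up for this example and is not part of **umis**.

```python
import re

# Header layout from the example above: the original read name, then an
# optional CELL_ tag and a UMI_ tag, all separated by colons.
HEADER_RE = re.compile(
    r"^@?(?P<name>.+?)(?::CELL_(?P<cell>[ACGTN]+))?:UMI_(?P<umi>[ACGTN]+)$"
)

def parse_header(header):
    """Return (read name, cellular barcode or None, molecular barcode)."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("header does not carry a UMI tag: %r" % header)
    return match.group("name"), match.group("cell"), match.group("umi")

name, cell, umi = parse_header(
    "@HWI-ST808:130:H0B8YADXX:1:1101:2088:2222:CELL_GGTCCA:UMI_CCCT"
)
print(name, cell, umi)
```

Since the CB is optional, `cell` is `None` for headers that only carry a `UMI_` tag.
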
The command `umis fastqtransform` transforms a read (or a pair of reads) into
this format based on a _transform file_. The transform file is a JSON file
which has a Python-flavored regular expression for each read, written to extract
the necessary components of the reads.
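
As a rough sketch of the idea (not the actual `umis fastqtransform` implementation), the snippet below loads a regular expression from a JSON document and uses its named groups to move the barcodes into the header. The key `read1`, the group names (`name`, `CB`, `MB`, `seq`, `qual`) and the barcode layout (a 6-base CB followed by a 4-base MB at the start of the read) are assumptions made for this example only.

```python
import json
import re

# Hypothetical transform document. Backslashes are doubled because the
# regular expression is stored inside a JSON string.
TRANSFORM_JSON = r'''
{"read1": "@(?P<name>[^\\s]+).*\\n(?P<CB>[ACGTN]{6})(?P<MB>[ACGTN]{4})(?P<seq>[ACGTN]*)\\n\\+.*\\n(?P<qual>.*)"}
'''

READ_RE = re.compile(json.loads(TRANSFORM_JSON)["read1"])

def transform(record):
    """Move CB and MB from the sequence of a FASTQ record into its header."""
    match = READ_RE.match(record)
    if match is None:
        raise ValueError("record does not match the transform pattern")
    # The barcodes are removed from the sequence, so drop their qualities too.
    skip = len(match.group("CB")) + len(match.group("MB"))
    header = "@{name}:CELL_{CB}:UMI_{MB}".format(**match.groupdict())
    return "\n".join([header, match.group("seq"), "+", match.group("qual")[skip:]])

record = "@read1 1:N:0\nGGTCCACCCTAGGAAGATGG\n+\nIIIIIIIIIIFFFFFFFFFF\n"
transformed = transform(record)
print(transformed)
```

Keeping the pattern in a separate file means a different barcode layout only needs a different regular expression, not different code.
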
## 2. Pseudomapping to cDNAs
This is done by a pseudoaligner, either Kallisto or RapMap. The SAM file output
by these tools needs to be saved.
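
The saved SAM output can be read with plain Python. The sketch below collects, for each read, the set of reference sequences it was assigned to, relying only on the standard SAM columns (read name in column 1, flag in column 2, reference name in column 3). It assumes that a read compatible with several cDNAs appears as several SAM records; the transcript names `tx1` and `tx2` and the function name `read_targets` are invented for the example.

```python
from collections import defaultdict

def read_targets(sam_lines):
    """Map each read name to the set of reference names it was assigned to."""
    targets = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line, not an alignment record
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":  # read is unmapped
            continue
        targets[qname].add(rname)
    return dict(targets)

sam_lines = [
    "@HD\tVN:1.0\n",
    "r1:CELL_GGTCCA:UMI_CCCT\t0\ttx1\t1\t255\t10M\t*\t0\t0\tAGGAAGATGG\tFFFFFFFFFF\n",
    "r1:CELL_GGTCCA:UMI_CCCT\t256\ttx2\t1\t255\t10M\t*\t0\t0\tAGGAAGATGG\tFFFFFFFFFF\n",
    "r2:CELL_GGTCCA:UMI_AAGT\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF\n",
]
hits = read_targets(sam_lines)
print(hits)
```
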
## 3. Counting molecular identifiers
The final step is to infer which cDNA was the origin of the tag a UMI was
attached to. We use the pseudoalignments to the cDNAs, and consider a tag
assigned to a cDNA as partial _evidence_ for a (cDNA, UMI) pairing. For the
actual counting, we only count unique UMIs for (gene, UMI) pairings with
sufficient evidence.
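
One way to picture the evidence-based counting is sketched below. This is an illustration under stated assumptions, not necessarily how **umis** computes it: a tag assigned to `n` cDNAs contributes `1/n` evidence to each, evidence is summed per (gene, UMI) pairing, and a pairing counts once its evidence reaches a threshold. The function `count_umis`, the `tx_to_gene` mapping and the default `min_evidence=1.0` are all hypothetical.

```python
from collections import defaultdict

def count_umis(read_hits, tx_to_gene, min_evidence=1.0):
    """Count unique UMIs per gene among pairings with enough evidence.

    read_hits: iterable of (umi, list of transcripts the tag was assigned to).
    """
    evidence = defaultdict(float)
    for umi, transcripts in read_hits:
        if not transcripts:  # tag was not assigned to any cDNA
            continue
        weight = 1.0 / len(transcripts)  # assumed: evidence is split evenly
        for tx in transcripts:
            evidence[(tx_to_gene[tx], umi)] += weight
    counts = defaultdict(int)
    for (gene, umi), support in evidence.items():
        if support >= min_evidence:
            counts[gene] += 1  # each (gene, UMI) pairing is counted once
    return dict(counts)

tx_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB", "tx4": "geneC"}
read_hits = [
    ("CCCT", ["tx1"]),         # two tags share a UMI: one molecule
    ("CCCT", ["tx1"]),
    ("AAGT", ["tx1", "tx2"]),  # ambiguous between cDNAs of the same gene
    ("GGGA", ["tx3", "tx4"]),  # ambiguous between genes: 0.5 evidence each
]
counts = count_umis(read_hits, tx_to_gene)
print(counts)
```

In this toy input, the duplicated `CCCT` tags collapse into one count, and the `GGGA` tag is dropped because neither gene accumulates enough evidence on its own.
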