Package for estimating UMI counts in Transcript Tag Counting data.

# umis

**Note: This tool works on heuristic counting. For more principled UMI quantification, see [Kallisto](https://github.com/pachterlab/kallisto).**
Some scripts in `umis` may still be useful for pre-processing and investigating UMI data, in particular the regex-based UMI extraction.

**umis** provides tools for estimating expression in RNA-Seq data from
protocols that sequence the end tags of transcripts and incorporate molecular
tags to correct for amplification bias.

There are three steps in this process.

1. Formatting reads
2. Pseudomapping to cDNAs
3. Counting molecular identifiers

## 1. Formatting reads

We want to strip out all non-biological segments of the sequenced reads for
the sake of mapping, while also keeping this information for later use. The
non-biological information we consider is the Cellular Barcode (CB) and the
Molecular Barcode (MB). So that the optional CB and the MB can be extracted
later, they are put in the read header, in the following format.

    @HWI-ST808:130:H0B8YADXX:1:1101:2088:2222:CELL_GGTCCA:UMI_CCCT
    AGGAAGATGGAGGAGAGAAGGCGGTGAAAGAGACCTGTAAAAAGCCACCGN
    +
    @@@DDBD>=AFCF+<CAFHDECII:DGGGHGIGGIIIEHGIIIGIIDHII#
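
As a minimal sketch of how a downstream script might recover the barcodes from such a header, assuming the `CELL_` and `UMI_` fields appear as colon-separated suffixes as in the record above. The helper name `parse_header` is hypothetical and not part of `umis`.

```python
import re

# Assumed layout: <name>:CELL_<barcode>:UMI_<barcode>, with the CELL_ field optional.
HEADER_RE = re.compile(
    r"^@?(?P<name>.+?)(?::CELL_(?P<CB>[ACGTN]+))?:UMI_(?P<MB>[ACGTN]+)$"
)

def parse_header(header):
    """Return (read name, cellular barcode or None, molecular barcode)."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"header lacks a UMI field: {header!r}")
    return match.group("name"), match.group("CB"), match.group("MB")

name, cb, mb = parse_header(
    "@HWI-ST808:130:H0B8YADXX:1:1101:2088:2222:CELL_GGTCCA:UMI_CCCT"
)
```

Because the cellular barcode is optional, `parse_header` returns `None` in its place when the header carries only a `UMI_` field.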

The command `umis fastqtransform` transforms a read (or a pair of reads) to
this format based on a _transform file_. The transform file is a JSON file
which has a Python-flavored regular expression for each read, written to
extract the necessary components of the reads.
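
To illustrate the idea, here is a sketch of a regex-based transform in plain Python. The pattern, the named groups (`name`, `CB`, `MB`, `seq`, `qual`) and the 6-base/4-base barcode layout are assumptions made up for this example; they are not taken from a transform file shipped with `umis`.

```python
import re

# Hypothetical read layout: 6-base cellular barcode, 4-base UMI, then cDNA.
READ_RE = re.compile(
    r"@(?P<name>[^\s]+).*\n"
    r"(?P<CB>.{6})(?P<MB>.{4})(?P<seq>.*)\n"
    r"\+.*\n"
    r"(?P<qual>.*)"
)

def transform(record):
    """Move the barcodes of one FASTQ record into its header."""
    match = READ_RE.match(record)
    if match is None:
        raise ValueError("record does not match the assumed layout")
    g = match.groupdict()
    # Trim the quality string so it stays aligned with the remaining bases.
    qual = g["qual"][len(g["CB"]) + len(g["MB"]):]
    return f"@{g['name']}:CELL_{g['CB']}:UMI_{g['MB']}\n{g['seq']}\n+\n{qual}\n"

raw = "@read1 1:N:0\nGGTCCACCCTAGGAAGATGG\n+\nIIIIIIIIIIFFFFFFFFFF\n"
formatted = transform(raw)
```

In a transform file, a pattern like this would be stored as a JSON string, so each backslash would need to be escaped.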

## 2. Pseudomapping to cDNAs

This is done by pseudoaligners, either Kallisto or RapMap. The SAM file output
from these tools needs to be saved.
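
For the counting step, only the read name (which carries the barcodes) and the target name of each pseudoalignment are needed. The sketch below pulls those two fields out of SAM text using the standard SAM column layout. The function name `iter_hits` and the target name `tx1` are hypothetical, and this is not necessarily how `umis` itself reads the file.

```python
def iter_hits(sam_lines):
    """Yield (read name, target name) for each mapped SAM record."""
    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":  # read is unmapped
            continue
        yield qname, rname

# Made-up records for illustration only.
sam = [
    "@SQ\tSN:tx1\tLN:1000\n",
    "r1:CELL_GGTCCA:UMI_CCCT\t0\ttx1\t1\t255\t10M\t*\t0\t0\tAGGAAGATGG\tFFFFFFFFFF\n",
    "r2:CELL_GGTCCA:UMI_AAAA\t4\t*\t0\t0\t*\t*\t0\t0\tAGGAAGATGG\tFFFFFFFFFF\n",
]
hits = list(iter_hits(sam))
```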

## 3. Counting molecular identifiers

The final step is to infer which cDNA was the origin of the tag a UMI was
attached to. We use the pseudoalignments to the cDNAs, and consider a tag
assigned to a cDNA as partial _evidence_ for a (cDNA, UMI) pairing. For
actual counting, we only count unique UMIs for (gene, UMI) pairings with
sufficient evidence.
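
A minimal sketch of this counting idea follows. It is an illustration, not the `umis` implementation: splitting each tag's evidence evenly across the genes it hits, the threshold of 1.0, and all names are assumptions chosen for the example.

```python
from collections import defaultdict

def count_umis(tag_hits, min_evidence=1.0):
    """Count unique UMIs per gene.

    tag_hits: iterable of (umi, genes) pairs, one per tag, where genes is
    the set of genes the tag pseudoaligned to.
    """
    evidence = defaultdict(float)
    for umi, genes in tag_hits:
        # Assumption: a tag hitting n genes gives 1/n evidence to each pairing.
        for gene in genes:
            evidence[(gene, umi)] += 1.0 / len(genes)

    counts = defaultdict(int)
    for (gene, umi), weight in evidence.items():
        if weight >= min_evidence:
            counts[gene] += 1  # each (gene, UMI) pairing is counted once
    return dict(counts)

tags = [
    ("CCCT", {"geneA"}),
    ("CCCT", {"geneA"}),           # duplicate of the same molecule
    ("AAAA", {"geneA", "geneB"}),  # ambiguous tag, 0.5 evidence each
    ("AAAA", {"geneB"}),
]
counts = count_umis(tags)
```

In this toy input the two `CCCT` tags collapse into a single count for `geneA`, while the `(geneA, AAAA)` pairing is dropped because its only support is half of an ambiguous tag.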
