
Lightweight utility for 10x single cell assays

Project description

umiread

umiread is a lightweight utility for parsing R1 FASTQ files generated by 10x single cell 3' gene expression assays. Its functions report baseline quality control metrics for the unique molecular identifiers (UMIs) in one or more R1 FASTQs, including the total number of unique and non-unique sequences, the nucleotide distribution, and basecalling errors. umiread can be used for preliminary parsing of 10x single cell FASTQ files prior to downstream analysis. For more information on 10x single cell 3' gene expression assays, visit https://support.10xgenomics.com/single-cell-gene-expression/sequencing/doc/specifications-sequencing-requirements-for-single-cell-3

Installation & Dependencies

umiread requires the following packages:

- pandas

- numpy

- matplotlib

umiread requires Python 3. Install it with pip from a terminal (for example, the terminal built into PyCharm or another Python IDE):

pip install umiread

Reading & Extracting from FASTQ files

umiread can parse one or more R1 FASTQ files from a 10x single cell 3' assay (any version).

For parsing a single FASTQ file:

from umiread import extract_from_single
umi_data = extract_from_single("path/to/FASTQ file/~.fastq.gz")
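umiread's own parsing code is not shown on this page. As a rough illustration of what reading an R1 FASTQ involves, the following sketch yields the sequence line of each record. The helper name iter_fastq_sequences is hypothetical and not part of umiread; it assumes standard, unwrapped 4-line FASTQ records (header, sequence, "+", quality).

```python
import gzip
from typing import Iterator


def iter_fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of every record in a FASTQ file.

    Hypothetical helper, not part of umiread. Assumes 4-line records with
    no line wrapping. Files ending in .gz are decompressed on the fly.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line (index 1) of each 4-line record.
            if line_number % 4 == 1:
                yield line.rstrip("\n")
```

Because it is a generator, a large gzipped FASTQ is streamed record by record instead of being loaded into memory at once.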

To parse multiple FASTQ files stored in the same folder (that is, multiple FASTQs belonging to the same sample):

from umiread import extract_from_folder
umi_data = extract_from_folder("path/to/FASTQ folder/")
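Parsing a folder amounts to repeating the per-file step for every matching file. This is a hypothetical sketch, not umiread's implementation: read_sequences_from_folder and its default glob pattern "*R1*.fastq*" are assumptions, chosen to match Illumina-style names such as sample_S1_L001_R1_001.fastq.gz while skipping R2 and index files.

```python
import glob
import gzip
import os
from typing import List


def read_sequences_from_folder(folder: str, pattern: str = "*R1*.fastq*") -> List[str]:
    """Collect read sequences from every matching FASTQ file in one folder.

    Hypothetical sketch; the default pattern is an assumption about file naming.
    """
    sequences: List[str] = []
    # sorted() gives a deterministic file order (e.g. lane L001 before L002).
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            for line_number, line in enumerate(handle):
                # Keep only the sequence line of each 4-line FASTQ record.
                if line_number % 4 == 1:
                    sequences.append(line.rstrip("\n"))
    return sequences
```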

The parser collects the 10bp UMI at the beginning of each R1 sequence in the targeted FASTQ files. Store the parsed sequences in an object such as umi_data so that it can be passed to the quality control functions described in the next section.
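Extracting a UMI is a fixed-position slice of each read. In this sketch the offsets are parameters because they are assumptions: the text above places a 10bp UMI at the start of R1 (umi_start=0), whereas 10x's published Read 1 layout for Single Cell 3' v2 is a 16bp cell barcode followed by a 10bp UMI (umi_start=16), and v3 uses a 12bp UMI after the same barcode. extract_umis is a hypothetical name, not a umiread function.

```python
from typing import Iterable, List


def extract_umis(sequences: Iterable[str], umi_start: int = 0, umi_length: int = 10) -> List[str]:
    """Slice a fixed-length UMI out of each R1 read (hypothetical helper)."""
    umis: List[str] = []
    for seq in sequences:
        umi = seq[umi_start:umi_start + umi_length]
        # Skip reads too short to contain a full UMI at the given offset.
        if len(umi) == umi_length:
            umis.append(umi)
    return umis
```

For v2-style reads, extract_umis(reads, umi_start=16) would skip the 16bp barcode; for v3, extract_umis(reads, umi_start=16, umi_length=12).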

Collecting Quality Control Metrics & Sequencing Errors

Knowing the total number of UMI sequences, the number of unique UMI sequences, and the nucleotide distribution helps track the quality of wet lab workflows and overall data quality before downstream applications and further analysis. umiread provides a handful of functions that print these statistics to the console.

For basic statistics (the total number of UMIs, the number of unique UMIs, the percentage of UMIs that are unique, and the number of UMIs containing a sequencing error), use the following command:

from umiread import UMIStats
UMIStats(umi_data).collect_qc_statistics()

This function outputs all of the QC metrics listed above as a simple CSV.
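The metrics listed above can be computed from a plain list of UMI strings. This sketch is not umiread's code: qc_statistics and statistics_to_csv are hypothetical names, "unique" is assumed to mean distinct sequences, and a UMI is counted as having a sequencing error if it contains at least one N.

```python
import csv
import io
from collections import Counter
from typing import Dict, List


def qc_statistics(umis: List[str]) -> Dict[str, float]:
    """Compute basic UMI QC metrics (hypothetical sketch)."""
    total = len(umis)
    # Assumed definition: "unique" = number of distinct UMI sequences.
    unique = len(Counter(umis))
    # A UMI with at least one N (no-call) is counted as a sequencing error.
    with_error = sum(1 for umi in umis if "N" in umi)
    return {
        "total_umis": total,
        "unique_umis": unique,
        "percent_unique": 100.0 * unique / total if total else 0.0,
        "umis_with_N": with_error,
    }


def statistics_to_csv(stats: Dict[str, float]) -> str:
    """Render the metrics as a two-column CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["metric", "value"])
    for name, value in stats.items():
        writer.writerow([name, value])
    return buffer.getvalue()
```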

For nucleotide distributions (i.e. the number of A, C, T, and G bases across all unique UMIs):

UMIStats(umi_data).base_distribution()
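Following the description above, a base distribution deduplicates the UMIs first and then tallies each nucleotide. The function name base_distribution here is a hypothetical stand-in, not umiread's method; characters other than A, C, G, and T (such as N) are simply left out of the result in this sketch.

```python
from collections import Counter
from typing import Dict, Iterable


def base_distribution(umis: Iterable[str]) -> Dict[str, int]:
    """Count A, C, G, T across distinct UMI sequences (hypothetical sketch)."""
    # Deduplicate first: bases are counted over unique UMIs only.
    counts: Counter = Counter()
    for umi in set(umis):
        counts.update(umi)  # Counter.update on a string counts each character
    # Report the four bases in a fixed order; other characters are ignored.
    return {base: counts.get(base, 0) for base in "ACGT"}
```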

For a graph of the positional distribution of sequencing errors in the UMI (denoted by "N" in the FASTQ file):

UMIStats(umi_data).show_seq_errors()
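A positional error profile is a per-position tally of N calls. This sketch produces the numbers such a graph would show; the name n_positions, the 10bp default length, and counting over all UMIs (not only unique ones) are assumptions rather than umiread's documented behavior.

```python
from typing import Iterable, List


def n_positions(umis: Iterable[str], umi_length: int = 10) -> List[int]:
    """Count, for each UMI position, how many UMIs have an N there."""
    counts = [0] * umi_length
    for umi in umis:
        # Only the first umi_length bases are considered.
        for position, base in enumerate(umi[:umi_length]):
            if base == "N":
                counts[position] += 1
    return counts
```

The resulting list could then be drawn with matplotlib, for example matplotlib.pyplot.bar(range(1, 11), n_positions(umis)), giving one bar per UMI position.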


Download files


Source Distribution

umiread-1.9.1.tar.gz (4.4 kB)

Uploaded Source

Built Distribution


umiread-1.9.1-py3-none-any.whl (5.9 kB)

Uploaded Python 3

File details

Details for the file umiread-1.9.1.tar.gz.

File metadata

  • Download URL: umiread-1.9.1.tar.gz
  • Upload date:
  • Size: 4.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.13.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.6

File hashes

Hashes for umiread-1.9.1.tar.gz
Algorithm Hash digest
SHA256 1e3a42c0e1531960d2b9192b837d6de99ceb45543ea6332f9492f16a415f22b9
MD5 e28c2d007a70bf44d381a4ceb2a497f3
BLAKE2b-256 478595d7b8dfdf2ac1094d23422715c2451ed7197679cff63397ac1d8d4e56e8


File details

Details for the file umiread-1.9.1-py3-none-any.whl.

File metadata

  • Download URL: umiread-1.9.1-py3-none-any.whl
  • Upload date:
  • Size: 5.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.13.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.6

File hashes

Hashes for umiread-1.9.1-py3-none-any.whl
Algorithm Hash digest
SHA256 20e22a75dca04fd17cae87b06c80989545cf15aee989874e7eede33b5f60ff36
MD5 f00c4e4c9b008f60784115c2a6a51fd2
BLAKE2b-256 6f28bb3a1324be6d9f8ed7e0df1eb97bb1716640fc77e8ee50113f61b109b1d2

