
Lightweight utility for 10x single cell assays

Project description


umiread is a lightweight utility for parsing R1 FASTQ files generated by 10x single cell 3' gene expression assays. Its functions report baseline quality control metrics for the unique molecular identifiers (UMIs) contained in one or more R1 FASTQs, including the total number of UMIs, the number of unique UMIs, nucleotide distribution, and base-calling errors. umiread can be used for preliminary parsing of 10x single cell FASTQ files prior to downstream analysis. For more information on 10x single cell 3' gene expression assays, visit

Installation & Dependencies

umiread requires the following packages:

- pandas

- numpy

- matplotlib

umiread requires Python 3 or later. Install it with pip from a terminal (or your IDE's built-in terminal, e.g. in PyCharm):

pip install umiread

Reading & Extracting from FASTQ files

umiread can parse one or more R1 FASTQ files from a 10x 3' single cell assay (any version).

For parsing a single FASTQ file:

from umiread import extract_from_single
umi_data = extract_from_single("path/to/sample_R1_001.fastq.gz")

To parse multiple FASTQ files stored in the same folder (e.g., several R1 FASTQs from the same sample):

from umiread import extract_from_folder
umi_data = extract_from_folder("path/to/FASTQ folder/")
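How umiread discovers files inside the folder is not documented here. As a hedged illustration of what collecting R1 files from a folder can look like, the sketch below lists R1 FASTQs with pathlib. The helper name `find_r1_fastqs` is hypothetical, and it assumes the standard Illumina naming pattern (e.g. `SAMPLE_S1_L001_R1_001.fastq.gz`), where `_R1_` marks read 1 and lane-split files differ only in the `L00x` field.

```python
from pathlib import Path


def find_r1_fastqs(folder):
    """Return the R1 FASTQ files in `folder`, sorted by name.

    Hypothetical helper: assumes Illumina-style file names such as
    SAMPLE_S1_L001_R1_001.fastq.gz, where "_R1_" marks read 1.
    """
    folder = Path(folder)
    # Match both gzipped and uncompressed FASTQs carrying the _R1_ tag;
    # I1/I2 index reads and R2 reads are skipped.
    patterns = ("*_R1_*.fastq.gz", "*_R1_*.fastq")
    files = {path for pattern in patterns for path in folder.glob(pattern)}
    return sorted(files)
```

Sorting by name keeps lane-split files in lane order, so their UMIs are read back in a predictable sequence.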

The parser collects the 10 bp UMI from each R1 sequence in the targeted FASTQ files. Assigning the parsed sequences to an object such as umi_data lets you generate quality control metrics from them.
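To show what UMI extraction involves, here is a minimal sketch that streams an R1 FASTQ and slices the UMI out of every sequence line. It is not umiread's implementation; `read_umis` and its parameters are hypothetical. Its defaults assume the 10x 3' v2 read layout, in which R1 holds a 16 bp cell barcode followed by a 10 bp UMI (v3 chemistry uses a 12 bp UMI, so `umi_length=12` would apply there). Adjust `umi_start` if your reads are laid out differently.

```python
import gzip
from pathlib import Path


def read_umis(fastq_path, umi_start=16, umi_length=10):
    """Yield the UMI substring from every read in an R1 FASTQ file.

    Assumption: 10x 3' v2 layout, i.e. a 16 bp cell barcode followed by a
    10 bp UMI at the start of R1. Both offsets are parameters.
    """
    path = Path(fastq_path)
    # Open gzipped and plain-text FASTQs the same way, in text mode.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # FASTQ records are 4 lines: header, sequence, "+", qualities.
            # The sequence is the second line of each record.
            if line_number % 4 == 1:
                yield line.strip()[umi_start:umi_start + umi_length]
```

Because `read_umis` is a generator, the file is read one record at a time, so memory use stays flat even for large gzipped FASTQs; wrap the call in `list(...)` to materialize the UMIs.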

Collecting Quality Control Metrics & Sequencing Errors

Knowing the total number of UMI sequences, the number of unique UMI sequences, and the nucleotide distribution is useful for tracking the quality of wet lab workflows and overall data quality before downstream analysis. umiread provides a handful of functions that report these statistics.

For basic statistics (total number of UMIs, number of unique UMIs, percentage of UMIs that are unique, and number of UMIs containing a sequencing error), use the UMIStats function:

from umiread import UMIStats

This function outputs a simple CSV containing all of the QC metrics above.
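The exact call signature of UMIStats is not shown above, so the following is only a sketch of how these metrics can be computed from a list of UMI strings with the standard library. The names `umi_summary` and `write_summary_csv` are hypothetical, and the sketch assumes a "sequencing error" means a UMI containing at least one `N` base call.

```python
import csv


def umi_summary(umis):
    """Compute basic UMI QC metrics from an iterable of UMI strings.

    Assumption: a UMI with at least one "N" base call counts as an error.
    """
    umis = list(umis)
    total = len(umis)
    unique = len(set(umis))
    return {
        "total_umis": total,
        "unique_umis": unique,
        # Guard against division by zero for an empty input.
        "percent_unique": 100.0 * unique / total if total else 0.0,
        "umis_with_n": sum("N" in umi for umi in umis),
    }


def write_summary_csv(summary, csv_path):
    """Write the metrics as a one-row CSV with a header line."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(summary))
        writer.writeheader()
        writer.writerow(summary)
```

Keeping the calculation separate from the CSV writing makes the metrics easy to inspect interactively before anything is saved.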

For nucleotide distributions (i.e., the number of A, C, G, and T bases across all unique UMIs):
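One possible way to compute this distribution, sketched here under the assumption that the UMIs are available as plain strings, is to deduplicate them and tally bases with `collections.Counter`. `nucleotide_counts` is an illustrative name, not part of umiread.

```python
from collections import Counter


def nucleotide_counts(umis):
    """Count A, C, G and T across the set of unique UMIs."""
    counts = Counter()
    # Deduplicate first so each distinct UMI contributes exactly once.
    for umi in set(umis):
        counts.update(umi)  # Counter.update on a string tallies characters
    # Report the four bases in a fixed order; absent bases count as 0.
    return {base: counts[base] for base in "ACGT"}
```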


For a graph of the positional distribution of sequencing errors in the UMI (denoted by "N" in the FASTQ file):
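As a sketch of the idea, assuming the UMIs are already available as strings, the hypothetical helper `n_counts_by_position` tallies how many UMIs carry an `N` at each position, and `plot_n_positions` draws those tallies as a bar chart with matplotlib. Neither function is part of umiread's documented API.

```python
def n_counts_by_position(umis, umi_length=10):
    """Return a list whose i-th entry counts UMIs with "N" at position i."""
    counts = [0] * umi_length
    for umi in umis:
        # Only the first umi_length bases are considered.
        for position, base in enumerate(umi[:umi_length]):
            if base == "N":
                counts[position] += 1
    return counts


def plot_n_positions(counts, output_path="umi_n_positions.png"):
    """Save a bar chart of N counts per UMI position (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    positions = range(1, len(counts) + 1)  # 1-based positions for readability
    ax.bar(positions, counts)
    ax.set_xlabel("Position in UMI")
    ax.set_ylabel("UMIs with N")
    ax.set_title("Positional distribution of N base calls")
    fig.savefig(output_path)
    plt.close(fig)
```

Importing matplotlib inside the plotting function keeps the counting step usable even where matplotlib is not installed.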



Download files


Source Distribution

umiread-1.9.1.tar.gz (4.4 kB)

Built Distribution

umiread-1.9.1-py3-none-any.whl (5.9 kB)
