EZQC: Easy Quality Control for FastQ Files
Introduction
EZQC is a streamlined, terminal-based alternative to FastQC. Instead of generating individual report files per analysis, EZQC displays the analysis results, reasons, and suggestions directly in the terminal, making it easier to quickly assess the quality of multiple files. Additionally, EZQC generates figures for each analysis, providing a visual aid to spot potential issues for further examination.
EZQC is capable of performing the following analyses:
- Per Base Sequence Quality
- Per Sequence Quality Scores
- Per Base Sequence Content
- Per Sequence GC Content
- Per Base N Content
- Sequence Length Distribution
- Overrepresented Sequences
- Adapter Content
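Each of these analyses is computed from the reads' sequences and quality strings, so a natural first step is reading four-line FastQ records and decoding each quality character into a Phred score. The sketch below shows those steps plus a per-position mean quality, using only the standard library. It is not EZQC's code: `read_fastq`, `phred33` and `per_base_mean_quality` are hypothetical names, and the sketch assumes Phred+33 quality encoding and unwrapped records (one line each for header, sequence, `+` and quality).

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header without '@', sequence, quality) from four-line FastQ records.

    Assumption: records are unwrapped, i.e. exactly four lines per read."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines, e.g. a trailing empty line
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None:
            raise ValueError(f"truncated record starting at {header!r}")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string ('!' = 0, 'I' = 40) into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def per_base_mean_quality(records: Iterable[Tuple[str, str, str]]) -> List[float]:
    """Mean Phred score at each position, over the reads long enough to reach it."""
    sums: List[int] = []
    counts: List[int] = []
    for _, _, qual in records:
        for i, score in enumerate(phred33(qual)):
            if i == len(sums):  # first read to reach this position
                sums.append(0)
                counts.append(0)
            sums[i] += score
            counts[i] += 1
    return [s / n for s, n in zip(sums, counts)]
```

Opening `tests/SRR020182.fastq` and passing the file handle to `read_fastq`, then the result to `per_base_mean_quality`, gives one mean score per read position. A full report would usually show the spread of scores at each position (for example quartiles) rather than only the mean.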
Quick Start Guide
- Install EZQC by following the Installation section below.
- Run the tool on a toy example (a FastQ file from IGSR) using the command:
ezqc tests/SRR020182.fastq
- The results will be displayed in the terminal, and figures as well as CSV tables will be saved to a directory named ezqc_output in your current working directory. Note that this example file was chosen intentionally to fail multiple QC tests.
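The CSV tables written to ezqc_output can be loaded for further processing. The file names and layout inside that directory are not described here, so this sketch (a hypothetical `load_output_tables` helper, standard library only) simply collects every `.csv` file under the directory, recursively, and reads each one into a list of row dictionaries keyed by column header. It assumes every CSV has a header row.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_output_tables(out_dir: str = "ezqc_output") -> Dict[str, List[Dict[str, str]]]:
    """Read every CSV under out_dir (recursively) into row dicts, keyed by relative path.

    Assumption: each CSV starts with a header row. Values are left as strings."""
    root = Path(out_dir)
    tables: Dict[str, List[Dict[str, str]]] = {}
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="") as fh:
            # as_posix() keeps keys like "sub/table.csv" identical across platforms
            tables[path.relative_to(root).as_posix()] = list(csv.DictReader(fh))
    return tables
```

Because values come back as strings, numeric columns need converting (for example with `float`). Since pandas is already an EZQC dependency, calling `pandas.read_csv` on each path is an equally reasonable choice.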
Installation
You can install EZQC from source using the provided setup.py script. Here are the steps:
- Clone the repository:
git clone https://github.com/skysky2333/ezqc
- Navigate to the EZQC directory:
cd ezqc
- Install the package:
pip3 install .
Alternatively, run the setup script directly:
python setup.py install
EZQC requires Python 3.x and depends on the following packages, which will be installed automatically during setup:
- numpy
- matplotlib
- pandas
- scipy
- Bio
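If installation appears to succeed but `ezqc` then fails with an import error, a quick check of which dependencies are missing can help. This hypothetical helper uses `importlib.util.find_spec` to test each import name without importing the package. It assumes the `Bio` entry above refers to Biopython, whose import name is `Bio`, and it only checks that a module is available, not its version.

```python
import importlib.util
from typing import List, Sequence

# Import names of the dependencies listed above (assumption: "Bio" is Biopython).
REQUIRED_MODULES = ["numpy", "matplotlib", "pandas", "scipy", "Bio"]


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all dependencies found" if not missing else "missing: " + ", ".join(missing))
```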
Usage
After installation, you can use EZQC from the command line as follows:
ezqc <fastq file(s)>
Replace <fastq file(s)> with the path(s) to your FastQ files. To analyze multiple files, separate the file paths with spaces:
ezqc file1.fastq file2.fastq file3.fastq
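In a POSIX shell, `ezqc *.fastq` already expands to every matching file in the current directory. When EZQC is driven from a Python pipeline instead, the same argument list can be built explicitly. The `ezqc_command` helper below is hypothetical; it only assumes, as shown above, that `ezqc` is on the `PATH` and accepts any number of file paths.

```python
from pathlib import Path
from typing import List, Sequence


def ezqc_command(directory: str, patterns: Sequence[str] = ("*.fastq",)) -> List[str]:
    """Build an `ezqc` argument list for every file in `directory` matching `patterns`."""
    # A set avoids listing a file twice if several patterns match it; sorting keeps runs reproducible.
    files = sorted({str(p) for pattern in patterns for p in Path(directory).glob(pattern)})
    if not files:
        raise FileNotFoundError(f"no files matching {list(patterns)} in {directory!r}")
    return ["ezqc", *files]
```

With `subprocess` imported, `subprocess.run(ezqc_command("reads"), check=True)` would run EZQC once over every `.fastq` file in a hypothetical `reads/` directory. Passing a list rather than a shell string avoids quoting problems with unusual file names.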
Analysis Methods
Here's a brief description of the analyses performed by EZQC:
- Per Base Sequence Quality: Summarizes the quality scores of the base calls at each position across all reads.
- Per Sequence Quality Scores: Provides a histogram of quality scores over all sequences.
- Per Base Sequence Content: Analyzes the proportion of each base (A, T, G, C) at each position across all sequences.
- Per Sequence GC Content: Calculates the GC content in each sequence.
- Per Base N Content: Identifies sequences with a high proportion of unknown (N) bases.
- Sequence Length Distribution: Provides a histogram showing the distribution of sequence lengths.
- Overrepresented Sequences: Identifies any sequences that occur more often than expected.
- Adapter Content: Detects the presence of adapter sequences in the reads.
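To make several of the descriptions above concrete, here are minimal sketches of these metrics in the spirit of FastQC. They are illustrative, not EZQC's implementation: all function names are hypothetical, and thresholds and conventions are assumptions. In particular, `per_base_content` also reports the N percentage at each position (FastQC's per-position definition of per base N content, which may differ from EZQC's check), the 0.1% default for overrepresented sequences follows FastQC's warning level, and `AGATCGGAAGAG` (the Illumina universal adapter prefix) is used only as an example adapter.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def gc_content(seq: str) -> float:
    """GC percentage of one read, counting only A/C/G/T calls (N is ignored)."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / called if called else 0.0


def per_base_content(seqs: Sequence[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G, T and N among the reads that reach each position."""
    counts: List[Counter] = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    result = []
    for c in counts:
        total = sum(c.values())
        result.append({b: 100.0 * c[b] / total for b in "ACGTN"})
    return result


def length_distribution(seqs: Sequence[str]) -> Dict[int, int]:
    """Number of reads of each length, ordered by length."""
    return dict(sorted(Counter(len(s) for s in seqs).items()))


def overrepresented(seqs: Sequence[str], min_fraction: float = 0.001) -> List[Tuple[str, int, float]]:
    """Exact-duplicate sequences making up more than min_fraction of all reads.

    Assumed default: 0.001 (0.1%), FastQC's warning level."""
    total = len(seqs)
    if total == 0:
        return []
    return [(s, n, 100.0 * n / total)
            for s, n in Counter(seqs).most_common()
            if n / total > min_fraction]


def adapter_content(seqs: Sequence[str], adapter: str = "AGATCGGAAGAG") -> List[float]:
    """Cumulative percentage of reads whose first adapter match starts at or before each position.

    The default adapter is only an example (Illumina universal adapter prefix)."""
    max_len = max((len(s) for s in seqs), default=0)
    starts = [0] * max_len
    for s in seqs:
        pos = s.upper().find(adapter)
        if pos >= 0:
            starts[pos] += 1  # record where the first match begins
    cumulative, running = [], 0
    for count in starts:
        running += count
        cumulative.append(100.0 * running / len(seqs))
    return cumulative
```

Counting exact duplicates with `Counter` keeps the overrepresentation check simple; real tools often sample only part of a large file or truncate long reads before counting, to bound memory use.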
Contributing
We welcome contributions! Please see CONTRIBUTING.md for details on how to contribute.