A simple FASTQ reader / toolbox for small to medium size projects without dependencies.

A simple FASTQ toolbox for small to medium size projects without strange dependencies.

FASTQ files are text-based files for storing nucleotide sequences and their corresponding quality scores. Reading such files is not particularly difficult, yet most off-the-shelf packages are overloaded with strange dependencies.

fastq offers an alternative and brings many useful functions without relying on third-party packages.

Installation

Using pip / pip3:

pip install fastq

Or from source:

git clone git@github.com:not-a-feature/fastq.git
cd fastq
pip install .

How to use

fastq offers easy-to-use functions for handling FASTQ files. The main parts are:

  • read()
  • write()
  • fastq_object()
    • head
    • body
    • qstr
    • info
    • toFasta()
    • len() / str() / eq()

Reading FASTQ files

read() is a FASTQ reader that handles both compressed and uncompressed files. The following compression formats are supported: zip, tar, tar.gz, gz. If an archive contains multiple files, all of them are read. The function returns an iterator of fastq_objects.

import fastq as fq
fos = fq.read("dolphin.fastq") # Iterator of fastq entries.
fos = list(fos) # Cast to list
fos = fq.read("reads.tar.gz") # Is able to handle compressed files.
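
For readers unfamiliar with the format, the sketch below shows roughly what a reader of this kind does for plain and gzip-compressed files, using only the standard library. It is not fastq's implementation: iter_fastq is a hypothetical helper, it assumes strict four-line records, and it yields plain tuples rather than fastq_objects.

```python
import gzip
from typing import Iterator, Tuple


def iter_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (head, body, qstr) tuples from a plain or gzip-compressed FASTQ file.

    Hypothetical helper for illustration; assumes strict four-line records.
    """
    # Pick the opener from the file extension; "rt" means text mode in both cases.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            head = handle.readline().rstrip("\n")
            if not head:
                break  # end of file (or a blank line)
            body = handle.readline().rstrip("\n")
            handle.readline()  # separator line starting with "+"
            qstr = handle.readline().rstrip("\n")
            yield head, body, qstr
```

Because the function is a generator, records are produced one at a time, so a large file does not have to fit in memory unless the result is cast to a list.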

fastq_object()

The core component of fastq is the fastq_object().

This object represents a FASTQ entry and consists of a head, a body and a quality string.

import fastq as fq
fo = fq.fastq_object("@M01967:23:0", "GATTTGGGG", "!''*((((*")
fo.getHead() or fo.head # @M01967:23:0
fo.getSeq()  or fo.body # GATTTGGGG
fo.getQual() or fo.qstr # !''*((((*

When fastq_object(..).info is requested, summary statistics are computed and returned as a dict. The computation is lazy: the statistics are computed on first access, so the first query takes longer than later ones. If the body or qstr is changed, info is reset automatically.

fo.getInfo() or fo.info
{'a_num': 1, 'g_num': 5,             # Absolute counts of AGTC
 't_num': 3, 'c_num': 0,             #
 'gc_content': 0.5555555555555556,   # Relative GC content
 'at_content': 0.4444444444444444,   # Relative AT content
 'qual': 6.444444444444445,          # Mean quality (Illumina Encoding)
 'qual_median': 7,                   # Median quality
 'qual_variance': 7.027777777777778, # Variance of quality
 'qual_min': 0, 'qual_max': 9}       # Min / Max quality
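
The quality figures above can be reproduced by hand. Subtracting 33 from the ASCII code of each character in the quality string (the Phred+33 offset) gives scores that match the minimum of 0 and maximum of 9 shown, and the variance shown matches the sample variance of those scores. The sketch below combines that arithmetic with one possible way to get the lazy, self-resetting behaviour. LazyRecord is a hypothetical class written for illustration; it computes only a subset of the keys and is not how fastq implements info.

```python
import statistics


class LazyRecord:
    """Hypothetical stand-in for a FASTQ entry with a lazily computed summary."""

    def __init__(self, head: str, body: str, qstr: str) -> None:
        self.head = head
        self._body = body
        self._qstr = qstr
        self._info = None  # filled on first access to .info

    @property
    def body(self) -> str:
        return self._body

    @body.setter
    def body(self, value: str) -> None:
        self._body = value
        self._info = None  # invalidate the cached summary

    @property
    def qstr(self) -> str:
        return self._qstr

    @qstr.setter
    def qstr(self, value: str) -> None:
        self._qstr = value
        self._info = None  # invalidate the cached summary

    @property
    def info(self) -> dict:
        if self._info is None:
            # Phred+33: ASCII code minus 33 gives the quality score.
            scores = [ord(ch) - 33 for ch in self._qstr]
            seq = self._body.upper()
            gc = seq.count("G") + seq.count("C")
            self._info = {
                "g_num": seq.count("G"),
                "c_num": seq.count("C"),
                "gc_content": gc / len(seq),
                "qual": statistics.mean(scores),
                "qual_median": statistics.median(scores),
                "qual_variance": statistics.variance(scores),  # sample variance
                "qual_min": min(scores),
                "qual_max": max(scores),
            }
        return self._info


rec = LazyRecord("@M01967:23:0", "GATTTGGGG", "!''*((((*")
print(rec.info["qual_min"], rec.info["qual_max"], rec.info["qual_median"])  # 0 9 7
rec.body = "GATTTGGGC"  # changing the body discards the cached dict
```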

The following methods are defined on a fastq_object():

str(fo) # will return:
# @M01967:23:0
# GATTTGGGG
# +
# !''*((((*


# Body length
len(fo) # will return 9, the length of the body

# Equality
# Checks only the body, not the header and not the quality string
print(fo == fo) # True

fo_b = fq.fastq_object("@different header", "GATTTGGGG", "!!!!!!!!!")
print(fo == fo_b) # True

fo_c = fq.fastq_object(">Different Body", "ZZZZ", "!--!")
print(fo == fo_c) # False
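
In Python terms, the behaviour described above corresponds to the special methods `__str__`, `__len__` and `__eq__`. A minimal sketch of a class with the same semantics follows. Entry is a hypothetical name, and the code only mirrors the documented behaviour; it is not taken from fastq's source.

```python
class Entry:
    """Hypothetical record type mirroring the documented str/len/eq behaviour."""

    def __init__(self, head: str, body: str, qstr: str) -> None:
        self.head = head
        self.body = body
        self.qstr = qstr

    def __str__(self) -> str:
        # Four-line FASTQ layout: header, sequence, separator, qualities.
        return f"{self.head}\n{self.body}\n+\n{self.qstr}"

    def __len__(self) -> int:
        return len(self.body)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Entry):
            return NotImplemented
        return self.body == other.body  # header and qualities are ignored


a = Entry("@M01967:23:0", "GATTTGGGG", "!''*((((*")
b = Entry("@different header", "GATTTGGGG", "!!!!!!!!!")
print(len(a), a == b)  # 9 True
```

Note that a Python class defining `__eq__` without `__hash__` is unhashable, so instances of this sketch cannot be placed in a set or used as dict keys.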

Writing FASTQ files

write() is a basic FASTQ writer. It takes a single fastq_object or a list of fastq_objects and writes them to the given path.

By default, an existing file is overwritten. Use write(fo, "path.fastq", mode="a") to append to the file instead.

fos = fq.read("dolphin.fastq") # Iterator of fastq entries
fos = list(fos)

fq.write(fos, "new.fastq")
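
The overwrite and append behaviour corresponds to Python's file modes "w" and "a". The following sketch shows one way a writer with a similar interface could be built from the standard library. write_records is a hypothetical helper that operates on (head, body, qstr) tuples; it is not the package's write().

```python
from typing import Iterable, Tuple, Union

Record = Tuple[str, str, str]  # (head, body, qstr)


def write_records(
    records: Union[Record, Iterable[Record]], path: str, mode: str = "w"
) -> None:
    """Write one record or an iterable of records in four-line FASTQ layout.

    Hypothetical helper: mode "w" overwrites the file, mode "a" appends to it.
    """
    if mode not in ("w", "a"):
        raise ValueError("mode must be 'w' or 'a'")
    # Accept a single record as well as a collection of records.
    if isinstance(records, tuple) and len(records) == 3 and isinstance(records[0], str):
        records = [records]
    with open(path, mode) as handle:
        for head, body, qstr in records:
            handle.write(f"{head}\n{body}\n+\n{qstr}\n")
```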

License

Copyright (C) 2024 by Jules Kreuer - @not_a_feature

This software is published under the GNU General Public License v3.0. TL;DR:

Permissions        Conditions                     Limitations
✓ Commercial use   Disclose source                ✕ Liability
✓ Distribution     License and copyright notice   ✕ Warranty
✓ Modification     Same license
✓ Patent use       State changes
✓ Private use

Go to LICENSE.md to see the full version.
