fastq

A simple FASTQ toolbox for small to medium size projects without dependencies.

FASTQ files are text-based files for storing nucleotide sequences and their corresponding quality scores. Reading such files is not particularly difficult, yet most off-the-shelf packages are overloaded with strange dependencies.

fastq offers an alternative and brings many useful functions without relying on third-party packages.

Installation

Using pip / pip3:

pip install fastq

Or from source:

git clone git@github.com:not-a-feature/fastq.git
cd fastq
pip install .

How to use

fastq offers easy-to-use functions for FASTQ handling. The main parts are:

  • read()
  • write()
  • fastq_object()
    • head
    • body
    • qstr
    • info
    • toFasta()
    • len() / str() / eq()

Reading FASTQ files

read() is a FASTQ reader that handles both compressed and uncompressed files. The following formats are supported: zip, tar, tar.gz, gz. If an archive contains multiple files, all of them are read. The function returns an iterator of fastq_objects.

import fastq as fq

fos = fq.read("dolphin.fastq") # Iterator of fastq entries.
fos = list(fos) # Cast to list
fos = fq.read("reads.tar.gz") # Is able to handle compressed files.
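To make the reading step concrete, here is a minimal standard-library sketch of what iterating over FASTQ records involves. It is not fastq's own implementation: the helper name iter_fastq is hypothetical, it only covers plain and gzip-compressed files (no zip or tar archives), and it assumes every record occupies exactly four lines.

```python
import gzip
from typing import Iterator, Tuple


def iter_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (head, body, qstr) tuples from a plain or gzip-compressed FASTQ file.

    Hypothetical helper for illustration; assumes four lines per record.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            head = handle.readline().rstrip("\n")
            if not head:  # end of file (or a blank line)
                break
            body = handle.readline().rstrip("\n")
            handle.readline()  # separator line starting with "+"
            qstr = handle.readline().rstrip("\n")
            if not head.startswith("@") or len(body) != len(qstr):
                raise ValueError(f"Malformed FASTQ record: {head!r}")
            yield head, body, qstr
```

Because the function is a generator, records are produced one at a time; wrap the call in list(...) only if the whole file fits comfortably in memory.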

Writing FASTQ files

write() is a basic FASTQ writer. It takes a single fastq_object or a list of them and writes them to the given path.

By default, an existing file is overwritten. Call write(fo, "path.fastq", mode="a") to append to the file instead.

import fastq as fq

fos = fq.read("dolphin.fastq") # Iterator of fastq entries
fos = list(fos)
fq.write(fos, "new.fastq")
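The difference between overwriting and appending can be sketched with the standard library alone. The helper below is hypothetical (write_records is not part of fastq) and works on plain (head, body, qstr) tuples instead of fastq_objects; it only illustrates how a mode argument maps onto Python's file modes.

```python
from typing import Iterable, Tuple


def write_records(
    records: Iterable[Tuple[str, str, str]], path: str, mode: str = "w"
) -> None:
    """Write (head, body, qstr) tuples as four-line FASTQ records.

    Hypothetical helper: mode="w" overwrites the file, mode="a" appends to it.
    """
    if mode not in ("w", "a"):
        raise ValueError("mode must be 'w' or 'a'")
    with open(path, mode) as handle:
        for head, body, qstr in records:
            handle.write(f"{head}\n{body}\n+\n{qstr}\n")
```

Calling the helper twice with the default mode leaves only the second batch in the file, while mode="a" keeps both.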

fastq_object()

The core component of fastq is the fastq_object().

This object represents a FASTQ entry and consists of a head, a body and a quality string (qstr).

import fastq as fq
fo = fq.fastq_object("@M01967:23:0", "GATTTGGGG", "!''*((((*")
fo.getHead() or fo.head # @M01967:23:0
fo.getSeq()  or fo.body # GATTTGGGG
fo.getQual() or fo.qstr # !''*((((*

When fastq_object(..).info is requested, some summary statistics are computed and returned as a dict. The computation is lazy: the statistics are computed on first access, so the first query takes longer than later ones. If the body or qstr is changed manually, info is automatically reset.

fo.getInfo() or fo.info
{'a_num': 1, 'g_num': 5,             # Absolute counts of ACTG
 't_num': 3, 'c_num': 0,             #
 'gc_content': 0.5555555555555556,   # Relative GC content
 'at_content': 0.4444444444444444,   # Relative AT content
 'qual': 6.444444444444445,          # Mean quality (Illumina Encoding)
 'qual_median': 7,                   # Median quality
 'qual_variance': 7.027777777777778, # Variance of quality
 'qual_min': 0, 'qual_max': 9}       # Min / Max quality
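The numbers above can be reproduced with the standard library, which also makes the lazy behaviour easier to see. The sketch below is not fastq's code: summarize and Entry are hypothetical names, the quality offset of 33 (Phred+33) is an assumption that happens to match the example values, the variance is the sample variance, and the body is assumed to hold at least two bases.

```python
import statistics
from typing import Dict


def summarize(body: str, qstr: str, offset: int = 33) -> Dict[str, float]:
    """Compute summary statistics for one entry (assumes Phred+33 encoding)."""
    quals = [ord(ch) - offset for ch in qstr]
    seq = body.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    n = len(seq)
    return {
        "a_num": counts["A"], "g_num": counts["G"],
        "t_num": counts["T"], "c_num": counts["C"],
        "gc_content": (counts["G"] + counts["C"]) / n,
        "at_content": (counts["A"] + counts["T"]) / n,
        "qual": statistics.mean(quals),
        "qual_median": statistics.median(quals),
        "qual_variance": statistics.variance(quals),  # sample variance (n - 1)
        "qual_min": min(quals), "qual_max": max(quals),
    }


class Entry:
    """Hypothetical stand-in for an entry with a lazily computed summary."""

    def __init__(self, body: str, qstr: str) -> None:
        self.body = body
        self.qstr = qstr

    def __setattr__(self, name: str, value: object) -> None:
        # Changing the sequence or quality string invalidates the cached summary.
        if name in ("body", "qstr"):
            object.__setattr__(self, "_info", None)
        object.__setattr__(self, name, value)

    @property
    def info(self) -> Dict[str, float]:
        # Computed on first access, then reused until body or qstr changes.
        if self._info is None:
            self._info = summarize(self.body, self.qstr)
        return self._info
```

With this layout, reading Entry(...).info twice returns the same cached dict, and assigning a new body or qstr forces a recomputation on the next access.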

The following methods are defined on a fastq_object():

str(fo) # will return:
# @M01967:23:0
# GATTTGGGG
# +
# !''*((((*


# Body length
len(fo) # will return 9, the length of the body

# Equality
# Checks only the body, not the header and not the quality string
print(fo == fo) # True

fo_b = fq.fastq_object("@different header", "GATTTGGGG", "!!!!!!!!!")
print(fo == fo_b) # True

fo_c = fq.fastq_object(">Different Body", "ZZZZ", "!--!")
print(fo == fo_c) # False
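The semantics described above (length and equality depend only on the body) can be expressed with Python's special methods. The class below is a hypothetical illustration named Record, not the fastq_object source; the __hash__ method is an addition of this sketch so that instances remain usable in sets and dicts.

```python
class Record:
    """Hypothetical minimal FASTQ record mirroring the behaviour described above."""

    def __init__(self, head: str, body: str, qstr: str) -> None:
        self.head, self.body, self.qstr = head, body, qstr

    def __str__(self) -> str:
        # Four-line FASTQ representation with "+" as the separator line.
        return f"{self.head}\n{self.body}\n+\n{self.qstr}"

    def __len__(self) -> int:
        # Length of the sequence only.
        return len(self.body)

    def __eq__(self, other: object) -> bool:
        # Header and quality string are deliberately ignored.
        if not isinstance(other, Record):
            return NotImplemented
        return self.body == other.body

    def __hash__(self) -> int:
        # Kept consistent with __eq__: equal bodies hash equally.
        return hash(self.body)
```

One consequence of body-only equality is that putting such records into a set keeps a single record per distinct sequence, regardless of header or quality.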

License

Copyright (C) 2022 by Jules Kreuer - @not_a_feature
This piece of software is published under the GNU General Public License v3.0.
TLDR:

| Permissions      | Conditions                   | Limitations |
| ---------------- | ---------------------------- | ----------- |
| ✓ Commercial use | Disclose source              | ✕ Liability |
| ✓ Distribution   | License and copyright notice | ✕ Warranty  |
| ✓ Modification   | Same license                 |             |
| ✓ Patent use     | State changes                |             |
| ✓ Private use    |                              |             |

Go to LICENSE.md to see the full version.

Dependencies

In addition to packages included in Python 3, this piece of software uses 3rd-party software packages for development purposes that are not required in the published version. Go to DEPENDENCIES.md to see all dependencies and licenses.
