
quickdna

Quickdna is a simple, fast library for working with DNA sequences. It is up to 100x faster than Biopython for some translation tasks, in part because it uses a native Rust module (via PyO3) for translation. Despite that, it exposes an easy-to-use, type-annotated API that should feel familiar to Biopython users.

# These are the two main library types. Unlike Biopython, DnaSequence and
# ProteinSequence are distinct, though they share a common BaseSequence base class.
>>> from quickdna import DnaSequence, ProteinSequence

# Sequences can be constructed from strs or bytes, and are stored internally as
# ASCII-encoded bytes.
>>> d = DnaSequence("taatcaagactattcaaccaa")

# Sequences can be sliced just like regular strings; slicing returns a new sequence instance.
>>> d[3:9]
DnaSequence(seq='tcaaga')

# Many other Python operations are supported on sequences as well: len, iter,
# ==, hash, concatenation with +, repetition with *, etc. These operations are
# typed where appropriate and will not, for example, allow you to concatenate a
# ProteinSequence to a DnaSequence.

# DNA sequences can be easily translated to protein sequences with `translate()`.
# If no table=... argument is given, NCBI table 1 is used by default...
>>> d.translate()
ProteinSequence(seq='*SRLFNQ')

# ...but any of the NCBI tables can be specified. A ValueError is raised
# for an invalid table.
>>> d.translate(table=22)
ProteinSequence(seq='**RLFNQ')

# Reverse complement is supported too. It's somewhat faster than Biopython, but
# not as dramatically as `translate()`. Note the uppercase result.
>>> d[3:9].reverse_complement()
DnaSequence(seq='TCTTGA')

# This method returns a tuple of all (up to 6) possible translated reading frames:
# (seq[:], seq[1:], seq[2:], seq.reverse_complement()[:], ...).
>>> d.translate_all_frames()
(ProteinSequence(seq='*SRLFNQ'), ProteinSequence(seq='NQDYST'),
ProteinSequence(seq='IKTIQP'), ProteinSequence(seq='LVE*S*L'),
ProteinSequence(seq='WLNSLD'), ProteinSequence(seq='G*IVLI'))

# translate_all_frames returns fewer than 6 frames for sequences shorter than 5
# bases, since a frame is only included if it contains at least one complete codon.
>>> len(DnaSequence("AAAA").translate_all_frames())
4
>>> len(DnaSequence("AA").translate_all_frames())
0

# There is a similar method, `translate_self_frames`, that returns only the
# (up to 3) translated frames in this direction, without the reverse complement.

# The IUPAC ambiguity code 'N' is supported as well.
# Codons with N will translate to a specific amino acid if it is unambiguous,
# such as GGN -> G, or the ambiguous amino acid code 'X' if there are multiple
# possible translations.
>>> DnaSequence("GGNATN").translate()
ProteinSequence(seq='GX')
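The reading-frame and ambiguity rules above can be mirrored in a few lines of pure Python, which may help when checking results by hand. This is a minimal sketch for NCBI table 1 only, not quickdna's Rust implementation. The helper names (`translate_table1`, `reverse_complement`, `all_frames`) are hypothetical, the input is assumed to contain only A/C/G/T/N (any case), and trailing partial codons are assumed to be dropped, as the outputs above suggest.

```python
from itertools import product
from typing import Tuple

# NCBI translation table 1 in the standard "TCAG" order: the codon with
# base indices (i, j, k) maps to position 16 * i + 4 * j + k.
BASES = "TCAG"
TABLE1_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: TABLE1_AAS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate_codon(codon: str) -> str:
    """Translate one uppercase codon, expanding each N to T/C/A/G.

    Returns the amino acid if all expansions agree, otherwise 'X'.
    """
    choices = [BASES if base == "N" else base for base in codon]
    amino_acids = {CODON_TABLE["".join(bases)] for bases in product(*choices)}
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


def translate_table1(seq: str) -> str:
    """Translate complete codons only; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse (result is uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def all_frames(seq: str) -> Tuple[str, ...]:
    """Offsets 0-2 of seq, then offsets 0-2 of its reverse complement.

    Frames with fewer than 3 bases (no complete codon) are skipped.
    """
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            if len(strand) - offset >= 3:
                frames.append(translate_table1(strand[offset:]))
    return tuple(frames)
```

With `d = "taatcaagactattcaaccaa"`, `all_frames(d)` should give the same six strings as `translate_all_frames()` in the session above, and `translate_table1("GGNATN")` gives `"GX"`. Expanding N by brute force is fine here because a codon has at most 4 × 4 × 4 = 64 expansions.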

Benchmarks

For regular DNA translation tasks, quickdna is faster than Biopython (see benchmarks/bench.py for the source). Machines and workloads vary, however, so always benchmark your own use case.

| Task | Time (ms / iter) | Comparison (Biopython time as % of quickdna time) |
|---|---|---|
| translate_quickdna(small_genome) | 0.00306 | |
| translate_biopython(small_genome) | 0.05834 | 1908.90% |
| translate_quickdna(covid_genome) | 0.02959 | |
| translate_biopython(covid_genome) | 3.54413 | 11979.10% |
| reverse_complement_quickdna(small_genome) | 0.00238 | |
| reverse_complement_biopython(small_genome) | 0.00398 | 167.24% |
| reverse_complement_quickdna(covid_genome) | 0.02409 | |
| reverse_complement_biopython(covid_genome) | 0.02928 | 121.55% |
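The comparison figures follow from the two times in each pair: Biopython's time divided by quickdna's, as a percentage (0.05834 / 0.00306 × 100 ≈ 1907%; the small gap from 1908.90% is likely due to rounding of the displayed times). benchmarks/bench.py itself isn't shown here, so the following is only a minimal standard-library sketch of a harness that reports numbers in the same shape. The helpers `ms_per_iter` and `comparison_percent` are hypothetical, and taking the best of several `timeit` runs is this sketch's own choice, not necessarily what bench.py does.

```python
import timeit
from typing import Callable


def ms_per_iter(fn: Callable[[], object], number: int = 1000, repeat: int = 5) -> float:
    """Average milliseconds per call of `fn`, taking the best of `repeat` runs."""
    # timeit.repeat returns `repeat` total times, each covering `number` calls.
    best_total_s = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best_total_s / number * 1000.0


def comparison_percent(quickdna_ms: float, other_ms: float) -> float:
    """The other library's time as a percentage of quickdna's (100% = equal)."""
    return other_ms / quickdna_ms * 100.0


if __name__ == "__main__":
    # Placeholder workloads so the sketch runs on its own. For a real comparison,
    # time e.g. DnaSequence(genome).translate() against Bio.Seq.Seq(genome).translate().
    genome = "taatcaagactattcaaccaa" * 100
    fast_ms = ms_per_iter(lambda: genome.upper(), number=200)
    slow_ms = ms_per_iter(lambda: "".join(reversed(genome)), number=200)
    print(f"fast {fast_ms:.5f}ms / iter")
    print(f"slow {slow_ms:.5f}ms / iter {comparison_percent(fast_ms, slow_ms):.2f}%")
```

Using the minimum of several repeats reduces noise from other processes; the mean would instead reflect typical rather than best-case timing.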

Should you use quickdna?

  • Quickdna pros:
    • It's quick!
    • It's simple and small.
    • It has type annotations, including a py.typed marker file for type checkers such as mypy or VS Code's Pyright.
    • It makes a type distinction between DNA and protein sequences, preventing confusion.
  • Quickdna cons:
    • It's newer and less battle-tested than Biopython.
    • It's not yet 1.0, so the API is liable to change in the future.
    • It doesn't support reading FASTA files or many of the other tasks Biopython can do, so you'll probably still use Biopython or another library for those tasks.
    • It doesn't support the rarer IUPAC ambiguity codes, such as B (any nucleotide except A); only the general N ambiguity code is supported.
      • If support for these codes is important to you, please open an issue! It may be possible to support them; it just isn't a priority right now.
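The type distinction listed among the pros, and the typed concatenation described in the walkthrough, can be illustrated with a small pure-Python sketch. The classes `BaseSeq`, `Dna`, and `Protein` are hypothetical stand-ins, not quickdna's actual classes: they only show how two types that share a base class can still refuse to be mixed. In this sketch the check happens at runtime, by returning `NotImplemented` from `__add__` so that Python raises `TypeError`.

```python
from typing import TypeVar

S = TypeVar("S", bound="BaseSeq")


class BaseSeq:
    """Shared behaviour for both sequence kinds (hypothetical sketch)."""

    def __init__(self, seq: str) -> None:
        self.seq = seq

    def __add__(self: S, other: S) -> S:
        # Only concatenate with exactly the same sequence type; otherwise let
        # Python fall through to its standard "unsupported operand" TypeError.
        if type(other) is not type(self):
            return NotImplemented
        return type(self)(self.seq + other.seq)

    def __eq__(self, other: object) -> bool:
        # A Dna and a Protein with the same letters are never equal.
        return isinstance(other, BaseSeq) and type(other) is type(self) and other.seq == self.seq

    def __hash__(self) -> int:
        return hash((type(self).__name__, self.seq))

    def __repr__(self) -> str:
        return f"{type(self).__name__}(seq={self.seq!r})"


class Dna(BaseSeq):
    pass


class Protein(BaseSeq):
    pass
```

For example, `Dna("ac") + Dna("gt")` gives `Dna(seq='acgt')`, while `Dna("ac") + Protein("M")` raises `TypeError`.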

Installation

Quickdna has prebuilt wheels for Linux (manylinux2010), OSX, and Windows available on PyPI.

Development

Quickdna uses PyO3 and maturin to build and upload the wheels, and Poetry to manage dependencies. These tasks are driven by a Justfile, which requires Just, a command runner similar to make.

Poetry

You can install Poetry from https://python-poetry.org, and it will handle the other Python dependencies.

Just

You can install Just with `cargo install just`, then run `just` in the project directory to get a list of commands.

Flamegraphs

The `just profile` command requires cargo-flamegraph; please see that repository for installation instructions.

Download files

Source Distribution

  • quickdna-0.2.0.tar.gz (29.2 kB)

Built Distributions

  • quickdna-0.2.0-cp310-none-win_amd64.whl (109.8 kB): CPython 3.10, Windows x86-64
  • quickdna-0.2.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (931.1 kB): CPython 3.10, manylinux (glibc 2.5+/2.12+) x86-64
  • quickdna-0.2.0-cp310-cp310-macosx_10_7_x86_64.whl (208.4 kB): CPython 3.10, macOS 10.7+ x86-64
  • quickdna-0.2.0-cp39-none-win_amd64.whl (109.8 kB): CPython 3.9, Windows x86-64
  • quickdna-0.2.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (931.3 kB): CPython 3.9, manylinux (glibc 2.5+/2.12+) x86-64
  • quickdna-0.2.0-cp39-cp39-macosx_10_7_x86_64.whl (208.4 kB): CPython 3.9, macOS 10.7+ x86-64
  • quickdna-0.2.0-cp38-none-win_amd64.whl (109.9 kB): CPython 3.8, Windows x86-64
  • quickdna-0.2.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (931.6 kB): CPython 3.8, manylinux (glibc 2.5+/2.12+) x86-64
  • quickdna-0.2.0-cp38-cp38-macosx_10_7_x86_64.whl (208.6 kB): CPython 3.8, macOS 10.7+ x86-64
  • quickdna-0.2.0-cp37-none-win_amd64.whl (109.8 kB): CPython 3.7, Windows x86-64
  • quickdna-0.2.0-cp36-none-win_amd64.whl (109.7 kB): CPython 3.6, Windows x86-64
