quickdna

Quickdna is a simple, fast library for working with DNA sequences. It is up to 100x faster than Biopython for some translation tasks, in part because it uses a native Rust module (via PyO3) for the translation. However, it exposes an easy-to-use, type-annotated API that should still feel familiar for Biopython users.

Quickdna is "pre-1.0" software. Its API is still evolving. For now, if you're interested in using quickdna, we suggest you depend on an exact version or git rev, so that new releases don't break your code.

# These are the two main library types. Unlike Biopython, DnaSequence and
# ProteinSequence are distinct, though they share a common BaseSequence base class
>>> from quickdna import DnaSequence, ProteinSequence

# Sequences can be constructed from strs or bytes, and are stored internally as
# ascii-encoded bytes.
>>> d = DnaSequence("taatcaagactattcaaccaa")

# Sequences can be sliced just like regular strings, and return new sequence instances.
>>> d[3:9]
DnaSequence(seq='tcaaga')

# Many other Python operations are supported on sequences as well: len, iter,
# ==, hash, concatenation with +, repetition with *, etc. These operations are
# typed when appropriate and will not allow you to concatenate a ProteinSequence
# to a DnaSequence, for example.

# DNA sequences can be easily translated to protein sequences with `translate()`.
# If no table=... argument is given, NCBI table 1 will be used by default...
>>> d.translate()
ProteinSequence(seq='*SRLFNQ')

# ...but any of the NCBI tables can be specified. A ValueError will be raised
# for an invalid table.
>>> d.translate(table=22)
ProteinSequence(seq='**RLFNQ')

# `reverse_complement()` exists too! It's somewhat faster than Biopython, but not
# as dramatically as `translate()`.
>>> d[3:9].reverse_complement()
DnaSequence(seq='TCTTGA')

# This method will return a tuple of all (up to 6) possible translated reading frames:
# (seq[:], seq[1:], seq[2:], seq.reverse_complement()[:], ...)
>>> d.translate_all_frames()
(ProteinSequence(seq='*SRLFNQ'), ProteinSequence(seq='NQDYST'),
ProteinSequence(seq='IKTIQP'), ProteinSequence(seq='LVE*S*L'),
ProteinSequence(seq='WLNSLD'), ProteinSequence(seq='G*IVLI'))

# translate_all_frames will return fewer than 6 frames for sequences of len < 5
>>> len(DnaSequence("AAAA").translate_all_frames())
4
>>> len(DnaSequence("AA").translate_all_frames())
0

# There is a similar method, `translate_self_frames`, that only returns the
# (up to 3) translated frames for this direction, without the reverse complement

# The IUPAC ambiguity codes are supported as well.
# Codons with N will translate to a specific amino acid if it is unambiguous,
# such as GGN -> G, or the ambiguous amino acid code 'X' if there are multiple
# possible translations.
>>> DnaSequence("GGNATN").translate()
ProteinSequence(seq='GX')

# The fine-grained ambiguity codes like "R = A or G" are accepted too, and
# translation results are the same as Biopython. In the output, amino acid
# ambiguity code 'B' means "either asparagine or aspartic acid" (N or D).
>>> DnaSequence("RAT").translate()
ProteinSequence(seq='B')

# To disallow ambiguity codes in translation, try: `.translate(strict=True)`
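
The translation rules shown above can be modelled in a few lines of plain Python. The sketch below is not quickdna's implementation (that lives in the native Rust module); it is a slow reference model that assumes NCBI table 1 only. All names in it (`CODON_TABLE`, `translate_codon`, `AMBIGUOUS_AA`, and so on) are illustrative. It maps any ambiguous set other than N/D and E/Q to 'X', including sets that contain a stop codon, which may differ from what quickdna or Biopython do in that case.

```python
from itertools import product

# NCBI table 1 (the standard code), listed in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide codes mapped to the concrete bases they may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Ambiguous amino acid codes for the two-residue sets handled by this sketch.
AMBIGUOUS_AA = {frozenset("ND"): "B", frozenset("EQ"): "Z"}

# Complement of every IUPAC code (for example, R = A/G pairs with Y = T/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def translate_codon(codon: str) -> str:
    """Translate one codon, resolving ambiguity codes where possible."""
    options = [IUPAC[base] for base in codon.upper()]
    # Expand the codon to every concrete codon it could represent.
    amino_acids = frozenset(CODON_TABLE["".join(c)] for c in product(*options))
    if len(amino_acids) == 1:
        return next(iter(amino_acids))
    return AMBIGUOUS_AA.get(amino_acids, "X")


def translate(dna: str) -> str:
    """Translate whole codons from the start; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


def reverse_complement(dna: str) -> str:
    """Return the upper-cased reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate_all_frames(dna: str) -> tuple[str, ...]:
    """Up to six frames: three forward offsets, then three on the reverse strand."""
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            if len(strand) - offset >= 3:  # skip frames with no complete codon
                frames.append(translate(strand[offset:]))
    return tuple(frames)
```

With these definitions, `translate("GGNATN")` gives `'GX'` and `translate("RAT")` gives `'B'`, and `translate_all_frames("AAAA")` has 4 entries, matching the quickdna output shown above.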

Benchmarks

For regular DNA translation tasks, quickdna is faster than Biopython (see benchmarks/bench.py for the source). Machines and workloads vary, however -- always benchmark!

task                                          time                comparison
translate_quickdna(small_genome)              0.00306ms / iter
translate_biopython(small_genome)             0.05834ms / iter    1908.90%
translate_quickdna(covid_genome)              0.02959ms / iter
translate_biopython(covid_genome)             3.54413ms / iter    11979.10%
reverse_complement_quickdna(small_genome)     0.00238ms / iter
reverse_complement_biopython(small_genome)    0.00398ms / iter    167.24%
reverse_complement_quickdna(covid_genome)     0.02409ms / iter
reverse_complement_biopython(covid_genome)    0.02928ms / iter    121.55%
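
To take the "always benchmark" advice on your own machine, a timing harness along these lines may be enough. This is a hypothetical sketch built on the standard-library `timeit` module, not the contents of benchmarks/bench.py; `ms_per_iter` and `compare` are made-up names. It assumes the comparison column is the Biopython time as a percentage of the quickdna time, which is what the figures above suggest.

```python
import timeit
from typing import Callable


def ms_per_iter(fn: Callable[[str], object], arg: str, number: int = 1000) -> float:
    """Return the best-of-5 average time per call, in milliseconds."""
    best_total = min(timeit.repeat(lambda: fn(arg), number=number, repeat=5))
    return best_total / number * 1000.0


def compare(baseline_ms: float, other_ms: float) -> str:
    """Format `other_ms` as a percentage of `baseline_ms`."""
    return f"{other_ms / baseline_ms * 100:.2f}%"


if __name__ == "__main__":
    genome = "taatcaagactattcaaccaa" * 1000
    # Stand-in workloads; swap in the quickdna and Biopython calls you care about.
    fast = ms_per_iter(str.upper, genome)
    slow = ms_per_iter(lambda s: "".join(c.upper() for c in s), genome, number=20)
    print(f"fast {fast:.5f}ms / iter")
    print(f"slow {slow:.5f}ms / iter {compare(fast, slow)}")
```

Taking the minimum over several repeats reduces noise from other processes, but the result still depends on the machine and the input.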

Should you use quickdna?

  • Quickdna pros:
    • It's quick!
    • It's simple and small.
    • It has type annotations, including a py.typed marker file for checkers like mypy or VS Code's Pyright.
    • It makes a type distinction between DNA and protein sequences, preventing confusion.
  • Quickdna cons:
    • It's newer and less battle-tested than Biopython.
    • It's not yet 1.0 -- the API is liable to change in the future.
    • It can't read FASTA files or do many of the other things Biopython can, so you'll probably still need Biopython or another library for those tasks.
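
If FASTA input is the only thing you need from another library, a small reader can fill the gap for simple files. The following is a minimal sketch using only the standard library. It is not part of quickdna, `read_fasta` is a hypothetical name, and it assumes plain FASTA where each record starts with a `>` header line.

```python
from typing import Iterable, Iterator, Optional


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(chunks)
```

Because an open text file is an iterable of lines, `read_fasta(open_file)` works directly, and each yielded sequence string could then be passed to `DnaSequence(...)`, which accepts `str` as shown earlier.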

Installation

Quickdna has prebuilt wheels for Linux (manylinux2014), macOS, and Windows available on PyPI.

Development

Quickdna uses PyO3 and maturin to build and upload the wheels, and Poetry to manage dependencies. These tasks are run through a Justfile, which requires Just, a command runner similar to make.

Poetry

You can install Poetry from https://python-poetry.org, and it will handle the other Python dependencies.

Just

You can install Just with `cargo install just`, and then run `just` in the project directory to get a list of commands.

Flamegraphs

The `just profile` command requires cargo-flamegraph; see that project's repository for installation instructions.
