A comprehensive library for computational molecular biology
Biotite project
Biotite is your Swiss army knife for bioinformatics. Whether you want to identify homologous sequence regions in a protein family or find disulfide bonds in a protein structure, Biotite has the right tool for you. This package bundles popular tasks in computational molecular biology into a uniform Python library. It can handle a major part of the typical workflow for sequence and biomolecular structure data:
Searching and fetching data from biological databases
Reading and writing popular sequence/structure file formats
Analyzing and editing sequence/structure data
Visualizing sequence/structure data
Interfacing external applications for further analysis
Biotite internally stores most of the data as NumPy ndarray objects, enabling
fast C-accelerated analysis,
intuitive usability through NumPy-like indexing syntax,
extensibility through direct access of the internal NumPy arrays.
As a result, the user can skip writing code for basic functionality (like file parsers) and can focus on what makes their code unique, from small analysis scripts to entire bioinformatics software packages.
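To illustrate why array-backed storage makes indexing convenient, here is a conceptual sketch that uses plain NumPy only. It does not use Biotite's own classes; the names `alphabet`, `lookup` and `codes` are hypothetical and chosen for this example. It encodes a nucleotide string as an integer code array, then applies slicing and a boolean mask to it.

```python
import numpy as np

# Hypothetical alphabet and sequence, for illustration only
alphabet = np.array(list("ACGT"))
sequence = "ACGTTGCA"

# Map each symbol to its integer code and store the codes in an ndarray
lookup = {symbol: code for code, symbol in enumerate(alphabet)}
codes = np.array([lookup[symbol] for symbol in sequence], dtype=np.uint8)

# NumPy-style slicing selects a subsequence
sub_codes = codes[2:6]
decoded = "".join(alphabet[sub_codes])

# A boolean mask marks all G and C positions; its mean is the GC fraction
gc_mask = (codes == lookup["G"]) | (codes == lookup["C"])
gc_content = gc_mask.mean()

print(decoded)
print(gc_content)
```

Because the codes live in an ordinary array, any vectorized NumPy operation can be applied to them without an intermediate conversion step.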
If you use Biotite in a scientific publication, please cite:
Installation
Biotite requires the following packages:
numpy
requests
msgpack
networkx
Some functions require additional packages:
mdtraj - Required for trajectory file I/O operations.
matplotlib - Required for plotting purposes.
Biotite can be installed via Conda…
$ conda install -c conda-forge biotite
… or pip
$ pip install biotite
Usage
Here is a small example that downloads two protein sequences from the NCBI Entrez database and aligns them:
import biotite.sequence.align as align
import biotite.sequence.io.fasta as fasta
import biotite.database.entrez as entrez

# Download FASTA file for the sequences of avidin and streptavidin
file_name = entrez.fetch_single_file(
    uids=["CAC34569", "ACL82594"], file_name="sequences.fasta",
    db_name="protein", ret_type="fasta"
)

# Parse the downloaded FASTA file
# and create 'ProteinSequence' objects from it
fasta_file = fasta.FastaFile.read(file_name)
avidin_seq, streptavidin_seq = fasta.get_sequences(fasta_file).values()

# Align sequences using the BLOSUM62 matrix with affine gap penalty
matrix = align.SubstitutionMatrix.std_protein_matrix()
alignments = align.align_optimal(
    avidin_seq, streptavidin_seq, matrix,
    gap_penalty=(-10, -1), terminal_penalty=False
)
print(alignments[0])
MVHATSPLLLLLLLSLALVAPGLSAR------KCSLTGKWDNDLGSNMTIGAVNSKGEFTGTYTTAV-TA
-------------------DPSKESKAQAAVAEAGITGTWYNQLGSTFIVTA-NPDGSLTGTYESAVGNA
TSNEIKESPLHGTQNTINKRTQPTFGFTVNWKFS----ESTTVFTGQCFIDRNGKEV-LKTMWLLRSSVN
ESRYVLTGRYDSTPATDGSGT--ALGWTVAWKNNYRNAHSATTWSGQYV---GGAEARINTQWLLTSGTT
DIGDDWKATRVGINIFTRLRTQKE---------------------
-AANAWKSTLVGHDTFTKVKPSAASIDAAKKAGVNNGNPLDAVQQ
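The affine gap penalty used above can be made concrete with a small, library-independent sketch. It assumes the common convention that the first value of `gap_penalty` is charged for the first position of a gap and the second value for every further position, and that terminal gaps are free when `terminal_penalty=False`. The helper `gap_cost` is hypothetical and is not part of Biotite.

```python
import re

def gap_cost(gapped, gap_open=-10, gap_ext=-1, terminal_penalty=False):
    """Sum the affine gap penalties over all runs of '-' in one gapped sequence."""
    total = 0
    for run in re.finditer(r"-+", gapped):
        # A run touching either end of the string is a terminal gap
        is_terminal = run.start() == 0 or run.end() == len(gapped)
        if is_terminal and not terminal_penalty:
            continue
        length = run.end() - run.start()
        # Opening penalty once, extension penalty for each further position
        total += gap_open + (length - 1) * gap_ext
    return total

# Internal gaps of length 3 and 1; the terminal gaps are ignored
print(gap_cost("--AC---GT-A--"))                         # -22
# With terminal gaps penalized, two more gaps of length 2 are charged
print(gap_cost("--AC---GT-A--", terminal_penalty=True))  # -44
```

Under this convention, one long gap is cheaper than several short ones of the same total length, which is why affine penalties tend to produce alignments with few, contiguous gaps.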
More documentation, including a tutorial, an example gallery and the API reference is available at https://www.biotite-python.org/.
Contribution
Interested in improving Biotite? Have a look at the contribution guidelines. Feel free to join our community chat on Discord.
Download files
Built Distributions
Hashes for biotite-0.33.0-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 6f31f2b1365d24c37e6b4db98741797da0a98e4f60b09138a7cc96daf81a3648
MD5 | f9d5c4de8c83f8a9dba0a287375d7fd4
BLAKE2b-256 | 855f017e3619d19846630966a5a7229d9ea1a0a540a95a4d2f01e4e38f71389a
Hashes for biotite-0.33.0-cp310-cp310-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 717151200f33d1ba03683ec5d98a089d89fb8f7c6cab357fc4c00213e03d3372
MD5 | 16b62a49548e04af03200561dffa6d84
BLAKE2b-256 | 9d368ef9bde972d9928b6ec169675396e8c7d868f38946ae3c43d22fbc0b9404
Hashes for biotite-0.33.0-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 642ddab9536ae444ff70081ca0fdda31c5e00b1452c360e33b90f0a2f3cae4a8
MD5 | 1152904d3c72198b36ff3da32b11a263
BLAKE2b-256 | 9ae5c29d943af63e7316c3b673a43078eef7c28326813a68d0dec6fd274ee51e
Hashes for biotite-0.33.0-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 8bae0cd5440b98063caa4659f33a418602b5af45648850de1e923145ffbdf572
MD5 | 5b1e607d2063ff44db1b2cb7a7222a58
BLAKE2b-256 | e1e468d22105f41451cd49a79e50438d806448b656cdeca74a5a6341ae88682a
Hashes for biotite-0.33.0-cp39-cp39-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 286feb179aab1fce92d15743e7e9a2798f0feab1bbf89b7bc352b556d67402d9
MD5 | c478771531e8a6ddac576220181edba9
BLAKE2b-256 | b9abf4523e547e6451aea89b4cf3ab4ea215106408d62a253d8c313413697aa0
Hashes for biotite-0.33.0-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | f3911ee87714994f0d20b08d2edd3c499761ee480e3d8bdce2f7641c4aa17096
MD5 | c681dd90c8a871480aafa48c3cc4244c
BLAKE2b-256 | 9f928eeffb553d9acf739373674452b913b26609e87e9bf3a4590c63070710d4
Hashes for biotite-0.33.0-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 9436022744f3b67e95a09e1ce8c64a56d138a52cc3ac239af72a6a8ba340d30b
MD5 | b9fb60d27e22b407dc67882812199669
BLAKE2b-256 | 3f9f6f503b8a8f7896bd65d91496e8676f918a69298f44293dde434dfbeb3f85
Hashes for biotite-0.33.0-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 23cd6d1e3211b2efd271a593f0f43e2ef01f516ca1bcfc2df82a698474f79687
MD5 | a87fbc59e4c34c6a9a036379407fb031
BLAKE2b-256 | 82a93c4ea1ce85a64163b1a654e53af5003c047858dd1cf8e4cf5c2052b3dc31
Hashes for biotite-0.33.0-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 63e641862b90d776ced2110bfd05132b57bee41c38ad083a8b238678440b573f
MD5 | bf485d8e1d47a7c28d7e82f005d26384
BLAKE2b-256 | a6eed36fb42ff9d4545607a589915ef680a4f4e7635214a575f080888280b460
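The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical and chosen for this example.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    # Tolerate surrounding whitespace and uppercase hex in the pasted value
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_hash` with the local path of a downloaded wheel and the SHA256 value from the matching table above should return `True` for an intact download.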