A comprehensive library for computational molecular biology
Biotite project
Biotite is your Swiss army knife for bioinformatics. Whether you want to identify homologous sequence regions in a protein family or find disulfide bonds in a protein structure, Biotite has the right tool for you. This package bundles popular tasks in computational molecular biology into a uniform Python library. It can handle a major part of the typical workflow for sequence and biomolecular structure data:
Searching and fetching data from biological databases
Reading and writing popular sequence/structure file formats
Analyzing and editing sequence/structure data
Visualizing sequence/structure data
Interfacing external applications for further analysis
Biotite internally stores most of the data as NumPy ndarray objects, enabling
fast C-accelerated analysis,
intuitive usability through NumPy-like indexing syntax,
extensibility through direct access of the internal NumPy arrays.
As a result, the user can skip writing code for basic functionality (like file parsers) and can focus on what makes their code unique, from small analysis scripts to entire bioinformatics software packages.
If you use Biotite in a scientific publication, please cite:
Installation
Biotite requires the following packages:
numpy
requests
msgpack
networkx
Some functions require additional packages:
mdtraj - Required for trajectory file I/O operations.
matplotlib - Required for plotting purposes.
Biotite can be installed via Conda…
$ conda install -c conda-forge biotite
… or pip
$ pip install biotite
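To check which of the packages listed above are available in the current environment, a short script along the following lines may help. It is only a sketch that uses the standard library; the helper name `find_missing` is hypothetical, and the package lists are copied from this section.

```python
import importlib.util

# Package names taken from the lists in this section
REQUIRED = ["numpy", "requests", "msgpack", "networkx"]
OPTIONAL = ["mdtraj", "matplotlib"]


def find_missing(packages):
    """Return the names from `packages` that cannot be found for import."""
    return [
        name for name in packages
        if importlib.util.find_spec(name) is None
    ]


missing_required = find_missing(REQUIRED)
missing_optional = find_missing(OPTIONAL)
print("Missing required packages:", missing_required or "none")
print("Missing optional packages:", missing_optional or "none")
```

Both installers normally resolve the required packages automatically, so a non-empty first list usually points to a broken environment rather than a missing manual step.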
Usage
Here is a small example that downloads two protein sequences from the NCBI Entrez database and aligns them:
import biotite.sequence.align as align
import biotite.sequence.io.fasta as fasta
import biotite.database.entrez as entrez

# Download FASTA file for the sequences of avidin and streptavidin
file_name = entrez.fetch_single_file(
    uids=["CAC34569", "ACL82594"], file_name="sequences.fasta",
    db_name="protein", ret_type="fasta"
)

# Parse the downloaded FASTA file
# and create 'ProteinSequence' objects from it
fasta_file = fasta.FastaFile.read(file_name)
avidin_seq, streptavidin_seq = fasta.get_sequences(fasta_file).values()

# Align sequences using the BLOSUM62 matrix with affine gap penalty
matrix = align.SubstitutionMatrix.std_protein_matrix()
alignments = align.align_optimal(
    avidin_seq, streptavidin_seq, matrix,
    gap_penalty=(-10, -1), terminal_penalty=False
)
print(alignments[0])
MVHATSPLLLLLLLSLALVAPGLSAR------KCSLTGKWDNDLGSNMTIGAVNSKGEFTGTYTTAV-TA
-------------------DPSKESKAQAAVAEAGITGTWYNQLGSTFIVTA-NPDGSLTGTYESAVGNA
TSNEIKESPLHGTQNTINKRTQPTFGFTVNWKFS----ESTTVFTGQCFIDRNGKEV-LKTMWLLRSSVN
ESRYVLTGRYDSTPATDGSGT--ALGWTVAWKNNYRNAHSATTWSGQYV---GGAEARINTQWLLTSGTT
DIGDDWKATRVGINIFTRLRTQKE---------------------
-AANAWKSTLVGHDTFTKVKPSAASIDAAKKAGVNNGNPLDAVQQ
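The effect of `gap_penalty=(-10, -1)` and `terminal_penalty=False` in the example above can be illustrated with a small standalone calculation. The sketch below does not use Biotite. It assumes the common affine convention in which the first column of a gap costs the opening penalty and every further column costs the extension penalty; the exact convention Biotite applies should be checked against its API reference. The function name `gap_penalty_total` and the toy row are hypothetical.

```python
import re


def gap_penalty_total(row, gap_open=-10, gap_extend=-1,
                      terminal_penalty=False):
    """Sum the affine gap penalties over all gap runs in one gapped row."""
    total = 0
    for match in re.finditer(r"-+", row):
        # A gap run touching either end of the row is a terminal gap
        is_terminal = match.start() == 0 or match.end() == len(row)
        if is_terminal and not terminal_penalty:
            continue  # terminal gaps are not penalized in this mode
        length = match.end() - match.start()
        # Assumed convention: open once, extend for each further column
        total += gap_open + (length - 1) * gap_extend
    return total


row = "--ACG---TT-A--"
print(gap_penalty_total(row))                         # -22
print(gap_penalty_total(row, terminal_penalty=True))  # -44
```

With terminal gaps left unpenalized, only the two inner gap runs contribute (-12 and -10). This is why the long overhangs at both ends of the alignment above do not dominate its score.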
More documentation, including a tutorial, an example gallery and the API reference, is available at https://www.biotite-python.org/.
Contribution
Interested in improving Biotite? Have a look at the contribution guidelines. Feel free to join our community chat on Discord.