A comprehensive library for computational molecular biology
Biotite is your Swiss army knife for bioinformatics. Whether you want to identify homologous sequence regions in a protein family or you would like to find disulfide bonds in a protein structure: Biotite has the right tool for you. This package bundles popular tasks in computational molecular biology into a uniform Python library. It can handle a major part of the typical workflow for sequence and biomolecular structure data:
Searching and fetching data from biological databases
Reading and writing popular sequence/structure file formats
Analyzing and editing sequence/structure data
Visualizing sequence/structure data
Interfacing external applications for further analysis
Biotite internally stores most of the data as NumPy ndarray objects, enabling
fast C-accelerated analysis,
intuitive usability through NumPy-like indexing syntax,
extensibility through direct access to the internal NumPy arrays.
As a result, the user can skip writing code for basic functionality (like file parsers) and can focus on what makes their code unique, from small analysis scripts to entire bioinformatics software packages.
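The idea behind this design can be sketched with plain NumPy. The snippet below is only an illustration of the principle and does not use Biotite's own classes: `ALPHABET`, `encode` and `decode` are hypothetical names chosen for this example. It assumes that each symbol of a sequence is stored as an integer code in an ndarray, so that slicing, boolean masks and vectorized statistics work without explicit loops.

```python
import numpy as np

# Hypothetical four-letter alphabet; each symbol is stored as its index
ALPHABET = "ACGT"


def encode(symbols: str) -> np.ndarray:
    """Convert a string into an array of integer symbol codes."""
    lookup = {symbol: code for code, symbol in enumerate(ALPHABET)}
    return np.array([lookup[s] for s in symbols], dtype=np.uint8)


def decode(codes: np.ndarray) -> str:
    """Convert an array of symbol codes back into a string."""
    return "".join(ALPHABET[int(c)] for c in codes)


codes = encode("ACGTTAGC")

# Slicing works as on any ndarray
print(decode(codes[2:5]))

# A boolean mask selects all C and G symbols at once
is_gc = (codes == ALPHABET.index("C")) | (codes == ALPHABET.index("G"))
print(decode(codes[is_gc]))

# Vectorized statistics, here the GC content of the sequence
print(is_gc.mean())
```

Because the data is an ordinary ndarray, any NumPy function can be applied to it directly, which is what makes the approach easy to extend.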
If you use Biotite in a scientific publication, please cite:
Installation
Biotite requires the following packages:
numpy
requests
msgpack
networkx
Some functions require some extra packages:
mdtraj - Required for trajectory file I/O operations.
matplotlib - Required for plotting purposes.
Biotite can be installed via Conda…
$ conda install -c conda-forge biotite
… or pip
$ pip install biotite
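To check which of the packages listed above are present in the current environment, a short standard-library script may help. This is a minimal sketch: the `missing_packages` helper is a hypothetical name, and the check only tests whether a package can be found, not whether its version is compatible.

```python
from importlib.util import find_spec

# Package names taken from the lists above
REQUIRED = ["numpy", "requests", "msgpack", "networkx"]
OPTIONAL = ["mdtraj", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing_required = missing_packages(REQUIRED)
missing_optional = missing_packages(OPTIONAL)
print("Missing required packages:", missing_required or "none")
print("Missing optional packages:", missing_optional or "none")
```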
Usage
Here is a small example that downloads two protein sequences from the NCBI Entrez database and aligns them:
import biotite.sequence.align as align
import biotite.sequence.io.fasta as fasta
import biotite.database.entrez as entrez
# Download FASTA file for the sequences of avidin and streptavidin
file_name = entrez.fetch_single_file(
    uids=["CAC34569", "ACL82594"], file_name="sequences.fasta",
    db_name="protein", ret_type="fasta"
)
# Parse the downloaded FASTA file
# and create 'ProteinSequence' objects from it
fasta_file = fasta.FastaFile.read(file_name)
avidin_seq, streptavidin_seq = fasta.get_sequences(fasta_file).values()
# Align sequences using the BLOSUM62 matrix with affine gap penalty
matrix = align.SubstitutionMatrix.std_protein_matrix()
alignments = align.align_optimal(
    avidin_seq, streptavidin_seq, matrix,
    gap_penalty=(-10, -1), terminal_penalty=False
)
print(alignments[0])
MVHATSPLLLLLLLSLALVAPGLSAR------KCSLTGKWDNDLGSNMTIGAVNSKGEFTGTYTTAV-TA
-------------------DPSKESKAQAAVAEAGITGTWYNQLGSTFIVTA-NPDGSLTGTYESAVGNA
TSNEIKESPLHGTQNTINKRTQPTFGFTVNWKFS----ESTTVFTGQCFIDRNGKEV-LKTMWLLRSSVN
ESRYVLTGRYDSTPATDGSGT--ALGWTVAWKNNYRNAHSATTWSGQYV---GGAEARINTQWLLTSGTT
DIGDDWKATRVGINIFTRLRTQKE---------------------
-AANAWKSTLVGHDTFTKVKPSAASIDAAKKAGVNNGNPLDAVQQ
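To make the notion of an optimal alignment more concrete, here is a deliberately simplified pure-Python sketch of global alignment by dynamic programming (Needleman-Wunsch). It is not Biotite's implementation and differs from the example above in several ways: it uses a hypothetical match/mismatch score instead of BLOSUM62, a linear instead of an affine gap penalty, and it always penalizes terminal gaps. The function name and its default scores are assumptions made for illustration.

```python
def global_alignment(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_seq1, aligned_seq2) for one optimal alignment."""
    n, m = len(seq1), len(seq2)

    # score[i][j] is the best score for aligning seq1[:i] with seq2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align both symbols
                score[i - 1][j] + gap,       # gap in seq2
                score[i][j - 1] + gap,       # gap in seq1
            )

    # Trace back from the bottom-right cell to recover one optimal path
    aligned1, aligned2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            aligned1.append(seq1[i - 1])
            aligned2.append(seq2[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned1)), "".join(reversed(aligned2))


total, top, bottom = global_alignment("BIQTITE", "IQLITE")
print(total)
print(top)
print(bottom)
```

For real data, the library routine shown above is the better choice, since it supports substitution matrices, affine gap penalties and free terminal gaps.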
More documentation, including a tutorial, an example gallery and the API reference, is available at https://www.biotite-python.org/.
Contribution
Interested in improving Biotite? Have a look at the contribution guidelines. Feel free to join our community chat on Discord.