A comprehensive library for computational molecular biology
Biotite project
Biotite is your Swiss army knife for bioinformatics. Whether you want to identify homologous sequence regions in a protein family or you would like to find disulfide bonds in a protein structure: Biotite has the right tool for you. This package bundles popular tasks in computational molecular biology into a uniform Python library. It can handle a major part of the typical workflow for sequence and biomolecular structure data:
Searching and fetching data from biological databases
Reading and writing popular sequence/structure file formats
Analyzing and editing sequence/structure data
Visualizing sequence/structure data
Interfacing external applications for further analysis
Biotite internally stores most of the data as NumPy ndarray objects, enabling
fast C-accelerated analysis,
intuitive usability through NumPy-like indexing syntax,
extensibility through direct access to the internal NumPy arrays.
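To make the benefit of array-backed storage concrete, here is a minimal sketch in plain NumPy. It is not Biotite's internal implementation: the `ALPHABET` list and the `encode`/`decode` helpers are hypothetical names chosen for illustration. The sketch assumes that a sequence is stored as an array of integer symbol codes, so that slices and boolean masks work as on any ndarray.

```python
import numpy as np

# Hypothetical alphabet for illustration; Biotite defines its own alphabets
ALPHABET = ["A", "C", "G", "T"]
SYMBOL_TO_CODE = {symbol: code for code, symbol in enumerate(ALPHABET)}


def encode(sequence):
    """Convert a string into an array of integer symbol codes."""
    return np.array([SYMBOL_TO_CODE[s] for s in sequence], dtype=np.uint8)


def decode(codes):
    """Convert an array of symbol codes back into a string."""
    return "".join(ALPHABET[c] for c in codes)


codes = encode("ACGTTGCA")

# NumPy-like indexing: a slice selects a subsequence
subsequence = decode(codes[2:5])

# Vectorized analysis: a boolean mask marks every G or C position
gc_mask = (codes == SYMBOL_TO_CODE["G"]) | (codes == SYMBOL_TO_CODE["C"])
gc_content = gc_mask.mean()

print(subsequence)
print(gc_content)
```

Because the data is an ordinary array, any NumPy function can operate on it directly without a conversion step.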
As a result, the user can skip writing code for basic functionality (like file parsers) and can focus on what makes their code unique, from small analysis scripts to entire bioinformatics software packages.
If you use Biotite in a scientific publication, please cite:
Installation
Biotite requires the following packages:
numpy
requests
msgpack
networkx
Some functions require additional packages:
mdtraj - Required for trajectory file I/O operations.
matplotlib - Required for plotting purposes.
Biotite can be installed via Conda…
$ conda install -c conda-forge biotite
… or pip
$ pip install biotite
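After installation, it can be useful to check which of the dependencies listed above are present in the current environment. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of Biotite.

```python
import importlib.util

# Package names taken from the dependency lists above
REQUIRED = ["numpy", "requests", "msgpack", "networkx"]
OPTIONAL = ["mdtraj", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing required packages:", missing_packages(REQUIRED))
print("Missing optional packages:", missing_packages(OPTIONAL))
```

An empty list for the required packages suggests that the core functionality should be importable; missing optional packages only affect trajectory file I/O and plotting.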
Usage
Here is a small example that downloads two protein sequences from the NCBI Entrez database and aligns them:
import biotite.sequence.align as align
import biotite.sequence.io.fasta as fasta
import biotite.database.entrez as entrez
# Download FASTA file for the sequences of avidin and streptavidin
file_name = entrez.fetch_single_file(
    uids=["CAC34569", "ACL82594"], file_name="sequences.fasta",
    db_name="protein", ret_type="fasta"
)
# Parse the downloaded FASTA file
# and create 'ProteinSequence' objects from it
fasta_file = fasta.FastaFile.read(file_name)
avidin_seq, streptavidin_seq = fasta.get_sequences(fasta_file).values()
# Align sequences using the BLOSUM62 matrix with affine gap penalty
matrix = align.SubstitutionMatrix.std_protein_matrix()
alignments = align.align_optimal(
    avidin_seq, streptavidin_seq, matrix,
    gap_penalty=(-10, -1), terminal_penalty=False
)
print(alignments[0])
MVHATSPLLLLLLLSLALVAPGLSAR------KCSLTGKWDNDLGSNMTIGAVNSKGEFTGTYTTAV-TA
-------------------DPSKESKAQAAVAEAGITGTWYNQLGSTFIVTA-NPDGSLTGTYESAVGNA
TSNEIKESPLHGTQNTINKRTQPTFGFTVNWKFS----ESTTVFTGQCFIDRNGKEV-LKTMWLLRSSVN
ESRYVLTGRYDSTPATDGSGT--ALGWTVAWKNNYRNAHSATTWSGQYV---GGAEARINTQWLLTSGTT
DIGDDWKATRVGINIFTRLRTQKE---------------------
-AANAWKSTLVGHDTFTKVKPSAASIDAAKKAGVNNGNPLDAVQQ
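To read an alignment like the one above quantitatively, one can count the columns in which both sequences carry the same symbol. The following is a pure-Python sketch, not Biotite's own implementation; the function name `count_identities` and the two short rows are hypothetical and serve only as an illustration.

```python
def count_identities(row_a, row_b, gap="-"):
    """Count aligned columns and columns with identical symbols.

    Columns in which either row contains a gap are ignored.
    """
    if len(row_a) != len(row_b):
        raise ValueError("Alignment rows must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1
    return aligned, identical


# Hypothetical toy alignment rows, not taken from the output above
row_a = "MKT-AYIAK"
row_b = "MKSWAY-AR"

aligned, identical = count_identities(row_a, row_b)
print(f"{identical} identical symbols in {aligned} aligned columns")
```

The same function can be applied block by block to the rows printed above, provided each pair of rows is copied with its gap characters intact.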
More documentation, including a tutorial, an example gallery and the API reference, is available at https://www.biotite-python.org/.
Contribution
Interested in improving Biotite? Have a look at the contribution guidelines. Feel free to join our community chat on Discord.
Download files
Built Distributions
Hashes for biotite-0.35.0-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 0bfe9f754adee16bdcf4dbe2b76d82fd88a9f7ea5436c45f5fb37d0834f7ef70
MD5 | fb5cbd40da67f72ce377151e0865bbce
BLAKE2b-256 | 51e5e692870f303f94e135f7f118be8d537e57e9a9ac30a30fcd5324aa7ddf43
Hashes for biotite-0.35.0-cp310-cp310-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 8a98f4795f93422c32d2e1f93996e21fd92995bec02acc218876eb47be54094c
MD5 | 20615a25bc5f2ee3e47bb611d49f422a
BLAKE2b-256 | ba158007873d5707ca6eb19963f067710e2b008d16f356abc6f54654ec90ce95
Hashes for biotite-0.35.0-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 6173f496ad197bdb58fa46193305c31b9846b73d98b616c3d0db37af10d047de
MD5 | 81a2807b85128514ee032416db451b34
BLAKE2b-256 | b5442eb7a9d93f66358fc9a9e32423ab06894b8d368e157e39c9b0eace2f7aff
Hashes for biotite-0.35.0-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 568d2f73d751744bdd4cce340c07b35deff169e369169d2370722d8bf9f576ef
MD5 | c9aaa370bafd7d2acaef2e1dcc37a1c0
BLAKE2b-256 | 40770249a821d0b03ebdfdc8bbecdd316ed2af55f509d3965db33230fd163078
Hashes for biotite-0.35.0-cp39-cp39-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c6381919d3e6d6c865cf1fb24856b901eca7b2436561a25764d39c42538d072e
MD5 | e788b3e6e3b401c1446b4b6238cfe0c3
BLAKE2b-256 | 1de15af2254b9d8037b321000207ffba18c704bf3ccc55795a1cff25daf57589
Hashes for biotite-0.35.0-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 51652d2de88fa4899b742605b1a432632737afebd4b668424cdc59a41d4aa36b
MD5 | aa987571bd8bf7c0b1760ec95942c916
BLAKE2b-256 | fbcb8a4749e17306e37fe2f96e4d3046c035c1e4d101f8a0ef1a1b352a654a92
Hashes for biotite-0.35.0-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | f59aea54063371e40bd9146c456d3f924eb7e496d73c803d40d23bf76a699ea6
MD5 | 17533454bc170cd54c0c8d43e2be0501
BLAKE2b-256 | bfffb3f1237338547b18cb3a9f6258ae48c633e44eccc6a8c9796a5b8b9b59dc
Hashes for biotite-0.35.0-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 7cf75d0ea2e237d5d4df4e7af06afeda2a90cafa12d658a6420546dcc80d82dd
MD5 | 7a735720c4c5639650623a6dd5dc1df7
BLAKE2b-256 | a2d8b2fa34f730a99a4053bd98c8a63e0dfcb0eb7483917812d66c2ae797e1f6
Hashes for biotite-0.35.0-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 54248697655750c5c6882369b592410a50eb7a9d7fac3f60254ee6a60fc5fd0a
MD5 | a1a8108b0a6d51275b4b8c532e33aed8
BLAKE2b-256 | 1c3152640f5579d20aeb8704ec164aeac1ece3c90416823f8bde314a1baca621
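The digests listed above can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_digest` are hypothetical, and the path of the downloaded file is left to the user.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # Read fixed-size chunks until an empty bytes object signals EOF
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_digest` with the local path of a downloaded wheel and the SHA256 value listed for that file should return `True` if the download is unmodified.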