A comprehensive library for computational molecular biology
Biotite project
Biotite is your Swiss army knife for bioinformatics. Whether you want to identify homologous sequence regions in a protein family or find disulfide bonds in a protein structure, Biotite has the right tool for you. This package bundles popular tasks in computational molecular biology into a uniform Python library. It can handle a major part of the typical workflow for sequence and biomolecular structure data:
Searching and fetching data from biological databases
Reading and writing popular sequence/structure file formats
Analyzing and editing sequence/structure data
Visualizing sequence/structure data
Interfacing external applications for further analysis
Biotite internally stores most of the data as NumPy ndarray objects, enabling
fast C-accelerated analysis,
intuitive usability through NumPy-like indexing syntax,
extensibility through direct access of the internal NumPy arrays.
As a result, the user can skip writing code for basic functionality (like file parsers) and can focus on what makes their code unique, from small analysis scripts to entire bioinformatics software packages.
If you use Biotite in a scientific publication, please cite:
Installation
Biotite requires the following packages:
numpy
requests
msgpack
networkx
Some functions require some extra packages:
mdtraj - Required for trajectory file I/O operations.
matplotlib - Required for plotting purposes.
Biotite can be installed via Conda…
$ conda install -c conda-forge biotite
… or pip
$ pip install biotite
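After installing, it can be useful to confirm which of the packages listed above are importable in the current environment. The following is a minimal sketch that uses only the standard library. The helper name `check_packages` is hypothetical and not part of Biotite, and the sketch assumes that each package's import name equals the name given in the lists above.

```python
from importlib.util import find_spec

# Package names taken from the requirement lists above
REQUIRED = ["numpy", "requests", "msgpack", "networkx"]
OPTIONAL = ["mdtraj", "matplotlib"]


def check_packages(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, found in check_packages(names).items():
            status = "found" if found else "missing"
            print(f"{label:10}{name:12}{status}")
```

A missing optional package only affects the functions that depend on it, so it does not need to be installed unless trajectory I/O or plotting is required.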
Usage
Here is a small example that downloads two protein sequences from the NCBI Entrez database and aligns them:
import biotite.sequence.align as align
import biotite.sequence.io.fasta as fasta
import biotite.database.entrez as entrez
# Download FASTA file for the sequences of avidin and streptavidin
file_name = entrez.fetch_single_file(
    uids=["CAC34569", "ACL82594"], file_name="sequences.fasta",
    db_name="protein", ret_type="fasta"
)
# Parse the downloaded FASTA file
# and create 'ProteinSequence' objects from it
fasta_file = fasta.FastaFile.read(file_name)
avidin_seq, streptavidin_seq = fasta.get_sequences(fasta_file).values()
# Align sequences using the BLOSUM62 matrix with affine gap penalty
matrix = align.SubstitutionMatrix.std_protein_matrix()
alignments = align.align_optimal(
    avidin_seq, streptavidin_seq, matrix,
    gap_penalty=(-10, -1), terminal_penalty=False
)
print(alignments[0])
MVHATSPLLLLLLLSLALVAPGLSAR------KCSLTGKWDNDLGSNMTIGAVNSKGEFTGTYTTAV-TA
-------------------DPSKESKAQAAVAEAGITGTWYNQLGSTFIVTA-NPDGSLTGTYESAVGNA
TSNEIKESPLHGTQNTINKRTQPTFGFTVNWKFS----ESTTVFTGQCFIDRNGKEV-LKTMWLLRSSVN
ESRYVLTGRYDSTPATDGSGT--ALGWTVAWKNNYRNAHSATTWSGQYV---GGAEARINTQWLLTSGTT
DIGDDWKATRVGINIFTRLRTQKE---------------------
-AANAWKSTLVGHDTFTKVKPSAASIDAAKKAGVNNGNPLDAVQQ
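To make the `gap_penalty=(-10, -1)` and `terminal_penalty=False` arguments more concrete, here is a small standard-library sketch of how an affine gap penalty could be tallied for two already aligned, gapped strings. It is not Biotite's implementation. The function name `gap_penalty_total` is hypothetical, and the sketch assumes the common convention that the first position of a gap costs the opening penalty and every further position costs the extension penalty. Substitution scores from the matrix are deliberately left out.

```python
def gap_penalty_total(row_a, row_b, gap_open=-10, gap_ext=-1,
                      terminal_penalty=False):
    """Sum the affine gap penalties of two aligned, gapped strings."""
    if len(row_a) != len(row_b):
        raise ValueError("Aligned rows must have equal length")
    columns = list(zip(row_a, row_b))

    if not terminal_penalty:
        # Simplification: drop gap-containing columns at both ends,
        # so that terminal gaps are not penalized
        start = 0
        while start < len(columns) and "-" in columns[start]:
            start += 1
        end = len(columns)
        while end > start and "-" in columns[end - 1]:
            end -= 1
        columns = columns[start:end]

    total = 0
    prev_gap_row = None  # Row that contained a gap in the previous column
    for a, b in columns:
        gap_row = 0 if a == "-" else 1 if b == "-" else None
        if gap_row is not None:
            # Continuing a gap in the same row is cheaper than opening one
            total += gap_ext if gap_row == prev_gap_row else gap_open
        prev_gap_row = gap_row
    return total


# Excerpt around the six-residue gap in the first alignment block above
print(gap_penalty_total("AR------KC", "SKAQAAVAEA"))  # -> -15
```

Under these assumptions the internal six-position gap costs -10 for opening plus 5 × -1 for extension, while the long gaps at the start and end of the alignment above would contribute nothing because terminal gaps are not penalized.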
More documentation, including a tutorial, an example gallery and the API reference, is available at https://www.biotite-python.org/.
Contribution
Interested in improving Biotite? Have a look at the contribution guidelines. Feel free to join our community chat on Discord.
Download files
Built Distributions
Hashes for biotite-0.34.0-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | d2b9434013d3f4812eebb7335d11d9c48ec2640d3820c518f073744fc58be1da
MD5 | 91abb9516060aa718cea46e1b280f32d
BLAKE2b-256 | bb1cf7237e3b3d45238f98f306359c9ffcd5edc4dbca00a7435f0d9a30f56545
Hashes for biotite-0.34.0-cp310-cp310-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 491ccbac79aeb2213bad86ff1e8b8adf50464e31aee539c3f19c535a733cf541
MD5 | 888f1a5ea7b4fc7fd7cd09ef958337dc
BLAKE2b-256 | 3bb10746fa4ad2e7f4d51e8fa46eaec7efd7b2a013d5fefb3028257aed8bdb9c
Hashes for biotite-0.34.0-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 6057bd5fad7d2744c63714c2e0d59fe98fe40ccb6bf1f6e106827495475eb5a2
MD5 | 86b1db7679e4408a9d5718660f47ad16
BLAKE2b-256 | fdc3ad8b5eff29829d7d1216ba6609533ce23777121f0289b5a290d42afeb142
Hashes for biotite-0.34.0-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 589d03c9cce1b73a9170b58f4e4336d51c147b9b246a1b00d63212632cda8fa9
MD5 | bded18d57fde9ebcac9f82f65439d730
BLAKE2b-256 | 0c91a28a256e8009fd189361cafc9ac656bd0d379419e91da6140059b25a9be0
Hashes for biotite-0.34.0-cp39-cp39-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 6de95ae0eecbdf11ba88d6c92f922d65d93754ed37dfb63c1b182fcc8e3194ab
MD5 | ac16ef9f3b9a288e2d7833b022b430dd
BLAKE2b-256 | 76b2709f120fe7ee687af80c59c068af6c77d9170c13a2f4554a1f49e0753f3c
Hashes for biotite-0.34.0-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 4e77331f8afea080822772e444752815bb5e93f8bdadec7a9199e8cd9bac4987
MD5 | 3024d55e5ded01ee26dbac6484b49466
BLAKE2b-256 | 59ff08f7196a4e5c5a0ca508eaef74305fc3580fa1f44aeed25c7b5eb90f85ee
Hashes for biotite-0.34.0-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 3499eb90ce527aaa3f1c60bdc1dc30f2d7ea277da9e456d891d14fe055a8c927
MD5 | 9d5bcfb476218e0d58ef90d3036c22cb
BLAKE2b-256 | 80a8ba5d6f6becd654493e3f29ba8a4fa37e79b7bb32036fd11e58f8ef0cf3cc
Hashes for biotite-0.34.0-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | f1ac4cea189490b80d6b3aba68c2108fb2c0434e50ace3872d915bb958830df4
MD5 | 73a1cbb0d86669377d0d28463deff86d
BLAKE2b-256 | 14cc7e0772ed41f040ebb93c280f6f6f9fcf1f6674945cbebd86337270adb3e3
Hashes for biotite-0.34.0-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | e3363957df6dc25b7cdbfecb63cc836208ef88448aaa2a27b47b89dd979461d3
MD5 | 59eac78e501b0523bef12d82adeb02f9
BLAKE2b-256 | 9023cae47a71ce26d5e7ea1b31c851e848c7622198bb64c44bb6ddfc498f8766
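The digests listed above can be used to check that a downloaded wheel has not been corrupted or altered. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_digest` are hypothetical, and the file path passed in is assumed to point at a wheel that has already been downloaded.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA256 digest of the file at 'path'."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # Read in chunks so that large files need not fit into memory
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare the file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

To use it, pass the path of the downloaded wheel and the SHA256 value from the table that matches its file name. A result of `False` means the file should not be installed.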