
A comprehensive library for computational molecular biology

Biotite is your Swiss army knife for bioinformatics. Whether you want to identify homologous sequence regions in a protein family or you would like to find disulfide bonds in a protein structure: Biotite has the right tool for you. This package bundles popular tasks in computational molecular biology into a uniform Python library. It can handle a major part of the typical workflow for sequence and biomolecular structure data:

  • Searching and fetching data from biological databases

  • Reading and writing popular sequence/structure file formats

  • Analyzing and editing sequence/structure data

  • Visualizing sequence/structure data

  • Interfacing external applications for further analysis

Biotite internally stores most of the data as NumPy ndarray objects, enabling

  • fast C-accelerated analysis,

  • intuitive usability through NumPy-like indexing syntax,

  • extensibility through direct access of the internal NumPy arrays.

As a result, users can skip writing code for basic functionality (like file parsers) and focus on what makes their code unique, from small analysis scripts to entire bioinformatics software packages.
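The idea behind the NumPy-backed design can be illustrated with a minimal sketch. It does not use Biotite itself: the `ALPHABET` mapping and the `encode`/`decode` helpers are hypothetical names chosen for illustration, and they only show how a sequence stored as an integer array can be sliced, masked, and analyzed with plain NumPy operations.

```python
import numpy as np

# Hypothetical alphabet for illustration only; Biotite defines its own alphabets
ALPHABET = "ACGT"
SYMBOL_TO_CODE = {symbol: code for code, symbol in enumerate(ALPHABET)}


def encode(sequence):
    """Map each symbol to its integer code and store the codes in an ndarray."""
    return np.array([SYMBOL_TO_CODE[s] for s in sequence], dtype=np.uint8)


def decode(codes):
    """Map integer codes back to a string of symbols."""
    return "".join(ALPHABET[c] for c in codes)


codes = encode("ACGTTGCA")

# Slicing and boolean masks work as on any other ndarray
first_half = codes[:4]
without_t = codes[codes != SYMBOL_TO_CODE["T"]]

# Vectorized analysis: count how often each symbol occurs
counts = np.bincount(codes, minlength=len(ALPHABET))

print(decode(first_half))
print(decode(without_t))
print(counts)
```

Because the data is an ordinary array, any NumPy function can be applied to it directly, which is the kind of extensibility the list above refers to.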

If you use Biotite in a scientific publication, please cite:

Kunzmann, P. & Hamacher, K. BMC Bioinformatics (2018) 19:346.


Biotite requires the following packages:

  • numpy

  • requests

  • msgpack

  • networkx

Some functions require additional packages:

  • mdtraj - Required for trajectory file I/O operations.

  • matplotlib - Required for plotting purposes.

Biotite can be installed via Conda

$ conda install -c conda-forge biotite

… or pip

$ pip install biotite
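After installation, it can be useful to check which of the packages listed above are importable in the current environment. The following is a small sketch using only the Python standard library; the helper name `missing_packages` is made up for this example and is not part of Biotite.

```python
from importlib.util import find_spec

# Package names taken from the dependency lists above
REQUIRED = ["numpy", "requests", "msgpack", "networkx"]
OPTIONAL = ["mdtraj", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


print("Missing required packages:", missing_packages(REQUIRED) or "none")
print("Missing optional packages:", missing_packages(OPTIONAL) or "none")
```

A missing optional package only matters when the corresponding functionality (trajectory file I/O or plotting) is actually used.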


Here is a small example that downloads two protein sequences from the NCBI Entrez database and aligns them:

import biotite.sequence.align as align
import biotite.sequence.io.fasta as fasta
import biotite.database.entrez as entrez

# Download FASTA file for the sequences of avidin and streptavidin
file_name = entrez.fetch_single_file(
    uids=["CAC34569", "ACL82594"], file_name="sequences.fasta",
    db_name="protein", ret_type="fasta"
)

# Parse the downloaded FASTA file
# and create 'ProteinSequence' objects from it
fasta_file = fasta.FastaFile.read(file_name)
avidin_seq, streptavidin_seq = fasta.get_sequences(fasta_file).values()

# Align sequences using the BLOSUM62 matrix with affine gap penalty
matrix = align.SubstitutionMatrix.std_protein_matrix()
alignments = align.align_optimal(
    avidin_seq, streptavidin_seq, matrix,
    gap_penalty=(-10, -1), terminal_penalty=False
)
# Print the first of the optimal alignments
print(alignments[0])



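To make the affine gap penalty in the example more concrete, here is a minimal sketch of how such a penalty is commonly computed. It assumes the tuple `(-10, -1)` means (gap opening, gap extension) and that the opening penalty applies to the first gap position while each further position adds the extension penalty. Other conventions exist, so this illustrates the concept and is not a description of Biotite's internals. The function names are hypothetical.

```python
def affine_gap_score(gap_length, gap_open=-10, gap_extend=-1):
    """Score of one contiguous gap under an affine penalty.

    Assumed convention: the first gap position costs `gap_open`,
    every additional position costs `gap_extend`.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0
    return gap_open + (gap_length - 1) * gap_extend


def linear_gap_score(gap_length, gap_penalty=-10):
    """Score of one contiguous gap under a linear penalty, for comparison."""
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    return gap_length * gap_penalty


for length in (1, 2, 5):
    print(length, affine_gap_score(length), linear_gap_score(length))
```

Under this convention a single long gap is much cheaper than several short ones of the same total length, which tends to suit alignments where insertions and deletions occur in contiguous stretches.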
More documentation, including a tutorial, an example gallery, and the API reference, is available on the Biotite documentation website.


Interested in improving Biotite? Have a look at the contribution guidelines. Feel free to join our community chat on Discord.
