
Unambiguous representation of modified DNA, RNA, and proteins


BpForms: unambiguous representation of modified DNA, RNA, and proteins

BpForms is a set of tools for unambiguously representing the structures of modified forms of biopolymers such as DNA, RNA, and protein.

  • The BpForms notation can unambiguously represent the structure of modified forms of biopolymers. For example, the following represents a modified DNA molecule that contains a deoxyinosine monomer at the fourth position:
      ACG[id: "dI" | structure: "[H][C@]1(O)C[C@@]([H])(O[C@]1([H])CO)N1C=NC2=C1N=CN=C2O"]T
  • This concrete representation of modified biopolymers enables the BpForms software tools to calculate the chemical formulae, molecular weights, and charges of biopolymers, as well as to automatically calculate the major protonation and tautomerization state of biopolymers at specific pHs.
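To make the structure of the notation above more tangible, the following is a minimal sketch of how such a string could be split into monomers. It is not the BpForms parser or its grammar: the `tokenize` function, the regular expressions, and the assumption that every attribute value is double-quoted are illustrative choices made only for this example.

```python
import re

# Toy tokenizer, NOT the BpForms grammar.
# Assumption: a monomer is either a single letter or a bracketed definition
# whose attribute values are double-quoted, so brackets inside quotes
# (e.g. in a SMILES string) do not terminate the definition.
TOKEN = re.compile(r'\[(?:[^"\]]|"[^"]*")*\]|[A-Za-z]')
ATTR = re.compile(r'([\w-]+)\s*:\s*"([^"]*)"')

EXAMPLE = ('ACG[id: "dI" | structure: '
           '"[H][C@]1(O)C[C@@]([H])(O[C@]1([H])CO)N1C=NC2=C1N=CN=C2O"]T')


def tokenize(seq):
    """Split a BpForms-like string into a list of monomer dictionaries."""
    monomers = []
    pos = 0
    while pos < len(seq):
        if seq[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(seq, pos)
        if match is None:
            raise ValueError('Unexpected character at position %d' % pos)
        text = match.group(0)
        if text.startswith('['):
            # Inline monomer definition: collect its key/value attributes.
            monomers.append(dict(ATTR.findall(text[1:-1])))
        else:
            # Single-letter code from the alphabet.
            monomers.append({'id': text})
        pos = match.end()
    return monomers


if __name__ == '__main__':
    for position, monomer in enumerate(tokenize(EXAMPLE), start=1):
        print(position, monomer['id'])
```

In this sketch the fourth monomer carries its own `structure` attribute, which is the information that calculations of formulae, weights, and charges would start from.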

BpForms encompasses five tools:

BpForms was motivated by the need to concretely represent the biochemistry of DNA modification, DNA repair, post-transcriptional processing, and post-translational processing in whole-cell computational models. In addition, BpForms is a valuable tool for experimental proteomics. In particular, we developed BpForms because there were no notations, schemas, data models, or file formats for concretely representing modified forms of biopolymers, despite the existence of several databases and ontologies of DNA, RNA, and protein modifications and the ProForma Proteoform Notation.

The BpForms syntax was inspired by the ProForma Proteoform Notation. BpForms improves upon this syntax in several ways:

  • BpForms separates the representation of modified biopolymers from the chemical processes which generate them.
  • BpForms clarifies the representation of multiply modified monomers. This is necessary to represent the combinatorial complexity of modified DNA, RNA, and proteins.
  • BpForms can be customized to represent any modification and, therefore, is not limited to previously enumerated modifications. This is also necessary to represent the combinatorial complexity of modified DNA, RNA, and proteins.
  • BpForms supports two additional types of uncertainty in the structures of biopolymers: uncertainty in the position of a modified nucleotide/amino acid within the polymer sequence, and uncertainty in the chemical identity of a modified nucleotide/amino acid, expressed as a deviation from its expected mass or charge.
  • BpForms has a concrete grammar. This enables error checking, as well as the calculation of chemical formulae, masses, and charges, which is essential for modeling.
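As an illustration of why a concrete structure makes such calculations possible, the sketch below computes a molecular weight from a chemical formula. It does not use the BpForms API: the `mol_wt` helper and the `ATOMIC_WEIGHTS` table (rounded standard atomic weights) are assumptions introduced for this example, and the formula C10H12N4O4 was counted by hand from the deoxyinosine structure shown earlier.

```python
from collections import Counter

# Rounded standard atomic weights in g/mol (illustrative subset only).
ATOMIC_WEIGHTS = {
    'C': 12.011,
    'H': 1.008,
    'N': 14.007,
    'O': 15.999,
    'P': 30.974,
}


def mol_wt(formula):
    """Sum atomic weights over a formula given as {element: count}.

    Raises KeyError for an element missing from ATOMIC_WEIGHTS, which is
    a crude form of the error checking that a concrete notation allows.
    """
    return sum(ATOMIC_WEIGHTS[element] * count
               for element, count in formula.items())


# Formula of the deoxyinosine monomer, counted from its structure.
deoxyinosine = Counter({'C': 10, 'H': 12, 'N': 4, 'O': 4})

if __name__ == '__main__':
    print(round(mol_wt(deoxyinosine), 2))  # approximately 252.23
```

A full calculation for a polymer would also have to account for the atoms lost when bonds form between adjacent monomers, which this sketch deliberately leaves out.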


Installation

  1. Install the third-party dependencies listed below. Detailed installation instructions are available in An Introduction to Whole-Cell Modeling.
  2. To use Marvin to calculate major protonation and tautomerization states, set JAVA_HOME to the path to your Java virtual machine (JVM):
       export JAVA_HOME=/usr/lib/jvm/default-java
  3. To use Marvin to calculate major protonation and tautomerization states, add Marvin to the Java class path:
       export CLASSPATH=$CLASSPATH:/opt/chemaxon/marvinsuite/lib/MarvinBeans.jar
  4. Install this package:
    • Install the latest release from PyPI:
        pip install bpforms[all]
    • Install the latest revision from GitHub:
        pip install git+
        pip install git+[all]
        pip install git+[all]
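Before relying on Marvin, it can help to confirm that the two environment variables from the steps above are in place. The following is a small sketch for that purpose; the `marvin_env_problems` function is a hypothetical helper written for this example and is not part of BpForms. It only inspects the environment and does not start a JVM or load Marvin.

```python
import os


def marvin_env_problems(env=None):
    """Return a list of problems with the Marvin-related environment.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to try out without changing the real environment.
    """
    env = os.environ if env is None else env
    problems = []

    if not env.get('JAVA_HOME'):
        problems.append('JAVA_HOME is not set')

    # CLASSPATH entries are separated by os.pathsep (":" on Linux/macOS).
    entries = env.get('CLASSPATH', '').split(os.pathsep)
    if not any(entry.endswith('MarvinBeans.jar') for entry in entries):
        problems.append('MarvinBeans.jar is not on CLASSPATH')

    return problems


if __name__ == '__main__':
    for problem in marvin_env_problems():
        print('Warning:', problem)
```

An empty result means both settings look plausible; it does not guarantee that the paths actually exist on disk.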

Examples, tutorial, and documentation

Please see the documentation. An interactive tutorial is also available in the whole-cell modeling sandbox.


License

The package is released under the MIT license.

Citing BpForms

Lang PF, Chebaro Y & Karr JR. BpForms: a toolkit for concretely describing modified DNA, RNA and proteins. arXiv:1903.10042.

Development team

This package was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, USA.

Questions and comments

Please contact the Karr Lab with any questions or comments.


