BpForms: unambiguous representation of modified DNA, RNA, and proteins
BpForms is a set of tools for unambiguously representing the structures of modified forms of biopolymers such as DNA, RNA, and protein.
- The BpForms notation can unambiguously represent the structure of modified forms of biopolymers. For example, the following represents a modified DNA molecule that contains a deoxyinosine monomer at the fourth position:
  ACG[id: "dI" | structure: "[H][C@]1(O)C[C@@]([H])(O[C@]1([H])CO)N1C=NC2=C1N=CN=C2O"]T
- This concrete representation of modified biopolymers enables the BpForms software tools to calculate the chemical formulae, molecular weights, and charges of biopolymers, and to automatically calculate their major protonation and tautomerization states at specific pHs.
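To illustrate why a concrete structure makes such calculations possible, here is a minimal Python sketch that derives a molecular weight from a chemical formula. It does not use the BpForms API. The formula C10H12N4O4 is an assumption obtained by counting the atoms in the deoxyinosine structure shown above, the atomic masses are rounded standard values, and the function names are hypothetical.

```python
import re

# Rounded standard atomic masses (g/mol) for the elements used below.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Turn a simple formula such as 'C10H12N4O4' into {'C': 10, 'H': 12, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum the atomic masses of every atom in the formula."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


# Assumed formula of the deoxyinosine monomer, counted by hand from its structure.
deoxyinosine = "C10H12N4O4"
counts = parse_formula(deoxyinosine)
weight = molecular_weight(deoxyinosine)
print(counts)
print(round(weight, 2))
```

A real implementation would derive the formula from the structure with a cheminformatics library and would account for the atoms lost when monomers are bonded into a polymer; this sketch covers only a single free monomer.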
BpForms encompasses five tools:
- Notation for describing biopolymers
- Web-based graphical interface: https://bpforms.org
- REST JSON API
- Command line interface
- Python API
BpForms was motivated by the need to concretely represent the biochemistry of DNA modification, DNA repair, post-transcriptional processing, and post-translational processing in whole-cell computational models. In addition, BpForms is a valuable tool for experimental proteomics. In particular, we developed BpForms because there were no notations, schemas, data models, or file formats for concretely representing modified forms of biopolymers, despite the existence of several databases and ontologies of DNA, RNA, and protein modifications and the ProForma Proteoform Notation.
The BpForms syntax was inspired by the ProForma Proteoform Notation. BpForms improves upon this syntax in several ways:
- BpForms separates the representation of modified biopolymers from the chemical processes which generate them.
- BpForms clarifies the representation of multiply modified monomers. This is necessary to represent the combinatorial complexity of modified DNA, RNA, and proteins.
- BpForms can be customized to represent any modification and, therefore, is not limited to previously enumerated modifications. This is also necessary to represent the combinatorial complexity of modified DNA, RNA, and proteins.
- BpForms supports two additional types of uncertainty in the structures of biopolymers: uncertainty in the position of a modified nucleotide/amino acid within the polymer sequence, and uncertainty in the chemical identity of a modified nucleotide/amino acid, expressed as a deviation from its expected mass or charge.
- BpForms has a concrete grammar. This enables error checking, as well as the calculation of chemical formulae, masses, and charges, which is essential for modeling.
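As a rough illustration of how a concrete grammar enables error checking, the following Python sketch splits a sequence into monomers and reports the position of an unclosed bracket. It is a simplified stand-in, not the BpForms grammar or parser: it only recognizes single-character monomers and bracketed monomers with double-quoted attribute values, and the function names are hypothetical.

```python
import re


def split_monomers(sequence):
    """Split a sequence into single-character and bracketed monomers."""
    monomers = []
    i = 0
    while i < len(sequence):
        char = sequence[i]
        if char == "[":
            # Scan to the matching ']' while ignoring brackets inside quotes,
            # because a quoted structure may itself contain '[' and ']'.
            j = i + 1
            in_quotes = False
            while j < len(sequence):
                if sequence[j] == '"':
                    in_quotes = not in_quotes
                elif sequence[j] == "]" and not in_quotes:
                    break
                j += 1
            else:
                raise ValueError(f"unclosed '[' at position {i}")
            monomers.append(sequence[i:j + 1])
            i = j + 1
        elif char.isspace():
            i += 1  # Skip whitespace between monomers.
        else:
            monomers.append(char)
            i += 1
    return monomers


def monomer_id(monomer):
    """Return the quoted id attribute of a bracketed monomer, or None."""
    match = re.search(r'id:\s*"([^"]*)"', monomer)
    return match.group(1) if match else None


example = ('ACG[id: "dI" | structure: '
           '"[H][C@]1(O)C[C@@]([H])(O[C@]1([H])CO)N1C=NC2=C1N=CN=C2O"]T')
monomers = split_monomers(example)
print(len(monomers))
print(monomer_id(monomers[3]))
```

Because the input either splits cleanly or fails with a position, a malformed sequence such as one with a missing closing bracket is rejected before any downstream calculation is attempted.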
- Install the third-party dependencies listed below. Detailed installation instructions are available in An Introduction to Whole-Cell Modeling.
- To use Marvin to calculate major protonation and tautomerization states, set JAVA_HOME to the path to your Java virtual machine (JVM):
  export JAVA_HOME=/usr/lib/jvm/default-java
- To use Marvin to calculate major protonation and tautomerization states, add Marvin to the Java class path:
  export CLASSPATH=$CLASSPATH:/opt/chemaxon/marvinsuite/lib/MarvinBeans.jar
- Install this package
- Install the latest release from PyPI:
  pip install bpforms[all]
- Install the latest revision from GitHub:
  pip install git+https://github.com/KarrLab/log.git#egg=log
  pip install git+https://github.com/KarrLab/wc_utils.git#egg=wc_utils[all]
  pip install git+https://github.com/KarrLab/bpforms.git#egg=bpforms[all]
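After following the steps above, a quick sanity check of the environment can be sketched in Python as shown below. The helper `check_setup` is hypothetical and not part of BpForms; it only tests whether the `bpforms` package can be found, whether JAVA_HOME is set, and whether a file named MarvinBeans.jar appears on CLASSPATH. It does not verify that Marvin itself runs.

```python
import importlib.util
import os


def check_setup(env=None):
    """Report which installation steps appear to be complete.

    env defaults to the current process environment; passing a dict
    makes the check easy to try out with sample values.
    """
    env = os.environ if env is None else env
    classpath_entries = env.get("CLASSPATH", "").split(os.pathsep)
    return {
        # True if Python can locate the bpforms package.
        "bpforms_installed": importlib.util.find_spec("bpforms") is not None,
        # True if JAVA_HOME is defined and non-empty.
        "java_home_set": bool(env.get("JAVA_HOME")),
        # True if any class path entry is a file named MarvinBeans.jar.
        "marvin_on_classpath": any(
            os.path.basename(entry) == "MarvinBeans.jar"
            for entry in classpath_entries
        ),
    }


sample_env = {
    "JAVA_HOME": "/usr/lib/jvm/default-java",
    "CLASSPATH": "/opt/chemaxon/marvinsuite/lib/MarvinBeans.jar",
}
sample_report = check_setup(sample_env)
for step, ok in check_setup().items():
    print(f"{step}: {'ok' if ok else 'missing'}")
```

The two Marvin checks matter only if you want protonation and tautomerization calculations.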
Examples, tutorial, and documentation
The package is released under the MIT license.
Lang PF, Chebaro Y & Karr JR. BpForms: a toolkit for concretely describing modified DNA, RNA and proteins. arXiv:1903.10042.
This package was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, USA.
Questions and comments
Please contact the Karr Lab with any questions or comments.