A Python utility for wrapping Rosetta command line tools.
RosettaPy
A Python Utility for Wrapping the Rosetta Macromolecular Modeling Suite.
[!CAUTION] `RosettaPy` requires `Rosetta` compiled and installed. Before running `RosettaPy`, please DO make sure that you have obtained the correct license from Rosetta Commons. For more details, please see this page.
[!IMPORTANT] `RosettaPy` is NOT `PyRosetta`. You probably don't need to install this package if you are looking for `PyRosetta`. Please see this page.
Overview
`RosettaPy` is a Python module designed to locate Rosetta biomolecular modeling suite binaries that follow a specific naming pattern and to execute Rosetta from the command line. The module includes:
Class/Component | Description |
---|---|
RosettaFinder | A class designed to search for binary files within specified directories. |
RosettaBinary | Represents a binary file and its associated attributes, such as path and version. |
RosettaCmdTask | Encapsulates a single task for running Rosetta, including command-line arguments and input files. |
RosettaRepoManager | Fetches necessary directories and files, sets up environment variables, and provides a partial_clone method for cloning and setting up repositories. |
Rosetta | A command-line wrapper for executing Rosetta runs, simplifying the process of setting up and running commands. |
RosettaScriptsVariableGroup | Represents variables used in Rosetta scripts, facilitating their management and use. |
RosettaEnergyUnitAnalyser | Analyzes and interprets Rosetta output score files, providing a simplified interface for result analysis. |
Example Applications | Demonstrates the use of the above components through specific Rosetta applications like PROSS, FastRelax, RosettaLigand, Supercharge, MutateRelax, and Cartesian ddG, each tailored to different computational biology tasks. |
Features
- Flexible Binary Search: Finds Rosetta binaries based on their naming convention.
- Platform Support: Supports Windows, Linux and macOS operating systems.
- Container Support: Works with Docker containers running the official Rosetta Docker image.
- Customizable Search Paths: Allows specification of custom directories to search.
- Structured Binary Representation: Uses a dataclass to encapsulate binary attributes.
- Command-Line Shortcut: Provides a quick way to find binaries via the command line.
- Available on PyPI: Installable via `pip` without the need to clone the repository.
- Unit Tested: Includes tests for both classes to ensure functionality.
Naming Convention
The Rosetta binaries are expected to follow this naming pattern:
rosetta_scripts[[.mode].oscompilerrelease]
- Binary Name: `rosetta_scripts` (default) or specified.
- Mode (optional): `default`, `mpi`, or `static`.
- OS (optional): `linux` or `macos`.
- Compiler (optional): `gcc` or `clang`.
- Release (optional): `release` or `debug`.
The regular expression used to match the basenames:
^(?P<binary_name>[\w]+)((\.(?P<mode>static|mpi|default|cxx11threadserialization|cxx11threadmpiserialization))?(\.(?P<os>linux|macos)(?P<compiler>gcc|clang)(?P<release>release|debug)))?$
See this regex101 page for more details.
Examples of valid binary filenames:
- `rosetta_scripts` (dockerized Rosetta)
- `rosetta_scripts.linuxgccrelease`
- `rosetta_scripts.mpi.macosclangdebug`
- `rosetta_scripts.static.linuxgccrelease`
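To see how the naming convention splits a basename into its parts, the pattern quoted above can be tried directly with Python's `re` module. This is only a minimal sketch: `parse_binary_name` is a hypothetical helper written for illustration and is not part of the `RosettaPy` API.

```python
import re
from typing import Dict, Optional

# Pattern copied verbatim from the naming convention section.
BINARY_PATTERN = re.compile(
    r"^(?P<binary_name>[\w]+)"
    r"((\.(?P<mode>static|mpi|default|cxx11threadserialization|cxx11threadmpiserialization))?"
    r"(\.(?P<os>linux|macos)(?P<compiler>gcc|clang)(?P<release>release|debug)))?$"
)


def parse_binary_name(basename: str) -> Optional[Dict[str, Optional[str]]]:
    """Hypothetical helper: return the named parts of a basename, or None if it does not match."""
    match = BINARY_PATTERN.match(basename)
    return match.groupdict() if match else None


for name in ("rosetta_scripts", "rosetta_scripts.mpi.macosclangdebug", "rosetta_scripts.mpi"):
    print(name, "->", parse_binary_name(name))
```

Note that a mode without the OS/compiler/release suffix (such as `rosetta_scripts.mpi`) does not match, because the mode is only optional inside the platform suffix group.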
Installation
One can install `RosettaPy` directly from PyPI:
pip install RosettaPy -U
Basic Usages
`RosettaPy` is designed to handle the complexities of locating and running Rosetta binaries within Python.
Build Rosetta Workflow from scratch
- Import necessary modules

import os
from typing import Any, Dict

from RosettaPy import Rosetta, RosettaScriptsVariableGroup, RosettaEnergyUnitAnalyser
from RosettaPy.node import RosettaContainer, MpiNode, Native

- Create a `Rosetta` proxy with parameters

rosetta = Rosetta(
    # a binary name for locating the real binary path
    bin="rosetta_scripts",
    # flag file paths (please do not use `@` prefix here)
    flags=[...],
    # command-line options
    opts=[
        "-in:file:s",
        os.path.abspath(pdb),
        "-parser:protocol",
        "/path/to/my_rosetta_scripts.xml",
    ],
    # output directory
    output_dir=...,
    # save pdb and scorefile together
    save_all_together=True,
    # a job identifier
    job_id=...,
    # silence the Rosetta logs from stdout
    verbose=False,
)

`RosettaPy` uses the `Native` node by default.

Note that `Native` and `MpiNode` are only available on Linux and macOS. For Windows users, please refer to the `Full Operating System Compatibility Table` and `Get Windows Ready for Rosetta Runs` sections below.
- Compose the Rosetta task matrix as inputs

tasks: list[Dict[str, Any]] = [
    # Create tasks for each variant
    {
        "rsv": RosettaScriptsVariableGroup.from_dict(
            {
                "var1": ...,
                "var2": ...,
                "var3": ...,
            }
        ),
        "-out:file:scorefile": f"{variant}.sc",
        "-out:prefix": f"{variant}.",
    }
    for variant in variants
]

# pass task matrix to rosetta.run as `inputs`
rosetta.run(inputs=tasks)

- Using structure labels (`-nstruct`)

Creating distributed runs with structure labels (`-nstruct <int>`) is feasible. For local runs without MPI or a container, `RosettaPy` implements this feature by bypassing Rosetta's built-in job distributor, cancelling the default output structure label, attaching an external structure label to each task as a unique job identifier, and then running each task exactly once. This enables massive parallelism.

options = [...]  # an optional list of options applied to all structure models
rosetta.run(nstruct=nstruct, inputs=options)  # input options are passed to all runs equally

- Call Analyzer to check the results

analyser = RosettaEnergyUnitAnalyser(score_file=rosetta.output_scorefile_dir)
best_hit = analyser.best_decoy
pdb_path = os.path.join(rosetta.output_pdb_dir, f'{best_hit["decoy"]}.pdb')

# Ta-da !!!
print("Analysis of the best decoy:")
print("-" * 79)
print(analyser.df.sort_values(by=analyser.score_term))
print("-" * 79)
print(f'Best Hit on this run: {best_hit["decoy"]} - {best_hit["score"]}: {pdb_path}')

One can also build a customized analyser by re-using the `analyser.df` DataFrame.
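The structure-label step above describes turning one `-nstruct <int>` request into many independent single-structure runs. The following is a rough sketch of that idea only, not the actual `RosettaPy` implementation: `expand_nstruct` is a hypothetical helper, and the use of `-no_nstruct_label` and `-out:suffix` to replace Rosetta's default label is an assumption made for illustration.

```python
from typing import Dict, List


def expand_nstruct(base_opts: List[str], nstruct: int, job_id: str) -> List[Dict[str, object]]:
    """Hypothetical helper: turn one `-nstruct N` request into N single-structure tasks."""
    tasks: List[Dict[str, object]] = []
    for index in range(1, nstruct + 1):
        label = f"{index:05d}"
        tasks.append(
            {
                # External label that doubles as a unique job identifier.
                "task_label": f"{job_id}_{label}",
                # Assumed flags: drop the built-in label, then attach our own suffix.
                "cmd": [*base_opts, "-nstruct", "1", "-no_nstruct_label", "-out:suffix", f"_{label}"],
            }
        )
    return tasks


for task in expand_nstruct(["-in:file:s", "input.pdb"], nstruct=3, job_id="demo"):
    print(task["task_label"], " ".join(task["cmd"]))
```

Because every task builds exactly one structure under its own label, the tasks can be dispatched to any number of workers without their outputs colliding.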
Advanced Usages
Here are some tips for adjusting the workflow to the behavior of specific Rosetta workflows and applications.
Isolation Mode
Some Rosetta apps (Supercharge, Cartesian ddG, etc.) may produce files in their working directory, which may not be thread-safe if one runs multiple jobs in parallel in the same directory. In this case, the `isolation` flag can be used to create a temporary directory for each run.
Rosetta(
...
+ isolation=True,
)
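The idea behind isolation can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions, not how `RosettaPy` implements the flag: `run_isolated` is a hypothetical helper, and a tiny Python one-liner stands in for a Rosetta app that writes into its working directory.

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def run_isolated(cmd: List[str]) -> List[str]:
    """Hypothetical helper: run `cmd` in its own temporary directory and list what it wrote."""
    with tempfile.TemporaryDirectory(prefix="rosetta_run_") as workdir:
        # `cwd` only affects the child process, so parallel callers do not collide.
        subprocess.run(cmd, cwd=workdir, check=True)
        return sorted(os.listdir(workdir))


# Stand-in for a Rosetta app that drops a file into its working directory.
fake_app = [sys.executable, "-c", "open('scratch.out', 'w').write('data')"]
print(run_isolated(fake_app))
```

Passing `cwd` to the child process, rather than changing the parent's working directory, keeps the approach safe when several runs are launched from different threads.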
Run Node Configurations
There are various node configurations that can be used to run Rosetta jobs.
Full Operating System Compatibility Table
Node | Linux | macOS | Windows | Architectures | Prerequisite |
---|---|---|---|---|---|
Native | ✅[^1] | ✅ | ❌ | x86_64, aarch64 | Rosetta compiled. |
MpiNode | ✅[^1] | ✅ | ❌ | x86_64, aarch64 | Rosetta compiled with extras=mpi flag; MPI installed |
RosettaContainer | ✅ | ✅[^3] | ✅ | x86_64 | Docker or Docker Desktop installed and launched. |
WslWrapper[^2] | ❌ | ❌ | ✅[^1] | x86_64, aarch64[^4] | WSL2, with Rosetta built and installed on. |
[^1]: For building Rosetta on aarch64, please refer to this example.
[^2]: Windows Subsystem for Linux (WSL) installed and switched to WSL2, with Rosetta built and installed on it.
[^3]: Translated with the Rosetta 2 framework when run on an Apple Silicon Mac, which may cause noticeably slower performance.
[^4]: Theoretically possible, but no testing has been done at all.
Reallocate CPU Cores
By default, `RosettaPy` uses the `Native` node, representing the local machine with Rosetta installed.
To specify the number of cores, use the `nproc` parameter.
Rosetta(
...
+ run_node=Native(nproc=8)
)
Native MPI Support for local builds of Rosetta
Requires an MPI implementation (MPICH, Open MPI, etc.) to be installed.
If one wishes to run a locally installed Rosetta that was built with the `extras=mpi` flag via MPI,
consider using an `MpiNode` instance as `run_node` instead. This enables native parallelism with MPI.
Rosetta(
...
+ run_node=MpiNode(nproc=10),
)
Also, if one wishes to use `MpiNode` with the Slurm task manager, setting `run_node` to `MpiNode.from_slurm()` may help
with fetching the node info from the environment.
This is an experimental feature that has not been seriously tested in production.
Rosetta(
...
+ run_node=MpiNode.from_slurm(),
)
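For readers unfamiliar with what "node info from the environment" can look like, here is a hedged sketch that reads a few standard Slurm variables (`SLURM_JOB_ID`, `SLURM_NTASKS`, `SLURM_JOB_NODELIST`). Which variables `MpiNode.from_slurm()` actually reads is not stated in this document, so `slurm_node_info` is a hypothetical helper for illustration only.

```python
import os
from typing import Dict, Mapping, Optional


def slurm_node_info(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Hypothetical helper: collect the task count and node list from Slurm variables."""
    env = os.environ if env is None else env
    if "SLURM_JOB_ID" not in env:
        raise RuntimeError("not running inside a Slurm allocation")
    return {
        "job_id": env["SLURM_JOB_ID"],
        # Fall back to a single process when the task count is not exported.
        "nproc": int(env.get("SLURM_NTASKS", "1")),
        "nodelist": env.get("SLURM_JOB_NODELIST", ""),
    }


# Simulated environment so the sketch also runs outside a cluster.
fake_env = {"SLURM_JOB_ID": "42", "SLURM_NTASKS": "10", "SLURM_JOB_NODELIST": "node[01-02]"}
print(slurm_node_info(fake_env))
```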
Rosetta Container
If one wishes to use the Rosetta container as the task worker (WSL + Docker Desktop, for example),
setting the `run_node` option to a `RosettaContainer` instance tells the proxy to use it.
Image names can be found at https://hub.docker.com/r/rosettacommons/rosetta
Note that the paths of each task will be mounted into the container and rewritten to the container's path.
This rewriting feature may fail if the path is mixed with complicated expressions as options.
Non-mpi image:
Rosetta(
...
+ run_node=RosettaContainer(image="rosettacommons/rosetta:latest"),
)
or MPI image:
Rosetta(
...
+ run_node=RosettaContainer(image="rosettacommons/rosetta:mpi"),
+ use_mpi=True, # one still needs to enable MPI by this flag if MPI is required.
)
During workflow processing, one will see active containers in the `Containers` tab of `Docker Desktop`, if `Docker Desktop` is installed. Typing `docker ps` in the terminal will show them too. Each of these containers is destroyed immediately after its task finishes or stops.
Pick Node Accordingly
One can still pick the desired node quickly by calling the `node_picker` method.
from RosettaPy.node import node_picker, NodeHintT
node_hint: NodeHintT = 'docker_mpi'
Rosetta(
...
+ run_node=node_picker(node_type=node_hint),
+ use_mpi=...,
)
Where `node_hint` is one of `["docker", "docker_mpi", "mpi", "wsl", "wsl_mpi", "native"]`.
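As a rough illustration of choosing a hint string, the sketch below derives one from the operating system, following the compatibility table above (`Native` and `MpiNode` are unavailable on Windows). `suggest_node_hint` is a hypothetical helper, and the exact meaning of each hint (for example that `wsl_mpi` means WSL with MPI) is an assumption based on the names.

```python
import platform
from typing import Optional

# Hint strings as listed in this README.
VALID_HINTS = ("docker", "docker_mpi", "mpi", "wsl", "wsl_mpi", "native")


def suggest_node_hint(use_mpi: bool = False, system: Optional[str] = None) -> str:
    """Hypothetical helper: suggest a hint that the compatibility table marks as supported."""
    system = system or platform.system()
    if system == "Windows":
        # Native and MpiNode are not available on Windows; fall back to WSL.
        hint = "wsl_mpi" if use_mpi else "wsl"
    else:
        hint = "mpi" if use_mpi else "native"
    if hint not in VALID_HINTS:
        raise ValueError(f"unexpected hint: {hint}")
    return hint


print(suggest_node_hint())
print(suggest_node_hint(use_mpi=True, system="Windows"))
```

The returned string could then be passed as `node_type` to `node_picker`; Docker-based hints are left out of the sketch because they depend on whether Docker is installed rather than on the platform.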
Fetching additional scripts/database files from the Rosetta GitHub repository
[!CAUTION] AGAIN, before using this tool, please DO make sure that you have been licensed by Rosetta Commons. For more details of licensing, please check this page.
This tool is helpful for fetching additional scripts/database files/directories from the Rosetta GitHub repository.
For example, if one's local machine does not have Rosetta built and installed, and one wishes to check some files from `$ROSETTA3_DB` or use some helper scripts at `$ROSETTA_PYTHON_SCRIPTS` before running Rosetta tasks within the Rosetta Container, one can use this tool to fetch them onto the local hard drive with a minimal clone.
The `partial_clone` function performs the following steps:
- Check if Git is installed and its version is `>=2.34.1`. If not, raise an error to notify the user to upgrade Git.
- Check whether the target directory is empty and the repository has not been cloned yet.
- Set up partial clone and sparse checkout.
- Clone the repository and subdirectory to the target directory.
- Set the environment variable to the target directory.
import os
from RosettaPy.utils import partial_clone


def clone_db_relax_script():
    """
    An example for cloning the relax scripts from the Rosetta database.
    This function uses the `partial_clone` function to clone specific relax scripts from the RosettaCommons GitHub repository.
    It sets an environment variable to specify the location of the cloned subdirectory and prints the value of the environment variable after cloning.
    """
    # Clone the relax scripts from the Rosetta repository to a specified directory
    partial_clone(
        repo_url="https://github.com/RosettaCommons/rosetta",
        target_dir="rosetta_db_clone_relax_script",
        subdirectory_as_env="database",
        subdirectory_to_clone="database/sampling/relax_scripts",
        env_variable="ROSETTA3_DB",
    )
    # Print the value of the environment variable after cloning
    print(f'ROSETTA3_DB={os.environ.get("ROSETTA3_DB")}')
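The version check and the partial clone with sparse checkout listed in the steps above can be sketched with plain Git commands. This is an illustration under stated assumptions, not the internals of `partial_clone`: `parse_git_version` and `sparse_clone_commands` are hypothetical helpers, and the sketch only builds the command lines without running Git.

```python
import re
from typing import List, Tuple

MIN_GIT = (2, 34, 1)  # minimum version named in the steps above


def parse_git_version(output: str) -> Tuple[int, ...]:
    """Parse the text printed by `git --version`, e.g. 'git version 2.39.2'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised git version string: {output!r}")
    return tuple(int(part) for part in match.groups())


def sparse_clone_commands(repo_url: str, target_dir: str, subdirectory: str) -> List[List[str]]:
    """Hypothetical helper: git invocations for a blob-less clone limited to one subdirectory."""
    return [
        ["git", "clone", "--filter=blob:none", "--sparse", repo_url, target_dir],
        ["git", "-C", target_dir, "sparse-checkout", "set", subdirectory],
    ]


version = parse_git_version("git version 2.39.2")
if version < MIN_GIT:
    raise RuntimeError("please upgrade git")
for cmd in sparse_clone_commands(
    "https://github.com/RosettaCommons/rosetta",
    "rosetta_db_clone_relax_script",
    "database/sampling/relax_scripts",
):
    print(" ".join(cmd))
```

Comparing version tuples rather than strings avoids the pitfall where `"2.9.0"` sorts after `"2.34.1"` lexicographically.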
Get Windows Ready for Rosetta Runs
Thanks to Windows Subsystem for Linux (WSL), we provide two simple ways to run Rosetta on Windows.
One must enable `Windows Subsystem for Linux`, then switch to `WSL2` following the instructions on this page.
Docker Desktop
- Install `Docker Desktop` and enable the `WSL2 docker engine`.
- Search for the image `rosettacommons/rosetta:<label>`, where `<label>` is the version of the Rosetta build one wants to use.
  - Note: network proxies or docker registry mirror settings may be required for users behind the GFW.
- Use the `RosettaContainer` class as the run node, with the image name one just pulled.
- Make sure all the input files use `LF` line endings instead of `CRLF`. `CRLF` endings are fatal when Rosetta parses input files.
  - Note: this conversion can now be done with the context manager `convert_crlf_to_lf` from `RosettaPy.utils.tools`. Example:

from RosettaPy.utils.tools import convert_crlf_to_lf

with convert_crlf_to_lf(input_file) as output_file:
    """Use `output_file` to replace `input_file`."""

- Build the Rosetta workflow with `RosettaPy` and run it.
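For readers curious how a line-ending conversion of this kind might work, here is a standard-library sketch of a context manager that yields a temporary LF-only copy of a file. It is not the source of `convert_crlf_to_lf`; `lf_copy` is a hypothetical name, and its cleanup behavior (deleting the copy on exit) is an assumption of this sketch.

```python
import contextlib
import os
import tempfile
from typing import Iterator


@contextlib.contextmanager
def lf_copy(input_file: str) -> Iterator[str]:
    """Hypothetical helper: yield a temporary copy of `input_file` with LF line endings."""
    with open(input_file, "rb") as source:
        data = source.read().replace(b"\r\n", b"\n")
    suffix = os.path.splitext(input_file)[1]
    handle, output_file = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(handle, "wb") as target:
            target.write(data)
        yield output_file
    finally:
        # The copy only lives for the duration of the `with` block.
        os.remove(output_file)


# Demo: write a CRLF file, then read back the converted copy.
with tempfile.NamedTemporaryFile("wb", suffix=".pdb", delete=False) as demo:
    demo.write(b"ATOM 1\r\nATOM 2\r\n")
with lf_copy(demo.name) as converted:
    with open(converted, "rb") as result:
        print(result.read())
os.remove(demo.name)
```

Working on bytes rather than text avoids any implicit newline translation that Python's text mode would apply on Windows.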
WSL Wrapper
- Install any recent release of a Linux distribution (e.g., `Ubuntu-22.04`) and set up the account.
- Install build essential tools: `apt-get update && apt-get install build-essential git -y`
  - Note: network proxies or apt repository mirror settings may be required for users behind the GFW.
- Install MPI: `apt-get install mpich -y`. MPICH is sufficient for Rosetta.
- Install Python: `apt-get install python-is-python3 -y`
- Fetch the source code of Rosetta and un-tar it to anywhere convenient, e.g. `/opt/rosetta`
- Go to the source code directory and build it according to the Official Rosetta Build Documentation.
- Environment variables required by RosettaPy are:
  - `ROSETTA_BIN`: path to the Rosetta executables
  - `ROSETTA3_DB`: path to the Rosetta database
  - `ROSETTA_PYTHON_SCRIPTS`: path to the Rosetta scripts
- Use the `WslWrapper` class as the run node. Parameters:
  - `rosetta_bin`: `RosettaBinary` with in-WSL `dirname` (the absolute path of the Rosetta binary directory in the WSL distro).
  - `distro`: the name of the Linux distribution, for example `'Ubuntu-22.04'`.
  - `user`: the name one just set up as the user in the Linux distribution.
  - `nproc`: number of CPU cores to be used.
  - `prohibit_mpi`: whether to prohibit MPI.
Environment Variables
The `RosettaFinder` searches the following directories by default:
- `PATH`, which is commonly used in dockerized Rosetta images.
- The path specified in the `ROSETTA_BIN` environment variable.
- `ROSETTA3/bin`
- `ROSETTA/main/source/bin/`
- A custom search path provided during initialization.
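The list above can be read as a recipe for assembling candidate directories. The sketch below is one possible reading, not the `RosettaFinder` implementation: `candidate_dirs` is a hypothetical helper, it assumes that `ROSETTA3` and `ROSETTA` are environment variables, and the ordering simply follows the list rather than any documented priority.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_dirs(env: Mapping[str, str], custom: Optional[str] = None) -> List[Path]:
    """Hypothetical helper: collect the directories named above, skipping unset variables."""
    dirs: List[Path] = [Path(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if env.get("ROSETTA_BIN"):
        dirs.append(Path(env["ROSETTA_BIN"]))
    if env.get("ROSETTA3"):
        # Assumption: ROSETTA3 is an environment variable pointing at a Rosetta root.
        dirs.append(Path(env["ROSETTA3"]) / "bin")
    if env.get("ROSETTA"):
        # Assumption: ROSETTA points at a source checkout.
        dirs.append(Path(env["ROSETTA"]) / "main" / "source" / "bin")
    if custom:
        dirs.append(Path(custom))
    return dirs


for directory in candidate_dirs(dict(os.environ), custom="/opt/rosetta/bin"):
    print(directory)
```

Taking the environment as a parameter instead of reading `os.environ` inside the function makes the search order easy to test with a plain dictionary.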
Running Tests
The project includes unit tests using the `pytest` framework.
- Clone the repository (if not already done):

git clone https://github.com/YaoYinYing/RosettaPy.git

- Navigate to the project directory and install the required dependencies:

cd RosettaPy
pip install '.[test]'

- Run the tests:

# quick test cases
pytest ./tests -m 'not integration'

# test integration cases
pytest ./tests -m 'integration'

# run integration tests with both docker and local
export GITHUB_CONTAINER_ROSETTA_TEST=YES
pytest ./tests -m 'integration'

Contributing
Contributions are welcome! Please submit a pull request or open an issue for bug reports and feature requests.
Acknowledgements
- Rosetta Commons: The Rosetta software suite for the computational modeling and analysis of protein structures.
- CIs, formatters, checkers and hooks that save my life and improve this tool.
- ChatGPT, Tongyi Lingma, DeepSource Autofix™ AI and CodeRabbit for the documentation, code improvements, test case generation and code revisions.
Contact
For questions or support, please contact:
- Name: Yinying Yao
- Email: yaoyy.hi(a)gmail.com