A Python utility for wrapping Rosetta command line tools.

RosettaPy

A Python Utility for Wrapping Rosetta Macromolecular Modeling Suite.

[!NOTE] Before running RosettaPy, please DO make sure that you have obtained the correct license from Rosetta Commons. For more details, please see this page.

Overview

RosettaPy is a Python module designed to locate Rosetta biomolecular modeling suite binaries that follow a specific naming pattern and to execute Rosetta from the command line. The module includes:

| Class/Component | Description |
| --- | --- |
| RosettaFinder | A class designed to search for binary files within specified directories. |
| RosettaBinary | Represents a binary file and its associated attributes, such as path and version. |
| RosettaCmdTask | Encapsulates a single task for running Rosetta, including command-line arguments and input files. |
| RosettaContainer | Wraps multiple Rosetta tasks into a container, managing file system mounts and resource allocation. |
| MpiNode | Manages MPI resources for parallel computing tasks; note that it is not thoroughly tested. |
| RosettaRepoManager | Fetches necessary directories and files, sets up environment variables, and provides a partial_clone method for cloning and setting up repositories. |
| WslWrapper | Wrapper for running Rosetta on Windows Subsystem for Linux (WSL). Requires Rosetta installed in WSL. |
| Rosetta | A command-line wrapper for executing Rosetta runs, simplifying the process of setting up and running commands. |
| RosettaScriptsVariableGroup | Represents variables used in Rosetta scripts, facilitating their management and use. |
| RosettaEnergyUnitAnalyser | Analyzes and interprets Rosetta output score files, providing a simplified interface for result analysis. |
| Example Applications | Demonstrates the use of the above components through specific Rosetta applications like PROSS, FastRelax, RosettaLigand, Supercharge, MutateRelax, and Cartesian ddG, each tailored to different computational biology tasks. |

Features

  • Flexible Binary Search: Finds Rosetta binaries based on their naming convention.
  • Platform Support: Supports Linux and macOS operating systems.
  • Container Support: Works with Docker containers based on the official Rosetta Docker image.
  • Customizable Search Paths: Allows specification of custom directories to search.
  • Structured Binary Representation: Uses a dataclass to encapsulate binary attributes.
  • Command-Line Shortcut: Provides a quick way to find binaries via the command line.
  • Available on PyPI: Installable via pip without the need to clone the repository.
  • Unit Tested: Includes tests for both classes to ensure functionality.

Naming Convention

The binaries are expected to follow this naming pattern:

rosetta_scripts[[.mode].oscompilerrelease]
  • Binary Name: rosetta_scripts (default) or specified.
  • Mode (optional): default, mpi, or static.
  • OS (optional): linux or macos.
  • Compiler (optional): gcc or clang.
  • Release (optional): release or debug.

Examples of valid binary filenames:

  • rosetta_scripts (dockerized Rosetta)
  • rosetta_scripts.linuxgccrelease
  • rosetta_scripts.mpi.macosclangdebug
  • rosetta_scripts.static.linuxgccrelease
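The naming convention above can be expressed as a regular expression. The following is a minimal sketch, not RosettaPy's own implementation: `BINARY_PATTERN` and `parse_binary_name` are hypothetical names introduced here for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern mirroring the naming convention described above;
# it is an illustration, not code taken from the RosettaPy source.
BINARY_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z0-9_]+)"
    r"(?:"
    r"(?:\.(?P<mode>default|mpi|static))?"
    r"\.(?P<os>linux|macos)(?P<compiler>gcc|clang)(?P<release>release|debug)"
    r")?$"
)


def parse_binary_name(filename: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a binary filename into its components, or return None if it does not match."""
    match = BINARY_PATTERN.match(filename)
    return match.groupdict() if match else None


for candidate in (
    "rosetta_scripts",
    "rosetta_scripts.mpi.macosclangdebug",
    "rosetta_scripts.txt",
):
    print(candidate, "->", parse_binary_name(candidate))
```

Components that are absent from a filename (for example, the mode in `rosetta_scripts.linuxgccrelease`) come back as `None`, and names that do not follow the pattern yield `None` as a whole.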

Installation

Ensure Python 3.8 or higher is installed.

Install via PyPI

You can install RosettaPy directly from PyPI:

pip install RosettaPy -U

Usage

Build Your Own Rosetta Workflow

Import necessary modules

import os  # used by the os.path calls in the snippets below

from RosettaPy import Rosetta, RosettaScriptsVariableGroup, RosettaEnergyUnitAnalyser
from RosettaPy.node import RosettaContainer, MpiNode

Create a Rosetta proxy with parameters

rosetta = Rosetta(
    # a binary name for locating the real binary path
    bin="rosetta_scripts",

    # flag file paths (please do not use `@` prefix here)
    flags=[...],

    # command-line options
    opts=[
        "-in:file:s", os.path.abspath(pdb),
        "-parser:protocol", "/path/to/my_rosetta_scripts.xml",
    ],

    # output directory
    output_dir=...,

    # save pdb and scorefile together
    save_all_together=True,

    # a job identifier
    job_id=...,

    # silence the Rosetta logs on stdout
    verbose=False,
)

Isolation Mode

Some Rosetta apps (Supercharge, Cartesian ddG, etc.) may write files to their working directory, which may not be thread-safe if one runs multiple jobs in parallel in the same directory. In this case, the isolation flag can be used to create a temporary directory for each run.

Rosetta(
    ...
+   isolation=True,
)
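To illustrate the idea behind isolation, here is a minimal sketch that runs a command with a fresh temporary directory as its working directory. It is not RosettaPy's implementation; `run_isolated` is a hypothetical helper, and the Python interpreter stands in for a Rosetta binary.

```python
import subprocess
import sys
import tempfile
from typing import List


def run_isolated(cmd: List[str]) -> str:
    """Run cmd with a fresh temporary directory as its working directory.

    Hypothetical helper for illustration only. Files the command writes to
    its working directory cannot collide with those of other parallel jobs.
    """
    with tempfile.TemporaryDirectory(prefix="rosetta_job_") as workdir:
        result = subprocess.run(
            cmd, cwd=workdir, capture_output=True, text=True, check=True
        )
        return result.stdout


# The Python interpreter stands in for a Rosetta binary here.
print(run_isolated([sys.executable, "-c", "import os; print(os.getcwd())"]))
```

Passing `cwd=` to the subprocess avoids `os.chdir`, which changes the working directory for the whole process and would itself be unsafe across threads. The temporary directory is deleted when the block exits, so a real workflow would need to copy any outputs out first.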

Native as Run Node

By default, RosettaPy uses the Native node, which represents the local machine with Rosetta installed. To specify the number of cores, use the nproc parameter.

Rosetta(
    ...
+   run_node=Native(nproc=8)
)

Run rosetta tasks with Rosetta Container

If one wishes to use the Rosetta container as the task worker (WSL + Docker Desktop, for example), setting the run_node option to a RosettaContainer instance tells the proxy to use it. Image names can be found at https://hub.docker.com/r/rosettacommons/rosetta. Note that the paths of each task are mounted into the container and rewritten to the container's paths. This rewriting may fail if a path is embedded in a more complicated option expression.

Rosetta(
    ...
+   run_node=RosettaContainer(image="rosettacommons/rosetta:latest"),
)
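The mount-and-rewrite idea described above can be sketched as follows. This is an assumption-laden illustration, not RosettaPy's actual logic: `rewrite_paths`, `docker_volume_args`, and the `/mnt/<n>` layout are hypothetical. Only the `-v host:container` bind-mount syntax is standard Docker.

```python
import posixpath
from typing import Dict, List, Tuple


def rewrite_paths(
    opts: List[str], mount_root: str = "/mnt"
) -> Tuple[List[str], Dict[str, str]]:
    """Map absolute host paths in opts to container paths.

    Hypothetical helper: returns the rewritten options and a mapping of
    host directory -> container directory that has to be mounted.
    """
    mounts: Dict[str, str] = {}
    rewritten: List[str] = []
    for token in opts:
        if token.startswith("/"):
            host_dir, filename = posixpath.split(token)
            # Reuse the mount point if this directory was already seen.
            container_dir = mounts.setdefault(host_dir, f"{mount_root}/{len(mounts)}")
            rewritten.append(posixpath.join(container_dir, filename))
        else:
            # Tokens that are not plain absolute paths are left untouched.
            rewritten.append(token)
    return rewritten, mounts


def docker_volume_args(mounts: Dict[str, str]) -> List[str]:
    """Turn the mount mapping into `-v host:container` arguments."""
    args: List[str] = []
    for host_dir, container_dir in mounts.items():
        args += ["-v", f"{host_dir}:{container_dir}"]
    return args


opts, mounts = rewrite_paths(["-in:file:s", "/data/in/a.pdb"])
print(opts, docker_volume_args(mounts))
```

The sketch also shows why rewriting can fail for complicated expressions: a token such as `pdb=/data/in/a.pdb` does not start with a path, so this naive approach leaves it unchanged and the container would not find the file.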

Run rosetta tasks with MPI

If one wishes to run a locally installed Rosetta that was built with the extras=mpi flag through MPI, consider using an MpiNode instance as run_node instead. This enables native MPI parallelism.

Rosetta(
    ...
+   run_node=MpiNode(nproc=10),
)

Also, if one wishes to use MpiNode with the Slurm task manager, setting run_node to MpiNode.from_slurm() may help with fetching the node info from the environment.

This is an experimental feature that has not been seriously tested in production.

Rosetta(
    ...
+   run_node=MpiNode.from_slurm(),
)

Pick Your Node

One can also pick the desired node quickly by calling the node_picker function.

from RosettaPy.node import node_picker, NodeHintT

node_hint: NodeHintT = 'docker_mpi'

Rosetta(
    ...
+   run_node=node_picker(node_type=node_hint)
)

Here, node_hint is one of ["docker", "docker_mpi", "mpi", "wsl", "wsl_mpi", "native"].

Compose rosetta tasks matrix as inputs

tasks = [ # Create tasks for each variant
    {
        "rsv": RosettaScriptsVariableGroup.from_dict(
            {
                "var1": ...,
                "var2": ...,
                "var3": ...,
            }
        ),
        "-out:file:scorefile": f"{variant}.sc",
        "-out:prefix": f"{variant}.",
    }
    for variant in variants
]

# pass task matrix to rosetta.run as `inputs`
rosetta.run(inputs=tasks)

Using structure labels (-nstruct)

Creating distributed runs with structure labels (-nstruct) is feasible. For local runs without MPI or a container, RosettaPy implements this feature by bypassing Rosetta's built-in job distributor, canceling the default output structure label, attaching an external structure label to each task as a unique job identifier, and then running each task exactly once. This enables massive parallelism.

options = [...]  # an optional list of options applied to every structure model
rosetta.run(nstruct=nstruct, inputs=options)  # the input options are passed to all runs equally
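The expansion described above can be sketched as follows. This is an illustration of the idea only: `expand_nstruct` is a hypothetical helper, and the use of `-out:no_nstruct_label` and `-out:suffix` is an assumption about how the labels could be passed, not a statement of what RosettaPy emits.

```python
from typing import List


def expand_nstruct(base_opts: List[str], nstruct: int) -> List[List[str]]:
    """Return one option list per structure, each with a unique suffix label.

    Hypothetical helper; the Rosetta flags used here are assumptions made
    for illustration.
    """
    tasks: List[List[str]] = []
    for index in range(1, nstruct + 1):
        label = f"_{index:05d}"  # external structure label, unique per task
        tasks.append(
            base_opts
            + ["-nstruct", "1", "-out:no_nstruct_label", "-out:suffix", label]
        )
    return tasks


for task in expand_nstruct(["-in:file:s", "input.pdb"], nstruct=3):
    print(" ".join(task))
```

Because every task produces exactly one structure with its own label, the tasks are independent and can be handed to any worker pool without relying on Rosetta's job distributor.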

Call Analyzer to check the results

analyser = RosettaEnergyUnitAnalyser(score_file=rosetta.output_scorefile_dir)
best_hit = analyser.best_decoy
pdb_path = os.path.join(rosetta.output_pdb_dir, f'{best_hit["decoy"]}.pdb')

# Ta-da !!!
print("Analysis of the best decoy:")
print("-" * 79)
print(analyser.df.sort_values(by=analyser.score_term))

print("-" * 79)

print(f'Best Hit on this run: {best_hit["decoy"]} - {best_hit["score"]}: {pdb_path}')
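For readers unfamiliar with Rosetta score files, the following standard-library sketch shows what picking the best decoy amounts to. It is not how RosettaEnergyUnitAnalyser is implemented; `parse_score_lines` and `best_decoy` are hypothetical names, and the sample lines are made-up values used only to exercise the parser.

```python
from typing import Dict, List


def parse_score_lines(lines: List[str]) -> List[Dict[str, str]]:
    """Parse the 'SCORE:' lines of a Rosetta score file into row dictionaries."""
    header = None
    rows: List[Dict[str, str]] = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue  # skip other lines, e.g. the leading SEQUENCE: line
        fields = line.split()[1:]
        if header is None:
            header = fields  # the first SCORE: line holds the column names
        elif len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def best_decoy(rows: List[Dict[str, str]], score_term: str = "total_score") -> Dict[str, str]:
    """Return the row with the lowest value for score_term."""
    return min(rows, key=lambda row: float(row[score_term]))


# Made-up sample lines for illustration only.
example = [
    "SEQUENCE: ",
    "SCORE: total_score fa_atr description",
    "SCORE: -250.5 -900.1 variant_0001",
    "SCORE: -262.3 -910.7 variant_0002",
]
best = best_decoy(parse_score_lines(example))
print(best["description"], best["total_score"])
```

Lower scores are better in Rosetta's energy function, which is why the sketch uses `min`.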

Fetching additional scripts/database files from the Rosetta GitHub repository

[!WARNING] AGAIN, before using this tool, please DO make sure that you have been licensed by Rosetta Commons. For more details on licensing, please check this page.

This tool is helpful for fetching additional scripts/database files/directories from the Rosetta GitHub repository.

For example, if one's local machine does not have Rosetta built and installed, and one wishes to check some files from $ROSETTA3_DB or use some helper scripts from $ROSETTA_PYTHON_SCRIPTS before running Rosetta tasks within the Rosetta Container, one can use this tool to fetch them onto the local hard drive through a minimal clone.

The partial_clone function performs the following steps:

  1. Check whether Git is installed and its version is >=2.34.1. If not, raise an error telling the user to upgrade Git.
  2. Check whether the target directory is empty and the repository has not been cloned yet.
  3. Set up partial clone and sparse checkout.
  4. Clone the repository and subdirectory to the target directory.
  5. Set the environment variable to the target directory.
import os
from RosettaPy.utils import partial_clone

def clone_db_relax_script():
    """
    An example of cloning the relax scripts from the Rosetta database.

    This function uses the `partial_clone` function to clone specific relax scripts from the RosettaCommons GitHub repository.
    It sets an environment variable to specify the location of the cloned subdirectory and prints the value of the environment variable after cloning.
    """
    # Clone the relax scripts from the Rosetta repository to a specified directory
    partial_clone(
        repo_url="https://github.com/RosettaCommons/rosetta",
        target_dir="rosetta_db_clone_relax_script",
        subdirectory_as_env="database",
        subdirectory_to_clone="database/sampling/relax_scripts",
        env_variable="ROSETTA3_DB",
    )

    # Print the value of the environment variable after cloning
    print(f'ROSETTA3_DB={os.environ.get("ROSETTA3_DB")}')
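The first step that partial_clone performs, the Git version check, can be sketched as below. This is a hedged illustration rather than the library's code: `parse_git_version` and `is_git_new_enough` are hypothetical names, and only the 2.34.1 minimum comes from the description above.

```python
import re
from typing import Tuple

MIN_GIT_VERSION = (2, 34, 1)  # minimum version stated in the steps above


def parse_git_version(output: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from the output of `git --version`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"Cannot parse Git version from: {output!r}")
    return tuple(int(part) for part in match.groups())


def is_git_new_enough(output: str, minimum: Tuple[int, ...] = MIN_GIT_VERSION) -> bool:
    """Compare the parsed version against the minimum, component by component."""
    return parse_git_version(output) >= minimum


print(is_git_new_enough("git version 2.39.2"))
print(is_git_new_enough("git version 2.25.1"))
```

In practice the version string could be obtained with `subprocess.run(["git", "--version"], capture_output=True, text=True).stdout`. Comparing integer tuples avoids the pitfall of string comparison, where "2.9.0" would sort after "2.34.1".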

Windows? Yes.

Thanks to the official container image, it is possible to run RosettaPy on Windows. Here are the steps to follow:

  1. Enable Windows Subsystem for Linux and switch to WSL2 (https://aka.ms/wsl2kernel).
  2. Install Docker Desktop and enable the WSL2 Docker engine.
  3. Search for the image rosettacommons/rosetta:<label>, where <label> is the version of the Rosetta build you want to use.
  4. Use the RosettaContainer class as the run node, with the image name you just pulled.
  5. Make sure all your input files use LF line endings instead of CRLF. CRLF endings are fatal when Rosetta parses input files. For details on CRLF vs LF on git clone, please refer to this page.
  6. Build your Rosetta workflow with RosettaPy and run it.

During workflow processing, you will see some active containers in the Containers tab of Docker Desktop.
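For the line-ending requirement in step 5, a small standard-library sketch such as the following could normalize input files before a run. `convert_crlf_to_lf` is a hypothetical helper and not part of RosettaPy.

```python
from pathlib import Path


def convert_crlf_to_lf(path: Path) -> bool:
    """Rewrite the file with LF line endings; return True if it was changed.

    Hypothetical helper for illustration. It works on bytes so that the file
    encoding does not matter.
    """
    data = path.read_bytes()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        return False  # already LF-only, leave the file untouched
    path.write_bytes(converted)
    return True
```

It could be applied to every flag file, XML protocol, and other text input before the workflow starts, for example `convert_crlf_to_lf(Path("my_flags.txt"))`, where the filename is a placeholder.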

Environment Variables

The RosettaFinder searches the following directories by default:

  1. PATH, which is commonly used in dockerized Rosetta images.
  2. The path specified in the ROSETTA_BIN environment variable.
  3. ROSETTA3/bin
  4. ROSETTA/main/source/bin/
  5. A custom search path provided during initialization.
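The search order above can be sketched as a function that collects candidate directories. This is an illustration under assumptions, not RosettaFinder's code: `candidate_search_dirs` is a hypothetical name, and ROSETTA3 and ROSETTA are assumed to be environment variables pointing at the Rosetta installation.

```python
import os
from typing import List, Mapping, Optional


def candidate_search_dirs(
    env: Mapping[str, str],
    custom_path: Optional[str] = None,
) -> List[str]:
    """Return the directories to search, in the order listed above.

    Hypothetical helper; ROSETTA3 and ROSETTA are assumed to be
    environment variables.
    """
    dirs: List[str] = []
    # 1. Every entry of PATH
    dirs.extend(p for p in env.get("PATH", "").split(os.pathsep) if p)
    # 2. ROSETTA_BIN
    if env.get("ROSETTA_BIN"):
        dirs.append(env["ROSETTA_BIN"])
    # 3. ROSETTA3/bin
    if env.get("ROSETTA3"):
        dirs.append(os.path.join(env["ROSETTA3"], "bin"))
    # 4. ROSETTA/main/source/bin
    if env.get("ROSETTA"):
        dirs.append(os.path.join(env["ROSETTA"], "main", "source", "bin"))
    # 5. Custom search path supplied by the caller
    if custom_path:
        dirs.append(custom_path)
    return dirs


print(candidate_search_dirs(os.environ, custom_path="/path/to/custom/bin"))
```

Taking the environment as a parameter instead of reading `os.environ` directly keeps the function easy to test with a plain dictionary.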

Running Tests

The project includes unit tests written with the pytest framework.

  1. Clone the repository (if not already done):

    git clone https://github.com/YaoYinYing/RosettaPy.git
    
  2. Navigate to the project directory and install the required dependencies:

    cd RosettaPy
    pip install '.[test]'
    
  3. Run the tests:

    # quick test cases
    pytest ./tests -m 'not integration'
    
    # test integration cases
    pytest ./tests -m 'integration'
    
    # run integration tests with both docker and local
    export GITHUB_CONTAINER_ROSETTA_TEST=YES
    pytest ./tests -m 'integration'
    

Contributing

Contributions are welcome! Please submit a pull request or open an issue for bug reports and feature requests.

Acknowledgements

  • Rosetta Commons: The Rosetta software suite for the computational modeling and analysis of protein structures.
  • CIs, formatters, checkers and hooks that save my life and improve this tool.
  • ChatGPT, Tongyi Lingma and DeepSource Autofix™ AI for the documentation, code improvements, test cases and code revisions.

Contact

For questions or support, please contact:

  • Name: Yinying Yao
  • Email: yaoyy.hi(a)gmail.com
