RosettaPy

A Python utility for wrapping Rosetta command line tools.

Overview

RosettaPy is a Python module that locates binaries of the Rosetta biomolecular modeling suite by their naming pattern and runs Rosetta from the command line. The module includes:

Building Blocks provided by RosettaPy

  • An object-oriented RosettaFinder class to search for binaries.
  • A RosettaBinary dataclass to represent the binary and its attributes.
  • A RosettaCmdTask dataclass to represent a single Rosetta run task.
  • A RosettaContainer dataclass to wrap runs into Rosetta Containers.
  • An MPI_node dataclass to manage MPI resources (not thoroughly tested).
  • A RosettaRepoManager dataclass to fetch the necessary directories and files and expose them through an environment variable.
  • A shortcut function partial_clone to handle repository cloning and setup.

    [!NOTE] Before running this tool, please make sure that you have obtained the correct license from Rosetta Commons. For more details, please see this page.

  • A command-line wrapper dataclass Rosetta for handling Rosetta runs.
  • A RosettaScriptsVariableGroup dataclass to represent Rosetta scripts variables.
  • A general and simplified result analyzer RosettaEnergyUnitAnalyser to read and interpret Rosetta output score files.
  • A series of example applications that follow the design elements and patterns described above.
    • PROSS
    • FastRelax
    • RosettaLigand
    • Supercharge
    • MutateRelax
    • Cartesian ddG (Analyser: RosettaCartesianddGAnalyser)
  • Unit tests to ensure reliability and correctness.

Features

  • Flexible Binary Search: Finds Rosetta binaries based on their naming convention.
  • Platform Support: Supports Linux and macOS operating systems.
  • Container Support: Works with Docker containers based on the official Rosetta Docker image.
  • Customizable Search Paths: Allows specification of custom directories to search.
  • Structured Binary Representation: Uses a dataclass to encapsulate binary attributes.
  • Command-Line Shortcut: Provides a quick way to find binaries via the command line.
  • Available on PyPI: Installable via pip without the need to clone the repository.
  • Unit Tested: Includes tests for both classes to ensure functionality.

Naming Convention

The binaries are expected to follow this naming pattern:

rosetta_scripts[[.mode].oscompilerrelease]
  • Binary Name: rosetta_scripts (default) or specified.
  • Mode (optional): default, mpi, or static.
  • OS (optional): linux or macos.
  • Compiler (optional): gcc or clang.
  • Release (optional): release or debug.

Examples of valid binary filenames:

  • rosetta_scripts (dockerized Rosetta)
  • rosetta_scripts.linuxgccrelease
  • rosetta_scripts.mpi.macosclangdebug
  • rosetta_scripts.static.linuxgccrelease
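
The naming pattern above can be checked with a regular expression. The following is a minimal sketch, not RosettaPy's own implementation: the pattern `BINARY_PATTERN` and the helper `parse_binary_name` are hypothetical names introduced here for illustration, and the allowed values are taken only from the list above.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern that mirrors the naming convention described above.
# It is an illustration only and is not taken from RosettaPy's source code.
BINARY_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z0-9_]+)"
    r"(?:"
    r"(?:\.(?P<mode>default|mpi|static))?"
    r"\.(?P<os>linux|macos)(?P<compiler>gcc|clang)(?P<release>release|debug)"
    r")?$"
)


def parse_binary_name(filename: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a binary filename into its components, or return None if it does not match."""
    match = BINARY_PATTERN.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for candidate in (
        "rosetta_scripts",
        "rosetta_scripts.linuxgccrelease",
        "rosetta_scripts.mpi.macosclangdebug",
        "rosetta_scripts.exe",
    ):
        print(candidate, "->", parse_binary_name(candidate))
```

In this sketch a bare name such as `rosetta_scripts` matches with every optional component set to `None`, while a mode without the OS/compiler/release suffix is rejected, following the bracket nesting of the pattern.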

Installation

Ensure you have Python 3.8 or higher installed.

Install via PyPI

You can install RosettaPy directly from PyPI:

pip install RosettaPy -U

Usage

Building Your Own Rosetta Workflow

# Imports
import os

from RosettaPy import Rosetta, RosettaScriptsVariableGroup, RosettaEnergyUnitAnalyser
from RosettaPy.node import RosettaContainer

# Create a Rosetta object with the desired parameters
rosetta = Rosetta(
    bin="rosetta_scripts",
    flags=[...],
    opts=[
        "-in:file:s", os.path.abspath(pdb),
        "-parser:protocol", "/path/to/my_rosetta_scripts.xml",
    ],
    output_dir=...,
    save_all_together=True,
    job_id=...,

    # Some Rosetta apps (Supercharge, Cartesian ddG, etc.) may write files to the working directory,
    # which may not be thread-safe when multiple jobs run in parallel in the same directory.
    # In this case, the `isolation` flag can be used to create a temporary directory for each run.
    # isolation=True,

    # Optionally, if one wishes to use the Rosetta container.
    # The image name can be found at https://hub.docker.com/r/rosettacommons/rosetta
    # run_node=RosettaContainer(image="rosettacommons/rosetta:latest")
)

# Compose your Rosetta tasks matrix
tasks = [ # Create tasks for each variant
    {
        "rsv": RosettaScriptsVariableGroup.from_dict(
            {
                "var1": ...,
                "var2": ...,
                "var3": ...,
            }
        ),
        "-out:file:scorefile": f"{variant}.sc",
        "-out:prefix": f"{variant}.",
    }
    for variant in variants
]

# Run Rosetta against these tasks
rosetta.run(inputs=tasks)

# Or create distributed runs with structure labels (-nstruct)
options = [...]  # Optional list of options applied to every structure model
rosetta.run(nstruct=nstruct, inputs=options)  # the same input options are passed to all runs

# Use Analyzer to check the results
analyser = RosettaEnergyUnitAnalyser(score_file=rosetta.output_scorefile_dir)
best_hit = analyser.best_decoy
pdb_path = os.path.join(rosetta.output_pdb_dir, f'{best_hit["decoy"]}.pdb')

# Ta-da !!!
print("Analysis of the best decoy:")
print("-" * 79)
print(analyser.df.sort_values(by=analyser.score_term))

print("-" * 79)

print(f'Best Hit on this run: {best_hit["decoy"]} - {best_hit["score"]}: {pdb_path}')
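
To make the analysis step more concrete, here is a minimal, standard-library sketch of reading a score file and picking the lowest-scoring decoy. It is not how RosettaEnergyUnitAnalyser is implemented. It assumes the usual whitespace-delimited score file layout, in which data lines start with `SCORE:`, the first such line is the header, and a `description` column names the decoy. The helpers `read_score_lines` and `best_row` are hypothetical names.

```python
from typing import Dict, Iterable, List


def read_score_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse 'SCORE:' lines into one dict per decoy (assumed score file layout)."""
    header = None
    rows: List[Dict[str, str]] = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: lines, blank lines, and anything else
        fields = line.split()[1:]
        if header is None:
            header = fields  # the first SCORE: line carries the column names
            continue
        if fields == header or len(fields) != len(header):
            continue  # skip repeated headers and malformed rows
        rows.append(dict(zip(header, fields)))
    return rows


def best_row(rows: List[Dict[str, str]], score_term: str = "total_score") -> Dict[str, str]:
    """Return the row with the lowest value of score_term (lower is treated as better)."""
    return min(rows, key=lambda row: float(row[score_term]))


if __name__ == "__main__":
    sample = [
        "SEQUENCE: ",
        "SCORE: total_score fa_atr description",
        "SCORE: -10.5 -3.0 variant_0001",
        "SCORE: -12.0 -2.0 variant_0002",
    ]
    best = best_row(read_score_lines(sample))
    print(best["description"], best["total_score"])
```

The sample lines are made up for the demonstration. For a real score file, pass an open file object to `read_score_lines` instead of a list.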

Fetching additional scripts/database files from the Rosetta GitHub repository.

[!WARNING] Again, before running this method, please make sure that you have been licensed by Rosetta Commons. For more details on licensing, please see this page.

This tool is helpful for fetching additional scripts/database files from the Rosetta GitHub repository.

For example, if your local machine does not have Rosetta built and installed, and you wish to check some files from $ROSETTA3_DB or $ROSETTA_PYTHON_SCRIPTS before running Rosetta tasks within the Rosetta container, you can use this tool to fetch them to your local machine quickly.

The partial_clone function performs the following steps:

  1. Check that the Git binary is available and that the Git version is >=2.34.1. If not, raise an error telling the user to upgrade Git.
  2. Check whether the target directory is empty and the repository has not been cloned yet.
  3. Set up the partial clone and sparse checkout.
  4. Clone the repository and the requested subdirectory into the target directory.
  5. Set the environment variable to point to the target directory.

import os
from RosettaPy.utils import partial_clone

def clone_db_relax_script():
    """
    An example of cloning the relax scripts from the Rosetta database.

    This function uses the `partial_clone` function to clone specific relax scripts from the RosettaCommons GitHub repository.
    It sets an environment variable to specify the location of the cloned subdirectory and prints the value of the environment variable after cloning.
    """
    # Clone the relax scripts from the Rosetta repository to a specified directory
    partial_clone(
        repo_url="https://github.com/RosettaCommons/rosetta",
        target_dir="rosetta_db_clone_relax_script",
        subdirectory_as_env="database",
        subdirectory_to_clone="database/sampling/relax_scripts",
        env_variable="ROSETTA3_DB",
    )

    # Print the value of the environment variable after cloning
    print(f'ROSETTA3_DB={os.environ.get("ROSETTA3_DB")}')
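
The first step listed for partial_clone, the Git version check, can be sketched as follows. This is an illustration under stated assumptions, not the code RosettaPy uses: `parse_git_version` and `git_is_recent_enough` are hypothetical helpers, and only the minimum version 2.34.1 comes from the description above.

```python
import re
import subprocess
from typing import Tuple

# Minimum Git version stated in the step list above.
MIN_GIT_VERSION = (2, 34, 1)


def parse_git_version(output: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from the text printed by `git --version`."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"Cannot find a version number in: {output!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def git_is_recent_enough(minimum: Tuple[int, ...] = MIN_GIT_VERSION) -> bool:
    """Return True if git is installed and at least `minimum` (hypothetical helper)."""
    try:
        result = subprocess.run(
            ["git", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False  # git is missing or could not be executed
    return parse_git_version(result.stdout) >= minimum


if __name__ == "__main__":
    print("git is recent enough:", git_is_recent_enough())
```

Comparing versions as integer tuples avoids the pitfall of string comparison, where "2.9" would sort after "2.34".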

Environment Variables

The RosettaFinder searches the following directories by default:

  1. PATH, which is commonly used in dockerized Rosetta images.
  2. The path specified in the ROSETTA_BIN environment variable.
  3. ROSETTA3/bin
  4. ROSETTA/main/source/bin/
  5. A custom search path provided during initialization.
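
The search order above can be sketched as a function that collects candidate directories. This is only an illustration and not RosettaFinder's actual code. It assumes that ROSETTA3 and ROSETTA in items 3 and 4 are environment variables holding installation roots, which the list suggests but does not state outright. The function name `candidate_dirs` is hypothetical.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_dirs(env: Mapping[str, str], custom: Optional[str] = None) -> List[Path]:
    """Collect directories to search, in the order listed above (illustrative only)."""
    dirs: List[Path] = []
    # 1. Every non-empty entry of PATH.
    dirs += [Path(entry) for entry in env.get("PATH", "").split(os.pathsep) if entry]
    # 2. The directory named by ROSETTA_BIN.
    if env.get("ROSETTA_BIN"):
        dirs.append(Path(env["ROSETTA_BIN"]))
    # 3. ROSETTA3/bin (assumption: ROSETTA3 is an environment variable).
    if env.get("ROSETTA3"):
        dirs.append(Path(env["ROSETTA3"]) / "bin")
    # 4. ROSETTA/main/source/bin (assumption: ROSETTA is an environment variable).
    if env.get("ROSETTA"):
        dirs.append(Path(env["ROSETTA"]) / "main" / "source" / "bin")
    # 5. A custom search path, if one was provided.
    if custom:
        dirs.append(Path(custom))
    return dirs


if __name__ == "__main__":
    for directory in candidate_dirs(os.environ, custom="/path/to/custom/bin"):
        print(directory)
```

Passing the environment as a mapping rather than reading `os.environ` inside the function keeps the sketch easy to test with a plain dictionary.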

Running Tests

The project includes unit tests written with the pytest framework.

  1. Clone the repository (if not already done):

    git clone https://github.com/YaoYinYing/RosettaPy.git
    
  2. Navigate to the project directory and install the required dependencies:

    cd RosettaPy
    pip install '.[test]'
    
  3. Run the tests:

    # quick test cases
    python -m pytest ./tests -m 'not integration'
    
    # test integration cases
    python -m pytest ./tests -m 'integration'
    

Contributing

Contributions are welcome! Please submit a pull request or open an issue for bug reports and feature requests.

Acknowledgements

  • Rosetta Commons: The Rosetta software suite for the computational modeling and analysis of protein structures.

Contact

For questions or support, please contact:

  • Name: Yinying Yao
  • Email: yaoyy.hi(a)gmail.com
