OpenProtein Python interface.

openprotein-python

The OpenProtein.AI Python Interface provides a user-friendly library to interact with the OpenProtein.AI REST API, enabling various tasks related to protein analysis and modeling.

Table of Contents

Workflow | Description
0. Quick start | Quick start guide.
1. Installation | Install guide for pip and conda.
2. Session management | An overview of the OpenProtein Python Client and the asynchronous jobs system.
3. Assay-based Sequence Learning | Covers core tasks such as data upload, model training and prediction, and sequence design.
4. De Novo prediction & generative models (PoET) | Covers PoET, a protein LLM for de novo scoring, as well as sequence generation.
5. Protein Language Models & Embeddings | Covers methods for creating sequence embeddings with proprietary and open-source models.

Quick-start

Get started with our quickstart README! You can peruse the official documentation for more details!

Installation

To install the Python interface using pip, run the following command:

pip install openprotein-python

or with conda:

conda install -c openprotein openprotein-python

Requirements

  • Python 3.8 or higher.
  • pydantic version 1.0 or newer.
  • requests version 2.0 or newer.
  • tqdm version 4.0 or newer.
  • pandas version 1.0 or newer.

Getting started

Read on below for the quick-start guide, or see the docs for more information!

To begin, create a session using your login credentials.

import openprotein

# replace USERNAME and PASSWORD with your actual login credentials
session = openprotein.connect(USERNAME, PASSWORD)
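
To keep credentials out of source code, one option is to read them from environment variables before connecting. The following is a minimal sketch: the variable names OPENPROTEIN_USERNAME and OPENPROTEIN_PASSWORD and both helper functions are assumptions made for this example, not something the library defines. Only openprotein.connect comes from the guide above.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from an environment-style mapping.

    The variable names are hypothetical and chosen for this example only.
    """
    try:
        return env["OPENPROTEIN_USERNAME"], env["OPENPROTEIN_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing}") from None


def connect_from_env():
    """Create a session using credentials taken from the environment."""
    import openprotein  # imported lazily so load_credentials has no dependency

    username, password = load_credentials()
    return openprotein.connect(username, password)
```

With both variables exported in the shell, session = connect_from_env() would replace the explicit call above.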

Job Status

Asynchronous calls return AsyncJobFuture objects, which let you track a job's status and retrieve its results once the job is done.

Checking Job Status

Check the status of an AsyncJobFuture using the following methods:

future.refresh()  # call the backend to update the job status
future.done()     # returns True if the job is done, meaning the status could be SUCCESS, FAILED, or CANCELLED
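
The two calls above can be combined into a polling loop when you want to do other work between checks or enforce a time limit. This is a sketch only: wait_until_done, poll_interval, and timeout are hypothetical names introduced here, and the helper relies on nothing but the refresh() and done() methods described above.

```python
import time


def wait_until_done(future, poll_interval=5.0, timeout=600.0):
    """Poll `future` until done() returns True; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        future.refresh()  # ask the backend for the latest job status
        if future.done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A True return value means the job reached a terminal status, which may still be FAILED or CANCELLED rather than SUCCESS.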

Retrieving Job Results

Retrieve the results with one of the following methods:

result = future.wait()     # block until the job is done, then fetch the results

# verbosity is controlled by the verbose argument
result = future.get(verbose=True)  # fetch the results of a job that has already finished

Jobs Interface

Listing Jobs

To list all jobs associated with the session, use the following method. Results can optionally be filtered by date, job type, or status.

session.jobs.list() 
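
Once the jobs are listed, a quick tally by status can show how many are still running or have failed. The sketch below assumes each returned job object exposes a status attribute; that attribute name and the count_by_status helper are assumptions made for illustration, so check the job objects your version returns.

```python
from collections import Counter


def count_by_status(jobs):
    """Count jobs per status; objects without `status` count as 'UNKNOWN'."""
    return Counter(str(getattr(job, "status", "UNKNOWN")) for job in jobs)
```

For example, count_by_status(session.jobs.list()) would return a Counter keyed by the string form of each status.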

Retrieving Specific Job

For detailed information about a particular job, use the following command with the corresponding job ID:

session.jobs.get(JOB_ID)  # Replace JOB_ID with the ID of the specific job to be retrieved

Resuming Jobs

Jobs from prior workflows can be resumed using the load_job method provided by each API.

session.load_job(JOB_ID)  # Replace JOB_ID with the ID of the training job to resume

PoET interface

The PoET Interface allows scoring, generating, and retrieving sequences using the PoET model.

Scoring Sequences

To score sequences, use the score function. Provide a prompt and a list of queries. The results will be a list of (sequence, score) pydantic objects.

prompt_seqs = b'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'

prompt = session.poet.upload_prompt(prompt_seqs)
queries = [
    b'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN',
    b'MALWMRLLPLLVLLALWGPDPASAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN',
    b'MALWTRLRPLLALLALWPPPPARAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEGPQVGALELAGGPGAGGLEGPPQKRGIVEQCCASVCSLYQLENYCN',
    b'MALWIRSLPLLALLVFSGPGTSYAAANQHLCGSHLVEALYLVCGERGFFYSPKARRDVEQPLVSSPLRGEAGVLPFQQEEYEKVKRGIVEQCCHNTCSLYQLENYCN',
    b'MALWMRLLPLLALLALWAPAPTRAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEDLQVRDVELAGAPGEGGLQPLALEGALQKRGIVEQCCTSICSLYQLENYCN',
]
future = session.poet.score(prompt, queries)
result = future.wait()
# result is a list of (sequence, score) pydantic objects
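
A common next step is to rank the scored queries. The sketch below assumes each result object exposes its score as a single number through a score attribute; that attribute name is an assumption, so pass a different score_of callable if your result objects are shaped differently. rank_by_score is a hypothetical helper, not part of the library.

```python
def rank_by_score(results, score_of=lambda r: r.score):
    """Return the results sorted from highest to lowest score."""
    return sorted(results, key=score_of, reverse=True)
```

Calling rank_by_score(result) on the list returned by future.wait() would then put the highest-scoring sequence first.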

Scoring Single Site Variants

For scoring single site variants, use the single_site function, providing the original sequence and setting prompt_is_seed to True if the prompt is a seed sequence.

# the sequence is given as bytes, matching the prompt and queries in the scoring example
sequence = b"MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
future = session.poet.single_site(prompt, sequence, prompt_is_seed=True)
result = future.wait()
# result is a dictionary of {variant: score}
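
Because the result is a {variant: score} dictionary, picking the most promising substitutions is a plain dictionary operation. A minimal sketch follows; top_variants is a hypothetical helper, and the exact format of the variant keys is not specified here, so the code treats them as opaque.

```python
import heapq


def top_variants(scores, k=10):
    """Return the k (variant, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

For instance, top_variants(result, k=20) would give the 20 highest-scoring single site variants as (variant, score) tuples.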

Generating Sequences

To generate sequences from the PoET model, use the generate function with relevant parameters. The result will be a list of generated samples.

future = session.poet.generate(
    prompt,
    max_seqs_from_msa=1024,
    num_samples=100,
    temperature=1.0,
    topk=15
)
samples = future.wait()
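
The temperature and topk arguments are standard sampling controls for language models. The toy sketch below shows their usual meaning on a made-up next-token distribution: only the topk highest-scoring tokens are kept, and their scores are divided by the temperature before normalizing. It illustrates the general technique under that assumption and is not PoET's implementation; how the service applies these parameters internally is not described here.

```python
import math


def topk_temperature_probs(logits, temperature=1.0, topk=None):
    """Softmax over the top-k logits after dividing them by the temperature.

    `logits` maps token -> unnormalized log-probability. Tokens outside the
    top-k receive probability 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    ranked = sorted(logits, key=logits.get, reverse=True)
    kept = ranked if topk is None else ranked[:topk]
    scaled = {tok: logits[tok] / temperature for tok in kept}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    return {tok: weights.get(tok, 0.0) / total for tok in logits}
```

In this toy model, lowering the temperature concentrates probability on the best-scoring tokens, while a smaller topk removes unlikely tokens from consideration altogether.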

Retrieving Input Sequences

You can retrieve the prompt, MSA, or seed sequences for a PoET job using the get_input function or the individual functions for each type.

future.get_input(INPUT_TYPE)
# or, functions for each type
future.get_prompt()
future.get_msa()
future.get_seed()

See more at our Homepage
