GhostFold

Accurate, database-free protein folding from single sequences using structure-aware synthetic MSAs


Overview

GhostFold is a protein folding framework that predicts 3D structures directly from single sequences, without relying on large evolutionary databases. Instead of searching databases for homologs, it generates synthetic, structure-aware multiple sequence alignments (pseudoMSAs) and passes them to ColabFold for structure prediction. This lets GhostFold achieve high accuracy while remaining lightweight and portable.


Installation

1. Install PyTorch with CUDA

GhostFold requires PyTorch with CUDA support. Install the appropriate version for your system before installing GhostFold:

# Example for CUDA 12.1 (adjust for your CUDA version)
pip install torch --index-url https://download.pytorch.org/whl/cu121

Refer to the PyTorch installation guide for platform-specific instructions.
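Before installing GhostFold, it can be worth confirming that the installed PyTorch build actually sees a GPU. The helper below (cuda_status is a hypothetical name, not part of GhostFold) is a minimal sketch that uses only standard PyTorch calls and still runs if PyTorch is missing:

```python
import importlib.util


def cuda_status() -> str:
    """Report whether PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch  # imported lazily so this check also runs without torch

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} is installed, but no CUDA device is visible"
    return (
        f"PyTorch {torch.__version__} with CUDA {torch.version.cuda}, "
        f"{torch.cuda.device_count()} GPU(s) visible"
    )


if __name__ == "__main__":
    print(cuda_status())
```

If this reports no visible CUDA device, reinstall PyTorch from the index URL matching your CUDA version.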

2. Install localcolabfold

Commands that perform structure prediction (ghostfold fold and ghostfold run) require a working local ColabFold installation. From the GhostFold repository root, run:

chmod +x scripts/install_localcolabfold.sh
./scripts/install_localcolabfold.sh

If you prefer cloud-based structure prediction, you can skip this step and use the GhostFold pseudoMSAs directly in ColabFold: select "custom_msa" under MSA settings and upload the pseudoMSA file generated by GhostFold.
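Before uploading a pseudoMSA to ColabFold, a quick sanity check can catch truncated or malformed files. The sketch below assumes the pseudoMSA is in A3M format (query sequence first, lowercase letters marking insertions), as the .a3m extension used by ghostfold mask suggests; parse_a3m, alignment_summary, and summarize_a3m are hypothetical helpers, not part of GhostFold.

```python
from __future__ import annotations

from pathlib import Path


def parse_a3m(text: str) -> list[tuple[str, str]]:
    """Parse A3M/FASTA-style text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines such as a leading "#len\tcard"
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def aligned_length(seq: str) -> int:
    # Lowercase letters are insertions relative to the query, so the aligned
    # length counts only match states (uppercase letters) and deletions ("-").
    return sum(1 for c in seq if c.isupper() or c == "-")


def alignment_summary(records: list[tuple[str, str]]) -> dict:
    """Report depth, query length, and whether every row aligns to the query."""
    if not records:
        raise ValueError("no sequences found")
    query_len = aligned_length(records[0][1])
    return {
        "depth": len(records),
        "query_length": query_len,
        "consistent": all(aligned_length(s) == query_len for _, s in records),
    }


def summarize_a3m(path: str) -> dict:
    return alignment_summary(parse_a3m(Path(path).read_text()))
```

Call summarize_a3m on the .a3m file GhostFold wrote; a "consistent" value of False suggests a row whose aligned length does not match the query.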

3. Install GhostFold

pip install ghostfold

For development:

git clone https://github.com/brineylab/ghostfold.git
cd ghostfold
pip install -e ".[dev]"

Hugging Face Authentication

GhostFold uses ProstT5 from the Hugging Face Hub. You may need to configure a Hugging Face access token:

huggingface-cli login

See the Hugging Face documentation for details.
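On clusters or in batch jobs, an interactive login may not be practical. The sketch below checks whether a token is already available, under two assumptions about huggingface_hub behavior: that an HF_TOKEN environment variable is honored, and that huggingface-cli login caches the token at $HF_HOME/token (HF_HOME defaulting to ~/.cache/huggingface). hf_token_source is a hypothetical helper, not a GhostFold or Hugging Face API.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def hf_token_source(env: Mapping[str, str] | None = None) -> str | None:
    """Return where a Hugging Face token would be found, or None if none is configured."""
    env = os.environ if env is None else env
    # Assumption: an HF_TOKEN environment variable takes precedence over a cached login
    if env.get("HF_TOKEN"):
        return "HF_TOKEN environment variable"
    # Assumption: `huggingface-cli login` caches the token at $HF_HOME/token,
    # with HF_HOME defaulting to ~/.cache/huggingface
    hf_home = Path(env.get("HF_HOME") or Path.home() / ".cache" / "huggingface")
    token_file = hf_home / "token"
    if token_file.is_file():
        return f"cached token at {token_file}"
    return None


if __name__ == "__main__":
    print(hf_token_source() or "no token found; run huggingface-cli login")
```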


CLI Usage

GhostFold provides a single command-line tool, ghostfold, with five subcommands: msa, fold, run, mask, and neff.

Generate pseudoMSAs

ghostfold msa --project-name my_project --fasta-path query.fasta

Options:

  • --config PATH — Custom YAML config (overrides bundled defaults)
  • --recursive — Recursively search directories for FASTA files
  • --coverage FLOAT — Coverage values (repeatable, default: 1.0)
  • --num-runs INT — Independent runs per sequence (default: 1)
  • --evolve-msa — Enable MSA evolution with substitution matrices
  • --mutation-rates JSON — Mutation rates per matrix
  • --sample-percentage FLOAT — Fraction of sequences to evolve (default: 1.0)
  • --plot-msa-coverage — Generate MSA coverage heatmaps
  • --no-coevolution-maps — Skip coevolution map generation
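When scripting many runs, it can help to assemble the ghostfold msa argument list programmatically. The sketch below uses only the flags listed above; build_msa_command is a hypothetical helper, it assumes --coverage is repeated once per value and that --mutation-rates takes a JSON object mapping matrix names to rates (the matrix names follow the Python API example further down). The 0.75 coverage value is illustrative only.

```python
import json
import shlex


def build_msa_command(project, fasta_path, coverages=(1.0,), num_runs=1, mutation_rates=None):
    """Assemble a `ghostfold msa` argument list from the documented options."""
    cmd = ["ghostfold", "msa", "--project-name", project, "--fasta-path", fasta_path]
    for cov in coverages:
        # --coverage is repeatable: pass the flag once per value
        cmd += ["--coverage", str(cov)]
    cmd += ["--num-runs", str(num_runs)]
    if mutation_rates:
        # Assumption: mutation rates only take effect together with --evolve-msa
        cmd += ["--evolve-msa", "--mutation-rates", json.dumps(mutation_rates)]
    return cmd


cmd = build_msa_command(
    "my_project",
    "query.fasta",
    coverages=[1.0, 0.75],
    mutation_rates={"MEGABLAST": 5, "PAM250": 20, "BLOSUM62": 10},
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the list to subprocess.run(cmd, check=True) avoids shell quoting issues with the JSON argument.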

Run structure prediction

ghostfold fold --project-name my_project

Options:

  • --subsample — Enable MSA subsampling (multiple depth levels)
  • --mask-fraction FLOAT — Mask a fraction of MSA residues (0.0-1.0)
  • --num-gpus INT — Override auto-detected GPU count
  • --localcolabfold-dir PATH — Path to localcolabfold pixi checkout (default: ./localcolabfold)
  • --colabfold-env TEXT — Legacy mamba env name for ColabFold fallback (default: colabfold)

Full pipeline (MSA + folding)

ghostfold run --project-name my_project --fasta-path query.fasta

Accepts all options from both the msa and fold subcommands.

Mask MSA files

ghostfold mask --input-path input.a3m --output-path masked.a3m --mask-fraction 0.15
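To illustrate what masking a fraction of MSA residues can mean, here is a minimal sketch. It is not GhostFold's mask_a3m_file: the choice to leave the query untouched, to mask only uppercase match-state residues, and to use "-" as the mask character are all assumptions.

```python
import random


def mask_msa(records, mask_fraction, mask_char="-", seed=0):
    """Replace a random fraction of aligned residues in non-query rows with mask_char."""
    if not 0.0 <= mask_fraction <= 1.0:
        raise ValueError("mask_fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)
    masked = [records[0]]  # assumption: the query row is never masked
    for header, seq in records[1:]:
        chars = list(seq)
        # Only match-state residues (uppercase letters) are candidates; gaps and
        # lowercase insertions are left alone so aligned lengths stay unchanged.
        candidates = [i for i, c in enumerate(chars) if c.isupper()]
        for i in rng.sample(candidates, round(len(candidates) * mask_fraction)):
            chars[i] = mask_char
        masked.append((header, "".join(chars)))
    return masked
```

For example, mask_msa([("query", "ACDEFGHIKL"), ("hit", "ACDEFGHIKL")], 0.3) leaves the query unchanged and replaces 3 of the 10 residues in the second row.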

Calculate Neff scores

ghostfold neff my_project/
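Neff (effective number of sequences) down-weights near-duplicate sequences so that redundant alignments do not look deeper than they are. The sketch below uses one common definition, Neff = Σᵢ 1/nᵢ, where nᵢ counts the sequences (including i) whose identity to sequence i is at least 80%. GhostFold's ghostfold neff may use a different threshold or normalization; the neff function here is illustrative only.

```python
def strip_insertions(seq: str) -> str:
    """Drop A3M insertion states (lowercase letters) so rows share the query's length."""
    return "".join(c for c in seq if not c.islower())


def neff(sequences, identity_threshold=0.8):
    """Effective number of sequences: sum over rows of 1 / (size of its similarity cluster)."""
    rows = [strip_insertions(s) for s in sequences]
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("rows must have equal aligned length after removing insertions")

    def identity(a, b):
        # Fraction of aligned positions where both rows agree (shared gaps count as matches)
        return sum(x == y for x, y in zip(a, b)) / length

    total = 0.0
    for a in rows:
        # The cluster always includes the row itself (identity 1.0), so this is never zero
        n_similar = sum(identity(a, b) >= identity_threshold for b in rows)
        total += 1.0 / n_similar
    return total
```

This pairwise comparison is O(N²·L) for N sequences of aligned length L, which is fine for small pseudoMSAs but slow for very deep alignments.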

Version

ghostfold --version

Python API

GhostFold can also be used as a Python library:

from ghostfold import run_pipeline, mask_a3m_file, calculate_neff, MSA_Mutator
from ghostfold.core.config import load_config

# Load config with optional overrides
config = load_config("my_config.yaml")

# Run MSA generation pipeline
run_pipeline(
    project="my_project",
    fasta_path="query.fasta",
    config=config,
    coverage_list=[1.0],
    evolve_msa=True,
    mutation_rates_str='{"MEGABLAST": 5, "PAM250": 20, "BLOSUM62": 10}',
    sample_percentage=1.0,
    plot_msa=False,
    plot_coevolution=False,
)
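run_pipeline takes the mutation rates as a JSON string (mutation_rates_str). Serializing a dict avoids hand-quoting JSON. In the sketch below, mutation_rates_to_json is a hypothetical helper, and its validation rules (non-empty string matrix names, non-negative numeric rates) are assumptions, since the accepted matrices and rate units are not documented here.

```python
import json


def mutation_rates_to_json(rates: dict) -> str:
    """Serialize per-matrix mutation rates for run_pipeline(mutation_rates_str=...)."""
    for matrix, rate in rates.items():
        if not isinstance(matrix, str) or not matrix:
            raise ValueError(f"matrix name must be a non-empty string, got {matrix!r}")
        # bool is a subclass of int, so reject it explicitly
        if isinstance(rate, bool) or not isinstance(rate, (int, float)) or rate < 0:
            raise ValueError(f"rate for {matrix} must be a non-negative number, got {rate!r}")
    return json.dumps(rates)


mutation_rates_str = mutation_rates_to_json({"MEGABLAST": 5, "PAM250": 20, "BLOSUM62": 10})
```

The resulting string can then be passed as mutation_rates_str=mutation_rates_str in the run_pipeline call above.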


Download files

Source Distribution

ghostfold-0.1.2.tar.gz (39.9 kB)

Uploaded Source

Built Distribution

ghostfold-0.1.2-py3-none-any.whl (42.0 kB)

Uploaded Python 3

File details

Details for the file ghostfold-0.1.2.tar.gz.

File metadata

  • Download URL: ghostfold-0.1.2.tar.gz
  • Size: 39.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ghostfold-0.1.2.tar.gz
Algorithm Hash digest
SHA256 8f426132562faeb28328dcb02c61493e69f6440fbd4ca7d10f449c8221147acc
MD5 95e23f92d3655073955c4f9bb25048c0
BLAKE2b-256 5d2386a5b86e2797b8fc0be4e4b4bd4fde00a0d5041aca7255263823a947a118

Provenance

The following attestation bundles were made for ghostfold-0.1.2.tar.gz:

Publisher: python-publish.yaml on brineylab/ghostfold

File details

Details for the file ghostfold-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: ghostfold-0.1.2-py3-none-any.whl
  • Size: 42.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ghostfold-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 9e0c92b949ee437a73531005479c9cdc0c69551105b5f3311d9ebffd2bb5526e
MD5 54a7dff5dafef2c5fee4b67309ad72a9
BLAKE2b-256 f48e5c1eb23dd99c069ce72a6d43bc0dbe33c44cc12a604b15a3920317a7a4a7

Provenance

The following attestation bundles were made for ghostfold-0.1.2-py3-none-any.whl:

Publisher: python-publish.yaml on brineylab/ghostfold
