GhostFold

Accurate, database-free protein folding from single sequences using structure-aware synthetic MSAs


Overview

GhostFold is a protein folding framework that predicts 3D structures directly from single sequences, without relying on large evolutionary databases. By generating synthetic, structure-aware multiple sequence alignments (MSAs), referred to below as pseudoMSAs, GhostFold achieves high accuracy while remaining lightweight and portable.


Installation

1. Install PyTorch with CUDA

GhostFold requires PyTorch with CUDA support. Install the appropriate version for your system before installing GhostFold:

# Example for CUDA 12.1 (adjust for your CUDA version)
pip install torch --index-url https://download.pytorch.org/whl/cu121

Refer to the PyTorch installation guide for platform-specific instructions.
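
Before moving on, it can help to confirm that the installed PyTorch build actually sees a GPU. The following is a minimal sketch, not part of GhostFold; the helper name cuda_status is hypothetical, and it only uses standard PyTorch calls (torch.cuda.is_available, torch.cuda.get_device_name, torch.version.cuda).

```python
import importlib.util


def cuda_status() -> str:
    """Return a one-line summary of the local PyTorch/CUDA setup.

    Hypothetical helper for illustration; not part of GhostFold.
    """
    # find_spec avoids an ImportError when PyTorch is not installed yet.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} found, but no CUDA device is visible"

    # torch.version.cuda is the CUDA version the installed wheel was built against.
    name = torch.cuda.get_device_name(0)
    return f"PyTorch {torch.__version__} (CUDA {torch.version.cuda}) on {name}"


print(cuda_status())
```

If the summary reports that no CUDA device is visible, the installed wheel probably does not match the local CUDA driver, and reinstalling with a different index URL is the usual fix.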

2. Install localcolabfold

Commands that run structure prediction (ghostfold run and ghostfold fold) require a working local ColabFold runtime. To install one, run the following from the GhostFold repository root:

chmod +x scripts/install_localcolabfold.sh
./scripts/install_localcolabfold.sh

If you prefer cloud-based structure prediction, you can skip this step and use the generated pseudoMSAs directly in ColabFold: select "custom_msa" under MSA settings and upload the pseudoMSA generated by GhostFold.

3. Install GhostFold

pip install ghostfold

For development:

git clone https://github.com/brineylab/ghostfold.git
cd ghostfold
pip install -e ".[dev]"

Hugging Face Authentication

GhostFold uses ProstT5 from the Hugging Face Hub. You may need to configure a Hugging Face access token:

huggingface-cli login

See the Hugging Face documentation for details.
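
On headless machines or in batch jobs, a token is often supplied through the environment rather than an interactive login. The sketch below checks the two environment variable names that huggingface_hub reads (HF_TOKEN, and the older HUGGING_FACE_HUB_TOKEN). The helper find_hf_token is hypothetical and does not inspect the token that huggingface-cli login stores on disk.

```python
import os
from typing import Mapping, Optional

# Environment variables read by huggingface_hub. HF_TOKEN is the current name;
# HUGGING_FACE_HUB_TOKEN is the older spelling.
TOKEN_VARS = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")


def find_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty token found in the environment, else None.

    Hypothetical helper for illustration; not part of GhostFold.
    """
    for name in TOKEN_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


if find_hf_token() is None:
    print("No token in the environment; run `huggingface-cli login` if downloads fail.")
else:
    print("A Hugging Face token is set in the environment.")
```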


CLI Usage

GhostFold provides a single command-line tool, ghostfold, with five subcommands (msa, fold, run, mask, and neff):

Generate pseudoMSAs

ghostfold msa --project-name my_project --fasta-file query.fasta

Options:

  • --config PATH — Custom YAML config (overrides bundled defaults)
  • --coverage FLOAT — Coverage values (repeatable, default: 1.0)
  • --num-runs INT — Independent runs per sequence (default: 1)
  • --evolve-msa — Enable MSA evolution with substitution matrices
  • --mutation-rates JSON — Mutation rates per matrix
  • --sample-percentage FLOAT — Fraction of sequences to evolve (default: 1.0)
  • --plot-msa-coverage — Generate MSA coverage heatmaps
  • --no-coevolution-maps — Skip coevolution map generation
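
When driving the CLI from a script, the options above can be assembled into an argument list. This is a minimal sketch using only flags documented in this section; the helper build_msa_command is hypothetical, and it assumes that "repeatable" means passing --coverage once per value.

```python
import shlex
from typing import List, Sequence


def build_msa_command(
    project_name: str,
    fasta_file: str,
    coverage: Sequence[float] = (1.0,),
    num_runs: int = 1,
    evolve_msa: bool = False,
) -> List[str]:
    """Build an argv list for `ghostfold msa` (hypothetical helper)."""
    cmd = [
        "ghostfold", "msa",
        "--project-name", project_name,
        "--fasta-file", fasta_file,
    ]
    # Assumption: a repeatable option is given once per value.
    for value in coverage:
        cmd += ["--coverage", str(value)]
    cmd += ["--num-runs", str(num_runs)]
    if evolve_msa:
        cmd.append("--evolve-msa")
    return cmd


cmd = build_msa_command("my_project", "query.fasta", coverage=(0.5, 1.0), num_runs=3)
# shlex.join quotes arguments so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once GhostFold is installed.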

Run structure prediction

ghostfold fold --project-name my_project

Options:

  • --subsample — Enable MSA subsampling (multiple depth levels)
  • --mask-fraction FLOAT — Mask a fraction of MSA residues (0.0-1.0)
  • --num-gpus INT — Override auto-detected GPU count
  • --localcolabfold-dir PATH — Path to localcolabfold pixi checkout (default: ./localcolabfold)
  • --colabfold-env TEXT — Legacy mamba env name for ColabFold fallback (default: colabfold)

Full pipeline (MSA + folding)

ghostfold run --project-name my_project --fasta-file query.fasta

This command accepts all options of the msa and fold commands.

Mask MSA files

ghostfold mask --input-path input.a3m --output-path masked.a3m --mask-fraction 0.15
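
To make the idea of masking concrete, here is an illustrative sketch of masking a fraction of residues in an A3M alignment. It is not GhostFold's implementation: the functions parse_a3m and mask_records, the mask character "X", the choice to leave the query row untouched, and the seed handling are all assumptions. The only A3M fact relied on is that uppercase letters are aligned residues, while "-" marks gaps and lowercase letters mark insertions.

```python
import random
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_a3m(text: str) -> List[Record]:
    """Parse A3M/FASTA-style text into (header, sequence) records."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def mask_records(
    records: List[Record],
    mask_fraction: float,
    mask_char: str = "X",  # assumption: GhostFold's mask symbol is not documented here
    seed: int = 0,
) -> List[Record]:
    """Replace a random fraction of aligned residues in every non-query row."""
    if not 0.0 <= mask_fraction <= 1.0:
        raise ValueError("mask_fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)
    masked: List[Record] = []
    for i, (header, seq) in enumerate(records):
        if i == 0:
            masked.append((header, seq))  # assumption: the query stays intact
            continue
        chars = list(seq)
        # Only uppercase letters are candidates; gaps and insertions are kept.
        positions = [j for j, c in enumerate(chars) if c.isupper()]
        for j in rng.sample(positions, round(mask_fraction * len(positions))):
            chars[j] = mask_char
        masked.append((header, "".join(chars)))
    return masked


example = ">query\nACDEF\n>hit1\nAC-EFg\n"
for header, seq in mask_records(parse_a3m(example), mask_fraction=0.5, seed=1):
    print(f">{header}\n{seq}")
```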

Calculate Neff scores

ghostfold neff my_project/
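
Neff (the number of effective sequences) summarizes how much non-redundant information an MSA contains. The sketch below uses one common definition: each sequence is weighted by the inverse of the number of sequences within an identity threshold of it, and the weights are summed. GhostFold's exact definition, threshold, and normalization are not stated here, so the 0.8 threshold, the identity measure, and the function names are assumptions.

```python
from typing import List


def strip_insertions(seq: str) -> str:
    """Drop lowercase A3M insertions so each row has one character per query column."""
    return "".join(c for c in seq if not c.islower())


def identity(a: str, b: str) -> float:
    """Fraction of columns where both rows carry the same residue (gaps never match)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return same / len(a)


def neff(aligned: List[str], threshold: float = 0.8) -> float:
    """Sum of per-sequence weights 1 / (size of the sequence's identity cluster)."""
    total = 0.0
    for i, a in enumerate(aligned):
        # Every sequence counts itself, plus any other row at or above the threshold.
        neighbours = 1 + sum(
            1
            for j, b in enumerate(aligned)
            if j != i and identity(a, b) >= threshold
        )
        total += 1.0 / neighbours
    return total


rows = [strip_insertions(s) for s in ["ACDEF", "ACDEFg", "AC-EF", "WWWWW"]]
print(f"Neff = {neff(rows):.2f}")
```

With this definition, an MSA of identical sequences has a Neff of 1 and an MSA of mutually dissimilar sequences has a Neff equal to its depth. Some tools additionally normalize by alignment length.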

Version

ghostfold --version

Python API

GhostFold can also be used as a Python library:

from ghostfold import run_pipeline, mask_a3m_file, calculate_neff, MSA_Mutator
from ghostfold.core.config import load_config

# Load config with optional overrides
config = load_config("my_config.yaml")

# Run MSA generation pipeline
run_pipeline(
    project="my_project",
    query_fasta="query.fasta",
    config=config,
    coverage_list=[1.0],
    evolve_msa=True,
    mutation_rates_str='{"MEGABLAST": 5, "PAM250": 20, "BLOSUM62": 10}',
    sample_percentage=1.0,
    plot_msa=False,
    plot_coevolution=False,
)
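
The mutation_rates_str argument above is a JSON object that maps substitution matrix names to rates. Rather than writing the string by hand, it can be built from a dictionary with the standard json module. In this sketch the helper encode_mutation_rates is hypothetical, and the check that rates are non-negative numbers is an assumption, since the valid range and units are not documented here.

```python
import json
from typing import Dict, Union

Number = Union[int, float]


def encode_mutation_rates(rates: Dict[str, Number]) -> str:
    """Serialize a {matrix name: rate} mapping to a JSON string (hypothetical helper)."""
    for matrix, rate in rates.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(rate, bool) or not isinstance(rate, (int, float)):
            raise TypeError(f"rate for {matrix!r} must be a number")
        if rate < 0:  # assumption: negative rates are not meaningful
            raise ValueError(f"rate for {matrix!r} must be non-negative")
    return json.dumps(rates)


# Matrix names taken from the example above.
rates_str = encode_mutation_rates({"MEGABLAST": 5, "PAM250": 20, "BLOSUM62": 10})
print(rates_str)
```

The resulting string can be passed as mutation_rates_str to run_pipeline, or to the --mutation-rates option of the CLI.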
