Accurate, database-free protein folding from single sequences using structure-aware synthetic MSAs

GhostFold


Overview

GhostFold is a protein folding framework that predicts 3D structures directly from single sequences, without relying on large evolutionary databases. Instead of searching those databases, it generates synthetic, structure-aware multiple sequence alignments (MSAs), referred to below as pseudoMSAs, and uses them as input for structure prediction. This keeps the tool lightweight and portable while retaining high accuracy.


Installation

1. Install PyTorch with CUDA

GhostFold requires PyTorch with CUDA support. Install the appropriate version for your system before installing GhostFold:

# Example for CUDA 12.1 (adjust for your CUDA version)
pip install torch --index-url https://download.pytorch.org/whl/cu121

Refer to the PyTorch installation guide for platform-specific instructions.
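
Before continuing, it can help to confirm that the installed PyTorch build actually sees a GPU. The following is a minimal sketch that uses only standard PyTorch calls; the helper name cuda_status is illustrative and is not part of GhostFold.

```python
import importlib.util


def cuda_status() -> str:
    """Return a short description of the local PyTorch/CUDA setup."""
    # Avoid an ImportError if PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} is installed, but no CUDA device is visible"

    # torch.version.cuda is the CUDA version the installed wheel was built against.
    device_name = torch.cuda.get_device_name(0)
    return f"PyTorch {torch.__version__} (CUDA {torch.version.cuda}) on {device_name}"


if __name__ == "__main__":
    print(cuda_status())
```

If the output reports that no CUDA device is visible, the wheel most likely does not match the local driver or CUDA version, and reinstalling from the matching index URL is the usual fix.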

2. Install localcolabfold

Operations that involve protein structure prediction (ghostfold run and ghostfold fold) require a working local ColabFold runtime. From the GhostFold repository root:

chmod +x scripts/install_localcolabfold.sh
./scripts/install_localcolabfold.sh

If you prefer cloud-based structure prediction, you can use the generated pseudoMSAs directly in ColabFold by selecting "custom_msa" under MSA settings and uploading the pseudoMSA generated by GhostFold.

3. Install GhostFold

pip install ghostfold

For development:

git clone https://github.com/brineylab/ghostfold.git
cd ghostfold
pip install -e ".[dev]"

Hugging Face Authentication

GhostFold uses ProstT5 from the Hugging Face Hub. You may need to configure a Hugging Face access token:

huggingface-cli login

See the Hugging Face documentation for details.


CLI Usage

GhostFold provides a single command-line tool, ghostfold, with five subcommands: msa, fold, run, mask, and neff.

Generate pseudoMSAs

ghostfold msa --project-name my_project --fasta-path query.fasta

Options:

  • --config PATH — Custom YAML config (overrides bundled defaults)
  • --recursive — Recursively search directories for FASTA files
  • --coverage FLOAT — Coverage values (repeatable, default: 1.0)
  • --num-runs INT — Independent runs per sequence (default: 1)
  • --evolve-msa — Enable MSA evolution with substitution matrices
  • --mutation-rates JSON — Mutation rates per matrix
  • --sample-percentage FLOAT — Fraction of sequences to evolve (default: 1.0)
  • --plot-msa-coverage — Generate MSA coverage heatmaps
  • --no-coevolution-maps — Skip coevolution map generation
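
When ghostfold msa is driven from a script, the options above can be assembled into an argument list. The sketch below uses only flags listed in this section; the helper build_msa_command is hypothetical, the coverage value 0.5 is purely illustrative, and passing --evolve-msa together with --mutation-rates is an assumption about how the two options are meant to be combined. The matrix names are taken from the Python API example in this document.

```python
import json
import shlex


def build_msa_command(project_name, fasta_path, coverages=(1.0,), num_runs=1,
                      mutation_rates=None):
    """Assemble an argv list for `ghostfold msa` (hypothetical helper)."""
    cmd = [
        "ghostfold", "msa",
        "--project-name", project_name,
        "--fasta-path", fasta_path,
        "--num-runs", str(num_runs),
    ]
    # --coverage is documented as repeatable: emit one flag per value.
    for coverage in coverages:
        cmd += ["--coverage", str(coverage)]
    # Assumption: mutation rates only matter when MSA evolution is enabled,
    # so the two flags are added together.
    if mutation_rates:
        cmd += ["--evolve-msa", "--mutation-rates", json.dumps(mutation_rates)]
    return cmd


cmd = build_msa_command(
    "my_project",
    "query.fasta",
    coverages=(0.5, 1.0),
    mutation_rates={"PAM250": 20, "BLOSUM62": 10},
)
print(shlex.join(cmd))
# To execute the command: subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell-quoting problems with the JSON argument.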

Run structure prediction

ghostfold fold --project-name my_project

Options:

  • --subsample — Enable MSA subsampling (multiple depth levels)
  • --mask-fraction FLOAT — Mask a fraction of MSA residues (0.0-1.0)
  • --num-gpus INT — Override auto-detected GPU count
  • --localcolabfold-dir PATH — Path to localcolabfold pixi checkout (default: ./localcolabfold)
  • --colabfold-env TEXT — Legacy mamba env name for ColabFold fallback (default: colabfold)

Full pipeline (MSA + folding)

ghostfold run --project-name my_project --fasta-path query.fasta

Runs pseudoMSA generation followed by structure prediction, and accepts all options of the msa and fold commands.

Mask MSA files

ghostfold mask --input-path input.a3m --output-path masked.a3m --mask-fraction 0.15
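
To make the idea of masking a fraction of MSA residues concrete, here is an illustrative sketch that operates on A3M text in memory. It is not GhostFold's implementation: the function name mask_a3m_text, the mask character "X", the per-residue random selection, and the choice to leave gaps and lowercase insertions untouched are all assumptions made for the example.

```python
import random


def mask_a3m_text(a3m_text, mask_fraction, mask_char="X", seed=0):
    """Replace roughly `mask_fraction` of aligned residues with `mask_char`."""
    if not 0.0 <= mask_fraction <= 1.0:
        raise ValueError("mask_fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)  # seeded so the example is reproducible
    masked_lines = []
    for line in a3m_text.splitlines():
        if line.startswith(">"):
            masked_lines.append(line)  # header lines are copied unchanged
            continue
        # Only uppercase letters (aligned residues) are candidates; gaps ('-')
        # and lowercase insertions are left as they are.
        masked_lines.append("".join(
            mask_char if ch.isupper() and rng.random() < mask_fraction else ch
            for ch in line
        ))
    return "\n".join(masked_lines)


example = ">query\nACDE\n>hit1\nAC-e"
print(mask_a3m_text(example, 0.15))
```

Each residue is masked independently with probability mask_fraction, so the realized fraction varies for small alignments and approaches the requested value only for large ones.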

Calculate Neff scores

ghostfold neff my_project/
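
Neff (the effective number of sequences) measures how much non-redundant information an MSA contains. One common definition weights each sequence by the inverse of the number of sequences that are at least some identity threshold similar to it, then sums the weights. The sketch below implements that definition for illustration only; the 0.8 threshold, the identity measure, and the function names are assumptions, and GhostFold's neff command may use a different formula or normalization.

```python
def sequence_identity(a, b):
    """Fraction of aligned columns where both sequences have the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    # Columns where both sequences carry a gap do not count as matches.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def neff(aligned_sequences, threshold=0.8):
    """Sum of 1 / (cluster size) over all sequences in the alignment."""
    total = 0.0
    for i, seq in enumerate(aligned_sequences):
        # Every sequence counts itself once, so the cluster size is never zero.
        cluster_size = 1 + sum(
            1
            for j, other in enumerate(aligned_sequences)
            if j != i and sequence_identity(seq, other) >= threshold
        )
        total += 1.0 / cluster_size
    return total


# Two identical sequences share one unit of weight; the third is distinct.
print(neff(["ACDE", "ACDE", "WWWW"]))
```

Under this definition an alignment of many near-identical sequences scores close to 1, while an alignment of mutually dissimilar sequences scores close to its sequence count.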

Version

ghostfold --version

Python API

GhostFold can also be used as a Python library:

from ghostfold import run_pipeline, mask_a3m_file, calculate_neff, MSA_Mutator
from ghostfold.core.config import load_config

# Load config with optional overrides
config = load_config("my_config.yaml")

# Run MSA generation pipeline
run_pipeline(
    project="my_project",
    fasta_path="query.fasta",
    config=config,
    coverage_list=[1.0],
    evolve_msa=True,
    mutation_rates_str='{"MEGABLAST": 5, "PAM250": 20, "BLOSUM62": 10}',
    sample_percentage=1.0,
    plot_msa=False,
    plot_coevolution=False,
)
