GhostFold

Accurate, database-free protein folding from single sequences using structure-aware synthetic MSAs


Overview

GhostFold is a protein folding framework that predicts 3D structures directly from single sequences, without relying on large evolutionary databases. It generates synthetic, structure-aware multiple sequence alignments (MSAs) in place of database-derived ones, which keeps the framework lightweight and portable while retaining high accuracy.


Installation

1. Install PyTorch with CUDA

GhostFold requires PyTorch with CUDA support. Install the appropriate version for your system before installing GhostFold:

# Example for CUDA 12.1 (adjust for your CUDA version)
pip install torch --index-url https://download.pytorch.org/whl/cu121

Refer to the PyTorch installation guide for platform-specific instructions.
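
Before installing GhostFold, it can help to confirm that the installed PyTorch build actually sees a GPU. The following is a minimal sketch, not part of GhostFold: the helper name `cuda_status` is hypothetical, and it relies only on the standard `torch.cuda.is_available()` and `torch.cuda.device_count()` calls.

```python
import importlib.util


def cuda_status():
    """Report whether PyTorch is importable and whether it can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "gpu_count": 0}
    import torch  # imported lazily so the check also runs where torch is absent

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "cuda_available": available,
        "gpu_count": torch.cuda.device_count() if available else 0,
    }


if __name__ == "__main__":
    print(cuda_status())
```

If `cuda_available` is reported as false, the installed wheel is probably a CPU-only build or does not match the local CUDA driver.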

2. Install GhostFold

pip install ghostfold

For development:

git clone https://github.com/brineylab/ghostfold.git
cd ghostfold
pip install -e ".[dev]"

Hugging Face Authentication

GhostFold uses ProstT5 from the Hugging Face Hub. You may need to configure a Hugging Face access token:

huggingface-cli login

See the Hugging Face documentation for details.
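
To check whether a token is already configured before running GhostFold, something like the sketch below may be useful. It is an illustration only: `find_hf_token` is a hypothetical helper, and it assumes the usual Hugging Face conventions (the `HF_TOKEN` environment variable, or a `token` file under `HF_HOME`, which defaults to `~/.cache/huggingface`). Other configurations, such as a custom token path, are not covered.

```python
import os
from pathlib import Path


def find_hf_token(env=None, home=None):
    """Return where a Hugging Face token was found: 'env', 'file', or None."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if env.get("HF_TOKEN"):
        return "env"
    # Assumed default cache location; HF_HOME overrides it when set.
    hf_home = Path(env.get("HF_HOME", home / ".cache" / "huggingface"))
    token_file = hf_home / "token"
    if token_file.is_file() and token_file.read_text().strip():
        return "file"
    return None


if __name__ == "__main__":
    source = find_hf_token()
    print("token source:", source or "none found; run `huggingface-cli login`")
```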


CLI Usage

GhostFold provides a single command-line tool with five subcommands:

Generate pseudoMSAs

ghostfold msa --project-name my_project --fasta-file query.fasta

Options:

  • --config PATH — Custom YAML config (overrides bundled defaults)
  • --coverage FLOAT — Coverage values (repeatable, default: 1.0)
  • --num-runs INT — Independent runs per sequence (default: 1)
  • --evolve-msa — Enable MSA evolution with substitution matrices
  • --mutation-rates JSON — Mutation rates per matrix
  • --sample-percentage FLOAT — Fraction of sequences to evolve (default: 1.0)
  • --plot-msa-coverage — Generate MSA coverage heatmaps
  • --no-coevolution-maps — Skip coevolution map generation
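
When the command is launched from a script, the options above can be assembled programmatically. The sketch below uses only flags listed in this section; `build_msa_command` is a hypothetical helper, and two details are assumptions: that a repeatable `--coverage` is passed once per value, and that `--mutation-rates` is meant to be combined with `--evolve-msa`.

```python
import json
import shlex


def build_msa_command(project_name, fasta_file, coverage=(1.0,), num_runs=1,
                      mutation_rates=None, plot_msa_coverage=False):
    """Assemble an argv list for `ghostfold msa` from the documented flags."""
    argv = ["ghostfold", "msa",
            "--project-name", project_name,
            "--fasta-file", fasta_file]
    for value in coverage:  # assumption: one --coverage flag per value
        argv += ["--coverage", str(value)]
    argv += ["--num-runs", str(num_runs)]
    if mutation_rates is not None:  # assumption: rates imply --evolve-msa
        argv += ["--evolve-msa", "--mutation-rates", json.dumps(mutation_rates)]
    if plot_msa_coverage:
        argv.append("--plot-msa-coverage")
    return argv


cmd = build_msa_command("my_project", "query.fasta", coverage=(0.5, 1.0),
                        mutation_rates={"BLOSUM62": 10})
print(shlex.join(cmd))  # shell-quoted form, useful for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell-quoting problems with the JSON argument.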

Run structure prediction

ghostfold fold --project-name my_project

Options:

  • --subsample — Enable MSA subsampling (multiple depth levels)
  • --mask-fraction FLOAT — Mask a fraction of MSA residues (0.0-1.0)
  • --num-gpus INT — Override auto-detected GPU count

Full pipeline (MSA + folding)

ghostfold run --project-name my_project --fasta-file query.fasta

Accepts all options of the msa and fold commands.

Mask MSA files

ghostfold mask --input-path input.a3m --output-path masked.a3m --mask-fraction 0.15
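
To give a sense of what masking a fraction of MSA residues could mean in practice, here is an illustrative sketch. It is not GhostFold's implementation: the function name `mask_a3m_text`, the mask character `X`, leaving the query sequence untouched, and masking only uppercase (aligned) residues are all assumptions made for the example. It also assumes each sequence occupies a single line.

```python
import random


def mask_a3m_text(a3m_text, mask_fraction, mask_char="X", seed=0):
    """Replace roughly `mask_fraction` of aligned residues in non-query sequences."""
    if not 0.0 <= mask_fraction <= 1.0:
        raise ValueError("mask_fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)  # seeded so that results are reproducible
    out, seen_sequences = [], 0
    for line in a3m_text.splitlines():
        # Headers, comment lines, and blank lines pass through unchanged.
        if not line or line.startswith(">") or line.startswith("#"):
            out.append(line)
            continue
        seen_sequences += 1
        if seen_sequences == 1:  # assumption: keep the query (first sequence) intact
            out.append(line)
            continue
        # Gaps ('-') and lowercase insertions are never masked.
        out.append("".join(
            mask_char if ch.isupper() and rng.random() < mask_fraction else ch
            for ch in line
        ))
    return "\n".join(out)


example = ">query\nMKTAYIA\n>s1\nMK-AYla"
print(mask_a3m_text(example, 0.15))
```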

Calculate Neff scores

ghostfold neff my_project/
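
Neff (the number of effective sequences) summarizes how much non-redundant information an MSA contains. The sketch below shows one common definition, in which each sequence is down-weighted by the number of sequences that are at least 80% identical to it. GhostFold's exact formula is not stated here and may differ (for example in the threshold or in length normalization); the function names are hypothetical.

```python
def sequence_identity(a, b):
    """Fraction of aligned columns where both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def neff(sequences, identity_threshold=0.8):
    """Sum of per-sequence weights 1 / (size of the sequence's identity cluster)."""
    total = 0.0
    for i, a in enumerate(sequences):
        # Each sequence always counts itself as a neighbour.
        neighbours = 1 + sum(
            1 for j, b in enumerate(sequences)
            if j != i and sequence_identity(a, b) >= identity_threshold
        )
        total += 1.0 / neighbours
    return total


aligned = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTAYIAKQW", "GGSLLPDEVN"]
print(round(neff(aligned), 3))
```

In this toy alignment the first three sequences are at least 90% identical to one another and so count together as one effective sequence, while the fourth is unrelated, giving a Neff of about 2.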

Version

ghostfold --version

Python API

GhostFold can also be used as a Python library:

from ghostfold import run_pipeline, mask_a3m_file, calculate_neff, MSA_Mutator
from ghostfold.core.config import load_config

# Load config with optional overrides
config = load_config("my_config.yaml")

# Run MSA generation pipeline
run_pipeline(
    project="my_project",
    query_fasta="query.fasta",
    config=config,
    coverage_list=[1.0],
    evolve_msa=True,
    mutation_rates_str='{"MEGABLAST": 5, "PAM250": 20, "BLOSUM62": 10}',
    sample_percentage=1.0,
    plot_msa=False,
    plot_coevolution=False,
)

Local ColabFold Setup

To enable local structure prediction with ColabFold:

chmod +x scripts/install_localcolabfold.sh
./scripts/install_localcolabfold.sh

This creates a separate colabfold conda environment with all required dependencies and downloads AlphaFold2 model weights.

If you prefer cloud-based prediction, you can use the generated pseudoMSAs directly in ColabFold by selecting "custom_msa" under MSA settings.

