Accurate, database-free protein folding from single sequences using structure-aware synthetic MSAs
GhostFold
Overview
GhostFold is a protein folding framework that predicts 3D structures directly from single sequences, without relying on large evolutionary databases. It generates synthetic, structure-aware multiple sequence alignments (MSAs) and uses them in place of database-derived alignments, which lets it achieve high accuracy while remaining lightweight and portable.
Installation
1. Install PyTorch with CUDA
GhostFold requires PyTorch with CUDA support. Install the appropriate version for your system before installing GhostFold:
# Example for CUDA 12.1 (adjust for your CUDA version)
pip install torch --index-url https://download.pytorch.org/whl/cu121
Refer to the PyTorch installation guide for platform-specific instructions.
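Before installing GhostFold, it can help to confirm that the PyTorch build you installed actually sees a GPU. The following is a minimal sketch, not part of GhostFold; the helper name `cuda_status` is made up for illustration, and it relies only on `torch.cuda.is_available()` and `torch.cuda.device_count()`.

```python
import importlib.util


def cuda_status() -> str:
    """Report whether PyTorch is importable and whether it can see a CUDA device."""
    # cuda_status is a hypothetical helper, not a GhostFold function.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so a missing install is reported, not raised

    if not torch.cuda.is_available():
        return f"torch {torch.__version__} is installed, but no CUDA device is visible"
    return f"torch {torch.__version__} sees {torch.cuda.device_count()} CUDA device(s)"


if __name__ == "__main__":
    print(cuda_status())
```

If the message reports that no CUDA device is visible, the usual cause is a CPU-only wheel or a mismatch between the wheel's CUDA version and the installed driver.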
2. Install GhostFold
pip install ghostfold
For development:
git clone https://github.com/brineylab/ghostfold.git
cd ghostfold
pip install -e ".[dev]"
Hugging Face Authentication
GhostFold uses ProstT5 from the Hugging Face Hub. You may need to configure a Hugging Face access token:
huggingface-cli login
See the Hugging Face documentation for details.
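To check whether a token is already configured before running GhostFold, something like the sketch below may be enough. It assumes the common conventions of the Hugging Face tooling: the `HF_TOKEN` environment variable, and a `token` file under `~/.cache/huggingface` (or under `HF_HOME` if that variable is set). The function `find_hf_token` is a hypothetical helper, not part of GhostFold or `huggingface_hub`.

```python
import os
from pathlib import Path


def find_hf_token(env=None, home=None):
    """Return a description of where a token was found, or None if none was found."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # An exported token takes the simplest route.
    if env.get("HF_TOKEN"):
        return "HF_TOKEN environment variable"

    # Assumption: `huggingface-cli login` stores the token in <HF_HOME>/token,
    # with HF_HOME defaulting to ~/.cache/huggingface.
    hf_home = Path(env.get("HF_HOME", home / ".cache" / "huggingface"))
    token_file = hf_home / "token"
    if token_file.is_file() and token_file.read_text().strip():
        return str(token_file)
    return None


if __name__ == "__main__":
    print(find_hf_token() or "no token found; run `huggingface-cli login`")
```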
CLI Usage
GhostFold provides a single command-line tool with five subcommands:
Generate pseudoMSAs
ghostfold msa --project-name my_project --fasta-file query.fasta
Options:
- --config PATH: Custom YAML config (overrides bundled defaults)
- --coverage FLOAT: Coverage values (repeatable, default: 1.0)
- --num-runs INT: Independent runs per sequence (default: 1)
- --evolve-msa: Enable MSA evolution with substitution matrices
- --mutation-rates JSON: Mutation rates per matrix
- --sample-percentage FLOAT: Fraction of sequences to evolve (default: 1.0)
- --plot-msa-coverage: Generate MSA coverage heatmaps
- --no-coevolution-maps: Skip coevolution map generation
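When scripting many runs, it can be convenient to assemble the `ghostfold msa` invocation programmatically. The sketch below uses only flags listed above. It assumes that "repeatable" means passing `--coverage` once per value and that `--mutation-rates` accepts a JSON object keyed by matrix name, as in the Python API example; `build_msa_command` is a hypothetical helper.

```python
import json
import shlex


def build_msa_command(project_name, fasta_file, coverages=(1.0,), num_runs=1,
                      mutation_rates=None):
    """Assemble an argv list for `ghostfold msa` (hypothetical helper)."""
    argv = ["ghostfold", "msa",
            "--project-name", project_name,
            "--fasta-file", fasta_file,
            "--num-runs", str(num_runs)]
    for cov in coverages:
        # Assumption: a repeatable option is given once per value.
        argv += ["--coverage", str(cov)]
    if mutation_rates is not None:
        # --evolve-msa enables evolution; the rates travel as a JSON string.
        argv += ["--evolve-msa", "--mutation-rates", json.dumps(mutation_rates)]
    return argv


argv = build_msa_command("my_project", "query.fasta", coverages=(1.0, 0.5),
                         mutation_rates={"PAM250": 20, "BLOSUM62": 10})
print(shlex.join(argv))  # shell-quoted form, ready to paste into a terminal
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting problems with the JSON argument.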
Run structure prediction
ghostfold fold --project-name my_project
Options:
- --subsample: Enable MSA subsampling (multiple depth levels)
- --mask-fraction FLOAT: Mask a fraction of MSA residues (0.0-1.0)
- --num-gpus INT: Override auto-detected GPU count
Full pipeline (MSA + folding)
ghostfold run --project-name my_project --fasta-file query.fasta
Accepts all options of the msa and fold commands.
Mask MSA files
ghostfold mask --input-path input.a3m --output-path masked.a3m --mask-fraction 0.15
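To make the idea of masking a fraction of MSA residues concrete, here is an illustrative sketch. It is not GhostFold's implementation: the mask character `X`, the choice to leave the query row untouched, and the function name `mask_a3m_text` are all assumptions, and it expects each sequence on a single line.

```python
import random


def mask_a3m_text(a3m_text, mask_fraction, mask_char="X", seed=0):
    """Replace a random fraction of aligned residues in non-query rows with mask_char."""
    if not 0.0 <= mask_fraction <= 1.0:
        raise ValueError("mask_fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)  # seeded so repeated calls give the same mask
    out, rows_seen = [], 0
    for line in a3m_text.splitlines():
        # Headers, comment lines, and blank lines pass through unchanged.
        if not line or line.startswith(">") or line.startswith("#"):
            out.append(line)
            continue
        rows_seen += 1
        if rows_seen == 1:
            out.append(line)  # assumption: the query (first row) is never masked
            continue
        # Only uppercase (match-state) residues are candidates; gaps and
        # lowercase insertions are kept as they are.
        out.append("".join(
            mask_char if ch.isupper() and rng.random() < mask_fraction else ch
            for ch in line
        ))
    return "\n".join(out)


example = ">query\nMKVL\n>hit1\nM-VaL"
print(mask_a3m_text(example, mask_fraction=0.5))
```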
Calculate Neff scores
ghostfold neff my_project/
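For readers unfamiliar with Neff (the effective number of sequences), the sketch below shows one widely used definition: each sequence is weighted by the inverse of the number of sequences at or above an identity threshold to it, and Neff is the sum of the weights. The 0.8 threshold, the identity measure, and the absence of any length normalization are assumptions; the `ghostfold neff` command may compute the score differently.

```python
def sequence_identity(a, b):
    """Fraction of alignment columns where both rows carry the same residue."""
    if len(a) != len(b) or not a:
        raise ValueError("rows must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def neff(aligned_rows, threshold=0.8):
    """Sum of 1 / (cluster size) over all rows of an equal-length alignment."""
    total = 0.0
    for i, row in enumerate(aligned_rows):
        # Each row always counts itself, so the cluster size is at least 1.
        cluster = 1 + sum(
            1 for j, other in enumerate(aligned_rows)
            if j != i and sequence_identity(row, other) >= threshold
        )
        total += 1.0 / cluster
    return total


# Two identical rows share one unit of weight; the distinct row adds another.
print(neff(["AAAA", "AAAA", "CCCC"]))  # → 2.0
```

This pairwise version is quadratic in the number of sequences, which is fine for illustration but slow for deep alignments.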
Version
ghostfold --version
Python API
GhostFold can also be used as a Python library:
from ghostfold import run_pipeline, mask_a3m_file, calculate_neff, MSA_Mutator
from ghostfold.core.config import load_config
# Load config with optional overrides
config = load_config("my_config.yaml")
# Run MSA generation pipeline
run_pipeline(
    project="my_project",
    query_fasta="query.fasta",
    config=config,
    coverage_list=[1.0],
    evolve_msa=True,
    mutation_rates_str='{"MEGABLAST": 5, "PAM250": 20, "BLOSUM62": 10}',
    sample_percentage=1.0,
    plot_msa=False,
    plot_coevolution=False,
)
Local ColabFold Setup
To enable local structure prediction with ColabFold:
chmod +x scripts/install_localcolabfold.sh
./scripts/install_localcolabfold.sh
This creates a separate colabfold conda environment with all required dependencies and downloads AlphaFold2 model weights.
If you prefer cloud-based prediction, you can use the generated pseudoMSAs directly in ColabFold by selecting "custom_msa" under MSA settings.
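Before uploading a pseudoMSA as a custom MSA, a quick consistency check can catch malformed files. The sketch below relies on the standard A3M convention that lowercase letters are insertions relative to the query and occupy no alignment column, so every row should have the same number of non-lowercase characters as the query. `check_a3m` is a hypothetical helper, not part of GhostFold or ColabFold.

```python
def check_a3m(a3m_text):
    """Return (n_sequences, query_length); raise ValueError on inconsistent rows."""
    records, header, chunks = [], None, []
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    if not records:
        raise ValueError("no sequences found")

    def match_columns(seq):
        # Lowercase letters are insertions and do not count as columns.
        return sum(1 for ch in seq if not ch.islower())

    query_length = match_columns(records[0][1])  # the first record is the query
    for name, seq in records[1:]:
        if match_columns(seq) != query_length:
            raise ValueError(f"row {name!r} does not match the query length")
    return len(records), query_length


print(check_a3m(">query\nMKVL\n>hit1\nM-VaL\n"))  # → (2, 4)
```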