Genome analysis toolkit powered by Evo
EvoSeq
EvoSeq is designed for the common research workflow where positive datasets have a
manifest.tsv, negative datasets may only have paired FASTA files, and the same
Evo2 model should be loaded only once per Colab runtime.
Install
For local testing from this repository:
pip install -e .
For Evo2 scoring dependencies:
pip install -e ".[evo2]"
In Google Colab, Evo2 often needs a runtime-specific install. Use this before scoring:
pip uninstall -y torchvision
pip install -q torch==2.7.1 --index-url https://download.pytorch.org/whl/cu128
pip install -q flash-attn==2.8.0.post2 --no-build-isolation
pip install -q evo2
pip install -e .
After a GitHub Release is tagged, users can install a specific version directly:
pip install "git+https://github.com/mizomizo1/EvoSeq.git@v0.1.0"
For Evo2 scoring in Colab, install Evo2 and GPU dependencies in the runtime that matches your model. The preprocessing step only needs the base dependencies.
Debug / Test
Run the local workflow tests without Evo2, torch, or flash-attn:
python -m unittest discover -s tests -v
These tests cover preprocessing, folder discovery, score-table export with a
fake scorer, and the missing Evo2 dependency message. Real Evo2 scoring still
requires a Colab GPU runtime with torch, flash-attn, and evo2 installed.
Quick Start: Preprocessing Files
Put files anywhere, for example in test/, and pass the files directly:
from evoseq.preprocess import preprocess_files
evo_df, paths = preprocess_files(
    reference_fasta_path="test/evo2_reference.fasta",
    mutant_fasta_path="test/evo2_mutant.fasta",
    manifest_path="auto",
)
By default, outputs are written next to the input files:
test/evoseq_preprocess_output/.
You can also be explicit:
evo_df, paths = preprocess_files(
    reference_fasta_path="test/evo2_reference.fasta",
    mutant_fasta_path="test/evo2_mutant.fasta",
    output_dir="outputs/preprocessing",
)
Outputs include:
- evo2_pairs.tsv: one row per variant with ref_seq and mut_seq
- evo2_reference.fa
- evo2_mutant.fa
- evo2_all.fa
- preprocessing_report.tsv
manifest.tsv is optional. When present, metadata are merged by record_id.
When absent, metadata are inferred from FASTA IDs when possible.
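The merge-by-record_id behavior described above can be illustrated with a small standard-library sketch. This is not EvoSeq's implementation: the helper names `read_fasta` and `merge_manifest` and the `gene` column are hypothetical, and the only detail taken from this README is that metadata are joined on record_id.

```python
import csv
import io


def read_fasta(text):
    """Parse FASTA text into a dict of {record_id: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the record ID is the first whitespace-delimited
            # token of the header line.
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line.upper()
    return records


def merge_manifest(records, manifest_text=None):
    """Left-join optional manifest rows onto FASTA records by record_id."""
    metadata = {}
    if manifest_text:
        reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
        metadata = {row["record_id"]: row for row in reader}
    merged = []
    for record_id, seq in records.items():
        row = {"record_id": record_id, "sequence": seq}
        # Records without a manifest entry keep only their ID and sequence.
        row.update(metadata.get(record_id, {}))
        merged.append(row)
    return merged


fasta = ">var1 demo\nACGT\nACGT\n>var2\nTTTT\n"
manifest = "record_id\tgene\nvar1\tBRCA1\n"  # "gene" is a hypothetical column
rows = merge_manifest(read_fasta(fasta), manifest)
print(rows)
```

A left join keeps every FASTA record even when the manifest is missing or incomplete, which matches the statement that manifest.tsv is optional.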
Quick Start: Preprocessing a Folder
If your folder contains paired FASTA files, EvoSeq can discover them:
from evoseq.preprocess import preprocess_folder
evo_df, paths = preprocess_folder("test")
Quick Start: Evo2 Scoring
from evoseq.scoring import score_pairs_file
result_df, result_paths = score_pairs_file(
    pairs_path="test/evoseq_preprocess_output/evo2_pairs.tsv",
    model_name="evo2_7b",
    batch_size=8,
)
By default, outputs are written next to the pair table:
test/evoseq_preprocess_output/evoseq_scoring_output/.
Use output_dir="outputs/scoring" if you want a project-level result folder.
Outputs include:
- evo2_variant_scores_unique.tsv
- evo2_variant_scores_manifest.tsv (when a manifest is available)
- environment_info.tsv
- scoring_report.tsv
Reference sequences are scored once per unique sequence and reused. This is useful when many variants share the same reference window.
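To make the reuse of reference scores concrete, here is a minimal sketch with a fake scorer standing in for Evo2. The function `score_pairs`, the `delta` field (mutant score minus reference score), and the GC-fraction stand-in are assumptions for illustration, not EvoSeq's actual API or scoring.

```python
def score_pairs(pairs, score_fn):
    """Score (ref_seq, mut_seq) pairs, scoring each unique reference once."""
    ref_cache = {}
    results = []
    for ref_seq, mut_seq in pairs:
        if ref_seq not in ref_cache:
            # First time this reference window is seen: pay for one model call.
            ref_cache[ref_seq] = score_fn(ref_seq)
        ref_score = ref_cache[ref_seq]
        mut_score = score_fn(mut_seq)
        results.append(
            {
                "ref_score": ref_score,
                "mut_score": mut_score,
                # Assumption: the quantity of interest is mutant minus reference.
                "delta": mut_score - ref_score,
            }
        )
    return results, len(ref_cache)


calls = []


def fake_score(seq):
    """Stand-in for a model call: records the call and returns GC fraction."""
    calls.append(seq)
    return (seq.count("G") + seq.count("C")) / len(seq)


pairs = [("ACGT", "ACGG"), ("ACGT", "ACTT"), ("TTTT", "TTGT")]
results, n_unique_refs = score_pairs(pairs, fake_score)
print(len(calls), n_unique_refs)
```

With three pairs that share two distinct references, the scorer is called five times rather than six. The saving grows with the number of variants per reference window.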
Model Handling
EvoSeq caches the loaded Evo2 model inside the Python process:
from evoseq.scoring import Evo2Scorer
scorer = Evo2Scorer(model_name="evo2_7b", device="cuda:0")
scores = scorer.score_sequences(["ACGTACGT"])
Calling another scoring function with the same model reuses it. Attempting to load a different Evo2 model in the same runtime raises an explicit error by default, because loading multiple large models often exhausts Colab GPU memory. Restart the runtime when switching from 7B to 20B.
Common model names:
- evo2_7b
- evo2_7b_base
- evo2_20b
For local model weights:
score_evo2_pairs(
    base_dir=".",
    model_name="evo2_20b",
    local_path="/content/drive/MyDrive/Models/evo2_20b.pt",
)
TOML Config
Copy evoseq.example.toml, edit the input paths/model, and run:
from evoseq import run_from_config
outputs = run_from_config("evoseq.example.toml")
or:
evoseq-run evoseq.example.toml
Per-Base Log-Probabilities
from evoseq.scoring import export_perbase_logprobs
path = export_perbase_logprobs(
    fasta_path="test/representative_perbase.fasta",
    model_name="evo2_7b",
    center=4096,
    half_window=320,
)
By default, this writes test/evoseq_perbase_output/perbase_logprobs.tsv.
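This README does not spell out how center and half_window select positions. As a sketch of one plausible reading, assuming a 0-based center and a symmetric window of `2 * half_window + 1` bases clipped to the sequence ends, the bounds could be computed as follows. The helper `window_bounds` is hypothetical and may not match what export_perbase_logprobs does.

```python
def window_bounds(seq_len, center, half_window):
    """Return (start, end) of a half-open slice around a 0-based center.

    Assumption: the window is symmetric and clipped at the sequence ends.
    """
    if not 0 <= center < seq_len:
        raise ValueError(f"center {center} is outside a sequence of length {seq_len}")
    start = max(0, center - half_window)
    end = min(seq_len, center + half_window + 1)
    return start, end


# A window in the middle of a long sequence is not clipped.
start, end = window_bounds(seq_len=8193, center=4096, half_window=320)
print(start, end, end - start)

# A window near the start of a short sequence is clipped on both sides.
clipped = window_bounds(seq_len=100, center=10, half_window=320)
print(clipped)
```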
Reproducibility
EvoSeq writes small TSV reports for methods sections and reruns:
- input paths and output paths
- number of variants and unique reference sequences
- model name, batch size, device, elapsed time
- Python, PyTorch, CUDA, GPU, NumPy, pandas, Biopython, and Evo2 versions
These files are meant to be saved with each analysis directory.
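For readers who want a comparable record outside EvoSeq, an environment report of this kind can be produced with the standard library alone. This is a minimal sketch, not the code that writes environment_info.tsv: the `key`/`value` column layout and the distribution names passed to `importlib.metadata` are assumptions.

```python
import csv
import io
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_rows(packages=("torch", "numpy", "pandas", "biopython", "evo2")):
    """Collect (key, value) pairs describing the current environment."""
    rows = [
        ("python", platform.python_version()),
        ("platform", platform.platform()),
    ]
    rows += [(name, package_version(name)) for name in packages]
    return rows


def write_tsv(rows, handle):
    """Write rows as a two-column TSV with a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["key", "value"])
    writer.writerows(rows)


# Write to an in-memory buffer here; pass an open file handle in real use.
buffer = io.StringIO()
write_tsv(environment_rows(), buffer)
report = buffer.getvalue()
print(report)
```

Missing packages are reported as "not installed" rather than raising, so the same report can be generated on a CPU-only machine used for preprocessing.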