Genome analysis toolkit powered by Evo

EvoSeq

EvoSeq is a small Colab-friendly toolkit for preparing paired reference/mutant FASTA files and scoring variants with Evo2.

It is designed for the common research workflow where positive datasets have a manifest.tsv, negative datasets may only have paired FASTA files, and the same Evo2 model should stay loaded once per Colab runtime.

Install

For local testing from this repository:

pip install -e .

For Evo2 scoring dependencies:

pip install -e ".[evo2]"

In Google Colab, Evo2 often needs a runtime-specific install. Use this before scoring:

pip uninstall -y torchvision
pip install -q torch==2.7.1 --index-url https://download.pytorch.org/whl/cu128
pip install -q flash-attn==2.8.0.post2 --no-build-isolation
pip install -q evo2
pip install -e .

After a GitHub Release is tagged, users can install a specific version directly:

pip install "git+https://github.com/mizomizo1/EvoSeq.git@v0.1.0"

For Evo2 scoring in Colab, install Evo2 and GPU dependencies in the runtime that matches your model. The preprocessing step only needs the base dependencies.

Debug / Test

Run the local workflow tests without Evo2, torch, or flash-attn:

python -m unittest discover -s tests -v

These tests cover preprocessing, folder discovery, score-table export with a fake scorer, and the missing Evo2 dependency message. Real Evo2 scoring still requires a Colab GPU runtime with torch, flash-attn, and evo2 installed.

Quick Start: Preprocessing Files

Put the files anywhere, for example in test/, and pass their paths directly:

from evoseq.preprocess import preprocess_files

evo_df, paths = preprocess_files(
    reference_fasta_path="test/evo2_reference.fasta",
    mutant_fasta_path="test/evo2_mutant.fasta",
    manifest_path="auto",
)

By default, outputs are written next to the input files: test/evoseq_preprocess_output/.

You can also be explicit:

evo_df, paths = preprocess_files(
    reference_fasta_path="test/evo2_reference.fasta",
    mutant_fasta_path="test/evo2_mutant.fasta",
    output_dir="outputs/preprocessing",
)

Outputs include:

evo2_pairs.tsv: one row per variant with ref_seq and mut_seq
evo2_reference.fa
evo2_mutant.fa
evo2_all.fa
preprocessing_report.tsv

manifest.tsv is optional. When present, metadata are merged by record_id. When absent, metadata are inferred from FASTA IDs when possible.
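
To make the pairing and the optional manifest merge concrete, here is a minimal standard-library sketch. It is not EvoSeq's implementation: it assumes that reference and mutant records share the same FASTA ID, that unpaired records are skipped, and that the manifest has a record_id column plus a hypothetical label column.

```python
import csv
import io


def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}; the ID is the first header token."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line.upper()
    return records


def build_pairs(ref_text, mut_text, manifest_text=None):
    """Pair reference and mutant records by ID, then left-join optional manifest rows."""
    ref = read_fasta(ref_text)
    mut = read_fasta(mut_text)
    manifest = {}
    if manifest_text:
        reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
        manifest = {row["record_id"]: row for row in reader}
    rows = []
    for record_id in ref:
        if record_id not in mut:
            continue  # assumption: unpaired records are skipped in this sketch
        row = {"record_id": record_id, "ref_seq": ref[record_id], "mut_seq": mut[record_id]}
        # Manifest columns are added when a matching row exists; a missing row is tolerated.
        for key, value in manifest.get(record_id, {}).items():
            row.setdefault(key, value)
        rows.append(row)
    return rows


# Toy inputs; the "label" column is hypothetical.
ref_fasta = ">var1\nACGTACGT\n>var2\nTTGACCAA\n"
mut_fasta = ">var1\nACGTTCGT\n>var2\nTTGACCGA\n"
manifest_tsv = "record_id\tlabel\nvar1\tpositive\n"

pairs = build_pairs(ref_fasta, mut_fasta, manifest_tsv)
for row in pairs:
    print(row)
```

In this sketch var1 picks up the manifest label, while var2 keeps only the sequence columns because the manifest has no row for it.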

Quick Start: Preprocessing a Folder

If your folder contains paired FASTA files, EvoSeq can discover them:

from evoseq.preprocess import preprocess_folder

evo_df, paths = preprocess_folder("test")
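
The discovery rule itself is not described here. As a rough illustration only, the sketch below pairs FASTA files whose names end in "reference" and "mutant" and share a prefix, a convention inferred from the example file names above. The suffix list and the matching rule are assumptions, not EvoSeq's documented behavior.

```python
import tempfile
from pathlib import Path

# Assumed set of FASTA extensions for this sketch.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def discover_pairs(folder):
    """Return (reference_path, mutant_path) tuples matched by shared filename prefix."""
    references = {}
    mutants = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in FASTA_SUFFIXES:
            continue
        stem = path.stem.lower()
        if stem.endswith("reference"):
            references[stem[: -len("reference")]] = path
        elif stem.endswith("mutant"):
            mutants[stem[: -len("mutant")]] = path
    # Keep only prefixes that have both a reference and a mutant file.
    return [(references[key], mutants[key]) for key in sorted(references) if key in mutants]


# Demonstration in a temporary folder with one pair and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("evo2_reference.fasta", "evo2_mutant.fasta", "notes.txt"):
        (Path(tmp) / name).write_text(">x\nACGT\n")
    found = [(ref.name, mut.name) for ref, mut in discover_pairs(tmp)]

print(found)
```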

Quick Start: Evo2 Scoring

from evoseq.scoring import score_pairs_file

result_df, result_paths = score_pairs_file(
    pairs_path="test/evoseq_preprocess_output/evo2_pairs.tsv",
    model_name="evo2_7b",
    batch_size=8,
)

By default, outputs are written next to the pair table: test/evoseq_preprocess_output/evoseq_scoring_output/.

Use output_dir="outputs/scoring" if you want a project-level result folder.

Outputs include:

evo2_variant_scores_unique.tsv
evo2_variant_scores_manifest.tsv (written only when a manifest is available)
environment_info.tsv
scoring_report.tsv

Reference sequences are scored once per unique sequence and reused. This is useful when many variants share the same reference window.
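
The reuse of reference scores can be pictured as a cache keyed by sequence. The sketch below uses a fake scorer in place of Evo2 and is not EvoSeq's code. The function names and the "delta" column (mutant score minus reference score) are assumptions made for illustration.

```python
def score_pairs_deduplicated(pairs, score_fn):
    """Score (ref_seq, mut_seq) pairs, calling score_fn once per unique sequence."""
    cache = {}

    def cached(seq):
        if seq not in cache:
            cache[seq] = score_fn(seq)
        return cache[seq]

    results = []
    for ref_seq, mut_seq in pairs:
        ref_score = cached(ref_seq)
        mut_score = cached(mut_seq)
        # Assumption: the per-variant score is reported as mutant minus reference.
        results.append(
            {"ref_score": ref_score, "mut_score": mut_score, "delta": mut_score - ref_score}
        )
    return results


calls = []


def fake_score(seq):
    """Stand-in for a model: records each call and returns the GC fraction."""
    calls.append(seq)
    return (seq.count("G") + seq.count("C")) / len(seq)


# Three variants that share one reference window.
pairs = [
    ("ACGTACGT", "ACGTACGA"),
    ("ACGTACGT", "ACGTACGC"),
    ("ACGTACGT", "GCGTACGT"),
]
results = score_pairs_deduplicated(pairs, fake_score)
print(len(calls))  # 4 scorer calls: one shared reference plus three mutants
```

Without the cache the same input would need six scorer calls, so the saving grows with the number of variants per reference window.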

Model Handling

EvoSeq caches the loaded Evo2 model inside the Python process:

from evoseq.scoring import Evo2Scorer

scorer = Evo2Scorer(model_name="evo2_7b", device="cuda:0")
scores = scorer.score_sequences(["ACGTACGT"])

Calling another scoring function with the same model reuses it. Attempting to load a different Evo2 model in the same runtime raises an explicit error by default, because loading multiple large models often exhausts Colab GPU memory. Restart the runtime when switching from 7B to 20B.
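
The caching behavior described above resembles a process-wide registry that holds at most one model. The following sketch shows that pattern with a fake loader. The names get_model, fake_loader, and allow_replace are hypothetical and do not describe EvoSeq's internals.

```python
_LOADED = {}  # process-wide cache; holds at most one model in this sketch


def get_model(model_name, loader, allow_replace=False):
    """Return the cached model, loading it on first use.

    Raises RuntimeError if a different model is already loaded, unless
    allow_replace is set (a hypothetical flag for this sketch).
    """
    if model_name in _LOADED:
        return _LOADED[model_name]
    if _LOADED and not allow_replace:
        loaded = next(iter(_LOADED))
        raise RuntimeError(
            f"{loaded} is already loaded; restart the runtime before loading {model_name}."
        )
    _LOADED.clear()
    _LOADED[model_name] = loader(model_name)
    return _LOADED[model_name]


load_calls = []


def fake_loader(name):
    """Stand-in for an expensive model load; records each call."""
    load_calls.append(name)
    return object()


first = get_model("evo2_7b", fake_loader)
second = get_model("evo2_7b", fake_loader)  # reuses the cached object

try:
    get_model("evo2_20b", fake_loader)
    switch_error = None
except RuntimeError as exc:
    switch_error = str(exc)

print(first is second, load_calls, switch_error)
```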

Common model names:

evo2_7b
evo2_7b_base
evo2_20b

For local model weights:

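# Assumption: score_evo2_pairs is importable from evoseq.scoring like the other scoring helpers.
from evoseq.scoring import score_evo2_pairs

score_evo2_pairs(
    base_dir=".",
    model_name="evo2_20b",
    local_path="/content/drive/MyDrive/Models/evo2_20b.pt",
)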

TOML Config

Copy evoseq.example.toml, edit the input paths/model, and run:

from evoseq import run_from_config

outputs = run_from_config("evoseq.example.toml")

or:

evoseq-run evoseq.example.toml

Per-Base Log-Probabilities

from evoseq.scoring import export_perbase_logprobs

path = export_perbase_logprobs(
    fasta_path="test/representative_perbase.fasta",
    model_name="evo2_7b",
    center=4096,
    half_window=320,
)

By default, this writes test/evoseq_perbase_output/perbase_logprobs.tsv.
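
The coordinate convention for center and half_window is not stated here. Assuming a 0-based center and a symmetric window that includes the center position, the exported region could be computed as in the sketch below. The helper name and the sequence length of 8192 are hypothetical, and EvoSeq's actual convention may differ.

```python
def window_bounds(center, half_window, seq_len):
    """Return a half-open [start, end) slice around center, clipped to the sequence."""
    start = max(0, center - half_window)
    end = min(seq_len, center + half_window + 1)
    return start, end


# Assumed 0-based center in a hypothetical 8192-base record.
start, end = window_bounds(center=4096, half_window=320, seq_len=8192)
print(start, end, end - start)  # 3776 4417 641
```

Under these assumptions the window covers 641 positions: 320 on each side plus the center base.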

Reproducibility

EvoSeq writes small TSV reports to support methods sections and reruns. They record:

input paths and output paths
number of variants and unique reference sequences
model name, batch size, device, elapsed time
Python, PyTorch, CUDA, GPU, NumPy, pandas, Biopython, and Evo2 versions

These files are meant to be saved with each analysis directory.
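
A version table of this kind can be assembled with the standard library alone. The sketch below is one possible approach, not EvoSeq's own report writer. The column names and the package list are assumptions, and CUDA and GPU details are left out because they would require importing PyTorch.

```python
import csv
import io
import platform
from importlib import metadata


def collect_versions(packages):
    """Return (component, version) rows; packages that are missing are recorded as such."""
    rows = [("python", platform.python_version())]
    for name in packages:
        try:
            rows.append((name, metadata.version(name)))
        except metadata.PackageNotFoundError:
            rows.append((name, "not installed"))
    return rows


def to_tsv(rows):
    """Render rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["component", "version"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


rows = collect_versions(["numpy", "pandas", "biopython", "torch", "evo2"])
report = to_tsv(rows)
print(report)
```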

Project details

Maintainers

mizomizo1

Release history

0.3.3: Jun 23, 2026
0.3.0: Jun 22, 2026
0.1.0 (this version): Jun 22, 2026

Download files

Source Distribution

evoseq-0.1.0.tar.gz (20.6 kB)

Uploaded Jun 22, 2026 Source

Built Distribution

evoseq-0.1.0-py3-none-any.whl (22.1 kB)

Uploaded Jun 22, 2026 Python 3

File details

Details for the file evoseq-0.1.0.tar.gz.

File metadata

Download URL: evoseq-0.1.0.tar.gz
Upload date: Jun 22, 2026
Size: 20.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for evoseq-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0272af825ce955f28fc7667c4080a46e9b729872692c0f5dfd11ba4941ca17c8`
MD5	`9f9336e812dd247ceceab3cc9eef9860`
BLAKE2b-256	`efcce11132b0653c7dd9b73d1045aca9c261ce18824d8f9b70c085cff4ef9314`
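
A downloaded file can be checked against the SHA256 digest listed above with Python's hashlib. This is a generic sketch rather than part of EvoSeq. The helper names are hypothetical, and the demonstration runs on a temporary file so that it works without the real archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.lower()


# Demonstration on a temporary file standing in for a download.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"evoseq")
    demo_digest = sha256_of(demo)
    ok = verify(demo, demo_digest)
    bad = verify(demo, "0" * 64)

print(ok, bad)
```

For the real source distribution, the call would be verify("evoseq-0.1.0.tar.gz", "0272af825ce955f28fc7667c4080a46e9b729872692c0f5dfd11ba4941ca17c8"), using the digest from the table above.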

Provenance

The following attestation bundles were made for evoseq-0.1.0.tar.gz:

Publisher: python-publish.yml on mizomizo1/EvoSeq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: evoseq-0.1.0.tar.gz
- Subject digest: 0272af825ce955f28fc7667c4080a46e9b729872692c0f5dfd11ba4941ca17c8
- Sigstore transparency entry: 1917756843
- Sigstore integration time: Jun 22, 2026
Source repository:
- Permalink: mizomizo1/EvoSeq@65926f1d0a6848d70d759d298b377614783f9f29
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/mizomizo1
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@65926f1d0a6848d70d759d298b377614783f9f29
- Trigger Event: release

File details

Details for the file evoseq-0.1.0-py3-none-any.whl.

File metadata

Download URL: evoseq-0.1.0-py3-none-any.whl
Upload date: Jun 22, 2026
Size: 22.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for evoseq-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`97d9701ddc01d7ae93bedd9032e0c3f5367af0b47bd473f1fc7808210824ff49`
MD5	`8b6c2e589d2796e13c441a48968d6bc0`
BLAKE2b-256	`9ab4d5af905f3f356100dc321d1de70033d373ba9b719531f8c1e4e09be323c0`

Provenance

The following attestation bundles were made for evoseq-0.1.0-py3-none-any.whl:

Publisher: python-publish.yml on mizomizo1/EvoSeq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: evoseq-0.1.0-py3-none-any.whl
- Subject digest: 97d9701ddc01d7ae93bedd9032e0c3f5367af0b47bd473f1fc7808210824ff49
- Sigstore transparency entry: 1917757583
- Sigstore integration time: Jun 22, 2026
Source repository:
- Permalink: mizomizo1/EvoSeq@65926f1d0a6848d70d759d298b377614783f9f29
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/mizomizo1
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@65926f1d0a6848d70d759d298b377614783f9f29
- Trigger Event: release
