DNA foundation modeling from molecular to genome scale.

Evo: DNA foundation modeling from molecular to genome scale

We have developed a new model called Evo 2 that extends the Evo 1 model and its ideas to all domains of life. Please see https://github.com/arcinstitute/evo2 for more details.

Evo

Evo is a biological foundation model capable of long-context modeling and design. Evo uses the StripedHyena architecture to enable modeling of sequences at a single-nucleotide, byte-level resolution with near-linear scaling of compute and memory relative to context length. Evo has 7 billion parameters and is trained on OpenGenome, a prokaryotic whole-genome dataset containing ~300 billion tokens.

We describe Evo in the paper “Sequence modeling and design from molecular to genome scale with Evo”.

We describe Evo 1.5 in the paper “Semantic design of functional de novo genes from a genomic language model”. We used the Evo 1.5 model to generate SynGenome, the first AI-generated genomics database containing over 100 billion base pairs of synthetic DNA sequences.

We provide the following model checkpoints:

Checkpoint name	Description
`evo-1.5-8k-base`	A model pretrained with a context length of 8,192 tokens, obtained by extending the pretraining of `evo-1-8k-base` to process 50% more training data.
`evo-1-8k-base`	A model pretrained with a context length of 8,192 tokens. We use this model as the base model for molecular-scale finetuning tasks.
`evo-1-131k-base`	A model pretrained with a context length of 131,072 tokens, using `evo-1-8k-base` as the base model. We use this model to reason about and generate sequences at the genome scale.
`evo-1-8k-crispr`	A model finetuned using `evo-1-8k-base` as the base model to generate CRISPR-Cas systems.
`evo-1-8k-transposon`	A model finetuned using `evo-1-8k-base` as the base model to generate IS200/IS605 transposons.
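
Because Evo models sequences at single-nucleotide resolution, a quick length check can tell you which base checkpoint a sequence might fit into. The sketch below is illustrative only: the `CONTEXT_LENGTH` mapping and the `fits_context` helper are hypothetical names, not part of the `evo` package, and the check assumes one token per nucleotide with no extra special tokens.

```python
# Context lengths taken from the checkpoint table above.
# Only checkpoints whose context length is stated explicitly are listed.
CONTEXT_LENGTH = {
    "evo-1.5-8k-base": 8_192,
    "evo-1-8k-base": 8_192,
    "evo-1-131k-base": 131_072,
}


def fits_context(checkpoint: str, sequence: str) -> bool:
    """Return True if the sequence is no longer than the checkpoint's context.

    Assumption: one token per nucleotide and no additional special tokens.
    """
    return len(sequence) <= CONTEXT_LENGTH[checkpoint]


if __name__ == "__main__":
    seq = "ACGT" * 3000  # 12,000 nucleotides
    for name in CONTEXT_LENGTH:
        print(name, fits_context(name, seq))
```

A sequence that fails this check for the 8k checkpoints may still fit `evo-1-131k-base`, which is the checkpoint the table describes for genome-scale work.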

News

December 17, 2024: We found and fixed a bug in the Evo model inference code that affected package versions installed between Nov 15 and Dec 16, 2024. The fix is included in release versions 0.3 and above. If you installed the package during this timeframe, please upgrade to correct the issue.

Contents

- Setup
  - Requirements
  - Installation
  - Troubleshooting
- Usage
- HuggingFace
- Together API
- Dataset
- Citation

Setup

Requirements

Evo is based on StripedHyena.

Evo uses FlashAttention-2, which may not work on all GPU architectures. Please consult the FlashAttention GitHub repository for the current list of supported GPUs. Currently, Evo supports FlashAttention versions <= 2.7.4.post0.

Make sure to install the correct PyTorch version on your system. PyTorch versions >= 2.7.0 and < 2.8.0a0 are supported by FlashAttention 2.7.4.

We recommend using a fresh conda environment to install these prerequisites. Below is an example of how to install these:

conda install -c nvidia cuda-nvcc cuda-cudart-dev
conda install -c conda-forge flash-attn=2.7.4
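
Before installing Evo, it can help to confirm that the PyTorch version in your environment falls inside the supported window stated above (>= 2.7.0 and < 2.8.0a0). The following is a minimal sketch using only the standard library; `release_tuple` and `torch_supported` are hypothetical helper names, and the comparison is a simplification that looks only at the numeric release components rather than implementing full PEP 440 ordering.

```python
import re


def release_tuple(version: str) -> tuple:
    """Extract leading numeric components, e.g. '2.7.1+cu126' -> (2, 7, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def torch_supported(version: str) -> bool:
    """Return True if the release lies in the window [2.7.0, 2.8.0).

    Simplification: pre-release and local suffixes are ignored, so any
    2.8.0 pre-release is treated as outside the window.
    """
    release = release_tuple(version)
    padded = (release + (0, 0, 0))[:3]  # pad short versions such as '2.7'
    return (2, 7, 0) <= padded < (2, 8, 0)


if __name__ == "__main__":
    for candidate in ["2.6.0", "2.7.0", "2.7.1+cu126", "2.8.0a0"]:
        print(candidate, torch_supported(candidate))
```

In an environment where PyTorch is already installed, you could pass `torch.__version__` to `torch_supported` to run the same check.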

Installation

You can install Evo using pip:

pip install evo-model

or directly from the GitHub source:

git clone https://github.com/evo-design/evo.git
cd evo/
pip install .

If you are not using the conda-forge FlashAttention installation shown above, which will automatically install PyTorch, we recommend that you install the PyTorch library before installing all other dependencies (due to dependency issues of the flash-attn library; see, e.g., this issue).

One of our example scripts, demonstrating how to go from generating sequences with Evo to folding proteins (scripts/generation_to_folding.py), further requires the installation of prodigal. We have created an environment.yml file for this:

conda env create -f environment.yml
conda activate evo-design

Troubleshooting

If you are using NumPy versions > 2.2, you may encounter the following error:

ValueError: The binary mode of fromstring is removed, use frombuffer instead

To fix this, modify tokenizer.py at line 157 in your local installation of StripedHyena as shown:

# Replace this:
return list(np.fromstring(text, dtype=np.uint8))

# With this:
return list(np.frombuffer(text.encode(), dtype=np.uint8))
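
The patched line converts the input text into one integer per byte, which is what byte-level tokenization means here. The sketch below shows the same mapping without NumPy so the behavior is easy to inspect; `byte_tokenize` and `byte_detokenize` are hypothetical names for illustration and are not the StripedHyena tokenizer API.

```python
def byte_tokenize(sequence: str) -> list:
    """Map each byte of the UTF-8 encoded text to an integer in 0..255."""
    return list(sequence.encode("utf-8"))


def byte_detokenize(token_ids) -> str:
    """Inverse mapping: turn a list of byte values back into text."""
    return bytes(token_ids).decode("utf-8")


if __name__ == "__main__":
    ids = byte_tokenize("ACGT")
    print(ids)                   # ASCII codes of A, C, G, T
    print(byte_detokenize(ids))  # round-trips to the original string
```

For plain nucleotide strings every character is a single ASCII byte, so the token count equals the sequence length.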

Usage

Below is an example of how to download Evo and use it locally through the Python API.

from evo import Evo
import torch

device = 'cuda:0'

evo_model = Evo('evo-1-131k-base')
model, tokenizer = evo_model.model, evo_model.tokenizer
model.to(device)
model.eval()

sequence = 'ACGT'
input_ids = torch.tensor(
    tokenizer.tokenize(sequence),
    dtype=torch.int,
).to(device).unsqueeze(0)

with torch.no_grad():
    logits, _ = model(input_ids) # (batch, length, vocab)

print('Logits: ', logits)
print('Shape (batch, length, vocab): ', logits.shape)

An example of batched inference can be found in scripts/example_inference.py.

We provide an example script for how to prompt the model and sample a set of sequences given the prompt.

python -m scripts.generate \
    --model-name 'evo-1-131k-base' \
    --prompt ACGT \
    --n-samples 10 \
    --n-tokens 100 \
    --temperature 1. \
    --top-k 4 \
    --device cuda:0
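
To give a sense of what the `--temperature` and `--top-k` options control, here is a minimal sketch of a single temperature-scaled top-k sampling step over a toy logit vector. It illustrates the general technique only and is not the implementation used by `scripts/generate.py`; the function name `sample_top_k` and the example logits are assumptions made for illustration.

```python
import math
import random


def sample_top_k(logits, k, temperature, rng=random):
    """Sample one index from the k highest-scoring logits.

    Lower temperatures sharpen the distribution; k limits the candidates.
    """
    if k < 1 or temperature <= 0:
        raise ValueError("k must be >= 1 and temperature must be > 0")
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scale, then compute softmax weights over the survivors only.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [0.1, 2.0, -1.0, 0.5]  # hypothetical scores for four tokens
    rng = random.Random(0)
    print([sample_top_k(toy_logits, k=2, temperature=1.0, rng=rng) for _ in range(10)])
```

With `k=1` the step reduces to greedy decoding, since only the highest-scoring index remains as a candidate.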

We also provide an example script for using the model to score the log-likelihoods of a set of sequences.

python -m scripts.score \
    --input-fasta examples/example_seqs.fasta \
    --output-tsv scores.tsv \
    --model-name 'evo-1-131k-base' \
    --device cuda:0
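
Conceptually, a sequence log-likelihood can be derived from the `(length, vocab)` logits of a causal model by taking a log-softmax at each position and reading off the log-probability of the token that actually follows. The sketch below shows this in plain Python on nested lists. It assumes that the logits at position `t` predict the token at position `t + 1`, and it sums rather than averages; whether `scripts/score.py` uses the same alignment and reduction is not stated here, so treat both as assumptions. The function names are hypothetical.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one list of logits."""
    peak = max(row)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum_exp for x in row]


def sequence_log_likelihood(logits, token_ids):
    """Sum of log p(token[t + 1] | tokens[: t + 1]) over the sequence.

    logits: list of per-position logit lists, shape (length, vocab).
    token_ids: list of integer token ids, length equal to len(logits).
    """
    if len(logits) != len(token_ids):
        raise ValueError("logits and token_ids must have the same length")
    total = 0.0
    for t in range(len(token_ids) - 1):
        total += log_softmax(logits[t])[token_ids[t + 1]]
    return total


if __name__ == "__main__":
    # Uniform logits over a toy 4-token vocabulary: each step scores log(1/4).
    uniform = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
    print(sequence_log_likelihood(uniform, [0, 1, 2]))
```

With a PyTorch logits tensor, `torch.log_softmax` along the vocabulary dimension would play the role of the `log_softmax` helper.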

HuggingFace

Evo is integrated with HuggingFace.

from transformers import AutoConfig, AutoModelForCausalLM

model_name = 'togethercomputer/evo-1-8k-base'

model_config = AutoConfig.from_pretrained(model_name, trust_remote_code=True, revision="1.1_fix")
model_config.use_cache = True

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=model_config,
    trust_remote_code=True,
    revision="1.1_fix"
)

Together API

Evo is available through Together AI with a web UI, where you can generate DNA sequences with a chat-like interface.

For more detailed or batch workflows, you can call the Together API directly. A simple example follows.

import openai
import os

# Read your Together API key from the TOGETHER_API_KEY environment variable.
client = openai.OpenAI(
  api_key=os.environ["TOGETHER_API_KEY"],
  base_url='https://api.together.xyz',
)

chat_completion = client.chat.completions.create(
  messages=[
    {
      "role": "system",
      "content": ""
    },
    {
      "role": "user",
      "content": "ACGT", # Prompt the model with a sequence.
    }
  ],
  model="togethercomputer/evo-1-131k-base",
  max_tokens=128, # Sample some number of new tokens.
  logprobs=True
)
print(
    chat_completion.choices[0].logprobs.token_logprobs,
    chat_completion.choices[0].message.content
)

Dataset

The OpenGenome dataset for pretraining Evo is available at Hugging Face datasets.

Citation

Please cite the following publication when referencing Evo.

@article{nguyen2024sequence,
   author = {Eric Nguyen and Michael Poli and Matthew G. Durrant and Brian Kang and Dhruva Katrekar and David B. Li and Liam J. Bartie and Armin W. Thomas and Samuel H. King and Garyk Brixi and Jeremy Sullivan and Madelena Y. Ng and Ashley Lewis and Aaron Lou and Stefano Ermon and Stephen A. Baccus and Tina Hernandez-Boussard and Christopher Ré and Patrick D. Hsu and Brian L. Hie },
   title = {Sequence modeling and design from molecular to genome scale with Evo},
   journal = {Science},
   volume = {386},
   number = {6723},
   pages = {eado9336},
   year = {2024},
   doi = {10.1126/science.ado9336},
   URL = {https://www.science.org/doi/abs/10.1126/science.ado9336},
}

Please cite the following publication when referencing Evo 1.5.

@article{merchant2025semantic,
    author = {Merchant, Aditi T and King, Samuel H and Nguyen, Eric and Hie, Brian L},
    title = {Semantic design of functional de novo genes from a genomic language model},
    year = {2025},
    doi = {10.1038/s41586-025-09749-7},
    URL = {https://www.nature.com/articles/s41586-025-09749-7},
    journal = {Nature}
}

Release history

Version	Release date
0.5 (this version)	Feb 16, 2026
0.4	Dec 18, 2024
0.3	Dec 17, 2024
0.2.1	Nov 15, 2024
0.1.2	Apr 30, 2024
0.1.1	Feb 27, 2024
0.1.0	Feb 27, 2024

Download files

Source Distribution

evo_model-0.5.tar.gz (24.0 kB), uploaded Feb 16, 2026, Source

Built Distribution

evo_model-0.5-py3-none-any.whl (24.5 kB), uploaded Feb 16, 2026, Python 3

File details

Details for the file evo_model-0.5.tar.gz.

File metadata

Download URL: evo_model-0.5.tar.gz
Upload date: Feb 16, 2026
Size: 24.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for evo_model-0.5.tar.gz
Algorithm	Hash digest
SHA256	`f1e0260fb5768308e1598eaa1ad53a3bcebb7c39f613f814d132e43920c6222b`
MD5	`64d37ead0bf44719ab470f0288b19432`
BLAKE2b-256	`a1f34178973eff29732997da778503d22e8b9e36fcd4db39a94a4da651efc007`


File details

Details for the file evo_model-0.5-py3-none-any.whl.

File metadata

Download URL: evo_model-0.5-py3-none-any.whl
Upload date: Feb 16, 2026
Size: 24.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for evo_model-0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8699de8a051af0ca38836506ba189f15541bc9fcabfbd68687d46b86147711f8`
MD5	`084dcb5ad1233fd8a21cfe27f155b750`
BLAKE2b-256	`fd1fda0ff8a1ce713c2149322e0496cfb4b8189eeb1cbb8180211136d35b587e`

