
Use Evo2 models easily in HPC environments

Project description

EasyEvo2

Python 3.11+ | License: MIT

A Python toolkit for easily using Evo2 models in bioinformatics workflows, particularly in HPC environments.

Description

EasyEvo2 provides a simplified interface to Evo2 foundation models for sequence embedding. It lets biologists and bioinformaticians extract embeddings from DNA, RNA, or protein sequences efficiently, without extensive deep learning expertise. It is designed specifically for High-Performance Computing (HPC) environments.

Installation

# Install from PyPI
pip install easyevo2

# Or install from source
git clone https://github.com/ylab-hi/EasyEvo2.git
cd EasyEvo2
pip install .

Usage

Basic Usage

# Embed sequences from a FASTA/FASTQ file using the default model (evo2_7b)
easyevo2 embed input.fa

# Specify a different model and a specific layer
easyevo2 embed input.fa --model-type evo2_40b --layer-name blocks.28.mlp.l3

# Specify a different model and multiple layers
easyevo2 embed input.fa --model-type evo2_40b --layer-name blocks.28.mlp.l3 blocks.28.mlp.l2

# Save to a specific output file
easyevo2 embed input.fa --output my_embeddings

The output is a safetensors file containing the embeddings for each sequence in the input file. The embeddings can be loaded with the load_tensor function:

from easyevo2.io import load_tensor

# "model" and "layer" are placeholders: use the actual file name written by easyevo2 embed
embeddings = load_tensor("my_embeddings.model.layer.safetensors")
print(embeddings)
# Output: {
# "seq1": torch.tensor([...]),
# "seq2": torch.tensor([...]),
# }
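The shape of each stored tensor is not documented here. Assuming each entry holds per-position embeddings with the embedding dimension on the last axis, a common next step is to mean-pool them into one fixed-length vector per sequence for clustering or classification. The helper pool_embeddings below is hypothetical, and mean pooling is only one possible summary. It takes NumPy arrays; torch tensors can be converted first with `{name: t.float().cpu().numpy() for name, t in embeddings.items()}`.

```python
import numpy as np


def pool_embeddings(embeddings: dict[str, np.ndarray]) -> tuple[list[str], np.ndarray]:
    """Mean-pool each sequence's embeddings into one vector.

    Assumes the embedding dimension is the last axis and that all
    sequences share it; returns (names, matrix of shape (n_seqs, dim)).
    """
    names = list(embeddings)
    vectors = []
    for name in names:
        arr = np.asarray(embeddings[name], dtype=np.float32)
        # Collapse all leading axes (e.g. batch, positions) and average them,
        # leaving a single vector of length dim; a 1-D input is returned unchanged
        vectors.append(arr.reshape(-1, arr.shape[-1]).mean(axis=0))
    return names, np.stack(vectors)
```

The resulting matrix rows follow the order of the sequence names, so it can be passed directly to standard clustering or dimensionality-reduction tools.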

Evo2 Memory Estimates

Model           GPU Memory Usage   Embedding Dimension   Batch Size
Evo2 1B Base    1.5 GB             2048                  1
Evo2 7B         15 GB              4096                  1
Evo2 40B Base   >80 GB*            --                    1
Evo2 40B        >80 GB*            --                    1

* Estimated by scaling from the smaller models

Notes:

  • Longer sequences require proportionally more memory
  • H100 GPUs (80 GB) can accommodate the 7B model with batch size 1, but the 40B model is estimated to need more than 80 GB and is unlikely to fit on a single H100
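The estimates above can be sanity-checked with back-of-the-envelope arithmetic: model weights alone occupy roughly (number of parameters) × (bytes per parameter). The sketch assumes nominal parameter counts (7e9 and 40e9) and 2 bytes per parameter (bf16); it ignores activations, which grow with sequence length, so the result is a lower bound rather than a measurement. The helper names are hypothetical.

```python
# Assumed storage cost per parameter for common precisions
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(n_params: float, dtype: str = "bf16") -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_gpu(n_params: float, gpu_gb: float, dtype: str = "bf16") -> bool:
    """True if the weights alone fit; activations still need headroom on top."""
    return weight_memory_gb(n_params, dtype) < gpu_gb


if __name__ == "__main__":
    for name, n_params in [("evo2_7b", 7e9), ("evo2_40b", 40e9)]:
        gb = weight_memory_gb(n_params)
        print(f"{name}: ~{gb:.0f} GB of bf16 weights; fits on one 80 GB GPU: {fits_on_gpu(n_params, 80)}")
```

Under these assumptions the 7B weights come to about 14 GB, in line with the 15 GB listed, while the 40B weights alone reach about 80 GB, leaving no room for activations on a single 80 GB GPU.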

Development

This project uses a Makefile to automate common development tasks:

# Show available commands
make help

# Run tests
make test

# Lint code
make lint

# Format code
make format

# Build package
make build

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.



Download files


Source Distribution

easyevo2-0.1.16.tar.gz (52.3 kB)

Uploaded Source

Built Distribution


easyevo2-0.1.16-py3-none-any.whl (14.7 kB)

Uploaded Python 3

File details

Details for the file easyevo2-0.1.16.tar.gz.

File metadata

  • Download URL: easyevo2-0.1.16.tar.gz
  • Upload date:
  • Size: 52.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.12

File hashes

Hashes for easyevo2-0.1.16.tar.gz
Algorithm Hash digest
SHA256 e5063d55a47ad77960f47b4c2f7ade571a436a048ea9e4ebb23da3040ec4cb4e
MD5 6e35b843aae9eb6d4019828f7e2d6b59
BLAKE2b-256 00f893692f16a22c6b0f0fa99d85d51c727ae5551bdefcf7751bd2af9d29d069


File details

Details for the file easyevo2-0.1.16-py3-none-any.whl.

File metadata

  • Download URL: easyevo2-0.1.16-py3-none-any.whl
  • Upload date:
  • Size: 14.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.12

File hashes

Hashes for easyevo2-0.1.16-py3-none-any.whl
Algorithm Hash digest
SHA256 c6df0fbeaadb0d8c62795173ac112de29b3c1dffffcb0aa2f38a9bcb27457eb1
MD5 d4c9404b4a8175e3d0c391ae9a7d8ab7
BLAKE2b-256 24f51a6f2e2563c1838cce0cead24a4595b0c81ee44479fcf63627df4e7c875e

