The largest open-source library to develop plasmid foundation models and generate novel plasmids using machine learning.

Plasmid.ai

Plasmid.ai is the largest open-source toolkit for developing plasmid foundation models. Created by the iGEM Toronto team, the project uses machine learning to generate novel plasmids for synthetic biology.

Overview

Plasmid.ai provides a comprehensive set of tools and models for the analysis, design, and generation of plasmids. By utilizing state-of-the-art machine learning techniques, this project enables researchers and synthetic biologists to explore new possibilities in plasmid engineering and design. For more information about our team and project, visit our iGEM Team Wiki.

Features

  • Plasmid Sequence Tokenization: Utilizes custom tokenizers tailored for encoding plasmid sequences.
  • Data Preprocessing Pipelines: Includes robust modules for loading, preprocessing, and visualizing plasmid data.
  • Advanced Sampling Techniques: Provides cutting-edge sampling functions for generating novel plasmids based on trained models.
  • Lightning Integration: Seamlessly integrates with PyTorch Lightning for distributed training and model scalability.
  • Custom Model Components: Features specialized optimizers and callbacks for enhanced model performance.
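
To make the tokenization feature more concrete, here is a minimal sketch of a character-level nucleotide tokenizer. It is not the tokenizer shipped with Plasmid.ai; the vocabulary, the special tokens, and the `encode`/`decode` names are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical vocabulary: special tokens plus the four DNA bases and N (unknown base).
# This is NOT the Plasmid.ai tokenizer, only an illustration of the idea.
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>", "<unk>"]
NUCLEOTIDES = ["A", "C", "G", "T", "N"]
VOCAB: Dict[str, int] = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + NUCLEOTIDES)}
INVERSE_VOCAB: Dict[int, str] = {i: tok for tok, i in VOCAB.items()}


def encode(sequence: str) -> List[int]:
    """Map a DNA string to token ids, wrapped in <bos> and <eos>."""
    ids = [VOCAB["<bos>"]]
    for base in sequence.upper():
        # Characters outside the vocabulary fall back to <unk>.
        ids.append(VOCAB.get(base, VOCAB["<unk>"]))
    ids.append(VOCAB["<eos>"])
    return ids


def decode(ids: List[int]) -> str:
    """Map token ids back to a DNA string, dropping special tokens."""
    return "".join(INVERSE_VOCAB[i] for i in ids if INVERSE_VOCAB[i] in NUCLEOTIDES)


token_ids = encode("atgcn")
print(token_ids)          # [1, 4, 7, 6, 5, 8, 2]
print(decode(token_ids))  # ATGCN
```

A character-level scheme keeps the vocabulary tiny at the cost of long sequences; real plasmid tokenizers may instead group bases into larger units.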

Installation

Using pip

To install the Plasmid.ai package, run the following commands:

pip install --upgrade pip setuptools wheel
pip install plasmidai

Using git

For development or to access the latest features, you can clone the repository:

git clone https://github.com/igem-toronto/plasmidai.git
cd plasmidai
pip install --upgrade pip setuptools wheel
pip install -e .

You can use conda or poetry to manage dependencies.
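
After either installation route, a quick way to confirm that the package is visible to the current interpreter is to query its distribution metadata. The sketch below uses only the standard library; the helper name `installed_version` is an assumption made for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("plasmidai")
if found is None:
    print("plasmidai is not installed in this environment")
else:
    print(f"plasmidai {found} is installed")
```

With an editable install (`pip install -e .`), the reported version comes from the package metadata in your local checkout.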

Usage

Here's a basic example of how to use Plasmid.ai from the command line. In Python code the package can be imported as `import plasmidai as pai`, but `python -m` needs the full module name. The commands assume that REPO_ROOT points to the root of your local checkout:

# Training
python -m plasmidai.experimental.train \
    --backend.matmul_precision=medium \
    --data.batch_size=64 --data.num_workers=4 \
    --lit.fused_add_norm=true --lit.scheduler_span=50000 --lit.top_p=0.9 \
    --trainer.accelerator=gpu --trainer.devices=2 --trainer.precision=bf16-mixed \
    --trainer.wandb=true --trainer.wandb_dir="${REPO_ROOT}/logs" \
    --trainer.checkpoint=true --trainer.checkpoint_dir="${REPO_ROOT}/checkpoints/last.ckpt" \
    --trainer.progress_bar=true \
    --trainer.max_epochs=175

# Generation
python -m plasmidai.experimental.sample \
    --backend.matmul_precision=medium \
    --sample.checkpoint_path="${REPO_ROOT}/checkpoints/last.ckpt" \
    --sample.precision=bfloat16 --sample.num_samples=10000 --sample.top_p=0.9 \
    --sample.wandb_dir="${REPO_ROOT}/logs"

Check out the slurm directory for more examples!
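
The `top_p=0.9` settings in the commands above presumably refer to nucleus (top-p) sampling. The following is a minimal, pure-Python sketch of that general technique, not the code in `sample.py`; the function names `top_p_filter` and `sample_top_p` are assumptions made for this example.

```python
import random
from typing import List, Optional


def top_p_filter(probs: List[float], top_p: float) -> List[float]:
    """Keep the smallest set of most likely tokens whose total mass reaches top_p.

    Returns a renormalized distribution with all other tokens set to zero.
    """
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    # Visit token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


def sample_top_p(probs: List[float], top_p: float, rng: Optional[random.Random] = None) -> int:
    """Draw one token index from the top-p filtered distribution."""
    if rng is None:
        rng = random.Random()
    filtered = top_p_filter(probs, top_p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]


# Toy distribution over four tokens; with top_p=0.9 the least likely token is dropped.
next_token_probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(next_token_probs, top_p=0.9))
print(sample_top_p(next_token_probs, top_p=0.9, rng=random.Random(0)))
```

Lower `top_p` values restrict generation to the most likely tokens, while `top_p=1.0` samples from the full distribution.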

Project Structure

The Plasmid.ai project is organized into several key components:

  • data/: Contains datasets and scripts for data processing.
    • scripts/: Helper scripts for data manipulation.
    • tokenizers/: Custom tokenizers for plasmid sequences.
  • datasets/: Modules for loading and preprocessing plasmid datasets.
  • experimental/: Cutting-edge features and models in development.
    • callbacks.py: Custom callbacks for model training.
    • lit.py: Lightning modules for PyTorch Lightning integration.
    • optimizers.py: Custom optimizers for training plasmid models.
    • sample.py: Functions for sampling from trained models.
    • train.py: Training pipelines for plasmid models.
  • utils.py: Utility functions used across the project.
  • paths.py: Path configurations for the project.

Authors and acknowledgment

This project is developed by the iGEM Toronto 2024 team. We would like to extend our gratitude to all the team members and contributors who have made this project possible. Special thanks to our mentors and collaborators for their guidance and support.

Contributing

We welcome contributions from the community! Please open an issue first.

License

We use the Apache-2.0 license.

Download files

Source Distribution

plasmidai-1.2.0.tar.gz (172.6 kB)

Uploaded Source

Built Distribution

plasmidai-1.2.0-py3-none-any.whl (180.9 kB)

Uploaded Python 3

File details

Details for the file plasmidai-1.2.0.tar.gz.

File metadata

  • Download URL: plasmidai-1.2.0.tar.gz
  • Size: 172.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.2

File hashes

Hashes for plasmidai-1.2.0.tar.gz

  • SHA256: 08b601cb4c0d27ee4bb2c93d66140642b674f3565de5d3fd4d41e3a0977c9a2c
  • MD5: 352a877601701265a78dbe33c19b57cb
  • BLAKE2b-256: 53f9965b816de5b9627e7da30a7ed0eb47e72b95e9875085e238d8a342a60ae5

File details

Details for the file plasmidai-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: plasmidai-1.2.0-py3-none-any.whl
  • Size: 180.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.2

File hashes

Hashes for plasmidai-1.2.0-py3-none-any.whl

  • SHA256: f30ae9d89cc77391c9113f552b06856c23c6de2c1a8227d8c945e827ae68a3e7
  • MD5: f4a9e64810895defcc4edc0c0d196238
  • BLAKE2b-256: 65f2659c9971397b8649287d85299fe10d4443d45005ae4a4b4f0dda82b64c4c
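
A downloaded file can be checked against the SHA256 digests listed above before it is installed. The sketch below uses only the standard library; the helper names `sha256_of_file` and `matches_expected` are assumptions made for this example, and it expects the files to be in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hashes listed above.
EXPECTED_SHA256 = {
    "plasmidai-1.2.0.tar.gz": "08b601cb4c0d27ee4bb2c93d66140642b674f3565de5d3fd4d41e3a0977c9a2c",
    "plasmidai-1.2.0-py3-none-any.whl": "f30ae9d89cc77391c9113f552b06856c23c6de2c1a8227d8c945e827ae68a3e7",
}


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of_file(path) == expected


if __name__ == "__main__":
    for name in EXPECTED_SHA256:
        candidate = Path(name)
        if candidate.exists():
            print(name, "OK" if matches_expected(candidate) else "MISMATCH")
        else:
            print(name, "not found in the current directory")
```

pip can enforce the same check at install time through its hash-checking mode, which may be more convenient than a manual comparison.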
