Variant Sequencing with Nanopore (LevSeq)

LevSeq provides a streamlined pipeline for sequencing and analyzing genetic variants using Oxford Nanopore technology. In directed evolution experiments, LevSeq enables sequencing of every variant, enhancing data insight and creating datasets suitable for AI/ML methods. Sequence variants can be generated within a day at an extremely low cost.

Figure 1: Overview of the LevSeq variant sequencing workflow using Nanopore technology. The diagram illustrates the key steps in the process, from sample preparation to data analysis and visualization.

Important: Barcode Improvements and LevSeq 2.0 Development

We have identified and resolved demultiplexing challenges in the original barcode set. Version 1.4 introduced alignment-aware variant calling to address these issues and significantly improve accuracy.

We are actively developing LevSeq 2.0 in collaboration with DTU and AITHYRA to fundamentally redesign the barcode system. The updated approach includes:

Enhanced barcode design: New barcodes will be strain-aware and sequence-aware, generated using an advanced barcode design tool
Reversed workflow architecture: LevSeq 2.0 will perform alignment first, then demultiplexing (rather than the current demultiplexing-first approach), resolving issues with forward and reverse read handling
Improved accuracy: These changes will provide more robust demultiplexing and variant calling across diverse experimental conditions

If you are planning to order barcoded primers now, or need detailed help with troubleshooting or barcode design, please reach out at lyming2021@gmail.com.

Notes

LevSeq was designed for epPCR and SSM experiments. We are also extending it to support additional enzyme engineering designs. Current features under development include:

Insertion handling (see version 1.4.3). Thanks to Brian Zhong for contributions to this section.
Gene calling for experiments with different genes, using the --oligopool flag.

If you notice issues with new features or have adapted LevSeq for your own use case, community contributions are welcome. Please submit an issue or pull request and we will aim to incorporate the changes.

Performance update: demultiplexing now runs in parallel batches of 8 plates and input FASTQs are staged once per run, improving throughput on multi-core systems.
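
To illustrate what "parallel batches of 8 plates" means in practice, the sketch below splits a list of plates into consecutive groups of at most eight. It is not LevSeq's internal code: the function name `batched_plates` and the plate identifiers are hypothetical, and only the batch size of 8 comes from the note above.

```python
from typing import Iterator, Sequence


def batched_plates(plates: Sequence[str], batch_size: int = 8) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `batch_size` plates."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(plates), batch_size):
        yield list(plates[start:start + batch_size])


# Hypothetical plate identifiers, used only for illustration.
plates = [f"plate_{i}" for i in range(1, 21)]
batches = list(batched_plates(plates))
print([len(batch) for batch in batches])  # 20 plates -> batches of 8, 8 and 4
```

Each batch could then be handed to a worker pool; how LevSeq schedules the batches internally is not described here.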

Recent repository polish:

Faster imports: import levseq no longer initializes visualization libraries unless they are needed.
Cleaner run startup: plotting dependencies are loaded only when platemaps are generated.
Packaging cleanup: bundled barcode files and demultiplex binaries are now declared through package discovery.
Git hygiene: local node_modules/ folders are ignored.

Quick Start

The current stable version and the latest version are both 1.5.1.

Stable releases are available via Docker and pip. To use the latest development version, clone the repository and install it locally (see Local development or install of latest version below).

How to Run LevSeq

Before running LevSeq, prepare:

A folder containing Oxford Nanopore basecalled FASTQ files, usually from a fastq_pass directory.
A reference CSV file with the columns barcode_plate, name, and refseq (see Reference File Format).
A run name, which LevSeq uses as the output folder name.

The basic command format is:

levseq <run_name> <path_to_fastq_folder> <path_to_ref_csv>

Example:

levseq my_experiment /path/to/fastq_pass /path/to/ref.csv

LevSeq writes results to an output folder named after <run_name>. Key outputs include variants.csv, visualization_partial.csv, result CSV files, logs, and interactive platemap HTML files.
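
After a run finishes, a quick check that the key outputs exist can catch a failed or partial run early. The sketch below searches the run folder recursively, because the exact layout inside the folder is an assumption here; it also treats every HTML file as a platemap, which may be too generous. The function names are hypothetical helpers, not part of LevSeq.

```python
from pathlib import Path

# Output names taken from the text above; "*.html" stands in for the platemaps.
EXPECTED_PATTERNS = {
    "variants.csv": "variants.csv",
    "visualization_partial.csv": "visualization_partial.csv",
    "platemap HTML": "*.html",
}


def find_outputs(run_dir: str) -> dict[str, list[Path]]:
    """Map each expected output to the matching files found under run_dir."""
    root = Path(run_dir)
    if not root.is_dir():
        return {label: [] for label in EXPECTED_PATTERNS}
    return {
        label: sorted(root.rglob(pattern))
        for label, pattern in EXPECTED_PATTERNS.items()
    }


def missing_outputs(run_dir: str) -> list[str]:
    """Return the labels of expected outputs that were not found."""
    return [label for label, paths in find_outputs(run_dir).items() if not paths]


if __name__ == "__main__":
    # "my_experiment" is the example run name used in this README.
    print(missing_outputs("my_experiment"))
```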

Common run options:

Use --output /path/to/output to choose where the run folder is created.
Use --skip_demultiplexing if reads have already been demultiplexed.
Use --skip_variantcalling if you only want to run demultiplexing.
Use --oligopool for experiments with multiple genes or references per barcode plate.
Use --show_msa to include multiple sequence alignment views in the output.
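
When LevSeq is launched from a script, the options above can be assembled into an argument list. This is a minimal sketch that uses only the flags listed in this README and assumes the boolean flags take no value; `build_levseq_command` is a hypothetical helper, not a LevSeq API.

```python
import shlex
from typing import Optional


def build_levseq_command(
    run_name: str,
    fastq_dir: str,
    ref_csv: str,
    output: Optional[str] = None,
    skip_demultiplexing: bool = False,
    skip_variantcalling: bool = False,
    oligopool: bool = False,
    show_msa: bool = False,
) -> list[str]:
    """Return the argv list for a levseq run (assumes flags take no value)."""
    cmd = ["levseq", run_name, fastq_dir, ref_csv]
    if output:
        cmd += ["--output", output]
    # Append each boolean flag only when it was requested.
    flags = {
        "--skip_demultiplexing": skip_demultiplexing,
        "--skip_variantcalling": skip_variantcalling,
        "--oligopool": oligopool,
        "--show_msa": show_msa,
    }
    cmd += [flag for flag, enabled in flags.items() if enabled]
    return cmd


cmd = build_levseq_command(
    "my_experiment", "/path/to/fastq_pass", "/path/to/ref.csv",
    output="/path/to/output", oligopool=True,
)
print(shlex.join(cmd))
# levseq my_experiment /path/to/fastq_pass /path/to/ref.csv --output /path/to/output --oligopool
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.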

Docker Installation (Recommended)

Install Docker: https://docs.docker.com/engine/install/

Pull the appropriate image:

# For Linux/Windows x86 systems:
docker pull yueminglong/levseq:levseq-1.5.1-x86

# For Mac M-series chips (M1, M2, M3, M4):
docker pull yueminglong/levseq:levseq-1.5.1-arm64

Run LevSeq:

docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.5.1-arm64 my_experiment levseq_results/ levseq_results/ref.csv

Replace levseq-1.5.1-arm64 with the image tag that matches your platform and release.
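
Choosing the tag can be automated from the machine architecture. The sketch below maps the value reported by Python's `platform.machine()` to the two image tags shown above. Mapping `aarch64` (ARM Linux) to the arm64 image is an assumption, since this README only mentions Mac M-series chips for that image, and the helper names are hypothetical.

```python
import platform
from typing import Optional

IMAGE_REPO = "yueminglong/levseq"
RELEASE = "1.5.1"


def image_for_machine(machine: str) -> str:
    """Return the Docker image reference for a machine architecture string."""
    arch = machine.lower()
    if arch in {"arm64", "aarch64"}:  # aarch64 -> arm64 image is an assumption
        suffix = "arm64"
    elif arch in {"x86_64", "amd64"}:
        suffix = "x86"
    else:
        raise ValueError(f"no LevSeq image known for architecture {machine!r}")
    return f"{IMAGE_REPO}:levseq-{RELEASE}-{suffix}"


def docker_run_args(data_dir: str, run_name: str, machine: Optional[str] = None) -> list[str]:
    """Build the docker run argv shown above for the detected platform."""
    image = image_for_machine(machine or platform.machine())
    return [
        "docker", "run", "--rm",
        "-v", f"{data_dir}:/levseq_results",
        image, run_name, "levseq_results/", "levseq_results/ref.csv",
    ]


print(image_for_machine("arm64"))   # yueminglong/levseq:levseq-1.5.1-arm64
print(image_for_machine("x86_64"))  # yueminglong/levseq:levseq-1.5.1-x86
```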

Connect function data to your sequence data

docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.5.1-arm64 my_experiment levseq_results/ levseq_results/ref.csv --fitness_files "levseq_results/20250712_epPCR_Q06714_37.csv,levseq_results/20250712_epPCR_Q06714_39.csv,levseq_results/20250712_epPCR_Q06714_40.csv" --smiles 'O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3>>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5' --compound dPPi --variant_df "levseq_results/visualization_partial.csv"

Pip Installation (Mac/Linux only)

IMPORTANT: On Mac M-series chips (M1-M4), gcc 13 and 14 are REQUIRED:

brew install gcc@13 gcc@14

Create and activate conda environment:

conda create --name levseq python=3.12 -y
conda activate levseq

Install dependencies:

conda install -c bioconda -c conda-forge samtools minimap2

Install LevSeq:
```
pip install levseq
```

Run LevSeq:

levseq my_experiment /path/to/data/ /path/to/ref.csv

Combine function data:

levseq my_experiment /path/to/data/ /path/to/ref.csv --fitness_files "LCMS_file_{barcode1}.csv,LCMS_file_{barcode2}.csv" --smiles 'reaction_smiles_string' --compound "name_of_compound_in_LCMS_file" --variant_df "visualization_partial.csv"

For function data, LevSeq currently expects LCMS files with these columns:

Sample Vial Number: the well that the sample came from.
Area: used as the fitness value.
Compound Name: the compound name; rows are filtered to the compound passed with --compound.

Each fitness file name must end in _X.csv, where X is the barcode number used to match the file to the correct plate. For example, if plate 2 used barcode 33, the fitness file should end in _33.csv, such as some_fitness_for_plate_2_33.csv.
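
The file naming rule can be checked before a run. The sketch below extracts the barcode number from the trailing _X.csv of each fitness file and builds the comma-separated value for --fitness_files. It only illustrates the naming convention described here; how LevSeq parses the names internally is not shown, and `barcode_from_fitness_file` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a trailing "_<digits>.csv", for example "_33.csv".
_BARCODE_SUFFIX = re.compile(r"_(\d+)\.csv$")


def barcode_from_fitness_file(path: str) -> Optional[int]:
    """Return the barcode number encoded in the file name, or None if absent."""
    match = _BARCODE_SUFFIX.search(Path(path).name)
    return int(match.group(1)) if match else None


files = [
    "levseq_results/20250712_epPCR_Q06714_37.csv",
    "levseq_results/20250712_epPCR_Q06714_39.csv",
    "some_fitness_for_plate_2_33.csv",
]
for name in files:
    print(name, "->", barcode_from_fitness_file(name))

# Files without a recognizable barcode suffix are left out of the argument.
fitness_arg = ",".join(f for f in files if barcode_from_fitness_file(f) is not None)
```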

Data and Visualization

Test Data: Sample data is available on Zenodo
Visualization Tool: A web application is available at https://levseqdb.streamlit.app/. Upload your LevSeq output and LCMS results there.
Self-hosted Solution: You can deploy your own instance using our LevSeq_db repository

Reference File Format (ref.csv)

Your reference CSV file must contain the following columns:

barcode_plate	name	refseq
33	Q97A76	ATGCGC...

For oligopool experiments (multiple proteins per plate), use:

barcode_plate	name	refseq
33	Q97A76	ATGCGCAAG
33	P96084	ATGGATCA
34	P46209	ATGGGGCAA
34	Q60336	ATGGGGCC
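
A malformed reference file is easier to diagnose before a run than after. The sketch below checks for the three required columns and does basic per-row validation. It rests on several assumptions: the file is comma-delimited, barcode_plate is a whole number (as in the examples above), and reference sequences use only the letters A, C, G, T, and N. `check_reference_csv` is a hypothetical helper, not part of LevSeq.

```python
import csv
import io

REQUIRED_COLUMNS = ("barcode_plate", "name", "refseq")
ALLOWED_BASES = set("ACGTN")  # assumption: plain DNA alphabet plus N


def check_reference_csv(handle) -> list[str]:
    """Return a list of human-readable problems found in a reference CSV."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        plate = (row["barcode_plate"] or "").strip()
        name = (row["name"] or "").strip()
        refseq = (row["refseq"] or "").strip().upper()
        if not plate.isdigit():
            problems.append(f"line {line_no}: barcode_plate {plate!r} is not a whole number")
        if not name:
            problems.append(f"line {line_no}: name is empty")
        if not refseq or set(refseq) - ALLOWED_BASES:
            problems.append(f"line {line_no}: refseq is empty or has non-DNA characters")
    return problems


example = io.StringIO(
    "barcode_plate,name,refseq\n"
    "33,Q97A76,ATGCGCAAG\n"
    "33,P96084,ATGGATCA\n"
    "34,P46209,ATGGGGCAA\n"
)
print(check_reference_csv(example))  # prints [] for this well-formed example
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to the same function.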

Command Line Arguments

Required Arguments

name: Name of the experiment (output folder)
path: Location of basecalled fastq files
summary: Path to reference CSV file

Optional Arguments

--skip_demultiplexing: Skip the demultiplexing step
--skip_variantcalling: Skip the variant calling step
--output: Custom save location (defaults to current directory)
--show_msa: Show multiple sequence alignment for each well
--oligopool: Process data as oligopool experiment
--fitness_files: Comma-separated LCMS or function-data CSV files to join with sequence results
--smiles: Reaction SMILES string used when joining function data
--compound: Compound name to filter in the function-data files
--variant_df: LevSeq variant dataframe to join with function data, usually visualization_partial.csv

Step-by-Step Tutorial

Prepare your sequencing data:
- Your fastq files should be in a directory structure similar to Nanopore's output
- Prepare a reference CSV file with barcode plates, sample names, and reference sequences

Run LevSeq:

# Via Docker
docker run --rm -v "/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.5.1-arm64 my_experiment levseq_results/ levseq_results/ref.csv

# Via pip
levseq my_experiment /path/to/data/ /path/to/ref.csv

Analyze results:
- Output includes variant data (CSV) and interactive visualizations (HTML)
- Upload results to the LevSeq visualization tool for further analysis

Experimental Setup

For the wet lab protocol:

Refer to the wiki
See the methods section of our paper
Order forward and reverse primers compatible with your plasmid
Install Oxford Nanopore's software for basecalling if needed

Additional Resources

Example Notebook: See example/Example.ipynb for a walkthrough
Advanced Usage: See the manuscript notebook
Troubleshooting: See our computational protocols wiki

Local development or install of latest version

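# Create and activate a dedicated environment
conda create --name levseq python=3.10
conda activate levseq
# Clone the repository and build the package
git clone git@github.com:fhalab/LevSeq.git
cd LevSeq
python setup.py sdist bdist_wheel
# Install the freshly built source distribution
pip install dist/levseq-1.5.1.tar.gz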

Citing LevSeq

If you find LevSeq useful, please cite our paper:

@article{long2024levseq,
  title={LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning},
  author={Long, Yueming and Mora, Ariane and Li, Francesca-Zhoufan and Gürsoy, Emre and Johnston, Kadina E and Arnold, Frances H},
  journal={ACS Synthetic Biology},
  year={2024},
  publisher={American Chemical Society}
}

Contact

For detailed questions, troubleshooting, barcode design support, or feature requests, email lyming2021@gmail.com. Reproducible bugs and public feature discussions are also welcome as GitHub issues.
