Simple Algorithm for Very Efficient Multiplexing of Oxford Nanopore Experiments for You!

Overview

SAVEMONEY guides researchers in mixing multiple plasmids for submission as a single sample to a commercial long-read sequencing service (e.g., Oxford Nanopore Technologies), reducing overall sequencing costs while maintaining the fidelity of the sequencing results. The procedure is outlined below:

  • Step 1. pre-survey takes plasmid maps as input and tells users which groupings of plasmids are optimal.
  • Step 2. Submit samples according to the output of pre-survey.
  • Step 3. post-analysis performs computational deconvolution of the obtained results and generates a consensus sequence for each plasmid in the sample mixture. This step must be run separately for each sample mixture.
  • Step 4 (optional). Visualization of results provides a platform for detailed examination of the alignments and consensus sequences generated in post-analysis.

The algorithm permits mixing of six (or potentially even more) plasmids for sequencing with Oxford Nanopore Technologies platforms (e.g., Plasmidsaurus services), including plasmids that differ by as few as two bases. For more information, please see our publication listed under References.

SAVEMONEY via Google Colab!

  • SAVEMONEY (supports both circular and linear alignment)
  • SAVEMONEY BATCH (execute multiple rounds of post_analysis at once)

SAVEMONEY for local environment

Requirements

Verified on macOS, Linux, and Windows 10

  • Python 3.10 or later
  • A C++ compiler (I do not know the minimum required version)
  • biopython>=1.83
  • pandas>=1.5.3
  • parasail>=1.3.4
  • Pillow>=9.4.0
  • PuLP>=2.7.0
  • scipy>=1.11.4
  • snapgene_reader>=0.1.20
  • tqdm>=4.66.1
  • Cython>=3.0.7
  • matplotlib>=3.7.1
  • numpy>=1.23.5
  • pyspoa>=0.2.1
  • pysam>=0.22.0 (optional)

Installation

SAVEMONEY is available via pip.

pip install savemoney

If installation via pip fails, please check the requirements above. If any of the packages conflict with those already present in your environment, I recommend creating a new virtual environment.
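
When pip fails, it can help to see which of the listed requirements are already installed and at what version. The following is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical and not part of SAVEMONEY, and it only reports versions; it does not compare them against the minimums listed above.

```python
from importlib import metadata

# Distribution names copied from the requirements list above.
REQUIRED = [
    "biopython", "pandas", "parasail", "Pillow", "PuLP", "scipy",
    "snapgene_reader", "tqdm", "Cython", "matplotlib", "numpy", "pyspoa",
]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report

if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside a fresh virtual environment before and after `pip install savemoney` shows which dependencies were actually resolved.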

If no C++ compiler is installed, install Xcode Command Line Tools using the following command (for macOS):

xcode-select --install

or download Microsoft C++ Build Tools (for Windows).

Quick usage

SAVEMONEY can be executed either from a Python script or from the command line.

Execute SAVEMONEY in python script

To import and execute SAVEMONEY in a Python script, follow the example below:

import savemoney
savemoney.pre_survey("path_to_sequence_directory", "save_directory", **kwargs)
savemoney.post_analysis("path_to_sequence_directory", "save_directory", **kwargs)

All plasmid map files with *.dna and *.fasta extensions (plus *.fastq files for post-analysis) in path_to_sequence_directory will be used for the analysis. Results will be written to save_directory. kwargs are optional parameters with which you can tune the analysis:

# pre-survey
kwargs = {
    'distance_threshold':   5,  # main parameter to be changed
    'number_of_groups':     1,  # main parameter to be changed
    'gap_open_penalty':     3,  # alignment parameter
    'gap_extend_penalty':   1,  # alignment parameter
    'match_score':          1,  # alignment parameter
    'mismatch_score':      -2,  # alignment parameter
    'topology_of_dna':      0,  # 0: circular, 1: linear
    'n_cpu':                2,  # number of cpu cores to be used
    'export_image_results': 1,  # 0: skip export of svg figure files, 1: export svg figure files
}

# post-analysis
kwargs = {
    'score_threshold':    0.3,  # main parameter to be changed 
    'gap_open_penalty':     3,  # alignment parameter
    'gap_extend_penalty':   1,  # alignment parameter
    'match_score':          1,  # alignment parameter
    'mismatch_score':      -2,  # alignment parameter
    'error_rate':     0.00001,  # prior probability for Bayesian analysis
    'ins_rate':       0.00001,  # prior probability for Bayesian analysis
    'window':             160,  # maximum detectable length of repetitive sequences when wrong plasmid maps are provided: if a region of 80 nt is repeated adjacently two times, set the value to 160
    'topology_of_dna':      0,  # 0: circular, 1: linear
    'n_cpu':                2,  # number of cpu cores to be used
    'export_image_results': 1,  # 0: skip export of svg figure files, 1: export svg figure files
}
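
The `window` comment above amounts to simple arithmetic: the value should span the full tandem repeat you want to be able to detect. A minimal sketch follows. The helper `window_for_repeat` is hypothetical (not part of SAVEMONEY), and generalizing the 80 nt × 2 example to "unit length times number of adjacent copies" is my reading of the comment, not a documented rule.

```python
def window_for_repeat(unit_length: int, copies: int = 2) -> int:
    """Total length in nt spanned by `copies` adjacent copies of a repeat unit."""
    if unit_length <= 0 or copies <= 0:
        raise ValueError("unit_length and copies must be positive")
    return unit_length * copies

# The example from the comment above: an 80 nt region repeated twice.
post_analysis_kwargs = {"window": window_for_repeat(80, copies=2)}
print(post_analysis_kwargs)  # {'window': 160}
```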

For the meaning of these parameters, please refer to the SAVEMONEY Google Colab page or the reference below.
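
Because post-analysis has to be run once per sample mixture, a small driver loop can be convenient when several mixtures were sequenced. The sketch below assumes a hypothetical layout in which each mixture has its own subdirectory holding its plasmid maps (*.dna / *.fasta) and reads (*.fastq). The helper names `plan_post_analysis` and `run_post_analysis`, the directory layout, and the per-mixture output folders are illustrative assumptions, not SAVEMONEY conventions; only the call `savemoney.post_analysis(input_dir, save_dir, **kwargs)` is taken from the example above.

```python
from pathlib import Path

MAP_SUFFIXES = {".dna", ".fasta"}   # plasmid map extensions named above
READ_SUFFIXES = {".fastq"}          # read files needed for post-analysis

def plan_post_analysis(root, save_root):
    """Return (input_dir, save_dir) pairs for subdirectories that hold
    at least one plasmid map and at least one *.fastq file."""
    jobs = []
    for sub in sorted(Path(root).iterdir()):
        if not sub.is_dir():
            continue
        suffixes = {p.suffix for p in sub.iterdir() if p.is_file()}
        if suffixes & MAP_SUFFIXES and suffixes & READ_SUFFIXES:
            jobs.append((sub, Path(save_root) / sub.name))
    return jobs

def run_post_analysis(root, save_root, **kwargs):
    """Run savemoney.post_analysis once per mixture directory."""
    import savemoney  # imported lazily so planning works without the package
    for input_dir, save_dir in plan_post_analysis(root, save_root):
        save_dir.mkdir(parents=True, exist_ok=True)
        savemoney.post_analysis(str(input_dir), str(save_dir), **kwargs)
```

Calling `plan_post_analysis` first and inspecting the returned pairs is a cheap way to catch a mixture directory that is missing its reads or maps before starting a long run.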

Execute SAVEMONEY via command line

SAVEMONEY can also be executed via command line:

python -m savemoney.pre_survey path_to_sequence_directory save_directory
python -m savemoney.post_analysis path_to_sequence_directory save_directory

Parameters can be specified as follows:

# pre-survey
python -m savemoney.pre_survey -h
usage: __main__.py [-h] [-gop GOP] [-gep GEP] [-ms MS] [-mms MMS] [-dt DT] [-nog NOG] plasmid_map_dir_paths save_dir_base
positional arguments:
  plasmid_map_dir_paths path to plasmid map_directory
  save_dir_base         save directory path
options:
  -h, --help            show this help message and exit
  -gop GOP              gap_open_penalty, optional, default_value = 3
  -gep GEP              gap_extend_penalty, optional, default_value = 1
  -ms MS                match_score, optional, default_value = 1
  -mms MMS              mismatch_score, optional, default_value = -2
  -dt DT                distance_threshold, optional, default_value = 5
  -nog NOG              number_of_groups, optional, default_value = 1
  -tod TOD              topology_of_dna, optional, default_value = 0 (0: circular, 1: linear)
  -nc NC                n_cpu, optional, default_value = 2
  -eir EIR              export_image_results, optional, default_value = 1

# post-analysis
python -m savemoney.post_analysis -h
usage: __main__.py [-h] [-gop GOP] [-gep GEP] [-ms MS] [-mms MMS] [-st ST] [-er ER] [-dmr DMR] [-ir IR] sequence_dir_paths save_dir_base
positional arguments:
  sequence_dir_paths  sequence_dir_paths
  save_dir_base       save directory path
options:
  -h, --help          show this help message and exit
  -gop GOP            gap_open_penalty, optional, default_value = 3
  -gep GEP            gap_extend_penalty, optional, default_value = 1
  -ms MS              match_score, optional, default_value = 1
  -mms MMS            mismatch_score, optional, default_value = -2
  -st ST              score_threshold, optional, default_value = 0.3
  -er ER              error_rate, optional, default_value = 1e-07
  -ir IR              ins_rate, optional, default_value = 1e-07
  -w W                window, optional, default_value = 160
  -tod TOD            topology_of_dna, optional, default_value = 0 (0: circular, 1: linear)
  -nc NC              n_cpu, optional, default_value = 2
  -eir EIR            export_image_results, optional, default_value = 1
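
To pass parameters on the command line, the flags from the help output go before the two positional paths, as in the usage lines above. Here is a minimal sketch that assembles such a command from Python. The helper `build_command` is hypothetical, the flag names (`-dt`, `-nog`) are taken from the help text above, and the values are arbitrary examples.

```python
import sys

def build_command(module, input_dir, save_dir, **flags):
    """Build an argv list equivalent to, for example:
    python -m savemoney.pre_survey -dt 5 -nog 2 input_dir save_dir"""
    cmd = [sys.executable, "-m", f"savemoney.{module}"]
    for flag, value in flags.items():
        cmd += [f"-{flag}", str(value)]
    cmd += [str(input_dir), str(save_dir)]
    return cmd

cmd = build_command("pre_survey", "path_to_sequence_directory",
                    "save_directory", dt=5, nog=2)
print(" ".join(cmd[1:]))

# To actually run it (requires savemoney to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list in one place keeps the parameters used for each run easy to log alongside the results.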

Output

The interpretation of the output files is described in detail on the SAVEMONEY Google Colab page. In addition, you can visualize consensus alignment results using the your_plasmid_name.ca file generated by SAVEMONEY.

From python script:

import savemoney
savemoney.show_consensus(consensus_alignment_path, center=2000, seq_range=50, offset=0)

From command line:

python -m savemoney.show_consensus path_to_consensus_alignment_file

Parameters can be specified as follows:

python -m savemoney.show_consensus -h
usage: __main__.py [-h] [--center CENTER] [--seq_range SEQ_RANGE] [--offset OFFSET] consensus_alignment_path
positional arguments:
  consensus_alignment_path  path to consensus_alignment (*.ca) file
options:
  -h, --help            show this help message and exit
  --center CENTER       center, optional, default_value = 2000
  --seq_range SEQ_RANGE seq_range, optional, default_value = 50
  --offset OFFSET       offset, optional, default_value = 0

Conversion of consensus alignment results (*.ca) to *.bam and *.fastq format is also supported. The conversion requires pysam>=0.22.0 be installed in your environment. To convert the file, type the following code in a python script:

import savemoney
savemoney.ca2bam(consensus_alignment_path)

If you want to convert it via the command line, type the following command:

python -m savemoney.ca2bam path_to_consensus_alignment_file
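
Since the conversion depends on the optional pysam package, a script can check for it before converting. The sketch below does this and converts every *.ca file in a directory. The helper names `find_ca_files` and `convert_all` and the idea of batch conversion are illustrative assumptions; only `savemoney.ca2bam(consensus_alignment_path)` comes from the example above.

```python
from importlib.util import find_spec
from pathlib import Path

def find_ca_files(result_dir):
    """Return all consensus alignment (*.ca) files in result_dir, sorted."""
    return sorted(Path(result_dir).glob("*.ca"))

def convert_all(result_dir):
    """Convert each *.ca file with savemoney.ca2bam, if pysam is available."""
    if find_spec("pysam") is None:
        raise RuntimeError("pysam>=0.22.0 is required for ca2bam conversion")
    import savemoney  # imported lazily; needs savemoney to be installed
    for ca_path in find_ca_files(result_dir):
        savemoney.ca2bam(str(ca_path))
```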

References

Uematsu M., Baskin J. M., "Barcode-free multiplex plasmid sequencing using Bayesian analysis and nanopore sequencing." eLife. 2023; 12: RP88794

Slide from Weill Institute Science Workshop, May 22, 2023
