Simple Algorithm for Very Efficient Multiplexing of Oxford Nanopore Experiments for You!
Overview
SAVEMONEY guides researchers in mixing multiple plasmids for submission as a single sample to a commercial long-read sequencing service (e.g., Oxford Nanopore Technology), reducing overall sequencing costs while maintaining the fidelity of the sequencing results. The procedure is outlined below:
- Step 1. Pre-survey: takes plasmid maps as input and suggests the optimal grouping of plasmids.
- Step 2. Sample submission: submit samples according to the pre-survey output.
- Step 3. Post-analysis: performs computational deconvolution of the sequencing results and generates a consensus sequence for each plasmid in the sample mixture. This step must be run separately for each sample mixture.
- Step 4. Visualization of results (optional): provides a platform for detailed examination of the alignments and consensus sequences generated in the post-analysis.
The algorithm allows six (or potentially even more) plasmids to be mixed for sequencing with Oxford Nanopore Technology (e.g., Plasmidsaurus services), even when the plasmids differ by as few as two bases. For more information, please check out our publication (coming soon).
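The pre-survey step groups plasmids so that the members of each mixture can still be told apart after sequencing. The sketch below illustrates that idea with a plain Levenshtein distance and a greedy assignment. It is not SAVEMONEY's algorithm: SAVEMONEY lists parasail and PuLP among its dependencies, so it presumably relies on scored alignments and an optimization step instead. `edit_distance`, `greedy_grouping`, and `min_distance` are hypothetical names; `min_distance` is only loosely analogous to `distance_threshold`, whose exact meaning is documented on the Colab page. Sequences are treated as linear strings, so circular rotations are ignored.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if the bases match)
            ))
        prev = curr
    return prev[-1]


def greedy_grouping(seqs: dict[str, str], min_distance: int) -> list[list[str]]:
    """Hypothetical grouping rule (not SAVEMONEY's): place each plasmid in the first
    group where it differs from every member by at least `min_distance` edits,
    otherwise open a new group."""
    groups: list[list[str]] = []
    for name, seq in seqs.items():
        for group in groups:
            if all(edit_distance(seq, seqs[other]) >= min_distance for other in group):
                group.append(name)
                break
        else:  # no compatible group found
            groups.append([name])
    return groups


plasmids = {"pA": "ATGCATGCAA", "pB": "ATGCATGCTT", "pC": "ATGCATGCAT"}
print(greedy_grouping(plasmids, min_distance=2))  # [['pA', 'pB'], ['pC']]
```

Here pA and pB differ by two bases and can share a mixture, while pC differs from pA by only one base and is placed in a separate group. A greedy pass like this is order-dependent and does not guarantee the fewest groups, which is one reason an exact optimizer is preferable in practice.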
SAVEMONEY via Google Colab!
- SAVEMONEY (supports both circular and linear alignment)
- SAVEMONEY BATCH (execute multiple rounds of post_analysis at once)
SAVEMONEY for local environment
Requirements
Verified on macOS, Linux, and Windows 10
- Python 3.10 or later
- One of the following C++ compilers (the minimum required versions are unknown):
- Clang 14.0.0
- GCC 12.2.0
- Microsoft C++ Build Tools (for Windows)
- biopython>=1.83
- pandas>=1.5.3
- parasail>=1.3.4
- Pillow>=9.4.0
- PuLP>=2.7.0
- scipy>=1.11.4
- snapgene_reader>=0.1.20
- tqdm>=4.66.1
- Cython>=3.0.7
- matplotlib>=3.7.1
- numpy>=1.23.5
- pyspoa>=0.2.1
- pysam>=0.22.0 (optional)
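Before running `pip install`, it can help to compare what is already installed against the minimum versions listed above. The script below is a rough sketch (not part of SAVEMONEY) that uses only the standard library. Its version parser keeps only the leading numeric components, so pre-release tags are ignored, and packages are looked up by their pip distribution names. pip itself remains the authoritative check during installation.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list above (pysam is optional).
MIN_VERSIONS = {
    "biopython": "1.83", "pandas": "1.5.3", "parasail": "1.3.4",
    "Pillow": "9.4.0", "PuLP": "2.7.0", "scipy": "1.11.4",
    "snapgene_reader": "0.1.20", "tqdm": "4.66.1", "Cython": "3.0.7",
    "matplotlib": "3.7.1", "numpy": "1.23.5", "pyspoa": "0.2.1",
    "pysam": "0.22.0",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Rough parse keeping leading numeric parts, e.g. "2.0.0rc1" -> (2, 0, 0)."""
    parts: list[int] = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. "0rc1": stop after the numeric prefix
            break
    return tuple(parts)


def check_requirements(min_versions: dict[str, str]) -> dict[str, str]:
    """Return 'ok', 'too old (<installed>)' or 'missing' for each package."""
    report = {}
    for name, minimum in min_versions.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


if __name__ == "__main__":
    py_ok = sys.version_info >= (3, 10)
    print("Python", sys.version.split()[0], "ok" if py_ok else "too old (3.10+ required)")
    for pkg, status in check_requirements(MIN_VERSIONS).items():
        print(f"{pkg:>16}: {status}")
```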
Installation
SAVEMONEY is available via pip.
pip install savemoney
If installation via pip fails, please check the requirements above. If any of these packages conflicts with those already installed in your environment, I recommend creating a new virtual environment.
If no C++ compiler is available, install the Xcode Command Line Tools with the following command (macOS):
xcode-select --install
or download and install the Microsoft C++ Build Tools (Windows).
Quick usage
SAVEMONEY can be executed either from a Python script or from the command line.
Execute SAVEMONEY in a Python script
To import and execute SAVEMONEY in a Python script, follow the example below:
import savemoney
savemoney.pre_survey("path_to_sequence_directory", "save_directory", **kwargs)
savemoney.post_analysis("path_to_sequence_directory", "save_directory", **kwargs)
All plasmid map files with the *.dna and *.fasta extensions (plus *.fastq files for post-analysis) in path_to_sequence_directory will be used for the analysis. Results will be written to save_directory. kwargs are optional parameters that you can use to tune the analysis:
# pre-survey
kwargs = {
'distance_threshold': 5, # main parameter to be changed
'number_of_groups': 1, # main parameter to be changed
'gap_open_penalty': 3, # alignment parameter
'gap_extend_penalty': 1, # alignment parameter
'match_score': 1, # alignment parameter
'mismatch_score': -2, # alignment parameter
'topology_of_dna': 0, # 0: circular, 1: linear
'n_cpu': 2, # number of cpu cores to be used
'export_image_results': 1, # 0: skip export of svg figure files, 1: export svg figure files
}
# post-analysis
kwargs = {
'score_threshold': 0.3, # main parameter to be changed
'gap_open_penalty': 3, # alignment parameter
'gap_extend_penalty': 1, # alignment parameter
'match_score': 1, # alignment parameter
'mismatch_score': -2, # alignment parameter
'error_rate': 0.00001, # prior probability for Bayesian analysis
'ins_rate': 0.00001, # prior probability for Bayesian analysis
'window': 160, # maximum detectable length of repetitive sequences when incorrect plasmid maps are provided: if an 80-nt region is repeated twice in tandem, set this to 160
'topology_of_dna': 0, # 0: circular, 1: linear
'n_cpu': 2, # number of cpu cores to be used
'export_image_results': 1, # 0: skip export of svg figure files, 1: export svg figure files
}
For the meaning of these parameters, please refer to the SAVEMONEY Google Colab page or the reference below.
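Because every file with a matching extension in path_to_sequence_directory is used, a stray or outdated map in that directory ends up in the analysis. A quick pre-flight listing such as the hypothetical `list_input_files` helper below (not part of SAVEMONEY) shows which files would be picked up. It assumes that only the top level of the directory is scanned and that extensions are matched case-sensitively; neither behaviour is stated above, so treat it as a sanity check rather than a description of SAVEMONEY's internals.

```python
from pathlib import Path

# Extensions described above: plasmid maps (*.dna, *.fasta) for both steps,
# plus sequencing reads (*.fastq) for post-analysis.
MAP_SUFFIXES = {".dna", ".fasta"}
READ_SUFFIXES = {".fastq"}


def list_input_files(directory: str, post_analysis: bool = False) -> list[Path]:
    """Hypothetical helper: list top-level files whose suffix matches the
    documented extensions (case-sensitive), sorted by path."""
    suffixes = MAP_SUFFIXES | (READ_SUFFIXES if post_analysis else set())
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix in suffixes
    )


if __name__ == "__main__":
    for f in list_input_files("path_to_sequence_directory", post_analysis=True):
        print(f.name)
```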
Execute SAVEMONEY via command line
SAVEMONEY can also be executed via command line:
python -m savemoney.pre_survey path_to_sequence_directory save_directory
python -m savemoney.post_analysis path_to_sequence_directory save_directory
Parameters can be specified as follows:
# pre-survey
python -m savemoney.pre_survey -h
usage: __main__.py [-h] [-gop GOP] [-gep GEP] [-ms MS] [-mms MMS] [-dt DT] [-nog NOG] plasmid_map_dir_paths save_dir_base
positional arguments:
plasmid_map_dir_paths path to plasmid map_directory
save_dir_base save directory path
options:
-h, --help show this help message and exit
-gop GOP gap_open_penalty, optional, default_value = 3
-gep GEP gap_extend_penalty, optional, default_value = 1
-ms MS match_score, optional, default_value = 1
-mms MMS mismatch_score, optional, default_value = -2
-dt DT distance_threshold, optional, default_value = 5
-nog NOG number_of_groups, optional, default_value = 1
-tod TOD topology_of_dna, optional, default_value = 0 (0: circular, 1: linear)
-nc NC n_cpu, optional, default_value = 2
-eir EIR export_image_results, optional, default_value = 1
# post-analysis
python -m savemoney.post_analysis -h
usage: __main__.py [-h] [-gop GOP] [-gep GEP] [-ms MS] [-mms MMS] [-st ST] [-er ER] [-dmr DMR] [-ir IR] sequence_dir_paths save_dir_base
positional arguments:
sequence_dir_paths sequence_dir_paths
save_dir_base save directory path
options:
-h, --help show this help message and exit
-gop GOP gap_open_penalty, optional, default_value = 3
-gep GEP gap_extend_penalty, optional, default_value = 1
-ms MS match_score, optional, default_value = 1
-mms MMS mismatch_score, optional, default_value = -2
-st ST score_threshold, optional, default_value = 0.3
-er ER error_rate, optional, default_value = 1e-07
-ir IR ins_rate, optional, default_value = 1e-07
-w W window, optional, default_value = 160
-tod TOD topology_of_dna, optional, default_value = 0 (0: circular, 1: linear)
-nc NC n_cpu, optional, default_value = 2
-eir EIR export_image_results, optional, default_value = 1
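Because post-analysis must be run separately for each sample mixture, several mixtures can be processed locally with a small wrapper around the command line shown above, similar in spirit to SAVEMONEY BATCH on Colab. The sketch below uses only the short option names listed in the help output. `build_post_analysis_command` and `run_batch` are hypothetical helpers, and writing each mixture's results to its own subdirectory of save_dir_base is an assumption made to keep outputs apart, not documented SAVEMONEY behaviour.

```python
import subprocess
import sys
from pathlib import Path

# Short option names taken from the post-analysis help output above.
FLAGS = {
    "gap_open_penalty": "-gop", "gap_extend_penalty": "-gep",
    "match_score": "-ms", "mismatch_score": "-mms",
    "score_threshold": "-st", "error_rate": "-er", "ins_rate": "-ir",
    "window": "-w", "topology_of_dna": "-tod", "n_cpu": "-nc",
    "export_image_results": "-eir",
}


def build_post_analysis_command(sample_dir: str, save_dir: str, **options) -> list[str]:
    """Build one `python -m savemoney.post_analysis` command line.

    Raises KeyError for option names that have no documented flag.
    """
    cmd = [sys.executable, "-m", "savemoney.post_analysis"]
    for key, value in options.items():
        cmd += [FLAGS[key], str(value)]
    cmd += [sample_dir, save_dir]  # positional arguments
    return cmd


def run_batch(sample_dirs: list[str], save_base: str, **options) -> None:
    """Run post-analysis once per sample mixture (hypothetical per-sample layout)."""
    for sample_dir in sample_dirs:
        save_dir = Path(save_base) / Path(sample_dir).name
        save_dir.mkdir(parents=True, exist_ok=True)
        cmd = build_post_analysis_command(sample_dir, str(save_dir), **options)
        subprocess.run(cmd, check=True)  # stop on the first failing mixture


if __name__ == "__main__":
    run_batch(["mixture_1", "mixture_2"], "results", score_threshold=0.3, n_cpu=4)
```

Using `sys.executable` runs SAVEMONEY with the same interpreter (and therefore the same virtual environment) as the wrapper, and `check=True` stops the loop as soon as one mixture fails.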
Output
The interpretation of the output files is described in detail on the SAVEMONEY Google Colab page. In addition, you can visualize consensus alignment results using the your_plasmid_name.ca file generated by SAVEMONEY.
From a Python script:
import savemoney
consensus_alignment_path = "path_to_consensus_alignment_file"  # path to your *.ca file
savemoney.show_consensus(consensus_alignment_path, center=2000, seq_range=50, offset=0)
From the command line:
python -m savemoney.show_consensus path_to_consensus_alignment_file
Parameters can be specified as follows:
python -m savemoney.show_consensus -h
usage: __main__.py [-h] [--center CENTER] [--seq_range SEQ_RANGE] [--offset OFFSET] consensus_alignment_path
positional arguments:
consensus_alignment_path path to consensus_alignment (*.ca) file
options:
-h, --help show this help message and exit
--center CENTER center, optional, default_value = 2000
--seq_range SEQ_RANGE seq_range, optional, default_value = 50
--offset OFFSET offset, optional, default_value = 0
Conversion of consensus alignment results (*.ca) to *.bam and *.fastq formats is also supported. The conversion requires pysam>=0.22.0 to be installed in your environment. To convert a file from a Python script, use the following code:
import savemoney
consensus_alignment_path = "path_to_consensus_alignment_file"  # path to your *.ca file
savemoney.ca2bam(consensus_alignment_path)
To convert a file from the command line, use the following command:
References