Tool to computationally deconvolve combinatorially pooled arrayed random mutagenesis libraries
arraylib-solve
Introduction
arraylib-solve is a tool to deconvolve combinatorially pooled arrayed random mutagenesis libraries (e.g. libraries generated by transposon mutagenesis). In a typical experiment, a pooled version of the library is first created and then arrayed on a grid of well plates. To infer the identity of the mutant in each well, the wells are pooled in a combinatorial manner such that each mutant appears in a unique combination of pools. The pools are then sequenced using NGS, and the sequenced reads are stored in individual fastq files per pool. arraylib-solve deconvolves the pools and returns summaries stating the identity and location of each mutant on the original well grid. The package is based on the approach described in [1].
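To make the idea of a "unique combination of pools" concrete, here is a minimal sketch. It is not the package's algorithm: it assumes a hypothetical four-dimensional pooling scheme (plate row, plate column, well row, well column) with made-up pool names, and it assumes a mutant shows up in exactly one pool per dimension.

```python
from itertools import product

# Hypothetical pooling dimensions and pool names (illustration only,
# not taken from arraylib-solve).
DIMENSIONS = {
    "plate_row": ["PR1", "PR2"],
    "plate_col": ["PC1", "PC2"],
    "well_row": ["A", "B", "C"],
    "well_col": ["1", "2", "3", "4"],
}


def build_lookup(dimensions):
    """Map every combination of pools (one per dimension) to its well address."""
    lookup = {}
    for address in product(*dimensions.values()):
        # A well is pooled once per dimension, so its pool set is unique.
        lookup[frozenset(address)] = dict(zip(dimensions, address))
    return lookup


def locate(observed_pools, lookup):
    """Return the well address for an unambiguous pool combination, else None."""
    return lookup.get(frozenset(observed_pools))


lookup = build_lookup(DIMENSIONS)
location = locate({"PR2", "PC1", "B", "3"}, lookup)
print(location)
```

In this toy layout, 48 wells are resolved by sequencing only 11 pools (2 + 2 + 3 + 4). Real data is noisier, for example when a mutant occupies several wells, which is why a dedicated deconvolution step is needed.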
Installation
To install arraylib-solve, first create a Python 3.8 environment, e.g. with conda:
conda create --name arraylib-env python=3.8
conda activate arraylib-env
and install the package using
pip install arraylib-solve
How to run arraylib-solve
To run arraylib-solve on a library deconvolution experiment with default parameters, run:
arraylib-run <input_directory> <experimental_design.csv> -gb <path_to_genbank_reference> -br <path_to_bowtie2_indices> -t <transposon_sequence> -bu <upstream_sequence_of_barcodes> -bd <downstream_sequence_of_barcodes>
Input parameters
Required parameters:
- input_dir: path to directory holding the input fastq files
- exp_design: path to the file describing the experimental design. The file should have the columns Filename, Poolname and Pooldimension (see the example in tests/test_data/full_exp_design.csv).
  - Filename should contain all the unique input fastq filenames.
  - Poolname should indicate which pool a given file belongs to. Multiple files per pool name are allowed.
  - Pooldimension indicates the pooling dimension a pool belongs to. All pools sharing the same pooling dimension should have the same string in the Pooldimension column.
- -gb path to genbank reference file
- -br path to the bowtie2 index files, ending with the basename of your index (if the basename of your index is UTI89 and you store your bowtie2 references in bowtie_ref, it should be bowtie_ref/UTI89). See https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#the-bowtie2-build-indexer for instructions on how to create bowtie2 indices.
- -t transposon sequence (e.g. ATTGCCTA)
- -bu upstream sequence of barcode
- -bd downstream sequence of barcode
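As an illustration of the experimental design file described above, the following sketch writes a small design table with the three required columns and checks the stated constraints. The fastq filenames, pool names and dimension labels are hypothetical, and the check that a pool maps to a single dimension is an interpretation of the description rather than documented behaviour; the example file in tests/test_data/full_exp_design.csv remains the authoritative reference.

```python
import csv
import io

REQUIRED_COLUMNS = ["Filename", "Poolname", "Pooldimension"]

# Hypothetical fastq filenames and pool labels (illustration only).
rows = [
    {"Filename": "row_A_lane1.fastq", "Poolname": "A", "Pooldimension": "row"},
    {"Filename": "row_A_lane2.fastq", "Poolname": "A", "Pooldimension": "row"},
    {"Filename": "row_B.fastq", "Poolname": "B", "Pooldimension": "row"},
    {"Filename": "col_1.fastq", "Poolname": "1", "Pooldimension": "column"},
    {"Filename": "col_2.fastq", "Poolname": "2", "Pooldimension": "column"},
]


def check_design(rows):
    """Check the design rows and return a pool -> dimension mapping."""
    filenames = [r["Filename"] for r in rows]
    if len(filenames) != len(set(filenames)):
        raise ValueError("Filename entries must be unique")
    # Assumption: a pool belongs to exactly one pooling dimension.
    dimension_of = {}
    for r in rows:
        previous = dimension_of.setdefault(r["Poolname"], r["Pooldimension"])
        if previous != r["Pooldimension"]:
            raise ValueError(f"Pool {r['Poolname']} has conflicting dimensions")
    return dimension_of


def write_design(rows, handle):
    """Write the design rows as CSV with the required header."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


dimension_of = check_design(rows)
# Use open("experimental_design.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
write_design(rows, buffer)
design_csv = buffer.getvalue()
print(design_csv)
```

Note that pool A is listed twice with different files, which the description explicitly allows.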
Optional parameters:
- -mq minimum bowtie2 alignment quality score required to include a read
- -sq minimum phred score for each base to include read
- -tm number of transposon mismatches allowed
- -thr threshold for the local filter (e.g. a threshold of 0.05 filters out all read counts below 0.05 times the maximum read count for a given mutant)
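The local filter threshold can be pictured with a short sketch. This is one reading of the -thr description, not the package's implementation: it assumes the filter is applied to the per-pool read counts of a single mutant and that filtered counts are set to zero. The function name and the example counts are hypothetical.

```python
def local_filter(counts, threshold):
    """Zero out read counts below threshold * the maximum count in the list."""
    cutoff = threshold * max(counts, default=0)
    return [c if c >= cutoff else 0 for c in counts]


# Hypothetical read counts of one mutant across six pools.
counts_per_pool = [200, 180, 9, 0, 12, 150]
filtered = local_filter(counts_per_pool, threshold=0.05)
print(filtered)  # the cutoff is 0.05 * 200 = 10, so the count of 9 is removed
```

A relative cutoff of this kind scales with sequencing depth, so a highly covered mutant tolerates more absolute background reads than a sparsely covered one.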
Output
arraylib-solve outputs 4 files:
- count_matrix.csv: Read counts per pool for each mutant.
- filtered_matrix.csv: Read counts per pool for each mutant, but mutants with barcodes with low read counts for a given genomic location are filtered out.
- mutant_location_summary.csv: A summary of mutants found in the well plate grid, where each row corresponds to a different mutant.
- well_location_summary.csv: A summary of the deconvolved well plate grid, where each row corresponds to a different well.
References
[1] Baym, M., Shaket, L., Anzai, I.A., Adesina, O. and Barstow, B., 2016. Rapid construction of a whole-genome transposon insertion collection for Shewanella oneidensis by Knockout Sudoku. Nature Communications, 7(1), p.13270.