A set of scripts to combine multiple breseq analyses and highlight variants of interest.
isolate_parsers
Usage
python isolateset_parser.py [-h] [-i FOLDER] [--no-fasta] [-w WHITELIST]
[-b BLACKLIST] [-m SAMPLE_MAP] [--filter-1000bp]
optional arguments:
-h, --help show this help message and exit
-i FOLDER, --input FOLDER
The breseq folder to parse.
--no-fasta Whether to generate an aligned fasta file of all snps
in the breseq VCF file.
-w WHITELIST, --whitelist WHITELIST
Samples not in the whitelist are ignored. Either a
comma-separated list of sample ids or a file with
each sample id occupying a single line.
-b BLACKLIST, --blacklist BLACKLIST
Samples to ignore. See `--whitelist` for possible
input formats.
-m SAMPLE_MAP, --sample-map SAMPLE_MAP
A file mapping sample ids to sample names. Use if the
subfolders in the breseqset folder are named
differently from the sample names. The file should
have two columns: `sampleId` and `sampleName`,
separated by a tab character.
--filter-1000bp Whether to filter out variants that occur within
1000bp of each other. Usually indicates a mapping
error.
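The option descriptions above leave some room for interpretation, so the following is only a minimal sketch of how `--whitelist`, `--blacklist`, and `--filter-1000bp` might behave. The function names (`parse_sample_list`, `select_samples`, `filter_clustered`) are hypothetical and not taken from the package. Treating "within 1000bp" as an inclusive distance on the same sequence, and dropping every member of a close pair, are both assumptions.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Set, Tuple


def parse_sample_list(value: Optional[str]) -> Set[str]:
    """Read sample ids from a comma-separated string or a one-id-per-line file."""
    if not value:
        return set()
    try:
        is_file = Path(value).is_file()
    except OSError:
        # A very long comma-separated string is not a valid path on some systems.
        is_file = False
    items = Path(value).read_text().splitlines() if is_file else value.split(",")
    return {item.strip() for item in items if item.strip()}


def select_samples(sample_ids: Iterable[str], whitelist: Set[str], blacklist: Set[str]) -> List[str]:
    """Keep whitelisted samples (all samples if the whitelist is empty) minus blacklisted ones."""
    return [
        sample_id
        for sample_id in sample_ids
        if (not whitelist or sample_id in whitelist) and sample_id not in blacklist
    ]


def filter_clustered(variants: Iterable[Tuple[str, int]], window: int = 1000) -> List[Tuple[str, int]]:
    """Drop every (seq_id, position) that has a neighbour within `window` bp on the same sequence."""
    ordered = sorted(variants)
    kept = []
    for index, (seq_id, position) in enumerate(ordered):
        near_previous = (
            index > 0
            and ordered[index - 1][0] == seq_id
            and position - ordered[index - 1][1] <= window
        )
        near_next = (
            index + 1 < len(ordered)
            and ordered[index + 1][0] == seq_id
            and ordered[index + 1][1] - position <= window
        )
        if not (near_previous or near_next):
            kept.append((seq_id, position))
    return kept
```

Under these assumptions, `parse_sample_list("sample1,sample2")` and a file containing `sample1` and `sample2` on separate lines yield the same set of ids.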
Input
The scripts expect a folder of individual breseq runs, with each subfolder named after the isolate/sample.
The scripts only require the `output.vcf`, `annotated.gd`, and `index.html` files located in each folder.
Example folder:
.breseq_folder
|-- sample1
| |-- data
| | |-- output.vcf
| |-- output
| | |-- index.html
| | |-- evidence
| | | |-- annotated.gd
|-- sample2
| |-- data
| | |-- output.vcf
| |-- output
| | |-- index.html
| | |-- evidence
| | | |-- annotated.gd
|-- sample3
| |-- data
| | |-- output.vcf
| |-- output
| | |-- index.html
| | |-- evidence
| | | |-- annotated.gd
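As an illustration of the layout above, here is a rough sketch of how the three required files could be located for each sample, with an optional sample map applied to the folder names. `REQUIRED_FILES`, `read_sample_map`, and `find_sample_files` are hypothetical names, not part of the package. The sketch also assumes the sample map file starts with a `sampleId`/`sampleName` header row, which the option description does not state explicitly.

```python
import csv
from pathlib import Path
from typing import Dict, Optional

# Relative locations taken from the example folder tree above.
REQUIRED_FILES = {
    "vcf": Path("data") / "output.vcf",
    "index": Path("output") / "index.html",
    "gd": Path("output") / "evidence" / "annotated.gd",
}


def read_sample_map(filename: Path) -> Dict[str, str]:
    """Read a tab-separated file with `sampleId` and `sampleName` columns (header row assumed)."""
    with open(filename, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["sampleId"]: row["sampleName"] for row in reader}


def find_sample_files(
    breseq_folder: Path, sample_map: Optional[Dict[str, str]] = None
) -> Dict[str, Dict[str, Path]]:
    """Map each sample name to its required files, skipping incomplete subfolders."""
    sample_map = sample_map or {}
    samples = {}
    for subfolder in sorted(Path(breseq_folder).iterdir()):
        if not subfolder.is_dir():
            continue
        paths = {key: subfolder / relative for key, relative in REQUIRED_FILES.items()}
        if all(path.is_file() for path in paths.values()):
            # Fall back to the folder name when the sample id is not in the map.
            sample_name = sample_map.get(subfolder.name, subfolder.name)
            samples[sample_name] = paths
    return samples
```

Skipping incomplete subfolders silently is a design choice made for this sketch; a real run might prefer to log or report them.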
Output
The scripts generate an Excel file in the breseq run folder with 4 sheets: `comparison`, `variant`, `coverage`, and `junction`.
The `variant`, `coverage`, and `junction` tables are the concatenated tables of all samples in the breseq run.
Comparison table
A table in which every row represents a single mutation seen in the sample callset and samples are represented by columns with the alternate sequence for each sample.
| Sample1 | Sample2 | Sample3 | annotation | description | gene | locusTag | mutationCategory | position | presentIn | presentInAllSamples | ref | seq id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GG | GG | GG | intergenic (+65/+20) | putative lipoprotein/putative hydrolase | PFLU0045 - / - PFLU0046 | PFLU0045/PFLU0046 | small_indel | 45881 | 3 | 1 | G | NC_012660 |
| CC | CC | CC | intergenic (+17/-136) | microcin-processing peptidase 1. Unknown type peptidase. MEROPS family U62/hypothetical protein | PFLU0872 - / - PFLU0873 | PFLU0872/PFLU0873 | small_indel | 985333 | 3 | 1 | C | NC_012660 |
| | | | intergenic (+57/+21) | hypothetical protein/putative helicase | PFLU3154 - / - PFLU3155 | PFLU3154/PFLU3155 | small_indel | 3447986 | 3 | 1 | | NC_012660 |
| A | A | G | M350I (ATG-ATA) | putative GGDEF domain signaling protein | PFLU3571 - | PFLU3571 | snp_nonsynonymous | 3959631 | 2 | 0 | G | NC_012660 |
| A | A | C | T238P (ACC-CCC) | hybrid sensory histidine kinase in two-component regulatory system with UvrY | PFLU3777 - | PFLU3777 | snp_nonsynonymous | 4173231 | 1 | 0 | A | NC_012660 |
| G | G | GG | coding (322/1476 nt) | putative two-component system response regulator nitrogen regulation protein NR(I) | PFLU4443 - | PFLU4443 | small_indel | 4908233 | 1 | 0 | G | NC_012660 |
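To make the structure of the table concrete, the sketch below pivots per-sample variant calls into comparison rows. It is an illustration only: `build_comparison` and the `(seq id, position, ref, alt)` tuple layout are hypothetical, and the annotation columns are left out. Filling in the reference sequence for samples that lack a mutation is an assumption inferred from the example rows, where samples not counted in `presentIn` show the `ref` value.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical minimal variant call: (seq id, position, ref, alt).
Call = Tuple[str, int, str, str]


def build_comparison(calls_by_sample: Dict[str, List[Call]]) -> List[dict]:
    """Pivot per-sample calls into one row per mutation with one column per sample."""
    sample_names = list(calls_by_sample)
    # (seq id, position, ref) -> {sample name: alternate sequence}
    alternates = defaultdict(dict)
    for sample, calls in calls_by_sample.items():
        for seq_id, position, ref, alt in calls:
            alternates[(seq_id, position, ref)][sample] = alt

    rows = []
    for (seq_id, position, ref), by_sample in sorted(alternates.items()):
        # Samples without the mutation are shown with the reference sequence (assumption).
        row = {name: by_sample.get(name, ref) for name in sample_names}
        row.update(
            {
                "position": position,
                "presentIn": len(by_sample),
                "presentInAllSamples": int(len(by_sample) == len(sample_names)),
                "ref": ref,
                "seq id": seq_id,
            }
        )
        rows.append(row)
    return rows
```

Feeding in the two `snp_nonsynonymous` calls from the example (G to A at 3959631 in Sample1 and Sample2, A to C at 4173231 in Sample3) yields rows whose sample columns, `presentIn`, and `presentInAllSamples` values match the table above.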
Aligned fasta files
The scripts also generate 3 fasta files (`breseq.snp.fasta`, `breseq.amino.fasta`, `breseq.codon.fasta`) with all nonsynonymous SNPs from each sample represented by the replacement bases, amino acids, and codons, respectively.
Example:
>reference
GA
>Sample1
AA
>Sample2
AA
>Sample3
GC
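One way to read the example above is that each column of the alignment corresponds to one nonsynonymous SNP, with the reference base in the first record. The sketch below builds such an alignment from comparison-style rows. `build_snp_alignment` is a hypothetical name, and the row layout (a `ref` key plus one key per sample) is an assumption made for illustration.

```python
from typing import Dict, List


def build_snp_alignment(rows: List[Dict[str, str]], sample_names: List[str]) -> str:
    """Concatenate one base per SNP row into a fasta record for the reference and each sample."""
    sequences = {"reference": ""}
    sequences.update({name: "" for name in sample_names})
    for row in rows:
        sequences["reference"] += row["ref"]
        for name in sample_names:
            sequences[name] += row[name]
    return "".join(f">{name}\n{sequence}\n" for name, sequence in sequences.items())


snp_rows = [
    {"ref": "G", "Sample1": "A", "Sample2": "A", "Sample3": "G"},
    {"ref": "A", "Sample1": "A", "Sample2": "A", "Sample3": "C"},
]
print(build_snp_alignment(snp_rows, ["Sample1", "Sample2", "Sample3"]))
```

With the two SNP rows shown, the printed records are `GA`, `AA`, `AA`, and `GC`, the same sequences as in the example alignment.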