A set of scripts to combine multiple breseq analyses and highlight variants of interest.

isolate_parsers

Usage

python isolateset_parser.py [-h] [-i FOLDER] [--no-fasta] [-w WHITELIST]
                            [-b BLACKLIST] [-m SAMPLE_MAP] [--filter-1000bp]

optional arguments:
  -h, --help            show this help message and exit
  -i FOLDER, --input FOLDER
                        The breseq folder to parse.
  --no-fasta            Whether to generate an aligned fasta file of all snps
                        in the breseq VCF file.
  -w WHITELIST, --whitelist WHITELIST
                        Samples not in the whitelist are ignored. Either a
                        comma-separated list of sample ids or a file with
                        each sample id occupying a single line.
  -b BLACKLIST, --blacklist BLACKLIST
                        Samples to ignore. See `--whitelist` for possible
                        input formats.
  -m SAMPLE_MAP, --sample-map SAMPLE_MAP
                        A file mapping sample ids to sample names. Use if the
                        subfolders in the breseqset folder are named
                        differently from the sample names. The file should
                        have two columns: `sampleId` and `sampleName`,
                        separated by a tab character.
  --filter-1000bp       Whether to filter out variants that occur within
                        1000bp of each other. Usually indicates a mapping
                        error.
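
The sample selection and proximity filter described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the function names `parse_sample_list`, `select_samples`, and `filter_clustered` are hypothetical, and the filter assumes that every variant with a neighbour on the same sequence within the window is dropped (both members of a close pair).

```python
from pathlib import Path
from typing import Iterable, List, Optional, Set, Tuple


def parse_sample_list(value: Optional[str]) -> Set[str]:
    """Read a --whitelist/--blacklist value given as a file or a comma-separated list."""
    if not value:
        return set()
    try:
        is_file = Path(value).is_file()
    except OSError:
        # e.g. a very long comma-separated list is not a valid file name
        is_file = False
    if is_file:
        # One sample id per line.
        entries = Path(value).read_text().splitlines()
    else:
        entries = value.split(",")
    return {entry.strip() for entry in entries if entry.strip()}


def select_samples(sample_ids: Iterable[str], whitelist: Set[str], blacklist: Set[str]) -> List[str]:
    """Keep whitelisted samples (all samples if the whitelist is empty) that are not blacklisted."""
    selected = []
    for sample_id in sample_ids:
        if whitelist and sample_id not in whitelist:
            continue
        if sample_id in blacklist:
            continue
        selected.append(sample_id)
    return selected


def filter_clustered(variants: Iterable[Tuple[str, int]], window: int = 1000) -> List[Tuple[str, int]]:
    """Drop variants that lie within `window` bp of another variant on the same sequence.

    Assumption: variants are (seq id, position) pairs and distances are inclusive.
    """
    ordered = sorted(variants)
    kept = []
    for index, (seq_id, position) in enumerate(ordered):
        # After sorting, only the immediate neighbours can be the closest variants.
        neighbours = ordered[max(index - 1, 0):index] + ordered[index + 1:index + 2]
        is_close = any(
            other_seq == seq_id and abs(other_pos - position) <= window
            for other_seq, other_pos in neighbours
        )
        if not is_close:
            kept.append((seq_id, position))
    return kept
```

For example, `select_samples(["sample1", "sample2"], parse_sample_list("sample1"), set())` keeps only `sample1`.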

Input

The scripts expect a folder of individual breseq runs, with each subfolder named after the isolate/sample. The scripts only require the output.vcf, annotated.gd, and index.html files located in each subfolder. Example folder:

    .breseq_folder
    |-- sample1
    |   |-- data
    |   |   |-- output.vcf
    |   |-- output
    |   |   |-- index.html
    |   |   |-- evidence
    |   |   |   |-- annotated.gd
    |-- sample2
    |   |-- data
    |   |   |-- output.vcf
    |   |-- output
    |   |   |-- index.html
    |   |   |-- evidence
    |   |   |   |-- annotated.gd
    |-- sample3
    |   |-- data
    |   |   |-- output.vcf
    |   |-- output
    |   |   |-- index.html
    |   |   |-- evidence
    |   |   |   |-- annotated.gd
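
A rough sketch of how this layout could be checked before parsing is shown here. The helper `find_sample_files` and the `EXPECTED_FILES` mapping are hypothetical names used for illustration, with the relative paths taken from the example tree above. Skipping subfolders that lack any of the three files is an assumption, not documented behaviour.

```python
from pathlib import Path
from typing import Dict

# Relative locations of the three required files, as shown in the example folder.
EXPECTED_FILES = {
    "vcf": Path("data") / "output.vcf",
    "index": Path("output") / "index.html",
    "gd": Path("output") / "evidence" / "annotated.gd",
}


def find_sample_files(breseq_folder: Path) -> Dict[str, Dict[str, Path]]:
    """Map each sample subfolder name to the paths of its required files."""
    samples = {}
    for sample_folder in sorted(breseq_folder.iterdir()):
        if not sample_folder.is_dir():
            continue
        files = {key: sample_folder / relative for key, relative in EXPECTED_FILES.items()}
        # Assumption: a subfolder missing any required file is skipped.
        if all(path.is_file() for path in files.values()):
            samples[sample_folder.name] = files
    return samples
```

Calling `find_sample_files(Path("breseq_folder"))` on the example tree would return entries for `sample1`, `sample2`, and `sample3`.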

Output

The scripts generate an Excel file in the breseq run folder with four sheets: comparison, variant, coverage, and junction. The variant, coverage, and junction sheets are the per-sample tables of all samples in the breseq run, concatenated.

Comparison table

A table in which every row represents a single mutation seen in the sample callset. Each sample has its own column, which holds the alternate sequence for that sample.

| Sample1 | Sample2 | Sample3 | annotation | description | gene | locusTag | mutationCategory | position | presentIn | presentInAllSamples | ref | seq id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GG | GG | GG | intergenic (+65/+20) | putative lipoprotein/putative hydrolase | PFLU0045 - / - PFLU0046 | PFLU0045/PFLU0046 | small_indel | 45881 | 3 | 1 | G | NC_012660 |
| CC | CC | CC | intergenic (+17/-136) | microcin-processing peptidase 1. Unknown type peptidase. MEROPS family U62/hypothetical protein | PFLU0872 - / - PFLU0873 | PFLU0872/PFLU0873 | small_indel | 985333 | 3 | 1 | C | NC_012660 |
| | | | intergenic (+57/+21) | hypothetical protein/putative helicase | PFLU3154 - / - PFLU3155 | PFLU3154/PFLU3155 | small_indel | 3447986 | 3 | 1 | | NC_012660 |
| A | A | G | M350I (ATG-ATA) | putative GGDEF domain signaling protein | PFLU3571 - | PFLU3571 | snp_nonsynonymous | 3959631 | 2 | 0 | G | NC_012660 |
| A | A | C | T238P (ACC-CCC) | hybrid sensory histidine kinase in two-component regulatory system with UvrY | PFLU3777 - | PFLU3777 | snp_nonsynonymous | 4173231 | 1 | 0 | A | NC_012660 |
| G | G | GG | coding (322/1476 nt) | putative two-component system response regulator nitrogen regulation protein NR(I) | PFLU4443 - | PFLU4443 | small_indel | 4908233 | 1 | 0 | G | NC_012660 |
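
The `presentIn` and `presentInAllSamples` columns can be read as simple counts over the sample columns. The sketch below is one plausible way to derive them, assuming a sample carries the mutation when its sequence differs from `ref`. The function name `summarize_presence` is hypothetical. The assumption reproduces the example rows that list explicit sequences, but it is not confirmed by the scripts themselves and does not cover rows with blank cells.

```python
from typing import Dict


def summarize_presence(ref: str, alternates: Dict[str, str]) -> Dict[str, int]:
    """Count the samples whose sequence differs from the reference.

    Assumption: a sample whose sequence differs from `ref` carries the mutation.
    """
    carriers = [sample for sample, alt in alternates.items() if alt != ref]
    return {
        "presentIn": len(carriers),
        "presentInAllSamples": int(len(carriers) == len(alternates)),
    }


# Row at position 3959631 from the example table.
row = summarize_presence("G", {"Sample1": "A", "Sample2": "A", "Sample3": "G"})
print(row)
```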

Aligned fasta files

The scripts also generate three fasta files (breseq.snp.fasta, breseq.amino.fasta, breseq.codon.fasta) with all nonsynonymous snps from each sample represented by the replacement bases, amino acids, and codons. Example:

    >reference
    GA
    >Sample1
    AA
    >Sample2
    AA
    >Sample3
    GC
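
To make the link between the comparison table and the aligned fasta concrete, here is a minimal sketch that concatenates the per-sample bases of each nonsynonymous snp into one record per sample. The function `build_snp_fasta` and the dictionary layout of `snps` are assumptions made for illustration. The input values are the two `snp_nonsynonymous` rows of the example table.

```python
from typing import Dict, List


def build_snp_fasta(snps: List[Dict[str, str]], sample_names: List[str]) -> str:
    """Join the base of every snp into one sequence per record, reference first."""
    records = {"reference": "".join(snp["ref"] for snp in snps)}
    for name in sample_names:
        records[name] = "".join(snp[name] for snp in snps)
    return "".join(f">{name}\n{sequence}\n" for name, sequence in records.items())


# The two nonsynonymous snps from the example comparison table.
snps = [
    {"ref": "G", "Sample1": "A", "Sample2": "A", "Sample3": "G"},  # position 3959631
    {"ref": "A", "Sample1": "A", "Sample2": "A", "Sample3": "C"},  # position 4173231
]
print(build_snp_fasta(snps, ["Sample1", "Sample2", "Sample3"]))
```

With these inputs the printed records match the example above: `GA` for the reference, `AA` for Sample1 and Sample2, and `GC` for Sample3.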
