Relative K-mer Project

Abstract

WGS analysis reveals extended natural transformation in Campylobacter impacting diagnostics and the pathogens adaptive potential.

Running title: WGS analysis of Campylobacter hybrid strains

Julia C. Golz 1a, Lennard Epping 2#, Marie-Theres Knüver 1a, Maria Borowiak 1b, Felix Hartkopf 2, Carlus Deneke 1b, Burkhard Malorny 1b, Torsten Semmler 2, Kerstin Stingl 1a*

1 German Federal Institute for Risk Assessment, Department of Biological Safety, a National Reference Laboratory for Campylobacter, b Study Centre for Genome Sequencing and Analysis, Berlin, Germany 2 Robert Koch Institute, Microbial Genomics, Berlin, Germany

# sharing first author
* corresponding author

Over the past decade, Campylobacter infections have become more common worldwide. They can cause diarrhea, abdominal pain, fever, headache, nausea, and vomiting, and they pose a serious danger to public health. This has sparked efforts to improve prevention and treatment and to reduce transmission. As stated by Kaakoush et al. [1], the main risk factors are the consumption of animal products and water, contact with animals, and international travel.

Because the threat to public health differs among Campylobacter species, it is important to identify dangerous species and to characterize their genotype and phenotype. In this work, a kmer mapping approach is used to identify recombination events and the genes involved, in order to describe hybrid species. Hybrids of Campylobacter jejuni and Campylobacter coli are analyzed to validate this approach and to develop a workflow that can be applied to emerging hybrids in general. This would allow a fast and reliable classification of hybrids.

KMC3 [2] and BEDTools [5] are used to extract kmers from Campylobacter genomes and to calculate the kmers shared between two species and their hybrids. These kmers are then used in combination with BLAST [3] and Bowtie 2 [4] to select genes that are shared with the hybrid genomes. The selected genes can be grouped into batches, each of which was involved in a single recombination event. A visualization of the gene coverage, generated with R, provides further information about the selected genes.
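The idea of shared kmers described above can be illustrated with a small pure-Python sketch. It does not use KMC3 or BEDTools and is not the RKP implementation; the function names (`canonical`, `kmers`, `donor_like_kmers`) are hypothetical, and treating a kmer and its reverse complement as the same kmer is an assumption made for this illustration.

```python
from typing import Set

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(sequence: str, k: int) -> Set[str]:
    """Collect all canonical kmers of length k, skipping windows with non-ACGT characters."""
    seq = sequence.upper()
    valid = set("ACGT")
    return {
        canonical(seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    }


def donor_like_kmers(acceptor: str, hybrid: str, donor: str, k: int) -> Set[str]:
    """Kmers the hybrid shares with the donor but that are absent from the acceptor.

    Assumption for this sketch: such kmers hint at donor-derived regions
    in the hybrid genome.
    """
    return (kmers(hybrid, k) & kmers(donor, k)) - kmers(acceptor, k)


if __name__ == "__main__":
    # Toy sequences, not real Campylobacter data.
    print(donor_like_kmers("AAAAAA", "AAACCGAA", "CCGCCG", 3))
```

For real genomes a dedicated kmer counter is far more efficient than Python sets; the sketch only shows the set logic behind "shared kmers".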

This work will provide a new generic tool for hybrid analysis that could be expanded to other bacteria and enable researchers to classify new species and recombination events in a fast and reliable manner.

[1] Nadeem O. Kaakoush, Natalia Castaño-Rodríguez, Hazel M. Mitchell, Si Ming Man, Global Epidemiology of Campylobacter Infection, Clinical Microbiology Reviews, Volume 28, Issue 3, June 2015, Pages 687–720, https://doi.org/10.1128/CMR.00006-15
[2] Marek Kokot, Maciej Długosz, Sebastian Deorowicz, KMC 3: counting and manipulating k-mer statistics, Bioinformatics, Volume 33, Issue 17, 01 September 2017, Pages 2759–2761, https://doi.org/10.1093/bioinformatics/btx304
[3] Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, David J. Lipman, Basic local alignment search tool, Journal of Molecular Biology, Volume 215, Issue 3, 1990, Pages 403–410, https://doi.org/10.1016/S0022-2836(05)80360-2
[4] B. Langmead, S. Salzberg, Fast gapped-read alignment with Bowtie 2, Nature Methods, Volume 9, 2012, Pages 357–359
[5] Aaron R. Quinlan, Ira M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features, Bioinformatics, Volume 26, Issue 6, 15 March 2010, Pages 841–842, https://doi.org/10.1093/bioinformatics/btq033

Requirements

  • Python 3.X
    • numpy = 1.17.3
    • matplotlib = 3.1.2
    • pandas = 0.25.3
    • biopython = 1.76
    • argparse = 1.4.0
    • tqdm = 4.41.1
  • kmc = 3.1.1
  • bowtie2 = 2.3.5
  • bedtools = 2.29.2
  • r = 3.6
    • pheatmap = 1.0.12
    • gplots = 3.0.1.1
  • blast = 2.9.0
  • samtools = 1.10
  • bedops = 2.4.37
  • seqkit = 0.11.0

Installation

Change to the src directory of the RKP repository:

cd path/to/repo/src

Create an environment with all dependencies needed by RKP:

conda env create -f RKP.yaml

Activate RKP environment:

conda activate RKP

Run RKP:

python RKP.py -A <acceptor genome dir A> -B <hybrid genome dir B> -C <donor genome dir C> -k <kmer length> -a <acceptor threshold> -c <donor threshold> -g <acceptor reference genome fasta> -f <acceptor reference genome gff> -o <output directory>

Required parameters:

Parameter   Description
-A, -C      Directories with the genomes (.fna) of the acceptor (-A) and the donor (-C)
-B          Directory with the genomes (.fasta) and fnn files of the hybrids
-k          Length of the kmers
-at         Fraction (0 to 1) of acceptor isolates that must contain a given kmer
-dt         Fraction (0 to 1) of donor isolates that must contain a given kmer
-g          Acceptor reference genome
-f          Acceptor reference gff file
-o          Output directory
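The two threshold parameters can be read as a filter on how many isolates of a group carry a kmer. The following is a minimal sketch of that reading, not RKP's actual code: the function name `kmers_above_threshold` is hypothetical, and using "greater than or equal to" as the comparison is an assumption.

```python
from collections import Counter
from typing import Dict, Set


def kmers_above_threshold(isolate_kmers: Dict[str, Set[str]], threshold: float) -> Set[str]:
    """Keep kmers found in at least `threshold` (0 to 1) of the isolates.

    `isolate_kmers` maps an isolate name to the set of kmers in its genome.
    Assumption: the fraction is compared with >=.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if not isolate_kmers:
        return set()

    # Count in how many isolates each kmer occurs (once per isolate).
    counts: Counter = Counter()
    for kmer_set in isolate_kmers.values():
        counts.update(kmer_set)

    n_isolates = len(isolate_kmers)
    return {kmer for kmer, count in counts.items() if count / n_isolates >= threshold}


if __name__ == "__main__":
    # Toy data: four hypothetical isolates.
    isolates = {
        "iso1": {"AAA", "CCG"},
        "iso2": {"AAA"},
        "iso3": {"AAA", "CCG"},
        "iso4": {"AAA", "GGG"},
    }
    print(sorted(kmers_above_threshold(isolates, 0.5)))
```

A high threshold keeps only kmers that are characteristic of nearly the whole group, while a low threshold also admits kmers present in just a few isolates.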

Optional parameters:

Parameter   Description
-d          Keep all temporary files
--version   Show version of RKP
-h          Show help
-t          Number of threads, default = 8
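For readers who want to see how the options in the parameter tables above fit together, here is a hedged sketch of an equivalent command-line parser built with Python's argparse. It is not taken from RKP.py: the `dest` names, help texts, the `unit_fraction` helper, and the version string are all placeholders chosen for this illustration.

```python
import argparse


def unit_fraction(text: str) -> float:
    """Parse a float and require it to lie between 0 and 1 (hypothetical helper)."""
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError("value must be between 0 and 1")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options; all dest names are assumptions."""
    parser = argparse.ArgumentParser(prog="RKP.py")
    # Required parameters from the table.
    parser.add_argument("-A", dest="acceptor_dir", required=True, help="acceptor genomes (.fna)")
    parser.add_argument("-B", dest="hybrid_dir", required=True, help="hybrid genomes (.fasta)")
    parser.add_argument("-C", dest="donor_dir", required=True, help="donor genomes (.fna)")
    parser.add_argument("-k", dest="kmer_length", type=int, required=True, help="kmer length")
    parser.add_argument("-at", dest="acceptor_threshold", type=unit_fraction, required=True)
    parser.add_argument("-dt", dest="donor_threshold", type=unit_fraction, required=True)
    parser.add_argument("-g", dest="reference_fasta", required=True)
    parser.add_argument("-f", dest="reference_gff", required=True)
    parser.add_argument("-o", dest="output_dir", required=True)
    # Optional parameters from the table; -h is provided by argparse itself.
    parser.add_argument("-d", dest="keep_temp", action="store_true", help="keep temporary files")
    parser.add_argument("-t", dest="threads", type=int, default=8, help="number of threads")
    parser.add_argument("--version", action="version", version="%(prog)s version-placeholder")
    return parser


if __name__ == "__main__":
    example = ["-A", "acc", "-B", "hyb", "-C", "don", "-k", "21", "-at", "0.9",
               "-dt", "0.8", "-g", "ref.fasta", "-f", "ref.gff", "-o", "out"]
    print(build_parser().parse_args(example))
```

Validating the thresholds at parse time means an out-of-range value is rejected before any genome is processed.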

File structure of output

output
│
├───Acceptor
│   │   (only temporary files)
│
├───Hybrid
│   │   *_iso_seq_protein.fasta
│   │   *_iso_seq.fasta
│   │   mapping_result_Genes_count.csv
│   │   mapping_result_Genes_cutoff_20.csv
│   │   mapping_result_Genes_raw.csv
│   │   mapping_result.csv
│   │   mapping_result.pdf
│   │   recombination_cov_<kmerLength>_W50.pdf
│   │   recombination_cov_<kmerLength>_W100.pdf
│   │   recombination_cov_<kmerLength>_W200.pdf
│   │   recombination_cov_<kmerLength>_W300.pdf
│   │   recombination_cov_<kmerLength>_W400.pdf
│   │   recombination_cov_<kmerLength>_W500.pdf
│   │   Recombination_result_<kmerLength>_W50.csv
│   │   Recombination_result_<kmerLength>_W100.csv
│   │   Recombination_result_<kmerLength>_W200.csv
│   │   Recombination_result_<kmerLength>_W300.csv
│   │   Recombination_result_<kmerLength>_W400.csv
│   │   Recombination_result_<kmerLength>_W500.csv
│
├───Donor
│   │   (only temporary files)
│
└───RKP.log
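The file listing above can be turned into a quick completeness check after a run. The sketch below only rebuilds the file names shown in the listing; the helper names (`expected_hybrid_outputs`, `missing_outputs`) are hypothetical, and it assumes that `<kmerLength>` is replaced by the plain integer passed with -k. The `*_iso_seq*.fasta` files are left out because their prefixes are not specified.

```python
from pathlib import Path
from typing import List

# Window sizes that appear in the documented file names.
WINDOW_SIZES = (50, 100, 200, 300, 400, 500)

# File names in the Hybrid directory that do not depend on the kmer length.
FIXED_FILES = (
    "mapping_result_Genes_count.csv",
    "mapping_result_Genes_cutoff_20.csv",
    "mapping_result_Genes_raw.csv",
    "mapping_result.csv",
    "mapping_result.pdf",
)


def expected_hybrid_outputs(output_dir: str, kmer_length: int) -> List[Path]:
    """List the result files expected in <output_dir>/Hybrid for a given kmer length."""
    hybrid = Path(output_dir) / "Hybrid"
    paths = [hybrid / name for name in FIXED_FILES]
    for window in WINDOW_SIZES:
        paths.append(hybrid / f"recombination_cov_{kmer_length}_W{window}.pdf")
        paths.append(hybrid / f"Recombination_result_{kmer_length}_W{window}.csv")
    return paths


def missing_outputs(output_dir: str, kmer_length: int) -> List[Path]:
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_hybrid_outputs(output_dir, kmer_length) if not p.is_file()]


if __name__ == "__main__":
    # "output" and 21 are example values, not defaults of RKP.
    for path in missing_outputs("output", 21):
        print("missing:", path)
```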

Call structure

graph TD;
  RKP.py-->create_kmers.sh;
  create_kmers.sh-->map_kmers.sh;
  RKP.py-->heatmap.R;

Workflow

[Figure: RKP workflow diagram]


