Relative K-mer Project
WGS analysis reveals extended natural transformation in Campylobacter impacting diagnostics and the pathogen's adaptive potential.
Running title: WGS analysis of Campylobacter hybrid strains
Julia C. Golz 1a, Lennard Epping 2#, Marie-Theres Knüver 1a, Maria Borowiak 1b, Felix Hartkopf 2, Carlus Deneke 1b, Burkhard Malorny 1b, Torsten Semmler 2, Kerstin Stingl 1a*
1 German Federal Institute for Risk Assessment, Department of Biological Safety, a National Reference Laboratory for Campylobacter, b Study Centre for Genome Sequencing and Analysis, Berlin, Germany 2 Robert Koch Institute, Microbial Genomics, Berlin, Germany
# sharing first author
* corresponding author
Over the past decade, Campylobacter infections have become more common worldwide. These infections can cause diarrhea, abdominal pain, fever, headache, nausea, and/or vomiting, and they pose a serious danger to public health. This has sparked efforts to improve prevention and treatment and to reduce transmission. As Kaakoush et al. report, the main risk factors are the consumption of animal products and water, contact with animals, and international travel.
As the threat to public health differs among Campylobacter species, it is important to identify dangerous species and to investigate their genotypic and phenotypic characteristics. In this work, a k-mer mapping approach is used to identify recombination events and the genes involved, in order to describe hybrid species. Hybrids of Campylobacter jejuni and Campylobacter coli are analyzed to validate this approach and to develop a workflow that can be applied to emerging hybrids in general. This would allow a fast and reliable classification of hybrids.
KMC3 and BEDTools are used to extract k-mers from Campylobacter genomes and to calculate the k-mers shared by two species and their hybrids. Subsequently, these k-mers are used in combination with BLAST and Bowtie 2 to select genes that are shared with the hybrid genomes. These genes can be grouped into batches that were involved in a single recombination event. A visualization of the gene coverage, generated using R, provides further information about the selected genes.
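The idea behind the k-mer comparison can be illustrated with a minimal pure-Python sketch. This is not the RKP implementation, which relies on KMC3, BEDTools, BLAST and Bowtie 2. The function names `kmers`, `donor_specific_hits` and `group_adjacent`, the `max_gap` parameter and the toy sequences are all hypothetical and chosen only for illustration. The sketch also ignores reverse complements and assumes that genes are identified by their position index along the reference genome.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def donor_specific_hits(acceptor, donor, hybrid, k):
    """k-mers found in the donor and the hybrid but not in the acceptor."""
    return (kmers(donor, k) - kmers(acceptor, k)) & kmers(hybrid, k)


def group_adjacent(gene_indices, max_gap=1):
    """Group gene positions into batches of neighbouring genes.

    Assumption: genes whose indices differ by at most max_gap are
    treated as belonging to the same candidate recombination event.
    """
    batches = []
    for idx in sorted(gene_indices):
        if batches and idx - batches[-1][-1] <= max_gap:
            batches[-1].append(idx)
        else:
            batches.append([idx])
    return batches


# Toy sequences (illustrative only, not real Campylobacter data).
acceptor = "AAAACCCCGGGG"
donor = "TTTTGACTTTTT"
hybrid = "AAAACGACTGGG"  # acceptor-like backbone with a donor-like insert

hits = donor_specific_hits(acceptor, donor, hybrid, k=4)
batches = group_adjacent([3, 4, 5, 10, 11, 20])
print(sorted(hits))
print(batches)
```

In this toy case the only donor-specific k-mer recovered in the hybrid is the one spanning the inserted donor-like segment, and the gene positions fall into three batches.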
This work will provide a new generic tool for hybrid analysis that could be expanded to other bacteria and enable researchers to classify new species and recombination events in a fast and reliable manner.
Nadeem O. Kaakoush, Natalia Castaño-Rodríguez, Hazel M. Mitchell, Si Ming Man, Global Epidemiology of Campylobacter Infection, Clinical Microbiology Reviews, Volume 28, Issue 3, June 2015, Pages 687–720, https://doi.org/10.1128/CMR.00006-15
Marek Kokot, Maciej Długosz, Sebastian Deorowicz, KMC 3: counting and manipulating k-mer statistics, Bioinformatics, Volume 33, Issue 17, 01 September 2017, Pages 2759–2761, https://doi.org/10.1093/bioinformatics/btx304
Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, David J. Lipman, Basic local alignment search tool, Journal of Molecular Biology, Volume 215, Issue 3, 1990, Pages 403–410, https://doi.org/10.1016/S0022-2836(05)80360-2
B. Langmead, S. Salzberg, Fast gapped-read alignment with Bowtie 2, Nature Methods, Volume 9, 2012, Pages 357–359
Aaron R. Quinlan, Ira M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features, Bioinformatics, Volume 26, Issue 6, 15 March 2010, Pages 841–842, https://doi.org/10.1093/bioinformatics/btq033
- Python 3.X
- numpy = 1.17.3
- matplotlib = 3.1.2
- pandas = 0.25.3
- biopython = 1.76
- argparse = 1.4.0
- tqdm = 4.41.1
- kmc = 3.1.1
- bowtie2 = 2.3.5
- bedtools = 2.29.2
- r = 3.6
- pheatmap = 1.0.12
- gplots = 188.8.131.52
- blast = 2.9.0
- samtools = 1.10
- bedops = 2.4.37
Change to src directory in RKP repository:
Create environment with all dependencies needed by RKP:
conda env create -f RKP.yaml
Activate RKP environment:
conda activate RKP
python RKP.py -A <acceptor genome dir A> -B <hybrid genome dir B> -C <donor genome dir C> -k <kmerlength> -a <acceptor threshold> -c <donor threshold> -g <acceptor reference genome fasta> -f <acceptor reference genome gff> -o <output directory>
| Option | Description |
|---|---|
| -A, -C | Two directories with genomes (.fna) of acceptor and donor |
| -B | Directory with genomes (.fasta) and fnn files of hybrids |
| -k | Length of k-mers |
| -at | Relative amount (0 to 1) of isolates of acceptor that should have k-mer x |
| -dt | Relative amount (0 to 1) of isolates of donor that should have k-mer x |
| -g | Acceptor reference genome |
| -f | Acceptor reference gff file |
| -d | Keep all temporary files |
| --version | Show version of RKP |
| -t | Number of threads, default = 8 |
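The two threshold options describe a fraction of isolates that should contain a given k-mer. The following sketch shows one way such a filter could work. It assumes that the threshold means "present in at least this fraction of isolates", which the description above suggests but does not state explicitly. The helper names `kmers` and `kmers_above_threshold` and the toy isolates are hypothetical and not part of RKP.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmers_above_threshold(genomes, k, threshold):
    """Return k-mers present in at least `threshold` (0 to 1) of the genomes.

    Assumption: "at least" semantics; the real tool may differ.
    """
    counts = Counter()
    for genome in genomes:
        # Using a set means each genome counts a k-mer at most once.
        counts.update(kmers(genome, k))
    minimum = threshold * len(genomes)
    return {kmer for kmer, n in counts.items() if n >= minimum}


# Toy isolates (illustrative only).
acceptor_isolates = ["AAAACCCC", "AAAACCGG", "AAAATTTT"]

core = kmers_above_threshold(acceptor_isolates, k=4, threshold=1.0)
common = kmers_above_threshold(acceptor_isolates, k=4, threshold=0.6)
print(sorted(core))
print(sorted(common))
```

With a threshold of 1.0 only k-mers found in every isolate remain, while a lower threshold such as 0.6 also keeps k-mers that are missing from a minority of isolates.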
File structure of output
output
│
└───Acceptor
│   │   (only temporary files)
│
└───Hybrid
│   │   *_iso_seq_protein.fasta
│   │   *_iso_seq.fasta
│   │   mapping_result_Genes_count.csv
│   │   mapping_result_Genes_cutoff_20.csv
│   │   mapping_result_Genes_raw.csv
│   │   mapping_result.csv
│   │   mapping_result.pdf
│   │   recombination_cov_<kmerLength>_W50.pdf
│   │   recombination_cov_<kmerLength>_W100.pdf
│   │   recombination_cov_<kmerLength>_W200.pdf
│   │   recombination_cov_<kmerLength>_W300.pdf
│   │   recombination_cov_<kmerLength>_W400.pdf
│   │   recombination_cov_<kmerLength>_W500.pdf
│   │   Recombination_result_<kmerLength>_W50.csv
│   │   Recombination_result_<kmerLength>_W100.csv
│   │   Recombination_result_<kmerLength>_W200.csv
│   │   Recombination_result_<kmerLength>_W300.csv
│   │   Recombination_result_<kmerLength>_W400.csv
│   │   Recombination_result_<kmerLength>_W500.csv
│
└───Donor
│   │   (only temporary files)
│
└───RKP.log
graph TD; RKP.py-->create_kmers.sh; create_kmers.sh-->map_kmers.sh; RKP.py-->heatmap.R;