Generate a PanGenome given a set of genomes
Primary contact: Anthony Aylward, aaylward@salk.edu
PanKmer
k-mer based and reference-free pangenome analysis. See the quickstart below, or read the documentation.
Installation
In a conda environment
First create an environment that includes all dependencies:
conda create -c conda-forge -c bioconda -n pankmer rust python \
biopython seaborn urllib3 python-newick pyfaidx gff2bed upsetplot \
pybedtools
Then install PanKmer with pip:
conda activate pankmer
pip install pankmer
With pip
PanKmer is built with Rust, so you will need to install Rust if you have not already done so. Then you can install PanKmer with pip:
pip install pankmer
Check installation
Check that the installation was successful by running:
pankmer --version
Tutorial
Download example dataset
The download-example subcommand will download a small example dataset of Chr19 sequences from S. polyrhiza.
pankmer download-example -d .
After running this command, the directory PanKmer_example_Sp_Chr19/ will be present in the working directory. It contains FASTA files representing Chr19 from three genomes, and GFF3 files giving gene annotations for two of them.
ls PanKmer_example_Sp_Chr19/*
PanKmer_example_Sp_Chr19/README.md
PanKmer_example_Sp_Chr19/Sp_Chr19_features:
Sp9509_oxford_v3_Chr19.gff3.gz Sp9512_a02_genes_Chr19.gff3.gz
PanKmer_example_Sp_Chr19/Sp_Chr19_genomes:
Sp7498_HiC_Chr19.fasta.gz Sp9509_oxford_v3_Chr19.fasta.gz Sp9512_a02_genome_Chr19.fasta.gz
To get started, navigate to the downloaded directory.
cd PanKmer_example_Sp_Chr19/
Build a k-mer index
The k-mer index is a table tracking presence or absence of k-mers in the set of input genomes. To build an index, use the index subcommand and provide a directory containing the input genomes.
pankmer index -g Sp_Chr19_genomes/ -o Sp_Chr19_index.tar
After completion, the index will be present as a tar file, Sp_Chr19_index.tar.
tar -tvf Sp_Chr19_index.tar
Sp_Chr19_index/
Sp_Chr19_index/kmers.b.gz
Sp_Chr19_index/metadata.json
Sp_Chr19_index/scores.b.gz
Note
The input genomes argument provided with the -g flag can be a directory, a tar archive, or a comma-separated list of FASTA files. If the output argument provided with the -o flag ends with .tar, then the index will be written as a tar archive. Otherwise it will be written as a directory.
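The presence/absence table described in this section can be illustrated with a toy sketch in Python. This is not PanKmer's implementation: the function names, the toy sequences, and the choice of k=3 are all assumptions made for illustration, and the sketch only considers forward-strand k-mers.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every k-mer of seq that contains only A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer

def presence_table(genomes, k):
    """Map each k-mer to a tuple of 0/1 flags, one flag per genome."""
    names = list(genomes)
    table = defaultdict(lambda: [0] * len(names))
    for col, name in enumerate(names):
        for kmer in kmers(genomes[name], k):
            table[kmer][col] = 1
    return names, {kmer: tuple(flags) for kmer, flags in table.items()}

# Hypothetical toy sequences, not taken from the example dataset.
toy_genomes = {
    "genome_a": "ACGTAC",
    "genome_b": "ACGTTT",
    "genome_c": "GGGTAC",
}
names, table = presence_table(toy_genomes, k=3)
for kmer in sorted(table):
    print(kmer, table[kmer])
```

Each row of the printed table is one k-mer, and each flag records whether that k-mer occurs in the corresponding genome.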
Create an adjacency matrix
A useful application of the k-mer index is to generate an adjacency matrix. This is a table of k-mer similarity values for each pair of genomes in the index. We can generate one using the adj-matrix subcommand, which will produce a CSV file containing the matrix.
pankmer adj-matrix -i Sp_Chr19_index.tar -o Sp_Chr19_adj_matrix.csv
Note
The input index argument provided with the -i flag can be a tar archive or a directory.
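To make the idea of a pairwise k-mer table concrete, here is a minimal Python sketch that counts shared k-mers for every pair of genomes and prints the result as CSV. It assumes each genome is already represented as a set of k-mers. The function name, the toy sets, and the CSV layout are hypothetical and may not match the file that adj-matrix writes.

```python
import csv
import sys
from itertools import combinations_with_replacement

def shared_kmer_matrix(kmer_sets):
    """Return a symmetric nested dict of shared k-mer counts per genome pair."""
    names = list(kmer_sets)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations_with_replacement(names, 2):
        shared = len(kmer_sets[a] & kmer_sets[b])
        matrix[a][b] = shared
        matrix[b][a] = shared
    return matrix

# Hypothetical toy k-mer sets, one per genome.
toy_sets = {
    "genome_a": {"ACG", "CGT", "GTA", "TAC"},
    "genome_b": {"ACG", "CGT", "GTT", "TTT"},
    "genome_c": {"GGG", "GGT", "GTA", "TAC"},
}
matrix = shared_kmer_matrix(toy_sets)

# Write the matrix as CSV with genome names as row and column labels.
names = list(toy_sets)
writer = csv.writer(sys.stdout)
writer.writerow([""] + names)
for a in names:
    writer.writerow([a] + [matrix[a][b] for b in names])
```

In this sketch the diagonal holds the number of distinct k-mers in each genome, which makes it possible to derive normalized similarity metrics from the counts.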
Plot a clustered heatmap
To visualize the adjacency matrix, we can plot a clustered heatmap of the adjacency values. In this case we use the Jaccard similarity metric for pairwise comparisons between genomes:
pankmer clustermap -i Sp_Chr19_adj_matrix.csv \
-o Sp_Chr19_adj_matrix.svg \
--metric jaccard \
--width 6.5 \
--height 6.5
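The Jaccard similarity of two genomes is the number of k-mers they share divided by the number of k-mers found in either one. The sketch below computes it from a matrix of shared k-mer counts, under the assumption that the diagonal holds each genome's total k-mer count. The function name and toy counts are hypothetical, and this is not necessarily how the clustermap subcommand computes the metric internally.

```python
def jaccard_from_counts(counts):
    """Convert shared k-mer counts into Jaccard similarities.

    counts[a][b] is the number of k-mers shared by genomes a and b;
    counts[a][a] is assumed to be the total number of k-mers in a.
    """
    names = list(counts)
    result = {}
    for a in names:
        result[a] = {}
        for b in names:
            shared = counts[a][b]
            # |A union B| = |A| + |B| - |A intersect B|
            union = counts[a][a] + counts[b][b] - shared
            result[a][b] = shared / union if union else 0.0
    return result

# Hypothetical toy counts for three genomes.
toy_counts = {
    "genome_a": {"genome_a": 4, "genome_b": 2, "genome_c": 2},
    "genome_b": {"genome_a": 2, "genome_b": 4, "genome_c": 0},
    "genome_c": {"genome_a": 2, "genome_b": 0, "genome_c": 4},
}
similarity = jaccard_from_counts(toy_counts)
for name, row in similarity.items():
    print(name, [round(value, 3) for value in row.values()])
```

A value of 1 means two genomes have identical k-mer content, and a value of 0 means they share no k-mers.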
Generate a gene variability heatmap
Generate a heatmap showing variability of genes across genomes. The following command uses the --n-features option to limit analysis to the first two genes from each input GFF3 file. The resulting image shows the level of variability observed across genes from each genome.
pankmer reg_heatmap -i Sp_Chr19_index/ \
-r Sp_Chr19_genomes/Sp9509_oxford_v3_Chr19.fasta.gz Sp_Chr19_genomes/Sp9512_a02_genome_Chr19.fasta.gz \
-f Sp_Chr19_features/Sp9509_oxford_v3_Chr19.gff3.gz Sp_Chr19_features/Sp9512_a02_genes_Chr19.gff3.gz \
-o Sp_Chr19_gene_var.png \
--n-features 2 \
--height 3
File details
Details for the file pankmer-0.12.4.tar.gz.
File metadata
- Download URL: pankmer-0.12.4.tar.gz
- Upload date:
- Size: 47.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8dcd06cf5fbc6584ad894067cce841508dea8e34f4f27921d7a8967627321512
MD5 | 7e9a1e17d92e23e99da75933eaf49693
BLAKE2b-256 | 75c3359c9d9718461d601772bdbc956dbf1e7884abcae343afbefaa52d1e36b9
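A downloaded file can be checked against its published SHA256 digest. The following is a minimal sketch using Python's standard hashlib module. The helper names are hypothetical, and the path passed in is assumed to point at a local copy of the file.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling matches_published_digest with the path to a downloaded pankmer-0.12.4.tar.gz and the SHA256 value listed for that file should return True if the download is intact.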
File details
Details for the file pankmer-0.12.4-cp310-cp310-macosx_10_7_x86_64.whl.
File metadata
- Download URL: pankmer-0.12.4-cp310-cp310-macosx_10_7_x86_64.whl
- Upload date:
- Size: 594.6 kB
- Tags: CPython 3.10, macOS 10.7+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 30dc9a9df95ac4633ceaf761f21276f4371d24e82083bd0df8d4b609e2357ec4
MD5 | 94c7c2054ae7f41c5cbe625eb6833cac
BLAKE2b-256 | 9ec69ef1fa4d12b0277dd0e8995d3626689929a1389f4f9744dccea238305404