SuperPang: non-redundant pangenome assemblies from multiple genomes or bins

Installation

SuperPang requires graph-tool, mOTUlizer v0.2.4, minimap2 and mappy. The easiest way to install it is with conda.

# Install into a new conda environment
conda create -n SuperPang -c conda-forge -c bioconda -c fpusan superpang
# Check that it works for you!
conda activate SuperPang
test-SuperPang.py

Usage

SuperPang.py --fasta <genome1.fasta> <genome2.fasta> <genomeN.fasta> --checkm <check_results> --output-dir <output_directory>

Arguments

  • -f/--fasta: Input fasta files with the sequences for each bin/genome
  • -q/--checkm: CheckM output for the bins. This can be the STDOUT of running checkm on all the fasta files passed in --fasta, or a tab-delimited file with lines in the form genome1 percent_completeness. If empty, completeness will be estimated by mOTUpan, but this may give inaccurate estimates for very incomplete genomes.
  • -i/--identity_threshold: Identity threshold (fraction) to initiate correction with minimap2. Default 0.9.
  • -m/--mismatch-size-threshold: Maximum contiguous mismatch size that will be corrected. Default 100.
  • -g/--indel-size-threshold: Maximum contiguous indel size that will be corrected. Default 100.
  • -r/--correction-repeats: Maximum iterations for sequence correction. Default 5.
  • -n/--correction-repeats-min: Minimum iterations for sequence correction. Default 5.
  • -k/--ksize: k-mer size. Default 301.
  • -l/--minlen: Scaffold length cutoff. Default 0 (no cutoff).
  • -c/--mincov: Scaffold coverage cutoff. Default 0 (no cutoff).
  • -b/--bubble-identity-threshold: Minimum identity (matches / alignment length) required to remove a bubble in the sequence graph.
  • -a/--genome-assignment-threshold: Fraction of shared k-mers required to assign a contig to an input genome (0 means a single shared k-mer is enough). Default 0.5.
  • -x/--default-completeness: Default genome completeness to assume if a CheckM output is not provided with --checkm. Default 50.
  • -t/--threads: Number of processors to use. Default 1.
  • -o/--output-dir: Output directory. Default output.
  • --assume-complete: Assume that the input genomes are complete (sets --genome-assignment-threshold 0.95 and --default-completeness 95).
  • --minimap2-path: Path to the minimap2 executable. Default minimap2.
  • --keep-intermediate: Keep intermediate files.
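
The tab-delimited alternative for --checkm and the overall command line can be sketched in Python as follows. This is a minimal illustration, not part of SuperPang: the helper names `format_completeness_table` and `build_command`, the genome names and the completeness values are all hypothetical, and how genome names in the table must relate to the fasta file names is not specified above, so treat them as placeholders. Only flags listed in this section are used.

```python
from pathlib import Path
import shlex


def format_completeness_table(completeness):
    """Return tab-delimited text with one 'genome<TAB>percent_completeness' line per genome."""
    rows = []
    for genome, percent in completeness.items():
        # Completeness is a percentage, so reject values outside 0-100.
        if not 0 <= percent <= 100:
            raise ValueError(f"{genome}: completeness must be within 0-100, got {percent}")
        rows.append(f"{genome}\t{percent}\n")
    return "".join(rows)


def build_command(fastas, completeness_file, output_dir, threads=1):
    """Assemble a SuperPang.py argument list from the flags documented above."""
    return [
        "SuperPang.py",
        "--fasta", *[str(f) for f in fastas],
        "--checkm", str(completeness_file),
        "--output-dir", str(output_dir),
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical genome names and completeness values, for illustration only.
    table = format_completeness_table({"genome1": 98.5, "genome2": 71.2})
    Path("completeness.tsv").write_text(table)

    cmd = build_command(["genome1.fasta", "genome2.fasta"], "completeness.tsv", "output")
    # Print the command rather than running it; pass `cmd` to subprocess.run to execute.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.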

Output

  • assembly.fasta: contigs.
  • assembly.info: core/auxiliary and path information for each contig.
  • nodes.fasta: assembly nodes.
  • core.fasta: assembly nodes deemed to belong to the core genome of the species by mOTUpan.
  • auxiliary.fasta: assembly nodes deemed to belong to the auxiliary genome of the species.
  • graph.fastg: assembly graph in a format compatible with Bandage.
  • node2origins.tsv: tab-separated file with the assembly nodes, and a comma-separated list of the input genomes in which each node was deemed present.
  • params.tsv: parameters used in the run.
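
As an example of working with these outputs, node2origins.tsv could be loaded with a short Python sketch like the one below. It assumes two tab-separated columns (node name, then the comma-separated genome list) and no header line, which goes slightly beyond what is stated above, so check it against a real file first. The function names and the sample node and genome names are hypothetical.

```python
from collections import Counter


def parse_node2origins(lines):
    """Map each assembly node to the list of input genomes it was deemed present in."""
    origins = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumed layout: node name, a tab, then a comma-separated genome list.
        node, _, genomes = line.partition("\t")
        origins[node] = [g for g in genomes.split(",") if g]
    return origins


def nodes_per_genome(origins):
    """Count how many assembly nodes each input genome contributes to."""
    counts = Counter()
    for genomes in origins.values():
        counts.update(genomes)
    return counts


if __name__ == "__main__":
    # Hypothetical sample content; for a real run, pass an open file handle instead,
    # e.g. `with open("output/node2origins.tsv") as handle: parse_node2origins(handle)`.
    sample = ["node_1\tgenomeA,genomeB\n", "node_2\tgenomeA\n"]
    origins = parse_node2origins(sample)
    for genome, n in sorted(nodes_per_genome(origins).items()):
        print(genome, n)
```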

About

SuperPang is developed by Fernando Puente-Sánchez (Sveriges lantbruksuniversitet). Feel free to open an issue or reach out for support at fernando.puente.sanchez@slu.se.

Download files

Source distribution: SuperPang-0.8.1.tar.gz (6.3 MB)
