
NucDiff locates and categorizes differences between two closely related nucleotide sequences.


NucDiff manual



1 Introduction

NucDiff locates and categorizes differences between two closely related nucleotide sequences. It can handle very fragmented genomes, structural rearrangements and various local differences. These features make NucDiff well suited to comparing assemblies with each other or with available reference genomes.

NucDiff reports the types of differences and their locations. The results can be uploaded into a genome browser for visualization and further inspection. NucDiff is written in Python and uses the NUCmer package from MUMmer [1] for sequence comparison.



2 Prerequisites

NucDiff can be run on Linux and Mac OS. It uses Python 2.7, MUMmer v3.23 and the Biopython package. MUMmer and the Biopython package must be installed, and the MUMmer executables must be in the PATH, before running NucDiff.

The MUMmer tarball can be downloaded at http://sourceforge.net/projects/mummer/ . The Biopython package can be downloaded at http://biopython.org/wiki/Download .
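
Before a first run it can help to confirm that the prerequisites are visible. The following is a minimal sketch, not part of NucDiff. It assumes the MUMmer executables are named nucmer and delta-filter and that Biopython imports as Bio; the helper name check_prerequisites is made up for illustration. It is written for Python 3.6 or later, so it may need adapting if NucDiff is run under Python 2.7 as stated above.

```python
import importlib.util
import shutil


def check_prerequisites(executables=("nucmer", "delta-filter"), modules=("Bio",)):
    """Return a dict mapping each prerequisite name to True if it was found."""
    status = {}
    for exe in executables:
        # shutil.which searches the directories listed in PATH.
        status[exe] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as MISSING should be installed or added to the PATH before running NucDiff.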



3 Running NucDiff

3.1 Command line syntax and input arguments

To run NucDiff, run the nucdiff.py script with valid input arguments:

$ python  nucdiff.py [-h] [--reloc_dist [int]]
                          [--nucmer_opt [NUCMER_OPT]]
                          [--filter_opt [FILTER_OPT]] 
                          [--delta_file [DELTA_FILE]]
                          [--proc [int]] 
                          [--ref_name_full [{yes,no}]]
                          [--query_name_full [{yes,no}]] 
                          [--vcf [{yes,no}]]
                          [--version]
                          Reference.fasta Query.fasta Output_dir Prefix

Positional arguments:

  • Reference.fasta - Fasta file with the reference sequences
  • Query.fasta - Fasta file with the query sequences
  • Output_dir - Path to the directory where all intermediate and final results will be stored
  • Prefix - Name that will be added to all generated files including the ones created by NUCmer

Optional arguments:

  • -h, --help - show this help message and exit
  • --reloc_dist - Minimum distance between two relocated blocks [10000]
  • --nucmer_opt - NUCmer run options. By default, NUCmer is run with its default parameter values, except that --maxmatch is always set. --maxmatch is hard coded and cannot be changed. To change any other parameter value, give the parameter names and their new values inside single or double quotation marks.
  • --filter_opt - Delta-filter run options. By default, delta-filter is run with the -q parameter only. -q is hard coded and cannot be changed. To add any other parameters, give the parameter names and their values inside single or double quotation marks.
  • --delta_file - Path to the already existing delta file (NUCmer output file)
  • --proc - Number of processes to be used [1]
  • --ref_name_full - Print full reference names in output files ('yes' value). In case of 'no', everything after the first space will be ignored. ['no']
  • --query_name_full - Print full query names in output files ('yes' value). In case of 'no', everything after the first space will be ignored. ['no']
  • --vcf [{yes,no}] - Output small and medium local differences in the VCF format ['no']
  • --version - show program's version number and exit
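
When NucDiff is launched from another Python script, the arguments above can be assembled as a list. This is a hedged sketch: build_nucdiff_command is a hypothetical helper, not part of NucDiff, and it assumes that the interpreter is reachable as python and that nucdiff.py is in the working directory. It only covers options that take a value.

```python
def build_nucdiff_command(reference, query, output_dir, prefix,
                          python="python", script="nucdiff.py", **options):
    """Assemble the argument list for one NucDiff run.

    Keyword options map onto the documented flags, so proc=5 becomes
    ['--proc', '5'] and nucmer_opt='-c 200' becomes ['--nucmer_opt', '-c 200'].
    """
    cmd = [python, script]
    for name, value in options.items():
        # Each value stays a single list element, so no shell quoting is needed.
        cmd += ["--" + name, str(value)]
    # Positional arguments come last, in the documented order.
    cmd += [reference, query, output_dir, prefix]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True). Because every option value is its own list element, a multi-word value such as the NUCmer options does not need extra quotation marks.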



3.2 Running examples

A running example that uses the predefined NucDiff and NUCmer parameter values. The NUCmer --maxmatch parameter and the delta-filter -q parameter are hard coded: --maxmatch cannot be changed to --mum or --mumreference, and -q cannot be changed to -g or -r:

$ python nucdiff.py my_reference.fasta my_query.fasta my_output_dir my_prefix



A running example in which the user changes NUCmer and NucDiff default parameter values:

$ python nucdiff.py --proc 5 --ref_name_full yes --query_name_full yes --nucmer_opt '-c 200 -l 250' my_reference.fasta my_query.fasta my_output_dir my_prefix
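
The quotation marks around the NUCmer options matter: they make the shell hand the whole option string to NucDiff as one argument. The sketch below illustrates this with Python's standard shlex module, which follows POSIX shell quoting rules. It only tokenizes the example command line and does not run NucDiff.

```python
import shlex

command_line = (
    "python nucdiff.py --proc 5 --ref_name_full yes --query_name_full yes "
    "--nucmer_opt '-c 200 -l 250' "
    "my_reference.fasta my_query.fasta my_output_dir my_prefix"
)

# shlex.split applies POSIX shell quoting rules, so the quoted NUCmer
# options stay together as a single token.
tokens = shlex.split(command_line)
nucmer_value = tokens[tokens.index("--nucmer_opt") + 1]
print(nucmer_value)  # prints: -c 200 -l 250
```

Without the quotation marks, -c, 200, -l and 250 would arrive as four separate arguments rather than as the value of --nucmer_opt.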



A detailed description of all possible NUCmer and delta-filter parameters, as well as of the .delta and .coord output files, can be found in the MUMmer manual at http://mummer.sourceforge.net/manual/ .



4 Method overview

4.1 NucDiff steps

The NucDiff workflow is shown in Figure 1. A detailed description of all steps can be found in [2].

Figure 1: The NucDiff workflow

4.2 Types of differences

All types of differences are classified into 3 groups: Global, Local and Structural (Figure 2).

Figure 2: Classification of the types of differences with group names found in coloured boxes with capitalised names and the specific types found in white boxes with lowercase names.

The definitions of all types of differences can be found in [2] and in the GitHub wiki (https://github.com/uio-cels/NucDiff/wiki ).



5 NucDiff output

NucDiff puts its output in the directory <output_dir>/results. The output consists of 9 files:

  • <prefix>_ref_snps.gff
  • <prefix>_ref_struct.gff
  • <prefix>_ref_blocks.gff
  • <prefix>_ref_snps.vcf
  • <prefix>_query_snps.gff
  • <prefix>_query_struct.gff
  • <prefix>_query_blocks.gff
  • <prefix>_query_snps.vcf
  • <prefix>_stat.out
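
After a run, the presence of these files can be checked with a few lines of Python. This is a minimal sketch built only from the file names listed above; the helper names expected_outputs and missing_outputs are hypothetical. The two .vcf files are presumably written only when --vcf yes is given. That is an inference from the option description, not something this manual states, so a missing .vcf file is not necessarily an error.

```python
import os

# File name suffixes as listed in this manual.
OUTPUT_SUFFIXES = (
    "_ref_snps.gff", "_ref_struct.gff", "_ref_blocks.gff", "_ref_snps.vcf",
    "_query_snps.gff", "_query_struct.gff", "_query_blocks.gff",
    "_query_snps.vcf", "_stat.out",
)


def expected_outputs(output_dir, prefix):
    """Return the paths NucDiff is documented to write under <output_dir>/results."""
    results_dir = os.path.join(output_dir, "results")
    return [os.path.join(results_dir, prefix + suffix) for suffix in OUTPUT_SUFFIXES]


def missing_outputs(output_dir, prefix):
    """Return the documented output paths that do not exist on disk."""
    return [path for path in expected_outputs(output_dir, prefix)
            if not os.path.isfile(path)]
```

For the running examples above, missing_outputs("my_output_dir", "my_prefix") would list any documented file that was not produced.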

A detailed description of all output files can be found in the GitHub wiki (https://github.com/uio-cels/NucDiff/wiki ).

Detailed descriptions of the GFF3 and VCF file formats used can be found at https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md and https://samtools.github.io/hts-specs/VCFv4.2.pdf , respectively.
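
Because the .gff files follow the GFF3 layout (nine tab-separated columns, with key=value pairs separated by semicolons in the last one), a quick tally of the reported differences needs only the standard library. The sketch below is generic GFF3 handling, not NucDiff code. Which attribute carries the difference type is an assumption here (Name is used as the default key); the GitHub wiki documents the attributes NucDiff actually writes.

```python
from collections import Counter


def parse_gff3_attributes(field):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attributes = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    return attributes


def count_by_attribute(lines, key="Name"):
    """Count GFF3 feature lines by the value of one attribute.

    `key` defaults to 'Name', which is an assumption about NucDiff output.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        value = parse_gff3_attributes(columns[8]).get(key, "(none)")
        counts[value] += 1
    return counts
```

An open file handle can be passed directly as lines, for example the handle returned by open() for my_output_dir/results/my_prefix_ref_snps.gff.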

6 Citing NucDiff

To cite NucDiff in a publication:

Khelik K, Lagesen K, Sandve GK, Rognes T, Nederbragt AJ. NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences. BMC Bioinformatics. 2017;18(1):338. doi: 10.1186/s12859-017-1748-z.



References

[1] Kurtz S et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5(2):R12. doi: 10.1186/gb-2004-5-2-r12.

[2] Khelik K et al. NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences. BMC Bioinformatics. 2017;18(1):338. doi: 10.1186/s12859-017-1748-z.
