Bionano SCaffolding Correction Tool
Bionano Scaffolding Correction Tool (BiSCoT)
BiSCoT is a tool that aims to improve the contiguity of scaffolds and contigs generated after a Bionano scaffolding. It looks for enzymatic labelling sites on contigs. If two distinct contigs share labels, BiSCoT merges them at the last shared site.
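The merge rule described above can be illustrated with a minimal sketch. This is not BiSCoT's actual code: the function name, the `(label_id, position)` tuples and the assumption that both contigs are in the same orientation with label identifiers increasing along the anchor are all hypothetical simplifications.

```python
from typing import List, Tuple


def merge_at_last_shared_label(
    seq_a: str,
    labels_a: List[Tuple[int, int]],
    seq_b: str,
    labels_b: List[Tuple[int, int]],
) -> str:
    """Join two contigs at the last label they have in common.

    Each label is a hypothetical (anchor_label_id, position_on_contig) pair.
    """
    pos_a = dict(labels_a)
    pos_b = dict(labels_b)
    # Labels seen on both contigs, ordered along the anchor (assumption)
    shared = sorted(set(pos_a) & set(pos_b))
    if not shared:
        raise ValueError("contigs share no label")
    last = shared[-1]
    # Keep contig A up to the shared site, then continue with contig B
    return seq_a[: pos_a[last]] + seq_b[pos_b[last]:]
```

For example, two contigs that overlap over labels 1 and 2 would be joined at label 2, so the overlapping region appears only once in the result.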
Biorxiv preprint : link
If you run into trouble installing or using the software, please open an issue.
BiSCoT comes in the form of a Python3 script with some Python and software dependencies. In order to run it correctly, you will need :
- Python 3 (tested with Python 3.6)
- the Biopython and Argparse python modules (both installed automatically with BiSCoT)
- the BLAT aligner if you plan on using the aggressive scaffolding mode that is based on shared labels and sequence similarity (BiSCoT was tested with the v36 version of BLAT)
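Before running BiSCoT, the prerequisites listed above can be checked with a short script. This is only a sketch: the function name is hypothetical, and it assumes the BLAT executable is installed under the name `blat` and reachable through `PATH`.

```python
import shutil
import sys
from typing import List


def check_prerequisites(need_blat: bool = False) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if sys.version_info.major < 3:
        problems.append("Python 3 is required")
    # BLAT is only needed for the aggressive scaffolding mode
    if need_blat and shutil.which("blat") is None:
        problems.append("blat executable not found on PATH (assumed name)")
    return problems
```

Calling `check_prerequisites(need_blat=True)` is only useful if you plan to use the aggressive mode.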
BiSCoT is available on PyPI and can be installed with the following command:
pip install biscot
BiSCoT was designed to improve a prior Bionano scaffolding so it needs a few files generated during this step :
- one CMAP file (--cmap-ref argument) describing the positions of enzymatic labelling sites on the reference genome (filename usually looks like this: *.cut_CTTAAG_GCTCTTC_0kb_0labels_NGS_contigs_HYBRID_Export_r.cmap in the case of a double hybrid scaffolding)
- one CMAP file per enzyme (--cmap-1 and --cmap-2 arguments) describing the positions of enzymatic labelling sites on the contigs (filenames usually look like this: E_CTTAAG_Q_NGScontigs_A_HYBRID_q.cmap for DLE1 and
- a KEY file (--key argument) describing the names of the contig maps related to their FASTA file header names (filename usually looks like this
- one XMAP file per enzyme (--xmap-1 and --xmap-2 arguments) describing the alignments of contig labels on the anchor (filenames usually look like this: E_CTTAAG_Q_NGScontigs_A_HYBRID.xmap for DLE1 and
- the contigs FASTA file (--contigs argument) that was used for the scaffolding
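Since several input files are involved, it can help to verify that they all exist before launching a run. The sketch below uses the argument names from the list above; the helper name and the choice to treat every one of them as mandatory are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, List

# Argument names taken from the list of input files above
REQUIRED_FLAGS = [
    "--cmap-ref", "--cmap-1", "--cmap-2",
    "--key", "--xmap-1", "--xmap-2", "--contigs",
]


def missing_inputs(inputs: Dict[str, str]) -> List[str]:
    """Return the flags whose file is not provided or does not exist."""
    missing = []
    for flag in REQUIRED_FLAGS:
        path = inputs.get(flag)
        if path is None or not Path(path).is_file():
            missing.append(flag)
    return missing
```

An empty list means every expected file was found on disk.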
A typical execution of BiSCoT should look like this :
# Execute BiSCoT
biscot.py --cmap-ref cmap_reference.cmap \
    --cmap-1 cmap_dle.cmap \
    --cmap-2 cmap_bspqi.cmap \
    --xmap-1 xmap_dle.xmap \
    --xmap-2 xmap_bspqi.xmap \
    --key key.txt \
    --contigs contigs.fasta \
    --output biscot
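If you prefer to launch BiSCoT from a Python pipeline, the same command line can be assembled as an argument list. This is a sketch under assumptions: the helper name is hypothetical, and the program name `biscot.py` is simply copied from the example above.

```python
import shlex
from typing import Dict, List


def build_biscot_command(inputs: Dict[str, str], output_dir: str = "biscot") -> List[str]:
    """Assemble the argument list for a BiSCoT run without executing it."""
    command = ["biscot.py"]
    for flag, path in inputs.items():
        command.extend([flag, path])
    command.extend(["--output", output_dir])
    return command


example = build_biscot_command({
    "--cmap-ref": "cmap_reference.cmap",
    "--contigs": "contigs.fasta",
})
# Print a shell-safe version of the command for inspection
print(" ".join(shlex.quote(part) for part in example))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once all input files are in place.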
If everything went fine, a biscot directory should have been created. Inside, you will find two output files :
- scaffolds.fasta containing the new scaffolds
- scaffolds.agp containing the changes made to contigs
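A quick way to inspect the result is to list the length of each scaffold in the FASTA output. The sketch below uses a plain-Python parser to stay dependency-free; the function name is hypothetical and it only assumes standard FASTA formatting (header lines starting with `>`).

```python
from typing import Dict, Iterable


def scaffold_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name to the length of its sequence."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name
            header = line[1:].split()
            name = header[0] if header else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths
```

With the default output directory, this could be used as `scaffold_lengths(open("biscot/scaffolds.fasta"))`.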
If you would like to change the name/path of the output directory, you can do so with the --output argument.
The --xmap-2enz argument is used to provide the final XMAP file containing the mappings of labels of both enzymes. This argument is useful (and recommended) to ensure that no mapping has been missed in one of the individual XMAP files. Usually, this file's name looks like this:
The --only-confirmed-pos argument is used so that only mappings contained in the --xmap-2enz file are retained. When one XMAP file per enzyme is used, a contig can be placed twice in the final assembly. This option ensures that each contig is placed only once and that the created scaffolds are validated by both enzymes.
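The idea behind this filtering can be sketched as a simple set intersection. This is a conceptual illustration, not BiSCoT's implementation: the function name is hypothetical, and representing a mapping as a `(contig_id, anchor_id)` pair is an assumption, since the exact comparison BiSCoT performs is not described here.

```python
from typing import Iterable, List, Tuple

Mapping = Tuple[int, int]  # hypothetical (contig_id, anchor_id) pair


def keep_confirmed(per_enzyme: Iterable[Mapping], combined: Iterable[Mapping]) -> List[Mapping]:
    """Keep each per-enzyme mapping once, and only if the combined file has it."""
    confirmed = set(combined)
    seen = set()
    kept = []
    for mapping in per_enzyme:
        if mapping in confirmed and mapping not in seen:
            kept.append(mapping)
            seen.add(mapping)
    return kept
```

Here a mapping reported by both enzymes is kept a single time, and a mapping absent from the combined file is dropped.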
The --aggressive argument enables the sequence similarity scaffolding. In a first phase, BiSCoT searches for similarities between contigs based on label mappings. If this parameter is set, BiSCoT then also searches for sequence similarity in order to close gaps created by the first step.
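To give an intuition of how sequence similarity can close a gap, here is a deliberately simplified sketch. BiSCoT relies on BLAT alignments for this step; the code below swaps that for an exact suffix/prefix match, and the function name and the `min_overlap` parameter are hypothetical.

```python
from typing import Optional


def join_on_overlap(left: str, right: str, min_overlap: int = 4) -> Optional[str]:
    """Join two sequences if the end of `left` exactly matches the start of `right`.

    Returns the joined sequence, or None when no overlap of at least
    `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first; never test a zero-length overlap
    for size in range(max_len, max(min_overlap, 1) - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None
```

A real alignment tolerates mismatches and indels, which an exact match does not, so this only conveys the principle.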