A package for de-barcoding and error correction of sequencing data containing molecular barcodes.


A sample config file is provided in /debarcer/config/sample_config.ini, and a sample prepfile is provided in /debarcer/config/library_prep_types.ini. Thresholds and paths that you expect to use multiple times should be stored in the config file. The prepfile contains instructions for how different library preps should be handled, for example:

; One library entry
[LIBRARY_NAME]
    Number of unprocessed fastq files (1-3).
    Number of fastq files after reheadering (1-2).
    Comma-separated indices of reads containing a UMI (1-3).
    Comma-separated lengths of UMIs corresponding to UMI_LOCS (1-100).
    TRUE if a spacer is present, FALSE otherwise (TRUE/FALSE).
  • SPACER_SEQ (optional):
    Base sequence of the spacer ([A,C,G,T]+).

LIBRARY_NAME would then be the value of the -prepname argument passed to preprocess (see the workflow below).
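
As an illustration of how a prepfile entry can be read and checked against the ranges listed above, here is a minimal sketch using Python's standard configparser. Only UMI_LOCS, SPACER_SEQ and LIBRARY_NAME come from this README; the key names INPUT_READS, OUTPUT_READS, UMI_LENS and SPACER, and the helpers load_prep and parse_int_list, are hypothetical placeholders, not debarcer's confirmed keys or API.

```python
import configparser
import re

# Hypothetical prepfile entry. UMI_LOCS and SPACER_SEQ are named in the README;
# INPUT_READS, OUTPUT_READS, UMI_LENS and SPACER are ASSUMED placeholder keys.
EXAMPLE_PREPFILE = """
[LIBRARY_NAME]
INPUT_READS = 3
OUTPUT_READS = 2
UMI_LOCS = 2
UMI_LENS = 10
SPACER = FALSE
"""


def parse_int_list(value):
    """Turn a comma-separated string such as '1,3' into [1, 3]."""
    return [int(item) for item in value.split(",")]


def load_prep(text, prepname):
    """Read one library entry and check it against the documented ranges."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    entry = parser[prepname]  # raises KeyError if the library name is missing

    prep = {
        "input_reads": entry.getint("INPUT_READS"),
        "output_reads": entry.getint("OUTPUT_READS"),
        "umi_locs": parse_int_list(entry["UMI_LOCS"]),
        "umi_lens": parse_int_list(entry["UMI_LENS"]),
        "spacer": entry.getboolean("SPACER"),  # accepts TRUE/FALSE
        "spacer_seq": entry.get("SPACER_SEQ", fallback=None),  # optional key
    }

    if not 1 <= prep["input_reads"] <= 3:
        raise ValueError("expected 1-3 unprocessed fastq files")
    if not 1 <= prep["output_reads"] <= 2:
        raise ValueError("expected 1-2 fastq files after reheadering")
    if not all(1 <= loc <= 3 for loc in prep["umi_locs"]):
        raise ValueError("UMI read indices must be 1-3")
    if len(prep["umi_lens"]) != len(prep["umi_locs"]):
        raise ValueError("need one UMI length per UMI location")
    if not all(1 <= n <= 100 for n in prep["umi_lens"]):
        raise ValueError("UMI lengths must be 1-100")
    spacer_seq = prep["spacer_seq"]
    if spacer_seq is not None and not re.fullmatch(r"[ACGT]+", spacer_seq):
        raise ValueError("spacer sequence may only contain A, C, G, T")
    return prep
```

Calling load_prep(EXAMPLE_PREPFILE, "LIBRARY_NAME") returns a plain dict, so a mismatch such as two UMI locations but one UMI length is rejected before any reads are touched.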

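The prepfile fields describe where a UMI sits in a read and whether a spacer follows it. Preprocessing moves that UMI out of the read sequence; the sketch below shows the idea for a single read, assuming the UMI is at the very start of the read and is appended to the read name after a colon. The function name reheader and this naming convention are hypothetical, not debarcer's actual output format.

```python
def reheader(name, seq, qual, umi_len, spacer_seq=None):
    """Move a leading UMI (and optional spacer) out of a read into its name.

    ASSUMPTIONS: the UMI occupies the first umi_len bases, and the UMI is
    appended to the read ID as ':UMI'. Returns (new_name, trimmed_seq,
    trimmed_qual), or None when the expected spacer is not found.
    """
    umi = seq[:umi_len]
    trim = umi_len
    if spacer_seq:
        # The spacer must follow the UMI exactly; otherwise drop the read.
        if seq[umi_len:umi_len + len(spacer_seq)] != spacer_seq:
            return None
        trim += len(spacer_seq)
    # Keep any FASTQ comment that follows the first space in the header.
    read_id, sep, comment = name.partition(" ")
    new_name = f"{read_id}:{umi}{sep}{comment}"
    # Trim sequence and qualities together so they stay the same length.
    return new_name, seq[trim:], qual[trim:]
```

For example, reheader("@r2", "AAAACCTTTT", "IIIIIIIIII", 4, spacer_seq="CC") yields ("@r2:AAAA", "TTTT", "IIII").
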
Typical Workflow

## Preprocess some fastq files
$ python preprocess -o /path/to/output_dir -r1 /path/to/read1.fastq -r2 /path/to/read2.fastq \
  -prepname "prepname" -prepfile /path/to/library_prep_types.ini

## Align, sort, index
## ...
## produces: bam_file.bam, bam_file.bam.bai

## Error-correct and group UMIs into families
$ python group -r chrN:posA-posB -c /path/to/config.ini -b /path/to/bam_file.bam \
  -o /path/to/output_dir

## Perform base collapsing
$ python collapse -o /path/to/output_dir -r chrN:posA-posB \
  -b /path/to/bam_file.bam -u /path/to/umi_file.umis \
  -c /path/to/config.ini

## Call variants of specified family sizes
$ python call -o /path/to/output_dir -r chrN:posA-posB \
  -cf /path/to/cons_file.cons -f 1,2,5 -c /path/to/config.ini
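
The group step error-corrects UMIs and groups them into families. The exact algorithm debarcer uses is not described here; as a minimal sketch, assuming the UMIs have already been restricted to reads at a single genomic position, a greedy Hamming-distance clustering looks like this. The names hamming and group_umis and the max_dist parameter are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def group_umis(umis, max_dist=1):
    """Greedily cluster UMIs into families, most abundant first.

    Each UMI joins the first existing family whose parent UMI is within
    max_dist mismatches; otherwise it founds a new family.
    Returns {parent_umi: {member_umi: read_count, ...}}.
    """
    counts = Counter(umis)
    families = {}
    # Most abundant UMIs become parents; ties are broken alphabetically.
    for umi, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for parent in families:
            if len(parent) == len(umi) and hamming(parent, umi) <= max_dist:
                families[parent][umi] = n  # treat umi as an error copy
                break
        else:
            families[umi] = {umi: n}
    return families
```

Processing UMIs in order of abundance means a rare UMI one mismatch away from a common one is absorbed as a probable sequencing error, and the family size is the sum of the member counts.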

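Base collapsing reduces each UMI family to one consensus sequence. The following is a minimal majority-vote sketch, assuming the family's reads are already aligned, of equal length and free of indels. The function collapse_family and its min_fraction threshold are hypothetical and do not correspond to confirmed debarcer config keys.

```python
from collections import Counter


def collapse_family(reads, min_fraction=0.7):
    """Majority-vote consensus for the reads of one UMI family.

    At each position the most common base is kept if it makes up at least
    min_fraction of the family; otherwise the position is reported as 'N'.
    """
    if not reads:
        raise ValueError("a family needs at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("this sketch assumes equal-length, gap-free reads")
    consensus = []
    for column in zip(*reads):  # one tuple of bases per position
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)
```

With the default threshold, collapse_family(["ACGT", "ACGT", "ACGA"]) masks the last position as "N" because T is supported by only two of three reads; lowering min_fraction to 0.6 keeps it.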

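For the call step, the sketch below assumes that each value passed to -f (for example 1,2,5) acts as a minimum family size, and that every family contributes a single consensus base at a given position. The function allele_frequencies is hypothetical; it only shows how raising the family-size threshold changes depth and non-reference frequency at one position.

```python
def allele_frequencies(families, ref_base, min_sizes):
    """Non-reference frequency at one position for each minimum family size.

    families: list of (family_size, consensus_base) tuples for one position.
    Consensus bases of 'N' are treated as uninformative and skipped.
    Returns {min_size: (depth, alt_count, alt_frequency)}.
    """
    result = {}
    for min_size in min_sizes:
        kept = [base for size, base in families
                if size >= min_size and base != "N"]
        depth = len(kept)
        alt = sum(base != ref_base for base in kept)
        result[min_size] = (depth, alt, alt / depth if depth else 0.0)
    return result


# Parse a family-size list in the same form as the -f argument above.
min_sizes = [int(x) for x in "1,2,5".split(",")]
```

Requiring larger families lowers depth but leaves only consensus bases backed by several reads, which is the usual trade-off when choosing family-size thresholds.
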
Debarcer was tested using Python 3.6.4 and depends on the packages pysam and pandas. See requirements.txt.

