Skip to main content

Find barcode in long reads

Project description

FBILR: Find Barcode In Long Reads

Description

FBILR is designed to find the best-matched barcode on reads and report detailed information (orientation, location, and edit distance). Since the barcode is likely to be located at one of the ends of the read (head or tail), and the read length is longer than 1,000 bp, FBILR restricts the search range to within 200bp of both ends to reduce the amount of computation and save time. Besides, FBILR is able to run in parallel and the order of output is accordant to input.

Usage

The usage of FBILR is shown below:

find_barcodes.py -t 8 -w 200 -m matrix.txt -s summary.txt barcodes.fasta reads.fastq.gz    

After that, you can visualize the result with the following command:

plot_barcode_detail.py -m matrix.txt -p outdir/out

Output files

The find_barcode.py script will output 2 files (matrix.txt and summary.txt).

The matrix.txt is a tab-delimited file that consists of 8 columns (shown as follows). In this file, one row corresponds to one read in the input FASTQ file. Each read can find an optimal barcode, even though the edit distance is large (edit distance represents the difference between barcode sequence and reference sequence, including mismatch, insertion, and deletion of bases).

column 1: read name
column 2: read length
column 3: barcode name
column 4: barcode orientation (F or R)
column 5: barcode location (H, M or T)
column 6: start in read (0-base, included)
column 7: end in read (0-base, not included)
column 8: edit distance

# Example:
1b2e274b-9da7-4a5f-b40f-e6c36249d825    215     Bar4    R       T       172     196     0
ed320d59-77c6-41ba-895d-f4fdba5855f2    249     Bar2    F       H       29      53      0
9aa445f6-63b9-44e5-9b9c-43feea216b7a    492     Bar3    F       H       36      60      0
3087cbe0-7b00-40ff-837c-4cc59cf7e7ff    280     Bar4    R       T       239     263     0
15c53c45-ff43-4374-8716-049495d113aa    345     Bar4    F       H       27      50      3
21c0fe8d-1725-42ba-b490-eec2cd6f76b3    408     Bar2    F       H       27      51      0
90af744f-1367-493d-84e2-ca2375413e2d    551     Bar8    F       H       47      71      0

The summary.txt is a tab-delimited file that consists of 2 columns (shown ad follows). Whether a barcode exists on a read is determined by edit distance (similarity). If the edit distance is small enough (-e option), we can confidently judge that the barcode exists, otherwise does not exist (unclassified). This file statistics the count of each barcode.

column 1: barcode name
column 2: count

# Example: 
Bar1	783154
Bar2	236937
Bar3	1579564
Bar4	1266932
Bar5	2876236
Bar6	1571845
Bar7	1663693
Bar8	5781377
Bar9	720325
unclassified	2578447

Schema

Here, we show the schema of the barcode that exists in the read (100 nt):

Schema

In case 1, the barcode exists in the head of the read with 0 edit distance (fully matched).

In case 2, the barcode exists in the middle of the read with 2 edit distance (2 mismatch).

In case 3, the barcode exists in the tail of the read with 3 edit distance (1 mismatch and 2 deletion).

Finally, the bar1 is the best-matched barcode in this read.

Packaging and distributing

python -m build
python3 -m twine upload --repository pypi dist/*

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fbilr-1.0.3.tar.gz (9.1 kB view details)

Uploaded Source

Built Distribution

fbilr-1.0.3-py3-none-any.whl (10.4 kB view details)

Uploaded Python 3

File details

Details for the file fbilr-1.0.3.tar.gz.

File metadata

  • Download URL: fbilr-1.0.3.tar.gz
  • Upload date:
  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/0.0.0 pkginfo/1.9.6 readme-renderer/27.0 requests/2.28.1 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/4.8.1 keyring/23.2.1 rfc3986/2.0.0 colorama/0.4.5 CPython/3.6.15

File hashes

Hashes for fbilr-1.0.3.tar.gz
Algorithm Hash digest
SHA256 f0b56d9796266a97291acca2e15c2855bd0bfeb640f045f85979cbfd5ee9f441
MD5 01b1929c7bbde3188a525dde6a3076cf
BLAKE2b-256 c14456869272092807bfeb35edd84d6905ce73c11c855b68240b9b0487cc54c2

See more details on using hashes here.

File details

Details for the file fbilr-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: fbilr-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 10.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/0.0.0 pkginfo/1.9.6 readme-renderer/27.0 requests/2.28.1 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/4.8.1 keyring/23.2.1 rfc3986/2.0.0 colorama/0.4.5 CPython/3.6.15

File hashes

Hashes for fbilr-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 792fe1c504dbb90b366290da219813afaadbe1585f4cc120eb92c0b5e4179946
MD5 833bb95c036e860fc7a54fb343926400
BLAKE2b-256 23e437b1e2f953662f6a106dd5fa3ccdb5019160f98c1b56dbda98a7640d3759

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page