Find barcode in long reads
FBILR: Find Barcode In Long Reads
Description
FBILR finds the best-matched barcode in each read and reports detailed information about the match (orientation, location, and edit distance). Because a barcode is usually located near one end of a read (head or tail), and long reads are often longer than 1,000 bp, FBILR restricts the search to within 200 bp of each end of the read, which reduces computation and saves time. FBILR can also run in parallel, and the output rows follow the same order as the input reads.
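The end-restricted search described above can be sketched in Python as a fitting (semi-global) alignment of each barcode, and of its reverse complement, against the head and tail windows of a read. This is a minimal sketch under stated assumptions, not FBILR's own implementation: the helper names (`reverse_complement`, `fitting_alignment`, `find_best_barcode`) are hypothetical, "R" is assumed to mean that the reverse complement of the barcode was found, and only the H and T windows are searched (how FBILR assigns the M location is not covered here).

```python
from typing import Dict, Optional, Tuple

# Complement table for DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fitting_alignment(pattern: str, text: str) -> Tuple[int, int, int]:
    """Best placement of `pattern` inside `text` (semi-global edit distance).

    Unaligned text before and after the match is free. Returns
    (edit_distance, start, end), 0-based with `end` exclusive.
    """
    n = len(text)
    prev = [0] * (n + 1)             # empty pattern matches anywhere at cost 0
    prev_start = list(range(n + 1))  # ...and starts where it ends
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * n          # column 0: all i pattern bases deleted
        cur_start = [0] * (n + 1)
        for j in range(1, n + 1):
            sub = prev[j - 1] + (pattern[i - 1] != text[j - 1])  # match/mismatch
            dele = prev[j] + 1       # pattern base missing from the read
            ins = cur[j - 1] + 1     # extra base in the read
            cur[j] = min(sub, dele, ins)
            # Track where the alignment ending at (i, j) began in `text`.
            if cur[j] == sub:
                cur_start[j] = prev_start[j - 1]
            elif cur[j] == dele:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev, prev_start = cur, cur_start
    end = min(range(n + 1), key=lambda j: prev[j])  # first end with lowest cost
    return prev[end], prev_start[end], end


def find_best_barcode(
    read: str, barcodes: Dict[str, str], window: int = 200
) -> Optional[Tuple[int, str, str, str, int, int]]:
    """Hypothetical helper: search both orientations in the head and tail windows.

    Returns (edit_distance, barcode, orientation, location, start, end)
    for the lowest-distance hit; ties keep the first hit found.
    """
    n = len(read)
    windows = (("H", 0, min(window, n)), ("T", max(0, n - window), n))
    best = None
    for name, seq in barcodes.items():
        for strand, query in (("F", seq), ("R", reverse_complement(seq))):
            for loc, lo, hi in windows:
                dist, s, e = fitting_alignment(query, read[lo:hi])
                if best is None or dist < best[0]:
                    # Shift window coordinates back to read coordinates.
                    best = (dist, name, strand, loc, lo + s, lo + e)
    return best


barcodes = {"Bar1": "ACGTACGTAC", "Bar2": "TTTTGGGGCC"}  # made-up barcodes
read = "CCCC" + "ACGTACGTAC" + "A" * 500
print(find_best_barcode(read, barcodes))  # (0, 'Bar1', 'F', 'H', 4, 14)
```

Limiting the dynamic programming to two windows of `window` bases keeps the cost per read proportional to the barcode length times the window size rather than the full read length, which is the saving the description refers to.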
Usage
A typical FBILR command looks like this:
find_barcodes.py -t 8 -w 200 -m matrix.txt -s summary.txt barcodes.fasta reads.fastq.gz
After that, you can visualize the result with the following command:
plot_barcode_detail.py -m matrix.txt -p outdir/out
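The description notes that FBILR runs in parallel while keeping the output in input order; `-t 8` in the command above presumably sets the number of workers, though that is an assumption. A minimal Python sketch of this order-preserving pattern uses `concurrent.futures`, whose `Executor.map` returns results in input order regardless of which worker finishes first. `process_read` and `run_ordered` are hypothetical names, not FBILR functions. A thread pool keeps the sketch simple; because of the GIL, a CPU-bound aligner would more likely use `ProcessPoolExecutor`, which has the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple


def process_read(record: Tuple[str, str]) -> str:
    """Hypothetical per-read worker; a real one would run the barcode search."""
    name, seq = record
    return f"{name}\t{len(seq)}"


def run_ordered(records: Iterable[Tuple[str, str]], threads: int = 8) -> List[str]:
    """Process reads concurrently while keeping output in input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in the order of `records`,
        # no matter which worker finishes first.
        return list(pool.map(process_read, records))


print(run_ordered([("r1", "ACGT"), ("r2", "AC")], threads=2))  # ['r1\t4', 'r2\t2']
```

Note that `Executor.map` submits every item up front, so streaming a large FASTQ file this way would call for reading and submitting the reads in batches.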
Output files
The find_barcodes.py script writes two files: matrix.txt and summary.txt.
matrix.txt is a tab-delimited file with 8 columns (listed below), with one row per read in the input FASTQ file. A best-matched barcode is reported for every read, even when its edit distance is large. The edit distance counts the differences between the barcode sequence and the matched read sequence, including mismatches, insertions, and deletions of bases.
column 1: read name
column 2: read length
column 3: barcode name
column 4: barcode orientation (F or R: forward or reverse)
column 5: barcode location (H, M or T: head, middle or tail)
column 6: start in read (0-based, inclusive)
column 7: end in read (0-based, exclusive)
column 8: edit distance
# Example:
1b2e274b-9da7-4a5f-b40f-e6c36249d825 215 Bar4 R T 172 196 0
ed320d59-77c6-41ba-895d-f4fdba5855f2 249 Bar2 F H 29 53 0
9aa445f6-63b9-44e5-9b9c-43feea216b7a 492 Bar3 F H 36 60 0
3087cbe0-7b00-40ff-837c-4cc59cf7e7ff 280 Bar4 R T 239 263 0
15c53c45-ff43-4374-8716-049495d113aa 345 Bar4 F H 27 50 3
21c0fe8d-1725-42ba-b490-eec2cd6f76b3 408 Bar2 F H 27 51 0
90af744f-1367-493d-84e2-ca2375413e2d 551 Bar8 F H 47 71 0
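For downstream analysis, matrix.txt can be loaded with Python's standard `csv` module. This is a minimal sketch assuming the file has no header line (as in the example) and exactly the 8 columns listed above; `BarcodeHit` and `read_matrix` are hypothetical names.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class BarcodeHit:
    read_name: str
    read_length: int
    barcode: str
    orientation: str    # "F" or "R"
    location: str       # "H", "M" or "T"
    start: int          # 0-based, inclusive
    end: int            # 0-based, exclusive
    edit_distance: int

    @property
    def match_length(self) -> int:
        """Length of the matched region; can differ from the barcode length with indels."""
        return self.end - self.start


def read_matrix(lines: Iterable[str]) -> Iterator[BarcodeHit]:
    """Parse matrix.txt rows (assumed: no header, 8 tab-separated columns)."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        name, length, barcode, strand, loc, start, end, dist = row
        yield BarcodeHit(name, int(length), barcode, strand, loc,
                         int(start), int(end), int(dist))
```

Any iterable of lines works, so an open file can be passed directly, e.g. `with open("matrix.txt", newline="") as fh: hits = list(read_matrix(fh))`. Because the end column is exclusive, `end - start` gives the matched length directly: 24 for the Bar2 row in the example and 23 for the Bar4 row with edit distance 3.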
summary.txt is a tab-delimited file with 2 columns (listed below) that reports the number of reads assigned to each barcode. Whether a barcode is present on a read is decided by its edit distance: if the edit distance is small enough (set with the -e option), the barcode is considered present; otherwise the read is counted as unclassified.
column 1: barcode name
column 2: count
# Example:
Bar1 783154
Bar2 236937
Bar3 1579564
Bar4 1266932
Bar5 2876236
Bar6 1571845
Bar7 1663693
Bar8 5781377
Bar9 720325
unclassified 2578447
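The relationship between matrix.txt and summary.txt can be sketched as follows: each read's best hit counts toward its barcode when the edit distance passes the threshold, and toward "unclassified" otherwise. This assumes the -e threshold is inclusive (distance <= e), which the text does not state; `summarize` and `format_summary` are hypothetical names, and the sketch illustrates the idea rather than how find_barcodes.py actually builds the file.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize(hits: Iterable[Tuple[str, int]], max_edit_distance: int) -> Counter:
    """Count reads per barcode from (barcode_name, edit_distance) pairs.

    `max_edit_distance` stands in for the -e option (assumed inclusive).
    """
    counts = Counter()
    for barcode, dist in hits:
        counts[barcode if dist <= max_edit_distance else "unclassified"] += 1
    return counts


def format_summary(counts: Counter) -> str:
    """Render tab-delimited rows: barcodes in lexicographic order, unclassified last."""
    names = sorted(n for n in counts if n != "unclassified")
    if "unclassified" in counts:
        names.append("unclassified")
    return "".join(f"{name}\t{counts[name]}\n" for name in names)


counts = summarize([("Bar1", 0), ("Bar2", 1), ("Bar1", 5), ("Bar1", 2)], max_edit_distance=2)
print(format_summary(counts), end="")
# Bar1	2
# Bar2	1
# unclassified	1
```

Lexicographic sorting places a name like "Bar10" before "Bar2"; a natural sort would be needed if more than nine barcodes are used.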
Schema
The following schema illustrates how a barcode can be found in a read (100 nt):
In case 1, the barcode is at the head of the read with an edit distance of 0 (exact match).
In case 2, the barcode is in the middle of the read with an edit distance of 2 (2 mismatches).
In case 3, the barcode is at the tail of the read with an edit distance of 3 (1 mismatch and 2 deletions).
Finally, bar1 is the best-matched barcode in this read.
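The edit distances in the three cases can be checked with a standard Levenshtein distance, where each mismatch, insertion, or deletion costs 1. The barcode and read segments below are made-up sequences chosen to reproduce the three cases; they are not taken from the schema, and `edit_distance` is a hypothetical helper name.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: mismatches, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + (ca != cb),  # match / mismatch
                           prev[j] + 1,               # base of `a` deleted
                           cur[j - 1] + 1))           # base inserted from `b`
        prev = cur
    return prev[-1]


barcode = "ACGTTGCAAC"  # hypothetical 10-nt barcode
case1 = "ACGTTGCAAC"    # identical: exact match
case2 = "ACGATGCTAC"    # T->A and A->T: 2 mismatches
case3 = "ACTTGCAG"      # G and one A deleted, final C->G: 1 mismatch + 2 deletions

print(edit_distance(barcode, case1))  # 0
print(edit_distance(barcode, case2))  # 2
print(edit_distance(barcode, case3))  # 3
```

With these distances, case 1 is the closest match, which is why it would be reported over the other two.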
Packaging and distributing
python -m build
python3 -m twine upload --repository pypi dist/*
Download files
File details
Details for the file fbilr-1.0.3.tar.gz.
File metadata
- Download URL: fbilr-1.0.3.tar.gz
- Upload date:
- Size: 9.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/0.0.0 pkginfo/1.9.6 readme-renderer/27.0 requests/2.28.1 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/4.8.1 keyring/23.2.1 rfc3986/2.0.0 colorama/0.4.5 CPython/3.6.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | f0b56d9796266a97291acca2e15c2855bd0bfeb640f045f85979cbfd5ee9f441
MD5 | 01b1929c7bbde3188a525dde6a3076cf
BLAKE2b-256 | c14456869272092807bfeb35edd84d6905ce73c11c855b68240b9b0487cc54c2
File details
Details for the file fbilr-1.0.3-py3-none-any.whl.
File metadata
- Download URL: fbilr-1.0.3-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/0.0.0 pkginfo/1.9.6 readme-renderer/27.0 requests/2.28.1 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/4.8.1 keyring/23.2.1 rfc3986/2.0.0 colorama/0.4.5 CPython/3.6.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | 792fe1c504dbb90b366290da219813afaadbe1585f4cc120eb92c0b5e4179946
MD5 | 833bb95c036e860fc7a54fb343926400
BLAKE2b-256 | 23e437b1e2f953662f6a106dd5fa3ccdb5019160f98c1b56dbda98a7640d3759
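A downloaded file can be checked against the SHA256 digests listed in the file hash tables using Python's standard `hashlib`. This is a minimal sketch; `sha256_of`, `verify`, and `EXPECTED` are hypothetical names, and the expected digests are copied from the SHA256 rows above.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables for version 1.0.3.
EXPECTED = {
    "fbilr-1.0.3.tar.gz": "f0b56d9796266a97291acca2e15c2855bd0bfeb640f045f85979cbfd5ee9f441",
    "fbilr-1.0.3-py3-none-any.whl": "792fe1c504dbb90b366290da219813afaadbe1585f4cc120eb92c0b5e4179946",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """True if the file's SHA256 matches the listed digest for its file name."""
    return sha256_of(path) == EXPECTED[Path(path).name]
```

For example, `verify("fbilr-1.0.3.tar.gz")` should return True for an intact download of the source distribution.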