An emoji-based bioinformatics command line tool
Project description
FASTQ with Emoji = FASTQE 🤔
fastqe reads one or more FASTQ files, computes quality stats for each file and prints those stats as emoji... for some reason.
Given a FASTQ file in Illumina 1.8+/Sanger format (Phred+33 encoding), fastqe calculates the mean (rounded) quality score at each position and prints a corresponding emoji!
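The per-position calculation described above can be sketched in plain Python. This is an illustration under stated assumptions, not fastqe's own code: it assumes Phred+33 quality strings, averages each position over the reads long enough to reach it, and rounds with Python's built-in round(), which rounds halves to even (fastqe's exact rounding rule is not stated here). The symbol lookup uses a small hypothetical placeholder map, DEMO_SYMBOLS, keyed by quality character in the same style as the scale table.

```python
from typing import Dict, List

PHRED_OFFSET = 33  # Illumina 1.8+/Sanger: score = ASCII code - 33

# Hypothetical placeholder mapping from quality character to symbol
# (not fastqe's full built-in scale).
DEMO_SYMBOLS: Dict[str, str] = {"!": "🚫", "5": "🚨", "I": "😎"}


def mean_quality_per_position(quality_strings: List[str]) -> List[int]:
    """Rounded mean Phred score at each position across all reads.

    Reads may differ in length; each position is averaged only over the
    reads long enough to reach it.
    """
    totals: List[int] = []
    counts: List[int] = []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - PHRED_OFFSET
            counts[pos] += 1
    # round() rounds .5 to the nearest even integer
    return [round(total / count) for total, count in zip(totals, counts)]


def scores_to_symbols(scores: List[int], symbols: Dict[str, str], missing: str = "?") -> str:
    """Re-encode each score as its Phred+33 character and look up its symbol."""
    return "".join(symbols.get(chr(score + PHRED_OFFSET), missing) for score in scores)


if __name__ == "__main__":
    reads = ["II!", "I!!"]  # toy quality strings: I = 40, ! = 0
    means = mean_quality_per_position(reads)
    print(means)  # [40, 20, 0]
    print(scores_to_symbols(means, DEMO_SYMBOLS))  # 😎🚨🚫
```

Mapping the rounded score back to its quality character before the lookup keeps the symbol table keyed the same way as the scale, so a different table can be swapped in without touching the averaging code.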
Install
Latest release versions of fastqe are available via pip or BioConda:
pip install fastqe
conda install -c bioconda fastqe
Development
The development version can be installed from the master branch of this repository.
Usage
fastqe can display usage information on the command line via the -h or --help argument:
usage: fastqe [-h] [--minlen N] [--scale] [--version] [--mean]
[--custom CUSTOM_DICT] [--bin] [--noemoji] [--min] [--max]
[--output OUTPUT_FILE] [--long READ_LENGTH] [--log LOG_FILE]
[FASTQ_FILE [FASTQ_FILE ...]]
Read one or more FASTQ files, compute quality stats for each file, print as
emoji... for some reason.๐
positional arguments:
FASTQ_FILE Input FASTQ files
optional arguments:
-h, --help show this help message and exit
--minlen N Minimum length sequence to include in stats (default 0)
--scale show relevant scale in output
--version show program's version number and exit
--mean show mean quality per position (DEFAULT)
--custom CUSTOM_DICT use a mapping of custom emoji to quality in
CUSTOM_DICT (๐๐ด)
--bin use binned scores (๐ซ๐๐ฉโ ๏ธ๐๐๐๐)
--noemoji use mapping without emoji (▁▂▃▄▅▆▇█)
--min show minimum quality per position
--max show maximum quality per position
--output OUTPUT_FILE write output to OUTPUT_FILE instead of stdout
--long READ_LENGTH enable long reads up to READ_LENGTH bp long
--log LOG_FILE record program progress in LOG_FILE
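The --min, --max and --minlen options can be pictured with the following sketch. It assumes --minlen drops reads shorter than N before any statistics are computed, and that the minimum and maximum are taken per position over the remaining reads. It illustrates the idea and does not reproduce fastqe's implementation; min_max_per_position is a hypothetical name.

```python
from typing import Iterable, List, Tuple

PHRED_OFFSET = 33  # Illumina 1.8+/Sanger encoding


def min_max_per_position(
    quality_strings: Iterable[str], minlen: int = 0
) -> Tuple[List[int], List[int]]:
    """Per-position minimum and maximum Phred scores.

    Reads shorter than minlen are skipped entirely (assumed --minlen semantics).
    """
    mins: List[int] = []
    maxs: List[int] = []
    for qual in quality_strings:
        if len(qual) < minlen:
            continue
        for pos, char in enumerate(qual):
            score = ord(char) - PHRED_OFFSET
            if pos == len(mins):  # first read reaching this position
                mins.append(score)
                maxs.append(score)
            else:
                mins[pos] = min(mins[pos], score)
                maxs[pos] = max(maxs[pos], score)
    return mins, maxs


if __name__ == "__main__":
    lows, highs = min_max_per_position(["II!", "5I", "!"], minlen=2)
    print(lows, highs)  # [20, 40, 0] [40, 40, 0]
```

Keeping running minima and maxima avoids storing every score, so memory grows with read length rather than with the number of reads.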
Convert
fastqe summarises FASTQ files to display the maximum, mean and minimum quality using emoji. To convert a file into emoji rather than summarise it, use the companion program biomojify, which converts both sequence and quality information to emoji:
$ cat test.fq
@ Sequence
GTGCCAGCCGCCGCGGTAGTCCGACGTGGC
+
GGGGGGGGGGGGGGGGGGGGGG!@#$%&%(
$ biomojify fastq test.fq
▶️ Sequence
🍇🍅🍇🌽🌽🥑🍇🌽🌽🍇🌽🌽🍇🌽🍇🍇🍅🥑🍇🍅🌽🌽🍇🥑🌽🍇🍅🍇🍇🌽
๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐ซ๐๐บ๐๐
๐พ๐
๐
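Unlike fastqe's per-position summary, a biomojify-style conversion keeps one symbol per base and per quality character. A minimal sketch of the sequence half follows. The nucleotide-to-emoji table (BASE_EMOJI) and the fallback symbol are illustrative assumptions, so check the biomojify project for the mapping it actually uses.

```python
from typing import Dict

# Illustrative nucleotide mapping for this sketch (an assumption, not
# necessarily biomojify's own table).
BASE_EMOJI: Dict[str, str] = {"A": "🥑", "C": "🌽", "G": "🍇", "T": "🍅"}


def emojify_sequence(seq: str, mapping: Dict[str, str] = BASE_EMOJI, unknown: str = "❓") -> str:
    """Convert every base to one emoji; nothing is summarised or dropped."""
    return "".join(mapping.get(base, unknown) for base in seq.upper())


if __name__ == "__main__":
    print(emojify_sequence("GTGCCAGCCG"))
```

The same per-character lookup applies to the quality line, with a quality-character table in place of BASE_EMOJI.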
Install with pip install biomojify, and see the biomojify page for more information: https://github.com/fastqe/biomojify/
Quickstart
fastqe test.fastq
fastqe --min test.fastq
fastqe --max test.fastq
fastqe --max --min --bin test.fastq
Teaching Materials
Command line and NGS Introduction
This lesson introduces NGS data processing on the command line, using FASTQE results from before and after quality filtering with fastp:
https://qubeshub.org/publications/1092/2
Rachael St. Jacques, Max Maza, Sabrina Robertson, Guoqing Lu, Andrew Lonsdale, Ray A Enke (2019).
A Fun Introductory Command Line Exercise: Next Generation Sequencing Quality Analysis with Emoji!.
NIBLSE Incubator: Intro to Command Line Coding Genomics Analysis, (Version 2.0).
QUBES Educational Resources. doi:10.25334/Q4D172
Galaxy
A Galaxy wrapper is available from the IUC toolshed. Contact your Galaxy Admin if you would like to have it installed. A Galaxy Tutorial using FASTQE is in development.
History
FASTQE started out as part of a PyCon Au presentation and has featured in the following talks:
- PyCon Au 2016 - Python for science, side projects and stuff!
- PyCon Au 2017 - Lightning Talk
- BCC 2020 - Short Presentation
Versions
- version 0.0.1 at PyCon Au 2016:
- Mean quality per position
- version 0.0.2 at PyCon Au 2017:
- update emoji map
- Max and minimum scores per position added
- Wrapper code based on early version of Bionitio added
- prepare for PyPI
- version 0.1.0 July 2018
- clean up code
- add binning
- version 0.2.6 July 2020
- refactor code
- add long read support with --long
- add --noemoji for block-based output on systems that don't support emoji
- add --custom for user-defined mapping to emoji
- add --output to redirect to file instead of stdout
- add gzip support
- add redirect from stdin support
- fix bug of dropping position if some sequences are only 0 quality
- Galaxy Wrapper created July 2020
- biomojify created July 2020
- version 0.2.7 2021
- bugfix
- version 0.3.1 2023
- HTML reporting for Galaxy
- version 0.3.3 2024
- Replace emoji with alternatives that render in default fonts
Limitations
- Reads up to 500 bp are supported by default; longer reads are allowed, but the maximum length must be set by the user with --long READ_LENGTH
- Same emoji for all scores above 41
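Both limitations can be illustrated with a short sketch: a length guard that uses 500 bp as the default ceiling, and a clamp that folds every score above 41 onto the character for 41 ('J'), so all such scores share one emoji. The function names and the ValueError are hypothetical; fastqe's own handling may differ.

```python
from typing import List

PHRED_OFFSET = 33
MAX_SCALE_SCORE = 41  # 'J', the last entry in the emoji scale
DEFAULT_MAX_LENGTH = 500  # documented default read-length ceiling


def check_read_length(qual: str, max_length: int = DEFAULT_MAX_LENGTH) -> None:
    """Hypothetical guard: reject reads longer than the configured maximum."""
    if len(qual) > max_length:
        raise ValueError(
            f"read of {len(qual)} bp exceeds {max_length} bp; "
            "raise the limit (fastqe: --long READ_LENGTH)"
        )


def scale_characters(scores: List[int]) -> List[str]:
    """Map scores to scale characters, clamping anything above 41 to 'J'."""
    return [chr(min(score, MAX_SCALE_SCORE) + PHRED_OFFSET) for score in scores]
```

A fixed ceiling lets per-position arrays be sized once up front, which is one plausible reason a tool would ask for the maximum read length explicitly.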
Licence
This program is released as open source software under the terms of the BSD License.
Dependencies
- pyemojify
- BioPython
- NumPy
Roadmap
- Rearrange emoji to use more realistic ranges (e.g. uncommon emoji for scores > 60) and remove inconsistencies
- Add conversion to emoji sequence format, with/without binning, for compressed FASTQ data (belongs in https://github.com/fastqe/biomojify/)
- Rewrite conversion to standalone function for use in IPython etc.
- Teaching resources
- Test data and unit tests
- Add FASTA mode for nucleotide and protein emoji (see https://github.com/fastqe/biomojify/)
- MultiQC plugin
- Galaxy Wrapper: available from the IUC toolshed
Would you rather convert to emoji than summarise? We've just started biomojify for that: https://github.com/fastqe/biomojify/
Contributors
- Andrew Lonsdale
- Björn Grüning
- Catherine Bromhead
- Clare Sloggett
- Clarissa Womack
- Helena Rasche
- Maria Doyle
- Michael Franklin
- Nicola Soranzo
- Phil Ewels
Scale
Use the --scale option to include the scale in the output.
0 ! 🚫
1 " ❌
2 # 👺
3 $ 💔
4 % 🙅
5 & 👾
6 ' 👿
7 ( 💀
8 ) 👻
9 * 🙈
10 + 🙉
11 , 🙊
12 - 🐵
13 . 😿
14 / 😾
15 0 🙀
16 1 💣
17 2 🔥
18 3 😡
19 4 💩
20 5 🚨
21 6 ๐
22 7 ๐
23 8 ๐
24 9 ๐
25 : ๐
26 ; ๐
27 < ๐
28 = ๐
29 > ๐
30 ? ๐
31 @ ๐
32 A ๐
33 B ๐
34 C ๐
35 D ๐
36 E ๐
37 F ๐
38 G ๐
39 H ๐
40 I ๐
41 J ๐
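The first two columns of the scale follow directly from Phred+33 encoding: the character for score Q is the one with ASCII code Q + 33. A short sketch that regenerates them (scale_rows and score_of are hypothetical helper names):

```python
from typing import Iterator, Tuple

PHRED_OFFSET = 33


def scale_rows(max_score: int = 41) -> Iterator[Tuple[int, str]]:
    """Yield (score, quality character) pairs from 0 up to max_score."""
    for score in range(max_score + 1):
        yield score, chr(score + PHRED_OFFSET)


def score_of(char: str) -> int:
    """Inverse direction: decode one quality character to its score."""
    return ord(char) - PHRED_OFFSET


if __name__ == "__main__":
    for score, char in scale_rows():
        print(score, char)
```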
Binned scale:
0 ! ๐ซ
1 " ๐ซ
2 # ๐
3 $ ๐
4 % ๐
5 & ๐
6 ' ๐
7 ( ๐
8 ) ๐
9 * ๐
10 + ๐ฉ
11 , ๐ฉ
12 - ๐ฉ
13 . ๐ฉ
14 / ๐ฉ
15 0 ๐ฉ
16 1 ๐ฉ
17 2 ๐ฉ
18 3 ๐ฉ
19 4 ๐ฉ
20 5 ๐จ
21 6 ๐จ
22 7 ๐จ
23 8 ๐จ
24 9 ๐จ
25 : ๐
26 ; ๐
27 < ๐
28 = ๐
29 > ๐
30 ? ๐
31 @ ๐
32 A ๐
33 B ๐
34 C ๐
35 D ๐
36 E ๐
37 F ๐
38 G ๐
39 H ๐
40 I ๐
41 J ๐
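Binning amounts to a lookup from a score to the bin that contains it. This sketch uses the bin edges 0, 2, 10, 20 and 25, where the binned table changes symbol. Treat those edges as an assumption, since fastqe's implementation may subdivide the top range further; the helper names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

# Lower edge of each bin, read off the binned scale (assumed; the real
# implementation may subdivide the top bin further).
BIN_LOWER_BOUNDS: Sequence[int] = (0, 2, 10, 20, 25)


def bin_index(score: int, lower_bounds: Sequence[int] = BIN_LOWER_BOUNDS) -> int:
    """Index of the bin containing score (bins are half-open [lo, next_lo))."""
    if score < lower_bounds[0]:
        raise ValueError("Phred scores cannot be negative")
    return bisect_right(lower_bounds, score) - 1


def binned_score(score: int, lower_bounds: Sequence[int] = BIN_LOWER_BOUNDS) -> int:
    """Collapse a score to its bin's lower edge so the whole bin shares one symbol."""
    return lower_bounds[bin_index(score, lower_bounds)]
```

Storing only the lower edges keeps the bins contiguous by construction; bisect_right then finds the right bin in logarithmic time.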
Custom
Use a dictionary of pyemojify mappings in a text file instead of the built-in emoji choices:
{
'#': ':no_entry_sign:',
'\"': ':x:',
'!': ':japanese_goblin:',
'$': ':broken_heart:'
}
Emoji characters can also be used directly instead (experimental):
{
'#': ':no_entry_sign:',
'\"': ':x:',
'!': '๐ฟ',
'$': ':broken_heart:'
}
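The custom mapping file is written as a Python dictionary literal (single-quoted keys, escaped quotes), so one safe way to read it is ast.literal_eval, which evaluates literals only and never runs code. The sketch below is an assumption about how such a file could be loaded and validated, not a description of fastqe's parser. It checks that every key is a printable Phred+33 character and every value is a string, and it leaves the conversion of :shortcodes: to emoji to pyemojify. The function names are hypothetical.

```python
import ast
from pathlib import Path
from typing import Dict

# Printable ASCII from '!' (score 0) to '~' (score 93)
QUALITY_CHARS = {chr(code) for code in range(33, 127)}

EXAMPLE_FILE_TEXT = r"""{
'#': ':no_entry_sign:',
'\"': ':x:',
'!': ':japanese_goblin:',
'$': ':broken_heart:'
}"""


def parse_custom_mapping(text: str) -> Dict[str, str]:
    """Parse and validate a dict literal mapping quality characters to emoji."""
    mapping = ast.literal_eval(text)  # literals only; no arbitrary code runs
    if not isinstance(mapping, dict):
        raise ValueError("custom mapping must be a dictionary literal")
    for key, value in mapping.items():
        if key not in QUALITY_CHARS:
            raise ValueError(f"not a Phred+33 quality character: {key!r}")
        if not isinstance(value, str):
            raise ValueError(f"mapping for {key!r} must be a string")
    return mapping


def load_custom_mapping(path: Path) -> Dict[str, str]:
    """Read a mapping file as UTF-8 (so literal emoji survive) and parse it."""
    return parse_custom_mapping(path.read_text(encoding="utf-8"))
```

json.loads would reject this file because of the single quotes, which is why literal_eval suits the format.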
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file fastqe-0.3.3.tar.gz.
File metadata
- Download URL: fastqe-0.3.3.tar.gz
- Size: 20.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6ddcef4a9d25e22d7391ea6f6234181f3ec22c0767d07c90cb2c66deb4ec3881
MD5 | cc282d5aae9ca8e8287d0b621d96a5d3
BLAKE2b-256 | 7c7bd748e7e174a6dd6e3001bdd118a2c27208e75715ce26e3cad206c5183d24
File details
Details for the file fastqe-0.3.3-py3-none-any.whl.
File metadata
- Download URL: fastqe-0.3.3-py3-none-any.whl
- Size: 16.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 11a3568fdd1416c2f5ee27b57a5daa1bcfe2d58a37d87edd07a2880797efb963
MD5 | 5ec1cc47eab586d08fd469d34495aa98
BLAKE2b-256 | 0326e75fe9d149e4e0c26b9bd0a29d6a09bbf6b98f8459216d27c7f19a417699
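To check a downloaded file against the SHA256 digests listed above, you can hash it in chunks with Python's standard hashlib module. A minimal sketch (sha256_of and matches_digest are illustrative names):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_sha256: str) -> bool:
    """True if the file's SHA256 digest equals the published one."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, matches_digest(Path("fastqe-0.3.3.tar.gz"), "6ddcef4a9d25e22d7391ea6f6234181f3ec22c0767d07c90cb2c66deb4ec3881") should return True for an intact source download. pip can enforce the same check itself in hash-checking mode, using --hash=sha256:... entries in a requirements file together with --require-hashes.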