
An emoji-based bioinformatics command line tool

Project description


FASTQ with Emoji = FASTQE 🤔

fastqe reads one or more FASTQ files, computes quality statistics for each file, and prints those statistics as emoji... for some reason.

Given a FASTQ file in Illumina 1.8+/Sanger format, fastqe calculates the mean (rounded) quality score at each position and prints a corresponding emoji!
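The per-position calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not fastqe's actual code: it assumes Phred+33 encoding, averages each position only over reads long enough to reach it, and uses a small hypothetical excerpt (`EMOJI_BY_SCORE`) of the scale listed further down.

```python
from statistics import mean

PHRED_OFFSET = 33  # Sanger / Illumina 1.8+: quality char = chr(score + 33)

# Hypothetical excerpt of the emoji scale listed under "Scale"; not fastqe's full map.
EMOJI_BY_SCORE = {0: "🚫", 20: "🚨", 30: "😆", 40: "😎", 41: "😍"}


def mean_scores_per_position(quality_strings):
    """Rounded mean Phred score at each position across all reads."""
    longest = max(len(q) for q in quality_strings)
    means = []
    for i in range(longest):
        # Reads shorter than i + 1 simply do not contribute to position i.
        scores = [ord(q[i]) - PHRED_OFFSET for q in quality_strings if len(q) > i]
        # Note: Python's round() sends exact halves to the nearest even integer.
        means.append(round(mean(scores)))
    return means


def to_emoji(scores):
    # Scores missing from the excerpt above are shown as a placeholder dot.
    return "".join(EMOJI_BY_SCORE.get(s, "·") for s in scores)
```

For example, `to_emoji(mean_scores_per_position(["I?5I", "5?5I", "?"]))` gives `😆😆🚨😎`, from mean scores of 30, 30, 20 and 40.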

Example

https://fastqe.com/

Install

The latest release of fastqe is available via pip or Bioconda:

pip install fastqe

conda install -c bioconda fastqe

Development

The development version can be installed from the master branch of the fastqe repository.

Usage

fastqe can display usage information on the command line via the -h or --help argument:

usage: fastqe [-h] [--minlen N] [--scale] [--version] [--mean]
              [--custom CUSTOM_DICT] [--bin] [--noemoji] [--min] [--max]
              [--output OUTPUT_FILE] [--long READ_LENGTH] [--log LOG_FILE]
              [FASTQ_FILE [FASTQ_FILE ...]]

Read one or more FASTQ files, compute quality stats for each file, print as
emoji... for some reason.😄

positional arguments:
  FASTQ_FILE            Input FASTQ files

optional arguments:
  -h, --help            show this help message and exit
  --minlen N            Minimum length sequence to include in stats (default
                        0)
  --scale               show relevant scale in output
  --version             show program's version number and exit
  --mean                show mean quality per position (DEFAULT)
  --custom CUSTOM_DICT  use a mapping of custom emoji to quality in
                        CUSTOM_DICT (🐍🌴)
  --bin                 use binned scores (🚫💀💩⚠️😄😆😎😍)
  --noemoji             use mapping without emoji (▁▂▃▄▅▆▇█)
  --min                 show minimum quality per position
  --max                 show maximum quality per position
  --output OUTPUT_FILE  write output to OUTPUT_FILE instead of stdout
  --long READ_LENGTH    enable long reads up to READ_LENGTH bp long
  --log LOG_FILE        record program progress in LOG_FILE
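The --min, --max and --minlen options can be pictured with a similar sketch. `position_stats` is a hypothetical helper written for illustration; treating --minlen as "drop reads shorter than N" is an assumption based on the help text above, and the exact rules inside fastqe may differ.

```python
from statistics import mean

PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ encoding


def position_stats(quality_strings, minlen=0):
    """Return a (min, rounded mean, max) Phred tuple for each read position.

    Hypothetical helper: reads shorter than `minlen` are dropped first, which is
    one reading of "Minimum length sequence to include in stats".
    """
    kept = [q for q in quality_strings if len(q) >= minlen]
    if not kept:
        return []
    stats = []
    for i in range(max(len(q) for q in kept)):
        # Only reads long enough to cover position i contribute to it.
        scores = [ord(q[i]) - PHRED_OFFSET for q in kept if len(q) > i]
        stats.append((min(scores), round(mean(scores)), max(scores)))
    return stats
```

With the reads `["I5", "5I", "!"]` this returns `[(0, 20, 40), (20, 30, 40)]`; passing `minlen=2` drops the one-base read and gives `[(20, 30, 40), (20, 30, 40)]`.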

Convert

fastqe summarises FASTQ files by displaying the maximum, mean and minimum quality at each position using emoji. To convert a file into this format rather than summarise it, you can use the companion program biomojify, which converts both sequence and quality information to emoji:

$ cat test.fq
@ Sequence
GTGCCAGCCGCCGCGGTAGTCCGACGTGGC
+
GGGGGGGGGGGGGGGGGGGGGG!@#$%&%(
$ biomojify fastq test.fq
▶️  Sequence
🍇🍅🍇🌽🌽🥑🍇🌽🌽🍇🌽🌽🍇🌽🍇🍇🍅🥑🍇🍅🌽🌽🍇🥑🌽🍇🍅🍇🍇🌽
😁😁😁😁😁😁😁😁😁😁😁😁😁😁😁😁😁😁😁😁😁😁🚫😄👺💔🙅👾🙅💀
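The sequence line in this example is a one-emoji-per-base substitution. Below is a minimal sketch of that idea, using only the four pairs visible in the output above (🥑 A, 🌽 C, 🍇 G, 🍅 T). How biomojify itself handles other characters such as N is not shown here, so passing them through unchanged is an assumption, and `emojify_sequence` is a hypothetical name.

```python
# Base-to-emoji pairs read off the example output above (A, C, G, T only).
BASE_EMOJI = {"A": "🥑", "C": "🌽", "G": "🍇", "T": "🍅"}


def emojify_sequence(seq: str) -> str:
    """Replace each nucleotide with its emoji.

    Assumption: characters outside A/C/G/T are passed through unchanged.
    """
    return "".join(BASE_EMOJI.get(base, base) for base in seq.upper())
```

Applied to `GTGCCAGCCGCCGCGGTAGTCCGACGTGGC`, this reproduces the 30-emoji sequence line shown above.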

Install it with pip install biomojify, and see the biomojify page for more information: https://github.com/fastqe/biomojify/

Quickstart

fastqe test.fastq

fastqe --min test.fastq

fastqe --max test.fastq

fastqe --max --min --bin test.fastq

Teaching Materials

Command line and NGS Introduction

This lesson introduces NGS data processing on the command line, using FASTQE results from before and after quality filtering with fastp:

https://qubeshub.org/publications/1092/2

Rachael St. Jacques, Max Maza, Sabrina Robertson, Guoqing Lu, Andrew Lonsdale, Ray A Enke (2019).
A Fun Introductory Command Line Exercise: Next Generation Sequencing Quality Analysis with Emoji!.
NIBLSE Incubator: Intro to Command Line Coding Genomics Analysis, (Version 2.0).
QUBES Educational Resources. doi:10.25334/Q4D172

Galaxy

A Galaxy wrapper is available from the IUC toolshed. Contact your Galaxy Admin if you would like to have it installed. A Galaxy Tutorial using FASTQE is in development.

FASTQE in Galaxy

History

FASTQE started out as part of PyCon AU presentations.

Versions

  • version 0.0.1 at PyCon AU 2016:
    • Mean quality per position
  • version 0.0.2 at PyCon AU 2017:
    • update emoji map
    • Max and minimum scores per position added
    • Wrapper code based on early version of Bionitio added
    • prepare for PyPI
  • version 0.1.0 July 2018
    • clean up code
    • add binning
  • version 0.2.6 July 2020
    • refactor code
    • add long read support with --long
    • add --noemoji for block-based output on systems that don't support emoji
    • add --custom for user-defined mapping to emoji
    • add --output to redirect to file instead of stdout
    • add gzip support
    • add redirect from stdin support
    • fix bug of dropping position if some sequences are only 0 quality
  • Galaxy Wrapper created July 2020
  • biomojify created July 2020
  • version 0.2.7 2021
    • bugfix
  • version 0.3.1 2023
    • HTML reporting for Galaxy
  • version 0.3.3 2024
    • Replace emoji with alternatives that render in default fonts

Limitations

  • Reads are limited to 500 bp by default; longer reads are supported, but the maximum length must be set with --long READ_LENGTH
  • Same emoji for all scores above 41

Licence

This program is released as open source software under the terms of the BSD License.

Dependencies

  • pyemojify
  • Biopython
  • NumPy

Roadmap

  • Rearrange emoji to use more realistic ranges (i.e. scores above 60 use uncommon emoji) and remove inconsistencies
  • Add conversion to an emoji sequence format, with or without binning, for compressed FASTQ data (this fits into https://github.com/fastqe/biomojify/)
  • Rewrite conversion as a standalone function for use in IPython etc.
  • Teaching resources
  • Test data and unit tests
  • Add FASTA mode for nucleotide and protein emoji (see https://github.com/fastqe/biomojify/)
  • MultiQC plugin
  • Galaxy Wrapper: available from the IUC toolshed

Rather convert to emoji than summarise? We've just started biomojify for that: https://github.com/fastqe/biomojify/

Contributors

  • Andrew Lonsdale
  • Björn Grüning
  • Catherine Bromhead
  • Clare Sloggett
  • Clarissa Womack
  • Helena Rasche
  • Maria Doyle
  • Michael Franklin
  • Nicola Soranzo
  • Phil Ewels

Scale

Use the --scale option to include this scale in the output.

0 ! 🚫
1 " ❌
2 # 👺
3 $ 💔
4 % 🙅
5 & 👾
6 ' 👿
7 ( 💀
8 ) 👻
9 * 🙈
10 + 🙉
11 , 🙊
12 - 🐵
13 . 😿
14 / 😾
15 0 🙀
16 1 💣
17 2 🔥
18 3 😡
19 4 💩
20 5 🚨
21 6 😀
22 7 😅
23 8 😏
24 9 😊
25 : 😙
26 ; 😗
27 < 😚
28 = 😃
29 > 😘
30 ? 😆
31 @ 😄
32 A 😋
33 B 😄
34 C 😝
35 D 😛
36 E 😜
37 F 😉
38 G 😁
39 H 😄
40 I 😎
41 J 😍
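Each row of the scale pairs a Phred score with its Phred+33 character, so the character is `chr(score + 33)`. The sketch below shows that conversion and one way to honour the limitation that all scores above 41 share the same emoji (by clamping). `SCALE_EXCERPT` holds only a few rows of the table, and the helper names are hypothetical rather than part of fastqe.

```python
PHRED_OFFSET = 33
MAX_SCORE = 41  # "Same emoji for all scores above 41" (see Limitations)

# A few rows of the scale above; the full table has one emoji per score 0-41.
SCALE_EXCERPT = {0: "🚫", 20: "🚨", 40: "😎", 41: "😍"}


def char_to_score(c: str) -> int:
    return ord(c) - PHRED_OFFSET


def score_to_char(score: int) -> str:
    return chr(score + PHRED_OFFSET)


def emoji_for_char(c: str) -> str:
    # Clamp so that any score above 41 reuses the emoji for 41.
    # Raises KeyError for scores not included in this short excerpt.
    score = min(char_to_score(c), MAX_SCORE)
    return SCALE_EXCERPT[score]
```

Here `score_to_char(0)` is `!` and `score_to_char(41)` is `J`, matching the first and last rows of the table.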

Binned scale:

0 ! 🚫
1 " 🚫
2 # 💀
3 $ 💀
4 % 💀
5 & 💀
6 ' 💀
7 ( 💀
8 ) 💀
9 * 💀
10 + 💩
11 , 💩
12 - 💩
13 . 💩
14 / 💩
15 0 💩
16 1 💩
17 2 💩
18 3 💩
19 4 💩
20 5 🚨
21 6 🚨
22 7 🚨
23 8 🚨
24 9 🚨
25 : 😄
26 ; 😄
27 < 😄
28 = 😄
29 > 😄
30 ? 😆
31 @ 😆
32 A 😆
33 B 😆
34 C 😆
35 D 😎
36 E 😎
37 F 😎
38 G 😎
39 H 😎
40 I 😍
41 J 😍
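The binned scale is a step function over score ranges (0-1, 2-9, 10-19, 20-24, 25-29, 30-34, 35-39, 40-41). One way to express it is a list of inclusive upper bounds searched with `bisect`. The bounds are transcribed from the table above; clamping out-of-range scores and the name `binned_emoji` are assumptions for illustration, not documented fastqe behaviour.

```python
from bisect import bisect_left

# Inclusive upper bound of each bin and its emoji, transcribed from the binned scale.
BIN_UPPER = [1, 9, 19, 24, 29, 34, 39, 41]
BIN_EMOJI = ["🚫", "💀", "💩", "🚨", "😄", "😆", "😎", "😍"]


def binned_emoji(score: int) -> str:
    # Assumption: out-of-range scores are clamped to 0-41, like the unbinned scale.
    score = max(0, min(score, 41))
    # bisect_left returns the first bin whose upper bound is >= score.
    return BIN_EMOJI[bisect_left(BIN_UPPER, score)]
```

For the quality string `!#5?I` (scores 0, 2, 20, 30, 40), joining `binned_emoji(ord(c) - 33)` over each character gives `🚫💀🚨😆😍`.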

Custom

Use a dictionary of pyemojify mappings in a text file, passed with --custom, instead of the built-in emoji choices:

{
'#': ':no_entry_sign:',
'\"': ':x:',
'!': ':japanese_goblin:',
'$': ':broken_heart:'
}

Emoji characters can also be used directly instead (experimental):

{
'#': ':no_entry_sign:',
'\"': ':x:',
'!': '👿',
'$': ':broken_heart:'
}
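The custom file shown above is written as a Python dictionary literal. Below is a sketch of reading such a file safely with `ast.literal_eval` and telling pyemojify shortcodes apart from literal emoji. fastqe's own loader is not described in this document and may behave differently; `load_custom_map` and `is_shortcode` are hypothetical names.

```python
import ast

# Contents of a custom mapping file, copied from the second example above.
EXAMPLE_FILE = r"""{
'#': ':no_entry_sign:',
'\"': ':x:',
'!': '👿',
'$': ':broken_heart:'
}"""


def load_custom_map(text: str) -> dict:
    """Parse a dictionary literal without executing arbitrary code.

    Assumption: fastqe's own parser may differ from this sketch.
    """
    mapping = ast.literal_eval(text)
    if not isinstance(mapping, dict):
        raise ValueError("custom mapping file must contain a dictionary")
    return mapping


def is_shortcode(value: str) -> bool:
    # pyemojify-style names look like ':broken_heart:'.
    return len(value) > 2 and value.startswith(":") and value.endswith(":")
```

In the example file, only the `'!'` entry holds a literal emoji; the other three values are shortcodes that still need pyemojify to render.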


Download files


Source Distribution

fastqe-0.3.3.tar.gz (20.2 kB)

Uploaded Source

Built Distribution

fastqe-0.3.3-py3-none-any.whl (16.2 kB)

Uploaded Python 3

File details

Details for the file fastqe-0.3.3.tar.gz.

File metadata

  • Download URL: fastqe-0.3.3.tar.gz
  • Size: 20.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for fastqe-0.3.3.tar.gz
Algorithm Hash digest
SHA256 6ddcef4a9d25e22d7391ea6f6234181f3ec22c0767d07c90cb2c66deb4ec3881
MD5 cc282d5aae9ca8e8287d0b621d96a5d3
BLAKE2b-256 7c7bd748e7e174a6dd6e3001bdd118a2c27208e75715ce26e3cad206c5183d24


File details

Details for the file fastqe-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: fastqe-0.3.3-py3-none-any.whl
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for fastqe-0.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 11a3568fdd1416c2f5ee27b57a5daa1bcfe2d58a37d87edd07a2880797efb963
MD5 5ec1cc47eab586d08fd469d34495aa98
BLAKE2b-256 0326e75fe9d149e4e0c26b9bd0a29d6a09bbf6b98f8459216d27c7f19a417699

