
Turn noise to read



Turn ‘noise’ to signal: accurately rectify millions of erroneous short reads through graph learning on edit distances

noise2read originates from a computable rule derived from the PCR error mechanism: a rare read is erroneous if it has a neighboring read of high abundance. Based on this rule, it restores erroneous reads to their original state without introducing any non-existing sequences into the short-read set (< 300 bp). It applies to DNA and RNA sequencing (DNA/RNA-seq), small RNA, unique molecular identifier (UMI), and amplicon sequencing data.
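The rule above can be illustrated with a minimal Python sketch. It is not noise2read's implementation (which, per the title, relies on graph learning over edit distances); the function names and the thresholds `rare_max`, `abundant_min`, and `max_dist` are assumptions chosen purely for illustration. The sketch counts read abundances and flags each rare read that has a highly abundant neighbor within a small edit distance.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def flag_candidate_errors(reads, rare_max=1, abundant_min=5, max_dist=1):
    """Map each rare read to its most abundant close neighbor.

    The thresholds are illustrative assumptions, not noise2read parameters.
    """
    counts = Counter(reads)
    abundant = [s for s, c in counts.items() if c >= abundant_min]
    flagged = {}
    for seq, c in counts.items():
        if c > rare_max:
            continue  # only rare reads are candidates
        neighbors = [s for s in abundant if edit_distance(seq, s) <= max_dist]
        if neighbors:
            flagged[seq] = max(neighbors, key=counts.__getitem__)
    return flagged


# Toy read set: one abundant sequence, one rare neighbor, one unrelated read.
reads = ["ACGTACGT"] * 8 + ["ACGTACGA"] + ["TTTTCCCC"]
print(flag_candidate_errors(reads))  # {'ACGTACGA': 'ACGTACGT'}
```

Because a flagged read can only be mapped to a sequence that is already present in the read set, a correction based on this rule cannot introduce a sequence that was never observed.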

Click noise2read to jump to its documentation

Note: All the experimental results obtained in this study utilised version 0.2.7 of noise2read.

Quick-run example

This quick-run example tests noise2read with only 1 Optuna trial and 10 XGBoost estimators; these are not the parameters used in our paper.

Please refer to QuickStart or Installation.

  • Clone the code and datasets from GitHub

git clone https://github.com/Jappy0/noise2read
cd noise2read/Examples/simulated_miRNAs
  • Quick-run test of noise2read on D14

    • with correction of high ambiguous errors, using a GPU for training (about 4 minutes with 26 cores and a GPU)

    noise2read -m correction -c ../../config/quick_test.ini -a True -g gpu_hist

Examples for correcting simulated miRNA data with mimic UMIs using noise2read

Take data sets D14 and D16 as examples.

Please refer to QuickStart or Installation.

  • Clone the code and datasets from GitHub

git clone https://github.com/Jappy0/noise2read
cd noise2read/Examples/simulated_miRNAs
  • Reproduce the evaluation results for D14 and D16 from raw, true and corrected datasets

noise2read -m evaluation -i ./raw/D14_umi_miRNA_mix.fa -t ./true/D14_umi_miRNA_mix.fa -r ./correct/D14_umi_miRNA_mix.fasta -d ./D14
noise2read -m evaluation -i ./raw/D16_umi_miRNA_subs.fa -t ./true/D16_umi_miRNA_subs.fa -r ./correct/D16_umi_miRNA_subs.fasta -d ./D16
  • Correct D14

    • with correction of high ambiguous errors, using a GPU for training

    noise2read -m correction -c ./configs/D14.ini
    • without correction of high ambiguous errors, using a GPU for training

    noise2read -m correction -c ./configs/D14_without_high.ini
  • Correct D16

    • with correction of high ambiguous errors, using a GPU for training

    noise2read -m correction -c ./configs/D16.ini
    • without correction of high ambiguous errors, using a GPU for training

    noise2read -m correction -c ./configs/D16_without_high.ini
  • Expected Results

The expected log files and correction results for data sets D14-D16 are in the noise2read folder of benchmark. The results under noise2read and noise2read-1 are the corrected results with and without prediction of high ambiguous errors, respectively.

Note: noise2read may produce corrected results that differ slightly from those under Examples/simulated_miRNAs/correct and correction. The classifiers’ parameters are tuned automatically over a wide search range, so each training run may select a different best model; the final prediction results nevertheless remain stable within a certain range. We discuss this in the Discussion section of our paper.
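For a quick independent check of a corrected file, the following Python sketch may help. It is not the logic of `noise2read -m evaluation`; the helper names `read_fasta`, `novel_sequences`, and `abundance_overlap` are hypothetical, and the overlap score is just one simple way to quantify run-to-run differences. It checks that a corrected read set contains no sequence absent from the raw set, and measures how closely the sequence abundances of two runs agree.

```python
from collections import Counter


def read_fasta(path):
    """Return the list of sequences in a FASTA file (headers are ignored)."""
    seqs, chunks = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    seqs.append("".join(chunks))
                chunks = []
            elif line:
                chunks.append(line)  # sequences may span several lines
    if chunks:
        seqs.append("".join(chunks))
    return seqs


def novel_sequences(raw, corrected):
    """Distinct corrected sequences that never occur in the raw read set."""
    return set(corrected) - set(raw)


def abundance_overlap(run_a, run_b):
    """Fraction of reads shared by two runs, comparing sequence abundances."""
    a, b = Counter(run_a), Counter(run_b)
    shared = sum((a & b).values())  # per-sequence minimum count
    total = max(sum(a.values()), sum(b.values()))
    return shared / total if total else 1.0


# Toy data standing in for a raw read set and two corrected runs.
raw = ["ACGT", "ACGT", "ACGT", "ACGA", "TTGA"]
run_1 = ["ACGT", "ACGT", "ACGT", "ACGT", "TTGA"]
run_2 = ["ACGT", "ACGT", "ACGT", "ACGA", "TTGA"]
print(novel_sequences(raw, run_1))      # set()
print(abundance_overlap(run_1, run_2))  # 0.8
```

With real data, the lists would come from `read_fasta`, for example applied to ./raw/D14_umi_miRNA_mix.fa and ./correct/D14_umi_miRNA_mix.fasta from the commands above.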

Examples for correcting outcome sequences of ABEs and CBEs using noise2read

  • Clone the code

git clone https://github.com/Jappy0/noise2read
cd noise2read/CaseStudies
mkdir ABEs_CBEs
cd ABEs_CBEs
  • Download the datasets from the data folder of D32_D33.

  • Use noise2read to correct the datasets. Each experiment takes about 13 minutes using 26 cores and a GPU for training.

noise2read -m correction -i ./data/D32_ABE_outcome_seqs.fasta -a False -d ./ABE/
noise2read -m correction -i ./data/D33_CBE_outcome_seqs.fasta -a False -d ./CBE/
  • Expected Results

The expected log files and correction results are in the folder D32_D33. The results for correcting D32 and D33 are under the folders ABE and CBE, respectively.

Note: noise2read may produce corrected results that differ slightly from those under D32_ABE and D33_CBE of D32_D33. The classifiers’ parameters are tuned automatically over a wide search range, so each training run may select a different best model; the final prediction results nevertheless remain stable within a certain range. We discuss this in the Discussion section of our paper.

More examples for reproducing the experiments in our paper can be found in the Examples section of the documentation.

Feel free to contact me if you have any questions on running noise2read or are interested in noise2read.
