Juno: read data generator

Juno has two methods for generating FASTQ reads:

  1. Download real FASTQ files that contributors submitted to NCBI SRA
  2. Simulate "fake" FASTQ files

If you want to develop genomic tools but have no real data, Juno can generate FASTQ reads for your testing.

Juno is also available as a public online resource: https://juno.hlin.tw

Requirements

  • Linux
  • Python >= 3.6

Installation

PyPI version

https://pypi.org/project/juno/

pip install juno

Install from source

git clone https://github.com/hunglin59638/juno.git
cd juno
python3 setup.py install

CLI

juno -h 
usage: juno [-h] SUBCOMMAND ...

Juno: read data generator

optional arguments:
  -h, --help  show this help message and exit

subcommands:
  subcommands

  SUBCOMMAND
    sra       Download reads from SRA database
    simulate  Simulating reads by reference genome

Download reads from SRA database

juno sra -a SRR19400588 -o /path/to/directory
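
To fetch several runs in one go, the command above can be driven from Python. The following is a minimal sketch, not part of Juno: the helper `build_sra_commands` and the one-folder-per-run layout are hypothetical, and it assumes that `-a` takes an SRA run accession (SRR, ERR, or DRR followed by digits), as in the example above.

```python
import re
from pathlib import PurePosixPath

# Run accessions start with SRR, ERR, or DRR followed by digits.
# Assumption: `juno sra -a` expects a run accession like the one shown above.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def build_sra_commands(accessions, out_root):
    """Return one `juno sra` argument list per run accession (hypothetical helper)."""
    commands = []
    for acc in accessions:
        if not RUN_ACCESSION.match(acc):
            raise ValueError(f"not an SRA run accession: {acc!r}")
        # Hypothetical layout: each run gets its own sub-directory.
        out_dir = PurePosixPath(out_root) / acc
        commands.append(["juno", "sra", "-a", acc, "-o", str(out_dir)])
    return commands


if __name__ == "__main__":
    for cmd in build_sra_commands(["SRR19400588"], "/path/to/directory"):
        # Print the command; use subprocess.run(cmd, check=True) to execute it.
        print(" ".join(cmd))
```

Building the argument lists separately from running them makes it easy to review the commands before any download starts.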

Simulate reads fastq

There are two ways to simulate FASTQ reads:

  1. Input your genome FASTA
juno simulate -r /your/genome/fasta -o /path/to/directory --compressed --depth 200
  2. Input a RefSeq assembly accession; its genome will be downloaded from NCBI
juno simulate -a GCF_002004995.1 -o /path/to/directory --compressed --depth 200

Tip: a depth greater than 200x is recommended for bacterial genomes.
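
To get a feel for what a given depth means in data volume, the standard coverage relation depth = (number of reads x read length) / genome size can be rearranged. The sketch below is illustrative only. The function names are hypothetical, and the 5 Mbp genome and 150 bp read length are assumed example values, not Juno defaults.

```python
import math


def genome_size(fasta_lines):
    """Sum the sequence lengths in FASTA text, skipping '>' header lines."""
    return sum(len(line.strip()) for line in fasta_lines if not line.startswith(">"))


def reads_for_depth(genome_bp, read_length, depth):
    """Smallest read count with reads * read_length / genome_bp >= depth."""
    return math.ceil(depth * genome_bp / read_length)


def depth_from_reads(n_reads, read_length, genome_bp):
    """Average depth reached by n_reads reads of the given length."""
    return n_reads * read_length / genome_bp


if __name__ == "__main__":
    toy_fasta = [">contig1", "ACGT", "ACGTAC"]
    print(genome_size(toy_fasta))                  # 10
    # Assumed example: 5 Mbp bacterial genome, 150 bp reads, 200x depth.
    print(reads_for_depth(5_000_000, 150, 200))    # 6666667
```

Each read is counted individually here, so for paired-end data the number of pairs would be half of the result.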

Update local NCBI RefSeq assembly summary

juno simulate --update

Python API

Use Case: Update the NCBI RefSeq assembly summary and load it locally

from juno.data import Assembly
assembly = Assembly()
assembly.update_assembly()
df = assembly.dataframe
df.head()
	assembly_accession	bioproject	biosample	wgs_master	refseq_category	taxid	species_taxid	organism_name	infraspecific_name	isolate	version_status	assembly_level	release_type	genome_rep	seq_rel_date	asm_name	submitter	gbrs_paired_asm
0	GCF_000001215.4	PRJNA164	SAMN02803731		reference genome	7227	7227	Drosophila melanogaster			latest	Chromosome	Major	Full	2014/08/01	Release 6 plus ISO1 MT	The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics	GCA_000001215.4
1	GCF_000001405.40	PRJNA168			reference genome	9606	9606	Homo sapiens			latest	Chromosome	Patch	Full	2022/02/03	GRCh38.p14	Genome Reference Consortium	GCA_000001405.29
2	GCF_000001635.27	PRJNA169			reference genome	10090	10090	Mus musculus			latest	Chromosome	Major	Full	2020/06/24	GRCm39	Genome Reference Consortium	GCA_000001635.9
3	GCF_000001735.4	PRJNA116	SAMN03081427		reference genome	3702	3702	Arabidopsis thaliana	ecotype=Columbia		latest	Chromosome	Minor	Full	2018/03/15	TAIR10.1	The Arabidopsis Information Resource (TAIR)	GCA_000001735.2
4	GCF_000001905.1	PRJNA70973	SAMN02953622	AAGU00000000.3	representative genome	9785	9785	Loxodonta africana		ISIS603380	latest	Scaffold	Major	Full	2009/07/15	Loxafr3.0	Broad Institute	GCA_000001905.1

Use Case: Download an assembly genome

from juno.data import Assembly
assembly = Assembly()
genome_path = assembly.download("GCF_002004995.1", "/your/output/directory")
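
Before requesting a download, it can be useful to check that a string looks like an assembly accession. NCBI assembly accessions have the form `GCF_` (RefSeq) or `GCA_` (GenBank), followed by nine digits, a dot, and a version number. The sketch below is a standalone helper and not part of the Juno API. The function names are hypothetical, and whether `Assembly.download` also accepts `GCA_` accessions is not stated here, so the example only shows how to tell the two apart.

```python
import re

# GCF_ = RefSeq, GCA_ = GenBank; nine digits, then ".<version>".
ASSEMBLY_ACCESSION = re.compile(
    r"^(?P<prefix>GC[AF])_(?P<number>\d{9})\.(?P<version>\d+)$"
)


def parse_assembly_accession(accession):
    """Split an accession into (prefix, number, version); hypothetical helper."""
    match = ASSEMBLY_ACCESSION.match(accession.strip())
    if match is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    return match.group("prefix"), match.group("number"), int(match.group("version"))


def is_refseq(accession):
    """True when the accession carries the RefSeq prefix GCF."""
    return parse_assembly_accession(accession)[0] == "GCF"


if __name__ == "__main__":
    print(parse_assembly_accession("GCF_002004995.1"))  # ('GCF', '002004995', 1)
    print(is_refseq("GCA_000001215.4"))                 # False
```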

Use Case: Simulate reads from a reference genome

from juno.simulator import Simulator
sm = Simulator

Citation
