
Juno: read data generator


Juno has two ways to generate FASTQ read files:

  1. Download real FASTQ files that contributors have submitted to the NCBI Sequence Read Archive (SRA)
  2. Simulate synthetic ("fake") FASTQ reads from a reference genome

If you are developing genomic tools but have no real data, Juno can generate FASTQ reads for testing.
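Both methods produce FASTQ files. For reference, a FASTQ record is four lines: an `@` header with the read ID, the bases, a `+` separator, and one quality character per base. The sketch below formats and parses such records; it assumes the Phred+33 quality encoding and single-line sequences, and `format_fastq_record` / `parse_fastq` are hypothetical helpers, not part of Juno.

```python
from typing import List, Tuple


def format_fastq_record(read_id: str, seq: str, phred_scores: List[int]) -> str:
    """Render one read as a four-line FASTQ record (Phred+33 qualities assumed)."""
    if len(seq) != len(phred_scores):
        raise ValueError("sequence and quality lengths must match")
    qual = "".join(chr(q + 33) for q in phred_scores)
    return f"@{read_id}\n{seq}\n+\n{qual}\n"


def parse_fastq(text: str) -> List[Tuple[str, str, List[int]]]:
    """Parse FASTQ text into (read_id, sequence, phred_scores) tuples.

    Assumes each record occupies exactly four lines (no wrapped sequences).
    """
    lines = text.strip("\n").split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of four lines")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, sep, qual = lines[i:i + 4]
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        records.append((header[1:], seq, [ord(c) - 33 for c in qual]))
    return records


if __name__ == "__main__":
    print(format_fastq_record("read_1", "ACGT", [40, 40, 30, 20]), end="")
    # @read_1
    # ACGT
    # +
    # II?5
```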

Juno is also available as a public online resource: https://juno.hlin.tw

Requirements

  • Linux
  • Python >= 3.6

Installation

PyPI version

https://pypi.org/project/juno/

pip install juno

Install from source

git clone https://github.com/hunglin59638/juno.git
cd juno
python3 setup.py install

CLI

juno -h 
usage: juno [-h] SUBCOMMAND ...

Juno: read data generator

optional arguments:
  -h, --help  show this help message and exit

subcommands:
  subcommands

  SUBCOMMAND
    sra       Download reads from SRA database
    simulate  Simulating reads by reference genome

Download reads from SRA database

juno sra -a SRR19400588 -o /path/to/directory
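To drive the SRA download from a Python pipeline, one option is to call the CLI through `subprocess`. This is a minimal sketch that uses only the `-a` and `-o` flags shown above and assumes `juno` is on your `PATH`; `build_sra_command` and `download_sra` are hypothetical helper names, not part of Juno's API.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_sra_command(accession: str, outdir: str) -> List[str]:
    """Build the `juno sra` command line from an SRA run accession."""
    return ["juno", "sra", "-a", accession, "-o", str(Path(outdir))]


def download_sra(accession: str, outdir: str) -> None:
    """Run `juno sra`; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_sra_command(accession, outdir), check=True)


if __name__ == "__main__":
    cmd = build_sra_command("SRR19400588", "/path/to/directory")
    # shlex.quote keeps the printed command copy-pasteable (works on Python 3.6+).
    print(" ".join(shlex.quote(part) for part in cmd))
    # juno sra -a SRR19400588 -o /path/to/directory
```

Passing the arguments as a list (rather than a single shell string) avoids shell-quoting problems with unusual directory names.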

Simulate FASTQ reads

There are two ways to simulate FASTQ reads:

  1. Input your own genome FASTA file:
     juno simulate -r /your/genome/fasta -o /path/to/directory --compressed --depth 200
  2. Input a RefSeq assembly accession; its genome will be downloaded from NCBI:
     juno simulate -a GCF_002004995.1 -o /path/to/directory --compressed --depth 200
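The two input modes above can be wrapped in one Python helper that builds the `juno simulate` command. This sketch uses only the flags shown above (`-r`, `-a`, `-o`, `--compressed`, `--depth`); treating `-r` and `-a` as mutually exclusive is a choice made here, not something the README states, and `build_simulate_command` / `run_simulate` are hypothetical names.

```python
import subprocess
from typing import List, Optional


def build_simulate_command(
    outdir: str,
    reference: Optional[str] = None,
    accession: Optional[str] = None,
    depth: int = 200,
    compressed: bool = True,
) -> List[str]:
    """Build a `juno simulate` command from either a FASTA path or an accession."""
    # The README presents the two input modes as alternatives, so require exactly one.
    if (reference is None) == (accession is None):
        raise ValueError("pass exactly one of reference= or accession=")
    cmd = ["juno", "simulate"]
    cmd += ["-r", reference] if reference is not None else ["-a", accession]
    cmd += ["-o", outdir]
    if compressed:
        cmd.append("--compressed")
    cmd += ["--depth", str(depth)]
    return cmd


def run_simulate(**kwargs) -> None:
    """Run the command; raises subprocess.CalledProcessError on failure."""
    subprocess.run(build_simulate_command(**kwargs), check=True)
```

For example, `build_simulate_command("/path/to/directory", accession="GCF_002004995.1")` reproduces the second command above as an argument list.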

Tip: for bacterial genomes, a depth greater than 200x gives better results.
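The README does not say how Juno converts `--depth` into a number of reads, but the usual relationship is mean depth = (number of reads × read length) / genome length. Solving for the read count gives a quick way to estimate output size. A minimal sketch; `reads_for_depth` is a hypothetical helper and the 150 bp read length in the example is an assumption.

```python
import math


def reads_for_depth(genome_size: int, depth: float, read_length: int, paired: bool = False) -> int:
    """Reads (or read pairs, if paired=True) needed to reach a mean depth.

    Uses depth = reads * bases_per_read / genome_size, solved for reads and
    rounded up so the target depth is not undershot.
    """
    if genome_size <= 0 or read_length <= 0 or depth <= 0:
        raise ValueError("genome_size, read_length and depth must be positive")
    bases_per_unit = read_length * (2 if paired else 1)  # a pair contributes two reads
    return math.ceil(depth * genome_size / bases_per_unit)


if __name__ == "__main__":
    # A hypothetical 5 Mb bacterial genome at 200x with 150 bp reads:
    print(reads_for_depth(5_000_000, 200, 150))               # 6666667
    print(reads_for_depth(5_000_000, 200, 150, paired=True))  # 3333334
```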

Update local NCBI RefSeq assembly summary

juno simulate --update

Python API

Use Case: Update the local copy of the NCBI RefSeq assembly summary and load it

from juno.data import Assembly
assembly = Assembly()
assembly.update_assembly()
df = assembly.dataframe
df.head()
	assembly_accession	bioproject	biosample	wgs_master	refseq_category	taxid	species_taxid	organism_name	infraspecific_name	isolate	version_status	assembly_level	release_type	genome_rep	seq_rel_date	asm_name	submitter	gbrs_paired_asm
0	GCF_000001215.4	PRJNA164	SAMN02803731		reference genome	7227	7227	Drosophila melanogaster			latest	Chromosome	Major	Full	2014/08/01	Release 6 plus ISO1 MT	The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics	GCA_000001215.4
1	GCF_000001405.40	PRJNA168			reference genome	9606	9606	Homo sapiens			latest	Chromosome	Patch	Full	2022/02/03	GRCh38.p14	Genome Reference Consortium	GCA_000001405.29
2	GCF_000001635.27	PRJNA169			reference genome	10090	10090	Mus musculus			latest	Chromosome	Major	Full	2020/06/24	GRCm39	Genome Reference Consortium	GCA_000001635.9
3	GCF_000001735.4	PRJNA116	SAMN03081427		reference genome	3702	3702	Arabidopsis thaliana	ecotype=Columbia		latest	Chromosome	Minor	Full	2018/03/15	TAIR10.1	The Arabidopsis Information Resource (TAIR)	GCA_000001735.2
4	GCF_000001905.1	PRJNA70973	SAMN02953622	AAGU00000000.3	representative genome	9785	9785	Loxodonta africana		ISIS603380	latest	Scaffold	Major	Full	2009/07/15	Loxafr3.0	Broad Institute	GCA_000001905.1

Use Case: Download assembly genome

from juno.data import Assembly
assembly = Assembly()
genome_path = assembly.download("GCF_002004995.1", "/your/output/directory")
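`assembly.download` and `juno simulate -a` take an NCBI assembly accession such as `GCF_002004995.1`. These accessions consist of a `GCF_` (RefSeq) or `GCA_` (GenBank) prefix, nine digits, and a version suffix, so a cheap format check can catch typos before any network call. A minimal sketch; `parse_assembly_accession` is a hypothetical helper, and requiring the version suffix is a choice made here, not a rule the README says Juno enforces.

```python
import re
from typing import NamedTuple

# Prefix (GCA_ = GenBank, GCF_ = RefSeq), nine digits, then ".version".
_ASSEMBLY_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


class AssemblyAccession(NamedTuple):
    prefix: str
    number: str
    version: int

    @property
    def is_refseq(self) -> bool:
        return self.prefix == "GCF"


def parse_assembly_accession(acc: str) -> AssemblyAccession:
    """Split an accession like 'GCF_002004995.1' into its parts, or raise ValueError."""
    m = _ASSEMBLY_RE.match(acc.strip())
    if m is None:
        raise ValueError(f"not an NCBI assembly accession: {acc!r}")
    return AssemblyAccession(m.group(1), m.group(2), int(m.group(3)))
```

Since the CLI describes `-a` as taking a RefSeq accession, checking `parse_assembly_accession(acc).is_refseq` first is a reasonable guard; a `GCA_` accession can often be mapped to its RefSeq pair through the `gbrs_paired_asm` column shown above.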

Use Case: Simulate reads from a reference genome

from juno.simulator import Simulator
sm = Simulator
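The `Simulator` constructor arguments are not documented here, so rather than guess at them, the stand-alone toy below illustrates the general idea of simulating reads from a reference: start positions are drawn uniformly, each read comes from a randomly chosen strand, the read count follows from depth × genome length / read length, and every base gets a constant quality. This is not Juno's algorithm; `simulate_reads` and `to_fastq` are hypothetical names, and real simulators also model sequencing errors and realistic quality profiles.

```python
import math
import random
from typing import Iterable, Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def simulate_reads(
    genome: str,
    depth: float,
    read_length: int,
    seed: int = 0,
    quality: int = 40,
) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) for error-free single-end reads."""
    genome = genome.upper()
    if read_length > len(genome):
        raise ValueError("read_length exceeds genome length")
    rng = random.Random(seed)  # seeded so test data is reproducible
    n_reads = math.ceil(depth * len(genome) / read_length)
    qual = chr(quality + 33) * read_length  # Phred+33, same score for every base
    for i in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        seq = genome[start:start + read_length]
        if rng.random() < 0.5:
            seq = seq.translate(_COMPLEMENT)[::-1]  # take the reverse strand
        yield f"toy_read_{i + 1}", seq, qual


def to_fastq(records: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (read_id, sequence, quality_string) tuples as FASTQ text."""
    return "".join(f"@{rid}\n{seq}\n+\n{q}\n" for rid, seq, q in records)


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTTAC" * 10  # hypothetical 160 bp reference
    fastq = to_fastq(simulate_reads(ref, depth=5, read_length=20))
    print(fastq.count("\n") // 4)  # 40 reads = ceil(5 * 160 / 20)
```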

Citation



Download files


Source Distribution

bio-juno-1.0.0.tar.gz (27.4 MB)

Uploaded Source

Built Distribution


bio_juno-1.0.0-py3-none-any.whl (27.9 MB)

Uploaded Python 3

File details

Details for the file bio-juno-1.0.0.tar.gz.

File metadata

  • Download URL: bio-juno-1.0.0.tar.gz
  • Size: 27.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.5

File hashes

Hashes for bio-juno-1.0.0.tar.gz
Algorithm Hash digest
SHA256 20ccaa920926d691d09b7ce129d0ab52e8e03ca025ff175b86e0c1b01e86155d
MD5 d2ffb3b6ace76bf6fedcd6ebb2c569cd
BLAKE2b-256 aab21d5a2cc5ecfb34aad2ef95aa8b03197e5852f02b4bab8a3f620be1e152d7


File details

Details for the file bio_juno-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: bio_juno-1.0.0-py3-none-any.whl
  • Size: 27.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.5

File hashes

Hashes for bio_juno-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 62f28e12586256bde8416411b1d5616aab15918a7c6b28d9844b4cc93071517b
MD5 0167fc8781692ded9cbbe2e2427e52ff
BLAKE2b-256 73c2f3855e31c6b803e1b907f1d78aedcd4f5ffc987a515f5744f302db2a8368

