Juno: read data generator
Juno has two ways to generate FASTQ reads:
- Download real FASTQ files that contributors have submitted to the NCBI SRA
- Simulate "fake" FASTQ reads
If you want to develop genomic tools but have no real data, Juno can generate FASTQ reads for your testing.
Juno is also available as a public online resource: https://juno.hlin.tw
Requirements
- Linux
- Python >= 3.6
Installation
PyPI version
https://pypi.org/project/juno/
pip install juno
Install from source
git clone https://github.com/hunglin59638/juno.git
cd juno
python3 setup.py install
CLI
juno -h
usage: juno [-h] SUBCOMMAND ...
Juno: read data generator
optional arguments:
-h, --help show this help message and exit
subcommands:
subcommands
SUBCOMMAND
sra Download reads from SRA database
simulate Simulating reads by reference genome
Download reads from SRA database
juno sra -a SRR19400588 -o /path/to/directory
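To fetch several runs, the same command can be driven from a script. The sketch below is only an illustration: it uses the `-a` and `-o` flags shown above and nothing else, the helper names `build_sra_command` and `download_runs` are hypothetical, and writing each run to its own sub-directory is a layout choice of this example rather than something Juno requires. By default it only prints the commands (dry run).

```python
import shlex
import subprocess
from pathlib import Path


def build_sra_command(accession, out_dir):
    """Return the argument list for `juno sra` with the -a and -o flags."""
    return ["juno", "sra", "-a", accession, "-o", str(out_dir)]


def download_runs(accessions, out_root, dry_run=True):
    """Build one `juno sra` command per accession and optionally run it."""
    commands = []
    for accession in accessions:
        # One sub-directory per run (an assumption of this sketch).
        cmd = build_sra_command(accession, Path(out_root) / accession)
        commands.append(cmd)
        if dry_run:
            # Print a shell-safe rendering instead of executing.
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            # Raises CalledProcessError if juno exits with a non-zero status.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    download_runs(["SRR19400588"], "/path/to/directory")
```

Pass `dry_run=False` to actually execute the commands once the printed output looks right.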
Simulate FASTQ reads
There are two ways to simulate FASTQ reads:
- Provide your own genome FASTA
juno simulate -r /your/genome/fasta -o /path/to/directory --compressed --depth 200
- Provide a RefSeq assembly accession; the genome is downloaded from NCBI
juno simulate -a GCF_002004995.1 -o /path/to/directory --compressed --depth 200
Tip: a depth greater than 200x is recommended for bacterial genomes.
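To get a feel for what a given `--depth` means in data volume, the standard coverage relation (depth = number of reads × mean read length / genome size) can be worked through directly. The sketch below is generic arithmetic, not Juno's internal logic; the function names are hypothetical, and the 5 Mbp genome and 10,000 bp mean read length are illustrative assumptions, not Juno defaults.

```python
import math


def reads_for_depth(genome_size_bp, depth, mean_read_length_bp):
    """Smallest read count with reads * mean_length >= depth * genome_size."""
    if genome_size_bp <= 0 or depth <= 0 or mean_read_length_bp <= 0:
        raise ValueError("all arguments must be positive")
    total_bases = genome_size_bp * depth
    return math.ceil(total_bases / mean_read_length_bp)


def depth_from_reads(genome_size_bp, n_reads, mean_read_length_bp):
    """Average depth obtained from n_reads of the given mean length."""
    return n_reads * mean_read_length_bp / genome_size_bp


if __name__ == "__main__":
    # Assumed example: 5 Mbp bacterial genome, 200x depth, 10 kbp mean read length.
    n = reads_for_depth(5_000_000, 200, 10_000)
    print(n)                                        # 100000
    print(depth_from_reads(5_000_000, n, 10_000))   # 200.0
```

In this example, 200x over a 5 Mbp genome is 1 Gbp of sequence in total, which is worth keeping in mind when choosing an output directory.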
Update local NCBI RefSeq assembly summary
juno simulate --update
Python API
Use Case: Update the NCBI RefSeq assembly summary and load it locally
from juno.data import Assembly
assembly = Assembly()
assembly.update_assembly()
df = assembly.dataframe
df.head()
| | assembly_accession | bioproject | biosample | wgs_master | refseq_category | taxid | species_taxid | organism_name | infraspecific_name | isolate | version_status | assembly_level | release_type | genome_rep | seq_rel_date | asm_name | submitter | gbrs_paired_asm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GCF_000001215.4 | PRJNA164 | SAMN02803731 | | reference genome | 7227 | 7227 | Drosophila melanogaster | | | latest | Chromosome | Major | Full | 2014/08/01 | Release 6 plus ISO1 MT | The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics | GCA_000001215.4 |
| 1 | GCF_000001405.40 | PRJNA168 | | | reference genome | 9606 | 9606 | Homo sapiens | | | latest | Chromosome | Patch | Full | 2022/02/03 | GRCh38.p14 | Genome Reference Consortium | GCA_000001405.29 |
| 2 | GCF_000001635.27 | PRJNA169 | | | reference genome | 10090 | 10090 | Mus musculus | | | latest | Chromosome | Major | Full | 2020/06/24 | GRCm39 | Genome Reference Consortium | GCA_000001635.9 |
| 3 | GCF_000001735.4 | PRJNA116 | SAMN03081427 | | reference genome | 3702 | 3702 | Arabidopsis thaliana | ecotype=Columbia | | latest | Chromosome | Minor | Full | 2018/03/15 | TAIR10.1 | The Arabidopsis Information Resource (TAIR) | GCA_000001735.2 |
| 4 | GCF_000001905.1 | PRJNA70973 | SAMN02953622 | AAGU00000000.3 | representative genome | 9785 | 9785 | Loxodonta africana | | ISIS603380 | latest | Scaffold | Major | Full | 2009/07/15 | Loxafr3.0 | Broad Institute | GCA_000001905.1 |
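Since the summary is exposed as a pandas DataFrame, it can be filtered to find the accession to pass to `juno simulate -a`. The sketch below is a minimal illustration: it builds a small stand-in frame from four columns of the rows shown above instead of calling `assembly.dataframe`, and the helper `find_accessions` is hypothetical, not part of Juno.

```python
import pandas as pd

# Stand-in for `assembly.dataframe`, limited to four of the columns shown above.
df = pd.DataFrame(
    {
        "assembly_accession": [
            "GCF_000001215.4",
            "GCF_000001405.40",
            "GCF_000001635.27",
            "GCF_000001735.4",
            "GCF_000001905.1",
        ],
        "refseq_category": ["reference genome"] * 4 + ["representative genome"],
        "organism_name": [
            "Drosophila melanogaster",
            "Homo sapiens",
            "Mus musculus",
            "Arabidopsis thaliana",
            "Loxodonta africana",
        ],
        "assembly_level": ["Chromosome"] * 4 + ["Scaffold"],
    }
)


def find_accessions(frame, organism, level=None):
    """Return accessions for an exact organism_name, optionally filtered by assembly_level."""
    mask = frame["organism_name"] == organism
    if level is not None:
        mask = mask & (frame["assembly_level"] == level)
    return frame.loc[mask, "assembly_accession"].tolist()


print(find_accessions(df, "Homo sapiens"))                            # ['GCF_000001405.40']
print(find_accessions(df, "Loxodonta africana", level="Chromosome"))  # []
```

The same function should work on the full summary as long as it carries the `organism_name`, `assembly_level`, and `assembly_accession` columns listed above.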
Use Case: Download assembly genome
from juno.data import Assembly
assembly = Assembly()
genome_path = assembly.download("GCF_002004995.1", "/your/output/directory")
Use Case: Simulate reads from a reference genome
from juno.simulator import Simulator
sm = Simulator
Citation
- pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive (https://f1000research.com/articles/8-532/v1)
- PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores (https://doi.org/10.1093/bioinformatics/btaa835)