Random generation of genetic files
Genetic data files generator for testing purposes
biophony is a package for generating random genetic data files intended specifically for testing and validation.
Real genetic data is often too large, lacks flexibility, or raises privacy concerns, making it unsuitable for thorough testing.
biophony makes it simpler to test software in different scenarios without needing real data, enabling focused and efficient development and validation.
Installation
biophony requires Python 3.11 or later.
To install with pip, run:
pip install biophony
Usage
Command Line Interfaces
biophony provides the following CLIs to generate data:
- gen-cov: generates a BED file with custom depth,
- gen-fasta: generates a FASTA file with a custom size sequence,
- gen-fastavar: generates a FASTA file with custom size sequences, each with n variants, with control over insertion, deletion and mutation rate,
- gen-fastq: generates a FASTQ file with custom read count and size,
- gen-vcf: generates a VCF file from a FASTA file, with control over insertion, deletion and mutation rate.
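When a test consumes the output of one of these generators, it can help to check the file against the parameters that were requested. The following is a minimal sketch in plain Python, assuming the common four-line FASTQ record layout (header, sequence, + separator, qualities) with no line wrapping. The summarize_fastq helper is hypothetical and is not part of biophony.

```python
from collections import Counter
from typing import Iterable


def summarize_fastq(lines: Iterable[str]) -> tuple[int, Counter]:
    """Return (read count, Counter of read lengths) for four-line FASTQ records.

    Hypothetical helper for tests; assumes unwrapped four-line records.
    """
    # Drop line endings and ignore blank lines (e.g. a trailing newline).
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")

    lengths: Counter = Counter()
    for start in range(0, len(rows), 4):
        header, seq, sep, qual = rows[start:start + 4]
        record_no = start // 4 + 1
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed FASTQ record number {record_no}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in record {record_no}")
        lengths[len(seq)] += 1
    return len(rows) // 4, lengths
```

A test can then assert that the read count and the read lengths match the values passed to the generator.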
CLIs that read and/or write data do so on stdin and stdout by default, which makes it possible to chain operations with the pipe operator |.
For example, run the following command to generate a VCF with 2% SNP, 1% INS and 1% DEL:
gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01
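To see whether a generated VCF roughly reflects the requested mix of variants, the records can be classified by comparing the REF and ALT columns. This is a minimal sketch, assuming tab-separated VCF records whose alleles are plain sequences (symbolic alleles such as <DEL> are not handled). The count_variant_types helper is hypothetical and is not part of biophony.

```python
from collections import Counter
from typing import Iterable


def count_variant_types(vcf_lines: Iterable[str]) -> Counter:
    """Count SNP / INS / DEL alleles by comparing REF and ALT lengths.

    Hypothetical helper; assumes plain-sequence alleles only.
    """
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]  # VCF columns: CHROM POS ID REF ALT ...
        for allele in alt.split(","):  # ALT may list several alleles
            if len(ref) == 1 and len(allele) == 1:
                counts["SNP"] += 1
            elif len(allele) > len(ref):
                counts["INS"] += 1
            elif len(allele) < len(ref):
                counts["DEL"] += 1
            else:
                counts["OTHER"] += 1  # e.g. multi-base substitutions
    return counts
```

Comparing these counts with the length of the input sequence gives a rough sanity check against the requested rates. How biophony defines each rate exactly is not covered by this sketch.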
To save the generated content, you can either use the regular output operator > to redirect stdout to a file or use the dedicated -o option:
gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01 > test.vcf # redirect
gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01 -o test.vcf # dedicated option
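The same chaining can be driven from a Python test harness with the standard subprocess module. The sketch below is one possible way to do it and does not use the biophony Python API. The run_pipeline helper is hypothetical, and the commented usage assumes the biophony CLIs are on the PATH.

```python
import subprocess
from typing import Sequence


def run_pipeline(commands: Sequence[Sequence[str]]) -> bytes:
    """Run commands as a shell-style pipeline and return the last stdout."""
    if not commands:
        raise ValueError("at least one command is required")

    procs: list[subprocess.Popen] = []
    prev_stdout = None
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # Close our copy so the upstream process sees a broken pipe
            # if the downstream process exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)

    output, _ = procs[-1].communicate()
    for cmd, proc in zip(commands, procs):
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, list(cmd))
    return output


# Usage, assuming biophony is installed (flags taken from the commands above):
# vcf = run_pipeline([
#     ["gen-fasta"],
#     ["gen-vcf", "--snp-rate", "0.02", "--ins-rate", "0.01", "--del-rate", "0.01"],
# ])
# open("test.vcf", "wb").write(vcf)
```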
Python API
You can also use the Python API to generate random genetic data files in your scripts.
Link to the Python API documentation: https://cnrgh.gitlab.io/databases/biophony/.
NEWS
1.2.0 - 2024-11-18
- Provide type annotations by adding a py.typed marker file.
- Export MutSim class to allow generation of VCF files in scripts.
- Generate documentation with sphinx and sphinx-autoapi.
- Improve README.md.
1.1.0 - 2024-08-10
- Allow gzipped output with gen-fasta and gen-fastq.
- New gen-fastq script to generate FASTQ files.
1.0.1 - 2024-05-30
- gen-vcf: correct usage of file path to pass to mutation-simulator.
- gen-fasta: disable header tag line (comment line) by default.
File details
Details for the file biophony-1.2.0.tar.gz.
File metadata
- Download URL: biophony-1.2.0.tar.gz
- Upload date:
- Size: 36.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | f49fa8b9813ebdd55750cb49f533322ae268402daf8de79c5693cec2442296f7
MD5 | 4867a80d958acbbafffd8159c52a9aec
BLAKE2b-256 | 8f5af4fcc3fcc086c2df2b0a670bbb465743fa08063aaf9c7727c57869d5ea8f
File details
Details for the file biophony-1.2.0-py3-none-any.whl.
File metadata
- Download URL: biophony-1.2.0-py3-none-any.whl
- Upload date:
- Size: 37.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 991c8ed20681fc5e1f09dff07f57f4d470f8a871b0b3e7475538265dce413ba2
MD5 | f3cf64399b1ab69514bbf28fbdb3b33a
BLAKE2b-256 | 9ca1d7c14e2ef823a112633396737bb358823e1ced1eecca12b23de85176a24c
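A downloaded distribution file can be checked against the SHA256 digests listed above with the standard hashlib module. This is a minimal sketch: the helper names are hypothetical, and the local file path in the commented usage is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the source distribution was saved in the current directory:
# ok = matches_published_digest(
#     Path("biophony-1.2.0.tar.gz"),
#     "f49fa8b9813ebdd55750cb49f533322ae268402daf8de79c5693cec2442296f7",
# )
```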