
Random generation of genetic files

Project description

Genetic data files generator for testing purposes

biophony is a package for generating random genetic data files intended specifically for testing and validation. Real genetic data is often too large, lacks flexibility, or raises privacy concerns, making it unsuitable for thorough testing. biophony makes it simpler to test software against controlled scenarios (file sizes, read counts, variant rates) without needing real data, enabling focused and efficient development and validation.

Installation

biophony requires Python 3.11 or later.

To install with pip, run:

pip install biophony

Usage

Command Line Interfaces

biophony provides the following CLIs to generate data:

  • gen-cov: generates a BED file with custom depth,
  • gen-fasta: generates a FASTA file with a custom-size sequence,
  • gen-fastavar: generates a FASTA file with custom-size sequences, each with n variants, with control over insertion, deletion and mutation rates,
  • gen-fastq: generates a FASTQ file with custom read count and size,
  • gen-vcf: generates a VCF file from a FASTA file, with control over insertion, deletion and mutation rates.

CLIs that read and/or write data use stdin and stdout by default, so operations can be chained with the pipe operator |.
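Because the CLIs follow this stdin/stdout convention, the same chaining can also be driven from a Python test suite using only the standard library. The helper below is a hypothetical sketch (run_pipeline is not part of biophony) that connects subprocess.Popen objects the way the shell connects commands with |.

```python
import subprocess
from collections.abc import Sequence


def run_pipeline(commands: Sequence[Sequence[str]]) -> bytes:
    """Run commands chained like `cmd1 | cmd2 | ...` and return the stdout
    of the last one. Raise CalledProcessError if any command fails.

    Hypothetical helper, not part of biophony."""
    if not commands:
        raise ValueError("at least one command is required")
    procs: list[subprocess.Popen] = []
    for cmd in commands:
        # The first command gets an empty stdin; each later command reads
        # the previous command's stdout.
        stdin = procs[-1].stdout if procs else subprocess.DEVNULL
        proc = subprocess.Popen(cmd, stdin=stdin, stdout=subprocess.PIPE)
        if procs:
            # Drop the parent's copy of the pipe so the upstream process
            # receives SIGPIPE if the downstream one exits early.
            procs[-1].stdout.close()
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for proc, cmd in zip(procs, commands):
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, cmd)
    return output
```

Assuming both CLIs are on PATH, run_pipeline([["gen-fasta"], ["gen-vcf"]]) would return the generated VCF as bytes, equivalent to running gen-fasta | gen-vcf in a shell.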

For example, run the following command to generate a VCF with a 2% SNP rate, a 1% insertion rate and a 1% deletion rate:

gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01
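The rate options are presumably per-base fractions (0.02 for 2%). To illustrate what such rates mean, here is a stdlib sketch that draws at most one independent event per position and emits VCF-style (POS, REF, ALT) tuples. It is a simplifying assumption, not the algorithm used by gen-vcf or mutation-simulator; simulate_variants and Variant are hypothetical names.

```python
import random
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def simulate_variants(seq: str, snp_rate: float, ins_rate: float,
                      del_rate: float, rng: random.Random) -> list[Variant]:
    """Draw at most one event per position using per-base rates.

    Hypothetical sketch, not biophony's algorithm."""
    if snp_rate + ins_rate + del_rate > 1:
        raise ValueError("rates must sum to at most 1")
    variants = []
    i = 0
    while i < len(seq):
        ref = seq[i]
        r = rng.random()
        if r < snp_rate:
            # SNP: replace the base with a different nucleotide.
            alt = rng.choice([b for b in "ACGT" if b != ref])
            variants.append(Variant(i + 1, ref, alt))
        elif r < snp_rate + ins_rate:
            # VCF insertion: ALT repeats the anchor base, then the new base.
            variants.append(Variant(i + 1, ref, ref + rng.choice("ACGT")))
        elif r < snp_rate + ins_rate + del_rate and i + 1 < len(seq):
            # VCF deletion: REF spans the anchor base plus the deleted base.
            variants.append(Variant(i + 1, seq[i:i + 2], ref))
            i += 1  # skip the deleted base so records never overlap
        i += 1
    return variants
```

With these rates, a 10,000 bp sequence would yield on the order of 200 SNPs, 100 insertions and 100 deletions, though the exact counts vary from draw to draw.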

To save the generated content, either redirect stdout to a file with the > operator or use the dedicated -o option:

gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01 > test.vcf  # redirect
gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01 -o test.vcf  # dedicated option
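To sanity-check a generated file such as test.vcf in a test, you can count records by type. The sketch below is a stdlib-only assumption rather than part of biophony: it classifies each REF/ALT pair by allele length following VCF conventions, and the function name count_variant_types is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_variant_types(vcf_lines: Iterable[str]) -> Counter:
    """Classify VCF data lines as SNP, INS, DEL or OTHER by allele length.

    Hypothetical helper, not part of biophony."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # REF and ALT columns
        for alt in alts:
            if alt.startswith("<") or alt == "*":
                counts["OTHER"] += 1  # symbolic or overlapping-deletion allele
            elif len(ref) == len(alt) == 1:
                counts["SNP"] += 1
            elif len(alt) > len(ref):
                counts["INS"] += 1
            elif len(alt) < len(ref):
                counts["DEL"] += 1
            else:
                counts["OTHER"] += 1  # equal-length multi-base substitution
    return counts
```

For example, with open("test.vcf") as fh: print(count_variant_types(fh)) prints the counts for the file saved above; since the data is random, compare the counts to the requested rates with a tolerance rather than exactly.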

Python API

You can also use the Python API to generate random genetic data files in your scripts.

Link to the Python API documentation: https://cnrgh.gitlab.io/databases/biophony/.

NEWS

1.2.0 - 2024-11-18

  • Provide type annotations by adding a py.typed marker file.
  • Export MutSim class to allow generation of VCF files in scripts.
  • Generate documentation with Sphinx and sphinx-autoapi.
  • Improve README.md.

1.1.0 - 2024-08-10

  • Allow gzipped output with gen-fasta and gen-fastq.
  • New gen-fastq script to generate FASTQ files.

1.0.1 - 2024-05-30

  • gen-vcf: fix how the file path is passed to mutation-simulator.
  • gen-fasta: disable header tag line (comment line) by default.



Download files


Source Distribution

biophony-1.2.0.tar.gz (36.1 kB)


Built Distribution

biophony-1.2.0-py3-none-any.whl (37.0 kB)


File details

Details for the file biophony-1.2.0.tar.gz.

File metadata

  • Download URL: biophony-1.2.0.tar.gz
  • Upload date:
  • Size: 36.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for biophony-1.2.0.tar.gz:

  • SHA256: f49fa8b9813ebdd55750cb49f533322ae268402daf8de79c5693cec2442296f7
  • MD5: 4867a80d958acbbafffd8159c52a9aec
  • BLAKE2b-256: 8f5af4fcc3fcc086c2df2b0a670bbb465743fa08063aaf9c7727c57869d5ea8f


File details

Details for the file biophony-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: biophony-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 37.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for biophony-1.2.0-py3-none-any.whl:

  • SHA256: 991c8ed20681fc5e1f09dff07f57f4d470f8a871b0b3e7475538265dce413ba2
  • MD5: f3cf64399b1ab69514bbf28fbdb3b33a
  • BLAKE2b-256: 9ca1d7c14e2ef823a112633396737bb358823e1ced1eecca12b23de85176a24c

