Random generation of genetic files

Project description

Genetic data files generator for testing purposes

biophony is a package for generating random genetic data files intended specifically for testing and validation. Real genetic data is often too large, lacks flexibility, or raises privacy concerns, making it unsuitable for thorough testing. biophony makes it simpler to test software in different scenarios without needing real data, enabling focused and efficient development and validation.

Installation

biophony requires Python 3.11 or later.

To install with pip, run:

pip install biophony

Usage

Command Line Interfaces

biophony provides the following CLIs to generate data:

  • gen-cov: generates a BED file with custom depth,
  • gen-fasta: generates a FASTA file with a custom size sequence,
  • gen-fastavar: generates a FASTA file with custom size sequences, each with n variants with control over insertion, deletion and mutation rate,
  • gen-fastq: generates a FASTQ file with custom read count and size,
  • gen-vcf: generates a VCF file from a FASTA file, with control over insertion, deletion and mutation rate.

CLIs that read and/or write data use stdin and stdout by default, so operations can be chained with the pipe operator |.

For example, run the following command to generate a VCF with a 2% SNP rate, a 1% insertion (INS) rate and a 1% deletion (DEL) rate:

gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01
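
When testing, it can help to check what kinds of variants a generated VCF actually contains. The following is a minimal sketch, not part of biophony: the function name count_variant_types is hypothetical, and it assumes explicit allele sequences in the REF and ALT columns (symbolic alleles such as <DEL> are not handled). It classifies each record by comparing REF and ALT lengths.

```python
from collections import Counter
from typing import Iterable


def count_variant_types(vcf_lines: Iterable[str]) -> Counter:
    """Count SNP, INS and DEL alleles in VCF text by comparing REF and ALT lengths.

    Hypothetical helper for illustration; it is not provided by biophony.
    """
    counts: Counter = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and header lines (meta-information and column header).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, ...
        ref, alts = fields[3], fields[4]
        # ALT may list several comma-separated alleles; classify each one.
        for alt in alts.split(","):
            if len(ref) == 1 and len(alt) == 1:
                counts["SNP"] += 1
            elif len(alt) > len(ref):
                counts["INS"] += 1
            elif len(alt) < len(ref):
                counts["DEL"] += 1
            else:
                counts["OTHER"] += 1
    return counts
```

A script that calls count_variant_types(sys.stdin) can be placed at the end of the pipe above to print the counts for the generated file.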

To save the generated content, either redirect stdout to a file with the regular output operator > or use the dedicated -o option:

gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01 > test.vcf  # redirect
gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01 -o test.vcf  # dedicated option
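
The same chaining can be driven from a Python test script without a shell. The sketch below relies only on the standard subprocess module; run_pipeline is a hypothetical helper, not part of biophony, and it assumes the commands are available on PATH.

```python
import subprocess
from typing import Sequence


def run_pipeline(commands: Sequence[Sequence[str]]) -> bytes:
    """Run commands as a pipeline (cmd1 | cmd2 | ...) and return the last stdout.

    Hypothetical helper for illustration; it is not provided by biophony.
    """
    if not commands:
        raise ValueError("at least one command is required")

    processes = []
    previous_stdout = None
    for command in commands:
        process = subprocess.Popen(
            list(command),
            stdin=previous_stdout,
            stdout=subprocess.PIPE,
        )
        # Close our copy of the upstream pipe so the upstream process
        # notices a closed pipe if the downstream one exits early.
        if previous_stdout is not None:
            previous_stdout.close()
        previous_stdout = process.stdout
        processes.append(process)

    # Read everything the last command writes, then check every exit code.
    output, _ = processes[-1].communicate()
    for process, command in zip(processes, commands):
        returncode = process.wait()
        if returncode != 0:
            raise subprocess.CalledProcessError(returncode, list(command))
    return output


# Example call, using the options shown above:
# vcf = run_pipeline([
#     ["gen-fasta"],
#     ["gen-vcf", "--snp-rate", "0.02", "--ins-rate", "0.01", "--del-rate", "0.01"],
# ])
```

Returning bytes keeps the helper usable for compressed output as well as plain text.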

Python API

You can also use the Python API to generate random genetic data files in your scripts.

Link to the Python API documentation: https://cnrgh.gitlab.io/databases/biophony/.

NEWS

1.2.1 - 2024-11-18

  • Fix the "Invalid cross-device link" error raised by gen-vcf when /tmp and the parent directory of the VCF output (--output-vcf option) were not on the same partition or filesystem. This error occurred in the GitLab CI pipeline.

1.2.0 - 2024-11-18

  • Provide type annotations by adding a py.typed marker file.
  • Export MutSim class to allow generation of VCF files in scripts.
  • Generate documentation with sphinx and sphinx-autoapi.
  • Improve README.md.

1.1.0 - 2024-08-10

  • Allow gen-fasta and gen-fastq to output gzipped files.
  • New gen-fastq script to generate FASTQ files.

1.0.1 - 2024-05-30

  • gen-vcf: correct usage of file path to pass to mutation-simulator.
  • gen-fasta: disable header tag line (comment line) by default.
