Synthetic data generator for snail mutation survey

Snailz

These synthetic data generators model genomic analysis of snails in the Pacific Northwest that are growing to unusual size as a result of exposure to pollution.

  • A grid is created to record the pollution levels at a sampling site.
  • One or more specimens are collected from the grid. Each specimen has a genome and a mass.
  • Laboratory staff design and perform assays of those genomes.
  • Each assay is represented by a design file and an assay file.
  • Assay files are mangled to create raw files with formatting glitches.
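The last step above can be illustrated with a minimal sketch. The README does not say which formatting glitches snailz introduces, so the two glitches below (stray padding around fields and extra blank lines) and the function names mangle and unmangle are hypothetical, chosen only to show how a tidy file can be turned into a messy one that is still recoverable.

```python
import random


def mangle(lines, seed):
    """Return a copy of tidy CSV lines with hypothetical formatting glitches.

    ASSUMPTION: these glitches are illustrative only; they are not
    taken from the snailz source code.
    """
    rng = random.Random(seed)
    result = []
    for line in lines:
        fields = line.split(",")
        # Hypothetical glitch 1: pad some fields with stray spaces.
        fields = [f" {f} " if rng.random() < 0.3 else f for f in fields]
        result.append(",".join(fields))
        # Hypothetical glitch 2: occasionally follow a row with a blank line.
        if rng.random() < 0.2:
            result.append("")
    return result


def unmangle(lines):
    """Undo the glitches above: drop blank lines and strip field padding."""
    return [
        ",".join(field.strip() for field in line.split(","))
        for line in lines
        if line.strip()
    ]


# Made-up assay readings, used only to exercise the two functions.
tidy = ["id,reading", "A1,0.25", "B1,1.50"]
raw = mangle(tidy, seed=42)
print(unmangle(raw) == tidy)
```

Seeding the generator keeps the glitches reproducible, which matches the way every other stage takes an RNG seed.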

Usage

  1. Create a fresh Python environment: uv venv
  2. Activate that environment: source .venv/bin/activate
  3. Install the dependencies and an editable version of the package: uv pip install -e '.[dev]'
  4. View available commands: doit list or snailz --help
  5. Regenerate all data in ./tmp using the parameters in ./params: doit all

Parameters

./params contains the parameter files used to control generation of the reference dataset.

  • grid.json
    • depth: integer range of random values in cells
    • seed: RNG seed
    • size: width and height of (square) grid in cells
  • people.json
    • locale: language and region to use for name generation
    • number: number of staff to create
    • seed: RNG seed
  • specimens.json
    • length: genome length in characters
    • max_mass: maximum specimen mass
    • min_mass: minimum specimen mass
    • mut_scale: scaling factor for mutated specimens
    • mutations: number of mutations to introduce
    • number: number of specimens to create
    • seed: RNG seed
  • assays.json
    • baseline: assay response for unmutated specimens
    • end_date: date of final assay
    • mutant: assay response for mutated specimens
    • noise: noise to add to control cells
    • plate_size: width and height of assay plate
    • seed: RNG seed
    • start_date: date of first assay

Note: there are no parameters for assay file mangling.
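As a rough illustration of how one of these files might be read and checked, the sketch below loads a specimens.json file and confirms that it has exactly the keys listed above. The helper load_params and all of the sample values are assumptions made for this example; snailz may read and validate its parameters differently.

```python
import json
import tempfile
from pathlib import Path

# Keys documented above for specimens.json.
SPECIMEN_KEYS = {
    "length", "max_mass", "min_mass", "mut_scale",
    "mutations", "number", "seed",
}


def load_params(path, expected_keys):
    """Read a JSON parameter file and check that its keys match.

    ASSUMPTION: this helper is hypothetical and not part of snailz.
    """
    params = json.loads(Path(path).read_text())
    missing = set(expected_keys) - set(params)
    extra = set(params) - set(expected_keys)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    return params


# Made-up values, written to a temporary file for demonstration only.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "specimens.json"
    path.write_text(json.dumps({
        "length": 20, "max_mass": 10.0, "min_mass": 1.0, "mut_scale": 2.0,
        "mutations": 3, "number": 5, "seed": 1234,
    }))
    params = load_params(path, SPECIMEN_KEYS)
    print(sorted(params))
```

Checking for unexpected keys as well as missing ones catches misspelled parameter names before any data is generated.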

Data Dictionary

doit all creates these files in ./tmp using the sample parameters in ./params:

  • assays/
    • NNNNNN_assay.csv: tidy, consistently-formatted CSV file with assay result.
    • NNNNNN_design.csv: tidy, consistently-formatted CSV file with assay design.
    • NNNNNN_raw.csv: CSV file derived from NNNNNN_assay.csv with randomly-introduced formatting errors.
  • assays.csv: CSV file containing a summary of assay metadata, with the following columns.
    • ident: assay identifier (integer).
    • specimen_id: specimen identifier (text).
    • performed: assay date (date).
    • performed_by: person identifier (text).
  • assays.json: all assay data in JSON format.
  • grid.csv: CSV file containing pollution grid values.
    • This file is a matrix of values with no column IDs or row IDs.
  • grid.json: grid data as JSON.
  • people.csv: CSV file describing experimental staff members.
    • ident: person identifier (text).
    • personal: personal name (text).
    • family: family name (text).
  • people.json: staff member data in JSON format.
  • specimens.csv: CSV file containing details of snail specimens.
    • ident: specimen identifier (text).
    • x: X coordinate of collection cell (integer).
    • y: Y coordinate of collection cell (integer).
    • genome: base sequence (text).
    • mass: snail mass (real).
  • specimens.json: specimen data in JSON format.
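The sketch below shows one way the files described above could be read with the standard library: grid.csv as a header-less matrix and specimens.csv as typed records, followed by a lookup of the pollution level in each specimen's collection cell. The inline sample data is made up, and two things are assumptions rather than documented behavior: that grid cells hold integers, and that rows are indexed by y and columns by x.

```python
import csv
import io


def read_grid(stream):
    """Read a header-less CSV matrix into a list of rows of integers."""
    return [[int(value) for value in row] for row in csv.reader(stream) if row]


def read_specimens(stream):
    """Read specimen records, converting x, y, and mass to numbers."""
    return [
        {
            "ident": row["ident"],
            "x": int(row["x"]),
            "y": int(row["y"]),
            "genome": row["genome"],
            "mass": float(row["mass"]),
        }
        for row in csv.DictReader(stream)
    ]


def pollution_at(grid, specimen):
    """Look up the grid value for a specimen's collection cell.

    ASSUMPTION: rows are indexed by y and columns by x; check the
    generated data before relying on this orientation.
    """
    return grid[specimen["y"]][specimen["x"]]


# Made-up sample data standing in for tmp/grid.csv and tmp/specimens.csv.
grid = read_grid(io.StringIO("0,1,2\n3,4,5\n6,7,8\n"))
specimens = read_specimens(io.StringIO(
    "ident,x,y,genome,mass\n"
    "S001,2,0,ACGT,1.5\n"
    "S002,0,1,ACGA,3.0\n"
))
for specimen in specimens:
    print(specimen["ident"], specimen["mass"], pollution_at(grid, specimen))
```

To read the real files, replace each io.StringIO(...) with an open file, for example open("tmp/grid.csv", newline="").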
