
Synthetic data generator for snail mutation survey

Snailz

These synthetic data generators model genomic analysis of snails in the Pacific Northwest that are growing to unusual size as a result of exposure to pollution.

  • A grid is created to record the pollution levels at a sampling site.
  • One or more specimens are collected from the grid. Each specimen has a genome and a mass.
  • Laboratory staff design and perform assays of those genomes.
  • Each assay is represented by a design file and an assay file.
  • Assay files are mangled to create raw files with formatting glitches.
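The relationships in the list above can be sketched as plain Python records. This is a minimal illustration, not the package's own classes: the class names `Specimen` and `Assay` and the helper `assays_for` are hypothetical, and the field names are borrowed from the CSV columns listed in the Data Dictionary.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Specimen:
    """One snail collected from the grid (hypothetical record type)."""
    ident: str    # specimen identifier
    x: int        # X coordinate of collection cell
    y: int        # Y coordinate of collection cell
    genome: str   # base sequence
    mass: float   # snail mass


@dataclass(frozen=True)
class Assay:
    """Metadata for one assay of one specimen (hypothetical record type)."""
    ident: int          # assay identifier
    specimen_id: str    # refers to Specimen.ident
    performed: date     # assay date
    performed_by: str   # person identifier


def assays_for(specimen, assays):
    """Return the assays whose specimen_id matches the given specimen."""
    return [a for a in assays if a.specimen_id == specimen.ident]


# Made-up example values, for illustration only.
snail = Specimen(ident="S0001", x=3, y=4, genome="ACGTAC", mass=1.25)
assays = [
    Assay(ident=1, specimen_id="S0001", performed=date(2024, 1, 5), performed_by="P01"),
    Assay(ident=2, specimen_id="S0002", performed=date(2024, 1, 6), performed_by="P02"),
]
matching = assays_for(snail, assays)
```

Each assay points back to a specimen by identifier, and each specimen carries the coordinates of the grid cell it came from, which is what lets the pollution level be linked to assay results.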

Usage

  1. Create a fresh Python environment: uv venv
  2. Activate that environment: source .venv/bin/activate
  3. Install the dependencies and an editable version of the package: uv pip install -e '.[dev]'
  4. View available commands: doit list or snailz --help
  5. Regenerate all data in ./tmp using parameters in ./params: doit all

Parameters

./params contains the parameter files used to control generation of the reference dataset.

  • grid.json
    • depth: integer range of random values in cells
    • seed: RNG seed
    • size: width and height of (square) grid in cells
  • people.json
    • locale: language and region to use for name generation
    • number: number of staff to create
    • seed: RNG seed
  • specimens.json
    • length: genome length in characters
    • max_mass: maximum specimen mass
    • min_mass: minimum specimen mass
    • mut_scale: scaling factor for mutated specimens
    • mutations: number of mutations to introduce
    • number: number of specimens to create
    • seed: RNG seed
  • assays.json
    • baseline: assay response for unmutated specimens
    • end_date: date of final assay
    • mutant: assay response for mutated specimens
    • noise: noise to add to control cells
    • plate_size: width and height of assay plate
    • seed: RNG seed
    • start_date: date of first assay

Note: there are no parameters for assay file mangling.
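To show how a parameter file and its seed might be consumed, here is a minimal sketch. Everything in it is an assumption made for illustration: the parameter values are made up, `load_params` and `demo_grid` are hypothetical helpers, and the uniform random fill (treating `depth` as an inclusive upper bound on cell values) is a stand-in rather than the algorithm the package uses.

```python
import json
import random

# Hypothetical contents for a grid parameter file; the real values may differ.
GRID_PARAMS_TEXT = '{"depth": 8, "seed": 12345, "size": 15}'


def load_params(text, required):
    """Parse JSON parameters and check that every required key is present."""
    params = json.loads(text)
    missing = sorted(set(required) - set(params))
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    return params


def demo_grid(params):
    """Fill a square grid with seeded random integers in 0..depth.

    Illustrative stand-in only: it shows how size, depth, and seed
    interact, not how the package actually builds its grid.
    """
    rng = random.Random(params["seed"])  # private RNG so the result is reproducible
    size = params["size"]
    return [
        [rng.randint(0, params["depth"]) for _ in range(size)]
        for _ in range(size)
    ]


params = load_params(GRID_PARAMS_TEXT, ["depth", "seed", "size"])
grid = demo_grid(params)
```

Because the generator is seeded from the parameter file, running `demo_grid` twice with the same parameters gives the same grid, which is the property that makes a reference dataset regenerable.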

Data Dictionary

doit all creates these files in tmp using the sample parameters in params:

  • assays/
    • NNNNNN_assay.csv: tidy, consistently-formatted CSV file with assay result.
    • NNNNNN_design.csv: tidy, consistently-formatted CSV file with assay design.
    • NNNNNN_raw.csv: CSV file derived from NNNNNN_assay.csv with randomly-introduced formatting errors.
  • assays.csv: CSV file containing a summary of assay metadata, with the following columns.
    • ident: assay identifier (integer).
    • specimen_id: specimen identifier (text).
    • performed: assay date (date).
    • performed_by: person identifier (text).
  • assays.json: all assay data in JSON format.
  • grid.csv: CSV file containing pollution grid values.
    • This file is a matrix of values with no column IDs or row IDs.
  • grid.json: grid data as JSON.
  • people.csv: CSV file describing experimental staff members.
    • ident: person identifier (text)
    • personal: personal name (text)
    • family: family name (text)
  • people.json: staff member data in JSON format.
  • specimens.csv: CSV file containing details of snail specimens.
    • ident: specimen identifier (text)
    • x: X coordinate of collection cell (integer)
    • y: Y coordinate of collection cell (integer)
    • genome: base sequence (text)
    • mass: snail mass (real)
  • specimens.json: specimen data in JSON format.
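The files above can be read with the standard library alone. The sketch below uses small made-up stand-ins for grid.csv and specimens.csv so that it runs on its own; `read_grid` and `read_specimens` are hypothetical helpers. It also assumes that grid values are integers and that a specimen's `y` selects the row and `x` the column, neither of which is stated above.

```python
import csv
import io

# Made-up sample data in the documented layouts.
GRID_CSV = "0,1,0\n2,3,1\n0,1,0\n"  # matrix only: no header row, no row IDs
SPECIMENS_CSV = (
    "ident,x,y,genome,mass\n"
    "S0001,1,1,ACGT,1.5\n"
    "S0002,0,2,ACGA,0.9\n"
)


def read_grid(stream):
    """Read a header-less matrix of integers, one grid row per CSV row."""
    return [[int(value) for value in row] for row in csv.reader(stream) if row]


def read_specimens(stream):
    """Read specimen rows, converting each column to its documented type."""
    return [
        {
            "ident": row["ident"],
            "x": int(row["x"]),
            "y": int(row["y"]),
            "genome": row["genome"],
            "mass": float(row["mass"]),
        }
        for row in csv.DictReader(stream)
    ]


grid = read_grid(io.StringIO(GRID_CSV))
specimens = read_specimens(io.StringIO(SPECIMENS_CSV))

# Assumption: y indexes the row and x the column of the grid matrix.
for specimen in specimens:
    specimen["pollution"] = grid[specimen["y"]][specimen["x"]]
```

When reading the real files, replace the `io.StringIO` objects with `open(path, newline="")`, as the `csv` module documentation recommends.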
