Synthetic data generator for snail mutation survey

Snailz

These synthetic data generators model genomic analysis of snails in the Pacific Northwest that are growing to unusual size as a result of exposure to pollution.

  • A grid is created to record the pollution levels at a sampling site.
  • One or more specimens are collected from the grid. Each specimen has a genome and a mass.
  • Laboratory staff design and perform assays of those genomes.
  • Each assay is represented by a design file and an assay file.
  • Assay files are mangled to create raw files with formatting glitches.
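The mangling step can be pictured with a minimal sketch. The specific glitch used here (shifting some rows right by one empty cell) is an assumption chosen for illustration, and `mangle` is a hypothetical helper; this description does not say which glitches snailz actually introduces.

```python
import random


def mangle(rows, seed):
    """Return a copy of rows with a hypothetical formatting glitch applied.

    Assumption: about half of the rows are shifted right by one empty cell.
    The glitches snailz really introduces may be different.
    """
    rng = random.Random(seed)
    result = []
    for row in rows:
        new_row = list(row)  # copy so the tidy input is left untouched
        if rng.random() < 0.5:
            new_row = [""] + new_row
        result.append(new_row)
    return result


# Made-up tidy rows standing in for the contents of an assay file.
tidy = [["A", "B", "C"], ["1.0", "2.0", "3.0"], ["4.0", "5.0", "6.0"]]
raw = mangle(tidy, seed=1234)
```

Seeding the generator keeps the glitches reproducible, which matches the way the other generation steps take a `seed` parameter.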

Usage

  1. Create a fresh Python environment: uv venv
  2. Activate that environment: source .venv/bin/activate
  3. Install dependencies and an editable version of the package: uv pip install -e '.[dev]'
  4. View available commands: doit list or snailz --help
  5. Regenerate all data in ./tmp using parameters in ./params: doit all

Parameters

./params contains the parameter files used to control generation of the reference dataset.

  • grid.json
    • depth: integer range of random values in cells
    • seed: RNG seed
    • size: width and height of (square) grid in cells
  • people.json
    • locale: language and region to use for name generation
    • number: number of staff to create
    • seed: RNG seed
  • specimens.json
    • length: genome length in characters
    • max_mass: maximum specimen mass
    • min_mass: minimum specimen mass
    • mut_scale: scaling factor for mutated specimens
    • mutations: number of mutations to introduce
    • number: number of specimens to create
    • seed: RNG seed
  • assays.json
    • baseline: assay response for unmutated specimens
    • end_date: date of final assay
    • mutant: assay response for mutated specimens
    • noise: noise to add to control cells
    • plate_size: width and height of assay plate
    • seed: RNG seed
    • start_date: date of first assay

Note: there are no parameters for assay file mangling.
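To make the roles of these parameters concrete, here is a minimal sketch of how grid and specimen parameters might drive generation. The keys mirror the ones documented above, but the values are made up, the functions `make_grid` and `make_specimens` are hypothetical, and the interpretation of `depth` as an inclusive upper bound is an assumption. The sketch also ignores `mutations` and `mut_scale`, so it is not the algorithm snailz uses.

```python
import json
import random

# Illustrative values only; the keys mirror the documented parameters.
GRID_PARAMS = json.loads('{"depth": 8, "seed": 7421, "size": 5}')
SPECIMEN_PARAMS = json.loads(
    '{"length": 20, "max_mass": 10.0, "min_mass": 1.0, "number": 3, "seed": 4712}'
)


def make_grid(params):
    """Build a square grid of random integers in 0..depth (assumed meaning)."""
    rng = random.Random(params["seed"])
    size = params["size"]
    return [
        [rng.randint(0, params["depth"]) for _ in range(size)]
        for _ in range(size)
    ]


def make_specimens(params, grid_size):
    """Create specimens with a random genome, mass, and collection cell."""
    rng = random.Random(params["seed"])
    specimens = []
    for i in range(params["number"]):
        specimens.append({
            "ident": f"S{i:04d}",  # hypothetical identifier format
            "x": rng.randrange(grid_size),
            "y": rng.randrange(grid_size),
            "genome": "".join(rng.choice("ACGT") for _ in range(params["length"])),
            "mass": round(rng.uniform(params["min_mass"], params["max_mass"]), 2),
        })
    return specimens


grid = make_grid(GRID_PARAMS)
specimens = make_specimens(SPECIMEN_PARAMS, GRID_PARAMS["size"])
```

Because each step builds its own generator from its own `seed`, rerunning with unchanged parameter files reproduces the same data.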

Data Dictionary

doit all creates these files in tmp using the sample parameters in params:

  • assays/
    • NNNNNN_assay.csv: tidy, consistently-formatted CSV file with assay result.
    • NNNNNN_design.csv: tidy, consistently-formatted CSV file with assay design.
    • NNNNNN_raw.csv: CSV file derived from NNNNNN_assay.csv with randomly-introduced formatting errors.
  • assays.csv: CSV file summarizing assay metadata, with these columns:
    • ident: assay identifier (integer).
    • specimen_id: specimen identifier (text).
    • performed: assay date (date).
    • performed_by: person identifier (text).
  • assays.json: all assay data in JSON format.
  • grid.csv: CSV file containing pollution grid values.
    • This file is a matrix of values with no column IDs or row IDs.
  • grid.json: grid data as JSON.
  • people.csv: CSV file describing experimental staff members.
    • ident: person identifier (text)
    • personal: personal name (text)
    • family: family name (text)
  • people.json: staff member data in JSON format.
  • specimens.csv: CSV file containing details of snail specimens.
    • ident: specimen identifier (text)
    • x: X coordinate of collection cell (integer)
    • y: Y coordinate of collection cell (integer)
    • genome: base sequence (text)
    • mass: snail mass (real)
  • specimens.json: specimen data in JSON format.
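The two CSV layouts described above need slightly different handling: grid.csv is a bare matrix with no header, while specimens.csv has named columns. The sketch below reads both with Python's standard csv module. The sample text is made up so that the example runs without generated files, and treating grid values as integers is an assumption based on the description of `depth`.

```python
import csv
import io


def read_grid(stream):
    """Read a headerless matrix of pollution values as rows of integers."""
    return [[int(cell) for cell in row] for row in csv.reader(stream) if row]


def read_specimens(stream):
    """Read specimen rows, converting the documented numeric columns."""
    converters = {"x": int, "y": int, "mass": float}
    return [
        {key: converters.get(key, str)(value) for key, value in row.items()}
        for row in csv.DictReader(stream)
    ]


# Made-up sample content in the documented layouts.
GRID_TEXT = "0,1,2\n3,0,1\n2,2,0\n"
SPECIMENS_TEXT = (
    "ident,x,y,genome,mass\n"
    "AB1234,0,1,ACGTAC,1.25\n"
    "CD5678,2,2,ACGTTC,3.5\n"
)

grid = read_grid(io.StringIO(GRID_TEXT))
specimens = read_specimens(io.StringIO(SPECIMENS_TEXT))
print(grid[1][0], specimens[0]["mass"])  # prints "3 1.25"
```

With real output, the same functions can be given a file opened with `open("tmp/grid.csv", newline="")` or `open("tmp/specimens.csv", newline="")` in place of the in-memory streams.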
