Synthetic data generator for snail mutation survey
Snailz
These synthetic data generators model genomic analysis of snails in the Pacific Northwest that are growing to unusual size as a result of exposure to pollution.
- A grid is created to record the pollution levels at a sampling site.
- One or more specimens are collected from the grid. Each specimen has a genome and a mass.
- Laboratory staff design and perform assays of those genomes.
- Each assay is represented by a design file and an assay file.
- Assay files are mangled to create raw files with formatting glitches.
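The mangling step can be pictured with a small sketch. The README does not say which formatting glitches snailz introduces, so the three used here (an upper-cased header, stray spaces around fields, spurious blank lines) are assumptions chosen for illustration, and `mangle_csv` is a hypothetical name rather than part of the package.

```python
import csv
import io
import random


def mangle_csv(text: str, seed: int) -> str:
    """Return a copy of tidy CSV text with random formatting glitches.

    Hypothetical sketch: the three glitch types below are illustrative
    assumptions, not the ones snailz actually uses.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    rows = list(csv.reader(io.StringIO(text)))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for i, row in enumerate(rows):
        if i == 0 and rng.random() < 0.5:
            # Glitch 1: change the capitalization of the header row.
            row = [cell.upper() for cell in row]
        # Glitch 2: pad some fields with stray spaces.
        row = [f" {cell} " if rng.random() < 0.3 else cell for cell in row]
        writer.writerow(row)
        if rng.random() < 0.2:
            # Glitch 3: insert a spurious blank line.
            buffer.write("\n")
    return buffer.getvalue()


tidy = "row,col,reading\nA,1,0.42\nA,2,0.17\n"
print(mangle_csv(tidy, seed=7))
```

Because every glitch here only changes spacing, case, or blank lines, a cleaning script that strips whitespace, normalizes case, and skips empty rows recovers the tidy data; real raw files may need more than that.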
Usage
- Create a fresh Python environment: `uv venv`
- Activate that environment: `source .venv/bin/activate`
- Install dependencies and an editable version of the package: `uv pip install -e '.[dev]'`
- View available commands: `doit list` or `snailz --help`
- Regenerate all data in `./tmp` using parameters in `./params`: `doit all`
Parameters
./params contains the parameter files used to control generation of the reference dataset.
- grid.json
  - depth: integer range of random values in cells
  - seed: RNG seed
  - size: width and height of (square) grid in cells
- people.json
  - locale: language and region to use for name generation
  - number: number of staff to create
  - seed: RNG seed
- specimens.json
  - length: genome length in characters
  - max_mass: maximum specimen mass
  - min_mass: minimum specimen mass
  - mut_scale: scaling factor for mutated specimens
  - mutations: number of mutations to introduce
  - number: number of specimens to create
  - seed: RNG seed
- assays.json
  - baseline: assay response for unmutated specimens
  - end_date: date of final assay
  - mutant: assay response for mutated specimens
  - noise: noise to add to control cells
  - plate_size: width and height of assay plate
  - seed: RNG seed
  - start_date: date of first assay
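To show how the grid and specimen parameters might fit together, here is a minimal sketch. Only the parameter names come from the lists above; the values, the function and class names, and the generation rules are hypothetical. In particular it assumes that cell values lie in `[0, depth)`, that collection cells are chosen uniformly, that every mutant carries changes at the same `mutations` loci, that roughly half the specimens are mutants, and that `mut_scale` multiplies the mass of mutated specimens.

```python
import random
from dataclasses import dataclass

BASES = "ACGT"


def make_grid(depth: int, seed: int, size: int) -> list[list[int]]:
    """Square size x size grid of pollution levels (assumed range: 0..depth-1)."""
    rng = random.Random(seed)
    return [[rng.randrange(depth) for _ in range(size)] for _ in range(size)]


@dataclass
class Specimen:
    ident: str
    x: int
    y: int
    genome: str
    mass: float


def make_specimens(params: dict, grid_size: int) -> list[Specimen]:
    """Hypothetical specimen generator driven by specimens.json-style keys."""
    rng = random.Random(params["seed"])
    reference = "".join(rng.choice(BASES) for _ in range(params["length"]))
    # Assumption: the same set of loci is mutated in every mutant specimen.
    loci = rng.sample(range(params["length"]), params["mutations"])
    specimens = []
    for i in range(params["number"]):
        genome = list(reference)
        mass = rng.uniform(params["min_mass"], params["max_mass"])
        if rng.random() < 0.5:  # assumption: about half are mutants
            for pos in loci:
                # Swap the base at each locus for a different one.
                genome[pos] = rng.choice([b for b in BASES if b != genome[pos]])
            mass *= params["mut_scale"]  # assumption: mut_scale scales mass
        specimens.append(
            Specimen(
                ident=f"S{i:04d}",  # hypothetical identifier format
                x=rng.randrange(grid_size),
                y=rng.randrange(grid_size),
                genome="".join(genome),
                mass=round(mass, 2),
            )
        )
    return specimens


grid = make_grid(depth=8, seed=123, size=5)
specimens = make_specimens(
    {"length": 20, "max_mass": 10.0, "min_mass": 5.0, "mut_scale": 2.0,
     "mutations": 3, "number": 4, "seed": 456},
    grid_size=len(grid),
)
```

Seeding a separate `random.Random` per generator keeps each output reproducible on its own, which matches the README's use of a `seed` key in every parameter file.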
Note: there are no parameters for assay file mangling.
Data Dictionary
doit all creates these files in tmp using the sample parameters in params:
- assays/
  - NNNNNN_assay.csv: tidy, consistently-formatted CSV file with assay results.
  - NNNNNN_design.csv: tidy, consistently-formatted CSV file with assay design.
  - NNNNNN_raw.csv: CSV file derived from NNNNNN_assay.csv with randomly-introduced formatting errors.
- assays.csv: CSV file containing a summary of assay metadata, with columns:
  - ident: assay identifier (integer)
  - specimen_id: specimen identifier (text)
  - performed: assay date (date)
  - performed_by: person identifier (text)
- assays.json: all assay data in JSON format.
- grid.csv: CSV file containing pollution grid values.
  - This file is a matrix of values with no column IDs or row IDs.
- grid.json: grid data as JSON.
- people.csv: CSV file describing experimental staff members, with columns:
  - ident: person identifier (text)
  - personal: personal name (text)
  - family: family name (text)
- people.json: staff member data in JSON format.
- specimens.csv: CSV file containing details of snail specimens, with columns:
  - ident: specimen identifier (text)
  - x: X coordinate of collection cell (integer)
  - y: Y coordinate of collection cell (integer)
  - genome: base sequence (text)
  - mass: snail mass (real)
- specimens.json: specimen data in JSON format.
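Because grid.csv is a bare matrix while assays.csv has a header row, the two need slightly different handling when loaded. A minimal sketch using only the standard `csv` module, assuming comma delimiters, integer grid values, and ISO `YYYY-MM-DD` dates in the `performed` column; `load_grid` and `load_assays` are hypothetical helper names.

```python
import csv
from datetime import date
from pathlib import Path


def load_grid(path: Path) -> list[list[int]]:
    """Read grid.csv: a bare matrix with no header row and no row IDs."""
    with open(path, newline="") as reader:
        # Skip any empty rows; convert every cell to int (assumed integer values).
        return [[int(cell) for cell in row] for row in csv.reader(reader) if row]


def load_assays(path: Path) -> list[dict]:
    """Read assays.csv, converting columns to the types in the data dictionary."""
    records = []
    with open(path, newline="") as reader:
        for row in csv.DictReader(reader):  # first line supplies column names
            records.append({
                "ident": int(row["ident"]),
                "specimen_id": row["specimen_id"],
                "performed": date.fromisoformat(row["performed"]),  # assumed ISO dates
                "performed_by": row["performed_by"],
            })
    return records
```

After `doit all`, calling `load_grid(Path("tmp/grid.csv"))` and `load_assays(Path("tmp/assays.csv"))` would return the grid as nested lists and the assay summary as typed records. The raw files would need cleaning before a strict loader like this could read them.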
File details
Details for the file snailz-0.2.0.tar.gz.
File metadata
- Download URL: snailz-0.2.0.tar.gz
- Upload date:
- Size: 716.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 21c04f1f7f5bbf11b8738fe488409c56f55ac761d19f4a06a4666c2b0c79c2c3 |
| MD5 | d1fa023127ddbea481d1ae893437fce4 |
| BLAKE2b-256 | 0312cf2ce55439ab787a3d0afda13e14dd7e75d7b94cb8487d30325f5e50383f |
File details
Details for the file snailz-0.2.0-py3-none-any.whl.
File metadata
- Download URL: snailz-0.2.0-py3-none-any.whl
- Upload date:
- Size: 19.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b8981c7a0b37db8c11173d7292f2c30c7149e222ec9d651a774a8887b6d38e1 |
| MD5 | c2232907abee71f817919b5cf9259b1f |
| BLAKE2b-256 | c5f5340117fd9929d2c75e612caf91ba88a87895c142c15dfc1f13a77c8291d3 |
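The published SHA256 digests for snailz 0.2.0 can be checked against downloaded files with Python's standard `hashlib` module. A minimal sketch, assuming the archives were saved to the current directory under their published names; `sha256_of` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for the snailz 0.2.0 distribution files.
EXPECTED = {
    "snailz-0.2.0.tar.gz": "21c04f1f7f5bbf11b8738fe488409c56f55ac761d19f4a06a4666c2b0c79c2c3",
    "snailz-0.2.0-py3-none-any.whl": "4b8981c7a0b37db8c11173d7292f2c30c7149e222ec9d651a774a8887b6d38e1",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


for name, expected in EXPECTED.items():
    path = Path(name)
    if path.exists():
        status = "OK" if sha256_of(path) == expected else "MISMATCH"
        print(f"{name}: {status}")
    else:
        print(f"{name}: not downloaded")
```

Reading in chunks keeps memory use constant regardless of archive size; for a 716.6 kB sdist, reading the whole file at once would also work.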