
Synthetic data generator for snail mutation survey


Snailz


snailz is a synthetic data generator that models a study of snails in the Pacific Northwest that are growing to unusual size as a result of exposure to pollution. The package generates fully reproducible datasets of varying sizes and with varying statistical properties, and is intended for classroom use. For example, an instructor can give each learner a unique dataset to analyze, while learners can test their analysis pipelines on datasets they generate themselves.

The Story

Years ago, logging companies dumped toxic waste in a remote region of Vancouver Island. As the containers leaked and the pollution spread, some snails in the region began growing unusually large. Your team is now collecting and analyzing specimens from affected regions to determine whether exposure to pollution is responsible.

Usage

usage: snailz [-h]
              [--defaults]
              [--outdir OUTDIR]
              [--override OVERRIDE [OVERRIDE ...]]
              [--params PARAMS]
              [--profile]

options:
  -h, --help            show this help message and exit
  --defaults            show default parameters as JSON
  --outdir OUTDIR       output directory
  --override OVERRIDE [OVERRIDE ...]
                        name=value parameters to override defaults
  --params PARAMS       specify JSON parameter file
  --profile             enable profiling

See the documentation of the Parameters class for a description of data generation parameters.
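
The options above can be combined on one command line. The following is a minimal sketch, not part of snailz itself: `build_snailz_command` is a hypothetical helper that only assembles the argument list from the documented flags, and `example_param` is a placeholder name (run `snailz --defaults` to see the real parameter names).

```python
import shlex


def build_snailz_command(outdir=None, params=None, overrides=None):
    """Assemble an argument list for the snailz CLI (hypothetical helper).

    Only flags shown in the usage text are used: --params, --outdir, --override.
    """
    cmd = ["snailz"]
    if params is not None:
        cmd += ["--params", str(params)]
    if outdir is not None:
        cmd += ["--outdir", str(outdir)]
    if overrides:
        # --override takes one or more name=value pairs.
        cmd.append("--override")
        cmd += [f"{name}={value}" for name, value in overrides.items()]
    return cmd


# "example_param" is a placeholder, not a real snailz parameter name.
cmd = build_snailz_command(outdir="data", overrides={"example_param": 42})
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once snailz is installed and the placeholder has been replaced with a real parameter name.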

Schema


An asterisk beside the name of a field indicates that the value may be missing (i.e., the field may be NULL in the final database).

| table | field | type | purpose |
|---|---|---|---|
| grid | ident | text | unique identifier for each survey grid |
| | size | int | height and width of survey grid in cells |
| | spacing | float | size of survey grid cell (meters) |
| | lat0 | float | southernmost latitude of grid (fractional degrees) |
| | lon0 | float | westernmost longitude of grid (fractional degrees) |
| grid_cells | grid_id | text | foreign key reference to grid |
| | lat | float | latitude of grid cell (fractional degrees) |
| | lon | float | longitude of grid cell (fractional degrees) |
| | value | float | pollution measurement in that grid cell |
| machine | ident | text | unique identifier for each piece of laboratory equipment |
| | name | text | name of piece of laboratory equipment |
| person | ident | text | unique identifier for member of staff |
| | family | text | family name of staff member |
| | personal | text | personal name of staff member |
| | supervisor_id* | text | foreign key reference to person's supervisor |
| rating | person_id | text | foreign key reference to person |
| | machine_id | text | foreign key reference to machine |
| | certified | bool | whether person is certified to use machine |
| assay | ident | text | unique identifier for soil assay |
| | lat | float | foreign key reference to grid cell |
| | lon | float | foreign key reference to grid cell |
| | person_id | text | foreign key reference to person who did assay |
| | machine_id | text | foreign key reference to machine used to do assay |
| | performed* | date | date that assay was done |
| assay_readings | assay_id | text | foreign key reference to assay |
| | reading_id | int | serial number within assay |
| | contents | text | "C" or "T" showing control or treatment |
| | reading | float | pollution measurement |
| species | reference | text | reference genome |
| | susc_locus | int | location of susceptible locus within genome |
| | susc_base | text | base that causes significant mutation at that locus |
| species_loci | ident | int | unique locus serial number |
| | locus | int | locus where mutation might occur |
| specimen | ident | text | unique identifier for specimen |
| | lat | float | foreign key reference to grid cell |
| | lon | float | foreign key reference to grid cell |
| | genome | text | specimen genome |
| | mass | float | specimen mass (g) |
| | diameter | float | specimen diameter (mm) |
| | collected* | date | date when specimen was collected |
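
To illustrate how these tables relate, here is a minimal sketch that rebuilds two of them in an in-memory SQLite database and joins specimens to the pollution reading for their grid cell. The table and field names come from the schema; the `CREATE TABLE` statements are an assumption inferred from it rather than the package's actual DDL, and every row is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed DDL, inferred from the schema table (not taken from snailz itself).
conn.executescript("""
CREATE TABLE grid_cells (grid_id TEXT, lat REAL, lon REAL, value REAL);
CREATE TABLE specimen (
    ident TEXT PRIMARY KEY, lat REAL, lon REAL,
    genome TEXT, mass REAL, diameter REAL,
    collected DATE  -- may be NULL, as the asterisk in the schema indicates
);
""")

# Illustrative rows only: not output from snailz.
conn.executemany("INSERT INTO grid_cells VALUES (?, ?, ?, ?)", [
    ("G01", 48.5, -124.0, 0.0),
    ("G01", 48.5, -123.5, 2.0),
])
conn.executemany("INSERT INTO specimen VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("S001", 48.5, -124.0, "ACGT", 1.0, 10.0, "2024-01-01"),
    ("S002", 48.5, -123.5, "ACGA", 3.0, 14.0, None),
    ("S003", 48.5, -123.5, "ACGA", 5.0, 16.0, "2024-01-03"),
])

# Join on (lat, lon). Exact float equality works here only because the sample
# coordinates are identical; generated data may need rounding first.
rows = conn.execute("""
    SELECT g.value, COUNT(*), AVG(s.mass)
    FROM specimen AS s
    JOIN grid_cells AS g ON s.lat = g.lat AND s.lon = g.lon
    GROUP BY g.value
    ORDER BY g.value
""").fetchall()

# Count specimens whose collection date is missing.
missing = conn.execute(
    "SELECT COUNT(*) FROM specimen WHERE collected IS NULL"
).fetchone()[0]

for value, count, mean_mass in rows:
    print(f"pollution={value} specimens={count} mean_mass={mean_mass}")
print(f"specimens with no collection date: {missing}")
```

Grouping by pollution value is the simplest way to start asking the question posed in the story: whether specimens from more polluted cells tend to be heavier.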

Colophon

snailz was inspired by the Palmer Penguins dataset and by conversations with Rohan Alexander about his book Telling Stories with Data.

My thanks to everyone who built the tools this project relies on.

The snail logo was created by sunar.ko.

Acknowledgments

  • Greg Wilson is a programmer, author, and educator based in Toronto. He was the co-founder and first Executive Director of Software Carpentry and received ACM SIGSOFT's Influential Educator Award in 2020.
