Synthetic data generator for snail mutation survey

Snailz

snailz is a synthetic data generator that models a study of snails in the Pacific Northwest that are growing to unusual size as a result of exposure to pollution. The package generates fully reproducible datasets of varying sizes and statistical properties, and is intended for classroom use. For example, an instructor can give each learner a unique dataset to analyze, and learners can test their analysis pipelines on datasets they generate themselves.
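Reproducibility of this kind usually comes from driving every random choice from a single seeded generator. The sketch below illustrates that idea under this assumption; `make_specimens`, its `seed` argument, and the mass and diameter distributions are hypothetical placeholders, not snailz's API or its actual model.

```python
import random


def make_specimens(seed: int, count: int) -> list[tuple[float, float]]:
    """Generate (mass, diameter) pairs from a private, seeded generator.

    Hypothetical illustration only: the distributions and their parameters
    are placeholders, not the ones snailz uses.
    """
    rng = random.Random(seed)  # private generator: global random state cannot leak in
    return [
        (round(rng.gauss(10.0, 2.0), 2), round(rng.gauss(25.0, 5.0), 2))
        for _ in range(count)
    ]


# The same seed always reproduces the same dataset, so a learner's data can be
# regenerated exactly from (seed, count); a different seed gives a different one.
assert make_specimens(seed=42, count=5) == make_specimens(seed=42, count=5)
```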

The Story

Years ago, logging companies dumped toxic waste in a remote region of Vancouver Island. As the containers leaked and the pollution spread, some snails in the region began growing unusually large. Your team is now collecting and analyzing specimens from affected regions to determine if exposure to pollution is responsible.

Usage:

usage: snailz [-h]
              [--defaults]
              [--outdir OUTDIR]
              [--override OVERRIDE [OVERRIDE ...]]
              [--params PARAMS]
              [--profile]

options:
  -h, --help            show this help message and exit
  --defaults            show default parameters as JSON
  --outdir OUTDIR       output directory
  --override OVERRIDE [OVERRIDE ...]
                        name=value parameters to override defaults
  --params PARAMS       specify JSON parameter file
  --profile             enable profiling

See the documentation of the Parameters class for a description of data generation parameters.
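The `--override` option takes `name=value` pairs that replace default parameters. The sketch below shows one way such pairs might be parsed and merged over a dictionary of defaults; `apply_overrides` and the parameter names `seed`, `num_grids`, and `locale` are hypothetical, and snailz's own parsing and validation may differ.

```python
import json


def apply_overrides(defaults: dict, overrides: list[str]) -> dict:
    """Return a copy of `defaults` updated with "name=value" strings.

    Hypothetical helper, not snailz's own parser. Values are decoded as JSON
    when possible ("7" -> 7, "0.5" -> 0.5, "true" -> True); anything that is
    not valid JSON (e.g. "fr_CA") is kept as a plain string.
    """
    params = dict(defaults)  # never mutate the caller's defaults
    for item in overrides:
        # partition splits on the first "=" only, so values may contain "=".
        name, sep, raw = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {item!r}")
        if name not in params:
            raise KeyError(f"unknown parameter {name!r}")
        try:
            params[name] = json.loads(raw)
        except json.JSONDecodeError:
            params[name] = raw
    return params


# Placeholder parameter names, not snailz's real ones.
defaults = {"seed": 12345, "num_grids": 3, "locale": "en_CA"}
print(apply_overrides(defaults, ["seed=7", "locale=fr_CA"]))
# {'seed': 7, 'num_grids': 3, 'locale': 'fr_CA'}
```

Rejecting unknown names is a design choice in this sketch: it turns a typo such as `sed=7` into an error instead of a silently ignored setting.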

Schema

An asterisk beside the name of a field indicates that the value may be missing (i.e., the field may be NULL in the final database).
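Because these fields can be NULL, queries must test them with `IS NULL` rather than `= NULL`, and aggregates such as `COUNT(column)` skip them. The sketch below assumes the final database is SQLite (the engine is not stated here) and uses a hand-built `person` table, with columns taken from the schema, holding made-up rows rather than generated data.

```python
import sqlite3

# In-memory stand-in for the person table; rows are made-up placeholders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (ident TEXT PRIMARY KEY, family TEXT, personal TEXT, supervisor_id TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [
        ("P01", "Ng", "Ada", None),  # Python None is stored as SQL NULL
        ("P02", "Roy", "Bea", "P01"),
        ("P03", "Kim", "Cy", "P01"),
    ],
)

# "supervisor_id = NULL" never matches anything; IS NULL finds missing values.
top_level = [
    row[0]
    for row in conn.execute("SELECT ident FROM person WHERE supervisor_id IS NULL ORDER BY ident")
]
# COUNT(column) counts only non-NULL values, unlike COUNT(*).
(with_supervisor,) = conn.execute("SELECT COUNT(supervisor_id) FROM person").fetchone()
print(top_level, with_supervisor)  # ['P01'] 2
```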

table           field           type   purpose
grid            ident           text   unique identifier for each survey grid
                size            int    height and width of survey grid in cells
                spacing         float  size of survey grid cell (meters)
                lat0            float  southernmost latitude of grid (fractional degrees)
                lon0            float  westernmost longitude of grid (fractional degrees)
grid_cells      grid_id         text   foreign key reference to grid
                lat             float  foreign key reference to grid cell
                lon             float  foreign key reference to grid cell
                value           float  pollution measurement in that grid cell
machine         ident           text   unique identifier for each piece of laboratory equipment
                name            text   name of piece of laboratory equipment
person          ident           text   unique identifier for member of staff
                family          text   family name of staff member
                personal        text   personal name of staff member
                supervisor_id*  text   foreign key reference to person's supervisor
rating          person_id       text   foreign key reference to person
                machine_id      text   foreign key reference to machine
                certified       bool   whether person is certified to use machine
assay           ident           text   unique identifier for soil assay
                lat             float  foreign key reference to grid cell
                lon             float  foreign key reference to grid cell
                person_id       text   foreign key reference to person who did assay
                machine_id      text   foreign key reference to machine used to do assay
                performed*      date   date that assay was done
assay_readings  assay_id        text   foreign key reference to assay
                reading_id      int    serial number within assay
                contents        text   "C" or "T" showing control or treatment
                reading         float  pollution measurement
species         reference       text   reference genome
                susc_locus      int    location of susceptible locus within genome
                susc_base       text   base that causes significant mutation at that locus
species_loci    ident           int    unique locus serial number
                locus           int    locus where mutation might occur
specimen        ident           text   unique identifier for specimen
                lat             float  foreign key reference to grid cell
                lon             float  foreign key reference to grid cell
                genome          text   specimen genome
                mass            float  specimen mass (g)
                diameter        float  specimen diameter (mm)
                collected*      date   date when specimen was collected
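As an illustration of how these tables fit together, the sketch below joins `specimen` to `grid_cells` on the shared `(lat, lon)` key to pair each specimen with the pollution value of its cell, and checks whether its genome carries `susc_base` at `susc_locus`. It assumes an SQLite database, a single row in `species`, and a 0-based `susc_locus` (hence the `+ 1` for SQLite's 1-based `substr`); the rows are made-up placeholders, not snailz output.

```python
import sqlite3

# Minimal in-memory versions of three tables from the schema; placeholder rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE grid_cells (grid_id TEXT, lat REAL, lon REAL, value REAL);
    CREATE TABLE species (reference TEXT, susc_locus INTEGER, susc_base TEXT);
    CREATE TABLE specimen (ident TEXT, lat REAL, lon REAL, genome TEXT,
                           mass REAL, diameter REAL, collected TEXT);
    INSERT INTO grid_cells VALUES ('G1', 48.50, -124.00, 0.0), ('G1', 48.51, -124.00, 3.5);
    INSERT INTO species VALUES ('ACGTACGT', 2, 'T');
    INSERT INTO specimen VALUES
        ('S1', 48.50, -124.00, 'ACGTACGT', 10.2, 24.0, '2024-05-01'),
        ('S2', 48.51, -124.00, 'ACTTACGT', 17.9, 31.5, NULL);
    """
)

QUERY = """
SELECT s.ident,
       c.value AS pollution,
       s.mass,
       -- assumes susc_locus is 0-based; substr() counts from 1
       substr(s.genome, sp.susc_locus + 1, 1) = sp.susc_base AS susceptible
FROM specimen AS s
JOIN grid_cells AS c ON c.lat = s.lat AND c.lon = s.lon
CROSS JOIN species AS sp
ORDER BY s.ident
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # (ident, pollution, mass, 1 if susceptible else 0)
```

Because the join uses only `lat` and `lon`, specimens with a missing `collected` date are still included. Exact equality on floating-point coordinates works here only because both tables store identical values; coordinates recomputed by a different calculation might not match.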

Colophon

snailz was inspired by the Palmer Penguins dataset and by conversations with Rohan Alexander about his book Telling Stories with Data.

My thanks to everyone who built the tools this project relies on, including:

The snail logo was created by sunar.ko.

Acknowledgments

  • Greg Wilson is a programmer, author, and educator based in Toronto. He was the co-founder and first Executive Director of Software Carpentry and received ACM SIGSOFT's Influential Educator Award in 2020.

Download files

Source Distribution

snailz-5.5.4.tar.gz (828.8 kB)

Built Distribution

snailz-5.5.4-py3-none-any.whl (22.9 kB)

File details

Details for the file snailz-5.5.4.tar.gz.

File metadata

  • Download URL: snailz-5.5.4.tar.gz
  • Upload date:
  • Size: 828.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for snailz-5.5.4.tar.gz
Algorithm    Hash digest
SHA256       0a9ef62f98bd484cd1fd1b6e259ea5f0d3d994b7dbb39d035a635e131bdb90f2
MD5          61415615e78f9c7b0909e01216f36e3d
BLAKE2b-256  c5ede3d8536f7d668058c7536153582b72221d7dbdbaa02536c7d9a0fe9e8d13
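To check a downloaded archive against the SHA256 digest listed for snailz-5.5.4.tar.gz, a few lines of standard-library Python are enough. The file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for snailz-5.5.4.tar.gz.
EXPECTED_SHA256 = "0a9ef62f98bd484cd1fd1b6e259ea5f0d3d994b7dbb39d035a635e131bdb90f2"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the archive was saved in the current directory:
# verify(Path("snailz-5.5.4.tar.gz"))
```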

File details

Details for the file snailz-5.5.4-py3-none-any.whl.

File metadata

  • Download URL: snailz-5.5.4-py3-none-any.whl
  • Upload date:
  • Size: 22.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for snailz-5.5.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       dc96fd58b40191391255ab377d91e2971c929c8abc7122a22634399d6c23b0e5
MD5          fef93c39bc063abd4cb215b1505c61d5
BLAKE2b-256  2dae4948ccc8ba2db6f4d2b94719bfa35f3d8f3caa561fbdae4518454f315d72
