
Synthetic data generator for snail mutation survey


Snailz


snailz is a synthetic data generator that models a study of snails in the Pacific Northwest that are growing to unusual size as a result of exposure to pollution. The package generates fully reproducible datasets of varying sizes and with varying statistical properties, and is intended for classroom use. For example, an instructor can give each learner a unique dataset to analyze, while learners can test their analysis pipelines on datasets they generate themselves.

The Story

Years ago, logging companies dumped toxic waste in a remote region of Vancouver Island. As the containers leaked and the pollution spread, some snails in the region began growing unusually large. Your team is now collecting and analyzing specimens from affected regions to determine if exposure to pollution is responsible.

Usage:

usage: snailz [-h]
              [--defaults]
              [--outdir OUTDIR]
              [--override OVERRIDE [OVERRIDE ...]]
              [--params PARAMS]
              [--profile]

options:
  -h, --help            show this help message and exit
  --defaults            show default parameters as JSON
  --outdir OUTDIR       output directory
  --override OVERRIDE [OVERRIDE ...]
                        name=value parameters to override defaults
  --params PARAMS       specify JSON parameter file
  --profile             enable profiling

See the documentation of the Parameters class for a description of data generation parameters.
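
As a minimal sketch of how the options above combine, the following Python helper assembles a snailz command line without running it. The helper name `build_snailz_command` is made up for this example, and `seed` is a hypothetical parameter name used only for illustration; the real parameter names are those documented in the Parameters class.

```python
import shlex


def build_snailz_command(outdir, params=None, overrides=None):
    """Assemble an argument list for the snailz CLI using only the
    options listed in the usage text (hypothetical helper)."""
    command = ["snailz", "--outdir", str(outdir)]
    if params is not None:
        # --params names a JSON parameter file.
        command += ["--params", str(params)]
    if overrides:
        # --override accepts one or more name=value pairs.
        command.append("--override")
        command += [f"{name}={value}" for name, value in overrides.items()]
    return command


# "seed" is an assumed parameter name, not taken from the snailz documentation.
command = build_snailz_command("data", overrides={"seed": 12345})
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` once snailz is installed. Running `snailz --defaults` first shows the default parameters as JSON, which is the safest way to find valid names for `--override`.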

Schema


An asterisk beside the name of a field indicates that the value may be missing (i.e., the field may be NULL in the final database).

table           field           type   purpose
grid            ident           text   unique identifier for each survey grid
                size            int    height and width of survey grid in cells
                spacing         float  size of survey grid cell (meters)
                lat0            float  southernmost latitude of grid (fractional degrees)
                lon0            float  westernmost longitude of grid (fractional degrees)
grid_cells      grid_id         text   foreign key reference to grid
                lat             float  foreign key reference to grid cell
                lon             float  foreign key reference to grid cell
                value           float  pollution measurement in that grid cell
machine         ident           text   unique identifier for each piece of laboratory equipment
                name            text   name of piece of laboratory equipment
person          ident           text   unique identifier for member of staff
                family          text   family name of staff member
                personal        text   personal name of staff member
                supervisor_id*  text   foreign key reference to person's supervisor
rating          person_id       text   foreign key reference to person
                machine_id      text   foreign key reference to machine
                certified       bool   whether person is certified to use machine
assay           ident           text   unique identifier for soil assay
                lat             float  foreign key reference to grid cell
                lon             float  foreign key reference to grid cell
                person_id       text   foreign key reference to person who did assay
                machine_id      text   foreign key reference to machine used to do assay
                performed*      date   date that assay was done
assay_readings  assay_id        text   foreign key reference to assay
                reading_id      int    serial number within assay
                contents        text   "C" or "T" showing control or treatment
                reading         float  pollution measurement
species         reference       text   reference genome
                susc_locus      int    location of susceptible locus within genome
                susc_base       text   base that causes significant mutation at that locus
species_loci    ident           int    unique locus serial number
                locus           int    locus where mutation might occur
specimen        ident           text   unique identifier for specimen
                lat             float  foreign key reference to grid cell
                lon             float  foreign key reference to grid cell
                genome          text   specimen genome
                mass            float  specimen mass (g)
                diameter        float  specimen diameter (mm)
                collected*      date   when specimen was collected
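
To make the relationship between the assay and assay_readings tables concrete, here is a small sketch that recreates the two tables in an in-memory SQLite database and compares control and treatment readings. The column names and types come from the schema listing; the `create table` statements, the key constraints, and every data value are assumptions made up for this example, not output from snailz.

```python
import sqlite3

# Toy in-memory database; the DDL is reconstructed from the schema listing
# and may differ from what snailz actually generates.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table assay (
        ident text primary key,
        lat float,
        lon float,
        person_id text,
        machine_id text,
        performed date          -- may be NULL
    );
    create table assay_readings (
        assay_id text references assay(ident),
        reading_id int,
        contents text,          -- "C" (control) or "T" (treatment)
        reading float
    );
    """
)

# Invented rows for illustration only.
conn.execute("insert into assay values ('A1', 48.5, -124.0, 'P1', 'M1', NULL)")
conn.executemany(
    "insert into assay_readings values (?, ?, ?, ?)",
    [("A1", 1, "C", 0.0), ("A1", 2, "T", 2.0), ("A1", 3, "C", 1.0), ("A1", 4, "T", 4.0)],
)

# Mean reading for control versus treatment wells.
means = dict(
    conn.execute(
        "select contents, avg(reading) from assay_readings group by contents"
    ).fetchall()
)

# Assays whose date is missing, as the asterisk on "performed" allows.
missing = conn.execute(
    "select count(*) from assay where performed is null"
).fetchone()[0]

print(means, missing)
```

The same queries should work against a generated database if the table and column names match the listing, but that has not been checked here. Note that missing values must be tested with `is null`, since `performed = NULL` never matches in SQL.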

Colophon

snailz was inspired by the Palmer Penguins dataset and by conversations with Rohan Alexander about his book Telling Stories with Data.

My thanks to everyone who built the tools this project relies on.

The snail logo was created by sunar.ko.

Acknowledgments

  • Greg Wilson is a programmer, author, and educator based in Toronto. He was the co-founder and first Executive Director of Software Carpentry and received ACM SIGSOFT's Influential Educator Award in 2020.
