Synthetic data generator for snail mutation survey

Snailz

snailz is a synthetic data generator that models a study of snails in the Pacific Northwest that are growing to unusual sizes as a result of exposure to pollution. The package generates fully reproducible datasets of varying sizes and with varying statistical properties, and is intended for classroom use. For example, an instructor can give each learner a unique dataset to analyze, while learners can test their analysis pipelines using datasets they generate themselves.
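To illustrate what "fully reproducible" means in practice, here is a minimal sketch that does not use snailz itself. It assumes generation is driven by a seeded random number generator; the function `make_masses`, its `seed` argument, and the mass range are hypothetical and chosen only for this example.

```python
import random


def make_masses(seed: int, count: int) -> list[float]:
    """Generate `count` invented snail masses (g) from a seeded generator.

    Hypothetical helper for illustration; not part of snailz.
    """
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [round(rng.uniform(1.0, 10.0), 2) for _ in range(count)]


# The same seed reproduces the same dataset; a different seed gives a different one.
first = make_masses(seed=1234, count=5)
second = make_masses(seed=1234, count=5)
other = make_masses(seed=5678, count=5)
print(first == second)
print(first == other)
```

Under this assumption, an instructor could hand each learner a different seed and regenerate any learner's dataset later from that seed alone.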

The Story

Years ago, logging companies dumped toxic waste in a remote region of Vancouver Island. As the containers leaked and the pollution spread, some snails in the region began growing unusually large. Your team is now collecting and analyzing specimens from affected regions to determine whether exposure to pollution is responsible.

Usage:

usage: snailz [-h]
              [--defaults]
              [--outdir OUTDIR]
              [--override OVERRIDE [OVERRIDE ...]]
              [--params PARAMS]
              [--profile]

options:
  -h, --help            show this help message and exit
  --defaults            show default parameters as JSON
  --outdir OUTDIR       output directory
  --override OVERRIDE [OVERRIDE ...]
                        name=value parameters to override defaults
  --params PARAMS       specify JSON parameter file
  --profile             enable profiling

See the documentation of the Parameters class for a description of data generation parameters.
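The options above can be mirrored with the standard library's `argparse` module. The sketch below is not the package's actual source: it only reconstructs a parser with the same flags as the help text, plus a hypothetical helper `parse_overrides` that splits the `name=value` pairs. The parameter names `example_name` and `other_name` are placeholders, not real snailz parameters.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same flags as the usage message above."""
    parser = argparse.ArgumentParser(prog="snailz")
    parser.add_argument("--defaults", action="store_true",
                        help="show default parameters as JSON")
    parser.add_argument("--outdir", help="output directory")
    parser.add_argument("--override", nargs="+", default=[],
                        help="name=value parameters to override defaults")
    parser.add_argument("--params", help="specify JSON parameter file")
    parser.add_argument("--profile", action="store_true",
                        help="enable profiling")
    return parser


def parse_overrides(pairs: list[str]) -> dict[str, str]:
    """Split name=value strings into a dictionary (values are kept as text)."""
    result = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {pair!r}")
        result[name] = value
    return result


# Placeholder parameter names; consult the Parameters class for the real ones.
args = build_parser().parse_args(
    ["--outdir", "data", "--override", "example_name=42", "other_name=abc"]
)
overrides = parse_overrides(args.override)
print(args.outdir, overrides)
```

Because `--override` accepts one or more values, several `name=value` pairs can follow a single flag, which matches the `OVERRIDE [OVERRIDE ...]` form in the usage message.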

Schema

| table | field | type | purpose |
| --- | --- | --- | --- |
| grid | ident | text | unique identifier for each survey grid |
| | size | int | height and width of survey grid in cells |
| | spacing | float | size of survey grid cell (meters) |
| | lat0 | float | southernmost latitude of grid (fractional degrees) |
| | lon0 | float | westernmost longitude of grid (fractional degrees) |
| grid_cells | grid_id | text | foreign key reference to grid |
| | lat | float | foreign key reference to grid cell |
| | lon | float | foreign key reference to grid cell |
| | value | float | pollution measurement in that grid cell |
| machine | ident | text | unique identifier for each piece of laboratory equipment |
| | name | text | name of piece of laboratory equipment |
| person | ident | text | unique identifier for member of staff |
| | family | text | family name of staff member |
| | personal | text | personal name of staff member |
| | supervisor_id | text* | foreign key reference to person's supervisor |
| rating | person_id | text | foreign key reference to person |
| | machine_id | text | foreign key reference to machine |
| | certified | bool | whether person is certified to use machine |
| assay | ident | text | unique identifier for soil assay |
| | lat | float | foreign key reference to grid cell |
| | lon | float | foreign key reference to grid cell |
| | person_id | text | foreign key reference to person who did assay |
| | machine_id | text | foreign key reference to machine used to do assay |
| | performed | date | date that assay was done |
| assay_readings | assay_id | text | foreign key reference to assay |
| | reading_id | int | serial number within assay |
| | contents | text | "C" or "T" showing control or treatment |
| | reading | float | pollution measurement |
| species | reference | text | reference genome |
| | susc_locus | int | location of susceptible locus within genome |
| | susc_base | text | base that causes significant mutation at that locus |
| species_loci | ident | int | unique locus serial number |
| | locus | int | locus where mutation might occur |
| specimen | ident | text | unique identifier for specimen |
| | lat | float | foreign key reference to grid cell |
| | lon | float | foreign key reference to grid cell |
| | genome | text | specimen genome |
| | mass | float | specimen mass (g) |
| | diameter | float | specimen diameter (mm) |
| | collected | date | when specimen was collected |
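As a sketch of how these tables relate, the example below builds two of them (`grid_cells` and `specimen`) in an in-memory SQLite database and joins specimens to the pollution reading of the cell where they were found. Several things here are assumptions rather than facts about snailz: that the data is loaded into SQLite, that `float` maps to `real` and `date` to ISO-formatted `text`, and that `lat`/`lon` pairs match exactly between tables. The rows are invented for illustration and are not output from the generator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table grid_cells (
        grid_id text,
        lat real,
        lon real,
        value real
    );
    create table specimen (
        ident text primary key,
        lat real,
        lon real,
        genome text,
        mass real,
        diameter real,
        collected text
    );
    """
)

# Invented rows for illustration only; not produced by snailz.
conn.executemany(
    "insert into grid_cells values (?, ?, ?, ?)",
    [("G1", 48.80, -123.90, 0.0), ("G1", 48.80, -123.89, 3.5)],
)
conn.executemany(
    "insert into specimen values (?, ?, ?, ?, ?, ?, ?)",
    [
        ("S1", 48.80, -123.90, "ACGT", 2.0, 10.0, "2025-03-01"),
        ("S2", 48.80, -123.89, "ACGA", 6.0, 14.0, "2025-03-02"),
        ("S3", 48.80, -123.89, "ACGA", 8.0, 16.0, "2025-03-03"),
    ],
)

# Mean specimen mass per pollution level, joining on the cell coordinates.
rows = conn.execute(
    """
    select grid_cells.value, count(*), avg(specimen.mass)
    from specimen
    join grid_cells
      on specimen.lat = grid_cells.lat and specimen.lon = grid_cells.lon
    group by grid_cells.value
    order by grid_cells.value
    """
).fetchall()

for pollution, count, mean_mass in rows:
    print(pollution, count, mean_mass)
```

Joining on exact floating-point equality works here because both tables were filled with identical literals. With real generated data it may be safer to round coordinates or compare within a small tolerance.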

Colophon

snailz was inspired by the Palmer Penguins dataset and by conversations with Rohan Alexander about his book Telling Stories with Data.

My thanks to everyone who built the tools this project relies on, including:

The snail logo was created by sunar.ko.

Acknowledgments

  • Greg Wilson is a programmer, author, and educator based in Toronto. He was the co-founder and first Executive Director of Software Carpentry and received ACM SIGSOFT's Influential Educator Award in 2020.
