
Synthetic data generator for snail mutation survey


Snailz


snailz is a synthetic data generator that models a study of snails in the Pacific Northwest that are growing to unusual sizes as a result of exposure to pollution. The package generates fully reproducible datasets of varying sizes and with varying statistical properties, and is intended for classroom use. For example, an instructor can give each learner a unique dataset to analyze, while learners can test their analysis pipelines using datasets they generate themselves.

The Story

Years ago, logging companies dumped toxic waste in a remote region of Vancouver Island. As the containers leaked and the pollution spread, some snails in the region began growing unusually large. Your team is now collecting and analyzing specimens from affected regions to determine whether exposure to pollution is responsible.

Usage:

usage: snailz [-h]
              [--defaults]
              [--outdir OUTDIR]
              [--override OVERRIDE [OVERRIDE ...]]
              [--params PARAMS]
              [--profile]

options:
  -h, --help            show this help message and exit
  --defaults            show default parameters as JSON
  --outdir OUTDIR       output directory
  --override OVERRIDE [OVERRIDE ...]
                        name=value parameters to override defaults
  --params PARAMS       specify JSON parameter file
  --profile             enable profiling
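
The options above can be combined when preparing one dataset per learner. The following is a minimal sketch, not part of snailz itself: the helper `build_snailz_command`, the learner names, and the file paths are all hypothetical, and the sketch only assembles and prints command lines using the flags documented above rather than running the tool. Valid names for `--override` are whatever `snailz --defaults` reports, so none are assumed here.

```python
import shlex


def build_snailz_command(outdir, params=None, overrides=()):
    """Assemble a snailz command line from the documented options.

    This helper is hypothetical; it is not provided by snailz.
    """
    command = ["snailz", "--outdir", str(outdir)]
    if params is not None:
        command += ["--params", str(params)]
    if overrides:
        # Each override is a "name=value" string. Parameter names should be
        # taken from the output of `snailz --defaults`.
        command += ["--override", *overrides]
    return command


# Hypothetical learners, each with their own parameter file and output directory.
for learner in ["learner01", "learner02"]:
    command = build_snailz_command(
        outdir=f"data/{learner}",
        params=f"params/{learner}.json",
    )
    print(shlex.join(command))
```

Each printed line can be pasted into a shell or passed to `subprocess.run` once the parameter files exist.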

Schema

table           field          type   purpose
grid            ident          text   unique identifier for each survey grid
                size           int    height and width of survey grid in cells
                spacing        float  size of survey grid cell (meters)
                lat0           float  southernmost latitude of grid (fractional degrees)
                lon0           float  westernmost longitude of grid (fractional degrees)
grid_cells      grid_id        text   foreign key reference to grid
                lat            float  foreign key reference to grid cell
                lon            float  foreign key reference to grid cell
                value          float  pollution measurement in that grid cell
machine         ident          text   unique identifier for each piece of laboratory equipment
                name           text   name of piece of laboratory equipment
person          ident          text   unique identifier for member of staff
                family         text   family name of staff member
                personal       text   personal name of staff member
                supervisor_id  text*  foreign key reference to person's supervisor
rating          person_id      text   foreign key reference to person
                machine_id     text   foreign key reference to machine
                certified      bool   whether person is certified to use machine
assay           ident          text   unique identifier for soil assay
                lat            float  foreign key reference to grid cell
                lon            float  foreign key reference to grid cell
                person_id      text   foreign key reference to person who did assay
                machine_id     text   foreign key reference to machine used to do assay
                performed      date   date that assay was done
assay_readings  assay_id       text   foreign key reference to assay
                reading_id     int    serial number within assay
                contents       text   "C" or "T" showing control or treatment
                reading        float  pollution measurement
species         reference      text   reference genome
                susc_locus     int    location of susceptible locus within genome
                susc_base      text   base that causes significant mutation at that locus
species_loci    ident          int    unique locus serial number
                locus          int    locus where mutation might occur
specimen        ident          text   unique identifier for specimen
                lat            float  foreign key reference to grid cell
                lon            float  foreign key reference to grid cell
                genome         text   specimen genome
                mass           float  specimen mass (g)
                diameter       float  specimen diameter (mm)
                collected      date   when specimen was collected
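
To illustrate how the foreign keys in the schema tie tables together, here is a minimal sketch that joins assay to assay_readings and averages the control and treatment readings for each assay. It assumes the tables have been loaded into an SQLite database, which this description does not state. The column names and types follow the schema above, but the rows and the `mean_readings` helper are made up for illustration.

```python
import sqlite3

# In-memory database holding two of the tables described in the schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table assay (
        ident text primary key,
        lat float,
        lon float,
        person_id text,
        machine_id text,
        performed date
    );
    create table assay_readings (
        assay_id text references assay (ident),
        reading_id int,
        contents text check (contents in ('C', 'T')),
        reading float
    );
    """
)

# Made-up rows: one assay with two control and two treatment readings.
conn.execute(
    "insert into assay values (?, ?, ?, ?, ?, ?)",
    ("A1", 48.8, -124.2, "P1", "M1", "2024-01-15"),
)
conn.executemany(
    "insert into assay_readings values (?, ?, ?, ?)",
    [("A1", 1, "C", 1.0), ("A1", 2, "C", 3.0), ("A1", 3, "T", 4.0), ("A1", 4, "T", 6.0)],
)


def mean_readings(connection):
    """Return (assay ident, contents, mean reading) for every assay."""
    query = """
        select a.ident, r.contents, avg(r.reading)
        from assay a join assay_readings r on r.assay_id = a.ident
        group by a.ident, r.contents
        order by a.ident, r.contents
    """
    return connection.execute(query).fetchall()


print(mean_readings(conn))
```

The same pattern applies to the other foreign keys, for example joining rating to person and machine to list who is certified on which equipment.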

Colophon

snailz was inspired by the Palmer Penguins dataset and by conversations with Rohan Alexander about his book Telling Stories with Data.

My thanks to everyone who built the tools this project relies on, including:

The snail logo was created by sunar.ko.

Acknowledgments

  • Greg Wilson is a programmer, author, and educator based in Toronto. He was the co-founder and first Executive Director of Software Carpentry and received ACM SIGSOFT's Influential Educator Award in 2020.

