This is a sampling simulator
samplingsimulatorpy
samplingsimulatorpy is a Python package intended to assist those teaching or learning basic statistical inference.
Authors
Name | GitHub |
---|---|
Holly Williams | hwilliams10 |
Lise Braaten | lisebraaten |
Tao Guo | tguo9 |
Yue (Alex) Jiang | YueJiangMDSV |
Overview
This package allows users to generate virtual populations which can be sampled from in order to compare and contrast sample vs sampling distributions for different sample sizes. The package also allows users to sample from the generated virtual population (or any other population), plot the distributions, and view summaries for the parameters of interest.
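To make the distinction between a sample distribution and a sampling distribution concrete, here is a minimal sketch that uses only the Python standard library. It does not use this package; the population, the helper `sampling_distribution`, and the chosen sizes are all assumptions made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical virtual population: 10,000 draws from a standard normal
population = [random.gauss(0, 1) for _ in range(10_000)]


def sampling_distribution(pop, n, reps):
    """Return the means of `reps` random samples of size `n` (without replacement)."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(reps)]


# The values of one sample form a sample distribution
one_sample = random.sample(population, 30)

# The means of many samples form a sampling distribution
means_small = sampling_distribution(population, n=5, reps=500)
means_large = sampling_distribution(population, n=50, reps=500)

print("spread of one sample (n=30):  ", round(statistics.stdev(one_sample), 3))
print("spread of sample means (n=5): ", round(statistics.stdev(means_small), 3))
print("spread of sample means (n=50):", round(statistics.stdev(means_large), 3))
```

The spread of the sample means shrinks as the sample size grows, which is the pattern the plotting functions in this package are meant to show.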
Installation:
pip install -i https://test.pypi.org/simple/ samplingsimulatorpy
Function Descriptions
- generate_virtual_pop creates a virtual population.
  - Inputs: a distribution function (e.g. np.random.lognormal, np.random.binomial), the parameters required by the distribution function, and the size of the population.
  - Outputs: the virtual population as a data frame.
- draw_samples generates samples of different sizes.
  - Inputs: the population to sample from, the sample sizes, and the number of samples.
  - Outputs: a data frame with the sample number in one column and the value in a second column.
- plot_sample_hist creates sample distributions for different sample sizes.
  - Inputs: the population to sample from, the samples to plot, and a list of the sample sizes.
  - Outputs: a grid of sample distribution plots.
- plot_sampling_hist creates sampling distributions for different sample sizes.
  - Inputs: the population to sample from, the samples to plot, and a list of the sample sizes.
  - Outputs: a grid of sampling distribution plots.
- stat_summary returns a summary of the statistical parameters of interest.
  - Inputs: the population, the samples, and the parameter(s) of interest.
  - Outputs: a summary data frame.
How do these fit into the Python ecosystem?
To the best of our knowledge, there is currently no existing Python package with the specific functionality to create virtual populations and make the specific sample and sampling distributions described above. We do make use of many existing Python packages and expand on them to make very specific functions. These include:
- scipy.stats to get distribution functions
- np.random to generate random samples
- Altair to create plots

The pandas package already includes some summary statistics functions such as .describe(), but our package will be more customizable. Our summary will only include the statistical parameters of interest and will provide a comparison between the sample, sampling, and true population parameters.
Dependencies
- python = "^3.7"
- pandas = "^1.0.1"
- numpy = "^1.18.1"
- altair = "^4.0.1"
Usage
generate_virtual_pop
from samplingsimulatorpy import generate_virtual_pop
generate_virtual_pop(size, distribution_func, *para)
Arguments:
- size: The number of values to generate (the size of the population)
- distribution_func: The distribution function used to generate the values
- *para: The arguments required by the distribution function
Example:
import numpy as np
pop = generate_virtual_pop(100, np.random.normal, 0, 1)
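The calling pattern above (a size, a distribution function, then that function's own parameters) can be sketched with the standard library alone. The helper `make_population` below is hypothetical and is not the package's implementation; it assumes the distribution function returns one value per call, as `random.gauss` and `random.lognormvariate` do.

```python
import random


def make_population(size, distribution_func, *params):
    """Hypothetical stand-in: call `distribution_func(*params)` once per member."""
    return [distribution_func(*params) for _ in range(size)]


random.seed(0)  # fixed seed so the sketch is reproducible

# Same call shape as the example above: size, function, then its parameters
normal_pop = make_population(100, random.gauss, 0, 1)
lognormal_pop = make_population(100, random.lognormvariate, 0, 1)

print(len(normal_pop), len(lognormal_pop))
```

Forwarding `*params` unchanged means any distribution function can be plugged in without the helper knowing its parameter names.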
draw_samples
from samplingsimulatorpy import draw_samples
draw_samples(pop, reps, n_s)
Arguments:
- pop: the virtual population as a data frame
- reps: the number of replications for each sample size, as an integer
- n_s: the sample sizes to draw, as a list
Example:
samples = draw_samples(pop, 3, [5, 10, 15, 20])
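As a rough picture of what drawing `reps` replicates at each sample size produces, the sketch below builds a long-format list of records with the standard library. The function `draw_samples_sketch` and the keys `replicate`, `size`, and `value` are assumptions for illustration, not the package's actual column names or implementation.

```python
import random


def draw_samples_sketch(pop, reps, sizes):
    """Hypothetical stand-in: one record per sampled value, in long format."""
    rows = []
    for size in sizes:
        for rep in range(1, reps + 1):
            # Sample without replacement from the population
            for value in random.sample(pop, size):
                rows.append({"replicate": rep, "size": size, "value": value})
    return rows


random.seed(1)  # fixed seed so the sketch is reproducible
pop = [random.gauss(0, 1) for _ in range(100)]

rows = draw_samples_sketch(pop, 3, [5, 10, 15, 20])
print(len(rows))  # 3 * (5 + 10 + 15 + 20) = 150 records
```

Keeping every value in its own record makes it straightforward to group by sample size and replicate when plotting or summarizing.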
plot_sample_hist
from samplingsimulatorpy import plot_sample_hist
plot_sample_hist(pop, samples)
Arguments:
- pop: the virtual population as a data frame
- samples: the samples as a data frame
Example:
plot_sample_hist(pop, samples)
plot_sampling_hist
from samplingsimulatorpy import plot_sampling_hist
plot_sampling_hist(samples)
Arguments:
- samples: the samples as a data frame
Example:
plot_sampling_hist(samples)
stat_summary
from samplingsimulatorpy import stat_summary
stat_summary(pop, samples, parameter)
Arguments:
- population: The virtual population
- samples: The drawn samples
- parameter: The parameter(s) of interest
Example:
stat_summary(pop, samples, ['np.mean', 'np.std'])
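The kind of comparison such a summary supports can be sketched with the standard library. This is not the package's output format; the `summary` dictionary, its keys, and the population parameters (mean 10, standard deviation 2) are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical population with mean 10 and standard deviation 2
population = [random.gauss(10, 2) for _ in range(5_000)]

# One sample of size 40, and the means of 200 such samples
sample = random.sample(population, 40)
sample_means = [statistics.mean(random.sample(population, 40)) for _ in range(200)]

summary = {
    "population": {"mean": statistics.mean(population),
                   "std": statistics.pstdev(population)},
    "sample": {"mean": statistics.mean(sample),
               "std": statistics.stdev(sample)},
    "sampling": {"mean": statistics.mean(sample_means),
                 "std": statistics.stdev(sample_means)},
}

# Print one row per distribution: name, mean, standard deviation
for name, stats in summary.items():
    print(f"{name:<12}{stats['mean']:>8.3f}{stats['std']:>8.3f}")
```

The sample's standard deviation estimates the population's, while the standard deviation of the sample means is much smaller because it estimates the standard error.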
Example Usage Scenario
import numpy as np
from samplingsimulatorpy import (
    generate_virtual_pop,
    draw_samples,
    plot_sample_hist,
    plot_sampling_hist,
    stat_summary,
)
# create virtual population
pop = generate_virtual_pop(100, np.random.normal, 0, 1)
# take samples
samples = draw_samples(pop, 3, [10, 20])
# plot sample histogram
plot_sample_hist(pop, samples)
# plot sampling distribution
plot_sampling_hist(samples)
# compare mean and standard deviation
stat_summary(pop, samples, ['np.mean', 'np.std'])
Documentation
The official documentation is hosted on Read the Docs: https://samplingsimulatorpy.readthedocs.io/en/latest/
Credits
This package was created with Cookiecutter and the UBC-MDS/cookiecutter-ubc-mds project template, modified from the pyOpenSci/cookiecutter-pyopensci project template and the audreyr/cookiecutter-pypackage.