Command line utilities for statistics, odds, and probabilities

pythodds

Python 3.13+ · MIT License

A command-line utility and Python library for calculating statistics, odds, and probabilities.

Features

Binomial Distribution Calculate PMF, CDF, and survival functions for binomial distributions
Bayes' Theorem Compute posterior probabilities from priors, likelihoods, and either direct evidence or a false-positive rate
Birthday Problem Compute collision probabilities for uniform and non-uniform pools, find minimum group sizes, and generate probability tables
Normal Distribution Compute PDF, CDF, survival probabilities, interval probabilities, and the inverse CDF (percent-point function) for a Gaussian N(μ, σ²) distribution
Expected Value Compute E[X], Var(X), SD(X), Shannon entropy, and the moment generating function for discrete probability distributions; supports inline input or CSV/JSON files
Poisson Distribution Compute PMF, CDF, and survival probabilities, find minimum event counts for a target cumulative probability, and generate full probability tables
Prime Numbers Check primality, find the nth prime, count primes up to a limit (π function), list primes in a range, and compute prime factorizations
Streak Probability Compute the probability of at least one consecutive run of successes and the expected length of the longest streak
Pythagorean Record Calculate team winning percentage expectations using Bill James' Pythagorean formula or the SABR linear formula; project in-progress season records and compare actual vs. expected performance
Pearson Correlation Compute Pearson's r, r², t-statistic, p-value, and confidence intervals; test for linear relationships between two continuous variables; supports inline data or CSV input
Spearman Correlation Compute Spearman's ρ (rank correlation), ρ², t-statistic, p-value, and confidence intervals; test for monotonic relationships; robust to outliers and suitable for ordinal data; includes rank display for tie inspection
Linear Regression Perform ordinary least squares (OLS) regression with full statistical inference: coefficients, standard errors, R², F-statistic, t-tests, confidence intervals, and predictions with confidence/prediction intervals
Sample Size Calculator Determine minimum sample sizes for proportion estimation, mean difference detection, and two-proportion comparisons; includes power analysis sweeps
Monte Carlo Simulator Empirically estimate probabilities for binomial, birthday, streak, and Poisson experiments with confidence intervals and analytical comparison
Command-line Interface binom, bayes, birthday, normal, expected, poisson, prime, streak, pythag, pearson, spearman, linreg, sample, and simulate commands
Minimal Dependencies Core calculations use pure Python; Spearman correlation and Monte Carlo simulation use scipy/numpy for numerical robustness

Installation

Install from PyPI:

pip install pythodds

or

uv add pythodds

Or install from source:

git clone https://github.com/ncarsner/pythodds.git
cd pythodds
pip install -e .

Command Line Usage

binom — Binomial Distribution

Computes exact, cumulative, and survival probabilities for a Binomial(n, p) distribution, and renders a color-coded stacked progress bar showing the share of mass below k, at k, and above k.

# Calculate binomial distribution probabilities
binom -n 10 -k 3 -p 0.4

# Specify a target and minimum probability threshold
binom -n 100 -k 30 -p 0.35 --target 40 --min-prob 0.05

Typical output includes a stacked terminal bar like this:

n=10, k=3, p=0.4
P(X = 3):  0.214991 (21.499100%)
P(X <= 3): 0.382281 (38.228100%)
P(X >= 3): 0.832710 (83.271000%)
[stacked ANSI bar for <k | =k | >k]
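
The figures above can be reproduced with a few lines of standard-library Python. This is an illustrative sketch of the underlying math, not the package's internal implementation (the library's own functions appear under Python Library below):

```python
from math import comb

def binom_pmf(n: int, k: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(n: int, k: int, p: float) -> float:
    """P(X <= k), summing the PMF directly."""
    return sum(binom_pmf(n, i, p) for i in range(k + 1))

pmf = binom_pmf(10, 3, 0.4)       # ~0.214991
cdf = binom_cdf(10, 3, 0.4)       # ~0.382281
surv = 1 - binom_cdf(10, 2, 0.4)  # P(X >= 3), ~0.832710
```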

bayes — Bayes' Theorem Posterior Probability

Computes posterior probability $P(A\mid B)$ from a prior probability, likelihood, and either direct evidence $P(B)$ or a false-positive rate $P(B\mid \neg A)$.

# Medical test example: prevalence 1%, sensitivity 99%, false-positive rate 5%
bayes -p 0.01 -l 0.99 -f 0.05

# Provide evidence directly instead of a false-positive rate
bayes -p 0.2 -l 0.8 -e 0.5

Options:

Flag Long form Description
-p --prior Prior probability $P(A)$, between 0 and 1
-l --likelihood Likelihood $P(B\mid A)$, between 0 and 1
-e --evidence Evidence probability $P(B)$, between 0 and 1
-f --false-positive False-positive rate $P(B\mid \neg A)$, between 0 and 1
-P --precision Decimal places for printed values (default: 6)

-e/--evidence and -f/--false-positive are mutually exclusive; one is required.
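
When a false-positive rate is supplied, the evidence term follows from the law of total probability. A minimal sketch of that calculation (not the package's code) for the medical-test example above:

```python
def bayes_posterior(prior: float, likelihood: float, false_positive: float) -> float:
    """P(A|B) via Bayes' theorem, deriving P(B) by total probability."""
    evidence = prior * likelihood + (1 - prior) * false_positive
    return prior * likelihood / evidence

# 1% prevalence, 99% sensitivity, 5% false-positive rate
posterior = bayes_posterior(0.01, 0.99, 0.05)  # ~0.166667
```

Despite the 99% sensitivity, a positive result implies only about a 1-in-6 chance of having the condition, because true positives are swamped by false positives from the healthy 99% of the population.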


prime — Prime Number Operations

Provides tools for working with prime numbers: primality testing, finding the nth prime, counting primes up to a limit (prime counting function π), listing primes in a range, and computing prime factorizations.

# Check if a number is prime
prime -is 97
prime --check 100

# Find the nth prime number (1-indexed)
prime -n 100
prime --nth 10000

# Count primes up to a limit (π function)
prime -C 1000
prime --count 1000000

# List all primes in a range [start, end]
prime -R 50 100
prime --range 1 50

# Compute prime factorization
prime -F 360
prime --factorize 2024

# Output as JSON
prime --check 97 --format json
prime --range 1 100 --format json
prime --factorize 360 --format json

Options:

Flag Long form Description
-is --check Check if N is prime
-n --nth Find the Nth prime number (1-indexed)
-C --count Count primes up to and including LIMIT (π function)
-R --range START END List all primes in the range [START, END] inclusive
-F --factorize Compute prime factorization of N
--format Output format: text (default) or json

Operation flags are mutually exclusive; exactly one is required.

Implementation details:

  • Primality testing: Uses trial division with 6k±1 optimization
  • Nth prime: Uses Sieve of Eratosthenes with prime counting estimation
  • Prime counting: Implements the prime counting function π(n)
  • Range listing: Efficient sieve-based generation
  • Factorization: Trial division returning {prime: exponent} dict

Example outputs:

$ prime --check 97
97 is prime

$ prime --nth 100
The 100th prime number is 541

$ prime --count 100
π(100) = 25 (there are 25 primes ≤ 100)

$ prime --factorize 360
Prime factorization of 360:
  360 = 2³ × 3² × 5

Factor breakdown:
  2^3 = 8
  3^2 = 9
  5^1 = 5
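
The implementation details above translate to a short standalone sketch. This is illustrative only (trial division with the 6k±1 step, and factorization returning a {prime: exponent} dict), not the package's own code:

```python
def is_prime(n: int) -> bool:
    """Trial division with the 6k±1 optimization."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

def factorize(n: int) -> dict[int, int]:
    """Prime factorization as a {prime: exponent} dict via trial division."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors
```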

birthday — Birthday Problem Collision Probability

Computes the probability that at least two items in a group share the same value when drawn from a pool of equally-likely possibilities. Defaults to a pool size of 365.25 (calendar days).

# P(duplicate birthday) in a group of 23 people
birthday -n 23

# Find the minimum group size to reach 50% collision probability
birthday --target-prob 0.50

# Print a probability table for group sizes 1–40
birthday --range 1 40

# Custom pool size (e.g. 7-digit phone numbers)
birthday -p 10_000_000 -n 1180

# Non-uniform pool via relative weights
birthday --group-size 30 --weights 0.10,0.15,0.20,0.30,0.25

# Output as JSON or CSV
birthday --range 1 60 --format <json|csv>

Options:

Flag Long form Description
-p --pool-size Pool size — number of equally-likely outcomes (default: 365.25)
-n --group-size Compute collision probability for exactly this group size
-t --target-prob Find the minimum group size reaching this probability
-r --range MIN MAX Print a probability table for group sizes MIN through MAX
-w --weights Comma-separated relative frequencies for a non-uniform pool
-f --format Output format: table (default), json, or csv
-P --precision Decimal places for printed probabilities (default: 6)
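
For the uniform case, the collision probability is one minus the probability that all draws are distinct. A sketch of that complement calculation (using the classic integer pool of 365 rather than the tool's 365.25 default):

```python
def collision_prob(group: int, pool: int) -> float:
    """P(at least one shared value) for a uniform pool, via the complement."""
    p_all_distinct = 1.0
    for i in range(group):
        p_all_distinct *= (pool - i) / pool
    return 1.0 - p_all_distinct

p = collision_prob(23, 365)  # classic birthday problem, ~0.507297
```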

normal — Normal (Gaussian) Distribution

Computes PDF, CDF, survival probabilities, interval probabilities, and the inverse CDF (percent-point function) for a N(μ, σ²) distribution. Uses only the Python standard library.

# PDF, P(X ≤ 1.96), and P(X ≥ 1.96) for the standard normal
normal -x 1.96 -m 0 -s 1

# Same calculation for a custom distribution
normal -x 75 -m 70 -s 5

# P(−1.96 ≤ X ≤ 1.96)
normal --between -1.96 1.96 -m 0 -s 1

# Find the value x such that P(X ≤ x) = 0.975 (inverse CDF)
normal --quantile 0.975 -m 0 -s 1

Options:

Flag Long form Description
-x --value Compute PDF, P(X ≤ x), and P(X ≥ x) for this value
--between LOW HIGH Compute P(LOW ≤ X ≤ HIGH)
-q --quantile Find x such that P(X ≤ x) = P (inverse CDF)
-m --mean Distribution mean μ (default: 0)
-s --std Distribution standard deviation σ (default: 1)
-P --precision Decimal places for printed values (default: 6)

-x/--value, --between, and -q/--quantile are mutually exclusive; one is required.
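
Because the normal CDF has no closed form, standard-library implementations typically go through the error function. A minimal sketch (not necessarily how the package computes it):

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

cdf = normal_cdf(1.96)                           # ~0.975002
between = normal_cdf(1.96) - normal_cdf(-1.96)   # ~0.950004
```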


expected — Expected Value & Discrete Distribution Statistics

Computes E[X], Var(X), SD(X), Shannon entropy, and optionally the moment generating function (MGF) for a discrete probability distribution supplied inline or via a CSV/JSON file.

# E[X] and statistics for a simple discrete distribution
expected --outcomes 0,1,5,10 --probs 0.50,0.25,0.15,0.10

# Non-uniform six-sided die
expected --outcomes 1,2,3,4,5,6 --probs 0.1,0.2,0.3,0.2,0.1,0.1

# Load distribution from a CSV or JSON file
expected --file payouts.csv

# Also compute the MGF at t=0.5
expected --outcomes 0,1 --probs 0.3,0.7 --mgf 0.5

Options:

Flag Long form Description
-o --outcomes Comma-separated outcome values
-f --file CSV or JSON file with outcomes and probabilities
-p --probs Comma-separated probabilities (required with --outcomes)
--mgf T Also compute the moment generating function M_X(t) at t=T
-P --precision Decimal places for printed values (default: 6)

--outcomes and --file are mutually exclusive; one is required
--probs is required when using --outcomes
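
The headline statistics reduce to one-liners over the outcome/probability pairs. A sketch of the definitions (not the package internals) using the first example above:

```python
from math import log2

outcomes = [0, 1, 5, 10]
probs = [0.50, 0.25, 0.15, 0.10]

ev = sum(x * p for x, p in zip(outcomes, probs))               # E[X] = 2.0
var = sum(x * x * p for x, p in zip(outcomes, probs)) - ev**2  # Var(X) = E[X^2] - E[X]^2 = 10.0
entropy = -sum(p * log2(p) for p in probs if p > 0)            # Shannon entropy in bits
```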


poisson — Poisson Distribution

Computes PMF, CDF, and survival probabilities for a Poisson(λ) distribution. Models rare, independent events occurring at a known average rate — server errors per hour, calls per minute, defects per batch, and so on.

# P(X=7), P(X≤7), and P(X≥7) for λ=3.0
poisson -l 3.0 -k 7

# Find the minimum k such that P(X ≤ k) >= 0.95
poisson -l 3.0 -t 0.95

# Print a probability table for k = 0 through 15
poisson -l 3.0 -r 0 15

# Also show P(X ≥ 5) and whether it meets a 1% threshold
poisson -l 0.5 -k 2 --target 5 --min-prob 0.01

# Output as JSON or CSV
poisson -l 3.0 -r 0 20 -f json
poisson -l 3.0 -r 0 20 -f csv

Options:

Flag Long form Description
-l --rate Average event rate λ (required, must be > 0)
-k --events Compute PMF and CDF for exactly this event count
-t --target-prob Find the minimum k such that P(X ≤ k) ≥ PROB
-r --range MIN MAX Print a probability table for event counts MIN through MAX
--target With -k: also print P(X ≥ T) for this target count
--min-prob With --target: report whether P(X ≥ T) meets this threshold
-f --format Output format: table (default), json, or csv
-P --precision Decimal places for printed probabilities (default: 6)
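
The Poisson PMF and CDF follow directly from the definition. An illustrative sketch (not the package's implementation) for the λ=3.0, k=7 example:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k), summing the PMF."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

pmf = poisson_pmf(7, 3.0)  # ~0.021604
cdf = poisson_cdf(7, 3.0)  # ~0.988095
```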

streak — Streak / Consecutive Run Probability

Computes the exact probability of at least one run of k consecutive successes in n independent Bernoulli trials, and the expected length of the longest run. Uses dynamic programming for exact O(n·k) computation.

# P(at least one run of 5+ heads in 100 fair coin flips)
streak -n 100 -k 5 -p 0.5

# P(at least one hitting streak of 10+ games over a 162-game season at .320)
streak -n 162 -k 10 -p 0.32

# Expected length of the longest win streak in 50 trials at 40% success rate
streak -n 50 -p 0.40 --longest

Options:

Flag Long form Description
-n --trials Total number of independent trials (required)
-p --prob Success probability per trial, 0–1 (required)
-k --streak-length Compute P(at least one run of K consecutive successes)
--longest Compute E[length of longest run of consecutive successes]
-P --precision Decimal places for printed probabilities (default: 6)

-k/--streak-length and --longest are mutually exclusive; one is required.
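
The O(n·k) dynamic program mentioned above can be sketched by tracking, after each trial, the probability of each trailing run length below k (an illustrative reimplementation, not the package's code):

```python
def prob_streak(n: int, k: int, p: float) -> float:
    """Exact P(at least one run of k consecutive successes in n Bernoulli(p) trials)."""
    # state[j] = P(no run of k yet, and the current trailing run has length j)
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0  # probability mass absorbed once a run of k occurs
    for _ in range(n):
        new = [0.0] * k
        for j, prob in enumerate(state):
            if prob == 0.0:
                continue
            new[0] += prob * (1 - p)  # a failure resets the run
            if j + 1 == k:
                hit += prob * p       # a success completes the run
            else:
                new[j + 1] += prob * p
        state = new
    return hit

p = prob_streak(100, 5, 0.5)  # P(run of 5+ heads in 100 fair flips)
```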


pythag — Pythagorean Record / Win Expectation

Calculates expected winning percentage for sports teams based on runs/points scored and allowed. Supports both the traditional Pythagorean formula (Bill James) and the newer linear formula from SABR research (Rothman, 2014). Can project final season records for teams in progress.

# MLB team using linear formula (default)
pythag --scored 800 --allowed 650

# Compare both linear and Pythagorean methods
pythag --scored 800 --allowed 650 --method both

# Use traditional Pythagorean formula with custom exponent
pythag --scored 800 --allowed 650 --method pythagorean --exponent 1.83

# NFL team projection
pythag --scored 420 --allowed 300 --sport nfl

# NBA team projection
pythag --scored 8500 --allowed 8200 --sport nba

# In-progress season: team is 45-37 after 82 games (shows projection)
pythag --scored 550 --allowed 490 --current-wins 45 --games-played 82

Options:

Flag Long form Description
-s --scored Runs/points scored by the team (required)
-a --allowed Runs/points allowed by the team (required)
--sport Sport/league: mlb (default), nfl, or nba
-m --method Calculation method: linear (default), pythagorean, or both
-e --exponent Exponent for Pythagorean formula (default: 2.0; optimal ~1.83 for baseball)
-g --games Total games in season (default: 162 for mlb, 17 for nfl, 82 for nba)
-w --current-wins Current wins (for in-progress season projection)
-p --games-played Games already played (for in-progress season projection)
-P --precision Decimal places for percentages (default: 2)

--current-wins and --games-played must be used together for in-progress projections.

Formulas:

  • Pythagorean (James): EXP(W%) = RS^exp / (RS^exp + RA^exp)
  • Linear (Rothman, 2014):
    • MLB: EXP(W%) = 0.000683(RS - RA) + 0.50
    • NFL: EXP(W%) = 0.001538(PS - PA) + 0.50
    • NBA: EXP(W%) = 0.000351(PS - PA) + 0.50
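
The formulas above are simple enough to sketch inline (an illustration of the math, not the package's code) for the 800 runs scored / 650 runs allowed example:

```python
def pythagorean_expectation(scored: float, allowed: float, exponent: float = 2.0) -> float:
    """Bill James' Pythagorean win expectation: RS^exp / (RS^exp + RA^exp)."""
    return scored**exponent / (scored**exponent + allowed**exponent)

def linear_expectation_mlb(scored: float, allowed: float) -> float:
    """SABR linear formula for MLB (Rothman, 2014)."""
    return 0.000683 * (scored - allowed) + 0.50

w_pyth = pythagorean_expectation(800, 650)  # ~0.6024
w_lin = linear_expectation_mlb(800, 650)    # ~0.6025
```

Both methods land within a fraction of a win of each other here; the linear formula avoids the exponent-tuning question entirely.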

pearson — Pearson Correlation Coefficient

Computes the Pearson correlation coefficient (r) to measure the linear relationship between two continuous variables. Values range from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship. Includes hypothesis testing, p-values, and confidence intervals using Fisher's Z-transformation.

# Compute correlation from command-line values
pearson --x 1,2,3,4,5 --y 2.1,3.8,6.2,7.9,10.1

# Load data from CSV file
pearson --file data.csv --x-col height --y-col weight

# Include hypothesis test at α=0.05 significance level
pearson --x 10,20,30,40,50 --y 15,28,41,55,68 --alpha 0.05

# One-tailed test for positive correlation
pearson --x 1,2,3,4,5 --y 2,4,5,4,5 --alpha 0.05 --sided one

# Custom precision for output
pearson --x 1,2,3,4 --y 2,4,6,8 --precision 4

Options:

Flag Long form Description
--x Comma-separated x values (use with --y)
--y Comma-separated y values (required with --x)
--file CSV file path (use with --x-col and --y-col)
--x-col Column name for x values in CSV (required with --file)
--y-col Column name for y values in CSV (required with --file)
--alpha Significance level for hypothesis test and CI (e.g., 0.05)
--sided Hypothesis test type: one or two (default: two)
-P --precision Decimal places for printed values (default: 6)

--x and --file are mutually exclusive; one is required.
When using --x, must also provide --y.
When using --file, must also provide --x-col and --y-col

Output includes:

  • Pearson's r: Correlation coefficient measuring linear relationship strength
  • r²: Coefficient of determination (proportion of variance explained)
  • Interpretation: Qualitative description of correlation strength and direction
  • Hypothesis test (if --alpha provided): t-statistic, p-value, significance result
  • Confidence interval (if --alpha provided): CI for population correlation ρ using Fisher Z-transformation
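
Pearson's r itself needs nothing beyond sums of squared deviations. A pure-Python sketch of the definition (the package additionally provides the inference pieces listed above):

```python
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient via the sums-of-squares form."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4, 5], [2.1, 3.8, 6.2, 7.9, 10.1])  # close to 1: strong positive
```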

spearman — Spearman Rank Correlation Coefficient

Computes the Spearman rank correlation coefficient (ρ) to measure monotonic relationships between two variables. Unlike Pearson, Spearman evaluates correlation based on ranked data, making it robust to outliers and suitable for ordinal data or non-linear but monotonic relationships. Values range from -1 to 1, with the same interpretation as Pearson correlation.

Implementation: Uses scipy for numerically robust t-distribution CDF and inverse normal CDF calculations.

# Compute rank correlation from command-line values
spearman --x 1,2,3,4,10 --y 2,3,5,6,20

# Load data from CSV file (e.g., survey Likert scales)
spearman --file survey.csv --x-col satisfaction --y-col loyalty

# Include hypothesis test at α=0.01 significance level
spearman --x 100,150,120,180,200 --y 5,3,4,2,1 --alpha 0.01

# One-tailed test with rank display for diagnostic inspection
spearman --x 1,2,3,4,5 --y 2,4,5,4,5 --alpha 0.05 --sided one --show-ranks

# Analyze ordinal data with tied values
spearman --x 1,2,2,3,4 --y 1,2,3,4,5 --show-ranks --precision 3

Options:

Flag Long form Description
--x Comma-separated x values (use with --y)
--y Comma-separated y values (required with --x)
--file CSV file path (use with --x-col and --y-col)
--x-col Column name for x values in CSV (required with --file)
--y-col Column name for y values in CSV (required with --file)
--alpha Significance level for hypothesis test and CI (e.g., 0.05)
--sided Hypothesis test type: one or two (default: two)
--show-ranks Display ranked data table for diagnostic inspection
-P --precision Decimal places for printed values (default: 6)

--x and --file are mutually exclusive; one is required.
When using --x, must also provide --y.
When using --file, must also provide --x-col and --y-col
Use --show-ranks to inspect how ties are handled (tied values receive average rank)

Output includes:

  • Spearman's ρ: Rank correlation coefficient measuring monotonic relationship strength
  • ρ²: Coefficient of determination for ranks
  • Interpretation: Qualitative description of correlation strength and direction
  • Rank table (if --show-ranks provided): Original values and their assigned ranks
  • Hypothesis test (if --alpha provided): t-statistic, p-value, significance result
  • Confidence interval (if --alpha provided): CI for population correlation ρ using Fisher Z-transformation

When to use Spearman vs. Pearson:

  • Spearman: Ordinal data, non-linear but monotonic relationships, presence of outliers, or when distribution assumptions are violated
  • Pearson: Continuous data with linear relationships and approximate normality
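
Conceptually, Spearman's ρ is just Pearson's r applied to ranks, with tied values receiving the average of the ranks they span. A pure-Python sketch of that definition (the package itself uses scipy for the inference steps):

```python
def rank_data(values: list[float]) -> list[float]:
    """Assign 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho = Pearson correlation of the ranks."""
    rx, ry = rank_data(x), rank_data(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

rho = spearman_rho([1, 2, 3, 4, 10], [2, 3, 5, 6, 20])  # 1.0: perfectly monotonic
```

Note how the outliers (10 and 20) do not reduce ρ at all, because only their ranks matter.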

sample — Sample Size Calculator

Calculates the minimum sample size needed for statistical studies. Supports proportion estimation within a margin of error, mean difference detection with specified power, and two-proportion comparisons. Includes power analysis sweeps to show achieved power across a range of sample sizes.

# Minimum n to estimate a proportion within ±3% at 95% confidence
sample --type proportion --prop 0.5 --margin 0.03

# Minimum n to detect a mean shift of 5 units (σ=12) with 80% power
sample --type mean --delta 5 --std 12 --power 0.80

# Two-proportion comparison: detect difference between 40% and 50%
sample --type comparison --p1 0.40 --p2 0.50 --alpha 0.05 --power 0.80

# Power analysis sweep: show achieved power for n = 50 to 300
sample --type mean --delta 5 --std 12 --sweep 50 300 --step 25

# One-sided test with 90% power
sample --type mean --delta 3 --std 8 --power 0.90 --sided one

Options:

Flag Long form Description
--type Calculation type: proportion, mean, or comparison (required)
--prop Expected proportion for proportion estimation (0 to 1)
--margin Desired margin of error for proportion (e.g., 0.03 for ±3%)
--std, --sigma Population standard deviation for mean calculations
--delta Minimum detectable effect size (mean difference)
--p1 Proportion in group 1 for comparison
--p2 Proportion in group 2 for comparison
--alpha Significance level (default: 0.05)
--power Statistical power (1-β) for mean/comparison (default: 0.80)
--sided Test type: one or two (default: two)
--sweep MIN MAX Show power across range of sample sizes
--step Step size for sweep (default: 10)
-P --precision Decimal places for printed values (default: 4)

Calculation types:

  • Proportion: Determines sample size to estimate a single proportion within a specified margin of error at a given confidence level
  • Mean: Determines sample size to detect a mean difference (effect size) with specified statistical power
  • Comparison: Determines sample size per group to detect a difference between two proportions with specified power
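
For the proportion case, the standard normal-approximation formula n = z²·p(1-p)/m² can be sketched with only the standard library (an illustration of the textbook formula; the package's exact method may differ):

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(p: float, margin: float, alpha: float = 0.05) -> int:
    """Minimum n to estimate a proportion within +/- margin at 1 - alpha confidence."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return ceil(z**2 * p * (1 - p) / margin**2)

n = sample_size_proportion(0.5, 0.03)  # 1068
```

p = 0.5 is the conservative choice: it maximizes p(1-p), so the resulting n covers any true proportion.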

linreg — Linear Regression (OLS)

Performs simple linear regression using ordinary least squares (OLS) to fit a line y = slope * x + intercept. Provides comprehensive statistical inference including coefficient tests, model fit statistics, and predictions with confidence and prediction intervals.

# Fit a line to comma-separated x, y values
linreg --x 1,2,3,4,5 --y 2.1,3.9,6.2,7.8,10.1

# Load data from CSV file
linreg --file data.csv --x-col height --y-col weight

# With prediction at x=6
linreg --x 1,2,3,4,5 --y 2,4,6,8,10 --predict 6

# With 90% confidence intervals (α=0.10)
linreg --x 10,20,30,40,50 --y 15,28,41,55,68 --alpha 0.10 --predict 60

# Custom precision
linreg --x 1,2,3,4 --y 2.1,4.3,5.8,8.2 --precision 4

Options:

Flag Long form Description
--x Comma-separated x values (use with --y)
--y Comma-separated y values (required with --x)
--file CSV file path (use with --x-col and --y-col)
--x-col Column name for x values in CSV (required with --file)
--y-col Column name for y values in CSV (required with --file)
--predict x value for prediction with confidence/prediction intervals
--alpha Significance level for confidence intervals (default: 0.05)
-P --precision Decimal places for printed values (default: 6)

--x and --file are mutually exclusive; one is required
When using --x, must also provide --y
When using --file, must also provide --x-col and --y-col

Output includes:

  • Model equation: Fitted line y = slope * x + intercept
  • Coefficients: Slope and intercept with standard errors, t-statistics, p-values, and confidence intervals
  • Model fit: R² (coefficient of determination), residual standard error, F-statistic, overall model significance
  • Predictions (if --predict specified):
    • Point estimate at the specified x value
    • Confidence interval for the mean response (where we expect the average y)
    • Prediction interval for an individual observation (wider, accounts for individual variability)

Statistical notes:

  • R²: Proportion of variance in y explained by x (0 to 1; higher is better)
  • Confidence interval: Uncertainty in estimating the mean response
  • Prediction interval: Uncertainty for a single new observation (always wider than confidence interval)
  • t-tests: Test whether each coefficient is significantly different from zero
  • F-statistic: Tests whether the overall model is significant (better than just predicting the mean)
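
The OLS point estimates themselves come from two sums of products. A minimal sketch of the closed-form solution (the package adds the inference layer described above):

```python
def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares: returns (slope, intercept) for y = slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = ols_fit([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # (2.0, 0.0)
```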

simulate — Monte Carlo Probability Simulator

Runs repeated random experiments to estimate probabilities empirically, with optional confidence intervals and analytical comparison against binom, birthday, poisson, and streak.

Implementation: Uses numpy for efficient random number generation and scipy for statistical functions (Wilson confidence intervals).

# Estimate P(X >= 5) for Binomial(n=10, p=0.4) over 100,000 trials
simulate --experiment binom --params n=10 k=5 p=0.4 --trials 100000

# Birthday collision probability for a group of 23 with a 95% confidence interval
simulate --experiment birthday --params pool=365 group=23 --confidence

# Streak probability: P(run of 5+ successes in 100 trials, p=0.5)
simulate --experiment streak --params n=100 k=5 p=0.5 --trials 50000

# Poisson: P(X >= 7) for λ=3.0 with a fixed seed
simulate --experiment poisson --params lam=3.0 k=7 --seed 42

# Auto-size trial count to achieve a target standard error of 0.005
simulate --experiment binom --params n=20 k=8 p=0.5 --scale 0.005

Options:

Flag Long form Description
-e --experiment Experiment type: binom, birthday, streak, or poisson (required)
-p --params Space-separated KEY=VALUE experiment parameters (see below)
-t --trials Number of simulation trials (default: 10,000)
--scale Target standard error; auto-computes --trials (overrides -t)
-s --seed Random seed for reproducibility
-c --confidence Print 95% Wilson confidence interval
--dump Output per-trial results as CSV instead of summary
-f --format Summary output format: table (default) or json
-P --precision Decimal places for printed probabilities (default: 6)

Required params by experiment:

Experiment Required params
binom n=INT k=INT p=FLOAT
birthday pool=INT group=INT
streak n=INT k=INT p=FLOAT
poisson lam=FLOAT k=INT
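
The core idea is simple enough to sketch with the standard library alone (the package uses numpy/scipy for speed and Wilson intervals; this portable illustration uses `random` and the normal-approximation standard error instead):

```python
import random
from math import sqrt

def simulate_binomial_tail(n: int, k: int, p: float, trials: int, seed: int = 42) -> float:
    """Empirically estimate P(X >= k) for X ~ Binomial(n, p)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < p for _ in range(n))
        if successes >= k:
            hits += 1
    return hits / trials

p_hat = simulate_binomial_tail(10, 5, 0.4, trials=20_000)  # analytic value is ~0.3669
se = sqrt(p_hat * (1 - p_hat) / 20_000)                    # standard error of the estimate
```

Halving the standard error requires quadrupling the trial count, which is what the --scale option automates.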

Python Library

Binomial Distribution

from src.utils.binomial_distribution import binomial_pmf, binomial_cdf_le, binomial_cdf_ge

# P(X = 3) for Binomial(n=10, p=0.4)
pmf = binomial_pmf(10, 3, 0.4)

# P(X <= 3) for Binomial(n=10, p=0.4)
cdf = binomial_cdf_le(10, 3, 0.4)

# P(X >= 3) for Binomial(n=10, p=0.4)
survival = binomial_cdf_ge(10, 3, 0.4)

Bayes' Theorem

from src.utils.bayes_theorem import bayes_posterior, evidence_from_false_positive

# Derive P(B) from a prior, likelihood, and false-positive rate
evidence = evidence_from_false_positive(0.01, 0.99, 0.05)

# Posterior P(A|B)
posterior = bayes_posterior(0.01, 0.99, evidence)

Birthday Problem

from src.utils.birthday_problem import (
    collision_prob_uniform,
    collision_prob_nonuniform,
    min_group_for_prob,
    expected_duplicate_pairs,
)

# P(duplicate) for 23 people in a pool of 365.25
prob = collision_prob_uniform(23, 365.25)

# Minimum group size to reach 50% collision probability
n = min_group_for_prob(0.50, 365.25)

# P(duplicate) with a non-uniform pool
prob_nu = collision_prob_nonuniform(30, [0.10, 0.15, 0.20, 0.30, 0.25])

# Expected number of duplicate pairs
pairs = expected_duplicate_pairs(23, 365.25)

Normal Distribution

from src.utils.normal_gaussian import (
    normal_pdf,
    normal_cdf,
    normal_ppf,
    normal_prob_between,
)

# PDF value at x=1.96 for the standard normal
pdf = normal_pdf(1.96, mu=0.0, sigma=1.0)

# P(X ≤ 1.96)
cdf = normal_cdf(1.96, mu=0.0, sigma=1.0)

# P(X ≥ 1.96)
survival = 1.0 - normal_cdf(1.96, mu=0.0, sigma=1.0)

# P(−1.96 ≤ X ≤ 1.96)
prob = normal_prob_between(-1.96, 1.96, mu=0.0, sigma=1.0)

# Find x such that P(X ≤ x) = 0.975 (inverse CDF)
x = normal_ppf(0.975, mu=0.0, sigma=1.0)

Expected Value

from src.utils.expected_value import (
    expected_value,
    variance,
    std_dev,
    entropy,
    mgf,
    load_file,
)

outcomes = [0, 1, 5, 10]
probs    = [0.50, 0.25, 0.15, 0.10]

# E[X]
ev = expected_value(outcomes, probs)

# Var(X) and SD(X)
var = variance(outcomes, probs)
sd  = std_dev(outcomes, probs)

# Shannon entropy (bits)
H = entropy(probs)

# Moment generating function M_X(t) at t=0.5
M = mgf(outcomes, probs, t=0.5)

# Load a distribution from a CSV or JSON file
outcomes, probs = load_file("payouts.csv")

Poisson Distribution

from src.utils.poisson_distribution import (
    poisson_pmf,
    poisson_cdf_le,
    poisson_cdf_ge,
    min_k_for_prob,
)

# P(X = 7) for Poisson(λ=3.0)
pmf = poisson_pmf(7, 3.0)

# P(X ≤ 7) for Poisson(λ=3.0)
cdf = poisson_cdf_le(7, 3.0)

# P(X ≥ 7) for Poisson(λ=3.0)
survival = poisson_cdf_ge(7, 3.0)

# Minimum k such that P(X ≤ k) >= 0.95
k = min_k_for_prob(0.95, 3.0)

Prime Numbers

from src.utils.prime_numbers import (
    is_prime,
    nth_prime,
    count_primes,
    primes_in_range,
    prime_factorization,
    sieve_of_eratosthenes,
    format_factorization,
)

# Check if a number is prime
if is_prime(97):
    print("97 is prime")

# Find the 100th prime number
p = nth_prime(100)  # Returns 541

# Count primes up to 1000 (π function)
count = count_primes(1000)  # Returns 168

# Get all primes in a range
primes = primes_in_range(50, 100)  # [53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

# Generate all primes up to a limit using Sieve of Eratosthenes
all_primes = sieve_of_eratosthenes(100)  # [2, 3, 5, 7, 11, ..., 97]

# Prime factorization
factors = prime_factorization(360)  # {2: 3, 3: 2, 5: 1} → 2³ × 3² × 5
formatted = format_factorization(factors)  # "2³ × 3² × 5"

Streak Probability

from src.utils.streak_probability import (
    prob_at_least_one_streak,
    expected_longest_streak,
)

# P(at least one run of 5 consecutive heads in 100 fair coin flips)
p = prob_at_least_one_streak(100, 5, 0.5)

# Expected length of the longest run of successes in 162 trials at .300
e = expected_longest_streak(162, 0.300)

Pythagorean Record

from src.utils.pythagorean_record import (
    pythagorean_expectation,
    linear_expectation,
    expected_wins,
)

# Traditional Pythagorean formula: expected win % for 800 RS, 650 RA
win_pct = pythagorean_expectation(800, 650, exponent=2.0)

# Linear formula (SABR 2014): expected win % for MLB
win_pct = linear_expectation(800, 650, sport="mlb")

# Linear formula for NFL
win_pct = linear_expectation(420, 300, sport="nfl")

# Linear formula for NBA
win_pct = linear_expectation(8500, 8200, sport="nba")

# Convert win percentage to expected wins
wins = expected_wins(win_pct, games=162)

Pearson Correlation

from src.utils.pearson_correlation import (
    pearson_r,
    correlation_t_statistic,
    correlation_p_value,
    correlation_confidence_interval,
)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.8, 6.2, 7.9, 10.1]

# Compute Pearson's r
r = pearson_r(x, y)

# r² (coefficient of determination)
r_squared = r ** 2

# t-statistic for testing H₀: ρ = 0
t_stat = correlation_t_statistic(r, n=len(x))

# p-value for two-tailed test
p_value = correlation_p_value(r, n=len(x), sided="two")

# 95% confidence interval for population correlation ρ
ci_lower, ci_upper = correlation_confidence_interval(r, n=len(x), alpha=0.05)

Spearman Correlation

from src.utils.spearman_correlation import (
    spearman_rho,
    rank_data,
    correlation_t_statistic,
    correlation_p_value,
    correlation_confidence_interval,
)

x = [1, 2, 3, 4, 10]
y = [2, 3, 5, 6, 20]

# Compute Spearman's ρ (rank correlation)
rho = spearman_rho(x, y)

# ρ² (coefficient of determination for ranks)
rho_squared = rho ** 2

# Get ranks for inspection (handles ties by averaging)
rank_x = rank_data(x)
rank_y = rank_data(y)

# t-statistic for testing H₀: ρ = 0
t_stat = correlation_t_statistic(rho, n=len(x))

# p-value for two-tailed test
p_value = correlation_p_value(rho, n=len(x), sided="two")

# 99% confidence interval for population correlation ρ
ci_lower, ci_upper = correlation_confidence_interval(rho, n=len(x), alpha=0.01)

Sample Size Calculator

from src.utils.sample_size import (
    sample_size_proportion,
    sample_size_mean,
    sample_size_comparison,
    achieved_power_mean,
    achieved_power_comparison,
)

# Minimum sample size to estimate a proportion within ±3% at 95% confidence
n = sample_size_proportion(p=0.5, margin=0.03, alpha=0.05)

# Minimum sample size to detect a mean difference of 5 with σ=12 at 80% power
n = sample_size_mean(sigma=12, delta=5, alpha=0.05, power=0.80, sided="two")

# Sample sizes for two-proportion comparison (detect 0.40 vs 0.50 with 80% power)
n1, n2 = sample_size_comparison(p1=0.40, p2=0.50, alpha=0.05, power=0.80, sided="two")

# Achieved power for a given sample size
power = achieved_power_mean(n=100, sigma=12, delta=5, alpha=0.05, sided="two")

# Achieved power for two-proportion comparison
power = achieved_power_comparison(n_per_group=150, p1=0.40, p2=0.50, alpha=0.05, sided="two")

Linear Regression

from src.utils.linear_regression import (
    linear_regression,
    predict,
    mean,
)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Perform linear regression
model = linear_regression(x, y)

# Access model attributes
print(f"Slope: {model.slope}")
print(f"Intercept: {model.intercept}")
print(f"R²: {model.r_squared}")
print(f"Residual SE: {model.residual_std_error}")
print(f"t-statistic (slope): {model.t_slope}")

# Make a prediction with confidence and prediction intervals
x_new = 6.0
prediction, conf_lower, conf_upper, pred_lower, pred_upper = predict(
    model, x_new, alpha=0.05
)

print(f"Prediction at x={x_new}: {prediction}")
print(f"95% Confidence interval: [{conf_lower}, {conf_upper}]")
print(f"95% Prediction interval: [{pred_lower}, {pred_upper}]")

Monte Carlo Simulator

from src.utils.monte_carlo import (
    simulate_binomial,
    simulate_birthday,
    simulate_streak,
    simulate_poisson,
    wilson_ci,
    standard_error,
)

# Simulate P(X >= 5) for Binomial(10, 0.4) over 100,000 trials
results = simulate_binomial(n=10, k=5, p=0.4, trials=100_000, seed=42)
p_hat = sum(results) / len(results)
se = standard_error(p_hat, len(results))
ci = wilson_ci(p_hat, len(results))

Development

Clone the repository and install in editable mode:

git clone https://github.com/ncarsner/pythodds.git
cd pythodds
pip install -e .

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Nicholas Carsner
