Skip to main content

Squiggle programming language for intuitive probabilistic estimation features in Python

Project description

Squigglepy: Implementation of Squiggle in Python

Squiggle is a "simple programming language for intuitive probabilistic estimation". It serves as its own standalone programming language with its own syntax, but it is implemented in JavaScript. I like the features of Squiggle and intend to use it frequently, but I also sometimes want to use similar functionalities in Python, especially alongside other Python statistical programming packages like Numpy, Pandas, and Matplotlib. The squigglepy package here implements many Squiggle-like functionalities in Python.

Installation

pip3 install squigglepy

Usage

Core Features

Here's the Squigglepy implementation of the example from Squiggle Docs:

import squigglepy as sq
from squigglepy.numbers import K, M

pop_of_ny_2022 = sq.to(8.1*M, 8.4*M) # This means that you're 90% confident the value is between 8.1 and 8.4 Million.

def pct_of_pop_w_pianos():
    percentage = sq.to(.2, 1)
    return sq.sample(percentage) * 0.01 # We assume there are almost no people with multiple pianos

def piano_tuners_per_piano():
    pianos_per_piano_tuner = sq.to(2*K, 50*K)
    return 1 / sq.sample(pianos_per_piano_tuner)

def total_tuners_in_2022():
    return (sq.sample(pop_of_ny_2022) *
            pct_of_pop_w_pianos() *
            piano_tuners_per_piano())

sq.get_percentiles(sq.sample(total_tuners_in_2022, n=1000))

And the version from the Squiggle doc that incorporates time:

import squigglepy as sq
from squigglepy.numbers import K, M

pop_of_ny_2022 = sq.to(8.1*M, 8.4*M)

def pct_of_pop_w_pianos():
    percentage = sq.to(.2, 1)
    return sq.sample(percentage) * 0.01

def pct_of_pop_w_pianos():
    percentage = sq.to(.2, 1)
    return sq.sample(percentage) * 0.01

def piano_tuners_per_piano():
    pianos_per_piano_tuner = sq.to(2*K, 50*K)
    return 1 / sq.sample(pianos_per_piano_tuner)

# Time in years after 2022
def pop_at_time(t):
    avg_yearly_pct_change = sq.to(-0.01, 0.05) # We're expecting NYC to continuously grow with an mean of roughly between -1% and +4% per year
    return sq.sample(pop_of_ny_2022) * ((sq.sample(avg_yearly_pct_change) + 1) ** t)

def total_tuners_at_time(t):
    return (pop_at_time(t) *
            pct_of_pop_w_pianos() *
            piano_tuners_per_piano())

# Get total piano tuners at 2030
sq.get_percentiles(sq.sample(lambda: total_tuners_at_time(2030-2022), n=1000))

WARNING: Be careful about dividing by K, M, etc. 1/2*K = 500 in Python. Use 1/(2*K) instead to get the expected outcome.

Additional Features

import squigglepy as sq

# Normal distribution
sq.sample(sq.norm(1, 3))  # 90% interval from 1 to 3

# Distribution can be sampled with mean and sd too
sq.sample(sq.norm(mean=0, sd=1))
sq.sample(sq.norm(-1.67, 1.67))  # This is equivalent to mean=0, sd=1

# Get more than one sample
sq.sample(sq.norm(1, 3), n=100)

# Other distributions exist
sq.sample(sq.lognorm(1, 10))
sq.sample(sq.tdist(1, 10, t=5))
sq.sample(sq.triangular(1, 2, 3))
sq.sample(sq.binomial(p=0.5, n=5))
sq.sample(sq.beta(a=1, b=2))
sq.sample(sq.bernoulli(p=0.5))
sq.sample(sq.poisson(10))
sq.sample(sq.gamma(3, 2))
sq.sample(sq.exponential(scale=1))

# Discrete sampling
sq.sample(sq.discrete({'A': 0.1, 'B': 0.9}))

# Can return integers
sq.sample(sq.discrete({0: 0.1, 1: 0.3, 2: 0.3, 3: 0.15, 4: 0.15}))

# Alternate format (also can be used to return more complex objects)
sq.sample(sq.discrete([[0.1,  0],
                       [0.3,  1],
                       [0.3,  2],
                       [0.15, 3],
                       [0.15, 4]]))

sq.sample(sq.discrete([0, 1, 2])) # No weights assumes equal weights

# You can mix distributions together
sq.sample(sq.mixture([sq.norm(1, 3),
                      sq.norm(4, 10),
                      sq.lognorm(1, 10)],  # Distributions to mix
                     [0.3, 0.3, 0.4]))     # These are the weights on each distribution

# This is equivalent to the above, just a different way of doing the notation
sq.sample(sq.mixture([[0.3, sq.norm(1,3)],
                      [0.3, sq.norm(4,10)],
                      [0.4, sq.lognorm(1,10)]]))

# You can add and subtract distributions (a little less cool compared to native Squiggle unfortunately):
sq.sample(lambda: sq.sample(sq.norm(1,3)) + sq.sample(sq.norm(4,5)), n=100)
sq.sample(lambda: sq.sample(sq.norm(1,3)) - sq.sample(sq.norm(4,5)), n=100)
sq.sample(lambda: sq.sample(sq.norm(1,3)) * sq.sample(sq.norm(4,5)), n=100)
sq.sample(lambda: sq.sample(sq.norm(1,3)) / sq.sample(sq.norm(4,5)), n=100)

# You can change the CI from 90% (default) to 80%
sq.sample(sq.norm(1, 3, credibility=0.8))

# You can clip
sq.sample(sq.norm(0, 3, lclip=0, rclip=5)) # Sample norm with a 90% CI from 0-3, but anything lower than 0 gets clipped to 0 and anything higher than 5 gets clipped to 5.

# You can specify a constant (which can be useful for passing things into functions or mixtures)
sq.sample(sq.const(4)) # Always returns 4

Rolling a Die

An example of how to use distributions to build tools:

import squigglepy as sq

def roll_die(sides, n=1):
    return sq.sample(sq.discrete(list(range(1, sides + 1))), n=n) if sides > 0 else None

roll_die(sides=6, n=10)
# [2, 6, 5, 2, 6, 2, 3, 1, 5, 2]

This is already included standard in the utils of this package. Use sq.roll_die.

Bayesian inference

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies.

A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

We can approximate the answer with a Bayesian network (uses rejection sampling):

import squigglepy as sq
from squigglepy import bayes
from squigglepy.numbers import M

def mammography(has_cancer):
    p = 0.8 if has_cancer else 0.096
    return bool(sq.sample(sq.bernoulli(p)))

def define_event():
    cancer = sq.sample(sq.bernoulli(0.01))    
    return({'mammography': mammography(cancer),
            'cancer': cancer})

bayes.bayesnet(define_event,
               find=lambda e: e['cancer'],
               conditional_on=lambda e: e['mammography'],
               n=1*M)
# 0.07723995880535531

Or if we have the information immediately on hand, we can directly calculate it. Though this doesn't work for very complex stuff.

from squigglepy import bayes
bayes.simple_bayes(prior=0.01, likelihood_h=0.8, likelihood_not_h=0.096)
# 0.07763975155279504

You can also make distributions and update them:

import matplotlib.pyplot as plt
import squigglepy as sq
from squigglepy import bayes
from squigglepy.numbers import K

print('Prior')
prior = sq.norm(1,5)
prior_samples = sq.sample(prior, n=10*K)
plt.hist(prior_samples, bins = 200)
plt.show()
print(sq.get_percentiles(prior_samples))
print('Prior Mean: {} SD: {}'.format(np.mean(prior_samples), np.std(prior_samples)))
print('-')

print('Evidence')
evidence = sq.norm(2,3)
evidence_samples = sq.sample(evidence, n=10*K)
plt.hist(evidence_samples, bins = 200)
plt.show()
print(sq.get_percentiles(evidence_samples))
print('Evidence Mean: {} SD: {}'.format(np.mean(evidence_samples), np.std(evidence_samples)))
print('-')

print('Posterior')
posterior = bayes.update(prior_samples, evidence_samples)
posterior_samples = sq.sample(posterior, n=10*K)
plt.hist(posterior_samples, bins = 200)
plt.show()
print(sq.get_percentiles(posterior_samples))
print('Posterior Mean: {} SD: {}'.format(np.mean(posterior_samples), np.std(posterior_samples)))

print('Average')
average = bayes.average(prior, evidence)
average_samples = sq.sample(average, n=10*K)
plt.hist(average_samples, bins = 200)
plt.show()
print(sq.get_percentiles(average_samples))
print('Average Mean: {} SD: {}'.format(np.mean(average_samples), np.std(average_samples)))

Alarm net

This is the alarm network from Bayesian Artificial Intelligence - Section 2.5.1:

Assume your house has an alarm system against burglary.

You live in the seismically active area and the alarm system can get occasionally set off by an earthquake.

You have two neighbors, Mary and John, who do not know each other. If they hear the alarm they call you, but this is not guaranteed.

The chance of a burglary on a particular day is 0.1%. The chance of an earthquake on a particular day is 0.2%.

The alarm will go off 95% of the time with both a burglary and an earthquake, 94% of the time with just a burglary, 29% of the time with just an earthquake, and 0.1% of the time with nothing (total false alarm).

John will call you 90% of the time when the alarm goes off. But on 5% of the days, John will just call to say "hi". Mary will call you 70% of the time when the alarm goes off. But on 1% of the days, Mary will just call to say "hi".

import squigglepy as sq
from squigglepy import bayes
from squigglepy.numbers import M

def p_alarm_goes_off(burglary, earthquake):
    if burglary and earthquake:
        return 0.95
    elif burglary and not earthquake:
        return 0.94
    elif not burglary and earthquake:
        return 0.29
    elif not burglary and not earthquake:
        return 0.001

def p_john_calls(alarm_goes_off):
    return 0.9 if alarm_goes_off else 0.05
    
def p_mary_calls(alarm_goes_off):
    return 0.7 if alarm_goes_off else 0.01

def define_event():
    burglary_happens = bool(sq.sample(sq.bernoulli(p=0.001)))
    earthquake_happens = bool(sq.sample(sq.bernoulli(p=0.002)))
    alarm_goes_off = bool(sq.sample(sq.bernoulli(p_alarm_goes_off(burglary_happens, earthquake_happens))))
    john_calls = bool(sq.sample(sq.bernoulli(p_john_calls(alarm_goes_off))))
    mary_calls = bool(sq.sample(sq.bernoulli(p_mary_calls(alarm_goes_off))))
    return {'burglary': burglary_happens,
            'earthquake': earthquake_happens,
            'alarm_goes_off': alarm_goes_off,
            'john_calls': john_calls,
            'mary_calls': mary_calls}

# What are the chances that both John and Mary call if an earthquake happens?
bayes.bayesnet(define_event,
               n=1*M,
               find=lambda e: (e['mary_calls'] and e['john_calls']),
               conditional_on=lambda e: e['earthquake'])
# Result will be ~0.19, though it varies because it is based on a random sample.
# This also may take a minute to run.

# If both John and Mary call, what is the chance there's been a burglary?
bayes.bayesnet(define_event,
               n=1*M,
               find=lambda e: e['burglary'],
               conditional_on=lambda e: (e['mary_calls'] and e['john_calls']))
# Result will be ~0.27, though it varies because it is based on a random sample.
# This will run quickly because there is a built-in cache.
# Use `cache=False` to not build a cache and `reload_cache=True` to recalculate the cache.

Note that the amount of Bayesian analysis that squigglepy can do is pretty limited. For more complex bayesian analysis, consider sorobn, pomegranate, bnlearn, or pyMC.

A Demonstration of the Monte Hall Problem

import random
import squigglepy as sq
from squigglepy import bayes
from squigglepy.numbers import K, M, B, T


def monte_hall(door_picked, switch=False):
    doors = ['A', 'B', 'C']
    car_is_behind_door = random.choice(doors)
    reveal_door = random.choice([d for d in doors if d != door_picked and d != car_is_behind_door])
    
    if switch:
        old_door_picked = door_picked
        door_picked = [d for d in doors if d != old_door_picked and d != reveal_door][0]
        
    won_car = (car_is_behind_door == door_picked)
    return won_car 


def define_event():
    door = random.choice(['A', 'B', 'C'])
    switch = random.random() >= 0.5
    return {'won': monte_hall(door_picked=door, switch=switch),
            'switched': switch}

RUNS = 10*K
r = bayes.bayesnet(define_event,
                   find=lambda e: e['won'],
                   conditional_on=lambda e: e['switched'],
                   verbose=True,
                   n=RUNS)
print('Win {}% of the time when switching'.format(int(r * 100)))

r = bayes.bayesnet(define_event,
                   find=lambda e: e['won'],
                   conditional_on=lambda e: not e['switched'],
                   verbose=True,
                   n=RUNS)
print('Win {}% of the time when not switching'.format(int(r * 100)))

# Win 66% of the time when switching
# Win 34% of the time when not switching

More complex coin/dice interactions

Imagine that I flip a coin. If heads, I take a random die out of my blue bag. If tails, I take a random die out of my red bag. The blue bag contains only 6-sided dice. The red bag contains a 4-sided die, a 6-sided die, a 10-sided die, and a 20-sided die. I then roll the random die I took. What is the chance that I roll a 6?

import squigglepy as sq
from squigglepy.numbers import K, M, B, T
from squigglepy import bayes

def define_event():
    flip = sq.flip_coin()
    if flip == 'heads': # Blue bag
        dice_sides = 6
    else: # Red bag
        dice_sides = sq.sample(sq.discrete([4, 6, 10, 20]))
    return sq.roll_die(dice_sides)


bayes.bayesnet(define_event,
               find=lambda e: e == 6,
               verbose=True,
               n=100*K)
# This run for me returned 0.12306 which is pretty close to the correct answer of 0.12292

Run tests

rm -rf build && flake8 && pytest && pip3 install . && python3 tests/integration.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

squigglepy-0.8.tar.gz (22.7 kB view hashes)

Uploaded Source

Built Distribution

squigglepy-0.8-py3-none-any.whl (23.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page