
Library and utility module for Bayesian reasoning

Project description

bayesian is a small Python utility for reasoning about probabilities. It extracts features from examples, applies Bayesian belief updates, and returns the resulting likelihoods. You can either use the high-level functions to classify instances with supervised learning, or update beliefs manually with the Bayes class.
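
Each belief update is an application of Bayes' rule: multiply the current probability of every hypothesis by the likelihood of the evidence under that hypothesis, then renormalize so the results sum to 1. The sketch below shows that single step in plain Python, using the cancer-test numbers from the Low Level examples. The `bayes_update` helper is hypothetical and is not the package's implementation.

```python
def bayes_update(priors, likelihoods):
    """Return posteriors for priors and likelihoods keyed by hypothesis.

    Hypothetical helper for illustration; not part of the bayesian package.
    Neither input has to be normalized, since the result is rescaled.
    """
    # Unnormalized posterior: prior * likelihood for each hypothesis.
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    # Divide by the total so the posteriors sum to 1.
    return {h: p / total for h, p in joint.items()}


# Classic cancer test: 1% prior, 80% true positives, 9.6% false positives.
posterior = bayes_update({'not cancer': 0.99, 'cancer': 0.01},
                         {'not cancer': 9.6, 'cancer': 80})
print(posterior)
```

Even after a positive test, the cancer probability comes out at only about 7.8%, because the 1% prior is so low.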

To simply classify files and move them into the most fitting folder, run this program from the command line, passing the root folder path as a parameter.

High Level

from bayesian import classify, classify_file, classify_folder, classify_normal

spams = ["buy viagra", "dear recipient", "meet sexy singles"] # etc
genuines = ["let's meet tomorrow", "remember to buy milk"]
message = "remember the meeting tomorrow"
# Classified as "genuine" because of the words "remember" and "tomorrow".
print(classify(message, {'spam': spams, 'genuine': genuines}))

# Decides whether a person with these measurements is male or female.
print(classify_normal({'height': 6, 'weight': 130, 'foot size': 8},
                      {'male': [{'height': 6, 'weight': 180, 'foot size': 12},
                                {'height': 5.92, 'weight': 190, 'foot size': 11},
                                {'height': 5.58, 'weight': 170, 'foot size': 12},
                                {'height': 5.92, 'weight': 165, 'foot size': 10}],
                       'female': [{'height': 5, 'weight': 100, 'foot size': 6},
                                  {'height': 5.5, 'weight': 150, 'foot size': 8},
                                  {'height': 5.42, 'weight': 130, 'foot size': 7},
                                  {'height': 5.75, 'weight': 150, 'foot size': 9}]}))

# Classifies "unknown_file" as either a Python or a Java file, assuming
# you have directories with examples of each language.
print(classify_file("unknown_file", ["java_files", "python_files"]))

# Classifies every file under "folder" as either a Python or a Java file,
# assuming you have subdirectories with examples of each language.
print(classify_folder("folder"))
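
`classify` and `classify_normal` are supervised Bayesian classifiers. The sketch below shows one common way to build each idea from scratch with the same training data as above: a multinomial naive Bayes over `str.split` words with add-one (Laplace) smoothing, and a Gaussian naive Bayes over numeric features. The helpers `classify_words` and `classify_gaussian`, the smoothing, and the use of sample variance are assumptions for illustration; the package may score instances differently.

```python
import math
import statistics


def classify_words(text, classes_instances):
    """Multinomial naive Bayes over str.split words (hypothetical helper)."""
    vocabulary = {w for docs in classes_instances.values()
                  for doc in docs for w in doc.split()}
    n_docs = sum(len(docs) for docs in classes_instances.values())
    scores = {}
    for label, docs in classes_instances.items():
        words = [w for doc in docs for w in doc.split()]
        # Log prior: share of training documents that belong to this class.
        score = math.log(len(docs) / n_docs)
        for w in text.split():
            # Add-one smoothing so an unseen word never zeroes the score.
            score += math.log((words.count(w) + 1)
                              / (len(words) + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


def classify_gaussian(instance, classes_instances):
    """Gaussian naive Bayes over numeric features (hypothetical helper)."""
    n_rows = sum(len(rows) for rows in classes_instances.values())
    scores = {}
    for label, rows in classes_instances.items():
        # Log prior: share of training rows that belong to this class.
        score = math.log(len(rows) / n_rows)
        for feature, x in instance.items():
            samples = [row[feature] for row in rows]
            mean = statistics.mean(samples)
            var = statistics.variance(samples)  # sample variance (n - 1)
            # Log of the normal density, to avoid multiplying tiny numbers.
            score += (-(x - mean) ** 2 / (2 * var)
                      - 0.5 * math.log(2 * math.pi * var))
        scores[label] = score
    return max(scores, key=scores.get)


spams = ["buy viagra", "dear recipient", "meet sexy singles"]
genuines = ["let's meet tomorrow", "remember to buy milk"]
print(classify_words("remember the meeting tomorrow",
                     {'spam': spams, 'genuine': genuines}))  # genuine

people = {'male': [{'height': 6, 'weight': 180, 'foot size': 12},
                   {'height': 5.92, 'weight': 190, 'foot size': 11},
                   {'height': 5.58, 'weight': 170, 'foot size': 12},
                   {'height': 5.92, 'weight': 165, 'foot size': 10}],
          'female': [{'height': 5, 'weight': 100, 'foot size': 6},
                     {'height': 5.5, 'weight': 150, 'foot size': 8},
                     {'height': 5.42, 'weight': 130, 'foot size': 7},
                     {'height': 5.75, 'weight': 150, 'foot size': 9}]}
print(classify_gaussian({'height': 6, 'weight': 130, 'foot size': 8},
                        people))  # female
```

Both helpers add log-probabilities instead of multiplying raw probabilities, which avoids floating-point underflow when there are many features.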

Low Level

from bayesian import Bayes

print(' -- Spam Filter --')
# Database with the number of sightings of each word in (genuine, spam)
# emails.
words_odds = {'buy': (5, 100), 'viagra': (1, 1000), 'meeting': (15, 2)}
# Emails to be analyzed.
emails = [
          "let's schedule a meeting for tomorrow", # 100% genuine (meeting)
          "buy some viagra", # 100% spam (buy, viagra)
          "buy coffee for the meeting", # 'buy' vs. 'meeting', should be genuine
         ]

for email in emails:
    # Start with priors of a 90% chance of being genuine and 10% spam.
    # Probabilities are normalized automatically.
    b = Bayes([('genuine', 90), ('spam', 10)])
    # Update the probabilities, using the words in the email as events and
    # the database of odds to compute the change.
    b.update_from_events(email.split(), words_odds)
    # Print the start of the email and whether it is likely spam.
    print(email[:15] + '...', b.most_likely())

print()

print(' -- Spam Filter With Email Corpus -- ')

# Email corpus: two hundred spam emails selling products, plus one spam
# email with the word "meeting". Genuine emails are about meetings and
# buying milk.
instances = {'spam': ["buy viagra", "buy cialis"] * 100 + ["meeting love"],
             'genuine': ["meeting tomorrow", "buy milk"] * 100}

# Use str.split to extract features/events/words from the corpus and build
# the model.
model = Bayes.extract_events_odds(instances, str.split)
# Create a new Bayes instance with priors of 90% spam and 10% genuine.
b = Bayes({'spam': .9, 'genuine': .1})
# Update beliefs with features/events/words from an email.
b.update_from_events("buy coffee for meeting".split(), model)
# Print the email and whether it is likely spam.
print("'buy coffee for meeting'", ':', b)

print()

print(' -- Classic Cancer Test Problem --')
# 1% chance of having cancer.
b = Bayes([('not cancer', 0.99), ('cancer', 0.01)])
# Positive test result: 9.6% false-positive rate, 80% true-positive rate.
b.update((9.6, 80))
print(b)
print('Most likely:', b.most_likely())

print()

print(' -- Are You Cheating? -- ')
results = ['heads', 'heads', 'tails', 'heads', 'heads']
events_odds = {'heads': {'honest': .5, 'cheating': .9},
               'tails': {'honest': .5, 'cheating': .1}}
b = Bayes({'cheating': .5, 'honest': .5})
b.update_from_events(results, events_odds)
print(b)


# Helper that returns a fresh Bayes instance with the cancer-test priors.
def b():
    return Bayes((0.99, 0.01), labels=['not cancer', 'cancer'])

# Equivalent ways to apply the same positive test; all give the same result.
b() * (9.6, 80)
(b() * (9.6, 80)).opposite().opposite()
b().update({'not cancer': 9.6, 'cancer': 80})
b().update((9.6, 80))
b().update_from_events(['pos'], {'pos': (9.6, 80)})
b().update_from_tests([True], [(9.6, 80)])
Bayes([('not cancer', 0.99), ('cancer', 0.01)]) * (9.6, 80)
Bayes({'not cancer': 0.99, 'cancer': 0.01}) * {'not cancer': 9.6,
                                               'cancer': 80}
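
The "Are You Cheating?" example applies one Bayes update per observed event. The sketch below reproduces it in plain Python so the numbers can be checked by hand. `apply_events` is a hypothetical stand-in, not the `Bayes.update_from_events` method, and skipping events that have no entry in the model is an assumption made for illustration.

```python
def apply_events(priors, events, events_odds):
    """Apply one Bayes update per event (hypothetical stand-in).

    Events without an entry in events_odds are skipped (an assumption).
    """
    beliefs = dict(priors)
    for event in events:
        if event not in events_odds:
            continue
        odds = events_odds[event]
        # Multiply each hypothesis by the event's likelihood under it...
        beliefs = {h: p * odds[h] for h, p in beliefs.items()}
        # ...then renormalize so the beliefs sum to 1 again.
        total = sum(beliefs.values())
        beliefs = {h: p / total for h, p in beliefs.items()}
    return beliefs


results = ['heads', 'heads', 'tails', 'heads', 'heads']
events_odds = {'heads': {'honest': .5, 'cheating': .9},
               'tails': {'honest': .5, 'cheating': .1}}
posterior = apply_events({'cheating': .5, 'honest': .5}, results, events_odds)
print(posterior)  # cheating ≈ 0.677, honest ≈ 0.323
```

Because each update is a multiplication followed by normalization, the order of the events does not change the final posterior: four heads and one tail leave about a 68% belief in cheating regardless of sequence.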

Project details

License:

MIT

Code:

https://github.com/boppreh/bayesian/

PyPI:

https://pypi.python.org/pypi/Bayesian

Issue tracker:

https://github.com/boppreh/bayesian/issues



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Bayesian-0.3.3.tar.gz (9.5 kB)

Uploaded Source

Built Distribution

Bayesian-0.3.3-py3-none-any.whl (10.8 kB)

Uploaded Python 3

File details

Details for the file Bayesian-0.3.3.tar.gz.

File metadata

  • Download URL: Bayesian-0.3.3.tar.gz
  • Upload date:
  • Size: 9.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for Bayesian-0.3.3.tar.gz
Algorithm     Hash digest
SHA256        866444ab2789ab76be5eac1a2317b010355578fb70915dac52f9945a325d2fe3
MD5           108e92a2a2a600c8423bda6d27828be3
BLAKE2b-256   623a37a7e7b06cf53883dae2b3a6e4892bc47ce7d7e8e1ee46ad45b3a02be29d


File details

Details for the file Bayesian-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: Bayesian-0.3.3-py3-none-any.whl
  • Upload date:
  • Size: 10.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for Bayesian-0.3.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        67ba8042fed466da0b2b16f52c786f88d0c73dd07bd5bec91bcf68490a9ceb65
MD5           0c5dbff6dd9a0c66e76c5265fe9bc3b4
BLAKE2b-256   0d7dcb30b21df62f0c3348ddefdb3804a16b45509e30190e45c49dba93209928

