A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.

BayesDB

BayesDB, a Bayesian database, lets users query the probable implications of their data as easily as a SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries.

BayesDB is suitable for analyzing complex, heterogeneous data tables with up to tens of thousands of rows and hundreds of variables. No preprocessing or parameter adjustment is required, though experts can override BayesDB’s default assumptions when appropriate.

BayesDB’s inferences are based in part on CrossCat, a new nonparametric Bayesian machine learning method that automatically estimates the full joint distribution behind arbitrary data tables.

Installation

Docker

BayesDB can be accessed via a community-contributed Docker container. Install instructions for Docker can be found here.

Once Docker has been installed and configured, enter the following command in your terminal to download and install the Docker container (this will take a few minutes):

docker pull bayesdb/bayesdb

To run:

docker run -t -i bayesdb/bayesdb /bin/bash

Local

BayesDB depends on CrossCat, so first install CrossCat by following its local installation instructions here.

BayesDB can be installed locally with:

git clone https://github.com/mit-probabilistic-computing-project/BayesDB.git
cd BayesDB
sudo python setup.py install

If you have trouble with matplotlib, try switching to a different backend. Open a Python prompt ($ python) and enter:

import matplotlib
matplotlib.matplotlib_fname()

Then edit the file at the path that was printed, changing the ‘backend’ setting to another of the available values until the matplotlib errors go away. Good ones to try are GTKAgg and Agg.
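The edit described above can also be scripted. The following is a minimal sketch, assuming the matplotlibrc file uses the usual `backend : NAME` line format; the helper name `set_backend` is hypothetical and not part of matplotlib or BayesDB.

```python
import re

# Matches an active (uncommented) matplotlibrc backend setting,
# e.g. "backend : GTKAgg" or "backend: TkAgg".
_BACKEND_LINE = re.compile(r"^\s*backend\s*:")


def set_backend(rc_text, backend):
    """Return matplotlibrc text with its active 'backend' setting changed.

    Hypothetical helper: the first active backend line is rewritten; if none
    exists (for example, it is commented out), a new line is appended.
    """
    lines = rc_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if not replaced and _BACKEND_LINE.match(line):
            lines[i] = "backend : " + backend
            replaced = True
    if not replaced:
        lines.append("backend : " + backend)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "backend : GTKAgg\nlines.linewidth : 1.0\n"
    print(set_backend(sample, "Agg"))
```

To apply it, read the file at the path returned by matplotlib.matplotlib_fname(), pass its contents through set_backend, and write the result back (keeping a backup copy first). For a single script, calling matplotlib.use("Agg") before importing pyplot selects a backend without editing the file.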

Documentation

Website

Documentation

Example

run_dha_example.py (github) is a basic example of analysis using BayesDB. For a first test, run the following from inside the top-level BayesDB directory:

python examples/dha/run_dha_example.py

License

Apache License, Version 2.0

Download files

Source Distribution

BayesDB-0.2.0.tar.gz (110.7 kB)

Uploaded Source

Built Distributions

BayesDB-0.2.0-py2.7.egg (283.1 kB)

Uploaded Source

BayesDB-0.2.0-py2-none-any.whl (131.9 kB)

Uploaded Python 2

File details

Details for the file BayesDB-0.2.0.tar.gz.

File metadata

  • Download URL: BayesDB-0.2.0.tar.gz
  • Size: 110.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for BayesDB-0.2.0.tar.gz
Algorithm Hash digest
SHA256 c72e92c29b52d2d9666c60650afb5cb19700ce6db72c93d58f79a64f38c6c524
MD5 7f7e238c61c209421edd0b02b3c85b97
BLAKE2b-256 cf036f9835ed644e53f79c0238f04d757c640407a87b94067caffdee6a5c10cb

See more details on using hashes here.
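A downloaded file can be checked against its published SHA256 digest with Python's standard hashlib module. This is a minimal sketch; the helper names `sha256_of_file` and `matches_published_hash` are illustrative and not part of BayesDB or any packaging tool.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("BayesDB-0.2.0.tar.gz", "c72e92c29b52d2d9666c60650afb5cb19700ce6db72c93d58f79a64f38c6c524") should return True for an intact copy of the source distribution, assuming the file sits in the current directory.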

File details

Details for the file BayesDB-0.2.0-py2.7.egg.

File metadata

  • Download URL: BayesDB-0.2.0-py2.7.egg
  • Size: 283.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for BayesDB-0.2.0-py2.7.egg
Algorithm Hash digest
SHA256 3039c02e8bbec63ee0ba84a7e42f2c440c2a9f51fd9f4e9ea660daf2c09d8417
MD5 235bb3f5d084c7980a91f66cfe60b28e
BLAKE2b-256 c492ff525f8972c2e6a26d1f8b05297eabca89c7fc7f4201a4ab8c1939fc513c

File details

Details for the file BayesDB-0.2.0-py2-none-any.whl.

File hashes

Hashes for BayesDB-0.2.0-py2-none-any.whl
Algorithm Hash digest
SHA256 1463259b167dadf37d56b6ef3ecbb7c6c6190714ad3786bb3259069ba9dcdab1
MD5 6d86728464f209c29d9df0943e2db513
BLAKE2b-256 e68d3f96a84bde558a0c7e2586b26d8bb5bccac4d0f3d46a93cc8247c8b0fb1b
