
Analyze your unstructured data


AIDB

Analyze unstructured data blazingly fast with machine learning. Connect your own ML models to your own data sources and query away!

Quick Start

To start using AIDB, all you need to do is install the requirements, specify a configuration, and query! Setting up the environment is as simple as:

git clone https://github.com/ddkang/aidb.git
cd aidb
pip install -r requirements.txt

# Optional if you'd like to run the examples below
gdown https://drive.google.com/uc?id=1SyHRaJNvVa7V08mw-4_Vqj7tCynRRA3x
unzip data.zip -d tests/

Text Example (in CSV)

We've set up an example that analyzes product reviews with HuggingFace. First, set your HuggingFace API key. Then run:

python launch.py --config=config.sentiment --setup-blob-table --setup-output-table

As an example query, you can run

SELECT AVG(score)
FROM sentiment
WHERE label = '5 stars'
ERROR_TARGET 10%
CONFIDENCE 95%;

You can see the mappings here. We use the HuggingFace API to generate sentiments from the reviews.
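To make the example query's semantics concrete, here is a minimal sketch of what it computes when run exactly, using Python's built-in sqlite3 on a few made-up rows. It assumes the `sentiment` table has a text `label` column (e.g. '5 stars') and a numeric `score` column, as the query suggests; the real schema comes from the mappings in the config. The ERROR_TARGET and CONFIDENCE clauses are omitted here because they are AIDB extensions that let it return an approximate answer instead.

```python
import sqlite3

# Hypothetical toy rows standing in for model outputs: (review_id, label, score).
# In AIDB, these columns would be filled in by the ML model.
rows = [
    (1, "5 stars", 0.90),
    (2, "4 stars", 0.60),
    (3, "5 stars", 0.70),
    (4, "1 star", 0.80),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sentiment (review_id INTEGER, label TEXT, score REAL)")
conn.executemany("INSERT INTO sentiment VALUES (?, ?, ?)", rows)

# Exact version of the example query: every matching row is read.
(avg_score,) = conn.execute(
    "SELECT AVG(score) FROM sentiment WHERE label = '5 stars'"
).fetchone()
print(f"{avg_score:.2f}")  # prints 0.80 (mean of 0.90 and 0.70)
```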

Image Example (local directory)

We've also set up an example that detects whether user-generated images contain adult content, which is useful for content filtering. To run it:

python launch.py --config=config.nsfw_detect --setup-blob-table --setup-output-table

As an example query, you can run

SELECT *
FROM nsfw
WHERE racy LIKE 'POSSIBLE';

You can see the mappings here. We use the Google Vision API to generate the safety labels.
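Google Vision's SafeSearch detection reports each category, including `racy`, as a likelihood bucket: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY. Because `LIKE 'POSSIBLE'` without wildcards matches only that one bucket, catching everything at least as likely as POSSIBLE needs an ordering over the buckets. The plain-Python sketch below assumes the `racy` column stores these bucket names as strings, as the query's `'POSSIBLE'` literal suggests. The rows and file names are made up, and the code is not part of AIDB.

```python
# SafeSearch likelihood buckets ordered from least to most likely
# (same order as Google Vision's Likelihood enum; UNKNOWN ranks lowest here).
LIKELIHOOD_ORDER = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]
RANK = {name: i for i, name in enumerate(LIKELIHOOD_ORDER)}

# Hypothetical rows of the nsfw table: (image_path, racy).
rows = [
    ("img_001.jpg", "VERY_UNLIKELY"),
    ("img_002.jpg", "POSSIBLE"),
    ("img_003.jpg", "LIKELY"),
    ("img_004.jpg", "UNKNOWN"),
]


def racy_at_least(rows, threshold):
    """Return image paths whose racy likelihood is at or above threshold."""
    return [path for path, racy in rows if RANK[racy] >= RANK[threshold]]


# Same filter as WHERE racy LIKE 'POSSIBLE': a single bucket only.
exact = [path for path, racy in rows if racy == "POSSIBLE"]
print(exact)                            # ['img_002.jpg']
print(racy_at_least(rows, "POSSIBLE"))  # ['img_002.jpg', 'img_003.jpg']
```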

Key Features

AIDB focuses on keeping cost down and interoperability high.

We reduce costs with our optimizations:

  • First-class support for approximate queries, reducing the cost of aggregations by up to 350x.
  • Caching, which speeds up multiple queries over the same data.
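The caching bullet above can be illustrated with a minimal sketch: memoize model inference per blob so that a second query over the same data reuses earlier outputs instead of calling the model again. The function name, the blob IDs, and the fake model are hypothetical, and an in-memory `functools.lru_cache` stands in for whatever cache AIDB actually uses.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded in-memory cache keyed by blob ID
def classify(blob_id: str) -> str:
    # Hypothetical stand-in for an expensive ML model or API call.
    return "5 stars" if blob_id.endswith("1") else "3 stars"


blobs = ["r1", "r2", "r11"]

first = [classify(b) for b in blobs]   # first query: every blob hits the model
second = [classify(b) for b in blobs]  # second query over the same data: all cache hits

info = classify.cache_info()
print(first == second, info.misses, info.hits)  # True 3 3
```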

We keep interoperability high by allowing you to bring your own data source, ML models, and vector databases!

Approximate Querying

One key feature of AIDB is first-class support for approximate queries. Currently, we support approximate AVG, COUNT, and SUM. Approximate aggregations do not yet support GROUP BY or JOIN, but both are on our roadmap. Please reach out if you'd like us to support your queries!

In order to execute an approximate aggregation query, simply append ERROR_TARGET <error percent>% CONFIDENCE <confidence>% to your normal aggregation. As a full example, you can compute an approximate count by doing:

SELECT COUNT(xmin)
FROM objects
ERROR_TARGET 5%
CONFIDENCE 95%;
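Because the approximate clause is simply appended to an ordinary aggregation, queries can be generated programmatically. The helper below is a hypothetical sketch (`approx_query` is not part of AIDB). It appends the clause and rejects queries that the restrictions above rule out: no AVG, COUNT, or SUM aggregate, or any GROUP BY or JOIN. The checks are simple keyword tests, not a real SQL parser.

```python
import re

SUPPORTED_AGGREGATES = ("AVG", "COUNT", "SUM")


def approx_query(sql: str, error_pct: float, confidence_pct: float) -> str:
    """Append ERROR_TARGET/CONFIDENCE to an aggregation query (hypothetical helper).

    Uses keyword checks only; it does not parse SQL.
    """
    body = sql.strip().rstrip(";")
    upper = body.upper()
    if not any(re.search(rf"\b{agg}\s*\(", upper) for agg in SUPPORTED_AGGREGATES):
        raise ValueError("approximate queries support only AVG, COUNT, and SUM")
    if re.search(r"\bGROUP\s+BY\b", upper) or re.search(r"\bJOIN\b", upper):
        raise ValueError("GROUP BY and JOIN are not supported for approximate aggregations")
    if not (0 < error_pct < 100 and 0 < confidence_pct < 100):
        raise ValueError("percentages must be strictly between 0 and 100")
    return f"{body}\nERROR_TARGET {error_pct:g}%\nCONFIDENCE {confidence_pct:g}%;"


print(approx_query("SELECT COUNT(xmin) FROM objects", 5, 95))
```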

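ERROR_TARGET specifies the maximum relative error allowed compared to running the query exactly, and CONFIDENCE specifies how often the answer must fall within that error. For example, with ERROR_TARGET 5% and CONFIDENCE 95%, if the exact answer is 100, you will get an answer between 95 and 105 (95% of the time).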
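The sketch below illustrates how an error target and a confidence level can interact for an approximate AVG. It draws rows in uniformly random order, computes a normal-approximation (CLT) confidence interval for the mean, and stops once the interval's half-width is within the target relative error. This is a generic textbook technique with a hypothetical `approx_avg` function and made-up data; it is not AIDB's actual algorithm, which may use different estimators and stopping rules.

```python
import random
import statistics
from statistics import NormalDist


def approx_avg(values, error_target=0.05, confidence=0.95, batch=100, seed=0):
    """Approximate mean(values) by random sampling (hypothetical sketch, not AIDB's algorithm).

    Stops once the normal-approximation confidence interval's half-width is at
    most error_target * |estimate|. Returns (estimate, number_of_rows_sampled).
    """
    if batch < 2:
        raise ValueError("batch must be at least 2 to estimate a standard deviation")
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # uniform random order = sampling without replacement
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95% confidence
    sample = []
    for i in order:
        sample.append(values[i])  # each sampled row stands in for one expensive model call
        if len(sample) % batch == 0:
            mean = statistics.fmean(sample)
            half_width = z * statistics.stdev(sample) / len(sample) ** 0.5
            if mean != 0 and half_width <= error_target * abs(mean):
                return mean, len(sample)
    # Every row was sampled, so this is the exact answer.
    return statistics.fmean(sample), len(sample)


# Made-up data: 100,000 values drawn around a mean of 100.
data_rng = random.Random(42)
population = [data_rng.gauss(100, 20) for _ in range(100_000)]

estimate, n_sampled = approx_avg(population, error_target=0.05, confidence=0.95)
true_mean = statistics.fmean(population)
print(f"estimate={estimate:.1f} true={true_mean:.1f} rows_sampled={n_sampled}")
```

With these made-up numbers (spread about 20% of the mean), the first check after 100 rows typically already meets a 5% target, so only a small fraction of the rows would need model inference. A tighter error target or noisier data needs more samples, roughly in proportion to 1/ERROR_TARGET².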


Contribute

We have many improvements we'd like to implement, so please help us! For the time being, if you'd like to contribute, please email us.

Contact Us

Need help setting up AIDB for your dataset, or want a new feature? Please fill out this form.



Download files


Source Distribution

ai-db-0.0.2.tar.gz (48.4 kB)

Built Distribution


ai_db-0.0.2-py3-none-any.whl (66.5 kB)

File details

Details for the file ai-db-0.0.2.tar.gz.

File metadata

  • Download URL: ai-db-0.0.2.tar.gz
  • Size: 48.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for ai-db-0.0.2.tar.gz

  • SHA256: 28e445b49d1d7d8b838437be56e250c1bafc1e826722cad68c8cee2b491d8c13
  • MD5: 097b7e82e73ecaadbcb4c277736d3a15
  • BLAKE2b-256: 2143c56ec3b867b8e94f14c38366e23327e51290244f9bfcf518518296136212


File details

Details for the file ai_db-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: ai_db-0.0.2-py3-none-any.whl
  • Size: 66.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for ai_db-0.0.2-py3-none-any.whl

  • SHA256: 1f95eba54145a46c14688833ab440d7c5d6dbd29d88b6b9d90d56d8e88b8c918
  • MD5: d2c5cf266f3de689cc6caf2b0653808b
  • BLAKE2b-256: fa0febe48ea28ba6df86b67f0a077e11c2471be8ae9b250a0e9cf56e3d672c1e

