EvaDB AI-SQL Database System

EvaDB is a database system for building simpler and faster AI-powered applications.

EvaDB is an AI-SQL database system for developing applications powered by AI models. We aim to simplify the development and deployment of AI-powered applications that operate on both structured data (tables, feature stores) and unstructured data (text documents, videos, PDFs, podcasts, etc.).

EvaDB accelerates AI pipelines by 10x using a collection of performance optimizations inspired by time-tested SQL database systems, including data-parallel query execution, function caching, sampling, and cost-based predicate reordering. EvaDB supports an AI-oriented query language tailored for analyzing both structured and unstructured data. It has first-class support for PyTorch, Hugging Face, YOLO, and OpenAI models.

The high-level Python and SQL APIs allow even beginners to use EvaDB in a few lines of code. Advanced users can define custom user-defined functions (UDFs) that wrap around any AI model or Python library. EvaDB is fully implemented in Python and licensed under the Apache license.
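
To make "a few lines of code" concrete, the snippet below sketches the shape of a minimal EvaDB program. It is only a sketch: it reuses the connect and query calls demonstrated in the Quick Start below, and the MyVideos table name is a placeholder for data you have already loaded.

import evadb

# Connect to EvaDB and grab a cursor (the same call used in the Quick Start below)
cursor = evadb.connect().cursor()

# Run a SQL query and fetch the result as a Pandas DataFrame
# 'MyVideos' is a placeholder table name
df = cursor.query("SELECT id FROM MyVideos;").df()
print(df)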

Features

  • 🔮 Build simpler AI-powered applications using short Python or SQL queries
  • ⚡️ 10x faster applications using AI-centric query optimization
  • 💰 Save money spent on GPUs
  • 🚀 First-class support for your custom deep learning models through user-defined functions
  • 📦 Built-in caching to eliminate redundant model invocations across queries
  • ⌨️ First-class support for PyTorch, Hugging Face, YOLO, and OpenAI models
  • 🐍 Installable via pip and fully implemented in Python

Illustrative Applications

Some illustrative EvaDB-powered applications are available as Jupyter notebooks, each of which can be opened on Google Colab.

Documentation

Quick Start

  • Step 1: Install EvaDB using pip. EvaDB supports Python versions >= 3.8:
pip install evadb
  • Step 2: Write your AI app!
import os
import evadb

# Grab an EvaDB cursor to load data and run queries
cursor = evadb.connect().cursor()

# Load a collection of news videos into the 'news_videos' table
# This command returns a Pandas Dataframe with the query's output
# In this case, the output indicates the number of loaded videos
cursor.load(
    file_regex="news_videos/*.mp4",
    format="VIDEO",
    table_name="news_videos"
).df()

# Define a function that wraps around a speech-to-text (Whisper) model
# Such functions are known as user-defined functions or UDFs
# So, we are creating a Whisper UDF here
# After creating the UDF, we can use the function in any query
cursor.create_udf(
    udf_name="SpeechRecognizer",
    type="HuggingFace",
    task='automatic-speech-recognition',
    model='openai/whisper-base'
).df()

# EvaDB automatically extracts the audio from the video
# We only need to run the SpeechRecognizer UDF on the 'audio' column
# to get the transcript and persist it in a table called 'transcripts'
cursor.query(
    """CREATE TABLE transcripts AS
       SELECT SpeechRecognizer(audio) from news_videos;"""
).df()

# We next incrementally construct the ChatGPT query using EvaDB's Python API
# The query is based on the 'transcripts' table
# This table has a column called 'text' with the transcript text
query = cursor.table('transcripts')

# Since ChatGPT is a built-in function, we don't have to define it
# We can just directly use it in the query
# We need to set the OPENAI_KEY as an environment variable
os.environ["OPENAI_KEY"] = OPENAI_KEY
query = query.select("ChatGPT('Is this video summary related to LLMs', text)")

# Finally, we run the query to get the results as a dataframe
response = query.df()
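
The same ChatGPT step can also be issued as a single SQL statement through cursor.query. The sketch below relies on the same assumptions as the code above (the 'transcripts' table exists and OPENAI_KEY is set in the environment):

# Equivalent SQL form of the ChatGPT step above
response = cursor.query(
    """SELECT ChatGPT('Is this video summary related to LLMs', text)
       FROM transcripts;"""
).df()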
  • Write functions to wrap around your custom deep learning models
# Define a function that wraps around a speech-to-text (Whisper) model
# Such functions are known as user-defined functions or UDFs
# So, we are creating a Whisper UDF here
# After creating the UDF, we can use the function in any query
cursor.create_udf(
    udf_name="SpeechRecognizer",
    type="HuggingFace",
    task='automatic-speech-recognition',
    model='openai/whisper-base'
).df()
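
The snippet above wraps a built-in Hugging Face pipeline. For a model that is not built in, EvaDB also lets you register a UDF implemented in your own Python file via a CREATE UDF ... IMPL statement. The sketch below is hedged: the ObjectDetector name and object_detector.py path are hypothetical, the file is assumed to define a class implementing EvaDB's UDF interface (a setup method plus a forward method over Pandas rows) as described in the EvaDB documentation, and the exact statement syntax may differ across EvaDB versions.

# Register a custom UDF implemented in a local Python file (hypothetical path)
cursor.query(
    """CREATE UDF IF NOT EXISTS ObjectDetector
       IMPL 'object_detector.py';"""
).df()

# Once registered, the custom UDF can be used like any other function in a query
cursor.query("SELECT id, ObjectDetector(data) FROM news_videos;").df()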
  • Chain multiple models in a single query to set up useful AI pipelines
# Analyze the emotions of actors in an `Interstellar` movie clip using PyTorch models
query = cursor.table("Interstellar")
# Get faces using a `FaceDetector` function
query = query.cross_apply("UNNEST(FaceDetector(data))", "Face(bounding_box, confidence)")
# Focus only on the frames between 100 and 200 in the clip
query = query.filter("id > 100 AND id < 200")
# Get the emotions of the detected faces using an `EmotionDetector` function
query = query.select("id, bounding_box, EmotionDetector(Crop(data, bounding_box))")

# Run the query and get the query result as a dataframe
response = query.df()
  • EvaDB runs queries faster using its AI-centric query optimizer. Two key optimizations are:

    💾 Caching: EvaDB automatically caches and reuses previous query results (especially model inference results), eliminating redundant computation and reducing query processing time.

    🎯 Predicate Reordering: EvaDB optimizes the order in which the query predicates are evaluated (e.g., runs the faster, more selective model first), leading to faster queries and lower inference costs.

  -- Query 1: Find all images of black-colored dogs
  SELECT id, bbox FROM dogs 
  JOIN LATERAL UNNEST(Yolo(data)) AS Obj(label, bbox, score) 
  WHERE Obj.label = 'dog' 
    AND Color(Crop(data, bbox)) = 'black'; 

  -- Query 2: Find all Great Danes that are black-colored
  SELECT id, bbox FROM dogs 
  JOIN LATERAL UNNEST(Yolo(data)) AS Obj(label, bbox, score) 
  WHERE Obj.label = 'dog' 
    AND DogBreedClassifier(Crop(data, bbox)) = 'great dane' 
    AND Color(Crop(data, bbox)) = 'black';

By reusing the results of the first query and reordering the predicates based on the available cached inference results, EvaDB runs the second query 10x faster!
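
A quick way to observe this from Python is to time the two queries above by passing their SQL text to cursor.query. The sketch below uses only the standard library for timing and assumes that the dogs table has been loaded and that the Yolo, Color, and DogBreedClassifier functions have already been registered:

import time

def run(sql):
    # Run a SQL query through the EvaDB cursor and report its wall-clock time
    start = time.time()
    df = cursor.query(sql).df()
    print(f"{len(df)} rows in {time.time() - start:.1f}s")
    return df

# Query 1: black-colored dogs (the Yolo and Color inference results get cached)
run("""SELECT id, bbox FROM dogs
       JOIN LATERAL UNNEST(Yolo(data)) AS Obj(label, bbox, score)
       WHERE Obj.label = 'dog'
         AND Color(Crop(data, bbox)) = 'black';""")

# Query 2: black-colored Great Danes; EvaDB reuses the cached results from
# Query 1 and reorders the predicates, so this query should run much faster
run("""SELECT id, bbox FROM dogs
       JOIN LATERAL UNNEST(Yolo(data)) AS Obj(label, bbox, score)
       WHERE Obj.label = 'dog'
         AND DogBreedClassifier(Crop(data, bbox)) = 'great dane'
         AND Color(Crop(data, bbox)) = 'black';""")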

Architecture Diagram

This diagram presents the key components of EvaDB. EvaDB's AI-centric Query Optimizer takes a parsed query as input and generates a query plan, which is then executed by the Query Engine. The Query Engine accesses multiple storage engines to retrieve the data needed to run the query efficiently:

  1. Structured data (SQL database system connected via sqlalchemy).
  2. Unstructured media data (on cloud buckets or local filesystem).
  3. Vector data (vector database system).

Screenshots

🔮 Traffic Analysis (Object Detection Model)


🔮 PDF Question Answering (Question Answering Model)


🔮 MNIST Digit Recognition (Image Classification Model)


🔮 Movie Emotion Analysis (Face Detection + Emotion Classification Models)


🔮 License Plate Recognition (Plate Detection + OCR Extraction Models)


Community and Support

👋 If you have general questions about EvaDB, want to say hello or just follow along, we'd like to invite you to join our Slack Community and to follow us on Twitter.


If you run into any problems or issues, please create a GitHub issue and we'll try our best to help.

Don't see a feature in the list? Search our issue tracker to check whether someone has already requested it and add a comment explaining your use case, or open a new issue if not. We prioritize our roadmap based on user feedback, so we'd love to hear from you.

Contributing


EvaDB has benefited from many contributors, and all kinds of contributions are appreciated. To file a bug or request a feature, please use GitHub issues. Pull requests are welcome.

For more information, see our contribution guide.

License

Copyright (c) 2018-present Georgia Tech Database Group. Licensed under the Apache License.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

evadb-0.2.13.tar.gz (276.7 kB)

Built Distribution

evadb-0.2.13-py3-none-any.whl (549.8 kB)

File details

Details for the file evadb-0.2.13.tar.gz.

File metadata

  • Download URL: evadb-0.2.13.tar.gz
  • Upload date:
  • Size: 276.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for evadb-0.2.13.tar.gz

  • SHA256: 47e75b5be1ef63deb789b178d7ddc08471b43a62608deccb028959aea64492bf
  • MD5: 01697a8a7a59f49ba8dfe29df5b81f4f
  • BLAKE2b-256: b901ceaf73812e46b2710b840fcdc53fd9aee2e022c7509886356e71dfc5d4e0


File details

Details for the file evadb-0.2.13-py3-none-any.whl.

File metadata

  • Download URL: evadb-0.2.13-py3-none-any.whl
  • Upload date:
  • Size: 549.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for evadb-0.2.13-py3-none-any.whl

  • SHA256: f8d3758899c8c091bcd25cc53f908787acea7327dcac4b0291b8cb2bc8ed50c3
  • MD5: b48f926b35b457fc5b35e9aab078b5e9
  • BLAKE2b-256: 51e97182fa49b07335ace34eebea545384cd9110fb1c258ce11ecada3b91b7b3

