🔮 Super-power your database with AI 🔮
Bring AI to your favorite database!
Docs | Blog | Showcases | Live Jupyter Demo
🔮 SuperDuperDB is open-source: Leave a star ⭐️ to support the project!
Easily implement AI without the need to copy and move your data to complex MLOps pipelines and specialized vector databases. Integrate, train, and manage your AI models and APIs directly with your chosen database, using a simple Python interface.
- Generative AI & chatbots
- Vector Search
- Standard use-cases (classification, regression, clustering, recommendation, etc.)
- Highly custom AI use-cases and workflows with specialized models.
SuperDuperDB is not another database. It is a framework that transforms your favorite database into an AI powerhouse:
- A single scalable AI deployment of all your models and AI APIs, including output computation (inference) — always up-to-date as changing data is handled automatically and immediately.
- A model trainer that lets you easily train and fine-tune models simply by querying the database.
- A feature store in which the model outputs are stored alongside the inputs in any data format.
- A fully functional vector database that lets you easily generate vector embeddings and vector indexes of your data with your preferred models and APIs.
Current Integrations (more coming soon):
Databases | AI Frameworks | Models & AI APIs
---|---|---
• MongoDB • MongoDB Atlas • AWS S3 • PostgreSQL (experimental) • SQLite (experimental) • DuckDB (experimental) • MySQL (experimental) • Snowflake (experimental) | • PyTorch • Scikit-Learn • HuggingFace Transformers | • OpenAI • Cohere • Anthropic
Featured Examples
Try our ready-to-use notebooks live in your browser.
Text-To-Image Search | Text-To-Video Search | Question the Docs
---|---|---
Semantic Search Engine | Classical Machine Learning | Cross-Framework Transfer Learning
Installation
1. Install SuperDuperDB via pip
(~1 minute)
pip install superduperdb
2. Try SuperDuperDB via docker-compose
(~2 minutes):
- Need to install Docker? See the docs here.
make run-demo
Tutorial
In this tutorial, you will learn how to integrate, train, and manage AI models and APIs directly with your database and your data. Visit the docs to learn more.
- Deploy ML/AI models to your database:
Automatically compute outputs (inference) with your database in a single environment.
import pymongo
from sklearn.svm import SVC

from superduperdb import superduper
from superduperdb.db.mongodb.query import Collection  # import path may vary by version

# Make your db superduper!
db = superduper(pymongo.MongoClient().my_db)

# Models can be converted to SuperDuperDB objects with a simple wrapper.
model = superduper(SVC())

# Add the model into the database
db.add(model)

# Predict on the selected data.
model.predict(X='input_col', db=db, select=Collection(name='test_documents').find({'_fold': 'valid'}))
- Train models directly from your database.
Query your database, without additional ingestion and pre-processing:
import pymongo
from sklearn.svm import SVC

from superduperdb import superduper
from superduperdb.db.mongodb.query import Collection  # import path may vary by version

# Make your db superduper!
db = superduper(pymongo.MongoClient().my_db)

# Models can be converted to SuperDuperDB objects with a simple wrapper.
model = superduper(SVC())

# Fit the model on the selected data ('target_col' is a placeholder for your label field).
model.fit(X='input_col', y='target_col', db=db, select=Collection(name='test_documents').find({'_fold': 'train'}))
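The select queries in these snippets restrict work to documents in a given fold via a MongoDB-style filter. Conceptually, `find({'_fold': ...})` is just an equality match on a document field; a minimal pure-Python stand-in (the field names and document contents here are hypothetical):

```python
# Hypothetical documents carrying a '_fold' field, as in the select queries above
documents = [
    {'input_col': [0.0, 1.0], 'target': 0, '_fold': 'train'},
    {'input_col': [1.0, 0.0], 'target': 1, '_fold': 'train'},
    {'input_col': [0.9, 0.1], 'target': 1, '_fold': 'valid'},
]

def find(docs, query):
    """Minimal stand-in for a MongoDB-style find({...}) equality filter."""
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]

# Split the collection by fold, exactly as the select queries do
train_docs = find(documents, {'_fold': 'train'})
valid_docs = find(documents, {'_fold': 'valid'})
```

Training then consumes `train_docs`, while validation predictions run over `valid_docs` only.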
- Vector-Search your data:
Use your existing favorite database as a vector search database, including model management and serving.
# First a "Listener" makes sure vectors stay up-to-date
indexing_listener = Listener(model=OpenAIEmbedding(), key='text', select=collection.find())
# This "Listener" is linked with a "VectorIndex"
db.add(VectorIndex('my-index', indexing_listener=indexing_listener))
# The "VectorIndex" may be used to search data. Items to be searched against are passed
# to the registered model and vectorized. No additional app layer is required.
# By default, SuperDuperDB uses LanceDB for vector comparison operations
db.execute(collection.like({'text': 'clothing item'}, 'my-index').find({'brand': 'Nike'}))
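Conceptually, what `like()` does is rank documents by the similarity of their stored embeddings to the embedding of the query. A minimal pure-Python sketch of that idea, with toy two-dimensional vectors standing in for real model embeddings (no SuperDuperDB or model API involved):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" standing in for model outputs stored alongside the documents
docs = [
    {'text': 'running shoes',  'vector': [0.9, 0.1]},
    {'text': 'rain jacket',    'vector': [0.2, 0.8]},
    {'text': 'trail sneakers', 'vector': [0.85, 0.2]},
]

query_vector = [1.0, 0.0]  # pretend embedding of the search phrase

# Rank documents by similarity to the query, most similar first
ranked = sorted(docs, key=lambda d: cosine(d['vector'], query_vector), reverse=True)
top_two = [d['text'] for d in ranked[:2]]
```

A real vector index replaces this linear scan with an approximate-nearest-neighbour structure, but the ranking principle is the same.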
- Integrate AI APIs to work together with other models.
Use an OpenAI, PyTorch, or Hugging Face model as an embedding model for vector search.
# Create a ``VectorIndex`` instance with indexing listener as OpenAIEmbedding and add it to the database.
db.add(
    VectorIndex(
        identifier='my-index',
        indexing_listener=Listener(
            model=OpenAIEmbedding(identifier='text-embedding-ada-002'),
            key='abstract',
            select=Collection(name='wikipedia').find(),
        ),
    )
)
# The above also runs the embedding model (OpenAI) on the results of the select query for the given key.
# Now we can use the vector index to search the Wikipedia abstracts by meaning.
cur = db.execute(
    Collection(name='wikipedia')
    .like({'abstract': 'philosophers'}, n=10, vector_index='my-index')
)
- Add a Llama 2 model directly into your database:
import torch
import transformers
from transformers import AutoTokenizer

from superduperdb.ext.transformers import Pipeline  # import path may vary by version

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = Pipeline(
    identifier='my-llama-2',
    task='text-generation',
    preprocess=tokenizer,
    object=pipeline,
    torch_dtype=torch.float16,
    device_map="auto",
)
# You can easily predict on your collection documents.
model.predict(
    X=Collection(name='test_documents').find(),
    db=db,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
- Use model outputs as inputs to downstream models:
model.predict(
    X='input_col',
    db=db,
    select=coll.find().featurize({'X': '<upstream-model-id>'}),  # an already-registered upstream model id
    listen=True,
)
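The featurize call above swaps a raw input field for the stored output of an upstream model. A minimal pure-Python sketch of that chaining idea, with toy stand-ins for both models (all names and logic here are hypothetical):

```python
# Toy stand-ins for an upstream featurizer and a downstream classifier
def upstream_model(text):
    # Hypothetical featurizer: here, just a character count
    return len(text)

def downstream_model(feature):
    # Hypothetical classifier over the upstream feature
    return 'long' if feature > 10 else 'short'

# Outputs of the upstream model are stored alongside each document...
docs = [{'text': 'hi'}, {'text': 'a much longer sentence'}]
for d in docs:
    d['_outputs'] = {'upstream-model-id': upstream_model(d['text'])}

# ...and "featurizing" means the downstream model reads those stored
# outputs instead of the raw field
predictions = [downstream_model(d['_outputs']['upstream-model-id']) for d in docs]
```

With `listen=True`, the framework keeps this chain current: as new documents arrive, upstream outputs are recomputed and downstream predictions follow automatically.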
Community & Getting Help
If you have any problems, questions, comments or ideas:
- Join our Slack (we look forward to seeing you there).
- Search through our GitHub Discussions, or add a new question.
- Comment on an existing issue or create a new one.
- Email us at gethelp@superduperdb.com.
- Feel free to contact a maintainer or community volunteer directly!
Contributing
There are many ways to contribute, and they are not limited to writing code. We welcome all contributions such as:
- Bug reports
- Documentation improvements
- Enhancement suggestions
- Feature requests
- Expanding the tutorials and use case examples
Please see our Contributing Guide for details.
Feedback
Help us to improve SuperDuperDB by providing your valuable feedback here!
License
SuperDuperDB is open-source and intended to be a community effort, and it won't be possible without your support and enthusiasm. It is distributed under the terms of the Apache 2.0 license. Any contribution made to this project will be subject to the same provisions.
Join Us
We are looking for nice people who are invested in the problem we are trying to solve to join us full-time. Find roles that we are trying to fill here!