PeachDB - the AI-First Embeddings Database
Build memory for your AI products in minutes!
Our core API has 4 functions:
```python
from peachdb import PeachDB

# Create a new PeachDB instance or reference an existing one
db = PeachDB(
    project_name="my_app",
    embedding_generator="imagebind",  # "imagebind" or "sentence_transformer_L12"
    embedding_backend="exact_cpu",  # "exact_cpu", "exact_gpu", or "approx"
    distance_metric="cosine",  # "cosine" or "l2"
)

# Auto-compute & upsert embeddings at scale using the specified `embedding_generator` model on Modal
db.upsert_text(  # or "upsert_image" or "upsert_audio"
    csv_path="/path/to/local/csv",  # or "s3://path/to/csv"
    column_to_embed="foo",  # column values can either be a string or a public URI to an image/audio file
    id_column_name="id",
    embeddings_output_s3_bucket_uri=None,  # required when using an S3 URI for `csv_path`
    max_rows=None,  # or N to process N rows
)

# Query the top 5 similar results
ids, distances, results_df = db.query(
    query_input="An example query",  # or path to an image/audio file
    modality="text",  # "text", "image", or "audio"
    top_k=5,
)

# Deploy the database as a publicly accessible FastAPI server
# GET /query?query_input='An example query'&modality=[text|image|audio]&top_k=5 to fetch the 5 most similar results
db.deploy()
```
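Once `db.deploy()` prints the public URL, clients can hit the GET /query route shown in the comment above. Below is a minimal sketch of such a request; it assumes only that route and its documented parameters, the base URL is a placeholder, and the response is printed as-is rather than assuming a particular schema.

```python
import requests

# Placeholder for the public URL printed to the console by `db.deploy()`
PUBLIC_URL = "https://your-peachdb-deployment.example.com"

# Fetch the 5 results most similar to a free-text query
response = requests.get(
    f"{PUBLIC_URL}/query",
    params={"query_input": "An example query", "modality": "text", "top_k": 5},
)
response.raise_for_status()
print(response.json())  # inspect whatever payload the server returns
```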
Why another embedding database?
We've streamlined the entire end-to-end process of creating, storing, and retrieving embeddings, making it developer-friendly, seamless, and cost-effective. You no longer have to build custom pipelines or fret over hardware setups & scalability issues. PeachDB ensures you can get started within minutes, leaving the worries of cost optimization and scale to us.
Our key features include:
- Automated, cost-effective & large-scale embedding computation: We leverage serverless GPU functions (through Modal) to compute embeddings on a large scale efficiently and affordably.
- For instance, we processed the Kaggle 5M song lyrics dataset in just 12 minutes at a cost of $4.90, using sentence transformers!
- We've developed Modal wrappers for:
- Sentence Transformer L12
- ImageBind
- Coming soon, support for:
- Open-source embedding models such as OpenCLIP, Microsoft E5-v2, and more.
- Multi-threaded OpenAI embedding calculation.
- Coming soon: Bring your own embeddings.
- Coming soon: Custom embedding functions for even more flexibility.
- Multimodality: Native support for data with a mixture of modalities (such as image/audio/video).
- Highly Customizable: PeachDB allows you to tailor its features to your needs (see the configuration sketch after this list). You can customize:
  - Embedding computation: as described above.
  - Backend: choose between `exact_cpu` (numpy), `exact_gpu` (torch), or `approx` (HNSW).
  - Distance metrics: `cosine` or `l2`.
- Effortless Deployment: Deploy PeachDB as a publicly available server with a single API call. No need to worry about nginx or SSL certs.
- Coming soon: Managed, scalable deployment.
- Consistent API: Experience the same API across all environments - dev, test, and prod.
- Open Source: Apache 2.0.
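To make the customization options above concrete, here is a hedged sketch of a couple of alternative configurations. It uses only the constructor and `query` parameters documented in the core API section; the project names and the query file path are placeholders.

```python
from peachdb import PeachDB

# Exact GPU (torch) search over sentence-transformer embeddings with L2 distance
text_db = PeachDB(
    project_name="docs_search",  # placeholder project name
    embedding_generator="sentence_transformer_L12",
    embedding_backend="exact_gpu",
    distance_metric="l2",
)

# Approximate (HNSW) search over multimodal ImageBind embeddings with cosine distance
media_db = PeachDB(
    project_name="media_search",  # placeholder project name
    embedding_generator="imagebind",
    embedding_backend="approx",
    distance_metric="cosine",
)

# Query with an image instead of text (path is a placeholder)
ids, distances, results_df = media_db.query(
    query_input="/path/to/query.jpg",
    modality="image",
    top_k=5,
)
```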
Example
Below is a walkthrough of creating a web server for a music recommendation app. To power the app, we use the Kaggle 5M song lyrics dataset.
1. SSH into your remote instance (doesn't need a GPU).
2. Create & activate a new conda environment: `conda create -n spoti_vibe python=3.10 && conda activate spoti_vibe`
3. Install PeachDB: `pip install peachdb`
4. Set up Modal:
   - Create an account at modal.com
   - Install the modal-client package: `pip install modal-client`
   - Set up a token: `modal token new`
5. (Optional: for AWS S3) PeachDB accepts local & S3 paths to datasets for embedding computation. To use S3 URIs, ensure you've installed the `aws` CLI and run `aws configure`. The credentials should have read & write access to the relevant bucket you plan to use. (See the S3 sketch at the end of this walkthrough.)
6. Create a project directory: `mkdir spoti_vibe/`
7. Create a new module `server.py` inside the directory.
8. Add the following code:
```python
from peachdb import PeachDB
import os

# Fetch the username & key by creating a new API token at https://www.kaggle.com/settings
os.environ["KAGGLE_USERNAME"] = None  # set user name
os.environ["KAGGLE_KEY"] = None  # set key

import kaggle  # make sure you've run `pip install kaggle`

kaggle.api.authenticate()

# It can take a few mins to download depending on the network speed
kaggle.api.dataset_download_files("nikhilnayak123/5-million-song-lyrics-dataset", path=".", unzip=True)

db = PeachDB(
    project_name="spoti_vibe",
    distance_metric="cosine",
    embedding_backend="exact_cpu",
    embedding_generator="sentence_transformer_L12",
)

db.upsert_text(
    csv_path="./ds2.csv",  # dataset name as observed on Kaggle
    column_to_embed="lyrics",
    id_column_name="id",
)

db.deploy()  # Public URL will be printed to console
```
And that's it! You should now have a publicly available server that can answer query requests from users at:

`GET <PUBLIC_URL>/query?query_input='Happy, upbeat summer'&modality=text&top_k=5`
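If you went through the optional AWS S3 step above, the same walkthrough works with a dataset stored in S3. The sketch below only changes the `csv_path` and `embeddings_output_s3_bucket_uri` arguments documented in the core API section; the bucket names are placeholders.

```python
# Variant of the upsert step in server.py for a dataset stored in S3 (bucket names are placeholders)
db.upsert_text(
    csv_path="s3://my-datasets-bucket/ds2.csv",
    column_to_embed="lyrics",
    id_column_name="id",
    embeddings_output_s3_bucket_uri="s3://my-embeddings-bucket",  # required when `csv_path` is an S3 URI
)
```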
Use-cases
- Build web apps like clip.audio & awesome-movies.life
- Build ChatGPT for X (see the retrieval sketch below)
- Build ChatGPT plugins
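As a rough illustration of the "ChatGPT for X" pattern, the sketch below retrieves context from PeachDB and assembles a prompt for whichever LLM you use. It assumes `db` is the instance from the core API section and that `results_df` is a pandas DataFrame (suggested by the name, but an assumption); the question and prompt template are placeholders.

```python
# Hedged sketch: use PeachDB as retrieval memory for a chat-style app.
question = "What did the user say about pricing?"  # placeholder question

# Retrieve the 5 most similar rows (db is the PeachDB instance from the core API section)
ids, distances, results_df = db.query(query_input=question, modality="text", top_k=5)

# Turn the retrieved rows into plain-text context without assuming particular column names
context = results_df.to_string(index=False)

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # send `prompt` to your LLM of choice
```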
FAQs
Q) How can I delete a project?
Run `db.delete(project_name="my_app")`.
Get Involved
We welcome PR contributors and ideas for how to improve the project.
License
Apache 2.0.
File details
Details for the file peachdb-0.1.0.tar.gz.
File metadata
- Download URL: peachdb-0.1.0.tar.gz
- Upload date:
- Size: 24.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 365db5c9ded428bc3c57e731fd32b8eed5269b3ad141982ddddda4a7ef59f2b5
MD5 | 448ad1bcf8a5d857fde972100d71fee1
BLAKE2b-256 | 26a1ff683948667865bde3ffee07919ec9ef745b6bb7ccfbc1012e04c5025a39
File details
Details for the file peachdb-0.1.0-py3-none-any.whl.
File metadata
- Download URL: peachdb-0.1.0-py3-none-any.whl
- Upload date:
- Size: 29.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5453ed3d53e229f6e218e74b584026fb083d97d4b2182bb68c2820f370caceb1
MD5 | e34096f385973a2ea4f655d37b5245db
BLAKE2b-256 | d4af5cb280f3823d4854b0511f6f8b1a9181ff9d1b97052ccc838ff23aa58b4c