PeachDB

PeachDB - the AI-First, Embeddings Database

Build memory for your AI products in minutes!

Our core API has three functions: `PeachDB.create`, `query`, and `deploy`.

from peachdb import PeachDB

# Create a new PeachDB instance
# Embeddings are automatically computed at scale using the selected `embedding_generator` model on Modal.
db = PeachDB.create(
    project_name="my_app",
    csv_path="/path/to/local/csv",  # or "s3://path/to/csv"
    column_to_embed="foo",
    id_column_name="id",
    max_rows=None,  # or N, to process N rows of the dataset
    distance_metric="cosine",  # or "l2"
    embedding_backend="exact_cpu",  # or "exact_gpu" or "approx"
    embedding_generator="sentence_transformer_L12",
    embeddings_output_s3_bucket_uri=None,  # required when using S3 URI for `csv_path`
)
# Once the database has been successfully set up, you can reference it in the future via
# db = PeachDB(project_name="my_app")

# Query/search 5 most similar results
results_df = db.query(text='An example query', top_k=5)

# Deploy the database as a publicly available FastAPI server
# Call GET /query?text='An example query'&top_k=5 to fetch 5 most similar results
db.deploy()
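
The comments in the snippet above imply a few constraints on the arguments to `PeachDB.create`. The following is a minimal pre-flight check, assuming only what those comments state. `check_create_args` is a hypothetical helper written for illustration and is not part of PeachDB.

```python
# Hypothetical helper, not part of PeachDB: checks arguments before PeachDB.create is called.
ALLOWED_DISTANCE_METRICS = {"cosine", "l2"}
ALLOWED_BACKENDS = {"exact_cpu", "exact_gpu", "approx"}


def check_create_args(
    csv_path,
    distance_metric="cosine",
    embedding_backend="exact_cpu",
    embeddings_output_s3_bucket_uri=None,
):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if distance_metric not in ALLOWED_DISTANCE_METRICS:
        problems.append(f"unknown distance_metric: {distance_metric!r}")
    if embedding_backend not in ALLOWED_BACKENDS:
        problems.append(f"unknown embedding_backend: {embedding_backend!r}")
    # An S3 input needs an S3 location for the computed embeddings.
    if csv_path.startswith("s3://") and not embeddings_output_s3_bucket_uri:
        problems.append("embeddings_output_s3_bucket_uri is required for an S3 csv_path")
    return problems


local_ok = check_create_args("/path/to/local/csv")
s3_missing_bucket = check_create_args("s3://path/to/csv")
bad_metric = check_create_args("/path/to/local/csv", distance_metric="dot")
```

Running a check like this before `PeachDB.create` may catch a typo in an option name before any embedding computation is started.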

Why another embedding database?

We've streamlined the entire end-to-end process of creating, storing, and retrieving embeddings, making it developer-friendly, seamless, and cost-effective. You no longer have to build custom pipelines or fret over hardware setups & scalability issues. PeachDB ensures you can get started within minutes, leaving the worries of cost optimization and scale to us.

Our key features include:

- Automated, cost-effective & large-scale embedding computation: We leverage serverless GPU functions (through Modal) to compute embeddings at scale, efficiently and affordably.
  - For instance, we processed the Kaggle 5M song lyrics dataset in just 12 minutes at a cost of $4.90, using sentence transformers!
  - We've developed a Modal wrapper for Sentence Transformer L12.
    - Coming soon: Modal wrappers for open-source embedding models such as ImageBind, OpenCLIP, and more.
    - Coming soon: Support for multi-threaded OpenAI embedding calculation.
  - Coming soon: Bring your own embeddings.
  - Coming soon: Custom embedding functions for even more flexibility.
- Multimodality: Native support for data with a mixture of modalities (such as image/audio/video).
- Highly Customizable: PeachDB allows you to tailor its features to your needs. You can customize:
  - Embedding computation: as described above.
  - Backend: choose between `exact_cpu` (numpy), `exact_gpu` (torch), or `approx` (HNSW).
  - Distance metrics: `cosine` or `l2`.
- Effortless Deployment: Deploy PeachDB as a publicly available server with a single API call. No need to worry about nginx or SSL certs.
  - Coming soon: Managed, scalable deployment.
- Consistent API: Experience the same API across all environments: dev, test, and prod.
- Open Source: Apache 2.0.
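
To make the backend and distance-metric options more concrete, here is a rough sketch of what an exact search does: score every stored vector against the query and keep the closest `top_k`. It is written in plain Python for illustration and is not PeachDB's implementation (which, per the list above, uses numpy or torch for the exact backends). The function names and the toy vectors are assumptions.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity; close to 0 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


METRICS = {"cosine": cosine_distance, "l2": l2_distance}


def exact_top_k(query, embeddings, top_k=5, distance_metric="cosine"):
    """Compare the query with every stored vector and keep the k closest.

    `embeddings` maps an id to its vector. Returns (id, distance) pairs,
    closest first.
    """
    distance = METRICS[distance_metric]
    scored = ((distance(query, vec), item_id) for item_id, vec in embeddings.items())
    return [(item_id, dist) for dist, item_id in heapq.nsmallest(top_k, scored)]


# Toy 2-D vectors; real embeddings have hundreds of dimensions.
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [10.0, 1.0],  # same direction as the query, but much longer
}
query = [1.0, 0.1]
cosine_ids = [item_id for item_id, _ in exact_top_k(query, toy_embeddings, 2, "cosine")]
l2_ids = [item_id for item_id, _ in exact_top_k(query, toy_embeddings, 2, "l2")]
```

In this toy data, `cosine` ranks `"c"` first because it ignores vector length, while `l2` ranks `"a"` first because `"c"` is far away in absolute terms. An approximate backend such as HNSW avoids scanning every vector, trading some accuracy for speed.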

Example

Below is a walkthrough of creating a web server for a music recommendation app. To power the app, we use the Kaggle 5M song lyrics dataset.

1. SSH into your remote instance (it doesn't need a GPU).
2. Create and activate a new conda environment: `conda create -n spoti_vibe python=3.10 && conda activate spoti_vibe`
3. Install PeachDB: `pip install peachdb`
4. Set up Modal:
   - Create an account at modal.com
   - Install the modal-client package: `pip install modal-client`
   - Set up a token: `modal token new`
5. (Optional) PeachDB accepts local & S3 paths to datasets for embedding computation. To use S3 URIs, create & configure your .env file as below.
   ```
   # Fetch the below values from ~/.aws/credentials
   AWS_ACCESS_KEY_ID=
   AWS_SECRET_ACCESS_KEY=
   ```
   Please ensure the credentials have read & write access to the bucket you plan to use.
6. Create the project directory: `mkdir spoti_vibe/`
7. Create a new module, `server.py`, inside the directory.

Add the following code:

from peachdb import PeachDB

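import os

# Fetch the username & key by creating a new API token at https://www.kaggle.com/settings
# Environment variable values must be strings; replace the placeholders below.
os.environ['KAGGLE_USERNAME'] = "<your-kaggle-username>"
os.environ['KAGGLE_KEY'] = "<your-kaggle-key>"

# Import kaggle only after the credentials are set; make sure you've run `pip install kaggle`
import kaggle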

kaggle.api.authenticate()
kaggle.api.dataset_download_files(
    'nikhilnayak123/5-million-song-lyrics-dataset',
    path='.',
    unzip=True
)

db = PeachDB.create(
    project_name="spoti_vibe",
    csv_path='./ds2.csv',  # dataset name as observed on Kaggle
    column_to_embed="lyrics",
    id_column_name="id",
    max_rows=None,
    distance_metric="cosine",
    embedding_backend="exact_cpu",
    embedding_generator="sentence_transformer_L12",
)

db.deploy() # Public URL will be printed to console

And that's it! You should now have a publicly available server that accepts query requests at:
GET <PUBLIC_URL>/query?text='Happy, upbeat summer'&top_k=5
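
The query text contains spaces and a comma, so a client should URL-encode it before sending the request. Below is a minimal client sketch using only the standard library. The helper names and the placeholder host are assumptions, and the response format is not documented here, so the body is returned unparsed. Whether the server expects the single quotes shown in the example is also not specified; this sketch sends the bare text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(public_url, text, top_k=5):
    """Return a URL-encoded GET /query URL for a deployed server."""
    params = urlencode({"text": text, "top_k": top_k})
    return f"{public_url.rstrip('/')}/query?{params}"


def fetch_results(public_url, text, top_k=5, timeout=30):
    """Send the request and return the raw response body as text."""
    with urlopen(build_query_url(public_url, text, top_k), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Placeholder host; substitute the public URL printed by db.deploy().
example_url = build_query_url("https://example.invalid", "Happy, upbeat summer", top_k=5)
```

Calling `fetch_results(<PUBLIC_URL>, "Happy, upbeat summer")` against a running deployment would then issue the request shown above.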

Use-cases

- Build web apps like clip.audio and awesome-movies.life
- Build ChatGPT for X
- Build ChatGPT plugins

FAQs

Q) How can I delete a project?
A) Run `db.delete(project_name="my_app")`

Get Involved

We welcome pull requests and ideas for improving the project.

License

Apache 2.0

Release history

- 0.1.0: Jun 6, 2023
- 0.0.16: Jun 1, 2023
- 0.0.15: Jun 1, 2023
- 0.0.14: Jun 1, 2023
- 0.0.11: May 31, 2023
- 0.0.10: May 31, 2023
- 0.0.9: May 30, 2023
- 0.0.8: May 30, 2023 (this version)
- 0.0.7: May 30, 2023
- 0.0.6: May 30, 2023
- 0.0.5: May 30, 2023
- 0.0.4: May 30, 2023
- 0.0.3: May 30, 2023
- 0.0.2: May 30, 2023
- 0.0.1: May 30, 2023

