
PeachDB

PeachDB - the AI-first embeddings database

Build memory for your AI products in minutes!


Our core API has four functions:

from peachdb import PeachDB

# Create a new PeachDB instance or reference an existing one
db = PeachDB(
    project_name="my_app",
    embedding_generator="imagebind",  # "imagebind" or "sentence_transformer_L12"
    embedding_backend="exact_cpu",  # "exact_cpu", "exact_gpu", or "approx"
    distance_metric="cosine",  # "cosine" or "l2"
)

# Auto-compute & upsert embeddings at scale using the specified `embedding_generator` model on Modal
db.upsert_text(  # or "upsert_image" or "upsert_audio"
    csv_path="/path/to/local/csv",  # or "s3://path/to/csv"
    column_to_embed="foo",  # column values can be either strings or public URIs to image/audio files
    id_column_name="id",
    embeddings_output_s3_bucket_uri=None,  # required when using S3 URI for `csv_path`
    max_rows=None,  # or N to process N rows
)

# Query top 5 similar results
ids, distances, results_df = db.query(
    query_input='An example query',  # or path to an image/audio file
    modality='text',  # "text", "image", or "audio"
    top_k=5
)

# Deploy database as a publicly accessible FastAPI server
# GET /query?query_input='An example query'&modality=[text|image|audio]&top_k=5 to fetch 5 most similar results
db.deploy()
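
Once deployed, that GET endpoint can be hit from any HTTP client. Here is a minimal sketch of building the request URL with the standard library; the base URL below is a placeholder, use whatever db.deploy() prints:

```python
from urllib.parse import urlencode

BASE_URL = "https://example-peachdb.modal.run"  # placeholder: use the URL printed by db.deploy()

# Build the GET /query request described above
params = {"query_input": "An example query", "modality": "text", "top_k": 5}
url = f"{BASE_URL}/query?{urlencode(params)}"
print(url)
# https://example-peachdb.modal.run/query?query_input=An+example+query&modality=text&top_k=5
```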

Why another embedding database?

We've streamlined the entire end-to-end process of creating, storing, and retrieving embeddings, making it developer-friendly, seamless, and cost-effective. You no longer have to build custom pipelines or fret over hardware setups & scalability issues. PeachDB ensures you can get started within minutes, leaving the worries of cost optimization and scale to us.

Our key features include:

  • Automated, cost-effective & large-scale embedding computation: We leverage serverless GPU functions (through Modal) to compute embeddings on a large scale efficiently and affordably.
  • Multimodality: Native support for data with a mixture of modalities (image, audio, video).
  • Highly Customizable: PeachDB allows you to tailor its features to your needs. You can customize:
    • Embedding computation: as described above.
    • Backend: choose between exact_cpu (numpy), exact_gpu (torch), or approx (HNSW).
    • Distance metrics: cosine or l2.
  • Effortless Deployment: Deploy PeachDB as a publicly available server with a single API. No need to worry about nginx or SSL certs.
    • Coming soon: Managed, scalable deployment.
  • Consistent API: Experience the same API across all environments - dev, test, and prod.
  • Open Source: Apache 2.0.
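
For intuition, here is roughly what an exact_cpu cosine search does: a brute-force comparison of the query vector against every stored vector. This is an illustrative numpy sketch, not PeachDB's actual implementation:

```python
import numpy as np

def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 5):
    """Brute-force cosine search: return (indices, distances) of the k nearest rows."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    dists = 1.0 - c @ q          # cosine distance = 1 - cosine similarity
    idx = np.argsort(dists)[:k]  # smallest distance first
    return idx, dists[idx]

corpus = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, d = cosine_top_k(np.array([1.0, 0.1]), corpus, k=2)
print(idx)  # nearest vectors first
```

The exact_gpu backend does the same computation on the GPU via torch, while approx trades exactness for speed with an HNSW index.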

Example

Below is a walkthrough of creating a web server for a music recommendation app. To power the app, we use the Kaggle 5M Song Lyrics dataset.

  • SSH into your remote instance (a GPU is not needed)

  • Create & activate a new conda environment: conda create -n spoti_vibe python=3.10 && conda activate spoti_vibe

  • Install PeachDB: pip install peachdb

  • Setup Modal

    • Create an account at modal.com
    • Install the modal-client package: pip install modal-client
    • Setup token: modal token new
  • (optional: for AWS S3) PeachDB accepts local & S3 paths to datasets for embedding computation. To use S3 URIs, make sure you've installed the AWS CLI and run aws configure. The credentials should have read & write access to the bucket you plan to use.

  • Create a project directory: mkdir spoti_vibe/ and cd into it

  • Create a new module inside the directory: server.py

  • Add the following code

    from peachdb import PeachDB
    
    import os
    
    # Fetch the username & key by creating a new API token at https://www.kaggle.com/settings
    os.environ["KAGGLE_USERNAME"] = "<your-username>"  # placeholder: fill in your Kaggle username
    os.environ["KAGGLE_KEY"] = "<your-key>"  # placeholder: fill in your Kaggle API key
    
    import kaggle  # make sure you've run `pip install kaggle`
    
    kaggle.api.authenticate()
    # It can take a few mins to download depending on the network speed
    kaggle.api.dataset_download_files("nikhilnayak123/5-million-song-lyrics-dataset", path=".", unzip=True)
    
    db = PeachDB(
        project_name="spoti_vibe",
        distance_metric="cosine",
        embedding_backend="exact_cpu",
        embedding_generator="sentence_transformer_L12",
    )
    db.upsert_text(
        csv_path="./ds2.csv",  # filename of the unzipped Kaggle dataset
        column_to_embed="lyrics",
        id_column_name="id",
    )
    
    db.deploy()  # Public URL will be printed to console
    
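Note that upsert_text expects the CSV to contain both the id_column_name and column_to_embed columns. A sketch of the expected shape, with hypothetical rows (not the real Kaggle schema):

```python
import csv
import io

# Hypothetical rows illustrating the two columns `upsert_text` reads:
# the "id" column and the "lyrics" column passed as `column_to_embed`.
sample = io.StringIO()
writer = csv.DictWriter(sample, fieldnames=["id", "lyrics"])
writer.writeheader()
writer.writerow({"id": "1", "lyrics": "Sunshine on the boulevard..."})
writer.writerow({"id": "2", "lyrics": "Rainy nights and neon lights..."})

rows = list(csv.DictReader(io.StringIO(sample.getvalue())))
print(rows[0]["lyrics"])
```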

And that's it! You should now have a publicly available server that can listen to query requests from the user on:
GET <PUBLIC_URL>/query?query_input='Happy, upbeat summer'&modality=text&top_k=5

FAQs

Q: How can I delete a project?
A: Run db.delete(project_name="my_app")

Get Involved

We welcome PR contributors and ideas for how to improve the project.

License

Apache 2.0
