
Embed anything at lightning speed



Supercharge your embedding pipeline with a minimalist, lightning-fast framework built in Rust 🦀

EmbedAnything is a minimalist yet highly performant, lightweight, lightning-fast, multisource, multimodal, local embedding pipeline built in Rust. Whether you're working with text, images, audio, PDFs, websites, or other media, EmbedAnything simplifies the process of generating embeddings from various sources and storing them in a vector database.

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. How to add a custom model and chunk size

🚀 Key Features

  • Local Embedding: Works with local embedding models such as BERT and Jina.
  • Cloud Embedding Models: Supports OpenAI; Mistral and Cohere support is coming soon.
  • Multimodality: Works with text sources (PDF, TXT, Markdown), images (JPG), and audio (WAV).
  • Rust: All file processing is done in Rust for speed and efficiency.
  • Candle: Hardware acceleration is handled through Candle.
  • Python Interface: Packaged as a Python library for seamless integration into your existing projects.
  • Scalable: Store embeddings in a vector database for easy retrieval and scalability.
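EmbedAnything returns embeddings as plain vectors (the `.embedding` field in the usage examples); retrieval is the vector database's job. To show what "store and retrieve" means in practice, here is a minimal in-memory stand-in, assuming NumPy. The `InMemoryVectorStore` class and its methods are hypothetical and are not part of EmbedAnything or of any vector database's API.

```python
import numpy as np


class InMemoryVectorStore:
    """Hypothetical toy vector store: keeps unit-normalized vectors and their payloads."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.payloads: list[str] = []

    def add(self, vector, payload: str) -> None:
        v = np.asarray(vector, dtype=np.float32)
        if v.shape != (self.dim,):
            raise ValueError(f"expected shape ({self.dim},), got {v.shape}")
        # Normalize so that a dot product equals cosine similarity
        v = v / np.linalg.norm(v)
        self.vectors = np.vstack([self.vectors, v])
        self.payloads.append(payload)

    def search(self, query, k: int = 3) -> list[tuple[str, float]]:
        q = np.asarray(query, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scores = self.vectors @ q
        # Indices of the k highest scores, best first
        top = np.argsort(-scores)[:k]
        return [(self.payloads[i], float(scores[i])) for i in top]


# Toy 3-dimensional vectors standing in for real embeddings
store = InMemoryVectorStore(dim=3)
store.add([1.0, 0.0, 0.0], "cat.jpg")
store.add([0.0, 1.0, 0.0], "dog.jpg")
store.add([0.7, 0.7, 0.0], "cat_and_dog.jpg")
print(store.search([1.0, 0.1, 0.0], k=2))
```

A real vector database adds persistence, approximate-nearest-neighbour indexing, and metadata filtering on top of this idea; the brute-force scan here is only practical for small collections.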

🦀 Why Embed Anything

➡️ Faster execution.
➡️ Memory management: Rust enforces memory safety at compile time, preventing many of the memory errors and crashes that can plague other languages.
➡️ True multithreading.
➡️ Running language models or embedding models locally and efficiently.
➡️ Candle allows inference on CUDA-enabled GPUs right out of the box.
➡️ Reduced memory usage for EmbedAnything.

⭐ Supported Models

We support a range of models that Candle can run. Below is a set of tested models; if you have a specific use case, please mention it in an issue.

How to add a custom model and chunk size

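from embed_anything import EmbedConfig, JinaConfig

# Set model_id to any model link from the table below
jina_config = JinaConfig(
    model_id="jinaai/jina-embeddings-v2-small-en", revision="main", chunk_size=100
)
embed_config = EmbedConfig(jina=jina_config)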

Model: custom link
  • Jina: jinaai/jina-embeddings-v2-base-en, jinaai/jina-embeddings-v2-small-en
  • Bert: sentence-transformers/all-MiniLM-L6-v2, sentence-transformers/all-MiniLM-L12-v2, sentence-transformers/paraphrase-MiniLM-L6-v2
  • Clip: openai/clip-vit-base-patch32
  • Whisper: most OpenAI Whisper models from Hugging Face are supported.
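The `chunk_size` parameter controls how much text goes into each embedded piece. The unit and splitting strategy EmbedAnything actually uses are not stated here, so the sketch below is only an illustration, assuming `chunk_size` counts whitespace-separated words and chunks do not overlap. `chunk_words` is a hypothetical helper, not part of the library.

```python
def chunk_words(text: str, chunk_size: int = 100) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# A 250-word toy document: "w0 w1 ... w249"
doc = " ".join(f"w{i}" for i in range(250))
chunks = chunk_words(doc, chunk_size=100)
print(len(chunks), [len(c.split()) for c in chunks])  # prints: 3 [100, 100, 50]
```

The trade-off is the same whatever the unit: a smaller chunk size yields more, finer-grained vectors, while a larger one packs more context into each vector but must still fit within the embedding model's maximum input length.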

🧑‍🚀 Getting Started

💚 Installation

pip install embed-anything

Usage

To use local embedding (we support Bert and Jina):

import numpy as np
import embed_anything

# Embed a local PDF with the Bert model
data = embed_anything.embed_file("file_path.pdf", embeder="Bert")
embeddings = np.array([item.embedding for item in data])

For multimodal embedding, we support CLIP.

Requirements: a directory containing the pictures you want to search. For example, test_files contains images of cats, dogs, etc.

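import numpy as np
from PIL import Image
import embed_anything

# Embed every image in the directory with CLIP
data = embed_anything.embed_directory("directory_path", embeder="Clip")
embeddings = np.array([item.embedding for item in data])

# Embed a text query into the same CLIP space
query = ["photo of a dog"]
query_embedding = np.array(embed_anything.embed_query(query, embeder="Clip")[0].embedding)

# Rank images by dot-product similarity and open the best match (.text holds its file path)
similarities = np.dot(embeddings, query_embedding)
max_index = np.argmax(similarities)
Image.open(data[max_index].text).show()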
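The example above ranks images by raw dot product, which equals cosine similarity only when the embeddings are unit-length. Whether EmbedAnything normalizes CLIP outputs is not stated here, so a safer sketch, assuming NumPy arrays shaped like `embeddings` (one row per file) and `query_embedding`, normalizes both and returns the top k indices. `top_k_cosine` is a hypothetical helper.

```python
import numpy as np


def top_k_cosine(embeddings: np.ndarray, query: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k rows of `embeddings` most cosine-similar to `query`."""
    eps = 1e-12  # avoids division by zero for all-zero vectors
    rows = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    q = query / (np.linalg.norm(query) + eps)
    similarities = rows @ q
    # Sorting the negated scores ascending gives best-first order
    return np.argsort(-similarities)[:k].tolist()


# Toy data standing in for the arrays built above
embeddings = np.array([[10.0, 3.0], [0.0, 1.0], [1.0, 1.0]])
query_embedding = np.array([0.0, 1.0])

print(int(np.argmax(embeddings @ query_embedding)))  # raw dot product favours row 0: prints 0
print(top_k_cosine(embeddings, query_embedding, k=2))  # cosine ranking: prints [1, 2]
```

Row 0 wins the raw dot product only because of its large magnitude; after normalization, row 1 (which points in exactly the query's direction) ranks first.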

Audio Embedding using Whisper

Requirements: audio .wav files.

import embed_anything
from embed_anything import JinaConfig, EmbedConfig, AudioDecoderConfig
import time

start_time = time.time()

# Choose any Whisper or Distil-Whisper model from https://huggingface.co/distil-whisper or https://huggingface.co/collections/openai/whisper-release-6501bba2cf999715fd953013
audio_decoder_config = AudioDecoderConfig(
    decoder_model_id="openai/whisper-tiny.en",
    decoder_revision="main",
    model_type="tiny-en",
    quantized=False,
)
jina_config = JinaConfig(
    model_id="jinaai/jina-embeddings-v2-small-en", revision="main", chunk_size=100
)

config = EmbedConfig(jina=jina_config, audio_decoder=audio_decoder_config)
data = embed_anything.embed_file(
    "test_files/audio/samples_hp0.wav", embeder="Audio", config=config
)
print(data[0].metadata)
end_time = time.time()
print("Time taken: ", end_time - start_time)
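Before sending a batch of recordings through the audio pipeline, it can help to confirm that each file is a readable WAV and to see its sample rate and channel count. The sketch below uses only Python's standard `wave` module; `describe_wav` is a hypothetical helper. Whisper models operate on 16 kHz mono audio, but whether EmbedAnything resamples other inputs automatically is an assumption this sketch does not test.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Return basic WAV properties; raises wave.Error or EOFError for files wave cannot parse."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


# Demo: write one second of 16 kHz, 16-bit mono silence to a temp file, then inspect it
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "silence.wav")
    with wave.open(demo_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes = 16-bit samples
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000)
    info = describe_wav(demo_path)

print(info)
```

For a real file, call `describe_wav("test_files/audio/samples_hp0.wav")` and check that `sample_rate` and `channels` match what your chosen Whisper model expects.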

🚧 Contributing to EmbedAnything

First of all, thank you for taking the time to contribute to this project. We truly appreciate your contributions, whether it's bug reports, feature suggestions, or pull requests. Your time and effort are highly valued in this project. 🚀

This document provides guidelines and best practices to help you contribute effectively. These are meant to serve as guidelines, not strict rules. We encourage you to use your best judgment and feel comfortable proposing changes to this document through a pull request.

  • Roadmap
  • Quick Start
  • Guidelines

Roadmap

One of EmbedAnything's aims is to let AI engineers easily use state-of-the-art embedding models on typical files and documents. A lot has already been accomplished; these are the formats and features we support right now, with a few more still to be done.
✅ Markdown, PDFs, and websites
✅ WAV files
✅ JPG, PNG, and WebP
✅ Whisper for audio embeddings
✅ Custom model upload: anything that is available in Candle
✅ Custom chunk size
✅ Pinecone adapter, to save embeddings directly to Pinecone
✅ Zero-shot applications

Yet to be done
☑️ Vector database: add functionality to integrate with any vector database
☑️ Graph embedding: build DeepWalk embeddings using depth-first walks and word2vec
☑️ Asynchronous chunk training
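For the planned graph-embedding feature, DeepWalk's first step turns a graph into "sentences" of node IDs by walking it; word2vec then embeds those nodes the way it embeds words. The roadmap mentions depth-first walks; this sketch instead uses the truncated uniform random walks of the original DeepWalk formulation, over a hypothetical adjacency list. `random_walks` is illustrative only and is not EmbedAnything code.

```python
import random


def random_walks(
    graph: dict[str, list[str]], walk_length: int, walks_per_node: int, seed: int = 0
) -> list[list[str]]:
    """Generate truncated uniform random walks, walks_per_node of them from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph.get(walk[-1], [])
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Small undirected toy graph as an adjacency list
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walks = random_walks(graph, walk_length=5, walks_per_node=2)
print(len(walks), walks[0])
# Each walk is a "sentence"; a word2vec model (e.g. gensim's Word2Vec) would be trained on `walks`.
```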

✔️ Code of Conduct

Please read our Code of Conduct to understand the expectations we have for all contributors participating in this project. By participating, you agree to abide by our Code of Conduct.

Quick Start

You can quickly get started with contributing by searching for issues labeled "Good First Issue" or "Help Needed" in the Issues section. If you think you can contribute, comment on the issue and we will assign it to you.

To set up your development environment, follow the steps below:

1. Fork the repository and work from the dev branch; we don't accept direct contributions to main.

Contributing Guidelines

🔍 Reporting Bugs

1. A title that describes the issue clearly and concisely, with relevant labels.
2. A detailed description of the problem and the steps needed to reproduce it.
3. Any relevant logs, screenshots, or other information that supports the issue.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • embed_anything-0.1.24.tar.gz (866.2 kB)

Built Distributions

  • embed_anything-0.1.24-cp312-none-win_amd64.whl (10.9 MB): CPython 3.12, Windows x86-64
  • embed_anything-0.1.24-cp312-cp312-manylinux_2_34_x86_64.whl (14.3 MB): CPython 3.12, manylinux glibc 2.34+ x86-64
  • embed_anything-0.1.24-cp312-cp312-macosx_11_0_arm64.whl (7.3 MB): CPython 3.12, macOS 11.0+ ARM64
  • embed_anything-0.1.24-cp312-cp312-macosx_10_12_x86_64.whl (7.5 MB): CPython 3.12, macOS 10.12+ x86-64
  • embed_anything-0.1.24-cp311-none-win_amd64.whl (10.9 MB): CPython 3.11, Windows x86-64
  • embed_anything-0.1.24-cp311-cp311-manylinux_2_34_x86_64.whl (14.3 MB): CPython 3.11, manylinux glibc 2.34+ x86-64
  • embed_anything-0.1.24-cp311-cp311-macosx_11_0_arm64.whl (7.3 MB): CPython 3.11, macOS 11.0+ ARM64
  • embed_anything-0.1.24-cp311-cp311-macosx_10_12_x86_64.whl (7.5 MB): CPython 3.11, macOS 10.12+ x86-64
  • embed_anything-0.1.24-cp310-none-win_amd64.whl (10.9 MB): CPython 3.10, Windows x86-64
  • embed_anything-0.1.24-cp310-cp310-manylinux_2_34_x86_64.whl (14.3 MB): CPython 3.10, manylinux glibc 2.34+ x86-64
  • embed_anything-0.1.24-cp310-cp310-macosx_11_0_arm64.whl (7.3 MB): CPython 3.10, macOS 11.0+ ARM64
  • embed_anything-0.1.24-cp39-none-win_amd64.whl (10.9 MB): CPython 3.9, Windows x86-64
  • embed_anything-0.1.24-cp39-cp39-manylinux_2_34_x86_64.whl (14.3 MB): CPython 3.9, manylinux glibc 2.34+ x86-64
  • embed_anything-0.1.24-cp39-cp39-macosx_11_0_arm64.whl (7.3 MB): CPython 3.9, macOS 11.0+ ARM64
  • embed_anything-0.1.24-cp38-none-win_amd64.whl (10.9 MB): CPython 3.8, Windows x86-64

File details

Details for the file embed_anything-0.1.24.tar.gz:

  • Download URL: embed_anything-0.1.24.tar.gz
  • Upload date:
  • Size: 866.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.7.0

File hashes

embed_anything-0.1.24.tar.gz
  • SHA256: 99f56cede5281fbb0a1f0cfb4b19c36b3a79d7d189014073aeb82232f712a7ea
  • MD5: e2d18495fc6fb0ed7dec82a94461c908
  • BLAKE2b-256: a9d791a278bd9a0a11654c51001d2b96881505bc602c21f5817ba31535b43796

embed_anything-0.1.24-cp312-none-win_amd64.whl
  • SHA256: ace63a422188512bff0d0400334a46bcb428cd3113b308b73de94321b4bc6d4e
  • MD5: 005a7381ad37836ea2a91defeb062b9d
  • BLAKE2b-256: ffc3c9a01ded655dfaf80c9e6db6b9fd606555f722e28d689ec356ca052da16d

embed_anything-0.1.24-cp312-cp312-manylinux_2_34_x86_64.whl
  • SHA256: f328476a57f44468d0897cd891269bd3c902e440bfac9cf3c8b6c7511a26cdc1
  • MD5: c7d9ffba5f3ada83d842edf278f6fa43
  • BLAKE2b-256: ee82198b382c161ff7854be402210beb053c301d05b0b474be2d967a571d4680

embed_anything-0.1.24-cp312-cp312-macosx_11_0_arm64.whl
  • SHA256: ca326d64daaf2377ee08cc05502616df7b958267b36f53844cb7431f60d054ed
  • MD5: 95205fc652bb371ba9eceb150e4b6659
  • BLAKE2b-256: e5d096c5db3194dd3ddf09be4d085a982c82f388bfde02d571e2a8c1d017d60c

embed_anything-0.1.24-cp312-cp312-macosx_10_12_x86_64.whl
  • SHA256: 1da90479a1f9b09ec82f994333bd24c7dd456a60ed63ab4c98e6d22e2e080984
  • MD5: bf6d827e17583a812883aab571ff557d
  • BLAKE2b-256: 32ded37fbdb5a042c7babbf83e2803c5f710ad063508ab6436f24313a2aa05b3

embed_anything-0.1.24-cp311-none-win_amd64.whl
  • SHA256: 4e338af3c5dc411204dd5011dfe156b65534a4f4980917cfe891a65819c1f9d8
  • MD5: 356ef35962f73195d15ae83c55274ee0
  • BLAKE2b-256: c0bac9d199b7f5c62a79998d884126b1a84d1636d9814812f3a9f8394704254b

embed_anything-0.1.24-cp311-cp311-manylinux_2_34_x86_64.whl
  • SHA256: 5d82c34009c7e7c2521ee114151d66fe8a60cc75fdc14639de498ac514a4bc8e
  • MD5: f7f672d8d4f26d26e2c03711e0af6577
  • BLAKE2b-256: 843b44d88292e13de2cb4239b8a61a1f9c1c54339ea94fd6b0b3daee9de8733a

embed_anything-0.1.24-cp311-cp311-macosx_11_0_arm64.whl
  • SHA256: 91ecf94bf7775b1986bb539aad6cc387d286ea6f677b78507d778f53e5577cf7
  • MD5: b4a46ad8e0b111e0fe9cfefc78dd9fe4
  • BLAKE2b-256: f211b44b7fb17b779b26a3555694e0f9fb53ac068793418afadd211e96893ec6

embed_anything-0.1.24-cp311-cp311-macosx_10_12_x86_64.whl
  • SHA256: 5a54f4394269fa9f5ee2da350443bd69420029f982c476b74611903a21ef6aa2
  • MD5: a4721b2e2badca8658a93fe6d7647b28
  • BLAKE2b-256: 5dda8ce6c0196ccfce430c9976c0493c4315d816422e50883e3597a0dd1c1c88

embed_anything-0.1.24-cp310-none-win_amd64.whl
  • SHA256: c9c3f4ed92a6861f1e82d0e1ab3b10c5815581ea9990aad27b34f7f6c7f1c40a
  • MD5: a43eae966bb2a44dd8ba831598ae4b98
  • BLAKE2b-256: 31f8e5490612dcd152821edd47c426840d2409676e0b31192bc45914bf3e1e99

embed_anything-0.1.24-cp310-cp310-manylinux_2_34_x86_64.whl
  • SHA256: 18ed4a032680833cb535a65e6983e9e095f3756d5ccf7ee7d242633b23ca3828
  • MD5: 05b6a87fb6267260c4efd072bb943016
  • BLAKE2b-256: 3d8d5204d3045f5234f39aa4906e1a75cf1d0d9ca0f9f86cafb534d646b6cdca

embed_anything-0.1.24-cp310-cp310-macosx_11_0_arm64.whl
  • SHA256: f32f46b9d6dabaebde1e01b634c9a786d32fbf6bc5e868c8cd04cc88ae1a2906
  • MD5: 8c630166f62cec516a2ac93df7eff6d9
  • BLAKE2b-256: 8c01eb5542ef110fb5f044e033bc043b7ab9e2f4704249d6b109437777e59312

embed_anything-0.1.24-cp39-none-win_amd64.whl
  • SHA256: 77f8456876cbe68ed84a6909ed8e3e65ff292f2613f1f407ae9ec4d0ef428f5e
  • MD5: 46db605fba7e6ccb5315133cd5a61fce
  • BLAKE2b-256: f3fd0c86f1cd2be2dc05a573c02ef7c42630936c9b18835e3411dde1ba4f5b5b

embed_anything-0.1.24-cp39-cp39-manylinux_2_34_x86_64.whl
  • SHA256: 9ab0e8d712212f36e510a9a8aafbed23ed6b26bba2e298d7396d26ab302879f7
  • MD5: 7c80bad6e440e1eebbbeb53a6dd21233
  • BLAKE2b-256: c8f612aff0deecfb174cac581cb8f94fc516db95229dbc3bbbb6379e9e44136d

embed_anything-0.1.24-cp39-cp39-macosx_11_0_arm64.whl
  • SHA256: 4e995a1da0083db2df5378c744f756e9ab70299b60a071effda396235e47075a
  • MD5: 4aa38494ce80b8b0ffb9944824f912cd
  • BLAKE2b-256: 51610e4d4c55169c390bbdd2bed2c270dc24273e89928e23e4466de7fed9789e

embed_anything-0.1.24-cp38-none-win_amd64.whl
  • SHA256: 2e7c8fcd36c0b988545f0354dd9c5817c1f68ee048e7ff54ed5988d913392791
  • MD5: 51e9655f89010ae0d92a759e071e2108
  • BLAKE2b-256: 52b574581278de7c4f49235110f1e366318db2cc9e27a7346792374efa163918
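To check a downloaded file against the SHA256 digests listed above, you can hash it locally. Below is a minimal sketch using Python's standard `hashlib`; `sha256_of` and `verify` are hypothetical helper names, and the demo hashes a temporary file rather than a real download.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_bytes: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_bytes), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file: the SHA256 of b"abc" is a standard published test vector
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bin")
    with open(demo, "wb") as f:
        f.write(b"abc")
    ok = verify(demo, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")

print(ok)  # prints: True
```

For the source distribution, for example, you would call `verify("embed_anything-0.1.24.tar.gz", "99f56cede5281fbb0a1f0cfb4b19c36b3a79d7d189014073aeb82232f712a7ea")`. pip can also enforce digests itself in hash-checking mode, via `--hash=sha256:...` entries in a requirements file.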
