
Embed anything at lightning speed


Minimalist and Robust Framework for local and multimodal embeddings built in Rust 🦀
Explore the docs »

View Demo · Examples · Request Feature

EmbedAnything is a powerful Python library designed to streamline the creation and management of embedding pipelines. Whether you're working with text, images, audio, PDFs, websites, or other media, EmbedAnything simplifies the process of generating embeddings from various sources and storing them in a vector database.

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing

🚀 Key Features

  • Local Embedding: Works with local embedding models such as BERT and Jina.
  • Multimodality: Works with text sources (PDF, TXT, MD), images (JPG), and audio (WAV).
  • Rust: All file processing is done in Rust for speed and efficiency.
  • Candle: Hardware acceleration is handled through Candle.
  • Python Interface: Packaged as a Python library for seamless integration into your existing projects.
  • Scalable: Store embeddings in a vector database for easy retrieval and scalability.
  • OpenAI: Supports OpenAI and Whisper embeddings.

🦀The Benefit of Rust for Speed

By using Rust for its core functionalities, EmbedAnything offers significant speed advantages:

➡️Faster execution.
➡️Memory Management: Rust enforces memory safety at compile time, preventing many of the memory errors and crashes that can plague other languages.
➡️True multithreading.

🤗Why Candle by Hugging Face?

➡️Runs language models and embedding models locally and efficiently.
➡️Allows inference on CUDA-enabled GPUs right out of the box.
➡️Decreases the memory usage of EmbedAnything.

🧑‍🚀 Getting Started

💚 Installation

pip install embed-anything

Usage

For local embedding, we support Bert and Jina:

import numpy as np
import embed_anything

# Embed the file, then stack the returned vectors into one array.
data = embed_anything.embed_file("file_path.pdf", embeder="Bert")
embeddings = np.array([item.embedding for item in data])

For multimodal embedding, we support CLIP:

Requirements: a directory containing the pictures you want to search. For example, we have test_files with images of cats, dogs, etc.

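import numpy as np
from PIL import Image
import embed_anything

# Embed every image in the directory with CLIP.
data = embed_anything.embed_directory("directory_path", embeder="Clip")
embeddings = np.array([item.embedding for item in data])

# Embed the text query and score each image against it with a dot product.
query = "photo of a dog"
query_embedding = np.array(embed_anything.embed_query(query, embeder="Clip")[0].embedding)
similarities = np.dot(embeddings, query_embedding)
max_index = np.argmax(similarities)
# The text field is used here as the image path of the best match.
Image.open(data[max_index].text).show()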
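
The search above ranks images by a raw dot product, which favors vectors with a large norm. As a minimal sketch, assuming the embeddings are not already normalized, the ranking can be done with cosine similarity instead. The helper names `cosine_similarity` and `top_k` and the toy vectors are hypothetical and not part of EmbedAnything; only the standard library is used so the snippet runs on its own.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=3):
    """Return (index, score) pairs for the k vectors most similar to query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
toy_embeddings = [
    [10.0, 0.0, 0.0],  # long vector pointing mostly away from the query
    [0.1, 0.9, 0.0],   # short vector pointing almost the same way as the query
    [0.0, 0.0, 1.0],   # orthogonal to the query
]
toy_query = [0.2, 1.0, 0.0]

ranking = top_k(toy_query, toy_embeddings, k=2)
best_index = ranking[0][0]
```

With these toy vectors a raw dot product would pick index 0 because of its length, while cosine similarity picks index 1, the vector that actually points in the query's direction.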

For OpenAI Whisper

Requirements: Make sure your OpenAI key is already set as an environment variable.
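
One way to check this requirement before running anything is sketched below. `OPENAI_API_KEY` is the variable name read by the official OpenAI client; that EmbedAnything reads the same variable is an assumption here, and the helper `openai_key_is_set` is hypothetical.

```python
import os


def openai_key_is_set(env=None, var_name="OPENAI_API_KEY"):
    """Return True if var_name is present and non-empty in the environment.

    var_name defaults to the name used by the official OpenAI client;
    whether EmbedAnything reads this exact variable is an assumption.
    """
    env = os.environ if env is None else env
    return bool(env.get(var_name, "").strip())


if not openai_key_is_set():
    print("OPENAI_API_KEY is not set; export it before using OpenAI models.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.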

import embed_anything
import time

start_time = time.time()
data = embed_anything.embed_file(
    "file_path.wav", embeder="Whisper-Bert"
)
print(data[0].metadata)
end_time = time.time()
print("Time taken: ", end_time - start_time)
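
The timing pattern above can be wrapped in a small reusable helper. This is a sketch, not part of EmbedAnything: the context manager `timed` is hypothetical, and it uses `time.perf_counter`, which is a monotonic clock better suited to measuring elapsed time than `time.time`. A toy workload stands in for the embedding call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print the elapsed time of the enclosed block and optionally record it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed  # keep the measurement for later comparison
        print(f"{label}: {elapsed:.3f} s")


timings = {}
with timed("toy workload", timings):
    # Stand-in for a call such as embed_anything.embed_file(...).
    total = sum(i * i for i in range(100_000))
```

Collecting the measurements in a dictionary makes it straightforward to compare several models or files in one run.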

🚧 Contributing to EmbedAnything

First of all, thank you for taking the time to contribute to this project. We truly appreciate your contributions, whether it's bug reports, feature suggestions, or pull requests. Your time and effort are highly valued in this project. 🚀

This document provides guidelines and best practices to help you to contribute effectively. These are meant to serve as guidelines, not strict rules. We encourage you to use your best judgment and feel comfortable proposing changes to this document through a pull request.

  • Roadmap
  • Quick Start
  • Guidelines

RoadMap

    One of the aims of EmbedAnything is to allow AI engineers to easily use state-of-the-art embedding models on typical files and documents. A lot has already been accomplished: these are the formats we support right now, and a few more are still to be done.
    ✅ Markdown, PDFs, and Website
    ✅ WAV File
    ✅ JPG, PNG, webp
    ✅ Add Whisper for audio embeddings

    Yet to be done
    ☑️Vector Database: Add functionality to integrate with any vector database
    ☑️Graph embedding: build DeepWalk embeddings, depth-first and word2vec
    ☑️Zero-shot application
    ☑️Asynchronous chunks training

    ✔️ Code of Conduct:

    Please read our [Code of Conduct] to understand the expectations we have for all contributors participating in this project. By participating, you agree to abide by our Code of Conduct.

    Quick Start

    You can quickly get started with contributing by searching for issues with the labels "Good First Issue" or "Help Needed" in the [Issues Section]. If you think you can contribute, comment on the issue and we will assign it to you.

    To set up your development environment, please follow the steps below:

    1. Fork the repository from dev. We don't allow direct contributions to main.

    Contributing Guidelines

    🔍 Reporting Bugs

    1. Provide a title that describes the issue clearly and concisely, with relevant labels.
    2. Provide a detailed description of the problem and the necessary steps to reproduce the issue.
    3. Include any relevant logs, screenshots, or other helpful information supporting the issue.
