
Embed anything at lightning speed


Minimalist framework for building local and multimodal embeddings, built in Rust 🦀


EmbedAnything is a powerful Python library designed to streamline the creation and management of embedding pipelines. Whether you're working with text, images, audio, or any other type of data, EmbedAnything makes it easy to generate embeddings from multiple sources and store them efficiently in a vector database.

🦀 The Benefit of Rust for Speed

By using Rust for its core functionality, EmbedAnything offers significant speed advantages:

➡️Rust is Compiled: Unlike Python, Rust compiles directly to machine code, resulting in faster execution.
➡️Memory Safety: Rust's ownership and borrowing rules are checked at compile time, preventing dangling pointers, data races, and the crashes they cause in other languages.
➡️True Multithreading: Rust has no global interpreter lock, so work can run in parallel across CPU cores, whereas CPython's GIL lets only one thread execute Python bytecode at a time.

🚀 Why Candle?

Candle is Hugging Face's minimalist machine-learning framework for Rust.

➡️Running language models or embedding models locally can be difficult, especially when you want to deploy a product that uses these models.
➡️If you use the Hugging Face transformers library in Python, you depend on PyTorch for tensor operations.
➡️PyTorch, in turn, depends on Libtorch, which means you need to ship the entire Libtorch library with your product. Candle avoids this dependency because inference runs natively in Rust.
➡️Candle also supports inference on CUDA-enabled GPUs out of the box. We will soon post on how we use Candle to increase the performance and decrease the memory usage of EmbedAnything.

Examples

  1. Image Search: Open in Colab

Watch the demo

🚀 Key Features

  • Local Embedding: Works with local embedding models such as all-MiniLM.
  • Multimodality: Works with text and images, with audio support planned.
  • Python Interface: Packaged as a Python library for seamless integration into your existing projects.
  • Efficient: Optimized for speed and performance, with core functionality written in Rust.
  • Scalable: Store embeddings in a vector database for easy retrieval and scalability.
  • OpenAI: Also works with OpenAI embedding models.
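The "Scalable" feature refers to storing embeddings in a vector database. To illustrate what such a store does, here is a minimal in-memory sketch. It is not EmbedAnything's API and not a real vector database: the `InMemoryVectorStore` class and its `add`/`search` methods are hypothetical names, and it uses a brute-force cosine-similarity scan where a real database would typically use an approximate-nearest-neighbour index.

```python
import math
from typing import List, Tuple


def _unit(vector: List[float]) -> List[float]:
    # Scale to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]


class InMemoryVectorStore:
    """Hypothetical toy store: keeps (text, unit vector) pairs in a list."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: List[float]) -> None:
        if self._items and len(vector) != len(self._items[0][1]):
            raise ValueError("all vectors must have the same dimension")
        self._items.append((text, _unit(vector)))

    def search(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        if self._items and len(query) != len(self._items[0][1]):
            raise ValueError("query dimension does not match stored vectors")
        q = _unit(query)
        # Brute-force scan: score every stored vector against the query.
        scored = [(text, sum(a * b for a, b in zip(vec, q))) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
        return scored[:k]


store = InMemoryVectorStore()
store.add("cat photo", [0.9, 0.1, 0.0])
store.add("dog photo", [0.1, 0.9, 0.0])
store.add("car photo", [0.0, 0.1, 0.9])
print(store.search([0.2, 0.8, 0.0], k=1))
```

Normalising once at insert time keeps each search to a single dot product per stored vector.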

💚 Installation

pip install embed-anything

🧑‍🚀 Getting Started

For local models

For local embeddings, we support BERT and Jina models:

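import numpy as np
import embed_anything
# Embed a PDF with a local BERT model; `data` is a list of results
data = embed_anything.embed_file("filename.pdf", embeder="Bert")
# Stack the per-chunk vectors into a 2-D array (one row per result)
embeddings = np.array([item.embedding for item in data])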
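The rows of `embeddings` can be compared with a plain dot product, as the CLIP example does, but a dot product only equals cosine similarity when every vector has unit length. This README does not say whether the models return normalised vectors, so a cautious option is to normalise the rows yourself. A minimal NumPy sketch; `l2_normalize` is a hypothetical helper, not part of EmbedAnything, and the toy array stands in for real embeddings:

```python
import numpy as np


def l2_normalize(matrix: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical helper: scale each row of a 2-D array to unit L2 norm."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return matrix / np.maximum(norms, eps)


# Stand-in for the `embeddings` array (3 results, 4 dimensions).
embeddings = np.array([[3.0, 4.0, 0.0, 0.0],
                       [1.0, 1.0, 1.0, 1.0],
                       [0.0, 0.0, 0.0, 2.0]])
unit = l2_normalize(embeddings)
print(np.linalg.norm(unit, axis=1))  # every row now has length 1
```

After normalising both the stored rows and the query vector, `np.dot` scores become cosine similarities in the range [-1, 1].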

For multimodal embeddings, we support CLIP:

Requirements: a directory containing the pictures you want to search. For example, we use test_files, which contains images of cats, dogs, etc.

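import numpy as np
from PIL import Image
import embed_anything

# Embed every image in the directory with CLIP
data = embed_anything.embed_directory("test_files", embeder="Clip")
embeddings = np.array([item.embedding for item in data])

# Embed the text query into the same CLIP space
query = "photo of a dog"
query_embedding = np.array(embed_anything.embed_query(query, embeder="Clip")[0].embedding)
# Score each image against the query and open the best match
similarities = np.dot(embeddings, query_embedding)
max_index = np.argmax(similarities)
Image.open(data[max_index].text).show()  # .text holds the image path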
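`np.argmax` returns only the single best match. To show several candidates, the scores can be ranked with `np.argsort`. A minimal sketch; `top_k_indices` is a hypothetical helper, and the toy scores stand in for the `similarities` array computed above:

```python
import numpy as np


def top_k_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: indices of the k highest scores, best first."""
    k = min(k, scores.shape[0])
    # argsort is ascending, so reverse it and keep the first k positions.
    return np.argsort(scores)[::-1][:k]


# Toy similarity scores standing in for `similarities` from the search above.
similarities = np.array([0.12, 0.87, 0.45, 0.91])
print(top_k_indices(similarities, k=2))  # → [3 1]
```

With real results, each returned index can be used as `data[i].text`, in the same way as `max_index` above.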

For OpenAI

  1. Make sure your OpenAI API key is set as an environment variable.
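Before calling the OpenAI embedder, it can help to fail early when no key is configured. This sketch assumes the key lives in `OPENAI_API_KEY`, the variable name conventionally read by OpenAI's own clients. This README does not say which variable EmbedAnything checks, so treat the name as an assumption. `require_openai_key` is a hypothetical helper:

```python
import os


def require_openai_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: return the API key or raise a clear error.

    The default variable name is an assumption (the OpenAI convention),
    not something documented by EmbedAnything.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export your OpenAI API key before "
            "using the OpenAI embedder."
        )
    return key
```

Calling `require_openai_key()` at startup surfaces a missing key as one clear error rather than a failure partway through embedding.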

If you are using embed-anything==0.1.7:

import numpy as np
import embed_anything

data = embed_anything.embed_file("filename.pdf", embeder="OpenAI")
embeddings = np.array([item.embedding for item in data])

🚧 Contributing to EmbedAnything

First of all, thank you for taking the time to contribute to this project. We truly appreciate your contributions, whether it's bug reports, feature suggestions, or pull requests. Your time and effort are highly valued in this project. 🚀

This document provides guidelines and best practices to help you contribute effectively. These are meant to serve as guidelines, not strict rules. We encourage you to use your best judgment and feel comfortable proposing changes to this document through a pull request.

Table of Contents:

  1. [Code of conduct]
  2. [Quick Start]
  3. [RoadMap]

RoadMap

☑️Graph embedding: build DeepWalk embeddings using depth-first walks and word2vec
☑️Add Whisper for audio embeddings
☑️Zero-shot application
☑️Asynchronous chunk training
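The graph-embedding item refers to DeepWalk, which generates walks over a graph and feeds them to word2vec as if each walk were a sentence. A rough sketch of the walk-generation step only, using uniform random walks as in the original DeepWalk method rather than a strict depth-first traversal. The `random_walks` function and the adjacency-dict input format are hypothetical and not part of EmbedAnything:

```python
import random
from typing import Dict, List


def random_walks(
    graph: Dict[str, List[str]],
    walks_per_node: int,
    walk_length: int,
    seed: int = 0,
) -> List[List[str]]:
    """Hypothetical sketch of DeepWalk's corpus step: uniform random walks."""
    rng = random.Random(seed)  # seeded so results are reproducible
    walks: List[List[str]] = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # visit start nodes in a random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph.get(walk[-1], [])
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


toy_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
corpus = random_walks(toy_graph, walks_per_node=2, walk_length=5)
# Each walk is a "sentence" of node IDs, ready for a word2vec trainer.
print(len(corpus), corpus[0])
```

The resulting walks could then be passed to a word2vec implementation (for example gensim's `Word2Vec`) to learn one vector per node.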

✔️ Code of Conduct:

Please read our [Code of Conduct] to understand the expectations we have for all contributors participating in this project. By participating, you agree to abide by our Code of Conduct.

🚀 Quick Start

You can quickly get started with contributing by searching for issues with the labels "Good First Issue" or "Help Needed" in the [Issues Section]. If you think you can contribute, comment on the issue and we will assign it to you.

To set up your development environment, please follow the steps below:

  1. Fork the repository and create a clone of the fork
  2. Create a branch for a feature or a bug you are working on in your fork
  3. If you are working with OpenAI, make sure you have an API key

Contributing Guidelines

🔍 Reporting Bugs

  1. Give the issue a clear, concise title and add relevant labels.
  2. Provide a detailed description of the problem and the necessary steps to reproduce the issue.
  3. Include any relevant logs, screenshots, or other helpful information supporting the issue.

💡 Suggesting New Features or Enhancements

☑️ ToDo

  • Vector Database: Add functionality to integrate with any vector database

Download files

Download the file for your platform.

Source Distribution

  • embed_anything-0.1.20.tar.gz (14.0 MB)

Built Distributions

  • embed_anything-0.1.20-cp312-none-win_amd64.whl (10.8 MB): CPython 3.12, Windows x86-64
  • embed_anything-0.1.20-cp312-cp312-manylinux_2_34_x86_64.whl (26.4 MB): CPython 3.12, manylinux (glibc 2.34+) x86-64
  • embed_anything-0.1.20-cp312-cp312-macosx_11_0_arm64.whl (7.2 MB): CPython 3.12, macOS 11.0+ ARM64
  • embed_anything-0.1.20-cp312-cp312-macosx_10_12_x86_64.whl (7.4 MB): CPython 3.12, macOS 10.12+ x86-64
  • embed_anything-0.1.20-cp311-none-win_amd64.whl (10.8 MB): CPython 3.11, Windows x86-64
  • embed_anything-0.1.20-cp311-cp311-manylinux_2_34_x86_64.whl (16.8 MB): CPython 3.11, manylinux (glibc 2.34+) x86-64
  • embed_anything-0.1.20-cp311-cp311-macosx_11_0_arm64.whl (7.2 MB): CPython 3.11, macOS 11.0+ ARM64
  • embed_anything-0.1.20-cp311-cp311-macosx_10_12_x86_64.whl (7.4 MB): CPython 3.11, macOS 10.12+ x86-64
  • embed_anything-0.1.20-cp310-none-win_amd64.whl (10.8 MB): CPython 3.10, Windows x86-64
  • embed_anything-0.1.20-cp310-cp310-manylinux_2_34_x86_64.whl (16.5 MB): CPython 3.10, manylinux (glibc 2.34+) x86-64
  • embed_anything-0.1.20-cp310-cp310-macosx_11_0_arm64.whl (7.2 MB): CPython 3.10, macOS 11.0+ ARM64
  • embed_anything-0.1.20-cp39-none-win_amd64.whl (10.8 MB): CPython 3.9, Windows x86-64
  • embed_anything-0.1.20-cp39-cp39-manylinux_2_34_x86_64.whl (16.4 MB): CPython 3.9, manylinux (glibc 2.34+) x86-64
  • embed_anything-0.1.20-cp39-cp39-macosx_11_0_arm64.whl (7.2 MB): CPython 3.9, macOS 11.0+ ARM64
  • embed_anything-0.1.20-cp38-none-win_amd64.whl (10.8 MB): CPython 3.8, Windows x86-64

File details

Details for the file embed_anything-0.1.20.tar.gz.

File metadata

  • Download URL: embed_anything-0.1.20.tar.gz
  • Size: 14.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.7.0

File hashes

Hashes for embed_anything-0.1.20.tar.gz
Algorithm Hash digest
SHA256 1fc1a569c3a8d74ec6c7ce662d9b8672a599039a515e001193452cd29bdb438a
MD5 3e1703a37729620e19affa1dff29ffd2
BLAKE2b-256 f23c52b6b59644fb25b3fbbe959c46c54945bad6a2df4c6fc9aa4ee4638d7ecd

See more details on using hashes here.
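To check a downloaded file against the SHA256 digest listed above, hash it locally and compare. A minimal sketch using Python's standard `hashlib`; the `sha256_of_file` helper is hypothetical, and the expected digest is the one published above for `embed_anything-0.1.20.tar.gz`:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Stream the file so large archives need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "1fc1a569c3a8d74ec6c7ce662d9b8672a599039a515e001193452cd29bdb438a"
archive = Path("embed_anything-0.1.20.tar.gz")
if archive.exists():
    print("OK" if sha256_of_file(archive) == EXPECTED else "MISMATCH")
```

pip can also enforce this automatically in hash-checking mode, using `--require-hashes` with `--hash=sha256:...` entries in a requirements file.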

File hashes

Hashes for embed_anything-0.1.20-cp312-none-win_amd64.whl
Algorithm Hash digest
SHA256 0441b5fe190189546c21d7d8774463ce0fda796f7856579344b6604bfb8eb3cc
MD5 12932b5bd36253c40be173a3aee44539
BLAKE2b-256 190e3ec234b75803e011c3c16e51964421ec727274dbe80cf089d8bc9fab7072

File hashes

Hashes for embed_anything-0.1.20-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 fd37bcebf71c61643146ddc12c1092dc980bccaa18c9783233f5a79227078cb6
MD5 c1d2a799d2fb693d3593486d1185d0f3
BLAKE2b-256 a141b06f2a809787858273954f44aa945cd27dad312d24fc12c47d5d751a94ff

File hashes

Hashes for embed_anything-0.1.20-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f6da00164c1b2ff487ade73924f0c2bfd17c071bcd31592a4f4e95e3236eccce
MD5 dadc296da8d845d7b14d3a1299f689f0
BLAKE2b-256 33642cff15fc2d2ac264fdfb427a96c91c3a23cb71c02ab3de9a69c85aab0f68

File hashes

Hashes for embed_anything-0.1.20-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 927e7804bbf590c5696a12972195f9128937351160c6b55c93a0e60859684fbf
MD5 08e8b4a642fd885af5fbdbf8fdaaf082
BLAKE2b-256 8c26ef39495d894bed50c6342fdb59aadf16fd83e5470abf27505e6db0791261

File hashes

Hashes for embed_anything-0.1.20-cp311-none-win_amd64.whl
Algorithm Hash digest
SHA256 db7aaf7c87efab7f17a669dd10598d625b5c70f0ebc1973442831e674e015b70
MD5 5e8fbebc78b90a416a57c61e442b84ce
BLAKE2b-256 466c5c0a3571e124e77863bf9fcfd46db083c9a0a84ba2128bf8ba79bfd4fdbd

File hashes

Hashes for embed_anything-0.1.20-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 64bee42c903d532c93eec134c641bbbce8af8ea5ff33e65dde61d2c673e5df05
MD5 63dfa2cd2a1551fbb89318db2298f573
BLAKE2b-256 a413edfe6269592dcc57ac94140d46df2585bb0ab5754b04befe7c28fc01e6cf

File hashes

Hashes for embed_anything-0.1.20-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 df1c643eaf71e6e669bf9f09d0cfe0f5897728421ac6607425f7ce32dfa8eb8f
MD5 a5687b10a36f66346f464dc6f0f7ef77
BLAKE2b-256 d8d71b5185a8bfd301b22234d5222c2de50b1fed0498e4d609f97a3453817d99

File hashes

Hashes for embed_anything-0.1.20-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 e87b48e2b9c6ebd819e2ae6e1b38b82129c1c424bb7259c1755c3be123d7a2b2
MD5 77d6bd6e5bef65a7f16d49cdf4c95dc3
BLAKE2b-256 da09e185a85f71319f70b7b57d901db6a0924e2d81f23d2a24e31e2b56749948

File hashes

Hashes for embed_anything-0.1.20-cp310-none-win_amd64.whl
Algorithm Hash digest
SHA256 ccdf115c4936d199929110388ff032000c2df25ab9ecb8d2caecee4131ada53a
MD5 fc618c5656add09003f88facb52dc213
BLAKE2b-256 9b21d30f0cc7857fbe434f0265337d2db4cffdc561648bdaf40817f16b8f63ae

File hashes

Hashes for embed_anything-0.1.20-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 d1dd1b972dd1ab57efd5f7db58d24f2032dd4f155f33296d69979c3ee637e3f5
MD5 72799075fc839ea7c218b05dac3b9f1c
BLAKE2b-256 cfcb777f5a445a973984a6b770dcaaacb9c961f37f356559dd8bef9bfd0d5d32

File hashes

Hashes for embed_anything-0.1.20-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 64e9fd334ac426abb34e3ebefd49bf8c09b197968746465180b0b0ea5df814e4
MD5 54a8f3aef5c7ff3c1a14581d86ecc35f
BLAKE2b-256 86004cefa1a2b50b2c1cdaf19c8a5d66614fe4b6a94a8ff4ac6f1b82cf551a64

File hashes

Hashes for embed_anything-0.1.20-cp39-none-win_amd64.whl
Algorithm Hash digest
SHA256 f7979966aae85880f7d9509f026ad088de49a746974264ff58352f4cda008bec
MD5 21eaea681158074bf378d307abf15512
BLAKE2b-256 8f56ab133d334d39707b6bfe3b30d4f6f613b409644e44615f3150bc30490e96

File hashes

Hashes for embed_anything-0.1.20-cp39-cp39-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 3896d53797b802797b1deca271c0c2fb81f4e7815356c697c62456c58b701237
MD5 02454419a21efe1a05ede7aa51829ac1
BLAKE2b-256 a4b0f4fff4238ffb907df479778e262e391e3b07225199b50ff931060d7b4c53

File hashes

Hashes for embed_anything-0.1.20-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fd7c94c8d55cf97e9e7903d98b1ddc62f8ecdec53f79ed03d65169a9110cb0cf
MD5 bae8e4edd1e0932d6619e931e6d63185
BLAKE2b-256 f6677de526f6dc66e77349034fcf02ad55ebaed349548af256bc03329a4c387c

File hashes

Hashes for embed_anything-0.1.20-cp38-none-win_amd64.whl
Algorithm Hash digest
SHA256 6ebaf78f80e2b6f8b660765be557894c385c086d39c2b07799d4bebf8a505dad
MD5 632cb44032b6309256629b1b6a7c5f20
BLAKE2b-256 4c9da4f60ec542fb6016bae9a9b0fb31222dc3ca356d25573f644c1a6942fe97
