A Python library using Rust and PyO3 to index Arrow tables
Project description
DF Embedder
DF Embedder allows you to effortlessly turn your DataFrames into fast vector stores in 3 lines of code.
df = pl.read_csv("tmdb.csv")
arrow_table = df.to_arrow()
embedder = DfEmbedder(database_name="tmdb_db")
Description
DF Embedder is a high-performance Python library (with a Rust backend) for indexing and embedding Apache Arrow compatible DataFrames (such as Polars or Pandas) into low-latency vector databases based on Lance files.
- Rust: For blazing-fast, multi-threaded embedding and indexing.
- Apache Arrow: To seamlessly work with data from libraries like Polars, Pandas (via PyArrow), etc.
- Static Embeddings: Uses an efficient static embedding model to generate text embeddings about 100X faster.
- Lance Format: For optimized storage and fast vector similarity searches.
- PyO3: To provide a clean and easy-to-use Python API.
How fast is DF Embedder? Benchmarks are often misleading, and users should run their own analysis. To give a general idea, I was able to index about 1.2M rows from the TMDB movie dataset in about 100 seconds on a machine with 10 CPU cores. That's reading, embedding, indexing, and writing more than 10K rows per second. There is still room to improve performance by further tuning its parameters.
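For readers who want to run their own analysis, a minimal timing sketch follows. The helper name rows_per_second is hypothetical and not part of DF Embedder; it only wraps a callable with a wall-clock timer. The last line reproduces the arithmetic behind the figures quoted above.

```python
import time
from typing import Callable


def rows_per_second(job: Callable[[], object], num_rows: int) -> float:
    """Run job once and return its throughput in rows per second.

    Hypothetical helper for illustration; not part of DF Embedder.
    """
    start = time.perf_counter()
    job()
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise ValueError("elapsed time too small to measure")
    return num_rows / elapsed


# Arithmetic behind the quoted figures: about 1.2M rows in about 100 seconds.
quoted_throughput = 1_200_000 / 100  # 12,000 rows per second
```

With the objects from the Usage section, a call might look like rows_per_second(lambda: embedder.index_table(arrow_table, table_name="tmdb_table"), arrow_table.num_rows). A single run includes warm-up and disk effects, so repeating the measurement on your own data gives a more reliable picture.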
Usage
import polars as pl # could also use Pandas or DuckDB
import pyarrow as pa # Although not directly used, good practice to import
from dfembed import DfEmbedder
# Load data from a CSV using Polars
df = pl.read_csv("tmdb.csv")
# transform to PyArrow Table format
arrow_table = df.to_arrow()
# Configure database path, and optional performance params
embedder = DfEmbedder(
    num_threads=8, # Use 8 threads for embedding; defaults to the number of available cores
    write_buffer_size=3500, # Buffer 3500 embeddings before writing
    database_name="tmdb_db", # Path to the Lance database directory
)
table_name = "tmdb_table"
embedder.index_table(arrow_table, table_name=table_name)
# get 10 most similar items
query = "adventures jungle animals"
results = embedder.find_similar(query=query, table_name=table_name, k=10)
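The import comment above notes that Pandas could be used instead of Polars. A small sketch of how the conversion step might be generalized is shown below. The helper to_arrow_table is hypothetical and not part of dfembed; it relies on Polars' to_arrow() (used in the example above) and on PyArrow's pa.Table.from_pandas for Pandas frames.

```python
def to_arrow_table(frame):
    """Return a PyArrow Table for a Pandas or Polars DataFrame.

    Hypothetical helper for illustration; not part of dfembed.
    """
    top_level_module = type(frame).__module__.split(".")[0]
    if top_level_module == "pandas":
        # Imported lazily so the helper has no hard dependency on pyarrow.
        import pyarrow as pa

        return pa.Table.from_pandas(frame)
    if hasattr(frame, "to_arrow"):
        # Polars DataFrames expose to_arrow(), as in the usage example.
        return frame.to_arrow()
    raise TypeError(f"unsupported frame type: {type(frame).__name__}")
```

The returned table can then be passed to embedder.index_table in the same way as arrow_table above.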
How It Works
- The DfEmbedder Python class acts as a user-friendly wrapper.
- It initializes and manages an instance of the DfEmbedderRust struct, implemented in Rust.
- When index_table is called with a PyArrow Table:
  - The Rust backend receives the Arrow data.
  - It uses a static embedding model (configured internally) to generate vector embeddings for the specified text data, potentially using multiple threads for speed.
  - The embeddings, along with original data, are written efficiently to a Lance dataset within the specified database directory and table name.
- When find_similar is called:
  - The query string is embedded using the same static model.
  - The Rust backend uses Lance's optimized search capabilities to find the k nearest neighbors to the query vector within the specified table.
  - The results (e.g., identifiers or relevant data) are returned to Python.
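The flow above can be illustrated with a toy, pure-Python sketch. It is not DF Embedder's implementation: the names DIM, token_vector, embed, and find_similar_toy are hypothetical, hash-seeded random vectors stand in for a trained static lookup table, and a brute-force cosine scan stands in for Lance's search. The point it shows is that a static embedding is a per-token lookup followed by pooling, with no neural forward pass, and that the query goes through the same embedding function as the indexed rows.

```python
import hashlib
import heapq
import math
import random
from typing import List, Tuple

DIM = 256  # toy dimensionality, chosen arbitrarily for this sketch


def token_vector(token: str) -> List[float]:
    """Return a fixed pseudo-random vector for a token.

    Stand-in for a trained lookup table: the same token always maps
    to the same vector, but the vectors carry no learned semantics.
    """
    seed = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def embed(text: str) -> List[float]:
    """Sum the token vectors of a text and L2-normalize the result."""
    tokens = text.lower().split()
    if not tokens:
        return [0.0] * DIM
    summed = [0.0] * DIM
    for tok in tokens:
        for i, x in enumerate(token_vector(tok)):
            summed[i] += x
    norm = math.sqrt(sum(x * x for x in summed)) or 1.0
    return [x / norm for x in summed]


def find_similar_toy(query: str, rows: List[str], k: int) -> List[Tuple[float, str]]:
    """Return the k rows with the highest cosine similarity to the query.

    This scans every row; a real vector store avoids the full scan.
    """
    q = embed(query)
    scored = ((sum(a * b for a, b in zip(q, embed(r))), r) for r in rows)
    return heapq.nlargest(k, scored)


rows = [
    "jungle adventure with wild animals",
    "courtroom drama about a lawyer",
    "romantic comedy in paris",
]
top = find_similar_toy("adventures jungle animals", rows, k=1)
```

Because the toy vectors are random per token, similarity here only reflects shared tokens ("jungle", "animals"). A trained static model would also place related words such as "adventure" and "adventures" close together.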
License
MIT
GitHub Actions CI/CD
This project uses GitHub Actions for continuous integration and deployment, automatically building wheels for multiple platforms and Python versions.
Automated Builds
The CI/CD pipeline automatically:
- Builds wheels for:
  - Linux (manylinux2014)
  - macOS (Intel x86_64 and Apple Silicon ARM64)
  - Windows
  - Python versions 3.8, 3.9, 3.10, 3.11, and 3.12
- Tests the built wheels on each platform to ensure they work correctly
- Publishes to PyPI when a new tag is pushed (format: v*, e.g., v0.1.2)
Workflow Files
- .github/workflows/build.yml: Builds wheels for all platforms and Python versions
- .github/workflows/test.yml: Tests the built wheels to ensure they work correctly
Releasing a New Version
To release a new version:
- Update the version in Cargo.toml and pyproject.toml
- Commit and push your changes
- Create and push a new tag with the format v{version} (e.g., v0.1.2):
  git tag v0.1.2
  git push origin v0.1.2
- The GitHub Actions workflow will automatically build wheels and publish them to PyPI
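Since the release steps depend on the version in Cargo.toml, the version in pyproject.toml, and the tag all agreeing, a pre-tag check can catch mismatches. The sketch below is an assumption about how such a check could look, not part of this project's workflows: read_version and check_release are hypothetical helpers, and the regular expression simply takes the first line that starts with version = "...", which is assumed to be the package version.

```python
import re

# Matches the first top-of-line `version = "x.y.z"` entry in a TOML file.
VERSION_RE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def read_version(toml_text: str) -> str:
    """Extract the first version string from TOML text (hypothetical helper)."""
    match = VERSION_RE.search(toml_text)
    if match is None:
        raise ValueError("no version field found")
    return match.group(1)


def check_release(cargo_toml: str, pyproject_toml: str, tag: str) -> str:
    """Return the version if both files and the tag agree, else raise ValueError."""
    cargo_version = read_version(cargo_toml)
    pyproject_version = read_version(pyproject_toml)
    if cargo_version != pyproject_version:
        raise ValueError(
            f"Cargo.toml has {cargo_version} but pyproject.toml has {pyproject_version}"
        )
    if tag != f"v{cargo_version}":
        raise ValueError(f"tag {tag} does not match v{cargo_version}")
    return cargo_version
```

In practice the two arguments would be the contents of the files read from disk, and the tag would be the one you are about to create with git tag.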
Manual Builds
You can also manually trigger the build workflow from the GitHub Actions tab in your repository.
File details
Details for the file dfembed-0.1.1.tar.gz.
File metadata
- Download URL: dfembed-0.1.1.tar.gz
- Upload date:
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86daa09e7427c269dde9d6e420f50d5b548e8fd6c3202ae713aa1cf2d28cc216 |
| MD5 | 3c043d7d2e683357d45967f7099ef7fc |
| BLAKE2b-256 | 5f5c04fb922e1c56904cc911b30fd29a018f3e205a8fbe4a5e39d9313834ba59 |
File details
Details for the file dfembed-0.1.1-cp313-cp313-win_amd64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp313-cp313-win_amd64.whl
- Upload date:
- Size: 35.6 MB
- Tags: CPython 3.13, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73e3e3f737b77ff881d8f223c64470c8f777a4e987792f88d239a7e110972382 |
| MD5 | 1f5af56f465a36a3992f833ff4ac066c |
| BLAKE2b-256 | a107f9b019c9bb9475ebdb7aea37efccc479f260b3695df9d6d18591eab34500 |
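The wheel filenames in these file details encode the tags listed under each entry. As a rough sketch, the filename can be split on dashes into distribution, version, Python tag, ABI tag, and platform tag, with an optional build tag after the version. The names WheelTags and parse_wheel_filename below are defined here for illustration; for real use, the packaging library's packaging.utils.parse_wheel_filename is the more robust option.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag sits between the version and the Python tag.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return WheelTags(*parts)


tags = parse_wheel_filename("dfembed-0.1.1-cp313-cp313-win_amd64.whl")
```

Here tags.python_tag and tags.abi_tag are both cp313 and tags.platform_tag is win_amd64, which corresponds to the "CPython 3.13, Windows x86-64" tags shown for that file.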
File details
Details for the file dfembed-0.1.1-cp313-cp313-manylinux_2_35_x86_64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp313-cp313-manylinux_2_35_x86_64.whl
- Upload date:
- Size: 44.8 MB
- Tags: CPython 3.13, manylinux: glibc 2.35+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a0a0f4dbc8d7ac13c5133837ae869cec58f6954fbc7a21e92ff8f8aba4e36d0 |
| MD5 | a5fdbfe92a810d3cb73f38b4741611c0 |
| BLAKE2b-256 | 50f9d2b340413009fcc2a5b9cda143b2df3367003aa3c6f96c9384e657edffbe |
File details
Details for the file dfembed-0.1.1-cp313-cp313-macosx_11_0_arm64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp313-cp313-macosx_11_0_arm64.whl
- Upload date:
- Size: 37.0 MB
- Tags: CPython 3.13, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b81c402b20a97a971fe3cdb3afa6bdd00594f8b3a0b21cb8003eb10f20007f91 |
| MD5 | fb6829a57270cb59bf59a49af7730409 |
| BLAKE2b-256 | c2fe60f2f8a08a84078741a4b50317255ef464771e2d4c0d800c85799e9ae4b2 |
File details
Details for the file dfembed-0.1.1-cp313-cp313-macosx_10_12_x86_64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp313-cp313-macosx_10_12_x86_64.whl
- Upload date:
- Size: 39.2 MB
- Tags: CPython 3.13, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 881f91d784a69a239f57b0ff62fce98642111b97311b7348ee2704f5b1f5c819 |
| MD5 | e2db99586ab14093c9b396fc03259473 |
| BLAKE2b-256 | ab271c312e7c2ca4e9f8080265da8470f2fd5844263ed2e50553350151f0e1c8 |
File details
Details for the file dfembed-0.1.1-cp312-cp312-win_amd64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp312-cp312-win_amd64.whl
- Upload date:
- Size: 35.6 MB
- Tags: CPython 3.12, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 552b97295a6cbc775c251195f61798dfe87d54295257a6425c600cf5f4c245b3 |
| MD5 | ea7a0a3ce1567ab1a6fd1a8974f37194 |
| BLAKE2b-256 | 5980e49fe91c1094cf0310c9d7f2aebe269b7a84acdf027dcff3156f503391b7 |
File details
Details for the file dfembed-0.1.1-cp312-cp312-macosx_11_0_arm64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp312-cp312-macosx_11_0_arm64.whl
- Upload date:
- Size: 37.0 MB
- Tags: CPython 3.12, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd15a2791ce5b6173c2d0c4abe197ccb23aeabbeb54c765d13884f46249a7b78 |
| MD5 | a50fe2a16193b8234a95f4665de0c961 |
| BLAKE2b-256 | 1373ff31b056e7a0b938da98e846db3685d651a00ed5e399385a78b1175a1e43 |
File details
Details for the file dfembed-0.1.1-cp312-cp312-macosx_10_12_x86_64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp312-cp312-macosx_10_12_x86_64.whl
- Upload date:
- Size: 39.2 MB
- Tags: CPython 3.12, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e80fc26ef515d5583b3aa9920ec116b92b6d8f164702d5c7b50925272ff79c59 |
| MD5 | 5d27a9372966fab784d87edccc46cbb1 |
| BLAKE2b-256 | de201cd2d653d1d60d840b55bd6f0d0f45bb5f121e17f38c5916aa6e867f7f2c |
File details
Details for the file dfembed-0.1.1-cp311-cp311-win_amd64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp311-cp311-win_amd64.whl
- Upload date:
- Size: 35.6 MB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8dc96b23d592a3b41260c021d2bec0db9eba021ea692bade59be577066816087 |
| MD5 | b07400ffb8a2c81447599663becd68e8 |
| BLAKE2b-256 | add951122168adcc1897e4341e093106bda688eef9676b14dcd13b47e69d2192 |
File details
Details for the file dfembed-0.1.1-cp311-cp311-macosx_11_0_arm64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp311-cp311-macosx_11_0_arm64.whl
- Upload date:
- Size: 37.0 MB
- Tags: CPython 3.11, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5aff8303880cdce07307f24c7826c5e4ba083ac110ec0c5c029cec67beb19d74 |
| MD5 | 39e85c08c06b3b5ab618dc29586f1341 |
| BLAKE2b-256 | 6f66828669dde58b75bd4873a99134afd2ebb0285a935249c30d3f86a593d4ba |
File details
Details for the file dfembed-0.1.1-cp311-cp311-macosx_10_12_x86_64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp311-cp311-macosx_10_12_x86_64.whl
- Upload date:
- Size: 39.2 MB
- Tags: CPython 3.11, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 720ec48678e0a87b8ee7318b48693729af8a4e99b6a6dd74366a1f6ec40e6fe6 |
| MD5 | 8e083b5a75083ff01c604b74bc5f1a36 |
| BLAKE2b-256 | c3dc56a496de42390b71fb3231e9db4a381f11fe8a94a1e9585de6d812d42263 |
File details
Details for the file dfembed-0.1.1-cp310-cp310-win_amd64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp310-cp310-win_amd64.whl
- Upload date:
- Size: 35.6 MB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 022e3776a9d26818dd488c06df5091cd375969596834d7fd16f33982a258d4ef |
| MD5 | e640efcafa734679f20b2a2925aae743 |
| BLAKE2b-256 | 35a3b9b7ae8f60c6f3fed0fab9c3b7446f7597ff24a35d47ee81804ffed59edd |
File details
Details for the file dfembed-0.1.1-cp310-cp310-manylinux_2_35_x86_64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp310-cp310-manylinux_2_35_x86_64.whl
- Upload date:
- Size: 44.7 MB
- Tags: CPython 3.10, manylinux: glibc 2.35+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3b086a6a65c926bc45a4a3812643dc1d1d63f33d2f128c2ecfbc64982c624c8b |
| MD5 | b3e564310d67e747e8cae07511076449 |
| BLAKE2b-256 | e6aaf64fab45fe35bac1f8af4a1b4fb75c7e70e3a99701991dc93b7fe931b804 |
File details
Details for the file dfembed-0.1.1-cp39-cp39-win_amd64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp39-cp39-win_amd64.whl
- Upload date:
- Size: 35.6 MB
- Tags: CPython 3.9, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e6b48dd45d0986f260f401a6b370d4af34735b32b8376db57aa0148522ff6a5 |
| MD5 | 6fbd5a8fd84e586cc2fc5bf58933b646 |
| BLAKE2b-256 | 79ac13255dc397d3d6324a71c8f87816655574433906bb5003f49e5fb575e251 |
File details
Details for the file dfembed-0.1.1-cp38-cp38-win_amd64.whl.
File metadata
- Download URL: dfembed-0.1.1-cp38-cp38-win_amd64.whl
- Upload date:
- Size: 35.6 MB
- Tags: CPython 3.8, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 794e7f578a00890c24f15aa9cf0fcd761551f60a7dc29b0e8a6ec45bb497fba1 |
| MD5 | 3609d41d72a663495affaf9c05aaf135 |
| BLAKE2b-256 | e9a0d17e45b8a72af90e7a3d066c31c72f5f8ba1ea8620f8d8e3edf65cef97d4 |
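Any of the SHA256 digests listed above can be compared against a downloaded file before installing it. The sketch below uses only Python's standard hashlib; the function names sha256_of_file and matches_published_digest are hypothetical helpers for illustration.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large wheels need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_digest("dfembed-0.1.1.tar.gz", "86daa09e7427c269dde9d6e420f50d5b548e8fd6c3202ae713aa1cf2d28cc216") should return True for an intact copy of the source distribution, assuming the file was saved under that name in the current directory.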