A library for packaging data and documentation together into an agent-friendly DuckDB artifact.
DataGentry
🎩
🧐 🦆
A small library for creating efficient file-specific agents / RAG systems with DuckDB.
Overview
Data Gentry packages together:
- Loading of data files and data documentation into a DuckDB database, with pre-built vector and full-text indices on the data dictionary's contents.
- Simple interfaces for chunking and embedding documents and for loading data, so the user can customize how the DuckDB artifact is created.
  - Out-of-the-box chunking: Semchunk
  - Out-of-the-box embedding: Bedrock
- Hybrid BM25 / HNSW retrieval on the generated database.
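To make the pieces above concrete, here is a minimal pure-Python sketch of the same shape of pipeline: a pluggable chunker and embedder feeding a hybrid ranking. Everything in it is an assumption for illustration, not Data Gentry's actual API: the `Chunker` and `Embedder` protocols, `SentenceChunker`, `HashingEmbedder`, and `hybrid_search` are hypothetical names; the sentence splitter and hashing embedder are toy stand-ins for Semchunk and Bedrock; a term-overlap count stands in for BM25; brute-force cosine stands in for an HNSW index; and reciprocal rank fusion is just one common way to merge two rankings, since the description here does not say how the two result sets are combined.

```python
import math
from typing import Protocol


class Chunker(Protocol):
    """Hypothetical interface: split one document into text chunks."""
    def chunk(self, text: str) -> list[str]: ...


class Embedder(Protocol):
    """Hypothetical interface: map each chunk to a fixed-length vector."""
    def embed(self, chunks: list[str]) -> list[list[float]]: ...


class SentenceChunker:
    """Toy stand-in for a semantic chunker: one chunk per sentence."""
    def chunk(self, text: str) -> list[str]:
        return [s.strip() for s in text.split(".") if s.strip()]


class HashingEmbedder:
    """Toy stand-in for a hosted embedding model: token counts bucketed
    by a deterministic character-sum hash."""
    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, chunks: list[str]) -> list[list[float]]:
        vectors = []
        for chunk in chunks:
            vec = [0.0] * self.dim
            for token in chunk.lower().split():
                vec[sum(map(ord, token)) % self.dim] += 1.0
            vectors.append(vec)
        return vectors


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, chunk: str) -> int:
    """Crude term-overlap count; a real system would use BM25 here."""
    terms = set(query.lower().split())
    return sum(1 for token in chunk.lower().split() if token in terms)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several best-first rankings of item ids into one ranking."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


def hybrid_search(query: str, text: str, chunker: Chunker,
                  embedder: Embedder, top_k: int = 3) -> list[str]:
    chunks = chunker.chunk(text)
    vectors = embedder.embed(chunks)
    query_vec = embedder.embed([query])[0]
    ids = range(len(chunks))
    by_keyword = sorted(ids, key=lambda i: -keyword_score(query, chunks[i]))
    by_vector = sorted(ids, key=lambda i: -cosine(query_vec, vectors[i]))
    fused = reciprocal_rank_fusion([by_keyword, by_vector])
    return [chunks[i] for i in fused[:top_k]]


data_dictionary = (
    "patient_id identifies each patient. "
    "visit_date records the clinic visit day. "
    "charge_usd stores the billed amount."
)
results = hybrid_search("billed amount", data_dictionary,
                        SentenceChunker(), HashingEmbedder(), top_k=1)
print(results)
```

Because the chunker and embedder are only required to satisfy a protocol, swapping in a different implementation does not touch the retrieval code, which is the kind of customization the list above describes.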
The project is currently in a "proof-of-concept/playing around" phase. In my mind, though, it could help with a real problem: existing semantic layers are often tightly coupled to vendors like Databricks or Snowflake, which increases vendor lock-in and ties users to Spark workloads that are often overkill for the size of the data in question.
TODO:
- Support vector similarity metrics other than cosine similarity
- Implement a set of tools to allow an agent to work with the artifact
- Convenience functionality to auto-load from the filesystem (or httpfs)?
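On the first TODO item, the sketch below shows why the choice of similarity metric matters: the same query can rank the same vectors differently under cosine similarity, inner product, and squared Euclidean distance. It is a plain-Python illustration under stated assumptions, not a proposal for how Data Gentry would expose the option; the metric keys `cosine`, `ip`, and `l2sq` are chosen here to resemble the names DuckDB's vector search extension uses for its HNSW metrics, and the `rank` helper and example vectors are hypothetical.

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm if norm else 0.0


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Each metric maps to (function, higher_is_better).
METRICS = {
    "cosine": (cosine_similarity, True),
    "ip": (inner_product, True),
    "l2sq": (l2_squared, False),
}


def rank(query, vectors, metric):
    """Return vector names ordered best-first under the chosen metric."""
    fn, higher_is_better = METRICS[metric]
    return sorted(vectors, key=lambda name: fn(query, vectors[name]),
                  reverse=higher_is_better)


query = [1.0, 0.0]
vectors = {
    "short_aligned": [0.5, 0.0],   # same direction as the query, small norm
    "long_aligned": [4.0, 0.0],    # same direction as the query, large norm
    "near_off_axis": [1.0, 0.3],   # close in space, slightly different direction
}

for metric in METRICS:
    print(metric, rank(query, vectors, metric))
```

Cosine ignores vector length, inner product rewards it, and squared Euclidean distance prefers whatever sits closest in space, so whichever metric an index is built with should match how the embedding model was trained.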