
A library for packaging together data + documentation into an agent-friendly duckdb artifact.

Project description

DataGentry

🎩
🧐 🦆

A small library for creating efficient file-specific agents / RAG systems with duckdb.

Overview

DataGentry packages together:

  • Loading of data files and data documentation into a duckdb database, with pre-built vector and full-text indices over the data dictionary's contents.
  • Simple interfaces for chunking and embedding documents and for loading data, so the user can customize how the duckdb artifact is created.
    • Out-of-the-box chunking: semchunk
    • Out-of-the-box embedding: Amazon Bedrock
  • Hybrid BM25 / HNSW retrieval over the generated database.
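The hybrid retrieval idea can be illustrated in plain Python. This is a minimal sketch, not DataGentry's API, and it makes several assumptions: the lexical and vector rankings are merged with reciprocal rank fusion (RRF, constant k = 60), a brute-force cosine scan stands in for the HNSW index, and hand-written toy vectors stand in for Bedrock embeddings. BM25 uses the common defaults k1 = 1.2 and b = 0.75. The function names (`bm25_scores`, `rrf`, `hybrid_search`) are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every tokenized doc for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of docs containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (log of 1 + ratio).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(scores):
    """Indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank), rank starting at 1."""
    fused = Counter()
    for ranking in rankings:
        for r, idx in enumerate(ranking, start=1):
            fused[idx] += 1.0 / (k + r)
    return [idx for idx, _ in fused.most_common()]


def hybrid_search(query_tokens, query_vec, chunks, vectors, top_k=3):
    """Fuse a BM25 ranking and a cosine-similarity ranking of the same chunks."""
    lexical = rank(bm25_scores(query_tokens, chunks))
    semantic = rank([cosine(query_vec, v) for v in vectors])
    return rrf([lexical, semantic])[:top_k]


# Toy data-dictionary chunks and made-up 3-d embeddings (illustrative only).
chunks = [
    "customer_id unique identifier for each customer".split(),
    "order_date date the order was placed".split(),
    "order_total total order amount in usd".split(),
]
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.2], [0.1, 0.2, 1.0]]

print(hybrid_search(["order", "amount"], [0.0, 0.1, 1.0], chunks, vectors, top_k=2))
# → [2, 1]
```

RRF only looks at rank positions, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales; a weighted sum of normalized scores is a common alternative.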

The project is currently at the proof-of-concept stage, but I think it could help address a common problem: existing semantic layers are often tightly coupled to vendors such as Databricks or Snowflake, which increases vendor lock-in and ties users to Spark workloads that are often overkill for the size of the data in question.

TODO:

  • Support vector similarity metrics other than cosine similarity
  • Implement a set of tools to allow an agent to work with the artifact
  • Convenience functions to auto-load files from the local filesystem (or from remote storage via httpfs)?
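For context on the first TODO item, the sketch below compares cosine similarity with two common alternatives, inner product and squared Euclidean distance, in plain Python. The metric names mirror those accepted by DuckDB's `vss` extension for HNSW indices (`cosine`, `ip`, `l2sq`), but the helper functions are hypothetical and not part of DataGentry. The example also shows why the choice matters: on raw vectors the metrics can disagree, while on unit-normalized vectors they give the same nearest-neighbour order, because ||u - v||² = 2 - 2·(u · v) when ||u|| = ||v|| = 1.

```python
import math


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    return inner_product(u, v) / (math.sqrt(inner_product(u, u)) * math.sqrt(inner_product(v, v)))


def squared_l2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def normalize(u):
    n = math.sqrt(inner_product(u, u))
    return [a / n for a in u]


def nearest(query, vectors, metric):
    """Order vector indices from closest to farthest under the given metric."""
    keys = {
        "cosine": lambda i: -cosine_similarity(query, vectors[i]),  # higher is closer
        "ip": lambda i: -inner_product(query, vectors[i]),          # higher is closer
        "l2sq": lambda i: squared_l2(query, vectors[i]),            # lower is closer
    }
    if metric not in keys:
        raise ValueError(f"unknown metric: {metric}")
    return sorted(range(len(vectors)), key=keys[metric])


vectors = [[3.0, 0.0], [1.0, 1.0], [0.0, 0.5]]
query = [1.0, 0.2]

for m in ("cosine", "ip", "l2sq"):
    print(m, nearest(query, vectors, m))
# cosine [0, 1, 2]
# ip [0, 1, 2]
# l2sq [1, 2, 0]

# After unit-normalizing, all three metrics agree.
qn, vn = normalize(query), [normalize(v) for v in vectors]
for m in ("cosine", "ip", "l2sq"):
    print(m, nearest(qn, vn, m))
# cosine [0, 1, 2]
# ip [0, 1, 2]
# l2sq [0, 1, 2]
```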



Download files


Source Distribution

data_gentry-0.1.2.tar.gz (8.1 kB)

Built Distribution

data_gentry-0.1.2-py3-none-any.whl (12.6 kB)

File details

Details for the file data_gentry-0.1.2.tar.gz.

File metadata

  • Download URL: data_gentry-0.1.2.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.16 (CI publish from Ubuntu 24.04)

File hashes

Hashes for data_gentry-0.1.2.tar.gz
Algorithm     Hash digest
SHA256        b96dcdad519168f49662668b3461218e3ec8d0fa57901531cc52f0c5a0218ad8
MD5           797aa33935fc2114f725e063d4cadeeb
BLAKE2b-256   4e0c2fbba67705ef565bfe92edb9ec945edd40180e7de6f04706c921d51e332f
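Since the page publishes digests, a downloaded archive can be checked locally before installing. A minimal sketch using Python's standard `hashlib`, assuming the sdist was saved to the working directory under its published file name; `sha256_of` and `verify` are illustrative helpers, not part of DataGentry. pip can enforce the same check at install time through its hash-checking mode.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest for data_gentry-0.1.2.tar.gz, as listed above.
EXPECTED_SHA256 = "b96dcdad519168f49662668b3461218e3ec8d0fa57901531cc52f0c5a0218ad8"


def sha256_of(path, chunk_size=1 << 16):
    """Hash the file in fixed-size chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    path = Path("data_gentry-0.1.2.tar.gz")  # assumed download location
    if path.exists():
        print("OK" if verify(path) else "MISMATCH")
    else:
        print(f"{path} not found")
```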


File details

Details for the file data_gentry-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: data_gentry-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 12.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.16 (CI publish from Ubuntu 24.04)

File hashes

Hashes for data_gentry-0.1.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        2c29b3cb6f406611feb0fc1c310a0346090e7fd4afb653977ec77c2c42daac04
MD5           3df99f66f0e3302ea9c872639818330c
BLAKE2b-256   4d7f044364fe20c091b44959dff22f08ad2468e2630db7672b3dfe85cc1dd0aa

