Knowledge-Aware Re-embedding Algorithm - Efficient RAG knowledge base updates

KARA - Knowledge-Aware Re-embedding Algorithm

KARA is a Python library for efficient document updates in RAG systems. When a document is updated, it reuses existing chunks wherever possible, so fewer chunks have to be re-embedded.

Installation

pip install kara-toolkit

Quick Start

from kara import KARAUpdater, RecursiveCharacterChunker

# Initialize
updater = KARAUpdater(
    chunker=RecursiveCharacterChunker(chunk_size=1000),
    epsilon=0.1
)

# Process initial documents
updater.initialize(["Your document content..."])

# Update with new content
result = updater.update(["Updated document content..."])
print(f"Efficiency: {result.efficiency_ratio:.1%}")

How It Works

For a single document, KARA formulates chunking as a path-finding problem on a DAG (Directed Acyclic Graph): each node represents a position between the document's splits, and each edge represents a possible chunk spanning the splits between two positions. It then uses Dijkstra's algorithm to find the optimal chunking path.
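
The idea can be illustrated with a small standalone sketch. This is not KARA's implementation: the function name `optimal_chunking`, its parameters, and the cost model (a reused chunk costs `reuse_cost`, a new chunk costs `new_cost`) are assumptions made only for this example.

```python
import heapq


def optimal_chunking(splits, known_chunks, max_chunk_len=14,
                     reuse_cost=0.1, new_cost=1.0):
    """Illustrative sketch (hypothetical API, assumed cost model).

    Nodes 0..n are positions between splits. An edge i -> j is the chunk
    made of splits[i:j]. Dijkstra finds the cheapest path from 0 to n.
    """
    n = len(splits)
    dist = [float("inf")] * (n + 1)
    prev = [None] * (n + 1)
    dist[0] = 0.0
    heap = [(0.0, 0)]

    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        if i == n:
            break  # reached the end of the document
        for j in range(i + 1, n + 1):
            chunk = "".join(splits[i:j])
            if len(chunk) > max_chunk_len:
                break  # longer chunks from i would also be too long
            # Assumption: chunks that already exist are cheap to keep.
            cost = reuse_cost if chunk in known_chunks else new_cost
            if d + cost < dist[j]:
                dist[j] = d + cost
                prev[j] = i
                heapq.heappush(heap, (dist[j], j))

    if dist[n] == float("inf"):
        raise ValueError("no valid chunking: a split exceeds max_chunk_len")

    # Walk the predecessor links back from the end to recover the chunks.
    chunks = []
    j = n
    while j > 0:
        i = prev[j]
        chunks.append("".join(splits[i:j]))
        j = i
    chunks.reverse()
    return chunks, dist[n]


splits = ["Alpha. ", "Beta. ", "Gamma. ", "Delta. "]
known = {"Alpha. Beta. "}  # a chunk assumed to be embedded already
chunks, cost = optimal_chunking(splits, known)
print(chunks)
```

In this toy input the path that keeps the already-known chunk "Alpha. Beta. " is cheaper than any path that re-chunks it, so only the remaining text would need a new embedding.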

Examples

See the examples/ directory for more usage examples.

License

CC BY 4.0 License - see LICENSE file for details.

Download files

Source Distribution

kara_toolkit-0.1.0.tar.gz (29.2 kB)

Uploaded Source

Built Distribution

kara_toolkit-0.1.0-py3-none-any.whl (11.1 kB)

Uploaded Python 3

File details

Details for the file kara_toolkit-0.1.0.tar.gz.

File metadata

  • Download URL: kara_toolkit-0.1.0.tar.gz
  • Upload date:
  • Size: 29.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for kara_toolkit-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6cef576ac20c9d50e45ad92218076f979d93debf7e383adfd53453547d36bc84
MD5 9681305ff989dce34208c008dd9473b3
BLAKE2b-256 ba8c48ee8e2d5f94f34607e31318be64278ac9c052702e19df8f23bfcd53ad02

Provenance

The following attestation bundles were made for kara_toolkit-0.1.0.tar.gz:

Publisher: publish.yml on mzakizadeh/kara

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file kara_toolkit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: kara_toolkit-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 11.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for kara_toolkit-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7453757d065104bae35fe755acbd21945177c51d30745bb5d1007940122d8936
MD5 c699ea85f1e0636a0258c783dab8be42
BLAKE2b-256 637c2c1a08ea53409e37a50d673a4cd7b8201f423d39f7d9b753944170f3db7b

Provenance

The following attestation bundles were made for kara_toolkit-0.1.0-py3-none-any.whl:

Publisher: publish.yml on mzakizadeh/kara

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
