Quackling enables document-native generative AI applications

Quackling

Quackling enables document-native generative AI applications, such as RAG, based on Docling.

Features

  • 🧠 Enables rich gen AI applications by providing capabilities at the native document level, not just plain text / Markdown!
  • ⚡️ Leverages Docling's conversion quality and speed.
  • ⚙️ Integrates with standard LLM application frameworks, such as LlamaIndex, for building powerful applications like RAG.

Installation

To use Quackling, simply install quackling from your package manager, e.g. pip:

pip install quackling
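
To confirm that the install succeeded, the installed version can be queried through Python's standard library. This is a minimal sketch; the helper name `installed_version` is illustrative and not part of Quackling.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if quackling is installed, otherwise None.
print(installed_version("quackling"))
```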

Usage

Quackling offers core capabilities (quackling.core) as well as framework integration components, e.g. for LlamaIndex (quackling.llama_index). Examples of both follow.

Basic RAG

The following is a basic RAG pipeline using LlamaIndex.

[!NOTE] To run this example as is, first pip install llama-index-embeddings-huggingface llama-index-llms-huggingface-api in addition to quackling; these packages provide the model integrations. Alternatively, set EMBED_MODEL and LLM as desired, e.g. to local models.

import os

from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from quackling.llama_index.node_parsers.hier_node_parser import HierarchicalNodeParser
from quackling.llama_index.readers.docling_reader import DoclingReader

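# Source documents, the question to ask, and the models to use
DOCS = ["https://arxiv.org/pdf/2311.18481"]
QUERY = "What is DocQA?"
EMBED_MODEL = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
LLM = HuggingFaceInferenceAPI(
    token=os.getenv("HF_TOKEN"),  # Hugging Face API token, read from the environment
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1",
)

# Convert the documents with Docling, parse them into nodes, and embed them into a vector index
index = VectorStoreIndex.from_documents(
    documents=DoclingReader(parse_type=DoclingReader.ParseType.JSON).load_data(DOCS),
    embed_model=EMBED_MODEL,
    transformations=[HierarchicalNodeParser()],
)
# Retrieve relevant nodes and let the LLM generate an answer from them
query_engine = index.as_query_engine(llm=LLM)
response = query_engine.query(QUERY)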
# > DocQA is a question-answering conversational assistant [...]
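
To make the retrieval step of such a pipeline concrete, here is a toy, dependency-free sketch of ranking chunks against a query. It is not how LlamaIndex or Quackling implement retrieval: word-count vectors and cosine similarity stand in for the embedding model and vector store, and the `Chunk` class, its fields, and the second sample text are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a parsed document node."""
    text: str
    page: int


def vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real pipeline uses a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[Chunk], top_k: int = 1) -> list[Chunk]:
    """Return the top_k chunks most similar to the query."""
    q = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c.text)), reverse=True)
    return ranked[:top_k]


chunks = [
    Chunk("DocQA is a question-answering conversational assistant", page=1),
    Chunk("The layout model detects tables and figures", page=3),  # invented sample
]
best = retrieve("What is DocQA?", chunks)[0]
print(best.page, best.text)
```

Keeping the page number next to the text is what lets an answer point back to where in the document it came from.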

Chunking

You can also use Quackling in any pipeline, independently of frameworks like LlamaIndex. For instance, to split a document into chunks based on its structure, with each chunk pointing back to a node of the Docling document:

from docling.document_converter import DocumentConverter
from quackling.core.chunkers.hierarchical_chunker import HierarchicalChunker

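# Convert the PDF into a Docling document
doc = DocumentConverter().convert_single("https://arxiv.org/pdf/2206.01062")
# Split it along its structure and collect the resulting chunks into a list
chunks = list(HierarchicalChunker().chunk(doc))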
# > [
# >     ChunkWithMetadata(
# >         path='$.main-text[0]',
# >         text='DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis',
# >         page=1,
# >         bbox=[107.59, 672.38, 505.18, 709.08]
# >     ),
# >     [...]
# > ]
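
Because each chunk carries its location, downstream code can, for example, group text by page or follow `path` back into the document. The sketch below assumes only the fields visible in the output above. `ChunkStub` is a hypothetical stand-in for `ChunkWithMetadata` so that the snippet runs without Docling, `parse_path` is an illustrative helper that handles only the simple `$.key[index]` shape shown, and the second sample chunk is invented.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ChunkStub:
    """Hypothetical stand-in mirroring the fields printed above."""
    path: str
    text: str
    page: int
    bbox: list[float]


def parse_path(path: str) -> tuple[str, int]:
    """Split a path such as '$.main-text[0]' into its key and list index."""
    match = re.fullmatch(r"\$\.([\w-]+)\[(\d+)\]", path)
    if match is None:
        raise ValueError(f"unsupported path: {path!r}")
    return match.group(1), int(match.group(2))


def group_by_page(chunks: list[ChunkStub]) -> dict[int, list[str]]:
    """Collect chunk texts per page, preserving chunk order."""
    pages: dict[int, list[str]] = defaultdict(list)
    for chunk in chunks:
        pages[chunk.page].append(chunk.text)
    return dict(pages)


sample_chunks = [
    ChunkStub(
        path="$.main-text[0]",
        text="DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis",
        page=1,
        bbox=[107.59, 672.38, 505.18, 709.08],
    ),
    # Invented chunk, for illustration only
    ChunkStub(path="$.main-text[5]", text="Example paragraph", page=2, bbox=[0.0, 0.0, 1.0, 1.0]),
]

key, index = parse_path(sample_chunks[0].path)
by_page = group_by_page(sample_chunks)
print(key, index, sorted(by_page))
```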

More examples

Check out the examples, which showcase different variants of RAG, including vector ingestion and retrieval.

Contributing

Please read Contributing to Quackling for details.

References

If you use Quackling in your projects, please consider citing the following:

@software{Docling,
author = {Deep Search Team},
month = {7},
title = {{Docling}},
url = {https://github.com/DS4SD/docling},
version = {main},
year = {2024}
}

License

The Quackling codebase is released under the MIT license. For individual component usage, please refer to the component licenses found in the original packages.
