Quackling enables document-native generative AI applications

[!IMPORTANT] 👉 Now part of Docling!

Quackling

Easily build document-native generative AI applications, such as RAG, leveraging Docling's efficient PDF extraction and rich data model — while still using your favorite framework, 🦙 LlamaIndex or 🦜🔗 LangChain.

Features

  • 🧠 Enables rich gen AI applications by providing capabilities at the native document level, not just on plain text / Markdown!
  • ⚡️ Leverages Docling's conversion quality and speed.
  • ⚙️ Plug-and-play integration with LlamaIndex and LangChain for building powerful applications like RAG.

Installation

To use Quackling, install the quackling package with your package manager, e.g. pip:

pip install quackling

Usage

Quackling offers core capabilities (quackling.core), as well as framework integration components (quackling.llama_index and quackling.langchain). Examples of both follow.

Basic RAG

Here is a basic RAG pipeline using LlamaIndex:

[!NOTE] To use this example as is, first pip install llama-index-embeddings-huggingface llama-index-llms-huggingface-api in addition to quackling, so that the model integrations are available. Otherwise, set EMBED_MODEL and LLM as desired, e.g. using local models.

import os

from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from quackling.llama_index.node_parsers import HierarchicalJSONNodeParser
from quackling.llama_index.readers import DoclingPDFReader

DOCS = ["https://arxiv.org/pdf/2206.01062"]
QUESTION = "How many pages were human annotated?"
EMBED_MODEL = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
LLM = HuggingFaceInferenceAPI(
    token=os.getenv("HF_TOKEN"),
    model_name="mistralai/Mistral-7B-Instruct-v0.3",
)

index = VectorStoreIndex.from_documents(
    documents=DoclingPDFReader(parse_type=DoclingPDFReader.ParseType.JSON).load_data(DOCS),
    embed_model=EMBED_MODEL,
    transformations=[HierarchicalJSONNodeParser()],
)
query_engine = index.as_query_engine(llm=LLM)
result = query_engine.query(QUESTION)
print(result.response)
# > 80K pages were human annotated
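
To see which chunks an answer was based on, the retrieved nodes can be inspected after the query. The following is a minimal sketch, not part of Quackling: summarize_sources is a hypothetical helper that relies only on the source_nodes attribute of a LlamaIndex response and on each hit exposing score, node.metadata and node.get_content(). The stand-in objects and the "page" metadata key are assumptions made so that the sketch runs without a model or an index; the metadata keys Quackling actually attaches may differ.

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars: int = 80) -> list[dict]:
    """Return score, metadata and a text preview for each retrieved node."""
    rows = []
    for hit in response.source_nodes:
        text = hit.node.get_content()
        rows.append(
            {
                "score": hit.score,
                "metadata": dict(hit.node.metadata),
                "preview": text[:max_chars],
            }
        )
    return rows


def _fake_hit(text: str, score: float, metadata: dict) -> SimpleNamespace:
    """Build a stand-in for a retrieved node (illustration only)."""
    node = SimpleNamespace(metadata=metadata, get_content=lambda: text)
    return SimpleNamespace(node=node, score=score)


# Hypothetical stand-in for the `result` object returned by query_engine.query().
fake_response = SimpleNamespace(
    source_nodes=[
        _fake_hit("example chunk text", 0.82, {"page": 1}),  # assumed metadata key
        _fake_hit("another example chunk", 0.61, {"page": 3}),
    ]
)

for row in summarize_sources(fake_response):
    print(row["score"], row["metadata"], row["preview"])
```

In the pipeline above, the same helper would be called as summarize_sources(result).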

Chunking

You can also use Quackling standalone with any pipeline. For instance, to split a document into chunks based on its structure, with each chunk carrying a pointer to the corresponding node of the Docling document:

from docling.document_converter import DocumentConverter
from quackling.core.chunkers import HierarchicalChunker

doc = DocumentConverter().convert_single("https://arxiv.org/pdf/2408.09869").output
chunks = list(HierarchicalChunker().chunk(doc))
# > [
# >     ChunkWithMetadata(
# >         path='$.main-text[4]',
# >         text='Docling Technical Report\n[...]',
# >         page=1,
# >         bbox=[117.56, 439.85, 494.07, 482.42]
# >     ),
# >     [...]
# > ]
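
The path value in each chunk points back into the document structure. As a rough illustration of how such a pointer could be followed, here is a minimal sketch that resolves a path of the form '$.key[index]' against a nested dict. The resolve_path helper and the doc_dict layout (including its "type" and "text" fields) are hypothetical and heavily simplified; they are not Quackling's or Docling's API, and a full JSONPath library would be a more robust choice for real use.

```python
import re
from typing import Any

# Matches one step of a simple path: ".key" or "[index]".
_STEP = re.compile(r"\.([A-Za-z0-9_-]+)|\[(\d+)\]")


def resolve_path(doc: dict, path: str) -> Any:
    """Follow a simple '$.key[index]' style path through nested dicts and lists."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    rest = path[1:]
    node: Any = doc
    pos = 0
    for match in _STEP.finditer(rest):
        if match.start() != pos:
            # finditer skips unmatched text, so a gap means unsupported syntax.
            raise ValueError(f"unsupported path syntax: {path!r}")
        key, index = match.groups()
        node = node[key] if key is not None else node[int(index)]
        pos = match.end()
    if pos != len(rest):
        raise ValueError(f"unsupported path syntax: {path!r}")
    return node


# Hypothetical, simplified stand-in for a document exported as a dict.
doc_dict = {
    "main-text": [
        {"type": "page-header", "text": "arXiv:2408.09869"},
        {"type": "title", "text": "Docling Technical Report"},
    ]
}

node = resolve_path(doc_dict, "$.main-text[1]")
print(node["text"])  # prints: Docling Technical Report
```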

More examples

LlamaIndex

LangChain

Contributing

Please read Contributing to Quackling for details.

References

If you use Quackling in your projects, please consider citing the following:

@techreport{Docling,
  author = "Deep Search Team",
  month = 8,
  title = "Docling Technical Report",
  url = "https://arxiv.org/abs/2408.09869",
  eprint = "2408.09869",
  doi = "10.48550/arXiv.2408.09869",
  version = "1.0.0",
  year = 2024
}

License

The Quackling codebase is under the MIT license. For individual component usage, please refer to the component licenses found in the original packages.
