Quackling enables document-native generative AI applications
> [!IMPORTANT]
> 👉 Now part of Docling!
Quackling
Easily build document-native generative AI applications, such as RAG, leveraging Docling's efficient PDF extraction and rich data model — while still using your favorite framework, 🦙 LlamaIndex or 🦜🔗 LangChain.
Features
- 🧠 Enables rich gen AI applications by providing capabilities on native document level — not just plain text / Markdown!
- ⚡️ Leverages Docling's conversion quality and speed.
- ⚙️ Plug-and-play integration with LlamaIndex and LangChain for building powerful applications like RAG.
Installation
To use Quackling, simply install `quackling` from your package manager, e.g. pip:

```sh
pip install quackling
```
Usage
Quackling offers core capabilities (`quackling.core`), as well as framework integration components (`quackling.llama_index` and `quackling.langchain`). Below you can find examples of both.
Basic RAG
Here is a basic RAG pipeline using LlamaIndex:
> [!NOTE]
> To use the example as is, first run `pip install llama-index-embeddings-huggingface llama-index-llms-huggingface-api` in addition to `quackling` to install the model dependencies. Otherwise, you can set `EMBED_MODEL` & `LLM` as desired, e.g. using local models.
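For instance, a minimal sketch of swapping in a local `LLM` (this assumes the separate `llama-index-llms-ollama` integration and a locally running Ollama server, neither of which is required by Quackling; the model name is only illustrative):

```python
from llama_index.llms.ollama import Ollama

# Hypothetical local alternative to the HuggingFaceInferenceAPI LLM used in the example below
LLM = Ollama(model="mistral", request_timeout=120.0)
```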
```python
import os

from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

from quackling.llama_index.node_parsers import HierarchicalJSONNodeParser
from quackling.llama_index.readers import DoclingPDFReader

DOCS = ["https://arxiv.org/pdf/2206.01062"]
QUESTION = "How many pages were human annotated?"

EMBED_MODEL = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
LLM = HuggingFaceInferenceAPI(
    token=os.getenv("HF_TOKEN"),
    model_name="mistralai/Mistral-7B-Instruct-v0.3",
)

# Convert the PDF with Docling and index it using structure-aware (hierarchical) chunking
index = VectorStoreIndex.from_documents(
    documents=DoclingPDFReader(parse_type=DoclingPDFReader.ParseType.JSON).load_data(DOCS),
    embed_model=EMBED_MODEL,
    transformations=[HierarchicalJSONNodeParser()],
)
query_engine = index.as_query_engine(llm=LLM)
result = query_engine.query(QUESTION)
print(result.response)
# > 80K pages were human annotated
```
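Since the index is a regular LlamaIndex `VectorStoreIndex`, you can also inspect which chunks the answer was grounded in via the standard LlamaIndex response API; a minimal sketch (the exact metadata keys depend on the reader and node parser, so the output shown here is only indicative):

```python
# Inspect the retrieved source nodes behind the answer (standard LlamaIndex response API)
for node_with_score in result.source_nodes:
    print("score:", node_with_score.score)
    print(node_with_score.node.get_content()[:200])  # first 200 chars of the chunk text
    print(node_with_score.node.metadata)  # provenance metadata attached by the reader/parser
```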
Chunking
You can also use Quackling standalone with any pipeline. For instance, to split a document into chunks based on its structure, returning pointers to the Docling document's nodes:
```python
from docling.document_converter import DocumentConverter

from quackling.core.chunkers import HierarchicalChunker

# Convert the PDF with Docling, then split it into structure-aware chunks
doc = DocumentConverter().convert_single("https://arxiv.org/pdf/2408.09869").output
chunks = list(HierarchicalChunker().chunk(doc))
# > [
# >     ChunkWithMetadata(
# >         path='$.main-text[4]',
# >         text='Docling Technical Report\n[...]',
# >         page=1,
# >         bbox=[117.56, 439.85, 494.07, 482.42]
# >     ),
# >     [...]
# > ]
```
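Each chunk carries both the text and provenance pointers (`path`, `page`, `bbox`) into the Docling document, so you can feed it into any downstream pipeline while keeping traceability. A minimal sketch of consuming the chunks with a hypothetical embedding function of your own:

```python
# Hypothetical downstream use: embed each chunk's text and keep its document pointer
def embed(text: str) -> list[float]:
    # placeholder for your own embedding model
    return [float(len(text))]

records = [
    {
        "vector": embed(chunk.text),   # chunk text as produced by HierarchicalChunker
        "doc_pointer": chunk.path,     # pointer into the Docling document
        "page": chunk.page,
    }
    for chunk in chunks
]
```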
More examples
LlamaIndex
- Milvus basic RAG (dense embeddings)
- Milvus hybrid RAG (dense & sparse embeddings combined e.g. via RRF) & reranker model usage
- Milvus RAG also fetching native document metadata for search results
- Local node transformations (e.g. embeddings)
- ...
LangChain
Contributing
Please read Contributing to Quackling for details.
References
If you use Quackling in your projects, please consider citing the following:
```bib
@techreport{Docling,
  author = "Deep Search Team",
  month = 8,
  title = "Docling Technical Report",
  url = "https://arxiv.org/abs/2408.09869",
  eprint = "2408.09869",
  doi = "10.48550/arXiv.2408.09869",
  version = "1.0.0",
  year = 2024
}
```
License
The Quackling codebase is under MIT license. For individual component usage, please refer to the component licenses found in the original packages.