
A tool to convert documents into knowledge graphs using Docling.

Project description


Docling Graph



Docling-Graph turns documents into validated Pydantic objects, then builds a directed knowledge graph with explicit semantic relationships.

This transformation enables high-precision use cases in chemistry, finance, and legal domains, where AI must capture exact entity connections (compounds and reactions, instruments and dependencies, properties and measurements) rather than rely on approximate text embeddings.

This toolkit supports two extraction paths: local VLM extraction via Docling, and LLM-based extraction routed through LiteLLM for local runtimes (vLLM, Ollama) and API providers (Mistral, OpenAI, Gemini, IBM WatsonX), all orchestrated through a flexible, config-driven pipeline.

Key Capabilities

  • ✍🏻 Input formats: all Docling-supported inputs, including PDF, images, Markdown, Office documents, HTML, and more.

  • 🧠 Extraction: LLM or VLM backends, with chunking and processing modes.

  • 💎 Graphs: Pydantic → NetworkX directed graphs with stable IDs and edge metadata.

  • 📦 Export: CSV, Cypher, and other KG-friendly formats.

  • 🔍 Visualization: Interactive HTML and Markdown reports.
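
As a sketch of the KG-friendly shapes the export step targets, the snippet below serializes illustrative node and edge rows to CSV and to naive Cypher CREATE statements. The function names, row fields, and IDs here are hypothetical examples, not docling-graph's actual export API:

```python
import csv
import io

# Illustrative node/edge rows, mirroring the kind of output a
# Pydantic -> graph pipeline might produce (names are made up).
nodes = [
    {"id": "person:curie", "label": "Person", "name": "Marie Curie"},
    {"id": "org:sorbonne", "label": "Organization", "name": "Sorbonne"},
]
edges = [
    {"source": "org:sorbonne", "target": "person:curie", "type": "EMPLOYS"},
]

def to_csv(rows: list[dict]) -> str:
    """Serialize dict rows to a CSV string with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_cypher(nodes: list[dict], edges: list[dict]) -> str:
    """Emit naive Cypher statements: one CREATE per node, then one
    MATCH ... CREATE per relationship, keyed by node id."""
    stmts = [
        f"CREATE (:{n['label']} {{id: '{n['id']}', name: '{n['name']}'}})"
        for n in nodes
    ]
    stmts += [
        f"MATCH (a {{id: '{e['source']}'}}), (b {{id: '{e['target']}'}}) "
        f"CREATE (a)-[:{e['type']}]->(b)"
        for e in edges
    ]
    return "\n".join(stmts)
```

The same row shapes load cleanly into Neo4j's LOAD CSV or a plain Cypher script.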

Latest Changes

  • 🪜 Multi-pass extraction: Delta and staged contracts (experimental).

  • 📐 Structured extraction: LLM output is schema-enforced by default; see the CLI and API documentation to disable it.

  • ✨ LiteLLM: Single interface for vLLM, OpenAI, Mistral, WatsonX, and more.

  • 🐛 Trace capture: Debug exports for extraction and fallback diagnostics.

Coming Soon

  • 🧩 Interactive Template Builder: Guided workflows for building Pydantic templates.

  • 🧲 Ontology-Based Templates: Match content to the best Pydantic template using semantic similarity.

  • 💾 Graph Database Integration: Export data straight into Neo4j, ArangoDB, and similar databases.

Quick Start

Requirements

  • Python 3.10 or higher

Installation

pip install docling-graph

This installs the core package with VLM support and LiteLLM for LLM providers. For detailed installation instructions (including optional extras and GPU setup), see Installation Guide.

API Key Setup (Remote Inference)

export OPENAI_API_KEY="..."        # OpenAI
export MISTRAL_API_KEY="..."       # Mistral
export GEMINI_API_KEY="..."        # Google Gemini

# IBM WatsonX
export WATSONX_API_KEY="..."       # IBM WatsonX API Key
export WATSONX_PROJECT_ID="..."    # IBM WatsonX Project ID
export WATSONX_URL="..."           # IBM WatsonX URL (optional)
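
Before a remote run, it can help to fail fast when a required key is missing. The helper below is an illustrative, stdlib-only check, not part of the docling-graph API:

```python
import os

def require_env(*names: str) -> None:
    """Raise early if any of the given environment variables is unset
    or empty (illustrative helper, not a docling-graph function)."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

# e.g. before a remote Mistral run:
# require_env("MISTRAL_API_KEY")
```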

Basic Usage

CLI

# Initialize configuration
docling-graph init

# Convert a document from a URL (backslashes continue the command across lines)
docling-graph convert "https://arxiv.org/pdf/2207.02720" \
    --template "docs.examples.templates.rheology_research.ScholarlyRheologyPaper" \
    --processing-mode "many-to-one" \
    --extraction-contract "staged" \
    --debug

# Visualize results
docling-graph inspect outputs

Python API - Default Behavior

from docling_graph import run_pipeline, PipelineContext
from docs.examples.templates.rheology_research import ScholarlyRheologyPaper

# Create configuration
config = {
    "source": "https://arxiv.org/pdf/2207.02720",
    "template": ScholarlyRheologyPaper,
    "backend": "llm",
    "inference": "remote",
    "processing_mode": "many-to-one",
    "extraction_contract": "staged",  # robust for smaller models
    "provider_override": "mistral",
    "model_override": "mistral-medium-latest",
    "structured_output": True,  # default
    "use_chunking": True,
}

# Run pipeline - returns data directly, no files written to disk
context: PipelineContext = run_pipeline(config)

# Access results
graph = context.knowledge_graph
models = context.extracted_models
metadata = context.graph_metadata

print(f"Extracted {len(models)} model(s)")
print(f"Graph: {graph.number_of_nodes()} nodes, {graph.number_of_edges()} edges")

For debugging, use --debug with the CLI to save intermediate artifacts to disk; see Trace Data & Debugging. For more examples, see Examples.

Pydantic Templates

Templates define both the extraction schema and the resulting graph structure.

from pydantic import BaseModel, Field
from docling_graph.utils import edge

class Person(BaseModel):
    """Person entity with stable ID."""
    model_config = {
        'is_entity': True,
        'graph_id_fields': ['last_name', 'date_of_birth']
    }
    
    first_name: str = Field(description="Person's first name")
    last_name: str = Field(description="Person's last name")
    date_of_birth: str = Field(description="Date of birth (YYYY-MM-DD)")

class Organization(BaseModel):
    """Organization entity."""
    model_config = {'is_entity': True}
    
    name: str = Field(description="Organization name")
    employees: list[Person] = edge("EMPLOYS", description="List of employees")
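
To illustrate how identifying fields like those in graph_id_fields can produce a stable node ID, here is a minimal sketch that hashes the sorted identifying fields; docling-graph's actual ID scheme may differ:

```python
import hashlib

def stable_node_id(label: str, id_fields: dict[str, str]) -> str:
    """Derive a deterministic node ID from an entity's identifying fields.
    Sorting the keys makes the ID independent of field order.
    Illustrative only -- not docling-graph's internal implementation."""
    key = "|".join(f"{k}={id_fields[k]}" for k in sorted(id_fields))
    digest = hashlib.sha256(key.encode()).hexdigest()[:12]
    return f"{label}:{digest}"

# The same Person always maps to the same node, so repeated
# extractions merge instead of duplicating entities.
pid = stable_node_id("Person", {"last_name": "Curie", "date_of_birth": "1867-11-07"})
```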

For complete guidance on templates, see the Schema Definition documentation.

Documentation

Comprehensive documentation is available on the Docling Graph documentation site.

Documentation Structure

The documentation follows the docling-graph pipeline stages:

  1. Introduction - Overview and core concepts
  2. Installation - Setup and environment configuration
  3. Schema Definition - Creating Pydantic templates
  4. Pipeline Configuration - Configuring the extraction pipeline
  5. Extraction Process - Document conversion and extraction
  6. Graph Management - Exporting and visualizing graphs
  7. CLI Reference - Command-line interface guide
  8. Python API - Programmatic usage
  9. Examples - Working code examples
  10. Advanced Topics - Performance, testing, error handling
  11. API Reference - Detailed API documentation
  12. Community - Contributing and development guide

Contributing

We welcome contributions! See the Community documentation and the development setup below.

Development Setup

# Clone and setup
git clone https://github.com/docling-project/docling-graph
cd docling-graph

# Install with dev dependencies
uv sync --extra dev

# Run pre-commit checks
uv run pre-commit run --all-files

License

MIT License - see LICENSE for details.

Acknowledgments

Docling Graph builds on outstanding open-source projects:

  • Docling - document conversion and VLM extraction
  • Pydantic - schema definition and validation
  • NetworkX - graph construction and analysis
  • LiteLLM - unified LLM provider interface
  • SpaCy - semantic entity resolution in delta extraction
  • Cytoscape - interactive graph visualization

IBM ❤️ Open Source AI

Docling Graph has been brought to you by IBM.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

docling_graph-1.5.0.tar.gz (192.9 kB)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

docling_graph-1.5.0-py3-none-any.whl (224.9 kB)

File details

Details for the file docling_graph-1.5.0.tar.gz.

File metadata

  • Download URL: docling_graph-1.5.0.tar.gz
  • Upload date:
  • Size: 192.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docling_graph-1.5.0.tar.gz

  • SHA256: b832fe3295a98c7ce98c68b3fcd7e68a4e0d25f286ff80acfff354789909292b
  • MD5: cf5b950fe6ce7c489f9d2fc41d75046d
  • BLAKE2b-256: 3507d45b379d7286ee58248af65c5c71376a8aade0487c0dd6ad2e9a4088997c


Provenance

The following attestation bundles were made for docling_graph-1.5.0.tar.gz:

Publisher: release.yml on docling-project/docling-graph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file docling_graph-1.5.0-py3-none-any.whl.

File metadata

  • Download URL: docling_graph-1.5.0-py3-none-any.whl
  • Upload date:
  • Size: 224.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docling_graph-1.5.0-py3-none-any.whl

  • SHA256: 8b3931875d56e591a0ef96ecac87cf8afe4424b2ef6f59fbee8ecc7d8bb024d8
  • MD5: 88e286e0864683a76171ae4bc7f78549
  • BLAKE2b-256: a6b1e8e6c5f5edb3c174937a0ebf62567284e37139baeb2c0b965a477c8cc04a


Provenance

The following attestation bundles were made for docling_graph-1.5.0-py3-none-any.whl:

Publisher: release.yml on docling-project/docling-graph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
