A Python library to define and validate data types in Docling.

Docling Core

Docling Core is a library that defines core data types and transformations in Docling.

Installation

To use Docling Core, install the docling-core package with your package manager, for example pip:

pip install docling-core
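To confirm that the installation succeeded, one option is to query the installed distribution's version from Python. The sketch below uses only the standard library; the helper name `installed_version` is an illustrative assumption, not part of Docling Core.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "docling-core") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"docling-core: {found or 'not installed'}")
```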

Development setup

To develop for Docling Core, you need Python 3.10 through 3.14 and the uv package. You can then install the project and its dependencies from the root directory of your local clone:

uv sync --all-extras

To run the pytest suite, execute:

uv run pytest -s test

Main features

Docling Core provides the foundational DoclingDocument data model and API, as well as additional APIs for tasks like serialization and chunking, which are key to developing generative AI applications using Docling.

DoclingDocument

Docling Core defines the DoclingDocument as a Pydantic model, allowing for advanced data model control, customizability, and interoperability.

In addition to specifying the schema, it provides a convenient API for building documents and for basic operations, such as exporting to formats like Markdown and HTML.
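As a minimal sketch of the building and exporting workflow described above, the following constructs a small document and exports it to Markdown. The document name, the sample text, and the wrapper function `build_example_markdown` are illustrative assumptions; the import is done lazily so the snippet reports a missing installation instead of failing.

```python
def build_example_markdown() -> str:
    """Build a two-item document and export it to Markdown."""
    # Imported lazily so a missing installation surfaces as ImportError here.
    from docling_core.types.doc import DocItemLabel, DoclingDocument

    doc = DoclingDocument(name="example")
    doc.add_heading(text="Introduction", level=1)
    doc.add_text(
        label=DocItemLabel.TEXT,
        text="Docling Core defines the data model.",
    )
    return doc.export_to_markdown()


try:
    print(build_example_markdown())
except ImportError:
    print("docling-core is not installed; run `pip install docling-core` first")
```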

Serialization

Different users have different serialization requirements. To address this, the Serialization API is designed for easy extension, while providing feature-rich built-in implementations (on which the respective DoclingDocument helpers are based).
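A minimal sketch of using one of the built-in serializers directly, assuming the Markdown serializer from `docling_core.transforms.serializer.markdown`. The sample document and the wrapper function `serialize_to_markdown` are illustrative assumptions.

```python
def serialize_to_markdown() -> str:
    """Serialize a small document with the built-in Markdown serializer."""
    # Imported lazily so a missing installation surfaces as ImportError here.
    from docling_core.transforms.serializer.markdown import MarkdownDocSerializer
    from docling_core.types.doc import DocItemLabel, DoclingDocument

    doc = DoclingDocument(name="example")
    doc.add_text(label=DocItemLabel.TEXT, text="Hello from a serializer.")

    serializer = MarkdownDocSerializer(doc=doc)
    result = serializer.serialize()  # result object carrying the serialized text
    return result.text


try:
    print(serialize_to_markdown())
except ImportError:
    print("docling-core is not installed; run `pip install docling-core` first")
```

Working with the serializer object rather than the document-level export helper is what makes customization possible, since the serializer is the component that can be configured or subclassed.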

Chunking

Like the Serialization API, the Chunking API provides built-in chunking capabilities together with a design that enables easy extension, so that it can meet the customization requirements of different use cases.
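A minimal sketch of the built-in chunking, assuming the `HierarchicalChunker` class from `docling_core.transforms.chunker.hierarchical_chunker`. The sample document and the wrapper function `chunk_texts` are illustrative assumptions.

```python
def chunk_texts() -> list:
    """Chunk a small document and return the text of each chunk."""
    # Imported lazily so a missing installation surfaces as ImportError here.
    from docling_core.transforms.chunker.hierarchical_chunker import (
        HierarchicalChunker,
    )
    from docling_core.types.doc import DocItemLabel, DoclingDocument

    doc = DoclingDocument(name="example")
    doc.add_heading(text="Overview", level=1)
    doc.add_text(label=DocItemLabel.TEXT, text="First paragraph of the overview.")
    doc.add_text(label=DocItemLabel.TEXT, text="Second paragraph of the overview.")

    chunker = HierarchicalChunker()
    # chunk() yields chunk objects; each one exposes its content as `.text`.
    return [chunk.text for chunk in chunker.chunk(dl_doc=doc)]


try:
    for text in chunk_texts():
        print(repr(text))
except ImportError:
    print("docling-core is not installed; run `pip install docling-core` first")
```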

Profiling

The Profiling API enables extraction of comprehensive statistics from DoclingDocument objects, both for individual documents and collections. It provides metrics on document structure (pages, tables, pictures, text items) along with statistical distributions (deciles, histograms) and visualization capabilities for analyzing document collections at scale.
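To make the statistical terms concrete, the sketch below computes deciles and a fixed-width histogram over per-document page counts using only the Python standard library. It does not use the Profiling API; the function names `deciles` and `histogram` and the sample page counts are hypothetical and serve only to illustrate what these distributions contain.

```python
from collections import Counter
from statistics import quantiles


def deciles(values):
    """Return the nine cut points that split `values` into ten groups."""
    return quantiles(values, n=10, method="inclusive")


def histogram(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical page counts for a collection of eleven documents.
page_counts = [1, 2, 2, 3, 5, 8, 8, 12, 20, 40, 41]

print(deciles(page_counts))
print(histogram(page_counts, bin_width=10))
```

For a skewed collection like this one, the upper deciles show that a few long documents account for most of the pages, which a single average would hide.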

Contributing

Please read Contributing to Docling Core for details.

References

If you use Docling Core in your projects, please consider citing the following:

@techreport{Docling,
  author = "Deep Search Team",
  month = 8,
  title = "Docling Technical Report",
  url = "https://arxiv.org/abs/2408.09869",
  eprint = "2408.09869",
  doi = "10.48550/arXiv.2408.09869",
  version = "1.0.0",
  year = 2024
}

License

The Docling Core codebase is released under the MIT license. For individual model usage, please refer to the model licenses found in the original packages.
