
Docling Parse


Simple package to extract text with coordinates from programmatic (born-digital, i.e. non-scanned) PDFs. This package is part of the Docling document conversion project.

Quick start

Install the package from PyPI:

pip install docling-parse
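To confirm which release ended up in your environment (this page describes 0.1.0), you can query the standard library's importlib.metadata. A small sketch; installed_version is a hypothetical helper name, not part of docling-parse.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "docling-parse") -> Optional[str]:
    # Return the installed distribution's version string, or None if it is missing
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```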

Convert a PDF

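from docling_parse.docling_parse import pdf_parser

# Create the parser and extract all text cells from the document
parser = pdf_parser()
doc = parser.find_cells("mydoc.pdf")

# Each page holds a list of cells; print page index, cell index and normalized text
for i, page in enumerate(doc["pages"]):
    for j, cell in enumerate(page["cells"]):
        print(i, "\t", j, "\t", cell["content"]["rnormalized"])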
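Because find_cells returns nested dictionaries and lists, post-processing is plain Python. The sketch below joins the text of all cells on each page. It assumes only the keys used in the example above (pages, cells, content, rnormalized); page_texts is a hypothetical helper, not part of docling-parse, and the sample dictionary is hand-made rather than real parser output.

```python
from typing import Any, Dict, List


def page_texts(doc: Dict[str, Any], sep: str = " ") -> List[str]:
    """Join the normalized text of every cell, one string per page.

    Assumes the layout doc["pages"][i]["cells"][j]["content"]["rnormalized"].
    """
    texts = []
    for page in doc["pages"]:
        # Keep the cells in the order they were returned
        words = [cell["content"]["rnormalized"] for cell in page["cells"]]
        texts.append(sep.join(words))
    return texts


# Hand-made dict mimicking the structure (not real parser output)
sample = {
    "pages": [
        {"cells": [{"content": {"rnormalized": "Hello"}},
                   {"content": {"rnormalized": "world"}}]},
        {"cells": []},
    ]
}
print(page_texts(sample))  # ['Hello world', '']
```

With real output you would pass the doc returned by parser.find_cells(...) instead of sample.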

Use the CLI

$ docling-parse -h
usage: docling-parse [-h] -p PDF

Process a PDF file.

options:
  -h, --help         show this help message and exit
  -p PDF, --pdf PDF  Path to the PDF file
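To drive the CLI from a Python script (for example in a batch job), you can shell out to it. A sketch assuming the docling-parse entry point is on PATH after installation; build_cli_command and run_cli are hypothetical helpers that use only the documented -p/--pdf option, and since the CLI's output format is not described here, the sketch simply returns the captured stdout.

```python
import shutil
import subprocess
from typing import List


def build_cli_command(pdf_path: str) -> List[str]:
    # Only the documented -p/--pdf option is used
    return ["docling-parse", "-p", pdf_path]


def run_cli(pdf_path: str) -> str:
    # Fail early with a clear message if the entry point is not installed
    if shutil.which("docling-parse") is None:
        raise FileNotFoundError("docling-parse CLI not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status
    result = subprocess.run(build_cli_command(pdf_path),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, print(run_cli("mydoc.pdf")) processes the file and prints whatever the CLI wrote to stdout.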

Development

CXX

To build the parser, run the following command in the root folder:

rm -rf build; cmake -B ./build; cd build; make
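The one-liner above chains its steps with ";", so make is still attempted even if cmake fails (replacing ";" with "&&" in the shell stops at the first failure). The sketch below runs the same sequence from Python and also stops at the first failing step. It assumes cmake and make are on PATH and that it is started from the repository root; build_steps and rebuild are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# (argv, working directory relative to the repository root, or None for the root)
Step = Tuple[List[str], Optional[str]]


def build_steps(build_dir: str = "build") -> List[Step]:
    # Mirrors "cmake -B ./build; cd build; make"
    return [
        (["cmake", "-B", build_dir], None),
        (["make"], build_dir),
    ]


def rebuild(root: str = ".", build_dir: str = "build") -> None:
    root_path = Path(root)
    # Equivalent of "rm -rf build"
    shutil.rmtree(root_path / build_dir, ignore_errors=True)
    for argv, cwd in build_steps(build_dir):
        workdir = root_path / cwd if cwd else root_path
        # check=True raises at the first failing step, unlike ";" in the shell
        subprocess.run(argv, cwd=workdir, check=True)
```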

You can then run the parser from the build folder with:

./parse.exe <input-file> <optional-logging:true>

If you don't provide an input file, a template input file is printed to the terminal.

Python

To build the package, run the following (make sure Poetry is installed):

poetry build

To test the package, run:

poetry run pytest ./tests/test_parse.py

Contributing

Please read Contributing to Docling Parse for details.

References

If you use Docling in your projects, please consider citing the following:

@software{Docling,
author = {Deep Search Team},
month = {7},
title = {{Docling}},
url = {https://github.com/DS4SD/docling},
version = {main},
year = {2024}
}

License

The Docling Parse codebase is released under the MIT license. For individual model usage, please refer to the model licenses found in the original packages.



Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

Wheels for docling-parse 0.1.0 are provided for CPython 3.9, 3.10, 3.11 and 3.12 on each platform below. In the file names, cpXY stands for the CPython tag (cp39, cp310, cp311 or cp312).

manylinux (glibc 2.17+), x86-64: docling_parse-0.1.0-cpXY-cpXY-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.8 MB)

macOS 14.0+, x86-64: docling_parse-0.1.0-cpXY-cpXY-macosx_14_0_x86_64.whl (8.1 MB)

macOS 14.0+, ARM64: docling_parse-0.1.0-cpXY-cpXY-macosx_14_0_arm64.whl (8.1 MB)

macOS 13.6+, x86-64: docling_parse-0.1.0-cpXY-cpXY-macosx_13_6_x86_64.whl (8.1 MB)

macOS 13.6+, ARM64: docling_parse-0.1.0-cpXY-cpXY-macosx_13_6_arm64.whl (8.1 MB)
