SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.

Project description

Docowling

Docowling is a fork of Docling, an IBM project, developed to extend its functionality and add new document processing capabilities.

Why Docowling?

Like an owl watching for any prey, Docowling is a fork intended to attack all types of documents.

Features

  • 📄 Converts popular formats (CSV, PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) to HTML, Markdown and JSON with embedded/referenced images
  • 🧩 Unified DoclingDocument format for standardized representation
  • 🤖 Ready-to-use integrations with LangChain, LlamaIndex, Crew AI & Haystack
  • 💻 Intuitive CLI for efficient batch processing with customizable export parameters
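
The export targets listed above can be sketched as a small helper. This is a minimal sketch, not the project's documented API: only export_to_markdown() appears in this README; export_to_html() and export_to_dict() are assumed to be inherited unchanged from upstream Docling, and the helper name export_all is hypothetical.

```python
import json


def export_all(document):
    """Return Markdown, HTML, and JSON renderings of a converted document.

    `document` is expected to be the `result.document` object returned by
    DocumentConverter.convert(). export_to_html() and export_to_dict() are
    assumptions carried over from upstream Docling.
    """
    return {
        "markdown": document.export_to_markdown(),
        "html": document.export_to_html(),          # assumed method
        "json": json.dumps(document.export_to_dict()),  # assumed method
    }
```

If one of the assumed methods is missing in your installed version, the call raises AttributeError, which makes the mismatch easy to spot.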

Coming Soon

  • 📄 Compatibility with more formats
  • 🤖 Optimized integrations with LangChain, Crew AI & Weaviate

Installation

To use Docowling, install the docowling package with your package manager, e.g. pip or uv:

pip install docowling
uv pip install docowling
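
To confirm that the installation succeeded, one option is to query the installed package metadata from Python. This sketch uses only the standard library; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if docowling is installed, otherwise None.
print(installed_version("docowling"))
```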

Works on macOS, Linux, and Windows, on both x86_64 and arm64 architectures.

Getting started

To convert individual documents, use convert(), for example:

from docowling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # local path or URL of the document
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"

CSV files are converted the same way, for example:

from docowling.document_converter import DocumentConverter

source = "/content/drive/MyDrive/TESLA.csv"  # local path or URL of the document
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  
# output: "| Date     |      Open |      High [...]"
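
For several documents, the same convert() call can be wrapped in a loop that writes one Markdown file per source. This is a minimal sketch that relies only on convert() and export_to_markdown() as shown above; the helper name convert_to_markdown and the output naming scheme are hypothetical.

```python
from pathlib import Path


def convert_to_markdown(converter, sources, out_dir):
    """Convert each source and write the Markdown export into out_dir.

    `converter` is any object with a convert(source) method whose result
    exposes `.document.export_to_markdown()`, such as DocumentConverter.
    Returns the list of written file paths, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sources:
        result = converter.convert(source)
        markdown = result.document.export_to_markdown()
        # Name the output after the last path or URL segment, plus ".md".
        target = out_dir / (Path(str(source)).name + ".md")
        target.write_text(markdown, encoding="utf-8")
        written.append(target)
    return written


# Example usage (requires docowling to be installed):
#   from docowling.document_converter import DocumentConverter
#   convert_to_markdown(DocumentConverter(), ["report.pdf", "data.csv"], "out")
```

Passing the converter in as an argument means a single DocumentConverter instance is reused for every source.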

License

The Docowling codebase is released under the MIT license. For individual model usage, please refer to the model licenses found in the original packages.

IBM ❤️ Thanks

Thank you, IBM, for creating Docling, the basis of Docowling.

Download files

Source Distribution

docowling-1.0.17.tar.gz (87.6 kB view details)

Uploaded Source

Built Distribution

docowling-1.0.17-py3-none-any.whl (116.2 kB view details)

Uploaded Python 3

File details

Details for the file docowling-1.0.17.tar.gz.

File metadata

  • Download URL: docowling-1.0.17.tar.gz
  • Upload date:
  • Size: 87.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.12.2 Windows/11

File hashes

Hashes for docowling-1.0.17.tar.gz
Algorithm Hash digest
SHA256 ea2d976f4b978221c0a922e25c4fa45f4df30f2e63a39ba25395031b2ab1c87f
MD5 9d17dedf6d9198ba681713f86957916c
BLAKE2b-256 3a7f81bf59a1d6aacae53882a430db1247511a71d15ac0b64a4dc56b56ff486c

File details

Details for the file docowling-1.0.17-py3-none-any.whl.

File metadata

  • Download URL: docowling-1.0.17-py3-none-any.whl
  • Upload date:
  • Size: 116.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.12.2 Windows/11

File hashes

Hashes for docowling-1.0.17-py3-none-any.whl
Algorithm Hash digest
SHA256 4a146c051ecd2b12068fce2a163c2b606a9026ec55348be65d9416ff5117a177
MD5 e60915e7f68da15ba6e5d52a428de6cd
BLAKE2b-256 e82809096e2e365f72183d51bd32283f07edb788e4d95084fa8e0c1999d7e880
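
A downloaded file can be checked against the SHA256 digests listed above before installing it. The following is a minimal sketch using only the standard library; the helper names sha256_of_file and matches_published_hash are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage, assuming the sdist was downloaded to the current directory:
#   matches_published_hash(
#       "docowling-1.0.17.tar.gz",
#       "ea2d976f4b978221c0a922e25c4fa45f4df30f2e63a39ba25395031b2ab1c87f",
#   )
```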
