SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.
Docowling
Docowling is a fork of Docling, an IBM project, created to extend its functionality and add new document-processing capabilities.
Why Docowling?
Like an owl watching for every kind of prey, Docowling is a fork that aims to handle every type of document.
Features
- 📄 Converts popular formats (CSV, PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) to HTML, Markdown and JSON with embedded/referenced images
- 🧩 Unified DoclingDocument format for standardized representation
- 🤖 Ready-to-use integrations with LangChain, LlamaIndex, Crew AI & Haystack
- 💻 Intuitive CLI for efficient batch processing with customizable export parameters
Coming Soon
- 📄 Support for more formats
- 🤖 Optimized integrations with LangChain, Crew AI & Weaviate
Installation
To use Docowling, simply install docowling from your package manager, e.g. pip or uv:
pip install docowling
uv pip install docowling
Docowling works on macOS, Linux, and Windows, on both x86_64 and arm64 architectures.
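To confirm that the install succeeded, the package metadata can be queried from Python. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of Docowling.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("docowling")
    print(f"docowling {found}" if found else "docowling is not installed")
```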
Getting started
To convert individual documents, use convert(), for example:
from docowling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869"  # local path or URL of the document
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
# output: "## Docling Technical Report[...]"
Other formats, such as CSV, are converted the same way:
from docowling.document_converter import DocumentConverter
source = "/content/drive/MyDrive/TESLA.csv"  # local path to a CSV file
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
# output: "| Date | Open | High [...]"
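For several documents, the same `convert()` call can be wrapped in a small loop that writes one Markdown file per source. The sketch below is an illustration, not part of Docowling: `make_docowling_exporter` and `export_all` are hypothetical helper names, and the only Docowling calls used are the ones shown above (`DocumentConverter`, `convert`, `export_to_markdown`). The converter is imported lazily and created once, so the loop itself can be exercised with any function that maps a source to Markdown text.

```python
import re
from pathlib import Path
from typing import Callable, Dict, Iterable


def make_docowling_exporter() -> Callable[[str], str]:
    """Build a source -> Markdown function backed by a single DocumentConverter."""
    # Imported here so export_all() can be used without docowling installed.
    from docowling.document_converter import DocumentConverter

    converter = DocumentConverter()  # created once and reused for every source

    def to_markdown(source: str) -> str:
        return converter.convert(source).document.export_to_markdown()

    return to_markdown


def export_all(
    sources: Iterable[str],
    out_dir: str,
    to_markdown: Callable[[str], str],
) -> Dict[str, Path]:
    """Convert each source and write the Markdown to out_dir; return source -> file path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: Dict[str, Path] = {}
    for index, source in enumerate(sources):
        # Derive a file-system-safe name from the last path or URL segment.
        last = source.replace("\\", "/").rstrip("/").rsplit("/", 1)[-1]
        name = re.sub(r"[^A-Za-z0-9._-]", "_", last) or "document"
        target = out / f"{index:03d}_{name}.md"
        target.write_text(to_markdown(source), encoding="utf-8")
        written[source] = target
    return written
```

With Docowling installed, a call might look like `export_all(["https://arxiv.org/pdf/2408.09869", "/content/drive/MyDrive/TESLA.csv"], "out", make_docowling_exporter())`. The index prefix keeps output names unique when two sources share a file name.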
License
The Docowling codebase is released under the MIT license. For individual model usage, please refer to the model licenses found in the original packages.
IBM ❤️ Thanks
Thank you, IBM, for creating Docling, the foundation of Docowling.
File details
Details for the file docowling-1.0.17.tar.gz.
File metadata
- Download URL: docowling-1.0.17.tar.gz
- Upload date:
- Size: 87.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.12.2 Windows/11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea2d976f4b978221c0a922e25c4fa45f4df30f2e63a39ba25395031b2ab1c87f |
| MD5 | 9d17dedf6d9198ba681713f86957916c |
| BLAKE2b-256 | 3a7f81bf59a1d6aacae53882a430db1247511a71d15ac0b64a4dc56b56ff486c |
File details
Details for the file docowling-1.0.17-py3-none-any.whl.
File metadata
- Download URL: docowling-1.0.17-py3-none-any.whl
- Upload date:
- Size: 116.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.12.2 Windows/11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a146c051ecd2b12068fce2a163c2b606a9026ec55348be65d9416ff5117a177 |
| MD5 | e60915e7f68da15ba6e5d52a428de6cd |
| BLAKE2b-256 | e82809096e2e365f72183d51bd32283f07edb788e4d95084fa8e0c1999d7e880 |
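A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib`; `sha256_of` and `matches_digest` are hypothetical helper names, and the file is assumed to sit in the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example (assumes the sdist was downloaded to the current directory):
# matches_digest(
#     "docowling-1.0.17.tar.gz",
#     "ea2d976f4b978221c0a922e25c4fa45f4df30f2e63a39ba25395031b2ab1c87f",
# )
```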