
Native Python bindings for OfficeMD document extraction

Project description

officemd

Fast Office document extraction for LLMs and agents. Converts DOCX, XLSX, CSV, PPTX, and PDF into clean markdown and structured JSON IR, plus Docling output for DOCX, XLSX, and PPTX.

Install

uv add officemd
# or
pip install officemd

To run the CLI without adding officemd to a project:

uvx officemd markdown report.docx

CLI

officemd markdown report.docx
officemd markdown budget.xlsx --sheets "Summary,Q1"
officemd render report.docx
officemd diff old.docx new.docx
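To call the CLI from a Python script (for example in a build step), one option is to shell out with `subprocess`. This is a minimal sketch. It assumes that `officemd markdown` writes the converted markdown to stdout and that `--sheets` takes a comma-separated list of sheet names, as in the example above. `build_markdown_command` and `run_markdown` are hypothetical helper names and are not part of officemd.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional, Sequence


def build_markdown_command(path: Path, sheets: Optional[Sequence[str]] = None) -> list[str]:
    """Build the argv for `officemd markdown` (hypothetical helper)."""
    cmd = ["officemd", "markdown", str(path)]
    if sheets:
        # --sheets takes a comma-separated list, e.g. "Summary,Q1"
        cmd += ["--sheets", ",".join(sheets)]
    return cmd


def run_markdown(path: Path, sheets: Optional[Sequence[str]] = None) -> str:
    """Run the CLI and return its stdout (assumed to be the markdown)."""
    if shutil.which("officemd") is None:
        raise FileNotFoundError("officemd CLI not found on PATH")
    result = subprocess.run(
        build_markdown_command(path, sheets),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Building the argument list separately keeps it testable without the CLI installed. Passing a list instead of a shell string also avoids quoting problems with sheet names that contain spaces.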

SDK

from pathlib import Path
from officemd import extract_ir_json, markdown_from_bytes, docling_from_bytes

content = Path("report.docx").read_bytes()

# Markdown
print(markdown_from_bytes(content, format="docx"))

# Structured JSON IR
print(extract_ir_json(content, format="docx"))

# Docling JSON
print(docling_from_bytes(content, format="docx"))
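The same functions can be applied to a whole folder. This is a minimal sketch. It assumes that the `format` argument accepts the lowercase extension without the dot for every format in the Supported Formats table, although only `"docx"` is shown above. It also assumes that `markdown_from_bytes` returns a `str`. `infer_format` and `convert_folder` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

# Extensions listed in the Supported Formats table
SUPPORTED_SUFFIXES = {".docx", ".xlsx", ".csv", ".pptx", ".pdf"}


def infer_format(path: Path) -> Optional[str]:
    """Map a file suffix to a format string, or None if unsupported.

    Assumes the format name is the lowercase extension without the dot.
    """
    suffix = path.suffix.lower()
    return suffix[1:] if suffix in SUPPORTED_SUFFIXES else None


def convert_folder(src: Path, dst: Path) -> list[Path]:
    """Write one markdown file per supported document in src; return the written paths."""
    # Imported lazily so infer_format works without the package installed
    from officemd import markdown_from_bytes

    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.iterdir()):
        fmt = infer_format(path)
        if fmt is None or not path.is_file():
            continue  # skip folders and unsupported files
        markdown = markdown_from_bytes(path.read_bytes(), format=fmt)
        # Keep the original extension so report.docx and report.pdf do not collide
        out = dst / f"{path.name}.md"
        out.write_text(markdown, encoding="utf-8")
        written.append(out)
    return written
```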

Supported Formats

Format      Extension  Markdown  JSON IR  Docling
Word        .docx      yes       yes      yes
Excel       .xlsx      yes       yes      yes
CSV         .csv       yes       yes      -
PowerPoint  .pptx      yes       yes      yes
PDF         .pdf       yes       yes      -
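The table can be encoded as a lookup. A caller can then check support before choosing an output, for example to avoid requesting Docling output for CSV or PDF. This is a minimal sketch. `CAPABILITIES`, `supports`, and `formats_with` are hypothetical names, and the data only mirrors the table above.

```python
# Output support per input format, mirroring the Supported Formats table
CAPABILITIES: dict[str, frozenset[str]] = {
    "docx": frozenset({"markdown", "json_ir", "docling"}),
    "xlsx": frozenset({"markdown", "json_ir", "docling"}),
    "csv": frozenset({"markdown", "json_ir"}),
    "pptx": frozenset({"markdown", "json_ir", "docling"}),
    "pdf": frozenset({"markdown", "json_ir"}),
}


def supports(fmt: str, output: str) -> bool:
    """Return True if the table lists `output` as available for `fmt`."""
    return output in CAPABILITIES.get(fmt.lower(), frozenset())


def formats_with(output: str) -> list[str]:
    """List input formats that support a given output, in table order."""
    return [fmt for fmt, outputs in CAPABILITIES.items() if output in outputs]
```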

License

MIT

Download files

Download the file for your platform.

Source Distribution

officemd-0.1.0.tar.gz (1.4 MB)

Uploaded Source

Built Distributions

officemd-0.1.0-cp312-abi3-win_amd64.whl (2.2 MB)

Uploaded CPython 3.12+, Windows x86-64

officemd-0.1.0-cp312-abi3-manylinux_2_34_x86_64.whl (2.3 MB)

Uploaded CPython 3.12+, manylinux: glibc 2.34+ x86-64

officemd-0.1.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.1 MB)

Uploaded CPython 3.12+, manylinux: glibc 2.17+ ARM64

officemd-0.1.0-cp312-abi3-macosx_11_0_arm64.whl (2.1 MB)

Uploaded CPython 3.12+, macOS 11.0+ ARM64

officemd-0.1.0-cp312-abi3-macosx_10_12_x86_64.whl (2.2 MB)

Uploaded CPython 3.12+, macOS 10.12+ x86-64
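The wheel file names follow the standard `{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl` layout, and an optional build tag can appear after the version. Here `cp312-abi3` marks a single wheel built against the CPython stable ABI, which is usable on CPython 3.12 and later. A dotted platform tag such as `manylinux_2_17_aarch64.manylinux2014_aarch64` lists several compatible platforms. Below is a minimal stdlib sketch that splits such a name. `parse_wheel_name` is a hypothetical helper, and `parse_wheel_filename` from the `packaging` library is the more complete option.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    name: str
    version: str
    python_tags: list[str]
    abi_tags: list[str]
    platform_tags: list[str]


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its tag components; an optional build tag is skipped."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        parts.pop(2)  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name: {filename}")
    name, version, py, abi, plat = parts
    # Each tag field may hold several values separated by dots
    return WheelName(name, version, py.split("."), abi.split("."), plat.split("."))
```

For example, `parse_wheel_name("officemd-0.1.0-cp312-abi3-win_amd64.whl").abi_tags` gives `["abi3"]`. On the interpreter side, `sys.version_info >= (3, 12)` tells you whether a `cp312-abi3` wheel applies.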

File details

Details for the file officemd-0.1.0.tar.gz.

File metadata

  • Download URL: officemd-0.1.0.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for officemd-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2269007b17f0e462baf4fcfda471a8c24b735dcc43ea408d6bd4027324a87321
MD5 cd4f4a01229f73285c02c5307d2e2de7
BLAKE2b-256 aa4c178de1e16e93d6ddebf4be4fe7f3cb454f520631080504ec242dd88ef233
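To check a downloaded file against the SHA256 digest listed above, hash it locally and compare the two values. Below is a minimal stdlib sketch; `sha256_of` and `verify_sha256` are hypothetical helpers. pip can enforce the same check at install time with `--require-hashes`.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare the file's digest with the published one (case- and whitespace-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `verify_sha256(Path("officemd-0.1.0.tar.gz"), "2269007b17f0e462baf4fcfda471a8c24b735dcc43ea408d6bd4027324a87321")` should return `True` for an intact download of the source distribution.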

Provenance

The following attestation bundles were made for officemd-0.1.0.tar.gz:

Publisher: release.yml on ThomAub/officemd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file officemd-0.1.0-cp312-abi3-win_amd64.whl.

File metadata

  • Download URL: officemd-0.1.0-cp312-abi3-win_amd64.whl
  • Upload date:
  • Size: 2.2 MB
  • Tags: CPython 3.12+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for officemd-0.1.0-cp312-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 ccc0ede560117e0a15bacb19fa0ad9c9c86fc07e4b3fbf0bca80f946c62b4b1b
MD5 4ce9723575e5363d18c4cb683e359c7a
BLAKE2b-256 de9b19d9ff0d03e34f96754109345db1ddcdbb66208478368cd37e10f192881d

Provenance

The following attestation bundles were made for officemd-0.1.0-cp312-abi3-win_amd64.whl:

Publisher: release.yml on ThomAub/officemd

File details

Details for the file officemd-0.1.0-cp312-abi3-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for officemd-0.1.0-cp312-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 e285424e462b950ce1b9465bec569098e7631e446ca49084721a2f36e2c5328e
MD5 3a964183378486c5db9cb184364e0103
BLAKE2b-256 485c93082bc07046b1feb52e420e1a7cab781f6beae93e38630b1e324c411c15

Provenance

The following attestation bundles were made for officemd-0.1.0-cp312-abi3-manylinux_2_34_x86_64.whl:

Publisher: release.yml on ThomAub/officemd

File details

Details for the file officemd-0.1.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for officemd-0.1.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 2a15a7b486bd7a6130c4e99cf3e1f899158b647dd0c67ab9de6ef8d82a3bfc61
MD5 05c9b4e1149da72d560e5bb468d81593
BLAKE2b-256 b292fd02d6521937226ddfa24fbe8f6caa04553f168aec350e4a7d0dc6e0d84e

Provenance

The following attestation bundles were made for officemd-0.1.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on ThomAub/officemd

File details

Details for the file officemd-0.1.0-cp312-abi3-macosx_11_0_arm64.whl.

File metadata

  • Download URL: officemd-0.1.0-cp312-abi3-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 2.1 MB
  • Tags: CPython 3.12+, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 (uv publish, macOS)

File hashes

Hashes for officemd-0.1.0-cp312-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 88994107d5c4c8457decac5454d02b40be5621e6f78d5f6d2109a44407af28ab
MD5 12eb4f0359ba3f6e5b1f13f28867c009
BLAKE2b-256 232de275180b661b9b78e9e8b579d2b1c19b14b54163afa93a98ea2a7e05e256

File details

Details for the file officemd-0.1.0-cp312-abi3-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for officemd-0.1.0-cp312-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ca6439f6fe7ee2c1087125a54da753490e033ca114cdf5a000a8c4030fd8f27d
MD5 3131e7cb59571bd3168dbc441d75d96d
BLAKE2b-256 6f7380047f6e7ceaa0a9936ad564a7d6070b7e1906e9abc9fbf3ffbea92fa953

Provenance

The following attestation bundles were made for officemd-0.1.0-cp312-abi3-macosx_10_12_x86_64.whl:

Publisher: release.yml on ThomAub/officemd
