A markitdown plugin used to customize markdownify

Project description

markitdown-office-extension

A markitdown plugin used to customize markdownify. It supports the docx, pptx, xlsx, epub, and html file formats.

Usage

from typing import Any, Optional
from io import BytesIO
try:
    from markitdown import MarkItDown
except ImportError:
    from markitdown_no_magika import MarkItDown
from markitdown_office_extension.markdown_converter import MarkdownConverter


class CustomMarkdownConverter(MarkdownConverter):
    def convert_img(
        self,
        el: Any,
        text: str,
        convert_as_inline: Optional[bool] = False,
        **kwargs,
    ) -> str:
        if (src := el.attrs.get("src", None)) is not None:
            # Process the extracted image here, e.g. upload it to S3.
            # This example only prints the image attributes.
            print("image alt: {alt}, title: {title}, src: {src}".format(
                alt=el.attrs.get("alt", ""),
                title=el.attrs.get("title", ""),
                src=src,
            ))

            # ... or modify an image attribute such as `src`.
            el.attrs["src"] = "https://example.com/assets/example.png"

            # If keep_data_uris is unset or False,
            # markitdown will not output the full image URI.
            kwargs["keep_data_uris"] = True

        return super().convert_img(el, text, convert_as_inline, **kwargs)

converter = MarkItDown(enable_plugins=True, markdownify=CustomMarkdownConverter)
document = converter.convert(BytesIO(bytes(
    "![title](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAArYAAAOdCAYAAABwHy)",
    encoding="utf-8"
)))

print(document) # ![](https://example.com/assets/example.png)
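
The "process extracted image" step above usually starts by turning the data URI in `src` into raw bytes. The following is a minimal sketch of that step using only the standard library; `decode_data_uri` is a hypothetical helper and not part of markitdown or this plugin, and it assumes the image arrives as a base64-encoded data URI.

```python
import base64
import binascii
from typing import Optional, Tuple


def decode_data_uri(src: str) -> Optional[Tuple[str, bytes]]:
    """Return (mime_type, raw_bytes) for a base64 data URI, else None.

    Hypothetical helper for illustration; not provided by the plugin.
    """
    if not src.startswith("data:"):
        return None  # an ordinary URL or path, nothing to decode
    header, sep, payload = src[len("data:"):].partition(",")
    if not sep or not header.endswith(";base64"):
        return None  # malformed, or not base64-encoded
    mime_type = header[: -len(";base64")] or "text/plain"
    try:
        return mime_type, base64.b64decode(payload, validate=True)
    except binascii.Error:
        return None  # truncated or otherwise invalid base64


decoded = decode_data_uri("data:image/png;base64,aGVsbG8=")
if decoded is not None:
    mime_type, raw = decoded
    print(mime_type, len(raw))
```

Inside `convert_img`, such a helper could be called on `src` before uploading the bytes and replacing `el.attrs["src"]` with the resulting URL. Returning `None` for anything that is not a valid base64 data URI lets the caller leave ordinary image links untouched.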

Download files

Source Distribution

markitdown_office_extension-0.1.2.tar.gz (22.6 kB view details)

Uploaded Source

Built Distribution

markitdown_office_extension-0.1.2-py3-none-any.whl (7.4 kB view details)

Uploaded Python 3

File details

Details for the file markitdown_office_extension-0.1.2.tar.gz.

File hashes

Hashes for markitdown_office_extension-0.1.2.tar.gz
Algorithm Hash digest
SHA256 38cae4da3f075a216b6bdcc4fabee00b866bf987d0b72202e588e75942211986
MD5 bbee67ebfc7b7b8ce36c4d0ff92b4f37
BLAKE2b-256 710e3e901a1f0583e56257baab0852b94c1648aca6d2993702f7ac66d8d9455e

Provenance

The following attestation bundles were made for markitdown_office_extension-0.1.2.tar.gz:

Publisher: publish.yml on qzwxsaedc/markitdown-office-extension

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file markitdown_office_extension-0.1.2-py3-none-any.whl.

File hashes

Hashes for markitdown_office_extension-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4959b490e9fd0702dbad60c3bfa0e8f1c7764b6034337e6ada64a23826bba468
MD5 6f847a8cd302366a4302b668b40eae61
BLAKE2b-256 a9d8a5baa833f9d3dce8a628ad772620391d17168c0a3ab4f4ba7e9abb4ece0e

Provenance

The following attestation bundles were made for markitdown_office_extension-0.1.2-py3-none-any.whl:

Publisher: publish.yml on qzwxsaedc/markitdown-office-extension
