Extract images from markdown files and highlight text chunks with bounding boxes

Markitdown Reference Image

A Python package for extracting images from markdown files and highlighting specific text chunks with bounding boxes.

Features

  • Find specific text chunks in markdown content
  • Convert markdown to HTML and capture as image
  • Draw bounding boxes around found text chunks
  • Optionally add scores to the bounding boxes
  • Save processed images to specified paths or temporary files
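The first feature, finding a text chunk in markdown content, can be illustrated with the standard library alone. The `find_chunk` helper below is hypothetical and not part of this package's API. It assumes that matching ignores case and differences in whitespace, which may not be how the package itself matches text.

```python
import re
from typing import Optional, Tuple


def find_chunk(markdown_text: str, chunk_text: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) character offsets of chunk_text, or None.

    Hypothetical helper: any run of whitespace in the chunk matches any run
    of whitespace in the document, and the comparison ignores case.
    """
    words = chunk_text.split()
    if not words:
        return None
    # Escape each word so punctuation is matched literally, then allow
    # any whitespace (including newlines) between consecutive words.
    pattern = r"\s+".join(re.escape(word) for word in words)
    match = re.search(pattern, markdown_text, flags=re.IGNORECASE)
    return match.span() if match else None


document = "# Notes\n\nThis line holds important\ninformation for readers.\n"
span = find_chunk(document, "Important   information")
print(span)
```

Tolerating line breaks matters here because a chunk returned by a retriever is often re-wrapped relative to the source file.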

Installation

pip install markitdown-reference-image

Usage

Basic Usage

Extract and highlight text from a markdown file:

from markitdown_reference_image import MarkitdownImageExtractor

# Initialize the extractor
extractor = MarkitdownImageExtractor()

# Extract image with highlighted text
image_path = extractor.extract_with_highlight(
    markdown_file="test_document.md",
    chunk_text="important information that we want to highlight",
    output_path="assets/example_basic.png",
    score=0.95
)

print(f"Image saved to: {image_path}")

Result: [image: Basic Example]


With Score Display

Add a similarity score to show retrieval confidence (great for RAG systems):

image_path = extractor.extract_with_highlight(
    markdown_file="test_document.md",
    chunk_text="Text Finding: Find specific text chunks in markdown content",
    output_path="assets/example_with_score.png",
    score=0.88  # RAG similarity score
)

Result: [image: With Score]
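In a RAG pipeline the score usually comes from the retriever, along with several candidate chunks. As a minimal sketch, the hypothetical `highlight_best_chunk` helper below picks the highest-scoring chunk and forwards it to `extract_with_highlight` using only the keyword arguments shown above. The helper name and the `(chunk_text, score)` pair format are assumptions made for illustration.

```python
from typing import Any, Iterable, Tuple


def highlight_best_chunk(
    extractor: Any,
    markdown_file: str,
    scored_chunks: Iterable[Tuple[str, float]],
    output_path: str,
) -> str:
    """Highlight the highest-scoring chunk and return the image path.

    `extractor` is expected to expose extract_with_highlight() with the
    keyword arguments used in this README. Raises ValueError if
    `scored_chunks` is empty.
    """
    # Pick the (chunk_text, score) pair with the largest score.
    chunk_text, score = max(scored_chunks, key=lambda pair: pair[1])
    return extractor.extract_with_highlight(
        markdown_file=markdown_file,
        chunk_text=chunk_text,
        output_path=output_path,
        score=score,
    )
```

With the package installed, a call might look like `highlight_best_chunk(MarkitdownImageExtractor(), "test_document.md", retrieved, "assets/best.png")`, where `retrieved` is your own list of `(chunk_text, score)` pairs.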


Custom Styling

Customize colors and appearance to match your brand:

image_path = extractor.extract_with_highlight(
    markdown_file="test_document.md",
    chunk_text="Image Extraction",
    output_path="assets/example_custom_styling.png",
    score=0.92,
    box_color=(0, 255, 0),        # Green box
    box_width=5,                   # Thicker border
    score_color=(255, 255, 0),     # Yellow text
    score_bg_color=(0, 128, 0)     # Dark green background
)

Result: [image: Custom Styling]
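Brand colors are often specified as hex strings, while the color arguments above take `(R, G, B)` tuples. The `hex_to_rgb` helper below is a hypothetical convenience, not something the package provides, that converts one form to the other.

```python
import string
from typing import Tuple


def hex_to_rgb(color: str) -> Tuple[int, int, int]:
    """Convert '#RRGGBB', 'RRGGBB', or '#RGB' to an (R, G, B) tuple."""
    value = color.strip().lstrip("#")
    if len(value) == 3:
        # Expand shorthand such as '0f0' to '00ff00'.
        value = "".join(ch * 2 for ch in value)
    if len(value) != 6 or any(ch not in string.hexdigits for ch in value):
        raise ValueError(f"Not a valid hex color: {color!r}")
    red, green, blue = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return (red, green, blue)


print(hex_to_rgb("#00ff00"))
```

The result can then be passed directly, for example `box_color=hex_to_rgb("#00ff00")`.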


Using Temporary Output

If you omit output_path, the image is written to a temporary file and its path is returned:

# Temporary file will be created automatically
image_path = extractor.extract_with_highlight(
    markdown_file="test_document.md",
    chunk_text="visual references from markdown"
    # No output_path specified - uses temporary file
)

print(f"Temporary image: {image_path}")
# Example: /tmp/tmp8x3k9m2p.png

Result: [image: Temporary Output]
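This README does not say whether the package removes its temporary files, so callers who need control over cleanup may prefer to create the temporary path themselves and pass it as `output_path`. The sketch below uses only the standard `tempfile` module; `make_temp_png_path` is a hypothetical helper, and this is not necessarily how the package creates its own temporary file.

```python
import os
import tempfile


def make_temp_png_path() -> str:
    """Create an empty .png file in the system temp directory and return its path."""
    handle = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    # Close the handle so another writer can open the path (needed on Windows).
    handle.close()
    return handle.name


path = make_temp_png_path()
print(path)
# Pass `path` as output_path, use the resulting image, then clean up.
os.remove(path)
```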


Command Line Interface

Quick testing and automation via CLI:

# Basic usage
markitdown-extract test_document.md "important information" -o output.png

# With score
markitdown-extract test_document.md "Text Finding" -o output.png -s 0.88

# Custom styling
markitdown-extract test_document.md "Image Extraction" \
  -o output.png \
  -s 0.92 \
  --box-color 0 255 0 \
  --box-width 5 \
  --score-color 255 255 0 \
  --score-bg-color 0 128 0
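For automation from Python, the same command can be assembled as an argument list and handed to `subprocess.run`. The sketch below uses only the flags shown above (`-o`, `-s`, `--box-color`, `--box-width`); `build_extract_command` is a hypothetical helper, and the snippet only prints the command rather than running it.

```python
import shlex
from typing import List, Optional, Sequence


def build_extract_command(
    markdown_file: str,
    chunk_text: str,
    output_path: str,
    score: Optional[float] = None,
    box_color: Optional[Sequence[int]] = None,
    box_width: Optional[int] = None,
) -> List[str]:
    """Build an argument list for the markitdown-extract CLI."""
    command = ["markitdown-extract", markdown_file, chunk_text, "-o", output_path]
    if score is not None:
        command += ["-s", str(score)]
    if box_color is not None:
        # The CLI takes the color as three separate integers.
        command += ["--box-color", *(str(channel) for channel in box_color)]
    if box_width is not None:
        command += ["--box-width", str(box_width)]
    return command


command = build_extract_command(
    "test_document.md", "Image Extraction", "output.png",
    score=0.92, box_color=(0, 255, 0), box_width=5,
)
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing a list avoids shell quoting problems when the chunk text contains spaces or quotes.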

Examples

The package includes comprehensive examples in the markitdown_reference_image/examples/ directory:

Quick Example Runner

python run_examples.py

Individual Examples

# Basic usage examples
python -m markitdown_reference_image.examples.basic_extraction
python -m markitdown_reference_image.examples.with_score
python -m markitdown_reference_image.examples.custom_styling

# Advanced usage examples  
python -m markitdown_reference_image.examples.batch_processing
python -m markitdown_reference_image.examples.component_usage
python -m markitdown_reference_image.examples.error_handling

# Command-line examples
python -m markitdown_reference_image.examples.cli_basic
python -m markitdown_reference_image.examples.cli_with_score
python -m markitdown_reference_image.examples.cli_custom_styling

Available Examples:

Basic Usage:

  • basic_extraction.py - Basic image extraction
  • with_score.py - Adding scores to bounding boxes
  • custom_styling.py - Custom styling options

Advanced Usage:

  • batch_processing.py - Batch processing multiple files
  • component_usage.py - Using individual components
  • error_handling.py - Proper error handling

Command Line:

  • cli_basic.py - Basic CLI usage
  • cli_with_score.py - CLI with score display
  • cli_custom_styling.py - CLI with custom styling
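The batch processing and error handling topics can be combined in a few lines. The sketch below is not the contents of the bundled example files; `highlight_batch` is a hypothetical helper that calls `extract_with_highlight` once per file and records failures instead of stopping. Because this README does not list the exceptions the package raises, it catches `Exception` broadly, which is an assumption you may want to narrow.

```python
from pathlib import Path
from typing import Any, Dict, Iterable, List, Tuple


def highlight_batch(
    extractor: Any,
    jobs: Iterable[Tuple[str, str]],
    output_dir: str,
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Process (markdown_file, chunk_text) pairs.

    Returns (saved, failed): `saved` maps each markdown file to its image
    path, and `failed` lists (markdown_file, error message) pairs.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved: Dict[str, str] = {}
    failed: List[Tuple[str, str]] = []
    for markdown_file, chunk_text in jobs:
        # Name each image after its source file, e.g. notes.md -> notes.png.
        target = out / (Path(markdown_file).stem + ".png")
        try:
            saved[markdown_file] = extractor.extract_with_highlight(
                markdown_file=markdown_file,
                chunk_text=chunk_text,
                output_path=str(target),
            )
        except Exception as error:  # Assumption: exception types are not documented here.
            failed.append((markdown_file, str(error)))
    return saved, failed
```

Note that two input files with the same stem in different directories would map to the same output name under this scheme.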

Development

Setup

  1. Clone the repository:
git clone https://github.com/yourusername/markitdown-reference-image.git
cd markitdown-reference-image
  2. Install in development mode:
pip install -e .
  3. Install development dependencies:
pip install -r requirements-dev.txt

Running Tests

pytest

Code Formatting

black .

Linting

flake8 .
mypy .

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for your changes
  5. Run the test suite
  6. Submit a pull request

Changelog

See CHANGELOG.md for a list of changes.
