A library for Visual Document Testing

robotframework-doctestlibrary


Robot Framework DocTest library.
Simple Automated Visual Document Testing.

See keyword documentation for

*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Compare two Images and highlight differences
    Compare Images    Reference.jpg    Candidate.jpg

Optional LLM for image comparison

*** Settings ***
Library    DocTest.Ai
Library    DocTest.VisualTest
Library    DocTest.PdfTest

*** Test Cases ***
Review Visual Differences With LLM
    Compare Images With LLM    Reference.pdf    Candidate.pdf    llm_override=${True}

Extract Text From Document With LLM
    ${text}=    Get Text With LLM    Candidate.pdf    prompt=Return text and table contents
    Log    ${text}

Image Should Contain Object With LLM
    Image Should Contain    Candidate.png    Missing product logo

Count Items With LLM
    ${count}=    Get Item Count From Image    Candidate.png    item_description=number of pallets
    Should Be True    ${count} >= 0

DocTest Library presentation at robocon.io 2021

Installation instructions

pip install --upgrade robotframework-doctestlibrary

Optional LLM-Assisted Comparisons

You can optionally rely on a large language model to review detected differences and decide whether a comparison should pass. This path is fully opt-in; nothing changes for users who skip these dependencies.

  1. Install the optional extra only when needed:

    pip install "robotframework-doctestlibrary[ai]"
    
  2. Create a .env file at the repository root (values here override existing environment variables):

    # OpenAI-compatible endpoints
    OPENAI_API_KEY=sk-...
    DOCTEST_LLM_MODEL=gpt-5,gpt-4o
    DOCTEST_LLM_VISION_MODEL=gpt-5-mini,gpt-4o-mini
    

    Azure OpenAI deployments can be configured with:

    AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
    AZURE_OPENAI_API_KEY=...
    AZURE_OPENAI_DEPLOYMENT=gpt-4o
    AZURE_OPENAI_API_VERSION=2024-06-01
    DOCTEST_LLM_PROVIDER=azure
    

    See .env.example for a combined template covering both providers.

  3. Use the dedicated keywords (or pass llm_enabled=${True} to existing ones):

    *** Test Cases ***
    Review Visual Differences With LLM
        Compare Images With LLM    Reference.pdf    Candidate.pdf    llm_override=${True}
    
    Review Pdf Structure With LLM
        Compare Pdf Documents With LLM    reference.pdf    candidate.pdf    compare=structure
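
Before running the LLM keywords, it can help to check that the variables from step 2 are actually set. The following is a minimal Python sketch, not part of the library: the helper name missing_llm_settings is hypothetical, the variable names are copied from the .env examples above, and treating every listed variable as required (and defaulting to the OpenAI-compatible path when DOCTEST_LLM_PROVIDER is unset) are assumptions.

```python
import os

# Variable names come from the .env examples above. Treating all of them as
# required is an assumption of this sketch, not documented library behaviour.
REQUIRED = {
    "openai": ["OPENAI_API_KEY"],
    "azure": [
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_DEPLOYMENT",
        "AZURE_OPENAI_API_VERSION",
    ],
}


def missing_llm_settings(env=None):
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if env is None else env
    # Assumption: no DOCTEST_LLM_PROVIDER means the OpenAI-compatible path.
    provider = env.get("DOCTEST_LLM_PROVIDER", "openai").lower()
    return [name for name in REQUIRED.get(provider, []) if not env.get(name)]
```

Calling missing_llm_settings() without arguments inspects the current process environment; passing a dict makes the check easy to try out in isolation.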
    

Set llm_override=${True} when an LLM approval should override SSIM/DeepDiff failures. Without the override flag the AI feedback is logged for investigation while the original assertion result is preserved.
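
The effect of the override flag described above can be summarised as a small truth table. This Python sketch only models the described behaviour; the function name final_verdict and its parameters are hypothetical and do not mirror the library's internals.

```python
def final_verdict(comparison_passed: bool, llm_approved: bool,
                  llm_override: bool = False) -> bool:
    """Model of the described behaviour (illustrative, not the library's code).

    - A passing SSIM/DeepDiff comparison stays a pass.
    - A failing comparison only becomes a pass when the LLM approves
      AND llm_override is enabled.
    - Without the override, the LLM feedback is informational only.
    """
    if comparison_passed:
        return True
    return llm_override and llm_approved
```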

Pass llm_prompt= (or the specialty variants llm_visual_prompt= / llm_pdf_prompt=) to customise the prompt sent to the model for a particular comparison.

Only Python 3.X or newer is supported. Tested with Python 3.8/3.11/3.12

Install robotframework-doctestlibrary

Installation via pip from PyPI (recommended)

  • pip install --upgrade robotframework-doctestlibrary

Installation via pip from GitHub

  • pip install git+https://github.com/manykarim/robotframework-doctestlibrary.git

or

  • git clone https://github.com/manykarim/robotframework-doctestlibrary.git
  • cd robotframework-doctestlibrary
  • pip install -e .

Install dependencies

Install Tesseract, Ghostscript, GhostPCL, ImageMagick binaries and barcode libraries (libdmtx, zbar) on your system.
Hint: Since 0.2.0, Ghostscript, GhostPCL and ImageMagick are only needed for rendering .ps and .pcl files.
Rendering and content parsing of .pdf files is done via MuPDF.
In the future there might be a separate PyPI package for .pcl and .ps files to get rid of those dependencies.

Linux

apt-get install imagemagick tesseract-ocr ghostscript libdmtx0b libzbar0

Windows

Some special instructions for Windows

Rename executable for GhostPCL to pcl6.exe (only needed for .pcl support)

The executable for GhostPCL gpcl6win64.exe needs to be renamed to pcl6.exe

Otherwise it will not be possible to render .pcl files successfully for visual comparison.

Add Tesseract, Ghostscript and ImageMagick to the system path in Windows (only needed for OCR, .pcl and .ps support)

  • C:\Program Files\ImageMagick-7.0.10-Q16-HDRI
  • C:\Program Files\Tesseract-OCR
  • C:\Program Files\gs\gs9.53.1\bin
  • C:\Program Files\gs\ghostpcl-9.53.1-win64

(The folder names and versions on your system might be different)

That means: When you open the CMD shell you can run the commands

  • magick.exe
  • tesseract.exe
  • gswin64.exe
  • pcl6.exe

successfully from any folder/location
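
The PATH requirement above can also be checked from Python with the standard library. This is a small sketch, not part of the library; the helper name find_tools is hypothetical, and the executable names are the Windows ones listed above (on Windows, shutil.which also tries the usual extensions such as .exe).

```python
import shutil

# Executable names as listed above (Windows naming).
TOOLS = ["magick", "tesseract", "gswin64", "pcl6"]


def find_tools(names=TOOLS):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```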

Windows error message regarding pylibdmtx

How to solve ImportError for pylibdmtx

If you see an ugly ImportError when importing pylibdmtx on Windows you will most likely need the Visual C++ Redistributable Packages for Visual Studio 2013. Install vcredist_x64.exe if using 64-bit Python, vcredist_x86.exe if using 32-bit Python.

ImageMagick

The library might return the error File could not be converted by ImageMagick to OpenCV Image: <path to the file> when comparing PDF files. This is due to ImageMagick permissions. Verify this as follows with the sample.pdf in the testdata directory:

convert sample.pdf sample.jpg 
convert-im6.q16: attempt to perform an operation not allowed by the security policy

The solution is to copy the policy.xml from the repository to the ImageMagick installation directory.

Docker

You can also use the docker images or create your own Docker image:

docker build -t robotframework-doctest .

Afterwards you can, e.g., start the container and run the provided examples like this:

  • Windows
    • docker run -t -v "%cd%":/opt/test -w /opt/test robotframework-doctest robot atest/Compare.robot
  • Linux
    • docker run -t -v $PWD:/opt/test -w /opt/test robotframework-doctest robot atest/Compare.robot

Gitpod.io

Try out the library using Gitpod

Examples

Have a look at

for more examples.

Testing with Robot Framework

*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Compare two Images and highlight differences
    Compare Images    Reference.jpg    Candidate.jpg

Use masks/placeholders to exclude parts from visual comparison

*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Compare two Images and ignore parts by using masks
    Compare Images    Reference.jpg    Candidate.jpg    placeholder_file=masks.json

Compare two PDF Documents and ignore parts by using masks
    Compare Images    Reference.pdf    Candidate.pdf    placeholder_file=masks.json

Compare two Farm images with date pattern
    Compare Images    Reference.jpg    Candidate.jpg    placeholder_file=testdata/pattern_mask.json

Compare two Farm images with area mask as list
    ${top_mask}    Create Dictionary    page=1    type=area    location=top    percent=10
    ${bottom_mask}    Create Dictionary    page=all    type=area    location=bottom    percent=10
    ${masks}    Create List    ${top_mask}    ${bottom_mask}
    Compare Images    Reference.jpg    Candidate.jpg    mask=${masks}

Compare two Farm images with area mask as string
    Compare Images    Reference.jpg    Candidate.jpg    mask=top:10;bottom:10
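
Judging from the two examples above, the string form top:10;bottom:10 appears to be shorthand for a list of area masks. The following Python sketch spells out that presumed mapping. It is an assumption based only on these examples: the function parse_area_mask is hypothetical, and the default page value "all" is a guess, not documented behaviour.

```python
def parse_area_mask(mask: str, page="all"):
    """Expand 'location:percent;location:percent' into area-mask dictionaries.

    Illustrative only: mirrors the dictionary keys used in the list example
    above (page, type, location, percent). The default page is an assumption.
    """
    masks = []
    for part in mask.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        location, percent = part.split(":")
        masks.append({
            "page": page,
            "type": "area",
            "location": location.strip(),
            "percent": int(percent),
        })
    return masks
```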

Different Mask Types to Ignore Parts When Comparing

Areas, Coordinates, Text Patterns
[
    {
    "page": "all",
    "name": "Date Pattern",
    "type": "pattern",
    "pattern": ".*[0-9]{2}-[a-zA-Z]{3}-[0-9]{4}.*"
    },
    {
    "page": "1",
    "name": "Top Border",
    "type": "area",
    "location": "top",
    "percent":  5
    },
    {
    "page": "1",
    "name": "Left Border",
    "type": "area",
    "location": "left",
    "percent":  5
    },
    {
    "page": 1,
    "name": "Top Rectangle",
    "type": "coordinates",
    "x": 0,
    "y": 0,
    "height": 10,
    "width": 210,
    "unit": "mm"
    }
]
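
A mask file like the one above can be generated and sanity-checked with a few lines of Python. This is a sketch under stated assumptions: the helpers write_masks and pattern_matches are hypothetical, and using re.match to test the pattern is an assumption about how a pattern mask is applied to extracted text.

```python
import json
import re

# Two of the mask entries shown above.
masks = [
    {"page": "all", "name": "Date Pattern", "type": "pattern",
     "pattern": ".*[0-9]{2}-[a-zA-Z]{3}-[0-9]{4}.*"},
    {"page": "1", "name": "Top Border", "type": "area",
     "location": "top", "percent": 5},
]


def pattern_matches(mask, text):
    """Check whether a 'pattern' mask would match a given line of text."""
    return re.match(mask["pattern"], text) is not None


def write_masks(path, mask_list):
    """Write the mask list as JSON, e.g. to masks.json."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(mask_list, handle, indent=4)
```

Testing the regular expression against a sample line before running a comparison helps catch patterns that never match.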

Accept visual differences by checking move distance or text content

*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Accept if parts are moved up to 20 pixels by pure visual check
    Compare Images    Reference.jpg    Candidate.jpg    move_tolerance=20

Accept if parts are moved up to 20 pixels by reading PDF Data
    Compare Images    Reference.pdf    Candidate.pdf    move_tolerance=20    get_pdf_content=${true}

Accept differences if text content is the same via OCR
    Compare Images    Reference.jpg    Candidate.jpg    check_text_content=${true}

Accept differences if text content is the same from PDF Data
    Compare Images    Reference.pdf    Candidate.pdf    check_text_content=${true}    get_pdf_content=${true}

Different options to detect moved parts/objects

*** Settings ***
Library    DocTest.VisualTest   movement_detection=orb

*** Test Cases ***
Accept if parts are moved up to 20 pixels by pure visual check
    Compare Images    Reference.jpg    Candidate.jpg    move_tolerance=20

*** Settings ***
Library    DocTest.VisualTest   movement_detection=template

*** Test Cases ***
Accept if parts are moved up to 20 pixels by pure visual check
    Compare Images    Reference.jpg    Candidate.jpg    move_tolerance=20

*** Settings ***
Library    DocTest.VisualTest   movement_detection=classic

*** Test Cases ***
Accept if parts are moved up to 20 pixels by pure visual check
    Compare Images    Reference.jpg    Candidate.jpg    move_tolerance=20

Options for taking additional screenshots, screenshot format and render resolution

Take additional screenshots of reference and candidate file.

*** Settings ***
Library    DocTest.VisualTest   take_screenshots=${true}    screenshot_format=png

Take diff screenshots to highlight differences

*** Settings ***
Library    DocTest.VisualTest   show_diff=${true}    DPI=300

Experimental usage of OpenCV EAST Text Detection to improve OCR

*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Compare two Farm images with date pattern and east detection
    Compare Images    Reference.jpg    Candidate.jpg    placeholder_file=masks.json    ocr_engine=east

Check content of PDF files

*** Settings ***
Library    DocTest.PdfTest

*** Test Cases ***
Check if list of strings exists in PDF File
    @{strings}=    Create List    First String    Second String
    PDF Should Contain Strings    ${strings}    Candidate.pdf
    
Compare two PDF Files and only check text content
    Compare Pdf Documents    Reference.pdf    Candidate.pdf    compare=text

Compare two PDF Files and only check text content and metadata
    Compare Pdf Documents    Reference.pdf    Candidate.pdf    compare=text,metadata

Compare two PDF Files and check all possible content
    Compare Pdf Documents    Reference.pdf    Candidate.pdf

Ignoring dynamic fields in PDF comparisons

Both Compare Pdf Documents and Compare Pdf Structure now share the same helper options for masking boilerplate data:

  • mask accepts inline JSON/dicts, Robot lists, or a path to a JSON file generated by the screenshot tooling. Add unit=pt when coordinates are already expressed in PDF points.
  • text_mask_patterns lets you provide one or more regular expressions (string or @{list}) to drop matching lines from both PDFs before diffing. This is handy for timestamps, order numbers, etc.
  • ignore_ligatures=${True} normalises ligature glyphs such as ﬁ/ﬂ to their ASCII representation so small font engine differences do not fail the comparison.

*** Settings ***
Library    DocTest.PdfTest

*** Test Cases ***
Compare invoices while ignoring IDs
    ${mask}=    Set Variable    {"pages":[{"page":1,"rectangles":[{"left":410,"top":36,"right":560,"bottom":78}],"unit":"pt"}]}
    Compare Pdf Structure    ${CURDIR}/baseline.pdf    ${CURDIR}/candidate.pdf    mask=${mask}
    ...    text_mask_patterns=\\bINV-\\d{6}\\b    ignore_ligatures=${True}    position_tolerance=5
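
To see what the text_mask_patterns and ignore_ligatures options are meant to achieve, here is a stand-alone Python illustration of the two ideas. It is not the library's implementation: the helpers drop_masked_lines and fold_ligatures are hypothetical, and using Unicode NFKC normalization for ligatures is a swapped-in technique chosen for this sketch.

```python
import re
import unicodedata


def drop_masked_lines(lines, patterns):
    """Remove every line that matches at least one regular expression."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [line for line in lines
            if not any(rx.search(line) for rx in compiled)]


def fold_ligatures(text):
    """Fold ligatures such as U+FB01 into plain letters.

    NFKC normalization is broader than ligature folding (it also maps other
    compatibility characters), so this is only an approximation.
    """
    return unicodedata.normalize("NFKC", text)


reference = ["Invoice INV-123456", "Total: 10.00"]
candidate = ["Invoice INV-654321", "Total: 10.00"]
pattern = [r"\bINV-\d{6}\b"]
# After masking, only the lines without an invoice ID remain to be compared.
same = drop_masked_lines(reference, pattern) == drop_masked_lines(candidate, pattern)
```

Note that in Python a raw string needs single backslashes, whereas Robot Framework data requires each backslash to be doubled.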

Normalizing special characters in PDF comparisons

PDFs may contain special Unicode characters that look identical to standard ASCII but fail string comparisons. Common examples include non-breaking spaces (\u00A0) instead of regular spaces, or typographic dashes (\u2013, \u2014) instead of hyphens.

Use the character_replacements parameter to normalize these characters:

*** Settings ***
# Apply character replacements to all keywords in the test suite
Library    DocTest.PdfTest    character_replacements={'\u00A0': ' '}
Library    DocTest.VisualTest    character_replacements={'\u00A0': ' ', '\u2013': '-'}

*** Test Cases ***
PDF comparison with normalized whitespace
    Compare Pdf Documents    reference.pdf    candidate.pdf    compare=text
    ...    character_replacements={'\u00A0': ' '}

Check PDF contains text with normalized characters
    PDF Should Contain Strings    Expected Text    candidate.pdf
    ...    character_replacements={'\u00A0': ' '}

Structure comparison with character normalization
    Compare Pdf Structure    reference.pdf    candidate.pdf
    ...    character_replacements={'\u00A0': ' ', '\u2013': '-'}

For VisualTest text extraction keywords that use assertion operators, use the Set Character Replacements keyword:

*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Get text with normalized characters
    Set Character Replacements    {'\u00A0': ' '}
    ${text}=    Get Text    document.pdf    ==    Expected text
    Set Character Replacements    ${NONE}    # Clear when done

Common character replacements:

Unicode    Description           Replacement
\u00A0     Non-breaking space    (regular space)
\u2013     En dash               - (hyphen)
\u2014     Em dash               - (hyphen)
\u2010     Unicode hyphen        - (hyphen)
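
The replacements listed above amount to a simple character translation. The following Python sketch shows the idea with the standard library only; the function normalize is hypothetical and merely illustrates the effect of a character_replacements mapping, not how the library applies it.

```python
# Mapping taken from the list of common replacements above.
REPLACEMENTS = {
    "\u00A0": " ",  # non-breaking space -> regular space
    "\u2013": "-",  # en dash -> hyphen
    "\u2014": "-",  # em dash -> hyphen
    "\u2010": "-",  # Unicode hyphen -> ASCII hyphen
}


def normalize(text, replacements=REPLACEMENTS):
    """Replace each special character by its ASCII counterpart."""
    return text.translate(str.maketrans(replacements))
```

Running extracted text through such a function is a quick way to find out whether a failing string comparison is caused by look-alike characters.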

Comparing PDFs with different page layouts (font/size changes)

When comparing PDF documents where font or size changes cause text to reflow across pages, use the ignore_page_boundaries option to compare only text content and order, ignoring page structure:

*** Settings ***
Library    DocTest.PdfTest

*** Test Cases ***
Compare PDFs ignoring page breaks due to font change
    [Documentation]    Reference has 2 pages, candidate has 3 pages due to larger font
    ...                Both contain identical text in the same order
    Compare Pdf Structure    reference.pdf    candidate_larger_font.pdf
    ...    ignore_page_boundaries=${True}

Compare PDF Documents with cross-page text matching
    Compare Pdf Documents    reference.pdf    candidate.pdf
    ...    compare=structure    ignore_page_boundaries=${True}
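
Conceptually, ignoring page boundaries means comparing one flat sequence of text lines instead of per-page content. The Python sketch below illustrates that idea only; the function names are hypothetical and the library's actual comparison works on richer structures than plain strings.

```python
def flatten_pages(pages):
    """Join the lines of all pages into one list, skipping empty lines."""
    return [line.strip() for page in pages for line in page if line.strip()]


def same_text_ignoring_page_breaks(reference_pages, candidate_pages):
    """True when both documents contain the same lines in the same order."""
    return flatten_pages(reference_pages) == flatten_pages(candidate_pages)


# Reference has 2 pages, candidate has 3 pages because of a larger font.
reference = [["Heading", "First paragraph", "Second paragraph"], ["Footer"]]
candidate = [["Heading", "First paragraph"], ["Second paragraph"], ["Footer"]]
```

With these sample documents the flattened text is identical although the page counts differ; swapping two lines in the candidate makes the check fail because the order is still compared.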

For finer control over what gets compared, use the check_geometry and check_block_count options:

Parameter                 Default     Description
ignore_page_boundaries    ${False}    Flatten text across all pages and compare only content and order. Automatically disables geometry and block count checks.
check_geometry            ${True}     When ${False}, skip line position/size comparison. Useful when layout differs but text matches.
check_block_count         ${True}     When ${False}, skip block count validation per page. Useful when text blocks are merged/split differently.

*** Settings ***
Library    DocTest.PdfTest

*** Test Cases ***
Compare PDF content only (ignore positions)
    [Documentation]    Same text content, but positions differ due to formatting
    Compare Pdf Structure    reference.pdf    reformatted.pdf
    ...    check_geometry=${False}

Compare PDF without block structure validation
    [Documentation]    Text blocks may be merged or split differently
    Compare Pdf Structure    reference.pdf    candidate.pdf
    ...    check_block_count=${False}

Pure content comparison (most flexible)
    [Documentation]    Only compare that text content and order match
    Compare Pdf Structure    reference.pdf    candidate.pdf
    ...    check_geometry=${False}    check_block_count=${False}

Note: When comparisons fail, a single summary warning is shown at the top of log.html (e.g., "Comparison failed: 5 difference(s) found"). Individual differences are logged as INFO messages within the keyword output for detailed inspection without cluttering the log summary.

Ignore Watermarks for Visual Comparisons

Store the watermark in a separate B/W image or PDF.
Watermark area needs to be filled with black color.
Watermark content will be subtracted from Visual Comparison result.

*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Compare two Images and ignore jpg watermark
    Compare Images    Reference.jpg    Candidate.jpg    watermark_file=Watermark.jpg

Compare two Images and ignore pdf watermark
    Compare Images    Reference.pdf    Candidate.pdf    watermark_file=Watermark.pdf

Compare two Images and ignore watermark folder
    Compare Images    Reference.pdf    Candidate.pdf    watermark_file=${CURDIR}${/}watermarks

Watermarks can also be passed on Library import. This setting will apply to all Test Cases in the Test Suite.

*** Settings ***
Library    DocTest.VisualTest   watermark_file=${CURDIR}${/}watermarks

*** Test Cases ***
Compare two Images and ignore watermarks
    Compare Images    Reference.jpg    Candidate.jpg

Get Text From Documents or Images

*** Settings ***
Library    DocTest.VisualTest
Library    Collections

*** Test Cases ***
Get Text Content And Compare
    ${text}    Get Text From Document    Reference.pdf
    List Should Contain Value    ${text}    Test String

Get Barcodes From Documents or Images

*** Settings ***
Library    DocTest.VisualTest
Library    Collections

*** Test Cases ***
Get Barcodes And Compare
    ${barcodes}    Get Barcodes From Document    reference.jpg
    List Should Contain Value    ${barcodes}    123456789

Using pabot to run tests in parallel

Document Testing can be run in parallel using pabot.
However, you need to pass the additional arguments --artifacts and --artifactsinsubfolders to the pabot command, to move the screenshots to the correct subfolder.
Otherwise the screenshots will not be visible in the log.html

pabot --testlevelsplit --processes 8 --artifacts png,jpg,pdf,xml --artifactsinsubfolders /path/to/your/tests/

Visual Testing of Web Applications

I experimented a bit and tried to use this library for Visual Testing of Web Applications.
Please have a look at this pilot example here

Development

Feel free to create issues or pull requests.
I'm always happy for any feedback.

Core team

In order of appearance.

  • Many Kasiriha
  • April Wang

Contributors

This project is community driven and becomes a reality only through the work of all the people who contribute.

Download files

Source Distribution

robotframework_doctestlibrary-0.31.0.tar.gz (96.6 kB)

  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.10.4 Linux/6.6.87.2-microsoft-standard-WSL2
  • SHA256: 75a95c635e9dc1bad1d5f16e91db8daf7c9fa020831b904205e094ed31fac200
  • MD5: 13e07f98294373d581919ee89fed3073
  • BLAKE2b-256: 415b8d41b1625613b0c97fa190aec039c7146a377b54097172947900d0ef65a9

Built Distribution

robotframework_doctestlibrary-0.31.0-py3-none-any.whl (106.2 kB, Python 3)

  • SHA256: 128ce372419a4fe1dff2a1661f9b8c6ca1a4457307c6c79d77ed74fb4613351b
  • MD5: 41b2f1c43367f681d336324d515d5e97
  • BLAKE2b-256: 5ae98ac5b3e7b9dc9fa11808f626c988f6e77e8ca40b63d60ebc0687bb77fe3b
