A library for Visual Document Testing
robotframework-doctestlibrary
Robot Framework DocTest library.
Simple Automated Visual Document Testing.
See keyword documentation for
*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Compare two Images and highlight differences
    Compare Images    Reference.jpg    Candidate.jpg
Optional LLM for image comparison
*** Settings ***
Library    DocTest.Ai
Library    DocTest.VisualTest
Library    DocTest.PdfTest

*** Test Cases ***
Review Visual Differences With LLM
    Compare Images With LLM    Reference.pdf    Candidate.pdf    llm_override=${True}

Extract Text From Document With LLM
    ${text}=    Get Text With LLM    Candidate.pdf    prompt=Return text and table contents
    Log    ${text}

Image Should Contain Object With LLM
    Image Should Contain    Candidate.png    Missing product logo

Count Items With LLM
    ${count}=    Get Item Count From Image    Candidate.png    item_description=number of pallets
    Should Be True    ${count} >= 0
Installation instructions
pip install --upgrade robotframework-doctestlibrary
Optional LLM-Assisted Comparisons
You can optionally rely on a large language model to review detected differences and decide whether a comparison should pass. This path is fully opt-in; nothing changes for users who skip these dependencies.
- Install the optional extra only when needed:
  pip install "robotframework-doctestlibrary[ai]"
- Create a .env file at the repository root (values here override existing environment variables):
  # OpenAI-compatible endpoints
  OPENAI_API_KEY=sk-...
  DOCTEST_LLM_MODEL=gpt-5,gpt-4o
  DOCTEST_LLM_VISION_MODEL=gpt-5-mini,gpt-4o-mini
  Azure OpenAI deployments can be configured with:
  AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
  AZURE_OPENAI_API_KEY=...
  AZURE_OPENAI_DEPLOYMENT=gpt-4o
  AZURE_OPENAI_API_VERSION=2024-06-01
  DOCTEST_LLM_PROVIDER=azure
  See .env.example for a combined template covering both providers.
- Use the dedicated keywords (or pass llm_enabled=${True} to existing ones):
  *** Test Cases ***
  Review Visual Differences With LLM
      Compare Images With LLM    Reference.pdf    Candidate.pdf    llm_override=${True}

  Review Pdf Structure With LLM
      Compare Pdf Documents With LLM    reference.pdf    candidate.pdf    compare=structure
Set llm_override=${True} when an LLM approval should override SSIM/DeepDiff failures.
Without the override flag the AI feedback is logged for investigation while the original
assertion result is preserved.
Pass llm_prompt= (or the specialty variants llm_visual_prompt= / llm_pdf_prompt=) to
customise the prompt sent to the model for a particular comparison.
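The override behaviour described above can be pictured as a small decision rule. The following Python snippet is only an illustrative sketch and not the library's implementation; the function name `final_result` and its parameters are hypothetical.

```python
def final_result(assertion_passed: bool, llm_approved: bool,
                 llm_override: bool = False) -> bool:
    """Combine the original assertion result with the LLM verdict.

    Hypothetical helper: mirrors the rule described in the text, not
    the library's actual code.
    """
    if assertion_passed:
        # Nothing to override: the SSIM/DeepDiff check already passed.
        return True
    # The original check failed. The LLM approval only changes the
    # outcome when the override flag is set; otherwise its feedback is
    # informational and the failure is preserved.
    return llm_override and llm_approved


print(final_result(assertion_passed=False, llm_approved=True, llm_override=True))
print(final_result(assertion_passed=False, llm_approved=True, llm_override=False))
```

With the override flag set, an approving model turns the failed comparison into a pass; without it, the failure stands.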
Only Python 3.X or newer is supported. Tested with Python 3.8/3.11/3.12
Install robotframework-doctestlibrary
Installation via pip from PyPI (recommended)
pip install --upgrade robotframework-doctestlibrary
Installation via pip from GitHub
pip install git+https://github.com/manykarim/robotframework-doctestlibrary.git
or
git clone https://github.com/manykarim/robotframework-doctestlibrary.git
cd robotframework-doctestlibrary
pip install -e .
Install dependencies
Install Tesseract, Ghostscript, GhostPCL, ImageMagick binaries and barcode libraries (libdmtx, zbar) on your system.
Hint: Since 0.2.0, Ghostscript, GhostPCL and ImageMagick are only needed for rendering .ps and .pcl files.
Rendering and content parsing of .pdf files is done via MuPDF.
In the future there might be a separate PyPI package for .pcl and .ps files to get rid of those dependencies.
Linux
apt-get install imagemagick tesseract-ocr ghostscript libdmtx0b libzbar0
Windows
- https://github.com/UB-Mannheim/tesseract/wiki
- https://ghostscript.com/releases/gsdnld.html
- https://ghostscript.com/releases/gpcldnld.html
- https://imagemagick.org/script/download.php
Some special instructions for Windows
Rename executable for GhostPCL to pcl6.exe (only needed for .pcl support)
The executable for GhostPCL gpcl6win64.exe needs to be renamed to pcl6.exe
Otherwise it will not be possible to render .pcl files successfully for visual comparison.
Add Tesseract, Ghostscript and ImageMagick to the system PATH on Windows (only needed for OCR, .pcl and .ps support)
- C:\Program Files\ImageMagick-7.0.10-Q16-HDRI
- C:\Program Files\Tesseract-OCR
- C:\Program Files\gs\gs9.53.1\bin
- C:\Program Files\gs\ghostpcl-9.53.1-win64
(The folder names and versions on your system might be different)
That means: when you open the CMD shell, you can run the commands
magick.exe
tesseract.exe
gswin64.exe
pcl6.exe
successfully from any folder/location.
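One way to verify the PATH setup without trying each command by hand is a short Python check based on `shutil.which`. This is a sketch: the tool names are taken from the list above, and `find_tools` is a hypothetical helper, not part of the library.

```python
import shutil

# Executable names from the list above; adjust them if your installation
# uses different names. On Windows, shutil.which also tries the ".exe"
# suffix via PATHEXT.
REQUIRED_TOOLS = ["magick", "tesseract", "gswin64", "pcl6"]


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```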
Windows error message regarding pylibdmtx
How to solve ImportError for pylibdmtx
If you see an ugly ImportError when importing pylibdmtx on
Windows you will most likely need the Visual C++ Redistributable Packages for
Visual Studio 2013. Install vcredist_x64.exe if using 64-bit Python, vcredist_x86.exe if using 32-bit Python.
ImageMagick
The library might return the error File could not be converted by ImageMagick to OpenCV Image: <path to the file> when comparing PDF files.
This is due to ImageMagick permissions. Verify this as follows with the sample.pdf in the testdata directory:
convert sample.pdf sample.jpg
convert-im6.q16: attempt to perform an operation not allowed by the security policy
The solution is to copy the policy.xml from the repository to the ImageMagick installation directory.
Docker
You can also use the Docker images or build your own Docker image:
docker build -t robotframework-doctest .
Afterwards you can, e.g., start the container and run the provided examples like this:
- Windows
docker run -t -v "%cd%":/opt/test -w /opt/test robotframework-doctest robot atest/Compare.robot
- Linux
docker run -t -v $PWD:/opt/test -w /opt/test robotframework-doctest robot atest/Compare.robot
Gitpod.io
Try out the library using Gitpod
Examples
Have a look at
for more examples.
Testing with Robot Framework
*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Compare two Images and highlight differences
    Compare Images    Reference.jpg    Candidate.jpg
Use masks/placeholders to exclude parts from visual comparison
*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Compare two Images and ignore parts by using masks
    Compare Images    Reference.jpg    Candidate.jpg    placeholder_file=masks.json

Compare two PDF Documents and ignore parts by using masks
    Compare Images    Reference.pdf    Candidate.pdf    placeholder_file=masks.json

Compare two Farm images with date pattern
    Compare Images    Reference.jpg    Candidate.jpg    placeholder_file=testdata/pattern_mask.json

Compare two Farm images with area mask as list
    ${top_mask}    Create Dictionary    page=1    type=area    location=top    percent=10
    ${bottom_mask}    Create Dictionary    page=all    type=area    location=bottom    percent=10
    ${masks}    Create List    ${top_mask}    ${bottom_mask}
    Compare Images    Reference.jpg    Candidate.jpg    mask=${masks}

Compare two Farm images with area mask as string
    Compare Images    Reference.jpg    Candidate.jpg    mask=top:10;bottom:10
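To see how the string form `top:10;bottom:10` relates to the dictionary form used in the list example, here is a hedged Python sketch. It is not the library's parser: `parse_area_mask` is a hypothetical helper, and the default of `page="all"` is an assumption, since the string form carries no page information.

```python
def parse_area_mask(mask: str, page="all"):
    """Turn 'top:10;bottom:10' into a list of area-mask dictionaries.

    Hypothetical helper for illustration; the page default is assumed.
    """
    masks = []
    for part in mask.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        location, percent = part.split(":")
        masks.append({
            "page": page,
            "type": "area",
            "location": location.strip(),
            "percent": int(percent),
        })
    return masks


for entry in parse_area_mask("top:10;bottom:10"):
    print(entry)
```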
Different Mask Types to Ignore Parts When Comparing
Areas, Coordinates, Text Patterns
[
    {
        "page": "all",
        "name": "Date Pattern",
        "type": "pattern",
        "pattern": ".*[0-9]{2}-[a-zA-Z]{3}-[0-9]{4}.*"
    },
    {
        "page": "1",
        "name": "Top Border",
        "type": "area",
        "location": "top",
        "percent": 5
    },
    {
        "page": "1",
        "name": "Left Border",
        "type": "area",
        "location": "left",
        "percent": 5
    },
    {
        "page": 1,
        "name": "Top Rectangle",
        "type": "coordinates",
        "x": 0,
        "y": 0,
        "height": 10,
        "width": 210,
        "unit": "mm"
    }
]
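The Python sketch below shows one way to read two of the mask types above: which strings the date pattern matches, and how large the millimetre rectangle is in pixels. It is not the library's code. The helper names are hypothetical, the 300 DPI render resolution is an assumption, and the conversion uses the standard 25.4 mm per inch.

```python
import json
import re

MM_PER_INCH = 25.4


def mm_to_px(value_mm: float, dpi: int) -> int:
    """Convert millimetres to pixels at the given render resolution."""
    return round(value_mm / MM_PER_INCH * dpi)


def coordinates_mask_to_px(mask: dict, dpi: int) -> dict:
    """Return x, y, width and height of a 'coordinates' mask in pixels."""
    if mask.get("unit") != "mm":
        raise ValueError("this sketch only handles unit='mm'")
    return {key: mm_to_px(mask[key], dpi) for key in ("x", "y", "width", "height")}


# Two entries copied from the mask file above.
masks = json.loads("""
[
  {"page": "all", "name": "Date Pattern", "type": "pattern",
   "pattern": ".*[0-9]{2}-[a-zA-Z]{3}-[0-9]{4}.*"},
  {"page": 1, "name": "Top Rectangle", "type": "coordinates",
   "x": 0, "y": 0, "height": 10, "width": 210, "unit": "mm"}
]
""")

date_pattern = re.compile(masks[0]["pattern"])
print(bool(date_pattern.match("Printed on 24-Dec-2020")))  # dd-Mon-yyyy matches
print(bool(date_pattern.match("Printed on 2020-12-24")))   # ISO date does not
print(coordinates_mask_to_px(masks[1], dpi=300))           # assumed 300 DPI
```

At 300 DPI the 210 mm x 10 mm rectangle covers the full width of an A4 page and roughly the top 118 pixels.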
Accept visual differences by checking move distance or text content
*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Accept if parts are moved up to 20 pixels by pure visual check
    Compare Images    Reference.jpg    Candidate.jpg    move_tolerance=20

Accept if parts are moved up to 20 pixels by reading PDF Data
    Compare Images    Reference.pdf    Candidate.pdf    move_tolerance=20    get_pdf_content=${true}

Accept differences if text content is the same via OCR
    Compare Images    Reference.jpg    Candidate.jpg    check_text_content=${true}

Accept differences if text content is the same from PDF Data
    Compare Images    Reference.pdf    Candidate.pdf    check_text_content=${true}    get_pdf_content=${true}
Different options to detect moved parts/objects
*** Settings ***
Library    DocTest.VisualTest    movement_detection=orb

*** Test Cases ***
Accept if parts are moved up to 20 pixels by pure visual check
    Compare Images    Reference.jpg    Candidate.jpg    move_tolerance=20

*** Settings ***
Library    DocTest.VisualTest    movement_detection=template

*** Test Cases ***
Accept if parts are moved up to 20 pixels by pure visual check
    Compare Images    Reference.jpg    Candidate.jpg    move_tolerance=20

*** Settings ***
Library    DocTest.VisualTest    movement_detection=classic

*** Test Cases ***
Accept if parts are moved up to 20 pixels by pure visual check
    Compare Images    Reference.jpg    Candidate.jpg    move_tolerance=20
Options for taking additional screenshots, screenshot format and render resolution
Take additional screenshots of the reference and candidate files.
*** Settings ***
Library    DocTest.VisualTest    take_screenshots=${true}    screenshot_format=png
Take diff screenshots to highlight differences
*** Settings ***
Library    DocTest.VisualTest    show_diff=${true}    DPI=300
Experimental usage of Open CV East Text Detection to improve OCR
*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Compare two Farm images with date pattern and east detection
    Compare Images    Reference.jpg    Candidate.jpg    placeholder_file=masks.json    ocr_engine=east
Check content of PDF files
*** Settings ***
Library    DocTest.PdfTest

*** Test Cases ***
Check if list of strings exists in PDF File
    @{strings}=    Create List    First String    Second String
    PDF Should Contain Strings    ${strings}    Candidate.pdf

Compare two PDF Files and only check text content
    Compare Pdf Documents    Reference.pdf    Candidate.pdf    compare=text

Compare two PDF Files and only check text content and metadata
    Compare Pdf Documents    Reference.pdf    Candidate.pdf    compare=text,metadata

Compare two PDF Files and check all possible content
    Compare Pdf Documents    Reference.pdf    Candidate.pdf
Ignoring dynamic fields in PDF comparisons
Both Compare Pdf Documents and Compare Pdf Structure now share the same helper options for masking boilerplate data:
- mask accepts inline JSON/dicts, Robot lists, or a path to a JSON file generated by the screenshot tooling. Add unit=pt when coordinates are already expressed in PDF points.
- text_mask_patterns lets you provide one or more regular expressions (string or @{list}) to drop matching lines from both PDFs before diffing. This is handy for timestamps, order numbers, etc.
- ignore_ligatures=${True} normalises glyphs such as the fi/fl ligatures to their ASCII representation so small font engine differences do not fail the comparison.
*** Settings ***
Library    DocTest.PdfTest

*** Test Cases ***
Compare invoices while ignoring IDs
    ${mask}=    Set Variable    {"pages":[{"page":1,"rectangles":[{"left":410,"top":36,"right":560,"bottom":78}],"unit":"pt"}]}
    Compare Pdf Structure    ${CURDIR}/baseline.pdf    ${CURDIR}/candidate.pdf    mask=${mask}
    ...    text_mask_patterns=\\bINV-\\d{6}\\b    ignore_ligatures=${True}    position_tolerance=5
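The effect of text_mask_patterns and ignore_ligatures can be approximated in a few lines of Python. This is a sketch under assumptions, not the library's implementation: `normalise_lines` is a hypothetical helper, and Unicode NFKC normalisation stands in for whatever ligature handling the library actually uses (NFKC also rewrites other compatibility characters, such as non-breaking spaces).

```python
import re
import unicodedata


def normalise_lines(text: str, patterns, ignore_ligatures: bool = False):
    """Drop lines matching any pattern and optionally expand ligatures."""
    compiled = [re.compile(p) for p in patterns]
    kept = []
    for line in text.splitlines():
        if any(rx.search(line) for rx in compiled):
            continue  # dynamic line (e.g. an invoice ID), excluded from the diff
        if ignore_ligatures:
            # NFKC maps U+FB01 (fi ligature) to the two ASCII letters "fi".
            line = unicodedata.normalize("NFKC", line)
        kept.append(line)
    return kept


baseline = "Invoice INV-000123\nOf\ufb01ce supplies\nTotal: 42.00"
candidate = "Invoice INV-000456\nOffice supplies\nTotal: 42.00"
patterns = [r"\bINV-\d{6}\b"]

same = (normalise_lines(baseline, patterns, ignore_ligatures=True)
        == normalise_lines(candidate, patterns, ignore_ligatures=True))
print(same)
```

The two sample texts differ only in the invoice number and one ligature, so they compare equal once both options are applied.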
Normalizing special characters in PDF comparisons
PDFs may contain special Unicode characters that look identical to standard ASCII but fail string comparisons. Common examples include non-breaking spaces (\u00A0) instead of regular spaces, or typographic dashes (\u2013, \u2014) instead of hyphens.
Use the character_replacements parameter to normalize these characters:
*** Settings ***
# Apply character replacements to all keywords in the test suite
Library    DocTest.PdfTest    character_replacements={'\u00A0': ' '}
Library    DocTest.VisualTest    character_replacements={'\u00A0': ' ', '\u2013': '-'}

*** Test Cases ***
PDF comparison with normalized whitespace
    Compare Pdf Documents    reference.pdf    candidate.pdf    compare=text
    ...    character_replacements={'\u00A0': ' '}

Check PDF contains text with normalized characters
    PDF Should Contain Strings    Expected Text    candidate.pdf
    ...    character_replacements={'\u00A0': ' '}

Structure comparison with character normalization
    Compare Pdf Structure    reference.pdf    candidate.pdf
    ...    character_replacements={'\u00A0': ' ', '\u2013': '-'}
For VisualTest text extraction keywords that use assertion operators, use the Set Character Replacements keyword:
*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Get text with normalized characters
    Set Character Replacements    {'\u00A0': ' '}
    ${text}=    Get Text    document.pdf    ==    Expected text
    Set Character Replacements    ${NONE}    # Clear when done
Common character replacements:
| Character | Unicode | Description | Replacement |
|---|---|---|---|
|   | \u00A0 | Non-breaking space | (regular space) |
| – | \u2013 | En dash | - (hyphen) |
| — | \u2014 | Em dash | - (hyphen) |
| ‐ | \u2010 | Unicode hyphen | - (hyphen) |
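To illustrate what such a replacement mapping does to extracted text, here is a minimal Python sketch using `str.translate`. It only mirrors the table above; `apply_replacements` is a hypothetical helper and not how the library applies character_replacements internally.

```python
# Mapping taken from the table above.
REPLACEMENTS = {
    "\u00A0": " ",  # non-breaking space -> regular space
    "\u2013": "-",  # en dash -> hyphen
    "\u2014": "-",  # em dash -> hyphen
    "\u2010": "-",  # Unicode hyphen -> ASCII hyphen
}


def apply_replacements(text: str, replacements: dict = REPLACEMENTS) -> str:
    """Replace each mapped character; keys must be single characters."""
    return text.translate(str.maketrans(replacements))


extracted = "Total\u00A0amount \u2013 2024"
print(apply_replacements(extracted) == "Total amount - 2024")
```

Without the normalisation, a plain string comparison against "Total amount - 2024" fails even though both strings look the same on screen.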
Comparing PDFs with different page layouts (font/size changes)
When comparing PDF documents where font or size changes cause text to reflow across pages, use the ignore_page_boundaries option to compare only text content and order, ignoring page structure:
*** Settings ***
Library    DocTest.PdfTest

*** Test Cases ***
Compare PDFs ignoring page breaks due to font change
    [Documentation]    Reference has 2 pages, candidate has 3 pages due to larger font
    ...    Both contain identical text in the same order
    Compare Pdf Structure    reference.pdf    candidate_larger_font.pdf
    ...    ignore_page_boundaries=${True}

Compare PDF Documents with cross-page text matching
    Compare Pdf Documents    reference.pdf    candidate.pdf
    ...    compare=structure    ignore_page_boundaries=${True}
For finer control over what gets compared, use the check_geometry and check_block_count options:
| Parameter | Default | Description |
|---|---|---|
| ignore_page_boundaries | ${False} | Flatten text across all pages and compare only content and order. Automatically disables geometry and block count checks. |
| check_geometry | ${True} | When ${False}, skip line position/size comparison. Useful when layout differs but text matches. |
| check_block_count | ${True} | When ${False}, skip block count validation per page. Useful when text blocks are merged/split differently. |
*** Settings ***
Library    DocTest.PdfTest

*** Test Cases ***
Compare PDF content only (ignore positions)
    [Documentation]    Same text content, but positions differ due to formatting
    Compare Pdf Structure    reference.pdf    reformatted.pdf
    ...    check_geometry=${False}

Compare PDF without block structure validation
    [Documentation]    Text blocks may be merged or split differently
    Compare Pdf Structure    reference.pdf    candidate.pdf
    ...    check_block_count=${False}

Pure content comparison (most flexible)
    [Documentation]    Only compare that text content and order match
    Compare Pdf Structure    reference.pdf    candidate.pdf
    ...    check_geometry=${False}    check_block_count=${False}
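The idea behind ignore_page_boundaries, flattening the text of all pages and comparing only content and order, can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the library's algorithm: pages are modelled as lists of text lines, and `flatten` and `same_content` are hypothetical helpers.

```python
def flatten(pages):
    """Join all pages into one list of non-empty, stripped lines."""
    return [line.strip() for page in pages for line in page if line.strip()]


def same_content(reference_pages, candidate_pages) -> bool:
    """True when both documents contain the same lines in the same order."""
    return flatten(reference_pages) == flatten(candidate_pages)


# The reference fits on two pages; the candidate reflows onto three.
reference = [
    ["Chapter 1", "First paragraph", "Second paragraph"],
    ["Third paragraph"],
]
candidate = [
    ["Chapter 1", "First paragraph"],
    ["Second paragraph"],
    ["Third paragraph"],
]

print(reference == candidate)              # page-by-page comparison differs
print(same_content(reference, candidate))  # flattened comparison matches
```

A real reflow can also change where lines break within a paragraph, which this line-based sketch does not account for.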
Note: When comparisons fail, a single summary warning is shown at the top of log.html (e.g., "Comparison failed: 5 difference(s) found"). Individual differences are logged as INFO messages within the keyword output for detailed inspection without cluttering the log summary.
Ignore Watermarks for Visual Comparisons
Store the watermark in a separate B/W image or PDF.
Watermark area needs to be filled with black color.
Watermark content will be subtracted from Visual Comparison result.
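Conceptually, subtracting the watermark means that differing pixels lying inside the black watermark area no longer count as differences. The Python sketch below shows this with small 0/1 grids. It is a simplified illustration, not the library's image processing; the helper names are hypothetical, and a `1` in the watermark grid stands for a black watermark pixel.

```python
def subtract_watermark(diff_mask, watermark_mask):
    """Keep only differing pixels that are not covered by the watermark.

    Both arguments are equally sized grids of 0/1 values.
    """
    return [
        [1 if d and not w else 0 for d, w in zip(diff_row, wm_row)]
        for diff_row, wm_row in zip(diff_mask, watermark_mask)
    ]


def has_differences(mask) -> bool:
    """True if any pixel is still marked as different."""
    return any(any(row) for row in mask)


watermark_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# All differences fall inside the watermark area.
diff_only_watermark = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
# One extra difference lies outside the watermark (bottom-right pixel).
diff_with_real_change = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

print(has_differences(subtract_watermark(diff_only_watermark, watermark_mask)))
print(has_differences(subtract_watermark(diff_with_real_change, watermark_mask)))
```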
*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Compare two Images and ignore jpg watermark
    Compare Images    Reference.jpg    Candidate.jpg    watermark_file=Watermark.jpg

Compare two Images and ignore pdf watermark
    Compare Images    Reference.pdf    Candidate.pdf    watermark_file=Watermark.pdf

Compare two Images and ignore watermark folder
    Compare Images    Reference.pdf    Candidate.pdf    watermark_file=${CURDIR}${/}watermarks
Watermarks can also be passed on library import. The setting then applies to all test cases in the test suite.
*** Settings ***
Library    DocTest.VisualTest    watermark_file=${CURDIR}${/}watermarks

*** Test Cases ***
Compare two Images and ignore watermarks
    Compare Images    Reference.jpg    Candidate.jpg
Get Text From Documents or Images
*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Get Text Content And Compare
    ${text}    Get Text From Document    Reference.pdf
    List Should Contain Value    ${text}    Test String
Get Barcodes From Documents or Images
*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Get Text Content And Compare
    ${text}    Get Barcodes From Document    reference.jpg
    List Should Contain Value    ${text}    123456789
Using pabot to run tests in parallel
Document Testing can be run in parallel using pabot.
However, you need to pass the additional arguments --artifacts and --artifactsinsubfolders to the pabot command, to move the screenshots to the correct subfolder.
Otherwise the screenshots will not be visible in the log.html
pabot --testlevelsplit --processes 8 --artifacts png,jpg,pdf,xml --artifactsinsubfolders /path/to/your/tests/
Visual Testing of Web Applications
I experimented a bit and tried to use this library for Visual Testing of Web Applications.
Please have a look at this pilot example here
Development
Feel free to create issues or pull requests.
I'm always happy for any feedback.
Core team
In order of appearance.
- Many Kasiriha
- April Wang
Contributors
This project is community driven and becomes a reality only through the work of all the people who contribute.