PDF Accessibility Validator
avalpdf - PDF Accessibility Validator
A command-line tool for validating PDF accessibility, analyzing document structure, and generating detailed reports.
Features
Document structure analysis and support
- Document structure analysis
- Support for both local and remote PDF files
Document tags and metadata validation
- Document tagging status
- Title presence
- Language declaration (Italian)
Heading hierarchy validation
- H1 presence
- Correct heading levels sequence
Figure alt text validation
- Missing alternative text detection
- Complex or problematic alt text patterns
Tables structure validation
- Header presence and proper structure
- Empty cells detection
- Duplicate headers check
- Multiple header rows warning
- Empty tables detection
Lists structure validation
- Proper list tagging
- Detection of untagged lists (consecutive paragraphs with bullets/numbers)
- Misused list types (numbered items in unordered lists)
- List hierarchy consistency
Links validation
- Detection of non-descriptive links
- Raw URL text warnings
- Email and institutional domain exceptions
Formatting issues detection
- Excessive underscores (used for underlining)
- Spaced capital letters (like "T E S T")
- Extra spaces used for layout (3+ consecutive spaces)
Empty elements detection
- Empty paragraphs
- Whitespace-only elements
- Empty headings
- Empty spans
- Empty table cells
Output formats
- Detailed JSON structure
- Simplified JSON
- Accessibility validation report
- Consolidated batch report for multiple files
- Console reports with color-coded structure visualization
Scoring and reporting
- Weighted scoring system based on accessibility criteria
- Detailed issue categorization (issues, warnings, successes)
Batch processing
- Process multiple files with glob patterns (e.g., *.pdf)
- Directory scanning
- Concise progress display for multiple files
- Consolidated batch report with aggregated statistics
- Parallel processing for faster validation on multi-core systems
Installation
Using pip
pip install avalpdf
Or uv
uv tool install avalpdf
Updates
Using pip
pip install avalpdf --upgrade
Or uv
uv tool install avalpdf --upgrade
Usage
After installation, you can run avalpdf from any directory.
Quick start
Simply run
avalpdf thesis.pdf
or
avalpdf https://example.com/document.pdf
to get a validation report and a preview of the document structure.
Details
# Basic validation with console output
avalpdf document.pdf
# Display version information
avalpdf --version
Multi-file Analysis
avalpdf supports analyzing multiple PDF files in a single command using parallel processing:
# Multiple files specified directly
avalpdf file1.pdf file2.pdf file3.pdf
# Using wildcard pattern (use quotes on some shells)
avalpdf "*.pdf"
# Process all PDFs in a specific directory
avalpdf "reports/quarterly/*.pdf"
# Analyze all PDFs in the current directory
avalpdf *.pdf
# Specify a directory to scan
avalpdf /path/to/documents/
# Mix of patterns and specific files
avalpdf annual_report.pdf "monthly/*.pdf" project_docs/specs.pdf
When processing multiple files, avalpdf automatically validates them in parallel across the available CPU cores, which shortens the total run time for large batches of documents.
When using wildcards on Unix/Linux shells, you may need to quote the pattern if you want avalpdf to handle the expansion rather than the shell.
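This README does not show how avalpdf expands its arguments internally. As an illustration only, the sketch below shows one way a tool could turn a mix of plain paths, quoted glob patterns, and directories into a list of files. The function name `expand_inputs` is hypothetical, and the non-recursive directory scan is an assumption.

```python
import glob
import os


def expand_inputs(args):
    """Expand plain paths, glob patterns, and directories into a list of paths.

    Hypothetical helper for illustration; not avalpdf's actual code.
    """
    paths = []
    for arg in args:
        if os.path.isdir(arg):
            # Assumption: a directory means "every *.pdf directly inside it".
            matches = sorted(glob.glob(os.path.join(arg, "*.pdf")))
        elif any(ch in arg for ch in "*?["):
            # A quoted pattern reaches the program unexpanded, so expand it here.
            matches = sorted(glob.glob(arg))
        else:
            # Plain file names (or URLs) pass through unchanged.
            matches = [arg]
        for match in matches:
            if match not in paths:  # keep first occurrence, drop duplicates
                paths.append(match)
    return paths
```

If the shell has already expanded an unquoted pattern, every argument takes the pass-through branch, so both quoting styles lead to the same list.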
Multi-file Output
When analyzing multiple files, avalpdf displays a concise progress view:
[1/5] ✅ document1.pdf: 0 issues, 2 warnings
[2/5] ❌ document2.pdf: 3 issues, 5 warnings
[3/5] ⚠️ document3.pdf: Error - Failed to open PDF
[4/5] ✅ document4.pdf: 0 issues, 0 warnings
[5/5] ❌ document5.pdf: 2 issues, 1 warnings
📊 Batch Processing Summary:
• Total files processed: 5
• Files with issues: 2
• Total issues: 5
• Total warnings: 8
• Average accessibility score: 82.5%
✨ Batch processing complete!
By default, a consolidated batch report is saved when processing multiple files. This JSON file contains:
- Analysis results for each file
- Metadata and accessibility score for each file
- Aggregated statistics across all files
- Timestamp of the analysis
To specify the output location for the batch report, you have multiple options:
# Specify output directory (report will have a timestamp-based name)
avalpdf *.pdf -o /path/to/output/
# Specify exact filename (including path)
avalpdf *.pdf -o /path/to/output/report.json
# Alternative: specify output directory and custom filename
avalpdf *.pdf -o /path/to/output --batch-report=my_report.json
When -o points to a file ending with .json, it will be used as the exact batch report path. Otherwise, it's treated as a directory.
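The rule above can be sketched in a few lines of Python. This is an illustration, not avalpdf's source: the function name `resolve_batch_report_path` is hypothetical, and the timestamp-based file name pattern is an assumption inferred from the example file name `avalpdf_batch_report_20250323_012754.json` used later in this README.

```python
import os
from datetime import datetime


def resolve_batch_report_path(output, now=None):
    """Return the batch report path for a given -o value (illustrative only)."""
    if output.endswith(".json"):
        # -o names a .json file: use it as the exact report path.
        return output
    # Otherwise treat -o as a directory and build a timestamp-based name.
    # Assumed pattern: avalpdf_batch_report_YYYYMMDD_HHMMSS.json
    now = now or datetime.now()
    name = f"avalpdf_batch_report_{now:%Y%m%d_%H%M%S}.json"
    return os.path.join(output, name)
```

Passing a fixed `now` value makes the generated name predictable, which is convenient when scripting around the report location.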
Analyzing Batch Reports
The batch report JSON file can be analyzed with command-line tools to extract useful information. For example, you can convert the batch report to CSV format for analysis in spreadsheet software:
jq '.files[] | {filename, producer: .metadata.producer, creator: .metadata.creator, standard: .metadata.standard, n_issues: .issues_count, n_warnings: .warnings_count, accessibility_score}' avalpdf_batch_report_20250323_012754.json | mlr --j2c cat | vd
This command uses:
- jq to extract specific fields from each file entry
- miller (mlr) to convert JSON to CSV
- visidata (vd) to view and analyze the data interactively
You can modify the jq query to extract different fields based on your analysis needs.
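If jq and Miller are not available, the same extraction can be sketched with the Python standard library. The function name `batch_report_to_csv` is hypothetical. The field names follow the jq query above, and `producer`, `creator`, and `standard` are left empty when a file entry has no such metadata (for example, a file that failed to open).

```python
import csv
import io

FIELDS = ["filename", "producer", "creator", "standard",
          "n_issues", "n_warnings", "accessibility_score"]


def batch_report_to_csv(report):
    """Flatten the "files" entries of a batch report into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for entry in report.get("files", []):
        metadata = entry.get("metadata", {})  # absent for failed files
        writer.writerow({
            "filename": entry.get("filename", ""),
            "producer": metadata.get("producer", ""),
            "creator": metadata.get("creator", ""),
            "standard": metadata.get("standard", ""),
            "n_issues": entry.get("issues_count", 0),
            "n_warnings": entry.get("warnings_count", 0),
            "accessibility_score": entry.get("accessibility_score", ""),
        })
    return buffer.getvalue()
```

To use it, load the report with `json.load` and write the returned text to a `.csv` file that a spreadsheet can open.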
Common Multi-file Scenarios
# Analyze all PDFs in a directory, save individual reports
avalpdf "reports/*.pdf" --report
# Analyze multiple files silently and save batch report
avalpdf file1.pdf file2.pdf file3.pdf --quiet
# Process files in different directories
avalpdf "team1/*.pdf" "team2/*.pdf" "shared/*.pdf"
# Analyze all PDFs in a directory and subdirectories
# (use find in Unix/Linux or dir /s in Windows to collect paths)
find . -name "*.pdf" -print0 | xargs -0 avalpdf
Command Line Options
- --full: Save full JSON structure
- --simple: Save simplified JSON structure
- --report: Save validation report
- --batch-report[=FILENAME]: Save consolidated batch report when processing multiple files. Optionally specify filename
- --output-dir, -o: Specify output directory
- --show-structure: Display document structure
- --show-validation: Display validation results
- --quiet, -q: Suppress console output
- --rich: Use enhanced visual formatting for document structure
- --tree: Use tree view instead of panel view with Rich formatting
- --version, -v: Display the version number and exit
Examples
- Quick accessibility check:
avalpdf thesis.pdf
- Generate all reports:
avalpdf report.pdf --full --simple --report -o ./analysis
- Silent operation with report generation:
avalpdf document.pdf --report -q
- Analyze multiple files:
avalpdf *.pdf
- Analyze directory:
avalpdf documents/
- Process specific file pattern and save reports in output directory:
avalpdf "invoices/2023_*.pdf" -o validation_results --report
- Quiet batch processing:
avalpdf *.pdf --quiet --batch-report -o reports
Batch Report Format
The consolidated batch report is saved as a JSON file with this structure:
{
"timestamp": "2023-05-20T14:30:45.123456",
"formatted_date": "2023-05-20 14:30:45",
"summary": {
"total_files": 3,
"files_with_issues": 1,
"total_issues": 3,
"total_warnings": 7,
"average_accessibility_score": 70.25,
"successful_files": 2,
"failed_files": 1
},
"files": [
{
"filename": "document1.pdf",
"path": "/path/to/document1.pdf",
"index": 1,
"metadata": {
"title": "Sample Document",
"tagged": "true",
"lang": "it",
"num_pages": "10"
},
"issues_count": 0,
"warnings_count": 2,
"accessibility_score": 95.5,
"success": true,
"has_issues": false
},
{
"filename": "document2.pdf",
"path": "/path/to/document2.pdf",
"index": 2,
"metadata": {
"title": "Another Document",
"tagged": "false",
"lang": "",
"num_pages": "5"
},
"issues_count": 3,
"warnings_count": 5,
"accessibility_score": 45.0,
"success": true,
"has_issues": true
},
{
"filename": "document3.pdf",
"path": "/path/to/document3.pdf",
"index": 3,
"success": false,
"error": "Failed to open PDF",
"issues_count": 0,
"warnings_count": 0,
"accessibility_score": 0.0
}
]
}
This structured format makes it easy to:
- Sort files by name, accessibility score, or issues count
- Filter files with issues or errors
- Process results using data analysis tools
- Generate custom reports from the consolidated data
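As a minimal sketch of such processing, the Python below sorts, filters, and re-aggregates the per-file entries. The sample entries are trimmed copies of the example report above, and the function names are hypothetical. Averaging the score over successful files only is an inference from the example numbers, not a documented rule.

```python
# Trimmed copies of the "files" entries from the example report above.
FILES = [
    {"filename": "document1.pdf", "issues_count": 0, "warnings_count": 2,
     "accessibility_score": 95.5, "success": True, "has_issues": False},
    {"filename": "document2.pdf", "issues_count": 3, "warnings_count": 5,
     "accessibility_score": 45.0, "success": True, "has_issues": True},
    {"filename": "document3.pdf", "issues_count": 0, "warnings_count": 0,
     "accessibility_score": 0.0, "success": False,
     "error": "Failed to open PDF"},
]


def lowest_scores_first(files):
    """Successfully analyzed files, sorted from lowest to highest score."""
    analyzed = [f for f in files if f.get("success")]
    return sorted(analyzed, key=lambda f: f["accessibility_score"])


def failed_files(files):
    """Names of files that could not be analyzed at all."""
    return [f["filename"] for f in files if not f.get("success")]


def summarize(files):
    """Rebuild the aggregated statistics from the per-file entries."""
    analyzed = [f for f in files if f.get("success")]
    scores = [f["accessibility_score"] for f in analyzed]
    return {
        "total_files": len(files),
        "files_with_issues": sum(1 for f in analyzed if f.get("has_issues")),
        "total_issues": sum(f.get("issues_count", 0) for f in files),
        "total_warnings": sum(f.get("warnings_count", 0) for f in files),
        # Assumption: failed files are excluded from the average.
        "average_accessibility_score": sum(scores) / len(scores) if scores else 0.0,
        "successful_files": len(analyzed),
        "failed_files": len(files) - len(analyzed),
    }
```

With these sample entries, `summarize(FILES)` reproduces the values in the example `summary` object, including the average score of 70.25.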
Validation Output
The tool provides three types of findings:
- ✅ Successes: Correctly implemented accessibility features
- ⚠️ Warnings: Potential issues that need attention
- ❌ Issues: Problems that must be fixed
Report Format
{
"validation_results": {
"issues": ["..."],
"warnings": ["..."],
"successes": ["..."]
}
}
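A report in this shape is easy to consume from a script. The sketch below counts each type of finding and derives an exit status that could gate a CI job. The function names are hypothetical, and treating any issue as a failure is an assumption about how you might want to use the report, not something avalpdf does itself.

```python
def finding_counts(report):
    """Count issues, warnings, and successes in a validation report."""
    results = report.get("validation_results", {})
    return {kind: len(results.get(kind, []))
            for kind in ("issues", "warnings", "successes")}


def exit_status(report):
    """Return 1 if the report lists any issue, otherwise 0."""
    return 1 if finding_counts(report)["issues"] else 0
```

Load the saved report with `json.load`, then pass `exit_status(report)` to `sys.exit` to fail a pipeline whenever issues are present.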
License
MIT License
Support
For issues or suggestions:
- Open an issue on GitHub
- Provide the PDF file (if possible) and the complete error message
- Include the command you used and your operating system information
Local development
uv venv .test
source .test/bin/activate
uv pip install -e . --upgrade