Annex IV Review: analyze PDF documents for EU AI Act compliance

Annex IV Review (annex4ac)

Analyzes PDF documents for compliance with the requirements of EU AI Act Annex IV and GDPR.

⚠️ Legal Disclaimer: This software is provided for informational and compliance assistance purposes only. It is not legal advice and should not be relied upon as such. Users are responsible for ensuring their documentation meets all applicable legal requirements and should consult with qualified legal professionals for compliance matters. The authors disclaim any liability for damages arising from the use of this software.

🔒 Data Protection: All processing occurs locally on your machine. No data leaves your system.

🚀 Quick‑start

# 1 Install (Python 3.9)
pip install annex4ac

# 2 Review a single PDF document
from pathlib import Path
from annex4ac import review_single_document

issues = review_single_document(Path("technical_documentation.pdf"))
for issue in issues:
    # Each issue is a dict; 'type' and 'message' are two of its keys
    print(f"{issue['type']}: {issue['message']}")

# 3 Review multiple PDF documents in one call
from annex4ac import review_documents

issues = review_documents([
    Path("doc1.pdf"),
    Path("doc2.pdf"),
])

# 4 Analyze text content directly (second argument is the name used in reports)
from annex4ac import analyze_text

issues = analyze_text("AI system content...", "document.txt")

✨ Features

Advanced NLP Analysis

Intelligent negation detection: Uses spaCy and negspaCy for accurate analysis
Contradiction detection: Finds inconsistencies within and between documents
Section validation: Checks all 9 required Annex IV sections
GDPR compliance: Analyzes data protection and privacy issues
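
Negation matters because a sentence such as "No bias testing was performed" mentions a compliance term without satisfying it. The sketch below illustrates the idea with a simplified, standard-library-only rule in the NegEx style: a term counts as negated when a negation cue appears a few tokens before it. This is not the package's implementation, which according to the list above relies on spaCy and negspaCy; the cue list, the window size, and the name `is_negated` are assumptions made for illustration.

```python
import re

# Hypothetical cue list and window size; real NegEx-style rule sets are larger.
NEGATION_CUES = {"no", "not", "never", "without", "lacks", "cannot"}
WINDOW = 4  # number of tokens before the term that are scanned for a cue


def is_negated(sentence: str, term: str) -> bool:
    """Return True if a negation cue appears shortly before `term`."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    term_tokens = term.lower().split()
    n = len(term_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == term_tokens:
            preceding = tokens[max(0, i - WINDOW):i]
            if any(tok in NEGATION_CUES for tok in preceding):
                return True
    return False


print(is_negated("No bias testing was performed.", "bias testing"))
print(is_negated("Bias testing was performed quarterly.", "bias testing"))
```

A fixed token window is crude compared with a dependency-aware approach, but it shows why "only negative mentions" of a term can be reported separately from a missing term.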

Compliance Checks

Missing sections: Identifies absent Annex IV sections (1-9)
High-risk classification: Detects high-risk use cases without proper labeling
Data protection: Checks GDPR compliance requirements
Transparency: Verifies explainability and bias detection mentions
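
As a rough illustration of the missing-section check, the sketch below scans text for keyword hints associated with each of the nine Annex IV items and reports the items with no hint present. The keyword lists, the `"error"` type string, and the function name `find_missing_sections` are hypothetical; the package's actual rules are not documented here and are likely more thorough.

```python
from typing import Dict, List

# Hypothetical keyword hints per Annex IV item (topics paraphrased from the
# regulation); these are illustrative assumptions, not the package's rules.
SECTION_HINTS: Dict[int, List[str]] = {
    1: ["general description", "intended purpose", "system overview"],
    2: ["development process", "design specifications", "training data"],
    3: ["human oversight", "functioning and control"],
    4: ["performance metrics"],
    5: ["risk management"],
    6: ["lifecycle", "changes made"],
    7: ["harmonised standards", "harmonized standards"],
    8: ["declaration of conformity"],
    9: ["post-market monitoring"],
}


def find_missing_sections(text: str, filename: str = "document") -> List[dict]:
    """Return one issue dict per Annex IV item with no keyword hint in `text`."""
    lowered = text.lower()
    issues = []
    for section, hints in SECTION_HINTS.items():
        if not any(hint in lowered for hint in hints):
            issues.append({
                "type": "error",  # assumed value for critical issues
                "section": section,
                "file": filename,
                "message": f"Missing content for Annex IV section {section}.",
            })
    return issues


sample = "The intended purpose is triage. A risk management plan is attached."
missing = find_missing_sections(sample, "doc1.pdf")
print([issue["section"] for issue in missing])
```

Plain substring matching will produce false positives and negatives, which is one reason a real checker would combine it with the negation handling described above.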

Multiple Input Formats

PDF files: Supports PyPDF2, pdfplumber, and PyMuPDF
Text content: Direct text analysis
Batch processing: Review multiple documents simultaneously

📋 API Reference

Core Functions

`review_documents(pdf_files: List[Path], batch_size: int = 128) -> List[dict]`

Review multiple PDF documents for compliance issues.

Parameters:

pdf_files: List of Path objects pointing to PDF files
batch_size: Number of pages to process in each batch (default: 128)

Returns: List of structured issue dictionaries with keys: type, section, file, message
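
Because the result is a flat list of dictionaries, it is easy to regroup for reporting. The sketch below groups issues by their `file` key. The sample data is illustrative only (it mirrors the example output shown in this README), and the `"error"` and `"warning"` strings used for `type` are assumed values that the source does not confirm.

```python
from collections import defaultdict
from typing import Dict, List

# Illustrative data shaped like the documented return value.
issues = [
    {"type": "error", "section": 1, "file": "doc1.pdf",
     "message": "Missing content for Annex IV section 1 (system overview)."},
    {"type": "error", "section": 5, "file": "doc2.pdf",
     "message": "No mention of risk management procedures."},
    {"type": "warning", "section": None, "file": "doc1.pdf",
     "message": "No mention of transparency or explainability."},
]


def group_by_file(issues: List[dict]) -> Dict[str, List[dict]]:
    """Group issue dicts by the file they were found in, keeping input order."""
    grouped = defaultdict(list)
    for issue in issues:
        grouped[issue["file"]].append(issue)
    return dict(grouped)


for filename, file_issues in group_by_file(issues).items():
    print(filename, [issue["type"] for issue in file_issues])
```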

`review_single_document(pdf_file: Path) -> List[dict]`

Review a single PDF document for compliance issues.

`analyze_text(text: str, filename: str = "document") -> List[dict]`

Analyze text content for compliance issues.

`extract_text_from_pdf(pdf_path: Path) -> str`

Extract text from PDF file using available libraries.
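
One plausible way to "use available libraries" is a fallback chain that tries each PDF backend in turn and skips any that is not installed. The sketch below shows that pattern; the order of backends, the helper names, and the function name `extract_text` are assumptions, and the package's own `extract_text_from_pdf` may work differently.

```python
from pathlib import Path
from typing import Callable, Sequence


def _with_pymupdf(path: Path) -> str:
    import fitz  # PyMuPDF
    with fitz.open(str(path)) as doc:
        return "\n".join(page.get_text() for page in doc)


def _with_pdfplumber(path: Path) -> str:
    import pdfplumber
    with pdfplumber.open(str(path)) as pdf:
        # extract_text() can return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def _with_pypdf2(path: Path) -> str:
    from PyPDF2 import PdfReader
    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


Backend = Callable[[Path], str]
DEFAULT_BACKENDS: Sequence[Backend] = (_with_pymupdf, _with_pdfplumber, _with_pypdf2)


def extract_text(path: Path, backends: Sequence[Backend] = DEFAULT_BACKENDS) -> str:
    """Try each backend in order, skipping those whose library is missing."""
    for backend in backends:
        try:
            return backend(path)
        except ImportError:
            continue
    raise RuntimeError(
        "No PDF library available: install PyPDF2, pdfplumber, or PyMuPDF"
    )
```

Importing each library inside its helper keeps all three optional: a missing package only surfaces as an `ImportError` when that backend is actually tried.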

HTTP API Support

`handle_multipart_review_request(headers: dict, body: bytes) -> dict`

Handle multipart/form-data request for document review.

`handle_text_review_request(text_content: str, filename: str = "document.txt") -> dict`

Handle text review request.

`create_review_response(issues: List[dict], processed_files: List[str]) -> dict`

Create structured response for review results.
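
The shape of the response dictionary is not documented above. Purely as an illustration of what a structured review response could contain, the sketch below builds a JSON-serializable dict with a summary and the raw issues. The function name `build_review_response`, every field name, and the `"error"`/`"warning"` type strings are hypothetical and should not be read as the output of `create_review_response`.

```python
import json
from typing import List


def build_review_response(issues: List[dict], processed_files: List[str]) -> dict:
    """Assemble a hypothetical response payload from issues and file names."""
    errors = [i for i in issues if i.get("type") == "error"]
    warnings = [i for i in issues if i.get("type") == "warning"]
    return {
        "processed_files": processed_files,
        "summary": {
            "total": len(issues),
            "errors": len(errors),
            "warnings": len(warnings),
        },
        "issues": issues,
    }


response = build_review_response(
    [{"type": "warning", "section": None, "file": "doc1.pdf",
      "message": "No mention of transparency or explainability."}],
    ["doc1.pdf"],
)
print(json.dumps(response, indent=2))
```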

🔍 Issue Types

Errors (Critical Issues)

Missing required Annex IV sections
Contradictions between documents
High-risk use cases without proper classification
GDPR violations (indefinite data retention, missing legal basis)

Warnings (Recommendations)

Missing transparency or explainability mentions
No bias detection or fairness measures
Missing security or robustness measures
Only negative mentions of compliance terms
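
To make one of the error categories above concrete, the sketch below flags phrases that suggest indefinite data retention. The regular expression, the function name `check_retention`, and the issue field values are illustrative assumptions; the package's actual GDPR checks are not described in this README.

```python
import re
from typing import List

# Hypothetical phrases suggesting unlimited retention; illustrative only.
INDEFINITE_RETENTION = re.compile(
    r"\b(?:retained|stored|kept)\s+(?:indefinitely|forever|permanently)\b",
    re.IGNORECASE,
)


def check_retention(text: str, filename: str = "document") -> List[dict]:
    """Return one error issue per phrase that hints at indefinite retention."""
    issues = []
    for match in INDEFINITE_RETENTION.finditer(text):
        issues.append({
            "type": "error",  # assumed value for critical issues
            "section": None,
            "file": filename,
            "message": f"Possible indefinite data retention: '{match.group(0)}'.",
        })
    return issues


print(check_retention("User logs are stored indefinitely.", "doc1.pdf"))
print(check_retention("User logs are deleted after 30 days.", "doc1.pdf"))
```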

📊 Example Output

============================================================
COMPLIANCE REVIEW RESULTS
============================================================

❌ ERRORS (2):
  1. [doc1.pdf] (Section 1) Missing content for Annex IV section 1 (system overview).
  2. [doc2.pdf] (Section 5) No mention of risk management procedures.

⚠️  WARNINGS (1):
  1. [doc1.pdf] No mention of transparency or explainability.

Found 3 total issue(s): 2 errors, 1 warnings
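
A report in this layout can be produced from the issue dictionaries with a few lines of formatting code. The sketch below is one way to do it, not the package's own printer: the function name `format_report` is hypothetical, the `"error"` and `"warning"` type strings are assumed, and the emoji markers are omitted for simplicity.

```python
from typing import List

# Illustrative data mirroring the example output above.
issues = [
    {"type": "error", "section": 1, "file": "doc1.pdf",
     "message": "Missing content for Annex IV section 1 (system overview)."},
    {"type": "error", "section": 5, "file": "doc2.pdf",
     "message": "No mention of risk management procedures."},
    {"type": "warning", "section": None, "file": "doc1.pdf",
     "message": "No mention of transparency or explainability."},
]


def format_report(issues: List[dict]) -> str:
    """Render issues as a plain-text report with errors listed before warnings."""
    errors = [i for i in issues if i["type"] == "error"]
    warnings = [i for i in issues if i["type"] == "warning"]
    rule = "=" * 60
    lines = [rule, "COMPLIANCE REVIEW RESULTS", rule]

    def add_group(title: str, group: List[dict]) -> None:
        if not group:
            return
        lines.append("")
        lines.append(f"{title} ({len(group)}):")
        for number, issue in enumerate(group, start=1):
            # Only show the section tag when the issue is tied to a section
            section = f" (Section {issue['section']})" if issue.get("section") else ""
            lines.append(f"  {number}. [{issue['file']}]{section} {issue['message']}")

    add_group("ERRORS", errors)
    add_group("WARNINGS", warnings)
    lines.append("")
    lines.append(
        f"Found {len(issues)} total issue(s): "
        f"{len(errors)} errors, {len(warnings)} warnings"
    )
    return "\n".join(lines)


print(format_report(issues))
```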

🛠 Requirements

Python 3.9
PDF Processing: PyPDF2, pdfplumber, or PyMuPDF
NLP Analysis: spaCy, negspaCy, nltk

📚 References

Annex IV HTML – https://artificialintelligenceact.eu/annex/4/
EU AI Act – https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R1689

📄 Licensing

This project is licensed under the MIT License - see the LICENSE file for details.

The software assists in preparing documentation, but does not confirm compliance with legal requirements or standards. The user is responsible for the final accuracy and compliance of the documents.

Release history

1.1.3 (this version): Aug 2, 2025
1.0.0 through 1.1.2 (thirteen earlier releases: 1.0.0 to 1.0.9 and 1.1.0 to 1.1.2): Aug 2, 2025

Download files

Source distribution: annex4review-1.1.3.tar.gz (16.4 kB), uploaded Aug 2, 2025
Built distribution: annex4review-1.1.3-py3-none-any.whl (14.7 kB), uploaded Aug 2, 2025, Python 3


