Annex IV Review: analyze PDF documents for EU AI Act compliance

Annex IV Review (annex4ac)

Analyzes PDF documents for compliance with the requirements of EU AI Act Annex IV and GDPR.

⚠️ Legal Disclaimer: This software is provided for informational and compliance assistance purposes only. It is not legal advice and should not be relied upon as such. Users are responsible for ensuring their documentation meets all applicable legal requirements and should consult with qualified legal professionals for compliance matters. The authors disclaim any liability for damages arising from the use of this software.

🔒 Data Protection: All processing occurs locally on your machine. No data leaves your system.


🚀 Quick‑start

# 1 Install (Python 3.9)
pip install annex4ac

# 2 Review single PDF document
from annex4ac import review_single_document
from pathlib import Path

issues = review_single_document(Path("technical_documentation.pdf"))
for issue in issues:
    print(f"{issue['type']}: {issue['message']}")

# 3 Review multiple PDF documents
from annex4ac import review_documents

issues = review_documents([
    Path("doc1.pdf"), 
    Path("doc2.pdf")
])

# 4 Analyze text content directly
from annex4ac import analyze_text

issues = analyze_text("AI system content...", "document.txt")

✨ Features

Advanced NLP Analysis

  • Intelligent negation detection: Uses spaCy and negspaCy for accurate analysis
  • Contradiction detection: Finds inconsistencies within and between documents
  • Section validation: Checks all 9 required Annex IV sections
  • GDPR compliance: Analyzes data protection and privacy issues
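
To illustrate why negation matters for these checks, here is a minimal sketch that flags a term mentioned only in negated form. It is a deliberately simplified word-window heuristic in plain Python, not the spaCy/negspaCy pipeline the package uses; the cue list, the window size, and all function names are assumptions made for illustration, and it handles single-word terms only.

```python
import re

# Hypothetical cue list; real negation detection (e.g. negspaCy) works on
# parsed entities rather than on raw word windows.
NEGATION_CUES = {"no", "not", "never", "without", "lacks", "lacking"}


def split_sentences(text):
    """Naive sentence splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mention_polarity(text, term, window=4):
    """Return (affirmed, negated) counts of mentions of a single-word term."""
    affirmed = negated = 0
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z']+", sentence.lower())
        for i, word in enumerate(words):
            if word != term.lower():
                continue
            # Look at up to `window` words before the mention, same sentence only.
            preceding = words[max(0, i - window):i]
            if NEGATION_CUES.intersection(preceding):
                negated += 1
            else:
                affirmed += 1
    return affirmed, negated


def only_negative_mentions(text, term):
    """True when the term appears at least once and every mention is negated."""
    affirmed, negated = mention_polarity(text, term)
    return negated > 0 and affirmed == 0
```

A document that says "The system does not provide transparency." would be flagged for the term "transparency", while one that also states "Transparency is ensured through model cards." would not.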

Compliance Checks

  • Missing sections: Identifies absent Annex IV sections (1-9)
  • High-risk classification: Detects high-risk use cases without proper labeling
  • Data protection: Checks GDPR compliance requirements
  • Transparency: Verifies explainability and bias detection mentions

Multiple Input Formats

  • PDF files: Supports PyPDF2, pdfplumber, and PyMuPDF
  • Text content: Direct text analysis
  • Batch processing: Review multiple documents simultaneously

📋 API Reference

Core Functions

review_documents(pdf_files: List[Path], batch_size: int = 128) -> List[dict]

Review multiple PDF documents for compliance issues.

Parameters:

  • pdf_files: List of Path objects pointing to PDF files
  • batch_size: Number of pages to process in each batch (default: 128)

Returns: List of structured issue dictionaries with keys: type, section, file, message
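
Because the result is a plain list of dictionaries, it can be summarized with the standard library. The sketch below runs on hand-written sample data shaped like the documented keys (type, section, file, message); the "error" and "warning" values of type, and the use of None for a missing section, are assumptions rather than documented behavior.

```python
from collections import Counter, defaultdict

# Hypothetical sample shaped like the documented return value. In real use
# this list would come from review_documents([...]).
sample_issues = [
    {"type": "error", "section": 1, "file": "doc1.pdf",
     "message": "Missing content for Annex IV section 1 (system overview)."},
    {"type": "error", "section": 5, "file": "doc2.pdf",
     "message": "No mention of risk management procedures."},
    {"type": "warning", "section": None, "file": "doc1.pdf",
     "message": "No mention of transparency or explainability."},
]


def summarize(issues):
    """Count issues per type and group the issue dicts by source file."""
    by_type = Counter(issue["type"] for issue in issues)
    by_file = defaultdict(list)
    for issue in issues:
        by_file[issue["file"]].append(issue)
    return by_type, dict(by_file)


by_type, by_file = summarize(sample_issues)
for filename, file_issues in sorted(by_file.items()):
    print(f"{filename}: {len(file_issues)} issue(s)")
```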

review_single_document(pdf_file: Path) -> List[dict]

Review a single PDF document for compliance issues.

analyze_text(text: str, filename: str = "document") -> List[dict]

Analyze text content for compliance issues.

extract_text_from_pdf(pdf_path: Path) -> str

Extract text from PDF file using available libraries.
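
One way to picture "using available libraries" is a fallback chain: try each PDF backend in turn and return the first non-empty result. The sketch below is an assumption about how such a chain could look, not the package's actual implementation; the helper names and the backend order are hypothetical. The backend calls themselves (fitz.open, pdfplumber.open, PyPDF2.PdfReader) are the public entry points of those libraries.

```python
from pathlib import Path


def _with_pymupdf(path):
    import fitz  # PyMuPDF
    with fitz.open(str(path)) as doc:
        return "\n".join(page.get_text() for page in doc)


def _with_pdfplumber(path):
    import pdfplumber
    with pdfplumber.open(str(path)) as pdf:
        # extract_text() may return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def _with_pypdf2(path):
    import PyPDF2
    reader = PyPDF2.PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


DEFAULT_EXTRACTORS = (_with_pymupdf, _with_pdfplumber, _with_pypdf2)


def extract_text_with_fallback(path, extractors=DEFAULT_EXTRACTORS):
    """Return text from the first extractor that succeeds with non-empty output."""
    failures = []
    for extractor in extractors:
        try:
            text = extractor(Path(path))
        except Exception as exc:  # missing library or unreadable file
            failures.append(f"{extractor.__name__}: {exc}")
            continue
        if text.strip():
            return text
        failures.append(f"{extractor.__name__}: no text extracted")
    raise RuntimeError("All PDF extractors failed: " + "; ".join(failures))
```

Passing the extractors as a parameter keeps the chain easy to test and lets a caller prefer one backend over another.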

HTTP API Support

handle_multipart_review_request(headers: dict, body: bytes) -> dict

Handle multipart/form-data request for document review.

handle_text_review_request(text_content: str, filename: str = "document.txt") -> dict

Handle text review request.

create_review_response(issues: List[dict], processed_files: List[str]) -> dict

Create structured response for review results.


🔍 Issue Types

Errors (Critical Issues)

  • Missing required Annex IV sections
  • Contradictions between documents
  • High-risk use cases without proper classification
  • GDPR violations (indefinite data retention, missing legal basis)

Warnings (Recommendations)

  • Missing transparency or explainability mentions
  • No bias detection or fairness measures
  • Missing security or robustness measures
  • Only negative mentions of compliance terms
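
The error/warning split lends itself to a simple gate in a CI job: fail the build on errors and, optionally, on warnings. This is a hedged sketch; the function name and the fail_on_warnings flag are hypothetical, and it assumes the type key holds the strings "error" and "warning".

```python
def exit_code_for(issues, fail_on_warnings=False):
    """Map a list of issue dicts to a process exit code (0 = pass, 1 = fail)."""
    errors = [issue for issue in issues if issue["type"] == "error"]
    warnings = [issue for issue in issues if issue["type"] == "warning"]
    if errors:
        return 1
    if warnings and fail_on_warnings:
        return 1
    return 0


# Typical use in a script (not executed here):
#   import sys
#   from pathlib import Path
#   from annex4ac import review_documents
#   sys.exit(exit_code_for(review_documents([Path("doc1.pdf")])))
```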

📊 Example Output

============================================================
COMPLIANCE REVIEW RESULTS
============================================================

❌ ERRORS (2):
  1. [doc1.pdf] (Section 1) Missing content for Annex IV section 1 (system overview).
  2. [doc2.pdf] (Section 5) No mention of risk management procedures.

⚠️  WARNINGS (1):
  1. [doc1.pdf] No mention of transparency or explainability.

Found 3 total issue(s): 2 errors, 1 warnings
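
A report in the layout shown above can be rebuilt from the issue dictionaries. The following is a minimal sketch, not the package's own formatter: format_report is a hypothetical name, the "error"/"warning" type values are assumed, and a section of None is assumed to mean "no section".

```python
def format_report(issues):
    """Render issue dicts as a plain-text report similar to the example output."""
    errors = [issue for issue in issues if issue["type"] == "error"]
    warnings = [issue for issue in issues if issue["type"] == "warning"]

    rule = "=" * 60
    lines = [rule, "COMPLIANCE REVIEW RESULTS", rule, ""]

    for heading, group in (("❌ ERRORS", errors), ("⚠️  WARNINGS", warnings)):
        if not group:
            continue
        lines.append(f"{heading} ({len(group)}):")
        for number, issue in enumerate(group, start=1):
            section = issue.get("section")
            where = f" (Section {section})" if section is not None else ""
            lines.append(f"  {number}. [{issue['file']}]{where} {issue['message']}")
        lines.append("")

    lines.append(
        f"Found {len(issues)} total issue(s): "
        f"{len(errors)} errors, {len(warnings)} warnings"
    )
    return "\n".join(lines)
```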

🛠 Requirements

  • Python 3.9
  • PDF Processing: PyPDF2, pdfplumber, or PyMuPDF
  • NLP Analysis: spaCy, negspaCy, nltk

📄 Licensing

This project is licensed under the MIT License - see the LICENSE file for details.

The software assists in preparing documentation, but does not confirm compliance with legal requirements or standards. The user is responsible for the final accuracy and compliance of the documents.
