annex4nlp

NLP-based compliance analysis for EU AI Act Annex IV documents.

This package uses natural language processing (spaCy and negspaCy) to check technical documentation for compliance with EU AI Act Annex IV and GDPR requirements.

⚠️ Legal Disclaimer: This software is provided for informational and compliance assistance purposes only. It is not legal advice and should not be relied upon as such. Users are responsible for ensuring their documentation meets all applicable legal requirements and should consult with qualified legal professionals for compliance matters.

🔒 Data Protection: All processing occurs locally on your machine. No data leaves your system.

🚀 Quick Start

# Install the package
pip install annex4nlp

# Analyze a single PDF file
annex4nlp document.pdf

# Analyze multiple PDF files
annex4nlp doc1.pdf doc2.pdf doc3.pdf

# Hide informational messages (negated terms)
annex4nlp document.pdf --hide-info

✨ Features

📄 PDF Text Extraction: Extract text from PDF documents using multiple libraries (PyPDF2, pdfplumber, PyMuPDF)
🔍 Compliance Analysis: Analyze documents for missing Annex IV sections and compliance issues
⚠️ Contradiction Detection: Detect contradictions within and across documents using NLP
🔒 GDPR Compliance: Check for GDPR compliance issues in technical documentation
⚡ Batch Processing: Efficient batch processing of multiple documents
🖥️ CLI Interface: Command-line interface for easy integration
🧠 Advanced NLP: Uses spaCy and negspaCy for intelligent analysis
📊 Detailed Reporting: Console output with error/warning/info classification

Installation

pip install annex4nlp

📖 Usage

CLI Usage

# Analyze a single PDF file
annex4nlp document.pdf

# Analyze multiple PDF files
annex4nlp doc1.pdf doc2.pdf doc3.pdf

# Hide informational messages (negated terms)
annex4nlp document.pdf --hide-info

# Get help
annex4nlp --help

Python API

from annex4nlp import review_documents
from pathlib import Path

# Analyze multiple PDF files
pdf_files = [Path("doc1.pdf"), Path("doc2.pdf")]
issues = review_documents(pdf_files)

for issue in issues:
    print(f"{issue['type']}: {issue['message']}")

Single Document Analysis

from pathlib import Path

from annex4nlp import review_single_document

issues = review_single_document(Path("document.pdf"))

Text Analysis

from annex4nlp import analyze_text

text_content = "Your technical documentation text here..."
issues = analyze_text(text_content, "document_name")

API with Info Filtering

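from annex4nlp import analyze_text, create_review_response

# Issues can come from any of the analysis calls shown above
issues = analyze_text("Your technical documentation text here...", "document.pdf")

# Get all issues including info messages
response = create_review_response(issues, ["document.pdf"], hide_info=False)

# Filter out info messages
response = create_review_response(issues, ["document.pdf"], hide_info=True)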
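
To illustrate what info filtering amounts to, here is a minimal sketch that works on plain issue dictionaries with the 'type' and 'message' keys used in the Python API example. The helper name drop_info and the lowercase type value "info" are assumptions made for illustration; the package's own filtering may differ.

```python
def drop_info(issues, hide_info=False):
    """Return a copy of issues, without 'info' entries when hide_info is True."""
    if not hide_info:
        return list(issues)
    return [issue for issue in issues if issue["type"] != "info"]


# Hypothetical sample data; real issues would come from the analysis functions.
sample_issues = [
    {"type": "error", "message": "Missing content for Annex IV section 4."},
    {"type": "info", "message": "Term 'personal data' negated on page 1."},
    {"type": "warning", "message": "No mention of data deletion."},
]

visible = drop_info(sample_issues, hide_info=True)
for issue in visible:
    print(f"{issue['type']}: {issue['message']}")
```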

🔍 Analysis Capabilities

Annex IV Compliance

Section Validation: Checks for all required Annex IV sections (1-9)
High-risk Detection: Validates high-risk system declarations
Missing Elements: Identifies missing compliance elements
Content Analysis: Analyzes section content for completeness
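
As a rough illustration of section validation, the sketch below looks for headings numbered 1 to 9 in plain text and reports the numbers that never appear. The heading formats it accepts ("Section N" or "N." / "N)" at the start of a line) and the function name missing_sections are assumptions; this is not the package's actual detection logic.

```python
import re

REQUIRED_SECTIONS = range(1, 10)  # Annex IV sections 1-9

# Assumed heading styles: "Section 4 ..." or "4. ..." / "4) ..." at line start.
HEADING = re.compile(
    r"^\s*(?:section\s+([1-9])\b|([1-9])[.)]\s)",
    re.IGNORECASE | re.MULTILINE,
)


def missing_sections(text):
    """Return the required section numbers with no matching heading in text."""
    found = set()
    for match in HEADING.finditer(text):
        found.add(int(match.group(1) or match.group(2)))
    return [n for n in REQUIRED_SECTIONS if n not in found]


# Hypothetical document text with sections 4, 6, 7 and 8 left out.
sample = """Section 1: General description
2. Development process
Section 3 Monitoring and control
5) Risk management
Section 9: Post-market monitoring
"""

print(missing_sections(sample))
```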

GDPR Compliance

Personal Data: Analyzes how personal data handling is described
Legal Basis: Verifies that a legal basis for data processing is stated
Data Subject Rights: Checks for mentions of data subject rights
Retention Periods: Validates data retention periods
Consent Management: Analyzes the description of consent mechanisms
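
The kind of check described above can be approximated with simple keyword matching. The sketch below is a deliberately simplified stand-in, not the package's implementation: the keyword lists and the function name gdpr_warnings are assumptions, and the message wording is modeled on the example output further down.

```python
import re

PERSONAL_DATA = re.compile(r"\bpersonal data\b", re.IGNORECASE)
LEGAL_BASIS = re.compile(
    r"\b(?:consent|lawful basis|legal basis|legitimate interest)\b", re.IGNORECASE
)
SUBJECT_RIGHTS = re.compile(
    r"\b(?:erasure|deletion|right of access|subject access)\b", re.IGNORECASE
)


def gdpr_warnings(text):
    """Return warning messages for text that mentions personal data
    but omits a legal basis or data subject rights."""
    warnings = []
    if PERSONAL_DATA.search(text):
        if not LEGAL_BASIS.search(text):
            warnings.append(
                "Personal data use without mention of consent or lawful basis."
            )
        if not SUBJECT_RIGHTS.search(text):
            warnings.append("No mention of data deletion or subject access rights.")
    return warnings


# Hypothetical inputs for illustration.
incomplete = "The system stores personal data of applicants for 12 months."
complete = "Personal data is processed on the basis of consent; users may request deletion."

print(gdpr_warnings(incomplete))
print(gdpr_warnings(complete))
```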

Contradiction Detection

Internal Contradictions: Finds inconsistencies within single documents
Cross-document Issues: Detects contradictions between multiple documents
System Information: Identifies conflicts in system names and versions
Policy Inconsistencies: Finds conflicting policy statements
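
One simple way to picture a cross-document conflict check is to compare the version numbers each document states. The following sketch assumes versions are written as "version X.Y" and uses a hypothetical helper, find_version_conflicts; the package may detect such conflicts quite differently.

```python
import re
from collections import defaultdict

# Assumed phrasing: "version 2.1", "Version v2.1.0", and similar.
VERSION = re.compile(r"\bversion\s+v?(\d+(?:\.\d+)+)", re.IGNORECASE)


def find_version_conflicts(documents):
    """documents maps a document name to its text.

    Returns {version: [document names]} when more than one distinct
    version is stated across the documents, otherwise an empty dict.
    """
    seen = defaultdict(list)
    for name, text in documents.items():
        for version in sorted(set(VERSION.findall(text))):
            seen[version].append(name)
    return dict(seen) if len(seen) > 1 else {}


# Hypothetical document texts and system name.
docs = {
    "doc1.pdf": "CreditScorer version 2.1 is deployed in production.",
    "doc2.pdf": "This document describes CreditScorer version 2.3.",
}

print(find_version_conflicts(docs))
```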

Advanced NLP Features

Negation Detection: Uses negspaCy for intelligent negation handling
Term Matching: Advanced term matching with spaCy
Semantic Analysis: Semantic analysis of compliance terms
Context Awareness: Context-aware analysis of technical documentation
Info Messages: Informational messages about negated terms (can be filtered with --hide-info)
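
To show the idea behind "terms found only with negation", here is a small rule-based sketch. It is a simplified substitute for negspaCy, using a fixed cue list and a word window rather than a parsed pipeline; the cue list, the window size, and the function name only_negated are all assumptions.

```python
import re

NEGATION_CUES = ("no", "not", "never", "without")


def only_negated(text, term, window=4):
    """Return True if term occurs at least once and every occurrence has a
    negation cue among the `window` words before it in the same sentence."""
    term_words = term.lower().split()
    n = len(term_words)
    occurrences = 0
    negated = 0
    # Split into sentences so a cue in one sentence cannot negate the next.
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        words = re.findall(r"[a-z'-]+", sentence)
        for i in range(len(words) - n + 1):
            if words[i:i + n] == term_words:
                occurrences += 1
                preceding = words[max(0, i - window):i]
                if any(cue in preceding for cue in NEGATION_CUES):
                    negated += 1
    return occurrences > 0 and negated == occurrences


# Hypothetical sentences for illustration.
text_a = "The system does not collect personal data."
text_b = "The system does not collect personal data. Personal data is encrypted at rest."

print(only_negated(text_a, "personal data"))
print(only_negated(text_b, "personal data"))
```

A term that also appears in an affirmative sentence, as in the second input, would not produce an info message under this rule.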

Issue Types

The analysis categorizes issues into three types:

❌ ERRORS: Critical compliance issues that need immediate attention
- Missing Annex IV sections
- Internal contradictions within documents
- Cross-document contradictions
⚠️ WARNINGS: Potential issues that should be reviewed
- GDPR compliance concerns
- Missing transparency elements
- Incomplete policy statements
ℹ️ INFO: Informational messages about negated terms
- Terms found only with negation (e.g., "does not collect personal data")
- These may be intentional - use --hide-info to suppress
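
Because the three types differ in severity, a script that consumes the results may want to fail only on some of them. The sketch below maps a list of issue dictionaries to a process exit code. The helper name exit_code, the fail_on_warnings option, and the lowercase type values "error" and "warning" are assumptions, not part of the documented API.

```python
def exit_code(issues, fail_on_warnings=False):
    """Return 1 when blocking issues are present, else 0."""
    blocking = {"error", "warning"} if fail_on_warnings else {"error"}
    return 1 if any(issue["type"] in blocking for issue in issues) else 0


# Hypothetical sample data; real issues would come from the analysis functions.
only_warnings = [
    {"type": "warning", "message": "No mention of data deletion."},
    {"type": "info", "message": "Term 'authentication' negated on page 1."},
]
with_error = only_warnings + [
    {"type": "error", "message": "Missing content for Annex IV section 7."},
]

print(exit_code(only_warnings))
print(exit_code(with_error))
```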

📦 Dependencies

typer[all]>=0.12 - CLI framework
spacy>=3.7.5 - Natural language processing
negspacy>=1.0.4 - Negation detection
PyPDF2>=3.0 - PDF text extraction
pdfplumber>=0.10 - PDF text extraction
PyMuPDF>=1.23 - PDF text extraction
nltk>=3.8 - Natural language toolkit
spacy-lookups-data>=1.0 - spaCy language data

📊 Example Output

Standard Output (with INFO messages)

============================================================
COMPLIANCE REVIEW RESULTS
============================================================

❌ ERRORS (4):
  1. [document.pdf] (Section 4) Missing content for Annex IV section 4 (performance metrics).
  2. [document.pdf] (Section 6) Missing content for Annex IV section 6 (changes and versions).
  3. [document.pdf] (Section 7) Missing content for Annex IV section 7 (standards applied).
  4. [document.pdf] (Section 8) Missing content for Annex IV section 8 (compliance declaration).

⚠️  WARNINGS (2):
  1. [document.pdf] Personal data use without mention of consent or lawful basis (possible GDPR issue).
  2. [document.pdf] No mention of data deletion or subject access rights (check GDPR compliance).

ℹ️  INFO (3):
  1. [document.pdf] Term 'personal data' negated on page 1.
  2. [document.pdf] Term 'post-market monitoring' negated on page 1.
  3. [document.pdf] Term 'authentication' negated on page 1.

     Note: These informational messages indicate terms found only with negation.
     This may be intentional - please verify if the negation is correct.
     Use --hide-info flag to suppress these messages.

Found 9 total issue(s): 4 errors, 2 warnings, 3 info

Output with `--hide-info` flag

============================================================
COMPLIANCE REVIEW RESULTS
============================================================

❌ ERRORS (4):
  1. [document.pdf] (Section 4) Missing content for Annex IV section 4 (performance metrics).
  2. [document.pdf] (Section 6) Missing content for Annex IV section 6 (changes and versions).
  3. [document.pdf] (Section 7) Missing content for Annex IV section 7 (standards applied).
  4. [document.pdf] (Section 8) Missing content for Annex IV section 8 (compliance declaration).

⚠️  WARNINGS (2):
  1. [document.pdf] Personal data use without mention of consent or lawful basis (possible GDPR issue).
  2. [document.pdf] No mention of data deletion or subject access rights (check GDPR compliance).

Found 6 total issue(s): 4 errors, 2 warnings
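
The closing summary line in both outputs can be reproduced from a list of issue dictionaries. This is a sketch under the assumption that the 'type' values are the lowercase strings "error", "warning" and "info"; summary_line is a hypothetical helper, not a function exported by the package.

```python
from collections import Counter


def summary_line(issues):
    """Build a 'Found N total issue(s): ...' line from issue dictionaries."""
    counts = Counter(issue["type"] for issue in issues)
    parts = [f"{counts['error']} errors", f"{counts['warning']} warnings"]
    # Info is listed only when present, as in the --hide-info output.
    if counts["info"]:
        parts.append(f"{counts['info']} info")
    return f"Found {len(issues)} total issue(s): " + ", ".join(parts)


# Hypothetical issues matching the counts in the example output.
sample_issues = (
    [{"type": "error", "message": "..."}] * 4
    + [{"type": "warning", "message": "..."}] * 2
    + [{"type": "info", "message": "..."}] * 3
)
without_info = [i for i in sample_issues if i["type"] != "info"]

print(summary_line(sample_issues))
print(summary_line(without_info))
```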

📄 License

MIT License - see LICENSE file for details.

Third-Party Licenses

This package uses several third-party libraries. See THIRD_PARTY_LICENSES.md for the complete list of licenses.

Key dependencies:

spaCy: MIT License
negspaCy: MIT License
PyPDF2: BSD 3-Clause License
pdfplumber: MIT License
PyMuPDF: GNU Affero General Public License v3.0
NLTK: Apache License 2.0
Typer: MIT License

Release history

1.0.1 (this version): Aug 5, 2025
1.0.0: Aug 5, 2025

