MCP server for parsing and analyzing Portuguese SAF-T XML files
SAF-T MCP Server
Parse and analyze Portuguese SAF-T tax files with AI assistants
Getting Started · Available Tools · Configuration
Portuguese version · 13 tools · 152 tests
A Model Context Protocol (MCP) server that enables AI assistants like Claude, Cursor, and Windsurf to load, validate, and analyze Portuguese SAF-T (Standard Audit File for Tax Purposes) XML files. Load a SAF-T file and immediately query invoices, get revenue summaries, VAT breakdowns, and validate compliance with Portuguese tax rules.
What is SAF-T PT?
SAF-T PT is a mandatory XML file that all Portuguese companies must be able to export from their accounting/billing software. It contains the company's invoices, payments, customers, products, tax entries, and more. This MCP server turns that XML into a queryable data source for AI assistants.
Quick Start
Prerequisites
- Python 3.11+ and uv (recommended) or pip
- A SAF-T PT XML file exported from any Portuguese billing/accounting software (PHC, Sage, Primavera, etc.)
1. Clone and install
git clone https://github.com/bybloom-ai/saft-mcp.git
cd saft-mcp
uv sync
2. Add to your AI assistant
Claude Code
claude mcp add saft-mcp -- /path/to/saft-mcp/.venv/bin/python -m saft_mcp
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "saft-mcp": {
      "command": "/path/to/saft-mcp/.venv/bin/python",
      "args": ["-m", "saft_mcp"]
    }
  }
}
Cursor / VS Code / Other MCP clients
Add to your MCP client configuration:
{
  "mcpServers": {
    "saft-mcp": {
      "command": "/path/to/saft-mcp/.venv/bin/python",
      "args": ["-m", "saft_mcp"]
    }
  }
}
3. Start using it
Ask your AI assistant:
"Load my SAF-T file at ~/Documents/saft_2025.xml and give me a revenue summary"
The server will parse the file, extract all invoices and tax data, and make it available for querying through natural conversation.
Available Tools
saft_load
Load and parse a SAF-T PT XML file. Call this before any other tool.
| Parameter | Type | Description |
|---|---|---|
| file_path | string | Path to the SAF-T XML file |
Returns company name, NIF, fiscal period, SAF-T version, and record counts (customers, products, invoices, payments).
Handles Windows-1252 and UTF-8 encodings, BOM stripping, and automatic namespace detection. Files under 50 MB are parsed with full DOM; larger files use streaming.
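The loading behaviour described above can be illustrated with a minimal sketch. The helper names `strip_bom` and `choose_parser` are hypothetical and not taken from the project; the only value borrowed from this README is the 50 MB default threshold. How a file of exactly 50 MB is treated is an assumption.

```python
import codecs

# Documented default for SAFT_MCP_STREAMING_THRESHOLD_BYTES (50 MB).
STREAMING_THRESHOLD_BYTES = 50 * 1024 * 1024


def strip_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark, if there is one."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def choose_parser(size_bytes: int, threshold: int = STREAMING_THRESHOLD_BYTES) -> str:
    """Pick a parsing strategy from the file size (hypothetical helper)."""
    # Assumption: a file exactly at the threshold takes the streaming path.
    return "dom" if size_bytes < threshold else "streaming"
```

Keeping the decision in a small pure function makes the threshold easy to test without creating large files.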
saft_validate
Validate the loaded file against the official XSD schema and Portuguese business rules.
| Parameter | Type | Default | Description |
|---|---|---|---|
| rules | list[string] | all | Specific rules to check |
Available rules:
| Rule | What it checks |
|---|---|
| xsd | XML structure against SAF-T PT 1.04_01 XSD schema |
| numbering | Sequential invoice numbering within each series |
| nif | NIF (tax ID) mod-11 check digit validation |
| tax_codes | Tax percentages match known Portuguese VAT rates |
| atcud | ATCUD unique document codes are present and well-formed |
| hash_chain | Hash continuity across invoice sequences |
| control_totals | Calculated totals match declared control totals |
Returns results with severity (error/warning), location, and fix suggestions.
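As an illustration of the `nif` rule, here is a minimal sketch of the standard mod-11 check for a 9-digit Portuguese NIF. The function name is hypothetical, and the project's own validator may apply additional checks (for example on the leading digit) that are not shown here.

```python
def nif_check_digit_ok(nif: str) -> bool:
    """Return True if the 9th digit matches the mod-11 check digit."""
    if len(nif) != 9 or not (nif.isascii() and nif.isdigit()):
        return False
    # Weights 9, 8, ..., 2 are applied to the first eight digits.
    total = sum(int(digit) * weight for digit, weight in zip(nif[:8], range(9, 1, -1)))
    remainder = total % 11
    # Remainders 0 and 1 map to check digit 0; otherwise it is 11 - remainder.
    expected = 0 if remainder < 2 else 11 - remainder
    return int(nif[8]) == expected
```

For example, `nif_check_digit_ok("123456789")` holds because the weighted sum is 156, the remainder is 2, and 11 - 2 = 9.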
saft_summary
Generate an executive summary of the loaded file. No parameters needed.
Returns:
- Revenue totals (gross, credit notes, net)
- Invoice and credit note counts
- VAT breakdown by rate
- Top 10 customers by revenue
- Document type distribution (FT, FR, NC, ND, FS)
saft_query_invoices
Search and filter invoices with full pagination.
| Parameter | Type | Default | Description |
|---|---|---|---|
| date_from | string | - | Start date (YYYY-MM-DD) |
| date_to | string | - | End date (YYYY-MM-DD) |
| customer_nif | string | - | Filter by tax ID (partial match) |
| customer_name | string | - | Filter by name (case-insensitive, partial) |
| doc_type | string | - | FT, FR, NC, ND, or FS |
| min_amount | number | - | Minimum gross total |
| max_amount | number | - | Maximum gross total |
| status | string | - | N (normal), A (cancelled), F (invoiced) |
| limit | integer | 50 | Results per page (max 500) |
| offset | integer | 0 | Pagination offset |
Returns matching invoices with document number, date, type, customer, amounts, status, and line count.
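To make the filter and pagination semantics concrete, the following is a rough in-memory sketch. The `Invoice` dataclass and `query_invoices` function are hypothetical stand-ins with only a subset of the documented filters; the real tool's models and return shape may differ.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

MAX_LIMIT = 500  # documented maximum page size


@dataclass(frozen=True)
class Invoice:
    """Hypothetical, trimmed-down invoice record used only for this sketch."""
    invoice_no: str
    invoice_date: date
    doc_type: str
    customer_name: str
    gross_total: Decimal


def query_invoices(invoices, *, doc_type=None, customer_name=None,
                   min_amount=None, date_from=None, date_to=None,
                   limit=50, offset=0):
    """Filter in memory, then return one page plus the total match count."""
    limit = max(1, min(limit, MAX_LIMIT))  # clamp to the documented bounds
    needle = customer_name.lower() if customer_name else None
    matches = [
        inv for inv in invoices
        if (doc_type is None or inv.doc_type == doc_type)
        and (needle is None or needle in inv.customer_name.lower())
        and (min_amount is None or inv.gross_total >= min_amount)
        and (date_from is None or inv.invoice_date >= date_from)
        and (date_to is None or inv.invoice_date <= date_to)
    ]
    return {"total": len(matches), "results": matches[offset:offset + limit]}
```

Returning the total alongside the page lets a caller know whether another request with a larger `offset` is worthwhile.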
saft_tax_summary
Generate a VAT analysis grouped by rate, month, or document type.
| Parameter | Type | Default | Description |
|---|---|---|---|
| date_from | string | - | Start date (YYYY-MM-DD) |
| date_to | string | - | End date (YYYY-MM-DD) |
| group_by | string | rate | Group by rate, month, or doc_type |
Returns taxable base, VAT amount, and gross total per group, plus overall totals.
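A grouping like this can be sketched with `Decimal` arithmetic, as below. The input shape (tuples of ISO date, rate in percent, taxable base), the function name, and the per-line rounding to cents are all assumptions made for illustration; only `rate` and `month` grouping are covered.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def tax_summary(lines, group_by="rate"):
    """lines: iterable of (iso_date, rate_percent, taxable_base) tuples."""
    if group_by not in ("rate", "month"):
        raise ValueError("this sketch supports only 'rate' and 'month'")
    groups = defaultdict(lambda: {"taxable_base": Decimal("0"), "vat": Decimal("0")})
    for iso_date, rate, base in lines:
        # "YYYY-MM-DD"[:7] yields the "YYYY-MM" month key.
        key = f"{rate}%" if group_by == "rate" else iso_date[:7]
        # Assumption: VAT is rounded to cents per line, half up.
        vat = (base * rate / 100).quantize(CENT, rounding=ROUND_HALF_UP)
        groups[key]["taxable_base"] += base
        groups[key]["vat"] += vat
    return {
        key: {**g, "gross": g["taxable_base"] + g["vat"]}
        for key, g in groups.items()
    }
```

Using `Decimal` throughout keeps sums such as 23.00 + 46.00 exact, which floats cannot guarantee for every amount.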
saft_query_customers
Search and filter customer master data with revenue enrichment.
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | string | - | Company name (case-insensitive, partial) |
| nif | string | - | Tax ID (partial match) |
| city | string | - | Billing city (case-insensitive, partial) |
| country | string | - | Country code (exact, e.g. "PT", "ES") |
| limit | integer | 50 | Results per page (max 500) |
| offset | integer | 0 | Pagination offset |
Returns customers with invoice count and total revenue per customer.
saft_query_products
Search and filter the product catalog with sales statistics.
| Parameter | Type | Default | Description |
|---|---|---|---|
| description | string | - | Product description (case-insensitive, partial) |
| code | string | - | Product code (partial match) |
| product_type | string | - | P (product), S (service), O (other), I (import), E (export) |
| group | string | - | Product group (case-insensitive, partial) |
| limit | integer | 50 | Results per page (max 500) |
| offset | integer | 0 | Pagination offset |
Returns products with times sold, total quantity, and total revenue.
saft_get_invoice
Get full detail for a single invoice including all line items.
| Parameter | Type | Description |
|---|---|---|
| invoice_no | string | Exact invoice number (e.g. "FR 2025A15/90") |
Returns complete invoice with header, document totals, special regimes, and all lines with product, quantity, price, tax, exemptions, and references.
saft_anomaly_detect
Detect suspicious patterns and irregularities in the loaded file.
| Parameter | Type | Default | Description |
|---|---|---|---|
| checks | list[string] | all | Specific checks to run |
Available checks:
| Check | What it detects |
|---|---|
| duplicate_invoices | Same customer + amount + date combinations |
| numbering_gaps | Missing sequential numbers within each series |
| weekend_invoices | Invoices issued on Saturdays or Sundays |
| unusual_amounts | Invoice amounts > 3 standard deviations from the mean |
| cancelled_ratio | High cancellation rates per series |
| zero_amount | Invoices with zero gross total |
Returns anomalies with type, severity, description, and affected documents.
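Three of the checks listed above are simple enough to sketch with the standard library. The function names are hypothetical, and the use of the sample standard deviation is an assumption; the project may compute these differently.

```python
import statistics
from datetime import date


def numbering_gaps(numbers):
    """Return the sequence numbers missing between the lowest and highest seen."""
    if not numbers:
        return []
    seen = set(numbers)
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


def is_weekend(day: date) -> bool:
    """date.weekday() returns 5 for Saturday and 6 for Sunday."""
    return day.weekday() >= 5


def unusual_amounts(amounts, threshold=3.0):
    """Return amounts more than `threshold` standard deviations from the mean."""
    if len(amounts) < 2:
        return []
    values = [float(a) for a in amounts]
    mean = statistics.fmean(values)
    spread = statistics.stdev(values)  # assumption: sample standard deviation
    if spread == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a, v in zip(amounts, values) if abs(v - mean) / spread > threshold]
```

Note that a 3-sigma rule needs a reasonably large series to fire at all: with only a handful of invoices, even an extreme value cannot sit more than 3 standard deviations from the mean.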
saft_compare
Compare the loaded SAF-T file against a second file (e.g. month-over-month, year-over-year).
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path | string | - | Path to the second SAF-T XML file |
| metrics | list[string] | all | Metrics to compare |
Available metrics: revenue, customers, products, doc_types, vat.
Returns period labels and a changes dict with before/after/delta per metric. Includes top new/lost customers, top movers, and percentage changes.
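The before/after/delta shape can be sketched as follows. Both helpers and the `pct_change` key are hypothetical; the README only states that before, after, delta, and percentage changes are reported, not the exact field names.

```python
from decimal import Decimal


def compare_metric(before: Decimal, after: Decimal) -> dict:
    """Build a before/after/delta record for one numeric metric."""
    delta = after - before
    # A percentage change is undefined when the baseline is zero.
    pct = None if before == 0 else (delta / before * 100).quantize(Decimal("0.1"))
    return {"before": before, "after": after, "delta": delta, "pct_change": pct}


def customer_churn(before_nifs, after_nifs):
    """List customers that appear in only one of the two files, keyed by NIF."""
    before_set, after_set = set(before_nifs), set(after_nifs)
    return {
        "new": sorted(after_set - before_set),
        "lost": sorted(before_set - after_set),
    }
```

For instance, revenue going from 1000 to 1250 gives a delta of 250 and a change of 25.0 percent.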
saft_aging
Compute accounts receivable aging from invoices and payments.
| Parameter | Type | Default | Description |
|---|---|---|---|
| reference_date | string | today | Date to age from (YYYY-MM-DD) |
| buckets | list[int] | [30,60,90,120] | Aging bucket boundaries in days |
Returns per-customer aging with amounts in each bucket, sorted by total outstanding. Uses FIFO allocation of payments against invoices.
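FIFO allocation means payments settle the oldest invoices first, and only what remains unpaid is aged. A minimal sketch for a single customer follows; the function names, the bucket labels, and the choice to treat a boundary day (for example day 60) as belonging to the lower bucket are assumptions.

```python
from bisect import bisect_left
from datetime import date
from decimal import Decimal


def bucket_labels(buckets):
    """Turn boundaries such as [30, 60] into ["0-30", "31-60", "60+"]."""
    labels, lower = [], 0
    for upper in buckets:
        labels.append(f"{lower}-{upper}")
        lower = upper + 1
    labels.append(f"{buckets[-1]}+")
    return labels


def age_receivables(invoices, payments_total, reference_date, buckets=(30, 60, 90, 120)):
    """invoices: (invoice_date, amount) pairs for ONE customer."""
    labels = bucket_labels(buckets)
    aged = {label: Decimal("0") for label in labels}
    remaining = payments_total
    for inv_date, amount in sorted(invoices):  # oldest first, i.e. FIFO
        paid = min(amount, remaining)
        remaining -= paid
        outstanding = amount - paid
        if outstanding > 0:
            days = (reference_date - inv_date).days
            # bisect_left places a boundary day in the lower bucket.
            aged[labels[bisect_left(buckets, days)]] += outstanding
    return aged
```

With invoices of 100, 200, and 50 and a total of 150 paid, the first invoice is fully settled, 50 goes against the second, and the remaining 150 and 50 are aged by their own dates.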
saft_export
Export data to CSV files for use in spreadsheets or other tools.
| Parameter | Type | Default | Description |
|---|---|---|---|
| export_type | string | - | invoices, customers, products, tax_summary, or anomalies |
| file_path | string | - | Output CSV file path |
| filters | dict | - | Optional filters (same as corresponding query tool) |
Returns file path, row count, and column names.
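A CSV export with this return shape could look roughly like the sketch below, which uses only the standard `csv` module. The function name `export_rows`, the dict-per-row input, and the result keys are assumptions for illustration.

```python
import csv
from pathlib import Path


def export_rows(rows, file_path):
    """Write a list of dicts to CSV and report what was written."""
    # Column order follows the keys of the first row.
    columns = list(rows[0].keys()) if rows else []
    # newline="" lets the csv module control line endings itself.
    with Path(file_path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return {"file_path": str(file_path), "row_count": len(rows), "columns": columns}
```

Monetary values are best written as strings (for example `str(Decimal("123.00"))`) so that the spreadsheet receives exactly the digits that were computed.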
saft_stats
Generate a statistical overview of invoicing data.
| Parameter | Type | Default | Description |
|---|---|---|---|
| date_from | string | - | Start date (YYYY-MM-DD) |
| date_to | string | - | End date (YYYY-MM-DD) |
Returns invoice statistics (mean, median, std deviation), daily/weekly/monthly distributions, customer concentration (Pareto analysis), and top/bottom invoices.
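The descriptive statistics and the customer concentration figure can be approximated with the standard library, as in this sketch. Both function names are hypothetical, and "concentration" is interpreted here as the share of revenue held by the top N customers, which is an assumption.

```python
import statistics
from decimal import Decimal


def invoice_stats(amounts):
    """Mean, median, and sample standard deviation of a non-empty list of Decimals."""
    return {
        "mean": statistics.mean(amounts),
        "median": statistics.median(amounts),
        "stdev": statistics.stdev(amounts) if len(amounts) > 1 else Decimal("0"),
    }


def top_n_share(revenue_by_customer, n):
    """Percentage of total revenue contributed by the n largest customers."""
    total = sum(revenue_by_customer.values(), Decimal("0"))
    if total == 0:
        return Decimal("0")
    top = sorted(revenue_by_customer.values(), reverse=True)[:n]
    return (sum(top, Decimal("0")) / total * 100).quantize(Decimal("0.1"))
```

A question such as "What percentage of revenue comes from the top 5 customers?" maps directly onto `top_n_share(revenue, 5)`.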
Typical Workflow
1. saft_load -> Parse the XML file
2. saft_validate -> Check compliance (XSD + business rules)
3. saft_summary -> Get the big picture (revenue, top customers, VAT)
4. saft_query_invoices -> Drill into specific invoices
5. saft_get_invoice -> Full detail for a single invoice
6. saft_tax_summary -> VAT analysis by rate, month, or doc type
7. saft_anomaly_detect -> Flag suspicious patterns
8. saft_stats -> Statistical distributions and trends
9. saft_compare -> Diff against another SAF-T file
10. saft_export -> Export results to CSV
Example questions you can ask after loading a file:
- "How much revenue did the company make this year?"
- "Show me all credit notes above 500 euros"
- "What's the monthly VAT breakdown?"
- "Are there any validation errors in this file?"
- "List invoices for customer XPTO in Q3"
- "What percentage of revenue comes from the top 5 customers?"
- "Are there any suspicious patterns or anomalies?"
- "Compare this file against last month's SAF-T"
- "What's the accounts receivable aging?"
- "Export all invoices to CSV"
Configuration
All settings are configurable via environment variables with the SAFT_MCP_ prefix:
| Variable | Default | Description |
|---|---|---|
| SAFT_MCP_STREAMING_THRESHOLD_BYTES | 52428800 (50 MB) | Files above this use streaming parser |
| SAFT_MCP_MAX_FILE_SIZE_BYTES | 524288000 (500 MB) | Maximum file size accepted |
| SAFT_MCP_SESSION_TIMEOUT_SECONDS | 1800 (30 min) | Session expiry after inactivity |
| SAFT_MCP_MAX_CONCURRENT_SESSIONS | 5 | Maximum simultaneous loaded files |
| SAFT_MCP_DEFAULT_QUERY_LIMIT | 50 | Default results per page |
| SAFT_MCP_MAX_QUERY_LIMIT | 500 | Maximum results per page |
| SAFT_MCP_LOG_LEVEL | INFO | Logging level |
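The prefix convention can be illustrated with a small standard-library sketch. The project structure notes that the real settings live in `config.py` and use pydantic-settings; the `load_settings` function below is a hypothetical stand-in that only mirrors the documented variable names and defaults.

```python
import os

# Documented defaults, keyed by variable name without the SAFT_MCP_ prefix.
DEFAULTS = {
    "STREAMING_THRESHOLD_BYTES": 52_428_800,
    "MAX_FILE_SIZE_BYTES": 524_288_000,
    "SESSION_TIMEOUT_SECONDS": 1800,
    "MAX_CONCURRENT_SESSIONS": 5,
    "DEFAULT_QUERY_LIMIT": 50,
    "MAX_QUERY_LIMIT": 500,
    "LOG_LEVEL": "INFO",
}


def load_settings(env=None):
    """Read SAFT_MCP_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    settings = {}
    for name, default in DEFAULTS.items():
        raw = env.get(f"SAFT_MCP_{name}")
        # Coerce the raw string to the type of the default (int or str).
        settings[name.lower()] = default if raw is None else type(default)(raw)
    return settings
```

Setting `SAFT_MCP_MAX_QUERY_LIMIT=100` in the environment before starting the server would, under this sketch, lower the page-size cap from 500 to 100.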
Architecture
AI Assistant (Claude, Cursor, etc.)
        |
        |  MCP Protocol (stdio)
        v
+----------------------------------------------+
|  saft-mcp server                             |
|                                              |
|  server.py             FastMCP entry point   |
|  state.py              Session management    |
|                                              |
|  parser/                                     |
|    detector.py         Namespace detection   |
|    encoding.py         Charset handling      |
|    full_parser.py      DOM parse (< 50 MB)   |
|    models.py           Pydantic data models  |
|                                              |
|  tools/                                      |
|    load.py             saft_load             |
|    validate.py         saft_validate         |
|    summary.py          saft_summary          |
|    query_invoices.py   saft_query_invoices   |
|    query_customers.py  saft_query_customers  |
|    query_products.py   saft_query_products   |
|    get_invoice.py      saft_get_invoice      |
|    tax_summary.py      saft_tax_summary      |
|    anomaly_detect.py   saft_anomaly_detect   |
|    compare.py          saft_compare          |
|    aging.py            saft_aging            |
|    export.py           saft_export           |
|    stats.py            saft_stats            |
|                                              |
|  validators/                                 |
|    xsd_validator.py    XSD 1.04_01           |
|    business_rules.py   Numbering, totals     |
|    nif.py              NIF mod-11            |
|    hash_chain.py       Hash continuity       |
|                                              |
|  schemas/                                    |
|    saftpt1.04_01.xsd   Official XSD          |
+----------------------------------------------+
Key design decisions:
- All monetary values use Decimal to avoid floating-point rounding in tax calculations
- lxml for XML parsing, with automatic XSD 1.1 feature stripping (the official Portuguese XSD uses xs:assert and xs:all with unbounded children, which lxml's XSD 1.0 engine cannot handle natively)
- Pydantic v2 models validated against real PHC Corporate exports
- Namespace auto-detection by scanning the first 4 KB of the file (never hardcoded)
- Windows-1252 encoding handled natively via the XML declaration
Development
# Install with dev dependencies
uv sync --extra dev
# Run tests (152 tests)
pytest
# Lint
ruff check src/ tests/
# Format
ruff format src/ tests/
# Type check
mypy src/
Project structure
saft-mcp/
  src/saft_mcp/        # Source code
    server.py          # FastMCP entry point, tool registration
    config.py          # Settings (pydantic-settings, env vars)
    state.py           # Session store, parsed file state
    exceptions.py      # SaftError hierarchy
    parser/            # XML parsing (encoding, detection, models)
    tools/             # One file per MCP tool
    validators/        # XSD, business rules, NIF, hash chain
    schemas/           # Official XSD file
  tests/               # Mirrors src/ structure
  pyproject.toml       # Project config (hatch build, ruff, mypy, pytest)
Roadmap
- saft_query_customers -- search and filter customer master data
- saft_query_products -- search and filter product catalog
- saft_get_invoice -- full invoice detail with line items
- saft_anomaly_detect -- flag duplicate invoices, numbering gaps, unusual amounts
- saft_compare -- diff two SAF-T files (e.g. month-over-month)
- saft_aging -- accounts receivable aging analysis
- saft_export -- export data to CSV
- saft_stats -- statistical overview and distributions
- Streaming parser for large files (>= 50 MB)
- Accounting SAF-T support (journal entries, general ledger, trial balance)
- saft_trial_balance -- generate trial balance from accounting data
- saft_ies_prepare -- pre-fill IES annual tax return fields
- saft_cross_check -- cross-reference invoicing vs accounting SAF-T
- PyPI package (pip install saft-mcp)
- GitHub Actions CI (pytest + ruff + mypy)
Supported SAF-T versions
- SAF-T PT 1.04_01 (current Portuguese standard)
Tested with real exports from PHC Corporate. Should work with SAF-T files from any compliant Portuguese software (Sage, Primavera, PHC, Moloni, InvoiceXpress, etc.).
License
MIT
Built by bybloom.ai, a business unit of Bloomidea