Skip to main content

MCP server for parsing and analyzing Portuguese SAF-T XML files

Project description

SAF-T MCP Server

Parse and analyze Portuguese SAF-T tax files with AI assistants

License: MIT Python 3.11+ SAF-T PT 1.04_01

Getting Started · Available Tools · Configuration

Versao em Portugues · 13 tools · 152 tests


A Model Context Protocol (MCP) server that enables AI assistants like Claude, Cursor, and Windsurf to load, validate, and analyze Portuguese SAF-T (Standard Audit File for Tax Purposes) XML files. Load a SAF-T file and immediately query invoices, get revenue summaries, VAT breakdowns, and validate compliance with Portuguese tax rules.

What is SAF-T PT?

SAF-T PT is a mandatory XML file that all Portuguese companies must be able to export from their accounting/billing software. It contains the company's invoices, payments, customers, products, tax entries, and more. This MCP server turns that XML into a queryable data source for AI assistants.


Quick Start

Prerequisites

  • Python 3.11+ and uv (recommended) or pip
  • A SAF-T PT XML file exported from any Portuguese billing/accounting software (PHC, Sage, Primavera, etc.)

1. Clone and install

git clone https://github.com/bybloom-ai/saft-mcp.git
cd saft-mcp
uv sync

2. Add to your AI assistant

Claude Code
claude mcp add saft-mcp -- /path/to/saft-mcp/.venv/bin/python -m saft_mcp
Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "saft-mcp": {
      "command": "/path/to/saft-mcp/.venv/bin/python",
      "args": ["-m", "saft_mcp"]
    }
  }
}
Cursor / VS Code / Other MCP clients

Add to your MCP client configuration:

{
  "mcpServers": {
    "saft-mcp": {
      "command": "/path/to/saft-mcp/.venv/bin/python",
      "args": ["-m", "saft_mcp"]
    }
  }
}

3. Start using it

Ask your AI assistant:

"Load my SAF-T file at ~/Documents/saft_2025.xml and give me a revenue summary"

The server will parse the file, extract all invoices and tax data, and make it available for querying through natural conversation.


Available Tools

saft_load

Load and parse a SAF-T PT XML file. This must be called first before using any other tool.

Parameter Type Description
file_path string Path to the SAF-T XML file

Returns company name, NIF, fiscal period, SAF-T version, and record counts (customers, products, invoices, payments).

Handles Windows-1252 and UTF-8 encodings, BOM stripping, and automatic namespace detection. Files under 50 MB are parsed with full DOM; larger files use streaming.


saft_validate

Validate the loaded file against the official XSD schema and Portuguese business rules.

Parameter Type Default Description
rules list[string] all Specific rules to check

Available rules:

Rule What it checks
xsd XML structure against SAF-T PT 1.04_01 XSD schema
numbering Sequential invoice numbering within each series
nif NIF (tax ID) mod-11 check digit validation
tax_codes Tax percentages match known Portuguese VAT rates
atcud ATCUD unique document codes are present and well-formed
hash_chain Hash continuity across invoice sequences
control_totals Calculated totals match declared control totals

Returns results with severity (error/warning), location, and fix suggestions.


saft_summary

Generate an executive summary of the loaded file. No parameters needed.

Returns:

  • Revenue totals (gross, credit notes, net)
  • Invoice and credit note counts
  • VAT breakdown by rate
  • Top 10 customers by revenue
  • Document type distribution (FT, FR, NC, ND, FS)

saft_query_invoices

Search and filter invoices with full pagination.

Parameter Type Default Description
date_from string - Start date (YYYY-MM-DD)
date_to string - End date (YYYY-MM-DD)
customer_nif string - Filter by tax ID (partial match)
customer_name string - Filter by name (case-insensitive, partial)
doc_type string - FT, FR, NC, ND, or FS
min_amount number - Minimum gross total
max_amount number - Maximum gross total
status string - N (normal), A (cancelled), F (invoiced)
limit integer 50 Results per page (max 500)
offset integer 0 Pagination offset

Returns matching invoices with document number, date, type, customer, amounts, status, and line count.


saft_tax_summary

Generate a VAT analysis grouped by rate, month, or document type.

Parameter Type Default Description
date_from string - Start date (YYYY-MM-DD)
date_to string - End date (YYYY-MM-DD)
group_by string rate Group by rate, month, or doc_type

Returns taxable base, VAT amount, and gross total per group, plus overall totals.


saft_query_customers

Search and filter customer master data with revenue enrichment.

Parameter Type Default Description
name string - Company name (case-insensitive, partial)
nif string - Tax ID (partial match)
city string - Billing city (case-insensitive, partial)
country string - Country code (exact, e.g. "PT", "ES")
limit integer 50 Results per page (max 500)
offset integer 0 Pagination offset

Returns customers with invoice count and total revenue per customer.


saft_query_products

Search and filter the product catalog with sales statistics.

Parameter Type Default Description
description string - Product description (case-insensitive, partial)
code string - Product code (partial match)
product_type string - P (product), S (service), O (other), I (import), E (export)
group string - Product group (case-insensitive, partial)
limit integer 50 Results per page (max 500)
offset integer 0 Pagination offset

Returns products with times sold, total quantity, and total revenue.


saft_get_invoice

Get full detail for a single invoice including all line items.

Parameter Type Description
invoice_no string Exact invoice number (e.g. "FR 2025A15/90")

Returns complete invoice with header, document totals, special regimes, and all lines with product, quantity, price, tax, exemptions, and references.


saft_anomaly_detect

Detect suspicious patterns and irregularities in the loaded file.

Parameter Type Default Description
checks list[string] all Specific checks to run

Available checks:

Check What it detects
duplicate_invoices Same customer + amount + date combinations
numbering_gaps Missing sequential numbers within each series
weekend_invoices Invoices issued on Saturdays or Sundays
unusual_amounts Invoice amounts > 3 standard deviations from the mean
cancelled_ratio High cancellation rates per series
zero_amount Invoices with zero gross total

Returns anomalies with type, severity, description, and affected documents.


saft_compare

Compare the loaded SAF-T file against a second file (e.g. month-over-month, year-over-year).

Parameter Type Default Description
file_path string - Path to the second SAF-T XML file
metrics list[string] all Metrics to compare

Available metrics: revenue, customers, products, doc_types, vat.

Returns period labels and a changes dict with before/after/delta per metric. Includes top new/lost customers, top movers, and percentage changes.


saft_aging

Compute accounts receivable aging from invoices and payments.

Parameter Type Default Description
reference_date string today Date to age from (YYYY-MM-DD)
buckets list[int] [30,60,90,120] Aging bucket boundaries in days

Returns per-customer aging with amounts in each bucket, sorted by total outstanding. Uses FIFO allocation of payments against invoices.


saft_export

Export data to CSV files for use in spreadsheets or other tools.

Parameter Type Default Description
export_type string - invoices, customers, products, tax_summary, or anomalies
file_path string - Output CSV file path
filters dict - Optional filters (same as corresponding query tool)

Returns file path, row count, and column names.


saft_stats

Generate a statistical overview of invoicing data.

Parameter Type Default Description
date_from string - Start date (YYYY-MM-DD)
date_to string - End date (YYYY-MM-DD)

Returns invoice statistics (mean, median, std deviation), daily/weekly/monthly distributions, customer concentration (Pareto analysis), and top/bottom invoices.


Typical Workflow

1. saft_load            -> Parse the XML file
2. saft_validate        -> Check compliance (XSD + business rules)
3. saft_summary         -> Get the big picture (revenue, top customers, VAT)
4. saft_query_invoices  -> Drill into specific invoices
5. saft_get_invoice     -> Full detail for a single invoice
6. saft_tax_summary     -> VAT analysis by rate, month, or doc type
7. saft_anomaly_detect  -> Flag suspicious patterns
8. saft_stats           -> Statistical distributions and trends
9. saft_compare         -> Diff against another SAF-T file
10. saft_export         -> Export results to CSV

Example questions you can ask after loading a file:

  • "How much revenue did the company make this year?"
  • "Show me all credit notes above 500 euros"
  • "What's the monthly VAT breakdown?"
  • "Are there any validation errors in this file?"
  • "List invoices for customer XPTO in Q3"
  • "What percentage of revenue comes from the top 5 customers?"
  • "Are there any suspicious patterns or anomalies?"
  • "Compare this file against last month's SAF-T"
  • "What's the accounts receivable aging?"
  • "Export all invoices to CSV"

Configuration

All settings are configurable via environment variables with the SAFT_MCP_ prefix:

Variable Default Description
SAFT_MCP_STREAMING_THRESHOLD_BYTES 52428800 (50 MB) Files above this use streaming parser
SAFT_MCP_MAX_FILE_SIZE_BYTES 524288000 (500 MB) Maximum file size accepted
SAFT_MCP_SESSION_TIMEOUT_SECONDS 1800 (30 min) Session expiry after inactivity
SAFT_MCP_MAX_CONCURRENT_SESSIONS 5 Maximum simultaneous loaded files
SAFT_MCP_DEFAULT_QUERY_LIMIT 50 Default results per page
SAFT_MCP_MAX_QUERY_LIMIT 500 Maximum results per page
SAFT_MCP_LOG_LEVEL INFO Logging level

Architecture

AI Assistant (Claude, Cursor, etc.)
        |
        | MCP Protocol (stdio)
        v
+------------------------------------------+
|           saft-mcp server                |
|                                          |
|  server.py       FastMCP entry point     |
|  state.py        Session management      |
|                                          |
|  parser/                                 |
|    detector.py   Namespace detection     |
|    encoding.py   Charset handling        |
|    full_parser.py   DOM parse (< 50 MB)  |
|    models.py     Pydantic data models    |
|                                          |
|  tools/                                  |
|    load.py            saft_load          |
|    validate.py        saft_validate      |
|    summary.py         saft_summary       |
|    query_invoices.py  saft_query_invoices|
|    query_customers.py saft_query_customer|
|    query_products.py  saft_query_products|
|    get_invoice.py     saft_get_invoice   |
|    tax_summary.py     saft_tax_summary   |
|    anomaly_detect.py  saft_anomaly_detect|
|    compare.py         saft_compare       |
|    aging.py           saft_aging         |
|    export.py          saft_export        |
|    stats.py           saft_stats         |
|                                          |
|  validators/                             |
|    xsd_validator.py   XSD 1.04_01        |
|    business_rules.py  Numbering, totals  |
|    nif.py             NIF mod-11         |
|    hash_chain.py      Hash continuity    |
|                                          |
|  schemas/                                |
|    saftpt1.04_01.xsd  Official XSD       |
+------------------------------------------+

Key design decisions:

  • All monetary values use Decimal to avoid floating-point rounding in tax calculations
  • lxml for XML parsing, with automatic XSD 1.1 feature stripping (the official Portuguese XSD uses xs:assert and xs:all with unbounded children, which lxml's XSD 1.0 engine cannot handle natively)
  • Pydantic v2 models validated against real PHC Corporate exports
  • Namespace auto-detection by scanning the first 4 KB of the file (never hardcoded)
  • Windows-1252 encoding handled natively via the XML declaration

Development

# Install with dev dependencies
uv sync --extra dev

# Run tests (152 tests)
pytest

# Lint
ruff check src/ tests/

# Format
ruff format src/ tests/

# Type check
mypy src/

Project structure

saft-mcp/
  src/saft_mcp/         # Source code
    server.py           # FastMCP entry point, tool registration
    config.py           # Settings (pydantic-settings, env vars)
    state.py            # Session store, parsed file state
    exceptions.py       # SaftError hierarchy
    parser/             # XML parsing (encoding, detection, models)
    tools/              # One file per MCP tool
    validators/         # XSD, business rules, NIF, hash chain
    schemas/            # Official XSD file
  tests/                # Mirrors src/ structure
  pyproject.toml        # Project config (hatch build, ruff, mypy, pytest)

Roadmap

  • saft_query_customers -- search and filter customer master data
  • saft_query_products -- search and filter product catalog
  • saft_get_invoice -- full invoice detail with line items
  • saft_anomaly_detect -- flag duplicate invoices, numbering gaps, unusual amounts
  • saft_compare -- diff two SAF-T files (e.g. month-over-month)
  • saft_aging -- accounts receivable aging analysis
  • saft_export -- export data to CSV
  • saft_stats -- statistical overview and distributions
  • Streaming parser for large files (>= 50 MB)
  • Accounting SAF-T support (journal entries, general ledger, trial balance)
  • saft_trial_balance -- generate trial balance from accounting data
  • saft_ies_prepare -- pre-fill IES annual tax return fields
  • saft_cross_check -- cross-reference invoicing vs accounting SAF-T
  • PyPI package (pip install saft-mcp)
  • GitHub Actions CI (pytest + ruff + mypy)

Supported SAF-T versions

  • SAF-T PT 1.04_01 (current Portuguese standard)

Tested with real exports from PHC Corporate. Should work with SAF-T files from any compliant Portuguese software (Sage, Primavera, PHC, Moloni, InvoiceXpress, etc.).


License

MIT


Built by bybloom.ai, a business unit of Bloomidea

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

saft_mcp-0.1.0.tar.gz (69.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

saft_mcp-0.1.0-py3-none-any.whl (60.6 kB view details)

Uploaded Python 3

File details

Details for the file saft_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: saft_mcp-0.1.0.tar.gz
  • Upload date:
  • Size: 69.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for saft_mcp-0.1.0.tar.gz
Algorithm Hash digest
SHA256 815b8123aa54db7ae7ecef6472493c43c0f18bc34b737fc9f24e9ee44c833f5e
MD5 6128c031d3a497033366ce97f6231b36
BLAKE2b-256 a85f1297d2a2ad2716debfc97e9bf2db849dfa5ac98603255c6360621fd8b8d0

See more details on using hashes here.

Provenance

The following attestation bundles were made for saft_mcp-0.1.0.tar.gz:

Publisher: publish.yml on bybloom-ai/saft-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file saft_mcp-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: saft_mcp-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 60.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for saft_mcp-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f8dd91736f8f3d18f599ac72c815e4dff584d1b2c9eb0c1d0cbfe4518f93cf56
MD5 c860289c3fd16c2963906523ba605e76
BLAKE2b-256 83e7c98ec55f2831298791e1b8d4ea2d27c7fd59f242d8009febe53782a9c363

See more details on using hashes here.

Provenance

The following attestation bundles were made for saft_mcp-0.1.0-py3-none-any.whl:

Publisher: publish.yml on bybloom-ai/saft-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page