FinLang: a deterministic, auditable DSL for financial rules

FinLang: The Financial Rules Engine

Deterministic. Auditable. Global.
Designed for explainable processing in regulated environments.


Overview

FinLang is a domain-specific language (DSL) and high-performance CLI engine for financial transaction processing.
It replaces opaque machine-learning categorization with transparent, deterministic rules, delivering explainability, auditability, and global compatibility.

It is built for settings where audit-friendly logic and reproducible results matter.


The FinLang DSL

FinLang rules are human-readable, Git-friendly, and designed for precision.
The engine processes rules top-to-bottom; the last matching rule sets the category, while flags accumulate.

# Example: Basic categorization and flagging
rule "GROCERIES: Tesco" {
  match:
    - counterparty ~ "*TESCO*"
  set:
    - category = "Groceries"
    - flags += "Supermarket"
}

# Example: Numeric range and exact match
rule "TRAVEL: High Value Flight" {
  match:
    - counterparty == "BRITISH AIRWAYS"
    - amount in -5000.00 .. -500.00
  set:
    - category = "Travel"
    - flags += "HighValue"
}
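
The evaluation order described above (rules applied top-to-bottom, the last matching rule setting the category, flags accumulating) can be sketched in Python. This is a minimal illustration, not FinLang's engine: the Rule class and apply_rules function are hypothetical names, and the sketch assumes that ~ is a glob-style match, that the .. range is inclusive, and that every condition in a match block must hold.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable, List, Optional


@dataclass
class Rule:
    """Hypothetical in-memory form of one rule block (not FinLang's own type)."""
    name: str
    matches: Callable[[dict], bool]          # all match conditions combined
    category: Optional[str] = None
    flags: List[str] = field(default_factory=list)


def apply_rules(txn: dict, rules: List[Rule]) -> dict:
    """Evaluate rules in order: last match sets category, flags accumulate."""
    result = dict(txn)
    result["category"] = txn.get("category")
    result["flags"] = list(txn.get("flags", []))
    for rule in rules:
        if not rule.matches(txn):
            continue
        if rule.category is not None:
            result["category"] = rule.category   # a later match overwrites
        for flag in rule.flags:
            if flag not in result["flags"]:      # append-only, deduplicated
                result["flags"].append(flag)
    return result


# The two example rules above, under the stated assumptions.
rules = [
    Rule("GROCERIES: Tesco",
         lambda t: fnmatchcase(t["counterparty"], "*TESCO*"),
         category="Groceries", flags=["Supermarket"]),
    Rule("TRAVEL: High Value Flight",
         lambda t: t["counterparty"] == "BRITISH AIRWAYS"
         and -5000.00 <= t["amount"] <= -500.00,
         category="Travel", flags=["HighValue"]),
]

txn = {"counterparty": "TESCO STORES 1234", "amount": -42.10}
print(apply_rules(txn, rules))
```

In this sketch the Tesco transaction ends up with category Groceries and the single flag Supermarket. A later rule that also matched would overwrite the category but could only add to the flags.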

Key Features (v0.7.8)

| Feature | Description |
|---|---|
| Deterministic DSL | Human-readable .fin rules language: explainable logic, Git-friendly. |
| High-Performance Engine | Vectorized core (Pandas + NumPy + PyArrow); ~217K rows/sec FastIO validated throughput on the integrity harness. |
| Dual Backend | Standard (Engine: c) or FastIO (Engine: pyarrow) with automatic fallback. |
| Growth Loop | Automated Discover → Suggest → Categorize workflow; 97.8% success on addressable patterns. |
| Global I18n Support | US/UK/EU/Commonwealth formats, £ € $ ¥ ₹ stripping, localized decimals/dates/delimiters. |
| Audit Trail System | Every decision logged (before/after state diffs); stateless for reproducibility. |
| Exclude Marker | Boolean exclude column: rule-driven, auditable, supports blacklist/whitelist exception patterns. |
| CR/DR Semantics | Case-insensitive CR/DR (with or without space), accounting negatives (123.45), trailing minus 123.45-. v0.7.7 fixes a latent bug on no-space CR/DR formats. |
| Amount Synthesis | Auto-computes amount = abs(credit) - abs(debit) across 9 edge cases. |
| Strict Parsing | Locale-aware normalization with configurable thresholds (--strict-parse). |
| Flag Integrity | Append-only (flags +=) with deterministic deduplication. |
| Integrity Verification | Built-in --verify and --verify-full: SHA-256 fingerprinting of immutable fields with optional artifact output. See docs/verify.md. |
| ML Reconciliation (v0.7.8) | --reconcile produces a row-by-row mismatch report against an external (typically ML) categorisation, with rule attribution and audit reason. Optional self-contained HTML report via --reconcile-html. See docs/reconciliation.md. |
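
The CR/DR Semantics and Amount Synthesis features listed above describe parsing behaviour that a short Python sketch can make concrete. The names to_number and synthesize_amount are illustrative, and the sketch assumes that DR marks a negative amount and CR a positive one, that commas are thousands separators, and that a blank credit or debit cell counts as zero. FinLang's actual handling of its 9 edge cases may differ.

```python
import re
from decimal import Decimal
from typing import Optional

# Optional CR/DR suffix, case-insensitive, with or without a separating space.
_CRDR = re.compile(r"\s*(CR|DR)$", re.IGNORECASE)


def to_number(text: str) -> Decimal:
    """Parse one amount cell (assumed sign convention: DR negative, CR positive)."""
    s = text.strip()
    negative = False
    m = _CRDR.search(s)
    if m:
        negative = m.group(1).upper() == "DR"
        s = s[: m.start()]
    if s.startswith("(") and s.endswith(")"):    # accounting negative: (123.45)
        negative, s = True, s[1:-1]
    elif s.endswith("-"):                        # trailing minus: 123.45-
        negative, s = True, s[:-1]
    elif s.startswith("-"):                      # ordinary leading minus
        negative, s = True, s[1:]
    value = Decimal(s.replace(",", "").strip())  # assumes "," is a thousands separator
    return -value if negative else value


def synthesize_amount(credit: Optional[str], debit: Optional[str]) -> Decimal:
    """amount = abs(credit) - abs(debit); a missing or blank side counts as zero."""
    def side(cell: Optional[str]) -> Decimal:
        return abs(to_number(cell)) if cell and cell.strip() else Decimal("0")
    return side(credit) - side(debit)


for cell in ("123.45CR", "123.45 dr", "(123.45)", "123.45-"):
    print(cell, "->", to_number(cell))
print(synthesize_amount("50.00", ""), synthesize_amount(None, "(20.00)"))
```

Decimal is used instead of float here so that currency values are not subject to binary rounding; that is a design choice of the sketch, not a statement about FinLang's internals.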

Installation

Requirements: Python 3.10–3.14

From PyPI (Recommended):

pip install finlang

With Fast I/O (PyArrow):

pip install "finlang[fastio]"

(Enables --fastio for accelerated CSV I/O.)

From Source (Development):

git clone https://github.com/FinLang-Ltd/finlang.git
cd finlang
pip install -e ".[fastio]"

Quick Start: The Growth Loop

1. Initial Categorization

finlang --input transactions.csv --output baseline.csv \
  --rules my_rules.fin --include-pack retail,transport

2. Discover Gaps

finlang-discover --input baseline.csv \
  --candidates candidates.csv --all-candidates all_candidates.csv \
  --min-count 5

3. Suggest Rules (Exact Mode Recommended)

finlang-suggest --input candidates.csv --output suggested_rules.fin \
  --rules my_rules.fin --emit-match exact

4. Merge and Re-run

cat my_rules.fin suggested_rules.fin > merged.fin
finlang --input transactions.csv --output improved.csv \
  --rules merged.fin --include-pack retail,transport

✅ Expected Result: 5–10% coverage improvement; zero duplicates in exact mode.
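
The merge step uses cat, which is not available in a plain Windows command prompt. A portable alternative can be sketched in Python; merge_rules is a hypothetical helper, not part of the FinLang CLI, and the file contents in the demo are placeholders.

```python
import tempfile
from pathlib import Path


def merge_rules(sources, dest):
    """Concatenate rule files in order, like `cat a.fin b.fin > dest`."""
    parts = []
    for src in sources:
        text = Path(src).read_text(encoding="utf-8")
        # Guard against a missing final newline fusing lines from two files.
        parts.append(text if text.endswith("\n") else text + "\n")
    Path(dest).write_text("".join(parts), encoding="utf-8")


# Self-contained demo using throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    mine = Path(tmp, "my_rules.fin")
    suggested = Path(tmp, "suggested_rules.fin")
    merged = Path(tmp, "merged.fin")
    mine.write_text("# hand-written rules", encoding="utf-8")
    suggested.write_text("# suggested rules\n", encoding="utf-8")
    merge_rules([mine, suggested], merged)
    merged_text = merged.read_text(encoding="utf-8")

print(merged_text)
```

File order matters: because the last matching rule sets the category, rules from the file listed last take precedence when two rules match the same transaction.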


Performance Benchmarks (v0.7.7)

Measured with --audit-mode none (max throughput) on Intel i7-12700T, 48GB RAM, Windows 11, Python 3.13.7, PyArrow 21.0.

| Dataset | Test | Time (s) | Rows/sec | Notes |
|---|---|---|---|---|
| 100K (UK Synthetic) | Growth Loop | 2.54 | 39,370 | ✅ Baseline (121 rules) |
| 100K (after Growth Loop) | Growth Loop | 4.96 | 20,161 | ✅ +6.3× rules → ≈2× slower (764 rules) |
| 5M × 50 cols | Benchmark Harness | 179.27 | 27,900 | ✅ Enterprise validation, 3-run average |
| 20M × 6 cols | Integrity Test (FastIO) | ~90 | 217,068 | ✅ Engine throughput, full SHA-256 verified |
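
The derived figures in the table above follow from the row counts and times; the short Python check below recomputes them. The helper name rows_per_sec is illustrative, and the inputs are copied from the table.

```python
def rows_per_sec(rows: int, seconds: float) -> int:
    """Throughput rounded to the nearest whole row per second."""
    return round(rows / seconds)


baseline = rows_per_sec(100_000, 2.54)        # 121 rules
after_loop = rows_per_sec(100_000, 4.96)      # 764 rules
enterprise = rows_per_sec(5_000_000, 179.27)  # table rounds this to 27,900

rule_growth = round(764 / 121, 1)   # how many times more rules
slowdown = round(4.96 / 2.54, 2)    # how many times longer the run took

print(baseline, after_loop, enterprise)  # 39370 20161 27891
print(rule_growth, slowdown)             # 6.3 1.95
```

On these numbers, a 6.3× larger ruleset cost roughly 2× the runtime on the 100K dataset.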

v0.7.7 improvement: A hot-path bug fix in _to_number removed an unnecessary \b word boundary that both produced wrong results on no-space CR/DR formats and cost measurable runtime. The fix raised throughput on the integrity harness by 30–50% over v0.7.6, taking standard mode to ~180K rows/sec and FastIO to ~217K rows/sec.
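
Why a \b word boundary misbehaves on no-space CR/DR formats can be shown with Python's re module. The two patterns below are hypothetical stand-ins, not the actual expression used in _to_number: \b only matches between a word character and a non-word character, and in "123.45CR" both the digit 5 and the letter C are word characters.

```python
import re

# Hypothetical patterns for illustration only.
with_boundary = re.compile(r"\b(CR|DR)$", re.IGNORECASE)
without_boundary = re.compile(r"(CR|DR)$", re.IGNORECASE)

results = {}
for cell in ("123.45 CR", "123.45CR"):
    results[cell] = (
        bool(with_boundary.search(cell)),
        bool(without_boundary.search(cell)),
    )
    print(cell, results[cell])

# "123.45 CR" -> (True, True):  the space before C provides a word boundary
# "123.45CR"  -> (False, True): no boundary between 5 and C, so \b fails
```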

Cumulative v0.6.4 → v0.7.7: -14% runtime, +16% throughput on the enterprise harness (5M × 50).

Audit Overhead: Enabling --audit-mode lite/full reduces throughput by ≈38% due to diff calculation, in exchange for full decision provenance.

Note: These figures are validated benchmark results from controlled tests. Actual performance varies depending on dataset, ruleset, and audit mode.
See docs/benchmarks.md for full details.


Cryptographic Integrity Verification (v0.7.7)

SHA-256 fingerprint verification benchmarked on large datasets:

| Rows | Engine (Standard) | Engine (FastIO) | Result |
|---|---|---|---|
| 5M | 178,903 rows/s | 198,448 rows/s | ✅ All fingerprints match |
| 10M | 178,511 rows/s | 214,136 rows/s | ✅ All fingerprints match |
| 20M | 181,566 rows/s | 217,068 rows/s | ✅ All fingerprints match |

What this benchmark validated: Every row's immutable fields (date, amount, counterparty) were verified via SHA-256 hash before and after engine processing. Zero cross-row contamination detected. Zero data corruption detected. 60M rows verified field-by-field across three runs, zero mismatches.

Note: As of v0.7.7, SHA-256 integrity verification is available as a CLI feature via --verify (fast fingerprint) and --verify-full (fingerprint + field comparison). Use --verify-output-dir to save audit artifacts (JSON report + proof CSV). See docs/cli_reference.md for details.
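
The idea of fingerprinting a row's immutable fields before and after processing can be sketched with Python's hashlib. This is an illustration of the technique only: the function names, the field separator, and the string encoding of each field are assumptions, and FinLang's --verify may compute its fingerprints differently.

```python
import hashlib

IMMUTABLE_FIELDS = ("date", "amount", "counterparty")


def row_fingerprint(row: dict) -> str:
    """SHA-256 over the immutable fields of one row."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    payload = "\x1f".join(str(row[name]) for name in IMMUTABLE_FIELDS)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_rows(before: list, after: list) -> list:
    """Return indexes of rows whose immutable fields differ."""
    if len(before) != len(after):
        raise ValueError("row counts differ")
    return [
        i for i, (b, a) in enumerate(zip(before, after))
        if row_fingerprint(b) != row_fingerprint(a)
    ]


before = [{"date": "2024-01-05", "amount": "-42.10",
           "counterparty": "TESCO STORES", "category": ""}]
after = [{**before[0], "category": "Groceries"}]   # only a mutable field changed
tampered = [{**before[0], "amount": "-4.21"}]      # an immutable field changed

print(changed_rows(before, after))     # []
print(changed_rows(before, tampered))  # [0]
```

Because the hash is computed per row and paired by position, a value that moved between rows would also surface as a mismatch.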


Internationalization Matrix

| Region | Example Number | Date Order | CLI Flags |
|---|---|---|---|
| 🇺🇸 US / 🇨🇦 Canada | 1,234.56 | MM/DD | (defaults) |
| 🇬🇧 UK / 🇦🇺 Commonwealth | 1,234.56 | DD/MM | --dayfirst |
| 🇪🇺 Continental Europe | 1.234,56 | DD/MM | --decimal "," --thousands "." --dayfirst |
| 🇨🇭 Switzerland | 1'234.56 | DD/MM | --thousands "'" --dayfirst |

Auto-Detection and Normalization: BOM-safe UTF-8 decoding; comma, semicolon, pipe, and tab delimiters; and automatic currency symbol stripping.
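
What the matrix above means for a single cell can be sketched in Python. The functions normalize_number and parse_date are hypothetical; their parameters are named after the --decimal, --thousands, and --dayfirst flags only for readability, and the fixed four-digit-year date format is an assumption of the sketch.

```python
from datetime import date, datetime
from decimal import Decimal

CURRENCY_SYMBOLS = "£€$¥₹"


def normalize_number(text: str, decimal: str = ".", thousands: str = ",") -> Decimal:
    """Strip currency symbols, drop the thousands separator, unify the decimal mark."""
    s = text.strip().strip(CURRENCY_SYMBOLS).strip()
    s = s.replace(thousands, "")
    if decimal != ".":
        s = s.replace(decimal, ".")
    return Decimal(s)


def parse_date(text: str, dayfirst: bool = False) -> date:
    """Interpret NN/NN/YYYY as DD/MM or MM/DD depending on dayfirst."""
    fmt = "%d/%m/%Y" if dayfirst else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()


print(normalize_number("$1,234.56"))                               # US defaults
print(normalize_number("€1.234,56", decimal=",", thousands="."))   # Continental Europe
print(normalize_number("1'234.56", thousands="'"))                 # Switzerland
print(parse_date("03/04/2024"), parse_date("03/04/2024", dayfirst=True))
```

All three number formats normalize to the same value, 1234.56, while the same date string resolves to 4 March or 3 April depending on dayfirst.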


The Growth Loop Explained

Discover → Suggest → Categorize → Repeat

FinLang's Growth Loop accelerates rule creation through data-driven discovery.

  • Discover uncategorized counterparties
  • Suggest new rules in seconds (1:1 mapping in exact mode)
  • Merge + Re-run for incremental coverage gains
  • Validated Result: 97.8% success on addressable patterns
  • ROI: 8.8 transactions categorized per new rule

See: docs/growth_loop_best_practices.md
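
The Discover step can be pictured as a frequency count over counterparties that no rule has categorized yet, filtered by a minimum count like the --min-count option. The sketch below is a simplified illustration with hypothetical names; it assumes an empty category value means uncategorized and is not the finlang-discover implementation.

```python
from collections import Counter


def discover_candidates(rows, min_count=5):
    """Count uncategorized counterparties and keep the frequent ones."""
    counts = Counter(
        row["counterparty"] for row in rows if not row.get("category")
    )
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


rows = (
    [{"counterparty": "COSTA COFFEE", "category": ""}] * 6
    + [{"counterparty": "LOCAL GYM", "category": ""}] * 2
    + [{"counterparty": "TESCO STORES", "category": "Groceries"}] * 9
)

candidates = discover_candidates(rows, min_count=5)
print(candidates)  # [('COSTA COFFEE', 6)]
```

Only COSTA COFFEE clears the threshold here: LOCAL GYM is too rare, and TESCO STORES is already categorized.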


Known Limitations (v0.7.x)

  • ⚠️ --emit-match fuzzy (default) filters corporate stopwords (LTD, LLC, PLC, INC, GROUP, COMPANY, CO, SAS, GMBH, CORP) and deduplicates patterns within a batch (v0.7.7). Edge cases with very short counterparty names may still produce broad patterns. → Use --emit-match exact for production workflows.
  • ⚠️ Hyphenated/apostrophe names may affect fuzzy matching (< 1% impact).
  • ⚠️ No support for non-Gregorian calendars or non-Western numerals.
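
The first limitation above is easier to see with a toy version of stopword filtering. The sketch below uses the stopword list from that bullet, but the pattern shape (wrapping the remaining words in * wildcards) and the function names are assumptions made for illustration; the patterns finlang-suggest actually emits in fuzzy mode may look different.

```python
from fnmatch import fnmatchcase
from typing import Optional

STOPWORDS = {"LTD", "LLC", "PLC", "INC", "GROUP", "COMPANY", "CO", "SAS", "GMBH", "CORP"}


def fuzzy_pattern(counterparty: str) -> Optional[str]:
    """Drop corporate stopwords and wrap what remains in wildcards."""
    tokens = [t.strip(".,") for t in counterparty.upper().split()]
    kept = [t for t in tokens if t and t not in STOPWORDS]
    return "*" + " ".join(kept) + "*" if kept else None


def suggest_patterns(names):
    """One pattern per name, deduplicated within the batch, order preserved."""
    patterns = (fuzzy_pattern(n) for n in names)
    return list(dict.fromkeys(p for p in patterns if p))


batch = ["ACME TRADING LTD", "Acme Trading Inc.", "AB LTD"]
patterns = suggest_patterns(batch)
print(patterns)                                  # ['*ACME TRADING*', '*AB*']
print(fnmatchcase("CABLE & WIRELESS", "*AB*"))   # True
```

The short name AB LTD collapses to the pattern *AB*, which also matches unrelated counterparties such as CABLE & WIRELESS. An exact match on the full counterparty string avoids this.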

Documentation

Command-line help:

finlang --help
finlang-discover --help
finlang-suggest --help

Example CLI Usage

finlang --input bank.csv --output categorized.csv \
  --rules examples/rules.demo.fin \
  --include-pack retail,transport,subs \
  --fastio --audit audit_log.json --audit-mode lite

License & Commercial Use

FinLang is open source under the GNU Affero General Public License (AGPL-3.0).
Commercial licenses and enterprise support are available via FinLang Ltd.

Email: info@finlang.io
Web: https://finlang.io


Contributing

Contributions are welcome! Before submitting a PR, please review and accept our Contributor Licence Agreement (CLA).


Version Summary

| Component | Version | Status |
|---|---|---|
| Core Engine | v0.7.8 | ✅ Production-Ready (byte-identical to v0.7.7) |
| CLI Suite | v0.7.8 | ✅ Validated (137 tests, 10 gates) |
| Discover/Suggest | v0.7.8 | ✅ 97.8% accuracy |
| Integrity Test | v0.7.8 | ✅ 20M rows verified, ~217K rows/sec FastIO |
| Verify | v0.7.8 | ✅ Built-in --verify / --verify-full |
| Reconcile | v0.7.8 | ✅ Built-in --reconcile / --reconcile-html (new) |
| Docs | v0.7.8 | ✅ Complete |
| Python Support | 3.10–3.14 | ✅ Tested |
