
Extract structured financial entities from Indian banking messages

language: en
license: mit
library_name: transformers
tags: finance, entity-extraction, ner, phi-3, production, indian-banking
base_model: microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation

Finance Entity Extractor (FinEE) v1.0

Extract structured financial data from Indian banking messages.
94.5% field accuracy. <1ms latency on the regex path. Zero setup.


⚡ Install & Run in 10 Seconds

pip install finee

from finee import extract

r = extract("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025")

print(r.amount)    # 2500.0
print(r.merchant)  # "Swiggy"
print(r.category)  # "food"

No model download. No API keys. Works offline.


📋 Output Schema Contract

Every extraction returns this guaranteed JSON structure:

{
  "amount": 2500.0,           // float - Always numeric
  "currency": "INR",          // string - ISO 4217
  "type": "debit",            // "debit" | "credit"
  "account": "3545",          // string - Last 4 digits
  "date": "28-12-2025",       // string - DD-MM-YYYY
  "reference": "534567891234",// string - UPI/NEFT ref
  "merchant": "Swiggy",       // string - Normalized name
  "category": "food",         // string - food|shopping|transport|...
  "vpa": "swiggy@ybl",        // string - Raw VPA
  "confidence": 0.95,         // float - 0.0 to 1.0
  "confidence_level": "HIGH"  // "LOW" | "MEDIUM" | "HIGH"
}
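
A consumer of this contract may want to check a result before storing it. The following is a minimal sketch, not part of the finee package: `validate_result` and `EXPECTED_TYPES` are hypothetical names introduced here, and the sketch assumes every field is present and non-null, which the schema above implies but does not spell out for messages where a field cannot be found.

```python
from datetime import datetime

# Field names and types copied from the schema above.
# EXPECTED_TYPES and validate_result are hypothetical helpers, not finee APIs.
EXPECTED_TYPES = {
    "amount": float,
    "currency": str,
    "type": str,
    "account": str,
    "date": str,
    "reference": str,
    "merchant": str,
    "category": str,
    "vpa": str,
    "confidence": float,
    "confidence_level": str,
}


def validate_result(result: dict) -> list:
    """Return a list of problems; an empty list means the dict conforms."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in result:
            problems.append(f"missing field: {field}")
        elif not isinstance(result[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    if problems:
        # Value checks below assume the fields exist with the right types.
        return problems

    if result["type"] not in {"debit", "credit"}:
        problems.append("type must be 'debit' or 'credit'")
    if result["confidence_level"] not in {"LOW", "MEDIUM", "HIGH"}:
        problems.append("confidence_level must be LOW, MEDIUM or HIGH")
    if not 0.0 <= result["confidence"] <= 1.0:
        problems.append("confidence must be between 0.0 and 1.0")
    try:
        datetime.strptime(result["date"], "%d-%m-%Y")  # DD-MM-YYYY
    except ValueError:
        problems.append("date must be DD-MM-YYYY")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that is wrong with a single message.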

🔬 Verify Accuracy Yourself

Don't trust "99% accuracy" claims. Run the benchmark:

# Clone and test
git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
cd Finance-Entity-Extractor
pip install finee

# Run benchmark
python benchmark.py --all

Test on YOUR data:

python benchmark.py --file your_transactions.jsonl

💀 Torture Test (Edge Cases)

Real bank SMS is messy. Here's how FinEE handles the chaos:

| Edge Case | Input | Result |
|---|---|---|
| Missing spaces | Rs.500.00debited from A/c1234 | ✅ amount=500.0 |
| Weird formatting | Rs 2,500/-debited dt:28/12/25 | ✅ amount=2500.0 |
| Mixed case | RS. 1500 DEBITED from ACCT | ✅ amount=1500.0, type=debit |
| Unicode symbols | ₹2,500 debited from •••• 3545 | ✅ amount=2500.0 |
| Multiple amounts | Rs.500 debited. Bal: Rs.15,000 | ✅ amount=500.0 (first) |
| Truncated SMS | Rs.2500 debited from A/c...3545 to swi... | ✅ amount=2500.0 |
| Extra noise | ALERT! Dear Customer, Rs.500 debited... Ignore if done by you. | ✅ amount=500.0 |
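
To see why a regex can cope with inputs like these, here is a minimal sketch of amount and type extraction. It is an illustration only and not FinEE's actual patterns: `AMOUNT_RE`, `TYPE_RE`, `first_amount` and `transaction_type` are hypothetical names, and the pattern covers just the seven edge cases listed above.

```python
import re
from typing import Optional

# Currency marker (Rs, Rs., INR or the rupee sign), optional whitespace,
# then digits with optional thousands separators and up to two decimals.
# The lookbehind stops "rs" inside a longer word (e.g. "offers 50") from matching.
AMOUNT_RE = re.compile(
    r"(?:(?<![A-Za-z])(?:rs\.?|inr)|₹)\s*([0-9][0-9,]*(?:\.[0-9]{1,2})?)",
    re.IGNORECASE,
)
TYPE_RE = re.compile(r"(debit|credit)ed", re.IGNORECASE)


def first_amount(text: str) -> Optional[float]:
    """Return the first currency amount in the text, or None if there is none."""
    match = AMOUNT_RE.search(text)  # search() returns the leftmost match
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def transaction_type(text: str) -> Optional[str]:
    """Return 'debit' or 'credit' based on the first matching keyword."""
    match = TYPE_RE.search(text)
    return match.group(1).lower() if match else None
```

Because `search()` returns the leftmost match, a trailing balance such as `Bal: Rs.15,000` is ignored whenever the transaction amount appears first in the message. A message that leads with the balance would need an extra rule.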

Run torture tests:

python benchmark.py --torture

🏦 Supported Banks

| Bank | Debit | Credit | UPI | NEFT/IMPS |
|---|---|---|---|---|
| HDFC | ✅ | ✅ | ✅ | ✅ |
| ICICI | ✅ | ✅ | ✅ | ✅ |
| SBI | ✅ | ✅ | ✅ | ✅ |
| Axis | ✅ | ✅ | ✅ | ✅ |
| Kotak | ✅ | ✅ | ✅ | ✅ |

🏗️ Architecture

Input Text
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 0: Hash Cache (<1ms if seen before)                    │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 1: Regex Engine (50+ battle-tested patterns)           │
│ Extract: amount, date, reference, account, vpa, type        │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 2: Rule-Based Mapping (200+ VPA → merchant)            │
│ Map: vpa → merchant, merchant → category                    │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 3: LLM (Optional, for edge cases)                      │
│ Targeted prompts for: merchant, category only               │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
ExtractionResult (Guaranteed Schema)
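
The tiered flow above can be sketched in a few lines of Python. This is an illustration under assumptions, not the code in `extractor.py`: `extract_tiered`, the two lookup tables, the regexes and the `llm` callback are all hypothetical stand-ins, and only a few of the schema fields are filled in.

```python
import hashlib
import re
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for the real mapping tables (the README mentions 200+ entries).
VPA_TO_MERCHANT = {"swiggy": "Swiggy"}
MERCHANT_TO_CATEGORY = {"Swiggy": "food"}

AMOUNT_RE = re.compile(r"(?:rs\.?|inr|₹)\s*([0-9][0-9,]*(?:\.[0-9]{1,2})?)", re.IGNORECASE)
VPA_RE = re.compile(r"[\w.\-]+@[A-Za-z]+")

_cache: Dict[str, dict] = {}


def extract_tiered(text: str, llm: Optional[Callable[[str], dict]] = None) -> dict:
    # Tier 0: a hash of the text is the cache key, so repeats skip all work.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]

    # Tier 1: regexes for fields with a predictable surface form.
    result = {"amount": None, "vpa": None, "merchant": None, "category": None}
    amount = AMOUNT_RE.search(text)
    if amount:
        result["amount"] = float(amount.group(1).replace(",", ""))
    vpa = VPA_RE.search(text)
    if vpa:
        result["vpa"] = vpa.group(0)

    # Tier 2: table lookups, vpa -> merchant and merchant -> category.
    if result["vpa"]:
        handle = result["vpa"].split("@")[0].lower()
        result["merchant"] = VPA_TO_MERCHANT.get(handle)
        result["category"] = MERCHANT_TO_CATEGORY.get(result["merchant"])

    # Tier 3: optional LLM, consulted only when merchant is still unknown.
    if llm is not None and result["merchant"] is None:
        guess = llm(text)
        result["merchant"] = guess.get("merchant")
        result["category"] = guess.get("category")

    _cache[key] = result
    return result
```

Ordering the tiers from cheapest to most expensive means the LLM is reached only for messages the regexes and lookup tables cannot resolve, which is consistent with the sub-millisecond figure quoted for the regex path.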

📊 Benchmark Results

| Metric | Value |
|---|---|
| Field Accuracy | 94.5% |
| Latency (Regex) | <1ms |
| Latency (LLM) | ~50ms |
| Throughput | 50,000+ msg/sec |
| Banks Tested | 5 (HDFC, ICICI, SBI, Axis, Kotak) |
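
The README does not define "field accuracy". One plausible reading, assumed here, is the share of labeled fields whose predicted value exactly equals the expected value, pooled over all messages. The sketch below computes that; `field_accuracy` is a hypothetical helper and may not match what `benchmark.py` measures.

```python
from typing import Dict, List


def field_accuracy(predictions: List[Dict], expected: List[Dict]) -> float:
    """Fraction of labeled fields whose predicted value equals the expected one.

    Only fields present in the expected dict are scored, so partially
    labeled messages are handled naturally.
    """
    correct = 0
    total = 0
    for pred, gold in zip(predictions, expected):
        for field, value in gold.items():
            total += 1
            if pred.get(field) == value:
                correct += 1
    return correct / total if total else 0.0
```

For example, two messages with two labeled fields each and a single wrong `type` give 3 correct out of 4 fields, i.e. 0.75.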

💻 CLI Usage

# Extract from text
finee extract "Rs.500 debited from A/c 1234"

# Show version
finee --version

# Check available backends
finee backends

📁 Repository Structure

Finance-Entity-Extractor/
├── src/finee/              # Core package (16 modules)
│   ├── extractor.py        # Pipeline orchestrator
│   ├── regex_engine.py     # 50+ regex patterns
│   ├── merchants.py        # 200+ VPA mappings
│   └── backends/           # MLX, PyTorch, GGUF
├── tests/                  # 88 unit tests
├── examples/               # Colab notebook
├── experiments/            # Research notebooks
├── benchmark.py            # ⭐ Verify accuracy yourself
├── pyproject.toml
└── README.md

🤝 Contributing

git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
cd Finance-Entity-Extractor
pip install -e ".[dev]"
pytest tests/

📄 License

MIT License - see LICENSE


Made with ❤️ by Ranjit Behera

PyPI · GitHub · Hugging Face
