Extract structured financial entities from Indian banking messages
language:
- en
license: mit
library_name: transformers
tags:
- finance
- entity-extraction
- ner
- phi-3
- production
- indian-banking
base_model: microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation
Finance Entity Extractor (FinEE) v1.0
Extract structured financial data from Indian banking messages.
94.5% field accuracy. <1 ms latency on the regex path. Zero setup.
⚡ Install & Run in 10 Seconds
pip install finee
from finee import extract
r = extract("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025")
print(r.amount) # 2500.0
print(r.merchant) # "Swiggy"
print(r.category) # "food"
No model download. No API keys. Works offline.
Output Schema Contract
Every extraction returns this guaranteed JSON structure:
{
"amount": 2500.0, // float - Always numeric
"currency": "INR", // string - ISO 4217
"type": "debit", // "debit" | "credit"
"account": "3545", // string - Last 4 digits
"date": "28-12-2025", // string - DD-MM-YYYY
"reference": "534567891234", // string - UPI/NEFT ref
"merchant": "Swiggy", // string - Normalized name
"category": "food", // string - food|shopping|transport|...
"vpa": "swiggy@ybl", // string - Raw VPA
"confidence": 0.95, // float - 0.0 to 1.0
"confidence_level": "HIGH" // "LOW" | "MEDIUM" | "HIGH"
}
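A consumer that relies on this contract may want to check it at the boundary. The following is a minimal sketch, not part of FinEE: `validate_record` and `EXPECTED_TYPES` are hypothetical names, and the check assumes every field is always present and non-null, which the schema above suggests but does not state outright.

```python
from datetime import datetime

# Field name -> expected Python type, transcribed from the schema above.
# Hypothetical helper table; FinEE itself may validate differently.
EXPECTED_TYPES = {
    "amount": float, "currency": str, "type": str, "account": str,
    "date": str, "reference": str, "merchant": str, "category": str,
    "vpa": str, "confidence": float, "confidence_level": str,
}

def validate_record(record: dict) -> list:
    """Return a list of human-readable contract violations (empty if none)."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    if record.get("type") not in ("debit", "credit"):
        problems.append("type: must be 'debit' or 'credit'")
    if record.get("confidence_level") not in ("LOW", "MEDIUM", "HIGH"):
        problems.append("confidence_level: must be LOW, MEDIUM or HIGH")
    conf = record.get("confidence")
    if isinstance(conf, float) and not 0.0 <= conf <= 1.0:
        problems.append("confidence: must be between 0.0 and 1.0")
    date = record.get("date")
    if isinstance(date, str):
        try:
            datetime.strptime(date, "%d-%m-%Y")  # DD-MM-YYYY per the schema
        except ValueError:
            problems.append("date: not DD-MM-YYYY")
    return problems

sample = {
    "amount": 2500.0, "currency": "INR", "type": "debit", "account": "3545",
    "date": "28-12-2025", "reference": "534567891234", "merchant": "Swiggy",
    "category": "food", "vpa": "swiggy@ybl", "confidence": 0.95,
    "confidence_level": "HIGH",
}
print(validate_record(sample))
```

Returning a list of problems rather than raising on the first one makes it easier to log every deviation when testing against your own messages.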
Verify Accuracy Yourself
Don't trust "99% accuracy" claims. Run the benchmark:
# Clone and test
git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
cd Finance-Entity-Extractor
pip install finee
# Run benchmark
python benchmark.py --all
Test on YOUR data:
python benchmark.py --file your_transactions.jsonl
Torture Test (Edge Cases)
Real bank SMS is messy. Here's how FinEE handles the chaos:
| Edge Case | Input | Result |
|---|---|---|
| Missing spaces | Rs.500.00debited from A/c1234 | ✅ amount=500.0 |
| Weird formatting | Rs 2,500/-debited dt:28/12/25 | ✅ amount=2500.0 |
| Mixed case | RS. 1500 DEBITED from ACCT | ✅ amount=1500.0, type=debit |
| Unicode symbols | ₹2,500 debited from •••• 3545 | ✅ amount=2500.0 |
| Multiple amounts | Rs.500 debited. Bal: Rs.15,000 | ✅ amount=500.0 (first) |
| Truncated SMS | Rs.2500 debited from A/c...3545 to swi... | ✅ amount=2500.0 |
| Extra noise | ALERT! Dear Customer, Rs.500 debited... Ignore if done by you. | ✅ amount=500.0 |
Run torture tests:
python benchmark.py --torture
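To make the edge cases above concrete, here is a minimal sketch of how a single regex might cope with them. It is not one of FinEE's actual patterns: `AMOUNT_RE`, `first_amount`, and `txn_type` are hypothetical names, and the real engine is described as using 50+ patterns rather than one.

```python
import re
from typing import Optional

# Currency marker (Rs, Rs., INR, or the rupee sign), optional whitespace,
# then digits with optional thousands separators and up to two decimals.
# Hypothetical pattern for illustration only.
AMOUNT_RE = re.compile(
    r"(?:\b(?:rs\.?|inr)|₹)\s*(\d[\d,]*(?:\.\d{1,2})?)",
    re.IGNORECASE,
)

def first_amount(text: str) -> Optional[float]:
    """Return the first currency amount in the message, or None."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))

def txn_type(text: str) -> Optional[str]:
    """Classify the message by keyword; debit is checked first."""
    lowered = text.lower()
    if "debit" in lowered:
        return "debit"
    if "credit" in lowered:
        return "credit"
    return None

print(first_amount("Rs.500.00debited from A/c1234"))
print(first_amount("Rs.500 debited. Bal: Rs.15,000"))
print(txn_type("RS. 1500 DEBITED from ACCT"))
```

Using `search` rather than `findall` is what gives the "first amount wins" behaviour in the multiple-amounts row, so a trailing balance figure is never picked up.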
Supported Banks
| Bank | Debit | Credit | UPI | NEFT/IMPS |
|---|---|---|---|---|
| HDFC | ✅ | ✅ | ✅ | ✅ |
| ICICI | ✅ | ✅ | ✅ | ✅ |
| SBI | ✅ | ✅ | ✅ | ✅ |
| Axis | ✅ | ✅ | ✅ | ✅ |
| Kotak | ✅ | ✅ | ✅ | ✅ |
Architecture
Input Text
     │
     ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 0: Hash Cache (<1ms if seen before)                   │
└─────────────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 1: Regex Engine (50+ battle-tested patterns)          │
│  Extract: amount, date, reference, account, vpa, type       │
└─────────────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 2: Rule-Based Mapping (200+ VPA → merchant)           │
│  Map: vpa → merchant, merchant → category                   │
└─────────────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 3: LLM (Optional, for edge cases)                     │
│  Targeted prompts for: merchant, category only              │
└─────────────────────────────────────────────────────────────┘
     │
     ▼
ExtractionResult (Guaranteed Schema)
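The tiering above can be illustrated with a toy pipeline. This is a sketch under stated assumptions rather than FinEE's implementation: `extract_sketch`, the two regexes, and the one-entry mapping tables are hypothetical stand-ins, and the optional LLM tier is only marked by a comment.

```python
import hashlib
import re

# Hypothetical, deliberately tiny stand-ins for the real pattern and
# mapping tables (described as 50+ patterns and 200+ VPA mappings).
AMOUNT_RE = re.compile(r"(?:\b(?:rs\.?|inr)|₹)\s*(\d[\d,]*(?:\.\d{1,2})?)", re.IGNORECASE)
VPA_RE = re.compile(r"\b[\w.\-]+@[a-z]+\b", re.IGNORECASE)
VPA_TO_MERCHANT = {"swiggy": "Swiggy"}
MERCHANT_TO_CATEGORY = {"Swiggy": "food"}

_cache = {}  # Tier 0: message hash -> previously computed result

def extract_sketch(text: str) -> dict:
    # Tier 0: return the stored result if this exact message was seen before.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]

    result = {"amount": None, "vpa": None, "merchant": None, "category": None}

    # Tier 1: regex extraction of the literal fields.
    amount = AMOUNT_RE.search(text)
    if amount:
        result["amount"] = float(amount.group(1).replace(",", ""))
    vpa = VPA_RE.search(text)
    if vpa:
        result["vpa"] = vpa.group(0)
        # Tier 2: rule-based mapping, vpa -> merchant -> category.
        handle = vpa.group(0).split("@")[0].lower()
        result["merchant"] = VPA_TO_MERCHANT.get(handle)
        result["category"] = MERCHANT_TO_CATEGORY.get(result["merchant"])

    # Tier 3 (optional LLM) would only be asked for merchant/category
    # when they are still None; it is omitted from this sketch.

    _cache[key] = result
    return result

msg = "Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025"
print(extract_sketch(msg))
```

Ordering the tiers from cheapest to most expensive means the common case never reaches the LLM, which is consistent with the sub-millisecond figure quoted for the regex path.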
Benchmark Results
| Metric | Value |
|---|---|
| Field Accuracy | 94.5% |
| Latency (Regex) | <1ms |
| Latency (LLM) | ~50ms |
| Throughput | 50,000+ msg/sec |
| Banks Tested | 5 (HDFC, ICICI, SBI, Axis, Kotak) |
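The table does not spell out how "field accuracy" is defined. One common reading is the share of expected fields, across all test messages, whose extracted value matches exactly. The sketch below computes that reading; `field_accuracy` is a hypothetical helper and may not match what `benchmark.py` actually measures.

```python
def field_accuracy(predictions: list, expected: list) -> float:
    """Fraction of expected (message, field) pairs that were extracted exactly."""
    correct = 0
    total = 0
    for pred, gold in zip(predictions, expected):
        for field, gold_value in gold.items():
            total += 1
            if pred.get(field) == gold_value:
                correct += 1
    return correct / total if total else 0.0

# Two messages with two labelled fields each; one field is wrong.
gold = [
    {"amount": 500.0, "type": "debit"},
    {"amount": 1500.0, "type": "credit"},
]
pred = [
    {"amount": 500.0, "type": "debit"},
    {"amount": 1500.0, "type": "debit"},
]
print(field_accuracy(pred, gold))  # 3 of 4 fields match -> 0.75
```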
CLI Usage
# Extract from text
finee extract "Rs.500 debited from A/c 1234"
# Show version
finee --version
# Check available backends
finee backends
Repository Structure
Finance-Entity-Extractor/
├── src/finee/              # Core package (16 modules)
│   ├── extractor.py        # Pipeline orchestrator
│   ├── regex_engine.py     # 50+ regex patterns
│   ├── merchants.py        # 200+ VPA mappings
│   └── backends/           # MLX, PyTorch, GGUF
├── tests/                  # 88 unit tests
├── examples/               # Colab notebook
├── experiments/            # Research notebooks
├── benchmark.py            # ⭐ Verify accuracy yourself
├── pyproject.toml
└── README.md
Contributing
git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
cd Finance-Entity-Extractor
pip install -e ".[dev]"
pytest tests/
License
MIT License - see LICENSE
Made with ❤️ by Ranjit Behera
PyPI · GitHub · Hugging Face