Extract structured financial entities from Indian banking messages
language:
- en
license: mit
library_name: transformers
tags:
- finance
- entity-extraction
- ner
- phi-3
- production
- gguf
- indian-banking
- structured-output
base_model: microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation
Finance Entity Extractor (FinEE) v1.0
A production-ready 3.8B-parameter language model optimized for zero-shot financial entity extraction.
Validated on message formats from five Indian banks (HDFC, ICICI, SBI, Axis, Kotak) with 94.5% overall field accuracy.
Performance Benchmarks
Comparison with Foundation Models
| Model | Parameters | Entity Precision (India) | Latency (CPU) | Cost |
|---|---|---|---|---|
| FinEE-3.8B (Ours) | 3.8B | 94.5% | 45ms | Free |
| Llama-3-8B-Instruct | 8B | 89.4% | 120ms | Free |
| GPT-3.5-Turbo | Not disclosed | 94.1% | ~500ms | $0.002/1K tokens |
| GPT-4 | Not disclosed | 96.8% | ~800ms | $0.03/1K tokens |
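The README does not say how the latency column was measured. One plausible way to obtain a comparable per-message figure is to time a callable over a set of distinct messages and report the median. The sketch below assumes exactly that; `median_latency_ms` and its parameters are hypothetical names, not part of finee.

```python
import statistics
import time
from typing import Callable, Sequence


def median_latency_ms(
    fn: Callable[[str], object],
    messages: Sequence[str],
    warmup_messages: Sequence[str] = (),
) -> float:
    """Median wall-clock time per call to fn, in milliseconds."""
    if not messages:
        raise ValueError("need at least one message to time")

    # Untimed warm-up calls absorb one-off costs such as lazy model loading.
    for msg in warmup_messages:
        fn(msg)

    samples = []
    for msg in messages:
        start = time.perf_counter()
        fn(msg)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

If the function under test caches results, the timed messages should differ from each other and from the warm-up messages; otherwise the figure measures cache lookups rather than extraction.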
Platform Support
| Platform | Framework | Status |
|---|---|---|
| macOS Apple Silicon | MLX | ✅ Full Support |
| Linux + NVIDIA GPU | PyTorch/Transformers | ✅ Full Support |
| Linux + CPU | PyTorch/GGUF | ✅ Full Support |
| Windows | GGUF/llama.cpp | ✅ Full Support |
🐍 Quick Start with FinEE Library
The easiest way to use the model is through the finee Python library, which handles backend selection, caching, and validation automatically.
Installation
    # Install from GitHub
    pip install git+https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git

    # Or clone and install locally, choosing ONE extra that matches your hardware
    git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
    cd Finance-Entity-Extractor
    pip install -e ".[metal]"  # Apple Silicon
    pip install -e ".[cuda]"   # NVIDIA GPU
    pip install -e ".[cpu]"    # CPU only
Usage
    from finee import extract

    # Automatic backend detection (MLX, CUDA, or CPU)
    text = "Rs.500 paid to swiggy@ybl on 01-01-2025"
    result = extract(text)

    print(f"Amount: {result.amount}")
    print(f"Merchant: {result.merchant} ({result.category})")
    print(f"Confidence: {result.confidence.value}")

    # Output JSON
    print(result.to_json())
    # {
    #   "amount": 500.0,
    #   "type": "debit",
    #   "merchant": "Swiggy",
    #   "category": "food",
    #   "date": "01-01-2025",
    #   ...
    # }
Command Line Interface
    # Direct extraction
    finee extract "Rs.500 debited from A/c 1234"

    # Check available backends
    finee backends
📋 Overview
This project demonstrates how to:
- Parse 40K+ emails from a Gmail MBOX export
- Classify emails into categories using Phi-3 Mini
- Discover patterns in financial emails (transactions, amounts, dates)
- Fine-tune a local LLM using LoRA for entity extraction
- Extract structured data: amount, transaction type, account, date, reference
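The first step in the list, reading a Gmail MBOX export, can be done with Python's standard `mailbox` module. The sketch below is one way to do it and is not taken from the project's code; `iter_mbox` and `_plain_text` are hypothetical helper names.

```python
import mailbox
from email.message import Message
from typing import Iterator, Tuple


def _plain_text(msg: Message) -> str:
    """Return the first text/plain part of a message, decoded to str."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None for containers
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label in the header: fall back to UTF-8.
            return payload.decode("utf-8", errors="replace")
    return ""


def iter_mbox(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (subject, body) for every message in an MBOX file."""
    box = mailbox.mbox(path)
    try:
        for msg in box:
            yield str(msg.get("Subject", "")), _plain_text(msg)
    finally:
        box.close()
```

Iterating lazily keeps memory use flat for large exports, since only one message is held at a time.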
🏗️ Project Structure
    Finance-Entity-Extractor/
    ├── src/
    │   └── finee/                  # FinEE Package
    │       ├── __init__.py
    │       ├── extractor.py        # Main pipeline orchestrator
    │       ├── cache.py            # Tier 0 LRU Cache
    │       ├── regex_engine.py     # Tier 1 Regex Engine
    │       ├── merchants.py        # Tier 2 Rule Mapping
    │       ├── prompt.py           # Tier 3 Targeted Prompts
    │       ├── validator.py        # Tier 4 Validation & Repair
    │       ├── backends/           # Auto-detecting Backends (MLX, PT, GGUF)
    │       └── cli.py              # Command Line Interface
    ├── tests/                      # 88 Unit Tests
    ├── .github/workflows/          # CI/CD
    ├── pyproject.toml
    ├── train.py                    # Training pipeline
    └── README.md
🎯 Extracted Entities
| Entity | Description | Example |
|---|---|---|
| amount | Transaction amount | "2500.00" |
| type | Debit or Credit | "debit" |
| account | Account identifier | "3545" |
| date | Transaction date | "28-12-25" |
| reference | UPI/NEFT reference | "534567891234" |
| merchant | Merchant name | "swiggy" |
| category | Transaction category | "food" |
| confidence | Extraction confidence | "HIGH" |
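For readers who want to consume these fields in typed code, the table maps naturally onto a small record type. The sketch below is an illustration only: `Transaction` and `Confidence` are hypothetical names rather than finee's actual classes, the `LOW` and `MEDIUM` levels are assumed (only `HIGH` appears above), and the amount is modelled as a float to match the JSON output shown in the usage example.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    # Only "HIGH" is documented; the other two levels are assumptions.
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


@dataclass
class Transaction:
    """One extracted transaction; any field the extractor missed stays None."""

    amount: Optional[float] = None
    type: Optional[str] = None  # "debit" or "credit"
    account: Optional[str] = None
    date: Optional[str] = None
    reference: Optional[str] = None
    merchant: Optional[str] = None
    category: Optional[str] = None
    confidence: Confidence = Confidence.LOW

    def to_json(self) -> str:
        """Serialize to JSON, writing the confidence enum as its string value."""
        data = asdict(self)
        data["confidence"] = self.confidence.value
        return json.dumps(data, indent=2)
```

Keeping identifiers such as `account` and `reference` as strings avoids losing leading zeros, which a numeric type would silently drop.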
📈 Benchmark Results
Multi-Bank Validation (v8)
| Bank | Field Accuracy | Status |
|---|---|---|
| ICICI | 96.2% | ✅ |
| HDFC | 95.0% | ✅ |
| SBI | 93.3% | ✅ |
| Axis | 93.3% | ✅ |
| Kotak | 92.0% | ✅ |
| Overall | 94.5% | ✅ |
Field-Level Accuracy
| Field | Accuracy |
|---|---|
| Amount | 98.5% |
| Type | 99.2% |
| Date | 97.8% |
| Account | 96.1% |
| Reference | 72.7% |
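The README does not state how these per-field figures were computed. A common convention is exact-match accuracy: for each field, the share of messages where the predicted value equals the labelled one. The sketch below assumes that convention; `field_accuracy` and `FIELDS` are hypothetical names.

```python
from typing import Dict, Mapping, Sequence

FIELDS = ("amount", "type", "date", "account", "reference")


def field_accuracy(
    predictions: Sequence[Mapping[str, object]],
    gold: Sequence[Mapping[str, object]],
    fields: Sequence[str] = FIELDS,
) -> Dict[str, float]:
    """Exact-match accuracy per field, as a fraction between 0 and 1."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")

    correct = {f: 0 for f in fields}
    for pred, ref in zip(predictions, gold):
        for f in fields:
            # A missing key counts as None, so it only matches a None label.
            if pred.get(f) == ref.get(f):
                correct[f] += 1

    n = len(gold)
    return {f: (correct[f] / n if n else 0.0) for f in fields}
```

Under this definition a reference number that is off by a single digit counts as a full miss, which is one reason long identifiers can score lower than short fields such as type.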
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Microsoft for the Phi-3 model
- MLX team for the amazing framework
- Hugging Face for model hosting
Made with ❤️ by Ranjit Behera