Extract structured financial entities from Indian banking messages
language:
- en
license: mit
library_name: transformers
tags:
- finance
- entity-extraction
- ner
- phi-3
- production
- gguf
- indian-banking
- structured-output
base_model: microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation
Finance Entity Extractor (FinEE) v1.0
Extract structured financial data from Indian banking messages in one command.
94.5% field accuracy across HDFC, ICICI, SBI, Axis, Kotak.
One-Command Installation
pip install finee
That's it. No cloning, no setup.
30-Second Quick Start
from finee import extract
# Parse any Indian bank message
result = extract("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025")
print(result.amount) # 2500.0
print(result.merchant) # "Swiggy"
print(result.category) # "food"
print(result.confidence) # Confidence.HIGH
Output Schema Contract
Every extraction returns a guaranteed JSON structure:
{
  "amount": 2500.0,             // float - Always numeric, never "Rs. 2,500"
  "currency": "INR",            // string - ISO 4217 code
  "type": "debit",              // string - "debit" | "credit"
  "account": "3545",            // string - Last 4 digits only
  "date": "28-12-2025",         // string - DD-MM-YYYY format
  "reference": "534567891234",  // string - UPI/NEFT reference
  "merchant": "Swiggy",         // string - Normalized name (not "VPA-SWIGGY-BLR")
  "category": "food",           // string - Enum: food|shopping|transport|bills|...
  "vpa": "swiggy@ybl",          // string - Raw VPA
  "confidence": 0.95,           // float - 0.0 to 1.0
  "confidence_level": "HIGH"    // string - "LOW" | "MEDIUM" | "HIGH"
}
Type Definitions (TypeScript-style)
interface ExtractionResult {
  amount: number | null;
  currency: "INR";
  type: "debit" | "credit" | null;
  account: string | null;
  date: string | null;          // DD-MM-YYYY
  reference: string | null;
  merchant: string | null;
  category: Category | null;
  vpa: string | null;
  confidence: number;           // 0.0 - 1.0
  confidence_level: "LOW" | "MEDIUM" | "HIGH";
}

type Category =
  | "food" | "shopping" | "transport" | "bills"
  | "entertainment" | "travel" | "grocery" | "fuel"
  | "healthcare" | "education" | "investment" | "transfer" | "other";
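The contract above can also be checked on the consumer side. The following is a minimal sketch in Python, assuming the result has already been converted to a plain dict with the keys listed in the schema. The names `validate_result`, `CATEGORIES`, and `DATE_PATTERN` are hypothetical helpers written for this illustration and are not part of the finee package.

```python
import re
from datetime import datetime

# Category enum copied from the type definitions above.
CATEGORIES = {
    "food", "shopping", "transport", "bills", "entertainment", "travel",
    "grocery", "fuel", "healthcare", "education", "investment", "transfer",
    "other",
}
DATE_PATTERN = re.compile(r"\d{2}-\d{2}-\d{4}")  # DD-MM-YYYY


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_result(data):
    """Hypothetical checker: return a list of contract violations (empty if none)."""
    errors = []
    if data.get("amount") is not None and not _is_number(data["amount"]):
        errors.append("amount: expected number or null")
    if data.get("currency") != "INR":
        errors.append("currency: expected 'INR'")
    if data.get("type") not in ("debit", "credit", None):
        errors.append("type: expected 'debit', 'credit' or null")
    date = data.get("date")
    if date is not None:
        if not (isinstance(date, str) and DATE_PATTERN.fullmatch(date)):
            errors.append("date: expected DD-MM-YYYY")
        else:
            try:
                datetime.strptime(date, "%d-%m-%Y")
            except ValueError:
                errors.append("date: not a valid calendar date")
    category = data.get("category")
    if category is not None and not (
        isinstance(category, str) and category in CATEGORIES
    ):
        errors.append("category: not in the Category enum")
    confidence = data.get("confidence")
    if not (_is_number(confidence) and 0.0 <= confidence <= 1.0):
        errors.append("confidence: expected float in [0.0, 1.0]")
    if data.get("confidence_level") not in ("LOW", "MEDIUM", "HIGH"):
        errors.append("confidence_level: expected LOW, MEDIUM or HIGH")
    return errors


sample = {
    "amount": 2500.0, "currency": "INR", "type": "debit", "account": "3545",
    "date": "28-12-2025", "reference": "534567891234", "merchant": "Swiggy",
    "category": "food", "vpa": "swiggy@ybl", "confidence": 0.95,
    "confidence_level": "HIGH",
}
print(validate_result(sample))
```

A checker like this is mainly useful at system boundaries, for example before writing extracted rows to a database, where a silent schema drift would be expensive to find later.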
Supported Banks
| Bank | Debit | Credit | UPI | NEFT/IMPS |
|---|---|---|---|---|
| HDFC | ✅ | ✅ | ✅ | ✅ |
| ICICI | ✅ | ✅ | ✅ | ✅ |
| SBI | ✅ | ✅ | ✅ | ✅ |
| Axis | ✅ | ✅ | ✅ | ✅ |
| Kotak | ✅ | ✅ | ✅ | ✅ |
Benchmark
| Metric | Value |
|---|---|
| Field Accuracy | 94.5% |
| Latency (Regex mode) | <1ms |
| Latency (LLM mode) | ~50ms |
| Throughput | 50,000+ msg/sec |
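The table does not describe how these figures were measured. To reproduce rough latency and throughput numbers on your own hardware, a small timing harness along the following lines may help. This is only a sketch: `benchmark` is a hypothetical helper, not part of finee, and `str.upper` stands in for the function under test.

```python
import time


def benchmark(fn, messages, repeats=100):
    """Hypothetical harness: time fn over messages and report simple statistics."""
    if not messages or repeats < 1:
        raise ValueError("need at least one message and one repeat")
    calls = 0
    start = time.perf_counter()
    for _ in range(repeats):
        for message in messages:
            fn(message)
            calls += 1
    elapsed = time.perf_counter() - start
    return {
        "calls": calls,
        "mean_latency_ms": 1000.0 * elapsed / calls,
        # Guard against a zero reading on very coarse timers.
        "throughput_per_sec": calls / elapsed if elapsed > 0 else float("inf"),
    }


# Stand-in workload; swap in the extractor you want to measure.
stats = benchmark(str.upper, ["Rs.500 debited from A/c 1234"], repeats=1000)
print(stats)
```

To measure the library itself, pass `extract` from the quick start in place of `str.upper`. Repeating the same message would mostly exercise the Tier 0 hash cache, so a varied message set gives a more honest number.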
Installation Options
# Core (Regex + Rules only, no ML)
pip install finee
# With Apple Silicon backend
pip install "finee[metal]"
# With NVIDIA GPU backend
pip install "finee[cuda]"
# With CPU backend (llama.cpp)
pip install "finee[cpu]"
CLI Usage
# Extract from text
finee extract "Rs.500 debited from A/c 1234"
# Check available backends
finee backends
# Show version
finee --version
Architecture
Input Text
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 0: Hash Cache (<1ms if seen before)                   │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 1: Regex Engine                                       │
│  Extract: amount, date, reference, account, vpa, type       │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 2: Rule-Based Mapping                                 │
│  Map: vpa → merchant, merchant → category                   │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 3: LLM (Optional, for missing fields)                 │
│  Targeted prompts for: merchant, category only              │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 4: Validation + Normalization                         │
│  JSON repair, date normalization, confidence scoring        │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
ExtractionResult (Guaranteed Schema)
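To make the tier flow concrete, here is a self-contained sketch of Tiers 0, 1, 2 and 4 in Python (Tier 3, the optional LLM, is left out). It is not the package's implementation: the regular expressions, the lookup tables, the confidence formula (fraction of fields filled), the HIGH/MEDIUM thresholds, and the name `extract_sketch` are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical patterns; real bank messages vary far more than this.
AMOUNT_RE = re.compile(r"\b(?:Rs\.?|INR)\s*([\d,]+(?:\.\d+)?)", re.IGNORECASE)
ACCOUNT_RE = re.compile(r"A/c\s*[Xx*\d]*(\d{4})\b", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{2})[-/](\d{2})[-/](\d{4})\b")
REF_RE = re.compile(r"\bRef(?:\s*No)?\.?\s*:?\s*(\d{6,})", re.IGNORECASE)
VPA_RE = re.compile(r"\b([\w.\-]+@[a-zA-Z]+)\b")
TYPE_RE = re.compile(r"\b(debited|credited)\b", re.IGNORECASE)

# Hypothetical rule tables for Tier 2.
VPA_TO_MERCHANT = {"swiggy": "Swiggy", "zomato": "Zomato"}
MERCHANT_TO_CATEGORY = {"Swiggy": "food", "Zomato": "food"}

SCORED_FIELDS = ("amount", "type", "account", "date",
                 "reference", "merchant", "category", "vpa")
_cache = {}  # Tier 0: message hash -> result


def _first(pattern, text):
    match = pattern.search(text)
    return match.group(1) if match else None


def extract_sketch(text):
    # Tier 0: return the cached result if this exact message was seen before.
    key = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]

    # Tier 1: regex extraction of the purely lexical fields.
    amount = _first(AMOUNT_RE, text)
    kind = _first(TYPE_RE, text)
    date = DATE_RE.search(text)
    result = {
        "amount": float(amount.replace(",", "")) if amount else None,
        "currency": "INR",
        "type": {"debited": "debit", "credited": "credit"}[kind.lower()] if kind else None,
        "account": _first(ACCOUNT_RE, text),  # last 4 digits only
        # Tier 4 (normalization): always emit DD-MM-YYYY.
        "date": "-".join(date.groups()) if date else None,
        "reference": _first(REF_RE, text),
        "vpa": _first(VPA_RE, text),
    }

    # Tier 2: rule-based mapping vpa -> merchant -> category.
    handle = result["vpa"].split("@")[0].lower() if result["vpa"] else None
    result["merchant"] = VPA_TO_MERCHANT.get(handle)
    result["category"] = MERCHANT_TO_CATEGORY.get(result["merchant"])

    # Tier 4 (scoring): ASSUMED formula, the fraction of fields that were filled.
    filled = sum(result[name] is not None for name in SCORED_FIELDS)
    result["confidence"] = filled / len(SCORED_FIELDS)
    result["confidence_level"] = (
        "HIGH" if result["confidence"] >= 0.8
        else "MEDIUM" if result["confidence"] >= 0.5
        else "LOW"
    )
    _cache[key] = result
    return result


print(extract_sketch("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025"))
```

The ordering reflects the design in the diagram: cheap deterministic steps run first, so an optional LLM call would only be needed when `merchant` or `category` is still `None` after Tier 2.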
Contributing
git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
cd Finance-Entity-Extractor
pip install -e ".[dev]"
pytest tests/
License
MIT License - see LICENSE
Made with ❤️ by Ranjit Behera
GitHub · PyPI · Hugging Face