
Extract structured financial entities from Indian banking messages

Project description


language: en
license: mit
library_name: transformers
tags: finance, entity-extraction, ner, phi-3, production, gguf, indian-banking, structured-output
base_model: microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation

Finance Entity Extractor (FinEE) v1.0


A production-ready 3.8B-parameter language model optimized for zero-shot financial entity extraction from transaction messages.
Validated on message formats from five Indian banks (HDFC, ICICI, SBI, Axis, Kotak), with 94.5% overall field accuracy.



Performance Benchmarks

Comparison with Foundation Models

| Model | Parameters | Entity Precision (India) | Latency (CPU) | Cost |
|---|---|---|---|---|
| FinEE-3.8B (Ours) | 3.8B | 94.5% | 45ms | Free |
| Llama-3-8B-Instruct | 8B | 89.4% | 120ms | Free |
| GPT-3.5-Turbo | ~175B | 94.1% | ~500ms | $0.002 per 1K tokens |
| GPT-4 | ~1.7T | 96.8% | ~800ms | $0.03 per 1K tokens |

Platform Support

| Platform | Framework | Status |
|---|---|---|
| macOS (Apple Silicon) | MLX | ✅ Full support |
| Linux + NVIDIA GPU | PyTorch/Transformers | ✅ Full support |
| Linux + CPU | PyTorch/GGUF | ✅ Full support |
| Windows | GGUF/llama.cpp | ✅ Full support |
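The platform table implies a detection order: MLX on Apple Silicon, CUDA when an NVIDIA GPU is visible, otherwise a CPU runtime. A minimal sketch of such detection using only the standard library (plus PyTorch if it happens to be installed). `detect_backend` and the returned labels are hypothetical; finee's actual selection logic may differ.

```python
import importlib.util
import platform


def detect_backend() -> str:
    """HYPOTHETICAL helper: choose a backend in the order the table suggests."""
    # Apple Silicon Macs report Darwin + arm64; prefer MLX only if it is installed.
    if (
        platform.system() == "Darwin"
        and platform.machine() == "arm64"
        and importlib.util.find_spec("mlx") is not None
    ):
        return "mlx"

    # NVIDIA GPU: trust CUDA only if PyTorch is importable and sees a device.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "pytorch-cuda"

    # Everything else (Linux CPU, Windows) falls back to a GGUF/llama.cpp runtime.
    return "gguf-cpu"


if __name__ == "__main__":
    print(detect_backend())
```

Checking for an installed package with `find_spec` before importing it keeps detection cheap and avoids an `ImportError` on machines that lack the optional dependency.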

🐍 Quick Start with FinEE Library

The easiest way to use the model is through the finee Python library, which handles backend selection, caching, and validation automatically.

Installation

# Install from GitHub
pip install git+https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git

# Or clone and install locally, choosing the ONE extra that matches your hardware
git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
cd Finance-Entity-Extractor
pip install -e ".[metal]"   # Apple Silicon
pip install -e ".[cuda]"    # NVIDIA GPU
pip install -e ".[cpu]"     # CPU only

Usage

from finee import extract

# Automatic backend detection (MLX, CUDA, or CPU)
text = "Rs.500 paid to swiggy@ybl on 01-01-2025"
result = extract(text)

print(f"Amount: {result.amount}")
print(f"Merchant: {result.merchant} ({result.category})")
print(f"Confidence: {result.confidence.value}")

# Serialize the full result as JSON (output abbreviated below)
print(result.to_json())
# {
#   "amount": 500.0,
#   "type": "debit",
#   "merchant": "Swiggy",
#   "category": "food",
#   "date": "01-01-2025",
#   ...
# }

Command Line Interface

# Direct extraction
finee extract "Rs.500 debited from A/c 1234"

# Check available backends
finee backends

📋 Overview

This project demonstrates how to:

  1. Parse 40K+ emails from a Gmail MBOX export
  2. Classify emails into categories using Phi-3 Mini
  3. Discover patterns in financial emails (transactions, amounts, dates)
  4. Fine-tune a local LLM using LoRA for entity extraction
  5. Extract structured data: amount, transaction type, account, date, reference
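Step 1 needs nothing beyond Python's standard `mailbox` module, because a Gmail Takeout export is an MBOX file. A minimal sketch, assuming that input: the keyword pre-filter `FINANCE_HINTS` is a hypothetical stand-in for the Phi-3 classification in step 2, and `body_text` and `iter_financial_emails` are illustrative names that are not part of finee.

```python
import mailbox
import re
from email.message import Message

# HYPOTHETICAL keyword pre-filter; the project itself classifies with Phi-3 Mini.
FINANCE_HINTS = re.compile(r"\b(?:debited|credited|rs\.?|inr|upi|neft|imps)\b", re.I)


def body_text(msg: Message) -> str:
    """Return the first text/plain part of a message as a decoded string."""
    for part in msg.walk():  # walk() also yields a non-multipart message itself
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset name in the header
                return payload.decode("utf-8", errors="replace")
    return ""


def iter_financial_emails(path: str):
    """Yield (subject, body) for MBOX messages that look like bank alerts."""
    for msg in mailbox.mbox(path):
        subject = str(msg.get("Subject", ""))
        body = body_text(msg)
        if FINANCE_HINTS.search(subject) or FINANCE_HINTS.search(body):
            yield subject, body
```

Making this a generator keeps memory flat even for a 40K-message archive; a cheap filter like this can also cut down how many messages are sent to the slower LLM classifier.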

🏗️ Project Structure

Finance-Entity-Extractor/
├── src/
│   └── finee/                 # FinEE Package
│       ├── __init__.py
│       ├── extractor.py       # Main pipeline orchestrator
│       ├── cache.py           # Tier 0 LRU Cache
│       ├── regex_engine.py    # Tier 1 Regex Engine
│       ├── merchants.py       # Tier 2 Rule Mapping
│       ├── prompt.py          # Tier 3 Targeted Prompts
│       ├── validator.py       # Tier 4 Validation & Repair
│       ├── backends/          # Auto-detecting Backends (MLX, PT, GGUF)
│       └── cli.py             # Command Line Interface
├── tests/                     # 88 Unit Tests
├── .github/workflows/         # CI/CD
├── pyproject.toml
├── train.py                   # Training pipeline
└── README.md
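The tier comments in the layout above describe a cascade in which cheap, deterministic stages run before the language model. A minimal sketch of that flow, assuming each tier only fills fields the earlier ones left empty. Every name here (`regex_tier`, `merchant_tier`, `build_prompt`, `validate`, `extract_cascade`, the `MERCHANTS` table, and the `llm` callable) is hypothetical and is not finee's real API.

```python
import re
from functools import lru_cache
from typing import Callable, Optional

# HYPOTHETICAL tiers: names echo the module layout above, not finee's real API.
MERCHANTS = {"swiggy": ("Swiggy", "food"), "zomato": ("Zomato", "food")}
REQUIRED = ("amount", "type", "merchant", "category")
LLM = Optional[Callable[[str], dict]]


def regex_tier(text: str) -> dict:
    """Tier 1: cheap deterministic patterns for amount and direction."""
    fields = {}
    if m := re.search(r"\b(?:rs\.?|inr)\s*([\d,]+(?:\.\d{1,2})?)", text, re.I):
        fields["amount"] = m.group(1).replace(",", "")
    if m := re.search(r"\b(debited|credited|paid|received)\b", text, re.I):
        credit = m.group(1).lower() in ("credited", "received")
        fields["type"] = "credit" if credit else "debit"
    return fields


def merchant_tier(text: str, fields: dict) -> dict:
    """Tier 2: map a UPI handle such as 'swiggy@ybl' to a known merchant."""
    if m := re.search(r"\b([a-z0-9.\-]+)@[a-z]+\b", text, re.I):
        if hit := MERCHANTS.get(m.group(1).lower()):
            fields["merchant"], fields["category"] = hit
    return fields


def build_prompt(text: str, missing: list) -> str:
    """Tier 3: a targeted prompt that asks only for the missing fields."""
    return f"Return JSON with keys {', '.join(missing)} for this message:\n{text}"


def validate(fields: dict) -> dict:
    """Tier 4: coerce the amount to float and drop it if that fails."""
    try:
        fields["amount"] = float(fields["amount"])
    except (KeyError, TypeError, ValueError):
        fields.pop("amount", None)
    return fields


@lru_cache(maxsize=1024)  # Tier 0: repeated (text, llm) pairs skip every tier
def _cascade(text: str, llm: LLM) -> tuple:
    fields = merchant_tier(text, regex_tier(text))
    missing = [f for f in REQUIRED if f not in fields]
    if missing and llm is not None:
        # The model only fills gaps; it never overwrites rule-based values.
        for key, value in llm(build_prompt(text, missing)).items():
            fields.setdefault(key, value)
    return tuple(sorted(validate(fields).items()))  # immutable cached result


def extract_cascade(text: str, llm: LLM = None) -> dict:
    # Hand back a fresh dict so callers cannot mutate the cached entry.
    return dict(_cascade(text, llm))
```

For `"Rs.500 paid to swiggy@ybl on 01-01-2025"` the first two tiers already yield amount, type, merchant, and category, so the model is never called. Caching an immutable tuple rather than a dict keeps one caller's edits from leaking into later cache hits.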

🎯 Extracted Entities

| Entity | Description | Example |
|---|---|---|
| amount | Transaction amount | "2500.00" |
| type | Debit or Credit | "debit" |
| account | Account identifier | "3545" |
| date | Transaction date | "28-12-25" |
| reference | UPI/NEFT reference | "534567891234" |
| merchant | Merchant name | "swiggy" |
| category | Transaction category | "food" |
| confidence | Extraction confidence | "HIGH" |
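A minimal sketch of a record holding these entities, plus illustrative regular expressions for the identifier fields (account, date, reference). `Entities`, `Confidence`, `PATTERNS`, and `extract_ids` are hypothetical names, and the confidence rule (how many identifiers were found) is an assumption, not how finee assigns confidence.

```python
import json
import re
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass
class Entities:
    """HYPOTHETICAL record mirroring the entity table; not finee's own class."""
    amount: Optional[float] = None
    type: Optional[str] = None
    account: Optional[str] = None
    date: Optional[str] = None
    reference: Optional[str] = None
    merchant: Optional[str] = None
    category: Optional[str] = None
    confidence: Confidence = Confidence.LOW

    def to_json(self) -> str:
        data = asdict(self)
        data["confidence"] = self.confidence.value  # Enum -> plain string
        return json.dumps(data, indent=2)


# Illustrative patterns only; real bank message formats vary far more.
PATTERNS = {
    "account": r"\b(?:a/c|acct|account)\s*(?:no\.?)?\s*[x*]*(\d{3,6})\b",
    "date": r"\b(\d{2}[-/]\d{2}[-/]\d{2,4})\b",
    "reference": r"\b(?:upi ref|ref(?:erence)?|utr)\s*(?:no\.?)?[:\s]*(\d{9,18})\b",
}


def extract_ids(text: str) -> Entities:
    found = {}
    for field, pattern in PATTERNS.items():
        if m := re.search(pattern, text, re.IGNORECASE):
            found[field] = m.group(1)
    # ASSUMED confidence rule: count how many identifiers were matched.
    level = {3: Confidence.HIGH, 2: Confidence.MEDIUM}.get(len(found), Confidence.LOW)
    return Entities(**found, confidence=level)
```

On `"Rs.2500.00 debited from A/c XX3545 on 28-12-25. UPI Ref No 534567891234"` this recovers the account, date, and reference values shown in the table's Example column. The `[x*]*` prefix skips the masked digits banks commonly print before the last four.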

📈 Benchmark Results

Multi-Bank Validation (v8)

| Bank | Field Accuracy |
|---|---|
| ICICI | 96.2% |
| HDFC | 95.0% |
| SBI | 93.3% |
| Axis | 93.3% |
| Kotak | 92.0% |
| Overall | 94.5% |

Field-Level Accuracy

| Field | Accuracy |
|---|---|
| Amount | 98.5% |
| Type | 99.2% |
| Date | 97.8% |
| Account | 96.1% |
| Reference | 72.7% |
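The README does not define "field accuracy". One common definition is the share of labelled fields whose predicted value exactly matches the gold value. A minimal sketch under that assumption, computing both the per-field and per-bank views from the same samples; `field_accuracy` and its `(bank, predicted, gold)` input shape are hypothetical.

```python
from collections import defaultdict

# ASSUMED scoring definition: exact match on each field present in the gold label.
FIELDS = ("amount", "type", "date", "account", "reference")


def _ratio(counts):
    correct, total = counts
    return correct / total if total else 0.0


def field_accuracy(samples):
    """Score (bank, predicted, gold) triples.

    Returns (per_field, per_bank, overall), where overall is pooled over
    every scored field rather than averaged over rows.
    """
    by_field = defaultdict(lambda: [0, 0])  # field -> [correct, total]
    by_bank = defaultdict(lambda: [0, 0])   # bank  -> [correct, total]
    for bank, pred, gold in samples:
        for field in FIELDS:
            if field not in gold:
                continue  # unlabelled fields are not scored
            hit = int(pred.get(field) == gold[field])
            for counts in (by_field[field], by_bank[bank]):
                counts[0] += hit
                counts[1] += 1
    pooled = [sum(c[0] for c in by_field.values()), sum(c[1] for c in by_field.values())]
    return (
        {f: _ratio(c) for f, c in by_field.items()},
        {b: _ratio(c) for b, c in by_bank.items()},
        _ratio(pooled),
    )
```

Under this pooled definition, fields and banks with more labelled examples carry more weight, so the overall figure need not equal the simple mean of the per-bank or per-field rows.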

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments


Made with ❤️ by Ranjit Behera



Download files

Download the file for your platform.

Source Distribution

finee-1.0.0.tar.gz (29.3 kB)


Built Distribution


finee-1.0.0-py3-none-any.whl (5.1 kB)


File details

Details for the file finee-1.0.0.tar.gz.

File metadata

  • Download URL: finee-1.0.0.tar.gz
  • Size: 29.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for finee-1.0.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0fcb67f30d8e75b927d1bfc90776dd5e742a3e126ffa38b235801d0cf003906e |
| MD5 | 73067c29eb7f48350d21d66ba806f0c7 |
| BLAKE2b-256 | b11b0bcd3dd1dcfd6041c1a849340c9f65b34c49936ef40c8f86bc74a7ad3978 |


File details

Details for the file finee-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: finee-1.0.0-py3-none-any.whl
  • Size: 5.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for finee-1.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1f800407909361b9026dd1e834c44a8bd121ab37b092e3485cdea631dab4347 |
| MD5 | 5f4bacca8e16ff8c41a3efe4479e9f47 |
| BLAKE2b-256 | b82067782485b601471707d4db4f94d2245f3907390d92957d85d481d9332912 |
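To check a downloaded distribution against the SHA256 digests listed above, a minimal sketch using only the standard library's `hashlib`. `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in 64 KiB chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    # hexdigest() is lower-case; normalise the expected value the same way.
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("finee-1.0.0.tar.gz", "0fcb67f30d8e75b927d1bfc90776dd5e742a3e126ffa38b235801d0cf003906e")` should return `True` for an intact source archive. pip can enforce the same check at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:<digest>` entries in a requirements file).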


Supported by
