Extract structured financial entities from Indian banking messages

language:
  • en

license: mit
library_name: transformers

tags:
  • finance
  • entity-extraction
  • ner
  • phi-3
  • production
  • gguf
  • indian-banking
  • structured-output

base_model: microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation

Finance Entity Extractor (FinEE) v1.0

Extract structured financial data from Indian banking messages in one command.
94.5% field accuracy across HDFC, ICICI, SBI, Axis, Kotak.


⚡ One-Command Installation

pip install finee

That's it. No cloning, no setup.


🚀 30-Second Quick Start

from finee import extract

# Parse any Indian bank message
result = extract("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025")

print(result.amount)      # 2500.0
print(result.merchant)    # "Swiggy"
print(result.category)    # "food"
print(result.confidence)  # Confidence.HIGH


📋 Output Schema Contract

Every extraction returns the same JSON structure. Fields that cannot be extracted are null, as the type definitions below allow:

{
  "amount": 2500.0,           // float - Always numeric, never "Rs. 2,500"
  "currency": "INR",          // string - ISO 4217 code
  "type": "debit",            // string - "debit" | "credit"
  "account": "3545",          // string - Last 4 digits only
  "date": "28-12-2025",       // string - DD-MM-YYYY format
  "reference": "534567891234",// string - UPI/NEFT reference
  "merchant": "Swiggy",       // string - Normalized name (not "VPA-SWIGGY-BLR")
  "category": "food",         // string - Enum: food|shopping|transport|bills|...
  "vpa": "swiggy@ybl",        // string - Raw VPA
  "confidence": 0.95,         // float - 0.0 to 1.0
  "confidence_level": "HIGH"  // string - "LOW" | "MEDIUM" | "HIGH"
}
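The schema comments promise a numeric amount and a DD-MM-YYYY date. A minimal sketch of how such normalization might look is shown here; `normalize_amount`, `normalize_date`, the regex, and the list of accepted date layouts are all hypothetical and are not taken from finee's source.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical helpers for illustration; not part of finee's documented API.
_AMOUNT_RE = re.compile(r"(?:\bRs\.?|\bINR|₹)\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)

# Assumed input layouts; real bank messages may use others.
_DATE_FORMATS = ("%d-%m-%Y", "%d/%m/%Y", "%d-%m-%y", "%d/%m/%y")


def normalize_amount(text: str) -> Optional[float]:
    """Return the first rupee amount found in `text` as a float, or None."""
    match = _AMOUNT_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators so "2,500" becomes 2500.0.
    return float(match.group(1).replace(",", ""))


def normalize_date(raw: str) -> Optional[str]:
    """Re-emit a date string as DD-MM-YYYY, or None if no layout matches."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%d-%m-%Y")
        except ValueError:
            continue
    return None
```

Returning None instead of raising keeps the helpers compatible with the nullable fields in the schema.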

Type Definitions (TypeScript-style)

interface ExtractionResult {
  amount: number | null;
  currency: "INR";
  type: "debit" | "credit" | null;
  account: string | null;
  date: string | null;        // DD-MM-YYYY
  reference: string | null;
  merchant: string | null;
  category: Category | null;
  vpa: string | null;
  confidence: number;         // 0.0 - 1.0
  confidence_level: "LOW" | "MEDIUM" | "HIGH";
}

type Category = 
  | "food" | "shopping" | "transport" | "bills"
  | "entertainment" | "travel" | "grocery" | "fuel"
  | "healthcare" | "education" | "investment" | "transfer" | "other";
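For Python callers, the interface above could be mirrored as a dataclass that rejects out-of-range values. This is only a sketch: `ExtractionRecord` and `CATEGORIES` are hypothetical names, and the object that `extract` actually returns may be shaped differently.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Category values copied from the Category type above.
CATEGORIES = {
    "food", "shopping", "transport", "bills", "entertainment", "travel",
    "grocery", "fuel", "healthcare", "education", "investment", "transfer", "other",
}


@dataclass
class ExtractionRecord:
    """Hypothetical Python mirror of the ExtractionResult interface."""

    amount: Optional[float] = None
    currency: str = "INR"
    type: Optional[str] = None
    account: Optional[str] = None
    date: Optional[str] = None          # DD-MM-YYYY
    reference: Optional[str] = None
    merchant: Optional[str] = None
    category: Optional[str] = None
    vpa: Optional[str] = None
    confidence: float = 0.0             # 0.0 - 1.0
    confidence_level: str = "LOW"

    def __post_init__(self) -> None:
        # Enforce the enums and ranges the schema lists.
        if self.type not in (None, "debit", "credit"):
            raise ValueError(f"unexpected type: {self.type!r}")
        if self.category is not None and self.category not in CATEGORIES:
            raise ValueError(f"unexpected category: {self.category!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence!r}")
        if self.confidence_level not in ("LOW", "MEDIUM", "HIGH"):
            raise ValueError(f"unexpected level: {self.confidence_level!r}")
```

A parsed JSON payload can be loaded with `ExtractionRecord(**payload)` and turned back into a dict with `asdict(record)`.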

๐Ÿฆ Supported Banks

Bank Debit Credit UPI NEFT/IMPS
HDFC โœ… โœ… โœ… โœ…
ICICI โœ… โœ… โœ… โœ…
SBI โœ… โœ… โœ… โœ…
Axis โœ… โœ… โœ… โœ…
Kotak โœ… โœ… โœ… โœ…

📊 Benchmark

| Metric               | Value           |
|----------------------|-----------------|
| Field Accuracy       | 94.5%           |
| Latency (Regex mode) | <1ms            |
| Latency (LLM mode)   | ~50ms           |
| Throughput           | 50,000+ msg/sec |
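The benchmark does not state how field accuracy is computed. One common definition, assumed here, is the share of (message, field) pairs where the predicted value equals the labelled one. The `field_accuracy` function and the `FIELDS` tuple are hypothetical and only illustrate that definition.

```python
from typing import Dict, Sequence

# Assumed set of scored fields; the actual benchmark may score a different set.
FIELDS = ("amount", "type", "account", "date", "reference", "merchant", "category", "vpa")


def field_accuracy(
    predictions: Sequence[Dict[str, object]],
    expected: Sequence[Dict[str, object]],
    fields: Sequence[str] = FIELDS,
) -> float:
    """Fraction of (message, field) pairs where prediction equals the label."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    total = len(expected) * len(fields)
    if total == 0:
        return 0.0
    # A field missing from both dicts compares as None == None and counts as correct.
    correct = sum(
        pred.get(name) == gold.get(name)
        for pred, gold in zip(predictions, expected)
        for name in fields
    )
    return correct / total
```

Under this definition a message with one wrong field out of four contributes 0.75, not 0, which is why field accuracy is usually higher than whole-message exact match.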

🔧 Installation Options

# Core (Regex + Rules only, no ML)
pip install finee

# With Apple Silicon backend
pip install "finee[metal]"

# With NVIDIA GPU backend
pip install "finee[cuda]"

# With CPU backend (llama.cpp)
pip install "finee[cpu]"

💻 CLI Usage

# Extract from text
finee extract "Rs.500 debited from A/c 1234"

# Check available backends
finee backends

# Show version
finee --version

๐Ÿ—๏ธ Architecture

Input Text
    โ”‚
    โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ TIER 0: Hash Cache (<1ms if seen before)                    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
    โ”‚
    โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ TIER 1: Regex Engine                                        โ”‚
โ”‚ Extract: amount, date, reference, account, vpa, type        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
    โ”‚
    โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ TIER 2: Rule-Based Mapping                                  โ”‚
โ”‚ Map: vpa โ†’ merchant, merchant โ†’ category                    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
    โ”‚
    โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ TIER 3: LLM (Optional, for missing fields)                  โ”‚
โ”‚ Targeted prompts for: merchant, category only               โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
    โ”‚
    โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ TIER 4: Validation + Normalization                          โ”‚
โ”‚ JSON repair, date normalization, confidence scoring         โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
    โ”‚
    โ–ผ
ExtractionResult (Guaranteed Schema)
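The tiered flow in the diagram can be sketched in a few dozen lines of Python. Everything here is an illustrative assumption and not finee's implementation: the `extract_sketch` name, the regex patterns, the two lookup tables, the `llm` callback signature, and the confidence formula and thresholds. Reference extraction is left out for brevity.

```python
import hashlib
import re
from typing import Callable, Dict, Optional

# Illustrative patterns and lookup tables; all are assumptions.
AMOUNT_RE = re.compile(r"\b(?:Rs\.?|INR)\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)
ACCOUNT_RE = re.compile(r"A/c\s*(?:no\.?\s*)?[X*]*(\d{4})", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{2}-\d{2}-\d{4})\b")
VPA_RE = re.compile(r"\b([\w.\-]+@[a-z]+)\b", re.IGNORECASE)
TYPE_RE = re.compile(r"\b(debited|credited)\b", re.IGNORECASE)

VPA_TO_MERCHANT = {"swiggy": "Swiggy", "zomato": "Zomato", "irctc": "IRCTC"}
MERCHANT_TO_CATEGORY = {"Swiggy": "food", "Zomato": "food", "IRCTC": "travel"}

_cache: Dict[str, dict] = {}


def _first(pattern: re.Pattern, text: str) -> Optional[str]:
    """Return the first capture group of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


def extract_sketch(text: str, llm: Optional[Callable[[str], dict]] = None) -> dict:
    # Tier 0: return a cached copy if this exact text was seen before.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key in _cache:
        return dict(_cache[key])

    # Tier 1: regex extraction of the structured fields.
    amount = _first(AMOUNT_RE, text)
    kind = _first(TYPE_RE, text)
    result = {
        "amount": float(amount.replace(",", "")) if amount else None,
        "currency": "INR",
        "type": {"debited": "debit", "credited": "credit"}.get(kind.lower()) if kind else None,
        "account": _first(ACCOUNT_RE, text),
        "date": _first(DATE_RE, text),
        "vpa": _first(VPA_RE, text),
        "merchant": None,
        "category": None,
    }

    # Tier 2: rule-based mapping, vpa -> merchant -> category.
    if result["vpa"]:
        handle = result["vpa"].split("@")[0].lower()
        result["merchant"] = VPA_TO_MERCHANT.get(handle)
        result["category"] = MERCHANT_TO_CATEGORY.get(result["merchant"])

    # Tier 3: optional LLM callback, consulted only for fields still missing.
    if llm is not None and (result["merchant"] is None or result["category"] is None):
        guess = llm(text)
        for name in ("merchant", "category"):
            if result[name] is None:
                result[name] = guess.get(name)

    # Tier 4: score confidence as the share of populated fields (assumed formula).
    scored = ("amount", "type", "account", "date", "merchant", "category")
    confidence = sum(result[name] is not None for name in scored) / len(scored)
    result["confidence"] = round(confidence, 2)
    result["confidence_level"] = (
        "HIGH" if confidence >= 0.8 else "MEDIUM" if confidence >= 0.5 else "LOW"
    )

    _cache[key] = dict(result)
    return result
```

Passing `llm=None` keeps the sketch in a regex-and-rules-only mode, which matches the core install that ships without an ML backend.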

๐Ÿค Contributing

git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
cd Finance-Entity-Extractor
pip install -e ".[dev]"
pytest tests/

📄 License

MIT License - see LICENSE


Made with ❤️ by Ranjit Behera

GitHub · PyPI · Hugging Face
