Extract structured financial entities from Indian banking messages

Project description


language:
  - en
license: mit
library_name: transformers
tags:
  - finance
  - entity-extraction
  - ner
  - phi-3
  - production
  - gguf
  - indian-banking
  - structured-output
base_model: microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation

Finance Entity Extractor (FinEE) v1.0

Extract structured financial data from Indian banking messages in one command.
94.5% field accuracy across messages from HDFC, ICICI, SBI, Axis, and Kotak.


⚡ One-Command Installation

pip install finee

That's it. No cloning, no setup.


🚀 30-Second Quick Start

from finee import extract

# Parse any Indian bank message
result = extract("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025")

print(result.amount)      # 2500.0
print(result.merchant)    # "Swiggy"
print(result.category)    # "food"
print(result.confidence)  # Confidence.HIGH

Try it live: Open In Colab


📋 Output Schema Contract

Every extraction returns a guaranteed JSON structure:

{
  "amount": 2500.0,           // float - Always numeric, never "Rs. 2,500"
  "currency": "INR",          // string - ISO 4217 code
  "type": "debit",            // string - "debit" | "credit"
  "account": "3545",          // string - Last 4 digits only
  "date": "28-12-2025",       // string - DD-MM-YYYY format
  "reference": "534567891234",// string - UPI/NEFT reference
  "merchant": "Swiggy",       // string - Normalized name (not "VPA-SWIGGY-BLR")
  "category": "food",         // string - Enum: food|shopping|transport|bills|...
  "vpa": "swiggy@ybl",        // string - Raw VPA
  "confidence": 0.95,         // float - 0.0 to 1.0
  "confidence_level": "HIGH"  // string - "LOW" | "MEDIUM" | "HIGH"
}
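The comments above commit each field to a normalized form: a plain float amount, only the last four account digits, and DD-MM-YYYY dates. The helpers below are a minimal sketch of what that normalization involves, converting raw SMS fragments into those forms. The function names (normalize_amount, last4_account, normalize_date) and the accepted input layouts are hypothetical illustrations. They are not part of finee's API.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical helpers illustrating the schema's normalization rules;
# these are not finee functions.

# "Rs.", "Rs", "INR" or "₹", optional spaces, then digits with optional
# Indian-style commas (1,23,456) and an optional decimal part.
_AMOUNT_RE = re.compile(r"(?:\brs\.?|\binr|₹)\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)

# Input date layouts this sketch accepts (an assumption, not an exhaustive list).
# "%b" expects English month abbreviations, i.e. the default C locale.
_DATE_FORMATS = ("%d-%m-%Y", "%d/%m/%Y", "%d-%m-%y", "%d-%b-%Y")


def normalize_amount(text: str) -> Optional[float]:
    """Return the first currency amount as a float, e.g. 'Rs. 2,500' -> 2500.0."""
    match = _AMOUNT_RE.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def last4_account(raw: str) -> Optional[str]:
    """Keep only the last 4 digits of a masked account such as 'XX3545'."""
    digits = re.sub(r"\D", "", raw)
    return digits[-4:] if len(digits) >= 4 else None


def normalize_date(raw: str) -> Optional[str]:
    """Re-emit a date in the schema's DD-MM-YYYY form, or None if unparseable."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%d-%m-%Y")
        except ValueError:
            continue  # try the next layout
    return None
```

Returning None instead of raising matches the schema's nullable fields, so a message with no recognizable date still yields a result.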

Type Definitions (TypeScript-style)

interface ExtractionResult {
  amount: number | null;
  currency: "INR";
  type: "debit" | "credit" | null;
  account: string | null;
  date: string | null;        // DD-MM-YYYY
  reference: string | null;
  merchant: string | null;
  category: Category | null;
  vpa: string | null;
  confidence: number;         // 0.0 - 1.0
  confidence_level: "LOW" | "MEDIUM" | "HIGH";
}

type Category = 
  | "food" | "shopping" | "transport" | "bills"
  | "entertainment" | "travel" | "grocery" | "fuel"
  | "healthcare" | "education" | "investment" | "transfer" | "other";
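The TypeScript-style definitions only describe the contract. A consumer that stores results may want to enforce it at runtime. The following sketch mirrors ExtractionResult as a Python dataclass that rejects values outside the documented types and enums. The class name ExtractionRecord and its checks are assumptions drawn from the schema above. This is not finee's own result type.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import Optional

# Category enum copied from the README's type definitions.
CATEGORIES = frozenset({
    "food", "shopping", "transport", "bills", "entertainment", "travel",
    "grocery", "fuel", "healthcare", "education", "investment", "transfer", "other",
})


@dataclass
class ExtractionRecord:
    """Hypothetical runtime mirror of the ExtractionResult interface."""

    amount: Optional[float]
    type: Optional[str]
    account: Optional[str]
    date: Optional[str]
    reference: Optional[str]
    merchant: Optional[str]
    category: Optional[str]
    vpa: Optional[str]
    confidence: float
    confidence_level: str
    currency: str = "INR"

    def __post_init__(self) -> None:
        # Reject values that break the documented contract.
        if self.amount is not None and not isinstance(self.amount, (int, float)):
            raise ValueError(f"amount must be numeric, got {self.amount!r}")
        if self.currency != "INR":
            raise ValueError(f"unexpected currency {self.currency!r}")
        if self.type not in (None, "debit", "credit"):
            raise ValueError(f"type must be 'debit' or 'credit', got {self.type!r}")
        if self.account is not None and not re.fullmatch(r"\d{4}", self.account):
            raise ValueError(f"account must be the last 4 digits, got {self.account!r}")
        if self.date is not None and not re.fullmatch(r"\d{2}-\d{2}-\d{4}", self.date):
            raise ValueError(f"date must be DD-MM-YYYY, got {self.date!r}")
        if self.category is not None and self.category not in CATEGORIES:
            raise ValueError(f"unknown category {self.category!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")
        if self.confidence_level not in ("LOW", "MEDIUM", "HIGH"):
            raise ValueError(f"bad confidence_level {self.confidence_level!r}")

    @classmethod
    def from_json(cls, payload: str) -> "ExtractionRecord":
        """Build from a JSON object with the documented keys (unknown keys raise TypeError)."""
        return cls(**json.loads(payload))
```

Running each stored JSON result through ExtractionRecord.from_json turns silent schema drift, such as an unmasked account number, into an immediate ValueError.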

๐Ÿฆ Supported Banks

Bank    Debit   Credit   UPI   NEFT/IMPS
HDFC    ✅      ✅       ✅    ✅
ICICI   ✅      ✅       ✅    ✅
SBI     ✅      ✅       ✅    ✅
Axis    ✅      ✅       ✅    ✅
Kotak   ✅      ✅       ✅    ✅

📊 Benchmark

Metric                  Value
Field accuracy          94.5%
Latency (Regex mode)    <1 ms
Latency (LLM mode)      ~50 ms
Throughput              50,000+ msg/sec
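The README does not say how field accuracy is computed. One plausible definition, assumed here, is the share of labeled (message, field) pairs where the prediction exactly matches the gold value. The helper and its SCORED_FIELDS list are hypothetical and are shown only to make that definition concrete.

```python
from typing import Any, Mapping, Sequence

# Fields scored by this sketch (an assumption; the README does not list them).
SCORED_FIELDS = ("amount", "type", "account", "date", "reference", "merchant", "category", "vpa")


def field_accuracy(
    predictions: Sequence[Mapping[str, Any]],
    gold: Sequence[Mapping[str, Any]],
    fields: Sequence[str] = SCORED_FIELDS,
) -> float:
    """Share of labeled (message, field) pairs whose prediction matches exactly."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must be aligned one-to-one")
    correct = total = 0
    for pred, ref in zip(predictions, gold):
        for name in fields:
            expected = ref.get(name)
            if expected is None:
                continue  # field not labeled for this message, so not scored
            total += 1
            if pred.get(name) == expected:
                correct += 1
    return correct / total if total else 0.0
```

Under exact matching, a normalization slip such as "swiggy" instead of "Swiggy" counts as a miss. A metric defined this way is therefore sensitive to the merchant normalization step as well as to extraction itself.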

🔧 Installation Options

# Core (Regex + Rules only, no ML)
pip install finee

# With Apple Silicon backend
pip install "finee[metal]"

# With NVIDIA GPU backend
pip install "finee[cuda]"

# With CPU backend (llama.cpp)
pip install "finee[cpu]"

💻 CLI Usage

# Extract from text
finee extract "Rs.500 debited from A/c 1234"

# Check available backends
finee backends

# Show version
finee --version

๐Ÿ—๏ธ Architecture

Input Text
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 0: Hash Cache (<1ms if seen before)                    │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 1: Regex Engine                                        │
│ Extract: amount, date, reference, account, vpa, type        │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 2: Rule-Based Mapping                                  │
│ Map: vpa → merchant, merchant → category                    │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 3: LLM (Optional, for missing fields)                  │
│ Targeted prompts for: merchant, category only               │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 4: Validation + Normalization                          │
│ JSON repair, date normalization, confidence scoring         │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
ExtractionResult (Guaranteed Schema)
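The tiered flow in the diagram can be illustrated with a self-contained sketch. Everything in it is a hypothetical stand-in for illustration and not finee's actual implementation. That includes the lookup tables, the regex patterns, the completeness-based confidence score and its thresholds, and the extract_sketch name. The LLM tier is modeled as an optional callback that may only fill a missing merchant or category.

```python
import hashlib
import re
from typing import Callable, Dict, Optional

# Hypothetical rule tables; finee's real mappings are not shown in the README.
VPA_TO_MERCHANT = {"swiggy": "Swiggy", "zomato": "Zomato", "uber": "Uber"}
MERCHANT_TO_CATEGORY = {"Swiggy": "food", "Zomato": "food", "Uber": "transport"}

# Tier 1 patterns (simplified; real bank SMS formats vary much more).
PATTERNS = {
    "amount": re.compile(r"(?:\brs\.?|\binr|₹)\s*(\d[\d,]*(?:\.\d+)?)", re.I),
    "account": re.compile(r"\ba/c\s*[x*\d]*?(\d{4})\b", re.I),
    "date": re.compile(r"\b(\d{2}-\d{2}-\d{4})\b"),
    "reference": re.compile(r"\b(?:ref|utr)\b\D{0,10}(\d{12})\b", re.I),
    "vpa": re.compile(r"\b([\w.\-]+@[a-z]+)\b", re.I),
}
CORE_FIELDS = ("amount", "type", "account", "date", "merchant", "category")

LlmFill = Callable[[str, Dict[str, object]], Dict[str, object]]
_CACHE: Dict[str, Dict[str, object]] = {}


def extract_sketch(text: str, llm_fill: Optional[LlmFill] = None) -> Dict[str, object]:
    # Tier 0: cache keyed on a hash of the exact message text.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key in _CACHE:
        return dict(_CACHE[key])

    # Tier 1: regex extraction of the mechanical fields.
    fields: Dict[str, object] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    lowered = text.lower()
    fields["type"] = "debit" if "debited" in lowered else "credit" if "credited" in lowered else None

    # Tier 2: rule-based mapping vpa -> merchant -> category.
    vpa = fields["vpa"]
    handle = vpa.split("@", 1)[0].lower() if isinstance(vpa, str) else None
    fields["merchant"] = VPA_TO_MERCHANT.get(handle) if handle else None
    merchant = fields["merchant"]
    fields["category"] = MERCHANT_TO_CATEGORY.get(merchant) if isinstance(merchant, str) else None

    # Tier 3 (optional): consult the LLM only for merchant/category, only if missing.
    if llm_fill is not None and (fields["merchant"] is None or fields["category"] is None):
        for name, value in llm_fill(text, dict(fields)).items():
            if name in ("merchant", "category") and fields.get(name) is None:
                fields[name] = value

    # Tier 4: normalize the amount and score confidence by field completeness.
    if isinstance(fields["amount"], str):
        fields["amount"] = float(fields["amount"].replace(",", ""))
    filled = sum(fields.get(name) is not None for name in CORE_FIELDS)
    confidence = round(filled / len(CORE_FIELDS), 2)
    fields["currency"] = "INR"
    fields["confidence"] = confidence
    fields["confidence_level"] = "HIGH" if confidence >= 0.8 else "MEDIUM" if confidence >= 0.5 else "LOW"

    _CACHE[key] = fields
    return dict(fields)  # return a copy so callers cannot mutate the cache
```

Because the cache key is a hash of the exact message text, a repeated message skips every later tier. One consequence in this sketch is that a result computed without an LLM callback is reused even if a callback is passed on a later call.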

๐Ÿค Contributing

git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
cd Finance-Entity-Extractor
pip install -e ".[dev]"
pytest tests/

📄 License

MIT License - see LICENSE


Made with ❤️ by Ranjit Behera

GitHub · PyPI · Hugging Face

Download files

Source Distribution

finee-1.0.2.tar.gz (29.1 kB)

Uploaded Source

Built Distribution

finee-1.0.2-py3-none-any.whl (36.5 kB)

Uploaded Python 3

File details

Details for the file finee-1.0.2.tar.gz.

File metadata

  • Download URL: finee-1.0.2.tar.gz
  • Upload date:
  • Size: 29.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for finee-1.0.2.tar.gz
Algorithm Hash digest
SHA256 ac7f8d12b879956f394e2788d841e6b89c57fea0ce31531b123463b308d65840
MD5 f01f4d898105910d3609d7398d4f6d57
BLAKE2b-256 fe708fc0d9d286c6a9bf5e8b30b5c0ca1595fe0b7698760e0f6426472d7107b0

File details

Details for the file finee-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: finee-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 36.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for finee-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 15b09970fcd2a5b9bea7494d937c7f113e25c9d63f1baef55afea1da015dd9e8
MD5 f4d5740b0aedb9b1a7438351277075fe
BLAKE2b-256 6ecf178629fb31c71570ab29c2cf7ed4bba060f6f87a64dd9de20a84bab87316
