This project converts Maybank credit card statement PDF files into a single CSV file that other workflows can ingest.

maybankforme

This project converts Maybank credit card statement PDF files to CSV format via a FastAPI web service and an optional CLI.

Overview

This is a FastAPI web service that accepts encrypted credit card statement PDF files, extracts the text, looks for specific transaction pattern lines, and returns them as a CSV file.
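
As a rough illustration of the "look for transaction pattern lines" step, the sketch below filters extracted text with a regular expression. The line layout (posting date, transaction date, description, amount with an optional CR suffix) and the names TRANSACTION_RE and extract_transactions are assumptions made for this example; the project's actual pattern may differ.

```python
import re

# Hypothetical layout: "DD/MM DD/MM DESCRIPTION AMOUNT[CR]"
TRANSACTION_RE = re.compile(
    r"^(?P<posted>\d{2}/\d{2})\s+"
    r"(?P<date>\d{2}/\d{2})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>[\d,]+\.\d{2})(?P<credit>CR)?$"
)


def extract_transactions(text: str) -> list[dict[str, str]]:
    """Return one dict per line that matches the assumed transaction layout."""
    rows = []
    for line in text.splitlines():
        match = TRANSACTION_RE.match(line.strip())
        if match is None:
            continue  # headers, totals, and other non-transaction lines
        row = match.groupdict()
        # Credits (payments, refunds) are stored as negative amounts.
        sign = "-" if row.pop("credit") else ""
        row["amount"] = sign + row["amount"].replace(",", "")
        rows.append(row)
    return rows
```

Lines that do not match are skipped silently, which keeps page headers and totals out of the result.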

FastAPI Service

Running Locally

Choose one of these ways:

  1. Direct server entrypoint (recommended)
python -m maybankforme.server
  2. Uvicorn import
uvicorn maybankforme.api:app --host 0.0.0.0 --port 8000

Then access the API locally at http://localhost:8000 (interactive documentation at http://localhost:8000/docs).

API Endpoints

  • GET / - API information
  • GET /health - Health check
  • GET /docs - Interactive API documentation (Swagger UI)
  • POST /process - Upload PDF files and get CSV response

Using the API

Upload one or more PDF files to process:

# Using curl
curl -X POST "http://localhost:8000/process" \
  -F "files=@statement1.pdf" \
  -F "files=@statement2.pdf" \
  -F "password=your-password" \
  -o transactions.csv

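# Using Python requests
import requests

data = {'password': 'your-password'}

# Open the statements in a with-block so the file handles are closed after the upload
with open('statement1.pdf', 'rb') as f1, open('statement2.pdf', 'rb') as f2:
    files = [('files', f1), ('files', f2)]
    response = requests.post('http://localhost:8000/process', files=files, data=data)

# Stop on an HTTP error instead of saving the error body as CSV
response.raise_for_status()
with open('transactions.csv', 'wb') as f:
    f.write(response.content)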

See docs/api_example.py for more examples.
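
Because the response body is CSV, it can also be loaded into Python directly instead of being written to disk first. The helper below is a minimal sketch; the name parse_csv_response is hypothetical, and the column names depend on whatever header row the service returns.

```python
import csv
import io


def parse_csv_response(content: bytes, encoding: str = "utf-8") -> list[dict[str, str]]:
    """Decode a CSV response body and return one dict per row, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(content.decode(encoding), newline=""))
    return list(reader)
```

With the requests example above, this would be called as parse_csv_response(response.content).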

Features

  • ✅ Upload single or multiple PDF files
  • ✅ Password-protected PDF support
  • ✅ Automatic date processing (handles year boundaries)
  • ✅ Returns sorted CSV with all transactions
  • ✅ File size validation (10MB max per file)
  • ✅ Comprehensive error handling
  • ✅ Health check endpoint for monitoring
  • ✅ Structured JSON logging for containers
  • ✅ Runtime configurable log levels
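
To illustrate what "handles year boundaries" can mean in practice: statement lines usually carry only a day and month, so a January statement that lists December purchases needs the previous year attached to them. The sketch below shows one way to do this. The function name add_year, and the assumption that the statement date is known and that every transaction falls within the 12 months before it, are illustrative and not taken from the project's code.

```python
from datetime import date


def add_year(day: int, month: int, statement_date: date) -> date:
    """Attach a year to a day/month pair taken from a statement.

    Assumes every transaction falls within the 12 months before the
    statement date, so a month later in the calendar than the statement
    month must belong to the previous year.
    """
    year = statement_date.year
    if month > statement_date.month:
        year -= 1
    return date(year, month, day)
```

For a statement dated 15 January 2025, a 27/12 transaction resolves to 2024-12-27 while a 03/01 transaction stays in 2025.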

Logging

This application uses structlog for structured logging with automatic JSON formatting in containers.

Logging quick start

Development Mode (Human-readable console output):

export LOG_LEVEL=INFO
uvicorn maybankforme.api:app --reload

Production/Container Mode (JSON output):

export LOG_FORMAT=json
export LOG_LEVEL=INFO
uvicorn maybankforme.api:app

Configuration

Control logging behavior with environment variables:

| Variable | Values | Default | Description |
|---|---|---|---|
| LOG_LEVEL | DEBUG, INFO, WARNING, ERROR | INFO | Log verbosity level |
| LOG_FORMAT | json, console | Auto-detect | Output format |
| IN_CONTAINER | true, false | Auto-detect | Force container mode |
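
The variables listed above could be resolved along the following lines. This is a sketch under stated assumptions, not the project's implementation: the function name resolve_logging_config, the fallback to INFO for unknown levels, and the /.dockerenv check used for auto-detection are all hypothetical.

```python
import os
from collections.abc import Mapping

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def resolve_logging_config(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (level, format) derived from LOG_LEVEL, LOG_FORMAT and IN_CONTAINER."""
    level = env.get("LOG_LEVEL", "INFO").upper()
    if level not in VALID_LEVELS:
        level = "INFO"  # assumption: unknown values fall back to the default

    log_format = env.get("LOG_FORMAT", "").lower()
    if log_format not in ("json", "console"):
        # Auto-detect: container mode implies JSON output, otherwise console.
        in_container = env.get("IN_CONTAINER", "").lower()
        if in_container in ("true", "false"):
            containerized = in_container == "true"
        else:
            # Hypothetical heuristic; the real detection logic may differ.
            containerized = os.path.exists("/.dockerenv")
        log_format = "json" if containerized else "console"
    return level, log_format
```

Passing the environment as a mapping keeps the function easy to test without modifying os.environ.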

Example Output

JSON Format (Containers):

{
  "event": "processing_started",
  "file_count": 3,
  "logger": "maybankforme.api",
  "level": "info",
  "timestamp": "2025-10-25T20:30:45.123Z",
  "func_name": "process_statements"
}

Console Format (Development):

2025-10-25T20:30:45.123Z [info] processing_started [maybankforme.api] file_count=3
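
The two formats carry the same event data and differ only in rendering. The sketch below reproduces both layouts with the standard library to make the relationship concrete. It is not structlog's implementation, and the function names render_json and render_console are hypothetical.

```python
import json


def render_json(event: dict) -> str:
    """Serialize the whole event dict as a single-line JSON object."""
    return json.dumps(event)


def render_console(event: dict) -> str:
    """Format an event as 'timestamp [level] event [logger] key=value ...'."""
    fields = dict(event)  # copy so the caller's dict is not modified
    head = "{timestamp} [{level}] {event} [{logger}]".format(
        timestamp=fields.pop("timestamp"),
        level=fields.pop("level"),
        event=fields.pop("event"),
        logger=fields.pop("logger"),
    )
    # Any remaining keys are appended as key=value pairs.
    extras = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"{head} {extras}".rstrip()
```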

📖 See docs/LOGGING.md for the complete logging documentation: a detailed guide with diagrams, best practices, and troubleshooting.

Architecture

Processing Pipeline

graph LR
    A[PDF Files] --> B[Upload API]
    B --> C[PDF to Text]
    C --> D[Extract Transactions]
    D --> E[Add Year Info]
    E --> F[Sort by Date]
    F --> G[Generate CSV]
    G --> H[Return to Client]
    
    style B fill:#e3f2fd
    style C fill:#fff9c4
    style D fill:#fff9c4
    style E fill:#c8e6c9
    style F fill:#c8e6c9
    style G fill:#c8e6c9
    style H fill:#e3f2fd
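
The last stages of the pipeline (sort by date, generate CSV) can be sketched with the standard library as follows. The function name to_csv, the tuple layout, and the column names are assumptions for illustration; the CSV the service actually returns may use different columns.

```python
import csv
import io
from datetime import date


def to_csv(transactions: list[tuple[date, str, str]]) -> str:
    """Sort (date, description, amount) rows by date and render them as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["date", "description", "amount"])  # hypothetical column names
    for tx_date, description, amount in sorted(transactions, key=lambda tx: tx[0]):
        writer.writerow([tx_date.isoformat(), description, amount])
    return buffer.getvalue()
```

Sorting on full dates rather than day/month strings is what makes the earlier "add year" stage necessary: without a year, December rows from the previous year would sort after January rows.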

System Architecture

graph TB
    subgraph "Client Layer"
        A[Web Browser / API Client]
    end
    
    subgraph "API Layer"
        B[FastAPI Application]
        B1[POST /process]
        B2[GET /health]
        B --> B1
        B --> B2
    end
    
    subgraph "Processing Layer"
        C[PDF to Text Converter]
        D[Transaction Extractor]
        E[Date Processor]
        F[CSV Generator]
    end
    
    subgraph "Logging Layer"
        G[Structlog]
        G1[JSON Renderer]
        G2[Console Renderer]
        G --> G1
        G --> G2
    end
    
    A --> B
    B1 --> C
    C --> D
    D --> E
    E --> F
    F --> A
    
    B -.logs.-> G
    C -.logs.-> G
    D -.logs.-> G
    E -.logs.-> G
    
    style B fill:#e3f2fd
    style C fill:#fff9c4
    style D fill:#fff9c4
    style E fill:#c8e6c9
    style F fill:#c8e6c9
    style G fill:#ffccbc

Data Flow

sequenceDiagram
    participant Client
    participant API
    participant PDFConverter
    participant TxtExtractor
    participant DateProcessor
    participant Logger
    
    Client->>API: POST /process (PDF files + password)
    API->>Logger: Log request received
    API->>PDFConverter: Convert PDF to text
    PDFConverter->>Logger: Log conversion progress
    PDFConverter->>TxtExtractor: Extract transactions
    TxtExtractor->>Logger: Log extraction stats
    TxtExtractor->>DateProcessor: Add year information
    DateProcessor->>Logger: Log date processing
    DateProcessor->>API: Return processed data
    API->>Logger: Log completion
    API->>Client: Return CSV file

Development

Setup

# Clone the repository
git clone https://github.com/zhrif/maybankforme.git
cd maybankforme

# Install dependencies (using uv)
uv sync --all-extras

# Run tests
uv run pytest -q

# Run API with auto-reload (dev)
uv run uvicorn maybankforme.api:app --reload --log-level debug

# Or run the packaged server entrypoint
uv run python -m maybankforme.server

Code Quality

# Linting
uv run ruff check src/ tests/

# Type checking
uv run mypy src/

# Format code
uv run black src/ tests/

# Run all checks
uv run ruff check src/ tests/ && uv run mypy src/ && uv run pytest -q

Project Structure

maybankforme/
├── src/maybankforme/
│   ├── api.py                 # FastAPI application
│   ├── main.py                # CLI entry point
│   ├── process_transaction.py # Batch processing
│   └── common/
│       ├── utils.py           # Logging utilities
│       ├── pdf_convert_txt.py # PDF conversion
│       └── txt_convert_csv.py # CSV generation
├── tests/                     # Test suite
├── docs/                      # Documentation
│   ├── LOGGING.md            # Logging guide
│   └── api_example.py        # API examples
└── Dockerfile                # Container image

Docker

Docker quick start

# Pull and run from GitHub Container Registry
docker run -p 8000:8000 ghcr.io/zhrif/maybankforme

Then access the API at http://localhost:8000/docs

Custom Configuration

# Run with debug logging
docker run -e LOG_LEVEL=DEBUG -p 8000:8000 ghcr.io/zhrif/maybankforme

# Run with console logging (for debugging)
docker run -e LOG_FORMAT=console -p 8000:8000 ghcr.io/zhrif/maybankforme

# Run on different port
docker run -e PORT=3000 -p 3000:3000 ghcr.io/zhrif/maybankforme

Build Locally

# Build the image
docker build -t maybankforme .

# Run the container
docker run -p 8000:8000 maybankforme

# View logs (JSON format by default)
docker logs <container-id>

Docker Compose

version: '3.8'
services:
  api:
    image: ghcr.io/zhrif/maybankforme
    ports:
      - "8000:8000"
    environment:
      - LOG_LEVEL=INFO
      - LOG_FORMAT=json
    restart: unless-stopped

Legacy CLI Tool

The original CLI tool is still available:

maybankforme -h
usage: maybankforme [-h] [--password PASSWORD] [--dataset_folder DATASET_FOLDER] input_folder output_file

positional arguments:
  input_folder          Folder containing pdf files
  output_file           csv file to save transactions

options:
  -h, --help            show this help message and exit
  --password PASSWORD   Password to open pdf files
  --dataset_folder DATASET_FOLDER
                        Folder containing dataset

Example:

maybankforme /dataset/pdf /dataset/Output.csv --password=<REDACTED> --dataset_folder /dataset
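
For readers who want to script around the CLI, the help text above corresponds to an argparse definition along these lines. This is a reconstruction from the printed usage only, not the project's source; the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the usage text shown above."""
    parser = argparse.ArgumentParser(prog="maybankforme")
    parser.add_argument("input_folder", help="Folder containing pdf files")
    parser.add_argument("output_file", help="csv file to save transactions")
    parser.add_argument("--password", help="Password to open pdf files")
    parser.add_argument("--dataset_folder", help="Folder containing dataset")
    return parser
```

Both optional arguments default to None when omitted, so calling code can tell "not provided" apart from an empty string.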

Contributing

Contributions are welcome. Please review the code standards and development guidelines in AGENTS.md, and see CONTRIBUTING.md for a quick checklist.
