This project converts Maybank credit card statement PDF files into a single CSV file that other workflows can ingest.
maybankforme
This project converts Maybank credit card statement PDF files to CSV format via a FastAPI web service and an optional CLI.
Overview
The FastAPI web service accepts password-protected credit card statement PDF files, extracts their text, picks out the lines that match the transaction pattern, and returns them as a CSV file.
FastAPI Service
Running Locally
Start the server in one of these ways:
- Direct server entrypoint (recommended)
python -m maybankforme.server
- Uvicorn import
uvicorn maybankforme.api:app --host 0.0.0.0 --port 8000
Then access locally:
- API documentation: http://localhost:8000/docs
- Root endpoint: http://localhost:8000/
- Health check: http://localhost:8000/health
API Endpoints
- GET / - API information
- GET /health - Health check
- GET /docs - Interactive API documentation (Swagger UI)
- POST /process - Upload PDF files and get CSV response
Using the API
Upload one or more PDF files to process:
# Using curl
curl -X POST "http://localhost:8000/process" \
  -F "files=@statement1.pdf" \
  -F "files=@statement2.pdf" \
  -F "password=your-password" \
  -o transactions.csv
# Using Python requests
import requests

data = {'password': 'your-password'}
# Open the statements in binary mode; the with block closes them after the upload
with open('statement1.pdf', 'rb') as f1, open('statement2.pdf', 'rb') as f2:
    files = [('files', f1), ('files', f2)]
    response = requests.post('http://localhost:8000/process', files=files, data=data)
response.raise_for_status()  # stop here if the service returned an error
# Save the returned CSV to disk
with open('transactions.csv', 'wb') as f:
    f.write(response.content)
See docs/api_example.py for more examples.
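Once the CSV has been returned, it can be loaded with the standard library. The following is a minimal sketch, not part of the project: `parse_csv_response` is a hypothetical helper, and the sample header and columns are assumptions, since the actual column names and order are not documented here.

```python
import csv
import io


def parse_csv_response(content: bytes, encoding: str = "utf-8") -> list[list[str]]:
    """Decode CSV bytes (for example response.content) into a list of rows."""
    text = content.decode(encoding)
    # newline="" leaves line-ending handling to the csv module
    reader = csv.reader(io.StringIO(text, newline=""))
    return [row for row in reader if row]  # skip blank lines


# Hypothetical sample; the real column names and order may differ.
sample = b"date,description,amount\r\n2025-01-03,GROCER,45.10\r\n"
rows = parse_csv_response(sample)
header, records = rows[0], rows[1:]
print(header, len(records))
```

In a real client, `sample` would be replaced by `response.content` from the POST /process call.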
Features
- ✅ Upload single or multiple PDF files
- ✅ Password-protected PDF support
- ✅ Automatic date processing (handles year boundaries)
- ✅ Returns sorted CSV with all transactions
- ✅ File size validation (10MB max per file)
- ✅ Comprehensive error handling
- ✅ Health check endpoint for monitoring
- ✅ Structured JSON logging for containers
- ✅ Runtime configurable log levels
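To illustrate what handling year boundaries can involve, here is a minimal sketch of one possible rule. It is an assumption, not the project's actual logic: it supposes statement lines carry only a day and month, and that a transaction never postdates its statement. The function name `infer_transaction_date` is hypothetical.

```python
from datetime import date


def infer_transaction_date(day: int, month: int, statement_date: date) -> date:
    """Attach a year to a day/month pair printed without one.

    Assumption: a transaction never postdates its statement, so a month
    later than the statement month must belong to the previous year.
    """
    year = statement_date.year
    if month > statement_date.month:
        year -= 1
    return date(year, month, day)


statement = date(2025, 1, 15)
print(infer_transaction_date(28, 12, statement))  # 2024-12-28
print(infer_transaction_date(3, 1, statement))    # 2025-01-03
```

With a January statement, a December transaction is assigned to the previous year, which keeps the sorted CSV in chronological order.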
Logging
This application uses structlog for structured logging with automatic JSON formatting in containers.
Logging quick start
Development Mode (Human-readable console output):
export LOG_LEVEL=INFO
uvicorn maybankforme.api:app --reload
Production/Container Mode (JSON output):
export LOG_FORMAT=json
export LOG_LEVEL=INFO
uvicorn maybankforme.api:app
Configuration
Control logging behavior with environment variables:
| Variable | Values | Default | Description |
|---|---|---|---|
| LOG_LEVEL | DEBUG, INFO, WARNING, ERROR | INFO | Log verbosity level |
| LOG_FORMAT | json, console | Auto-detect | Output format |
| IN_CONTAINER | true, false | Auto-detect | Force container mode |
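The sketch below shows one way these variables could be resolved into an effective level and format. It is illustrative only: `resolve_log_settings` is a hypothetical helper, the fallback to INFO for unknown levels is an assumption, and so is the auto-detection rule (checking for the `/.dockerenv` marker file), since the project's real detection logic is not described here.

```python
import os
from collections.abc import Mapping

VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def resolve_log_settings(env: Mapping[str, str]) -> tuple[str, str]:
    """Return (level, format) from the documented environment variables."""
    level = env.get("LOG_LEVEL", "INFO").upper()
    if level not in VALID_LEVELS:
        level = "INFO"  # assumed fallback to the documented default

    fmt = env.get("LOG_FORMAT", "").lower()
    if fmt not in {"json", "console"}:
        # Assumed auto-detection: honour IN_CONTAINER if set, otherwise
        # look for the marker file Docker creates inside containers.
        flag = env.get("IN_CONTAINER", "").lower()
        if flag in {"true", "false"}:
            in_container = flag == "true"
        else:
            in_container = os.path.exists("/.dockerenv")
        fmt = "json" if in_container else "console"
    return level, fmt


print(resolve_log_settings({"LOG_LEVEL": "debug", "LOG_FORMAT": "json"}))
print(resolve_log_settings({"IN_CONTAINER": "false"}))
```

Passing `os.environ` instead of a literal dictionary would resolve the settings of the current process.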
Example Output
JSON Format (Containers):
{
"event": "processing_started",
"file_count": 3,
"logger": "maybankforme.api",
"level": "info",
"timestamp": "2025-10-25T20:30:45.123Z",
"func_name": "process_statements"
}
Console Format (Development):
2025-10-25T20:30:45.123Z [info] processing_started [maybankforme.api] file_count=3
📖 Complete Logging Documentation (docs/LOGGING.md) - Detailed guide with diagrams, best practices, and troubleshooting.
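Because the JSON format emits one object per event, container logs can be post-processed with the standard library. This is a minimal sketch under that assumption; `parse_log_line` is a hypothetical helper, and the sample line only reuses field names from the example output above.

```python
import json
from typing import Optional


def parse_log_line(line: str) -> Optional[dict]:
    """Parse one JSON log line; return None for lines that are not JSON objects."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


lines = [
    '{"event": "processing_started", "file_count": 3, "level": "info"}',
    "plain text that is not JSON",
]
records = [r for r in map(parse_log_line, lines) if r is not None]
print([(r["event"], r["file_count"]) for r in records])
```

Lines that are not JSON, such as console-format output, are skipped rather than raising an error.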
Architecture
Processing Pipeline
graph LR
A[PDF Files] --> B[Upload API]
B --> C[PDF to Text]
C --> D[Extract Transactions]
D --> E[Add Year Info]
E --> F[Sort by Date]
F --> G[Generate CSV]
G --> H[Return to Client]
style B fill:#e3f2fd
style C fill:#fff9c4
style D fill:#fff9c4
style E fill:#c8e6c9
style F fill:#c8e6c9
style G fill:#c8e6c9
style H fill:#e3f2fd
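The extraction, sorting, and CSV stages of the pipeline above can be sketched in a few lines. Everything here is illustrative: the line layout matched by `TXN_PATTERN`, the field names, and the helper functions are assumptions, and the project's real pattern may differ. The year-handling step is left out, so sorting uses month and day only.

```python
import csv
import io
import re

# Hypothetical line layout: "DD/MM DD/MM DESCRIPTION AMOUNT[CR]".
# The real statement layout and the project's pattern may differ.
TXN_PATTERN = re.compile(
    r"^(?P<posted>\d{2}/\d{2})\s+(?P<txn>\d{2}/\d{2})\s+"
    r"(?P<description>.+?)\s+(?P<amount>[\d,]+\.\d{2})(?P<credit>CR)?$"
)


def extract_transactions(text: str) -> list[dict[str, str]]:
    """Keep only the lines that match the transaction pattern."""
    rows = []
    for line in text.splitlines():
        match = TXN_PATTERN.match(line.strip())
        if match:
            rows.append(match.groupdict(default=""))
    return rows


def to_csv(rows: list[dict[str, str]]) -> str:
    """Sort rows by transaction month and day, then render them as CSV."""
    ordered = sorted(rows, key=lambda r: (r["txn"][3:], r["txn"][:2]))
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer, fieldnames=["posted", "txn", "description", "amount", "credit"]
    )
    writer.writeheader()
    writer.writerows(ordered)
    return buffer.getvalue()


sample_text = """\
STATEMENT OF ACCOUNT
05/03 04/03 COFFEE SHOP KUALA LUMPUR 12.50
02/03 01/03 PAYMENT RECEIVED 1,000.00CR
"""
print(to_csv(extract_transactions(sample_text)))
```

Non-matching lines such as the statement title are dropped, which mirrors the idea of keeping only transaction lines from the extracted text.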
System Architecture
graph TB
subgraph "Client Layer"
A[Web Browser / API Client]
end
subgraph "API Layer"
B[FastAPI Application]
B1[POST /process]
B2[GET /health]
B --> B1
B --> B2
end
subgraph "Processing Layer"
C[PDF to Text Converter]
D[Transaction Extractor]
E[Date Processor]
F[CSV Generator]
end
subgraph "Logging Layer"
G[Structlog]
G1[JSON Renderer]
G2[Console Renderer]
G --> G1
G --> G2
end
A --> B
B1 --> C
C --> D
D --> E
E --> F
F --> A
B -.logs.-> G
C -.logs.-> G
D -.logs.-> G
E -.logs.-> G
style B fill:#e3f2fd
style C fill:#fff9c4
style D fill:#fff9c4
style E fill:#c8e6c9
style F fill:#c8e6c9
style G fill:#ffccbc
Data Flow
sequenceDiagram
participant Client
participant API
participant PDFConverter
participant TxtExtractor
participant DateProcessor
participant Logger
Client->>API: POST /process (PDF files + password)
API->>Logger: Log request received
API->>PDFConverter: Convert PDF to text
PDFConverter->>Logger: Log conversion progress
PDFConverter->>TxtExtractor: Extract transactions
TxtExtractor->>Logger: Log extraction stats
TxtExtractor->>DateProcessor: Add year information
DateProcessor->>Logger: Log date processing
DateProcessor->>API: Return processed data
API->>Logger: Log completion
API->>Client: Return CSV file
Development
Setup
# Clone the repository
git clone https://github.com/zhrif/maybankforme.git
cd maybankforme
# Install dependencies (using uv)
uv sync --all-extras
# Run tests
uv run pytest -q
# Run API with auto-reload (dev)
uv run uvicorn maybankforme.api:app --reload --log-level debug
# Or run the packaged server entrypoint
uv run python -m maybankforme.server
Code Quality
# Linting
uv run ruff check src/ tests/
# Type checking
uv run mypy src/
# Format code
uv run black src/ tests/
# Run all checks
uv run ruff check src/ tests/ && uv run mypy src/ && uv run pytest -q
Project Structure
maybankforme/
├── src/maybankforme/
│   ├── api.py                  # FastAPI application
│   ├── main.py                 # CLI entry point
│   ├── process_transaction.py  # Batch processing
│   └── common/
│       ├── utils.py            # Logging utilities
│       ├── pdf_convert_txt.py  # PDF conversion
│       └── txt_convert_csv.py  # CSV generation
├── tests/                      # Test suite
├── docs/                       # Documentation
│   ├── LOGGING.md              # Logging guide
│   └── api_example.py          # API examples
└── Dockerfile                  # Container image
Docker
Docker quick start
# Pull and run from GitHub Container Registry
docker run -p 8000:8000 ghcr.io/zhrif/maybankforme
Then access the API at http://localhost:8000/docs
Custom Configuration
# Run with debug logging
docker run -e LOG_LEVEL=DEBUG -p 8000:8000 ghcr.io/zhrif/maybankforme
# Run with console logging (for debugging)
docker run -e LOG_FORMAT=console -p 8000:8000 ghcr.io/zhrif/maybankforme
# Run on different port
docker run -e PORT=3000 -p 3000:3000 ghcr.io/zhrif/maybankforme
Build Locally
# Build the image
docker build -t maybankforme .
# Run the container
docker run -p 8000:8000 maybankforme
# View logs (JSON format by default)
docker logs <container-id>
Docker Compose
version: '3.8'
services:
  api:
    image: ghcr.io/zhrif/maybankforme
    ports:
      - "8000:8000"
    environment:
      - LOG_LEVEL=INFO
      - LOG_FORMAT=json
    restart: unless-stopped
Legacy CLI Tool
The original CLI tool is still available:
maybankforme -h
usage: maybankforme [-h] [--password PASSWORD] [--dataset_folder DATASET_FOLDER] input_folder output_file
positional arguments:
input_folder Folder containing pdf files
output_file csv file to save transactions
options:
-h, --help show this help message and exit
--password PASSWORD Password to open pdf files
--dataset_folder DATASET_FOLDER
Folder containing dataset
maybankforme /dataset/pdf /dataset/Output.csv --password=<REDACTED> --dataset_folder /dataset
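When the CLI is driven from another script, the argument list can be assembled programmatically. This is a minimal sketch that uses only the arguments shown in the usage text above; `build_cli_command` is a hypothetical helper, not part of the package.

```python
import shlex
from typing import Optional


def build_cli_command(
    input_folder: str,
    output_file: str,
    password: Optional[str] = None,
    dataset_folder: Optional[str] = None,
) -> list[str]:
    """Assemble the argument list for the maybankforme CLI."""
    cmd = ["maybankforme", input_folder, output_file]
    if password is not None:
        cmd += ["--password", password]
    if dataset_folder is not None:
        cmd += ["--dataset_folder", dataset_folder]
    return cmd


command = build_cli_command(
    "/dataset/pdf", "/dataset/Output.csv", dataset_folder="/dataset"
)
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting issues with passwords that contain special characters.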
Contributing
Contributions are welcome. Please review the code standards and development guidelines in AGENTS.md, and see CONTRIBUTING.md for a quick checklist.
File details
Details for the file maybankforme-1.9.2.tar.gz.
File metadata
- Download URL: maybankforme-1.9.2.tar.gz
- Upload date:
- Size: 167.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7b0938ec295a299a639865a00fc8abbbf61dd358195cca43fcf9f788cf77c49 |
| MD5 | 548647dc88c0a71737f86dd5ea8fbd61 |
| BLAKE2b-256 | 34c8c60b6a3d53248d4e88e2efc20becb922a2a81cf3347c6c7c498608e2c8f7 |
File details
Details for the file maybankforme-1.9.2-py3-none-any.whl.
File metadata
- Download URL: maybankforme-1.9.2-py3-none-any.whl
- Upload date:
- Size: 23.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e242abf778672d1192239cc7204fc330f228209636958f524e5a1173bcbe47f0 |
| MD5 | 517a406bbcf63ab48032d1b143bf6bd3 |
| BLAKE2b-256 | a64c5c80646ed44d4d1e15f7a3682d2406789ea0563518ab541f0f0cca25be69 |