A package for extracting JSON data from Maybank PDF account statements

maybankpdf2json

A small Python library to extract transactions and statement metadata from Maybank PDF account statements.

Overview

This project reads encrypted or unencrypted Maybank statement PDFs and returns a single dictionary with three keys:

  • account_number
  • statement_date
  • transactions

Features

  • PDF parsing with password support through pdfplumber.
  • Stable transaction schema: date, desc, trans, bal.
  • Single API method: json() returns metadata + transactions.
  • Statement amount parsing with trailing sign notation:
    • 123.45- becomes -123.45
    • 123.45+ becomes 123.45
  • Consistent date format: dd/mm/yy.

Install

Requires Python 3.8 or newer.

pip install maybankpdf2json

Quick Start

from maybankpdf2json import MaybankPdf2Json

with open("statement.pdf", "rb") as f:
    extractor = MaybankPdf2Json(f, "your_pdf_password")
    result = extractor.json()
    print(result["account_number"])
    print(result["statement_date"])
    print(result["transactions"][0])

If your PDF is not password-protected, pass None or omit the password:

with open("statement.pdf", "rb") as f:
    extractor = MaybankPdf2Json(f)
    print(extractor.json())

API

MaybankPdf2Json(buffer, pwd)

  • json() -> dict
    • Returns:
      • account_number: statement account number when available
      • statement_date: statement date in dd/mm/yy
      • transactions: list of rows with date, desc, trans, bal
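
For readers who use type checkers, the documented shape can be written down as typed dictionaries. This is a sketch only: the names Transaction, Statement, and has_statement_shape are hypothetical and are not exported by the package, and the field types (float amounts, an optional account number) are assumptions inferred from the example output.

```python
from typing import List, Optional, TypedDict


class Transaction(TypedDict):
    date: str     # dd/mm/yy
    desc: str
    trans: float  # signed transaction amount
    bal: float    # balance after the transaction


class Statement(TypedDict):
    account_number: Optional[str]  # assumption: may be missing on some PDFs
    statement_date: str            # dd/mm/yy
    transactions: List[Transaction]


def has_statement_shape(data: dict) -> bool:
    """Check that the documented top-level and per-row keys are present."""
    top_keys = {"account_number", "statement_date", "transactions"}
    row_keys = {"date", "desc", "trans", "bal"}
    if not top_keys <= set(data):
        return False
    return all(row_keys <= set(row) for row in data["transactions"])


sample: Statement = {
    "account_number": "162021-851156",
    "statement_date": "30/09/24",
    "transactions": [
        {"date": "01/09/24", "desc": "BEGINNING BALANCE", "trans": 0, "bal": 3285.77}
    ],
}
print(has_statement_shape(sample))  # True
```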

json() returns a plain dict. If you need pretty-printed JSON, serialize it in your own project with your preferred style and tooling.
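
One possible way to do this uses only the standard library. In this sketch, result is a hand-written stand-in for the value returned by extractor.json(), so the snippet runs without a PDF.

```python
import json

# Stand-in for extractor.json(); replace with the real result in your code.
result = {
    "account_number": "162021-851156",
    "statement_date": "30/09/24",
    "transactions": [
        {"date": "01/09/24", "desc": "BEGINNING BALANCE", "trans": 0, "bal": 3285.77}
    ],
}

# indent=2 gives human-readable output; ensure_ascii=False keeps any
# non-ASCII characters in descriptions readable instead of escaping them.
pretty = json.dumps(result, indent=2, ensure_ascii=False)
print(pretty)
```

To write the output to a file instead, json.dump(result, fh, indent=2) takes the same formatting arguments.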

Output Notes

  • Dates use dd/mm/yy.
  • Amounts support trailing sign notation from statements:
    • 123.45- -> -123.45
    • 123.45+ -> 123.45
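
The trailing-sign rule can be illustrated with a few lines of Python. This is a sketch of the convention, not the library's own parser: parse_signed_amount is a hypothetical helper, and the handling of thousands separators is an assumption.

```python
def parse_signed_amount(text: str) -> float:
    """Convert a statement amount such as '123.45-' into a signed float."""
    # Assumption: amounts may contain thousands separators like '1,234.50'.
    cleaned = text.strip().replace(",", "")
    sign = 1
    if cleaned.endswith("-"):
        sign = -1
        cleaned = cleaned[:-1]
    elif cleaned.endswith("+"):
        cleaned = cleaned[:-1]
    return sign * float(cleaned)


print(parse_signed_amount("123.45-"))  # -123.45
print(parse_signed_amount("123.45+"))  # 123.45
```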

Example output from json():

{
  "account_number": "162021-851156",
  "statement_date": "30/09/24",
  "transactions": [
    {
      "date": "01/09/24",
      "desc": "BEGINNING BALANCE",
      "trans": 0,
      "bal": 3285.77
    }
  ]
}
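
Because dates come back as dd/mm/yy strings, downstream code will often want real date objects. A minimal sketch using the standard library follows; parse_statement_date is a hypothetical helper, not part of the package.

```python
from datetime import date, datetime


def parse_statement_date(value: str) -> date:
    """Parse a dd/mm/yy string as used in the json() output."""
    return datetime.strptime(value, "%d/%m/%y").date()


# Row copied from the example output above.
row = {"date": "01/09/24", "desc": "BEGINNING BALANCE", "trans": 0, "bal": 3285.77}

parsed = parse_statement_date(row["date"])
print(parsed.isoformat())  # 2024-09-01
```

Note that %y follows the usual two-digit-year convention, under which 00 to 68 map to 2000 to 2068.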

Architecture

Processing pipeline:

graph LR
  A[PDF Buffer] --> B[read]
  B --> C[get_filtered_data]
  C --> D[get_mapped_data]
  B --> E[extract_account_and_date]
  D --> F[json transactions]
  E --> G[json metadata]

See docs/ARCHITECTURE.md for internals and parser conventions.

Project Structure

maybankpdf2json/
├── maybankpdf2json/
│   ├── __init__.py
│   ├── extractor.py
│   └── utils.py
├── tests/
│   ├── test_extractor.py
│   └── test.pdf
├── docs/
│   └── ARCHITECTURE.md
├── CHANGELOG.md
├── CONTRIBUTING.md
├── pyproject.toml
└── setup.py

Development

Install project dependencies:

make install

Run tests:

make test

Alternative test command:

pytest tests/

Current tests are fixture-based and rely on tests/test.pdf.

See CONTRIBUTING.md for development workflow and docs/ARCHITECTURE.md for parser internals.

Release

See CHANGELOG.md for release history.

Automatic PyPI publishing is configured with GitHub Actions in .github/workflows/publish.yml.

One-time setup on PyPI:

  1. Open the project on PyPI.
  2. Add a Trusted Publisher for this GitHub repository.
  3. Use workflow name publish.yml.
  4. Use environment name pypi.

Release flow:

  1. Move items from [Unreleased] in CHANGELOG.md into a new version section.
  2. Update the version in pyproject.toml and setup.py.
  3. Commit and push to main.
  4. Create and push a version tag such as v0.1.53.
  5. GitHub Actions builds the package and publishes it to PyPI automatically.

Example:

git tag v0.1.53
git push origin v0.1.53

Local manual release remains available for maintainers:

make release

This builds and uploads to PyPI using Twine. Run only with valid release credentials.

Contributing

Contributions are welcome.

  • Keep changes focused and small.
  • Preserve the public import: from maybankpdf2json import MaybankPdf2Json.
  • Add user-facing changes to [Unreleased] in CHANGELOG.md.
  • Run tests before opening a pull request.

See CONTRIBUTING.md for the full checklist.

License

MIT. See LICENSE.

