A package for extracting JSON data from Maybank PDF account statements
maybankpdf2json
A small Python library to extract transactions and statement metadata from Maybank PDF account statements.
Table of Contents
- Overview
- Features
- Install
- Quick Start
- API
- Output Notes
- Architecture
- Project Structure
- Development
- Release
- Contributing
- License
Overview
This project reads encrypted or unencrypted Maybank statement PDFs and returns one clear output shape:
- account_number
- statement_date
- transactions
Features
- PDF parsing with password support through pdfplumber.
- Stable transaction schema: date, desc, trans, bal.
- Single API method: json() returns metadata + transactions.
- Statement amount parsing with trailing sign notation:
  - 123.45- becomes -123.45
  - 123.45+ becomes 123.45
- Consistent date format: dd/mm/yy.
Install
Requires Python 3.8 or newer.
pip install maybankpdf2json
Quick Start
from maybankpdf2json import MaybankPdf2Json

# Open the statement in binary mode and pass the file object plus the PDF password.
with open("statement.pdf", "rb") as f:
    extractor = MaybankPdf2Json(f, "your_pdf_password")
    result = extractor.json()

print(result["account_number"])
print(result["statement_date"])
print(result["transactions"][0])
If your PDF is not password-protected, pass None or omit the password:
with open("statement.pdf", "rb") as f:
    extractor = MaybankPdf2Json(f)
    print(extractor.json())
API
MaybankPdf2Json(buffer, pwd)
- json() -> dict
  - Returns:
    - account_number: statement account number when available
    - statement_date: statement date in dd/mm/yy
    - transactions: list of rows with date, desc, trans, bal
If you need pretty JSON, format it in your own project based on your preferred style/tooling.
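As a minimal sketch of one way to do that, the standard-library json module can pretty-print the returned dictionary. The result value below is a hand-written stand-in shaped like the documented output, not something produced by the library here.

```python
import json

# Stand-in for extractor.json(); values copied from the example output in this README.
result = {
    "account_number": "162021-851156",
    "statement_date": "30/09/24",
    "transactions": [
        {"date": "01/09/24", "desc": "BEGINNING BALANCE", "trans": 0, "bal": 3285.77}
    ],
}

# indent=2 adds readable nesting; ensure_ascii=False leaves non-ASCII text unescaped.
pretty = json.dumps(result, indent=2, ensure_ascii=False)
print(pretty)
```

Any other formatter works just as well, since json() returns a plain dict.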
Output Notes
- Dates use dd/mm/yy.
- Amounts support trailing sign notation from statements:
  - 123.45- -> -123.45
  - 123.45+ -> 123.45
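The trailing sign rule can be illustrated with a small standalone function. This is a sketch of the convention only, not the library's internal parser: parse_amount is a hypothetical name, and the handling of thousands separators is an assumption about how statement amounts may be printed.

```python
def parse_amount(text: str) -> float:
    """Convert an amount with an optional trailing sign into a signed float."""
    # Assumption: amounts may contain thousands separators such as "3,285.77".
    cleaned = text.strip().replace(",", "")
    sign = 1.0
    if cleaned.endswith("-"):
        # A trailing minus marks a negative amount.
        sign, cleaned = -1.0, cleaned[:-1]
    elif cleaned.endswith("+"):
        # A trailing plus is dropped; the amount stays positive.
        cleaned = cleaned[:-1]
    return sign * float(cleaned)


print(parse_amount("123.45-"))
print(parse_amount("123.45+"))
```

Floats are used here to match the numeric values shown in the output of json(); code that does arithmetic on balances may prefer decimal.Decimal.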
Example output from json():
{
  "account_number": "162021-851156",
  "statement_date": "30/09/24",
  "transactions": [
    {
      "date": "01/09/24",
      "desc": "BEGINNING BALANCE",
      "trans": 0,
      "bal": 3285.77
    }
  ]
}
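Because dates come back as dd/mm/yy strings, downstream code will often want real date objects for sorting or filtering. The sketch below shows one way to convert them with the standard library; parse_statement_date is a hypothetical helper, and result is a hand-written copy of the example output above.

```python
from datetime import date, datetime


def parse_statement_date(text: str) -> date:
    """Parse a dd/mm/yy string into a datetime.date."""
    return datetime.strptime(text, "%d/%m/%y").date()


# Stand-in for extractor.json(), copied from the example output.
result = {
    "account_number": "162021-851156",
    "statement_date": "30/09/24",
    "transactions": [
        {"date": "01/09/24", "desc": "BEGINNING BALANCE", "trans": 0, "bal": 3285.77}
    ],
}

# Build new rows with the date field converted; the original dict is left untouched.
rows = [
    {**row, "date": parse_statement_date(row["date"])}
    for row in result["transactions"]
]
print(rows[0]["date"].isoformat())
```

Note that %y maps two-digit years using Python's pivot rule, so 24 is read as 2024.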
Architecture
Processing pipeline:
graph LR
    A[PDF Buffer] --> B[read]
    B --> C[get_filtered_data]
    C --> D[get_mapped_data]
    B --> E[extract_account_and_date]
    D --> F[json transactions]
    E --> G[json metadata]
See docs/ARCHITECTURE.md for internals and parser conventions.
Project Structure
maybankpdf2json/
├── maybankpdf2json/
│ ├── __init__.py
│ ├── extractor.py
│ └── utils.py
├── tests/
│ ├── test_extractor.py
│ └── test.pdf
├── docs/
│ └── ARCHITECTURE.md
├── CHANGELOG.md
├── CONTRIBUTING.md
├── pyproject.toml
└── setup.py
Development
Install project dependencies:
make install
Run tests:
make test
Alternative test command:
pytest tests/
Current tests are fixture-based and rely on tests/test.pdf.
See CONTRIBUTING.md for development workflow and docs/ARCHITECTURE.md for parser internals.
Release
See CHANGELOG.md for release history.
Automatic PyPI publishing is configured with GitHub Actions in .github/workflows/publish.yml.
One-time setup on PyPI:
- Open the project on PyPI.
- Add a Trusted Publisher for this GitHub repository.
- Use workflow name publish.yml.
- Use environment name pypi.
Release flow:
- Move items from [Unreleased] in CHANGELOG.md into a new version section.
- Update the version in pyproject.toml and setup.py.
- Commit and push main.
- Create and push a version tag such as v0.1.53.
- GitHub Actions builds the package and publishes it to PyPI automatically.
Example:
git tag v0.1.53
git push origin v0.1.53
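Since the version has to be updated in two files before tagging, a quick consistency check can catch a mismatch early. The following is a rough sketch and not part of this repository: find_version and tag_matches are hypothetical helpers, and the check assumes both files declare the version as a literal version = "..." assignment. The file contents are inlined for illustration; in practice they would be read from disk.

```python
import re
from typing import Optional

# Matches lines such as: version = "0.1.53"  or  version="0.1.53",
VERSION_RE = re.compile(r"""^\s*version\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def find_version(text: str) -> Optional[str]:
    """Return the first literal version string found in the text, if any."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def tag_matches(tag: str, *file_texts: str) -> bool:
    """Check that every file declares the version named by the tag (without the leading v)."""
    expected = tag[1:] if tag.startswith("v") else tag
    return all(find_version(text) == expected for text in file_texts)


# Illustrative file contents, not copied from the repository.
pyproject = '[project]\nname = "maybankpdf2json"\nversion = "0.1.53"\n'
setup_py = 'from setuptools import setup\n\nsetup(\n    name="maybankpdf2json",\n    version="0.1.53",\n)\n'

print(tag_matches("v0.1.53", pyproject, setup_py))
```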
Local manual release remains available for maintainers:
make release
This builds and uploads to PyPI using Twine. Run only with valid release credentials.
Contributing
Contributions are welcome.
- Keep changes focused and small.
- Preserve the public import: from maybankpdf2json import MaybankPdf2Json.
- Add user-facing changes to [Unreleased] in CHANGELOG.md.
- Run tests before opening a pull request.
See CONTRIBUTING.md for the full checklist.
License
MIT. See LICENSE.