Extract bank transactions from Crédit Mutuel PDF statements

Crédit Mutuel PDF Extractor

A robust Python utility to extract transaction data from Crédit Mutuel bank statement PDFs, validate data integrity, and export to structured formats (JSON/CSV).

Features

  • Automated Extraction: Parses transaction dates, descriptions, and amounts from multiple accounts per PDF.
  • Balance Validation: Sums the transactions of each account and checks the result against the starting and ending balances printed on the statement.
  • Strict CLI: Explicit input file list and mandatory --output flag (with .csv or .json validation).
  • French Format Support: Handles French number formatting (e.g., 1.234,56 or 1 234,56).
  • Structured Logging: Uses the Python logging module for clean, professional output and error reporting.
  • Automation: Includes a Justfile for common tasks like run and clean.
  • Account Mapping: Support for custom account labels via YAML configuration.
  • Google Sheets Export: Direct export to a Google Spreadsheet.
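
As an illustration of the French number handling mentioned above, here is a minimal sketch of how an amount such as 1.234,56 or 1 234,56 could be normalized. The function name parse_french_amount is hypothetical and is not taken from the project. The sketch returns a Decimal to keep cents exact, whereas the project itself states that it converts amounts to floats.

```python
from decimal import Decimal


def parse_french_amount(text: str) -> Decimal:
    """Convert a French-formatted amount ('1.234,56', '1 234,56') to a Decimal.

    Hypothetical helper for illustration; not the project's actual code.
    """
    cleaned = text.strip()
    # Drop thousands separators: plain, non-breaking and narrow
    # non-breaking spaces, as well as dots.
    for separator in (" ", "\u00a0", "\u202f", "."):
        cleaned = cleaned.replace(separator, "")
    # In French notation the comma is the decimal separator.
    cleaned = cleaned.replace(",", ".")
    return Decimal(cleaned)
```

Calling float() on the cleaned string instead of Decimal() would give the float behaviour described under Technical Details.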

Installation

Ensure you have uv installed.

uv sync

Usage

Using Just (Recommended)

To process all PDFs in the data/ directory using the labels defined in config.yaml (outputs to transactions.csv):

just run

To output in JSON format:

just run json

To clean up all generated files:

just clean

Configuration

Account Mapping

You can map account numbers to custom labels by creating a config.yaml file. See config.example.yaml for a template.

account_mapping:
  21945407: "Crequi"
  21945409: "Prevost"

Note: Account numbers are matched as integers, so leading zeros are ignored.
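
The integer matching can be pictured with the short sketch below. The helper label_for_account is hypothetical, and falling back to the raw account number when no label is configured is an assumption, not documented behaviour.

```python
def label_for_account(raw_number: str, account_mapping: dict) -> str:
    """Return the configured label for an account number read from a PDF.

    Hypothetical helper; the fallback to the raw number is an assumption.
    """
    # int() drops leading zeros, so "00021945407" and "21945407"
    # resolve to the same mapping key.
    key = int(raw_number.strip())
    return account_mapping.get(key, raw_number)


# Same shape as the account_mapping section once the YAML is loaded:
# unquoted numeric keys become Python integers.
account_mapping = {21945407: "Crequi", 21945409: "Prevost"}
print(label_for_account("00021945407", account_mapping))
```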

Description Mapping

You can automatically rename transactions by adding a description_mapping section. If any key is found as a substring (case-insensitive) in the transaction description, it will be replaced by the corresponding label.

description_mapping:
  "VIR SEPA FROM": "Transfer"
  "NETFLIX": "Entertainment"
  "AMAZON": "Shopping"

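A rough sketch of the case-insensitive substring matching described above follows. The function name apply_description_mapping is hypothetical. The sketch assumes that the whole description is replaced by the label and that the first matching key wins; the project may behave differently on both points.

```python
def apply_description_mapping(description: str, description_mapping: dict) -> str:
    """Return the mapped label if any key occurs in the description.

    Hypothetical helper. Assumptions: the entire description is replaced,
    and keys are tried in the order they appear in the mapping.
    """
    lowered = description.lower()
    for key, label in description_mapping.items():
        # Compare in lower case so "netflix" matches the key "NETFLIX".
        if key.lower() in lowered:
            return label
    # No key matched: keep the original description.
    return description


description_mapping = {
    "VIR SEPA FROM": "Transfer",
    "NETFLIX": "Entertainment",
    "AMAZON": "Shopping",
}
print(apply_description_mapping("PAIEMENT CB Netflix.com", description_mapping))
```
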
Google Sheets Export

To enable Google Sheets export, add a google_sheets section to your config.yaml:

google_sheets:
  spreadsheet_id: "your-spreadsheet-id"
  sheet_name: "Transactions"
  credentials_file: "credentials.json"

Service Account Setup:

  1. Create a project in Google Cloud Console.
  2. Enable both Google Sheets API and Google Drive API.
  3. Create a Service Account (APIs & Services > Credentials > Create Credentials > Service Account).
  4. Create a JSON Key for that service account and download it.
  5. Save the key as credentials.json (or any path specified in your config.yaml).
  6. Permission: Share your Google Spreadsheet with the service account email (found in the JSON key file) and grant it Editor access. No broad IAM roles are needed when the spreadsheet is shared directly.
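
To find the address to share the spreadsheet with in step 6, the downloaded key can be read directly. The sketch below is a standalone convenience and not part of the project; it relies only on the client_email field that Google service account key files contain. The function name service_account_email is hypothetical.

```python
import json
from pathlib import Path


def service_account_email(credentials_file: str = "credentials.json") -> str:
    """Return the service account email stored in a JSON key file.

    Hypothetical helper for illustration; not part of the project.
    """
    data = json.loads(Path(credentials_file).read_text(encoding="utf-8"))
    # Service account key files store the account address under "client_email".
    return data["client_email"]
```

Printing the result of service_account_email("credentials.json") gives the address to enter in the spreadsheet's Share dialog.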

Command Line Interface

You can explicitly specify the input files and the output file, and enable Google Sheets export:

uv run main.py data/*.pdf --output results.csv --config config.yaml --gsheet --include-source-file

Requirements:

  • At least one input PDF file.
  • The --output flag is mandatory and must end in .csv or .json.
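
For readers curious how such requirements can be enforced, here is a minimal argparse sketch. It is not the project's main.py: the flag names come from the command above, while the helper names, help texts, and the reading of --include-source-file as a simple on/off switch are assumptions.

```python
import argparse
from pathlib import Path


def output_path(value: str) -> Path:
    """Accept only output paths ending in .csv or .json."""
    path = Path(value)
    if path.suffix.lower() not in {".csv", ".json"}:
        raise argparse.ArgumentTypeError("output must end in .csv or .json")
    return path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Extract transactions from bank statement PDFs."
    )
    # nargs="+" makes at least one input file mandatory.
    parser.add_argument("files", nargs="+", type=Path, help="input PDF files")
    parser.add_argument("--output", required=True, type=output_path,
                        help="destination file (.csv or .json)")
    parser.add_argument("--config", type=Path, default=None,
                        help="path to a YAML configuration file")
    parser.add_argument("--gsheet", action="store_true",
                        help="also export to Google Sheets")
    # Assumed to be a boolean switch; the README does not describe it.
    parser.add_argument("--include-source-file", action="store_true",
                        help="record the source PDF for each row")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this setup, omitting --output or passing a name such as results.txt makes argparse print a usage error and exit with a non-zero status.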

Technical Details

  • Account Identification: Uses vertical Y-coordinate mapping to associate tables with the correct account number headers.
  • Data Normalization: Amounts are cleaned and converted to standard floats.
  • Validation: If Starting Balance + Σ(Transactions) != Ending Balance, the script will report a CRITICAL error and halt execution.
  • Modular Design: Utility functions are separated into utils.py for maintainability.
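
The validation rule above can be sketched as follows. The function balances_match is hypothetical and uses Decimal so that the equality test is exact; with the floats the project reports using, a comparison rounded to two decimal places would be the safer equivalent. Halting is left to the caller here.

```python
import logging
from decimal import Decimal

logger = logging.getLogger(__name__)


def balances_match(starting: Decimal, transactions: list, ending: Decimal) -> bool:
    """Check that starting balance + sum(transactions) equals the ending balance.

    Hypothetical helper for illustration; not the project's actual code.
    """
    computed = starting + sum(transactions, Decimal("0"))
    if computed != ending:
        # Mirrors the CRITICAL report described above.
        logger.critical(
            "Balance mismatch: expected %s, computed %s", ending, computed
        )
        return False
    return True


ok = balances_match(
    Decimal("100.00"), [Decimal("-20.50"), Decimal("5.25")], Decimal("84.75")
)
print(ok)
```

A caller that wants the halting behaviour can follow a False result with raise SystemExit(1).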
