Extract bank transactions from Crédit Mutuel PDF statements

Crédit Mutuel PDF Extractor

A robust Python utility to extract transaction data from Crédit Mutuel bank statement PDFs, validate data integrity, and export to structured formats (JSON/CSV) or Google Sheets.

Features

  • Automated Extraction: Parses transaction dates, descriptions, and amounts from multiple accounts per PDF.
  • Balance Validation: Sums the transactions of each account and checks that the starting balance plus this sum equals the ending balance printed on the statement.
  • Strict CLI: Explicit input file list and mandatory --output flag (with .csv or .json validation).
  • French Format Support: Handles French number formatting (e.g., 1.234,56 or 1 234,56).
  • Structured Logging: Uses the Python logging module for clean, professional output and error reporting.
  • Automation: Includes a Justfile for common tasks like run and clean.
  • Account Mapping: Support for custom account labels via YAML configuration.
  • Google Sheets Export: Direct export to a Google Spreadsheet.
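
The French number formats listed above can be normalized with a few string replacements. The following is a minimal sketch, not the package's actual code: the function name parse_french_amount is hypothetical, and it assumes that the thousands separator is a dot or a (possibly non-breaking) space and that the decimal separator is a comma.

```python
def parse_french_amount(text: str) -> float:
    """Convert a French-formatted amount such as '1.234,56' to a float.

    Hypothetical helper for illustration; not part of the package API.
    """
    cleaned = text.strip()
    # Drop thousands separators: dots, plain spaces, and the no-break
    # spaces (U+00A0, U+202F) that PDF text extraction may produce.
    for separator in (".", " ", "\u00a0", "\u202f"):
        cleaned = cleaned.replace(separator, "")
    # The French decimal separator is a comma.
    cleaned = cleaned.replace(",", ".")
    return float(cleaned)


print(parse_french_amount("1.234,56"))
print(parse_french_amount("1 234,56"))
```

Note that under these assumptions a string such as "1.234" is read as one thousand two hundred thirty-four, since the dot is always treated as a thousands separator.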

Installation

You can install the extractor directly from PyPI:

pip install credit_mutuel_pdf_extractor

Or using uv:

uv tool install credit_mutuel_pdf_extractor

Usage

Global Command

Once installed, you can use the cmut_process_pdf command from anywhere:

cmut_process_pdf data/*.pdf --output results.csv --config config.yaml

Using Just (Development)

If you have the source code checked out and the just command runner installed:

To process all PDFs in the data/ directory using the labels defined in config.yaml (outputs to transactions.csv):

just run

To output in JSON format:

just run json

To clean up all generated files:

just clean

Configuration

Account Mapping

You can map account numbers to custom labels by creating a config.yaml file. See config.example.yaml for a template.

account_mapping:
  21945407: "Crequi"
  21945409: "Prevost"

Note: Account numbers are matched as integers (leading zeros are ignored).
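
As an illustration of integer matching, the lookup could work roughly as follows. This is a sketch under assumptions: label_for_account is a hypothetical name, and falling back to the raw account number when no label is configured is a guess, not documented behavior.

```python
def label_for_account(account_number: str, account_mapping: dict) -> str:
    """Return the configured label for an account number read from a PDF.

    Hypothetical helper; the fallback to the raw number is an assumption.
    """
    # int() discards leading zeros, so "00021945407" matches key 21945407.
    key = int(account_number)
    # Normalize the keys as well, in case the YAML file quoted them.
    normalized = {int(k): label for k, label in account_mapping.items()}
    return normalized.get(key, account_number)


mapping = {21945407: "Crequi", 21945409: "Prevost"}
print(label_for_account("00021945407", mapping))
```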

Description Mapping

You can automatically rename transactions by adding a description_mapping section. If any key occurs as a case-insensitive substring of a transaction description, the description is replaced by the corresponding label.

description_mapping:
  "VIR SEPA FROM": "Transfer"
  "NETFLIX": "Entertainment"
  "AMAZON": "Shopping"

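The matching rule described above can be sketched in a few lines. The function name map_description is hypothetical, and two details are assumptions rather than documented behavior: the whole description is replaced by the label, and the first matching key (in configuration order) wins.

```python
def map_description(description: str, description_mapping: dict) -> str:
    """Replace a description by the label of the first key it contains.

    Hypothetical helper; "first match wins" is an assumption.
    """
    lowered = description.lower()
    for key, label in description_mapping.items():
        # Case-insensitive substring test.
        if key.lower() in lowered:
            return label
    # No key matched: keep the original description.
    return description


rules = {"VIR SEPA FROM": "Transfer", "NETFLIX": "Entertainment"}
print(map_description("Paiement CB Netflix.com", rules))
```
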
Google Sheets Export

To enable Google Sheets export, add a google_sheets section to your config.yaml:

google_sheets:
  spreadsheet_id: "your-spreadsheet-id"
  sheet_name: "Transactions"
  credentials_file: "credentials.json"

Service Account Setup:

  1. Create a project in Google Cloud Console.
  2. Enable both Google Sheets API and Google Drive API.
  3. Create a Service Account (APIs & Services > Credentials > Create Credentials > Service Account).
  4. Create a JSON Key for that service account and download it.
  5. Save the key as credentials.json (or any path specified in your config.yaml).
  6. Permission: Share your Google Spreadsheet with the service account email (found in the JSON) with Editor access. (No broad IAM roles are needed if shared directly).
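
For step 6, the service account email can be read from the downloaded key file rather than copied by hand. A minimal sketch, assuming the standard service account key layout, in which the address is stored under the client_email field; the function name service_account_email is hypothetical.

```python
import json
from pathlib import Path


def service_account_email(credentials_file: str) -> str:
    """Return the email address stored in a service account JSON key."""
    with Path(credentials_file).open(encoding="utf-8") as handle:
        key = json.load(handle)
    # Service account keys store the address under "client_email".
    return key["client_email"]
```

Calling service_account_email("credentials.json") returns the address to enter in the spreadsheet's Share dialog.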

Command Line Interface

You can explicitly specify files, the output format, and enable Google Sheets export:

uv run credit-mutuel-extractor data/*.pdf --output results.csv --config config.yaml --gsheet --include-source-file

Requirements:

  • At least one input PDF file.
  • The --output flag is mandatory and must end in .csv or .json.
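
These requirements map naturally onto argparse. The sketch below is not the package's actual parser: build_parser and output_path are hypothetical names, the flags are the ones shown in the command above, and the help text for --include-source-file is a guess at its purpose.

```python
import argparse
from pathlib import Path


def output_path(value: str) -> Path:
    """Accept only output paths ending in .csv or .json."""
    path = Path(value)
    if path.suffix.lower() not in {".csv", ".json"}:
        raise argparse.ArgumentTypeError("--output must end in .csv or .json")
    return path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Extract transactions from PDFs")
    # nargs="+" enforces at least one input file.
    parser.add_argument("files", nargs="+", type=Path, help="input PDF files")
    parser.add_argument("--output", required=True, type=output_path)
    parser.add_argument("--config", type=Path, help="YAML configuration file")
    parser.add_argument("--gsheet", action="store_true",
                        help="also export to Google Sheets")
    parser.add_argument("--include-source-file", action="store_true",
                        help="assumed: record the source PDF for each row")
    return parser


args = build_parser().parse_args(["data/a.pdf", "--output", "results.csv"])
print(args.output, args.gsheet)
```

With this setup, argparse itself rejects a missing --output, an empty file list, or an unsupported extension, and exits with a usage message.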

Technical Details

  • Account Identification: Uses vertical Y-coordinate mapping to associate tables with the correct account number headers.
  • Data Normalization: Amounts are cleaned and converted to standard floats.
  • Validation: If Starting Balance + Σ(Transactions) != Ending Balance, the script will report a CRITICAL error and halt execution.
  • Modular Design: Utility functions are separated into utils.py for maintainability.
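
The account identification and validation steps above can be sketched as follows. This is an illustration under assumptions, not the package's code: both function names are hypothetical, Y coordinates are assumed to increase down the page, each table is assumed to belong to the nearest account header above it, and amounts are compared after rounding to cents to absorb float error.

```python
import logging
from bisect import bisect_right

logger = logging.getLogger(__name__)


def assign_tables_to_accounts(headers, tables):
    """Map each table to the nearest account header located above it.

    headers: list of (y_top, account_number)
    tables:  list of (y_top, table_id)
    """
    ordered = sorted(headers)
    header_ys = [y for y, _ in ordered]
    assignment = {}
    for y_top, table_id in tables:
        # Index of the last header whose Y position is at or above the table.
        index = bisect_right(header_ys, y_top) - 1
        assignment[table_id] = ordered[index][1] if index >= 0 else None
    return assignment


def validate_balance(start, transactions, end):
    """Log a CRITICAL error and stop if the balances do not reconcile."""
    expected = round(start + sum(transactions), 2)
    if expected != round(end, 2):
        logger.critical("Balance mismatch: expected %.2f, statement says %.2f",
                        expected, end)
        raise SystemExit(1)


print(assign_tables_to_accounts([(100, "21945407"), (400, "21945409")],
                                [(150, "t1"), (450, "t2")]))
validate_balance(100.0, [-20.5, 10.25], 89.75)  # reconciles, so no error
```

Using decimal.Decimal instead of floats would avoid the rounding step altogether; floats are used here only because the description above says amounts are converted to floats.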

Security & Publishing

Secret Leak Prevention

This project uses pre-commit and detect-secrets to prevent accidental commits of sensitive data. Before committing, the hooks will scan for potential secrets.

Publishing to PyPI

Publishing is automated via the Justfile and integrated with 1Password for security.

  1. Store your PyPI Token: Create a "Login" or "Password" item in 1Password.
  2. Add Environment Variable: Add a field named UV_PUBLISH_TOKEN containing your PyPI API token.
  3. Publish:
    just publish
    
    This uses op run to securely inject the token into the uv publish command without it ever being stored in plain text or history.
