SAP ↔ Salesforce Account Data Reconciliation Utility

Reconcile SAP and Salesforce Account master data at bulk scale (300K–400K records). Produces a 10-tab Excel workbook and an HTML dashboard with KPIs, field-level diffs, fuzzy match candidates, and a prioritised action plan.

Quick Start

# 1. Install package in editable mode (library + CLI)
pip install -e .

# 2. Place your CSV files in input/
#    input/sap_accounts.csv
#    input/sf_accounts.csv

# 3. Run using installed CLI command
reconcile-accounts --sap input/sap_accounts.csv --sf input/sf_accounts.csv

# 4. Or omit --sap/--sf to use the input paths from config/rules.yaml (input.sap / input.sf)
reconcile-accounts

# Reports are written to output/ by default (override with --output-dir or output.report.directory)

Legacy script invocation still works:

python run_reconciliation.py

Options

--sap         Path to SAP accounts CSV (optional if config input.sap is set)
--sf          Path to Salesforce accounts CSV (optional if config input.sf is set)
--config      Path to rules YAML (default: ./config/rules.yaml, falling back to the packaged default)
--output-dir  Output directory (default: from config)
--formats     excel and/or html (default: both)
--dry-run     Validate config and CSV headers only; no report is written
--no-fuzzy    Skip fuzzy matching (faster for large files)
--verbose     Verbose logging
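As an illustration of what a header-only check such as --dry-run can involve, here is a minimal sketch that reads only the first row of a CSV and reports which required columns are missing. The helper name missing_columns is hypothetical (not part of the package), and the required column names are taken from the sample join config further down; the utility's real validation may check more than this.

```python
import csv
import io
from typing import Iterable, List, TextIO


def missing_columns(csv_file: TextIO, required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from the header row.

    Hypothetical helper: only the first row is read, so the cost does not
    grow with the number of data rows in the file.
    """
    header = next(csv.reader(csv_file), [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return [col for col in required if col not in present]


# In-memory stand-ins for input/sap_accounts.csv and input/sf_accounts.csv
sap_csv = io.StringIO("SAP_Unique_ID,Name\n1001,Acme\n")
sf_csv = io.StringIO("Id,Name\n001A,Acme\n")

print(missing_columns(sap_csv, ["SAP_Unique_ID"]))               # → []
print(missing_columns(sf_csv, ["BP_PowerCerv_Account_Id__c"]))  # → ['BP_PowerCerv_Account_Id__c']
```

With real files, opening them as open(path, newline="", encoding="utf-8-sig") keeps a leading byte-order mark (common in spreadsheet exports) from being glued onto the first column name.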

Path resolution precedence:

  • If --sap / --sf are passed, CLI values are used.
  • If not passed, values are resolved from config/rules.yaml under input.sap and input.sf.
  • If neither CLI nor config provides paths, the run exits with an input path error.
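The precedence rules above amount to a three-step lookup. The sketch below is a hypothetical resolve_input_path helper (not the package's actual function): it prefers the CLI value, then builds directory/file_name from the input.<system> entry of the config (assumed to be the dict you get after parsing rules.yaml), and otherwise raises a hypothetical InputPathError.

```python
from pathlib import Path
from typing import Optional


class InputPathError(Exception):
    """Hypothetical error raised when no input path can be resolved."""


def resolve_input_path(cli_value: Optional[str], config: dict, system: str) -> Path:
    """Resolve the CSV path for `system` ("sap" or "sf").

    1. A value passed on the command line (--sap / --sf) wins.
    2. Otherwise use input.<system>.directory + input.<system>.file_name.
    3. Otherwise fail with an input path error.
    """
    if cli_value:
        return Path(cli_value)
    entry = (config.get("input") or {}).get(system) or {}
    directory = entry.get("directory")
    file_name = entry.get("file_name")
    if directory and file_name:
        return Path(directory) / file_name
    raise InputPathError(
        f"no input path for {system!r}: pass --{system} or set input.{system} in the config"
    )


# Config dict shaped like config/rules.yaml after YAML parsing
config = {"input": {"sap": {"directory": "input", "file_name": "sap_accounts.csv"}}}

print(resolve_input_path("data/sap.csv", config, "sap").as_posix())  # → data/sap.csv
print(resolve_input_path(None, config, "sap").as_posix())            # → input/sap_accounts.csv
```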

Configuration

Edit config/rules.yaml to change:

  • Default input files via input.sap and input.sf (directory + file_name)
  • Join key columns (SAP ↔ SF linking fields)
  • Fallback-key matching toggle via join.fallback.enabled (default: false, i.e. primary-key-only matching)
  • Field comparison rules, severity levels, and normalize modes
  • Deduplication strategy (keep_first / keep_last / flag_all)
  • Fuzzy match threshold and fields
  • Output formats and directory
  • Output report location/name via output.report.directory + output.report.file_name
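To show what the three deduplication strategy names could mean in practice, here is a minimal sketch that splits rows into kept rows and reported duplicates by key. The semantics are assumptions inferred from the names, not confirmed behaviour of the utility, and deduplicate is a hypothetical helper. The duplicates list plays the role of the SAP_Duplicates / SF_Duplicates tabs ("duplicate rows before dedup").

```python
from collections import defaultdict


def deduplicate(rows, key, strategy="keep_first"):
    """Split rows into (kept, duplicates) by the value of `key`.

    Assumed semantics (hypothetical, not confirmed by the utility):
      keep_first: keep the first row per key, report the later ones
      keep_last:  keep the last row per key, report the earlier ones
      flag_all:   keep every row, report all rows whose key repeats
    Kept rows come out grouped by key, in order of each key's first appearance.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    kept, duplicates = [], []
    for group in groups.values():
        if strategy == "keep_first":
            kept.append(group[0])
            duplicates.extend(group[1:])
        elif strategy == "keep_last":
            kept.append(group[-1])
            duplicates.extend(group[:-1])
        elif strategy == "flag_all":
            kept.extend(group)
            if len(group) > 1:
                duplicates.extend(group)
        else:
            raise ValueError(f"unknown dedup strategy: {strategy!r}")
    return kept, duplicates


rows = [
    {"SAP_Unique_ID": "1", "Name": "Acme"},
    {"SAP_Unique_ID": "2", "Name": "Beta"},
    {"SAP_Unique_ID": "1", "Name": "Acme GmbH"},
]
kept, dups = deduplicate(rows, "SAP_Unique_ID", "keep_last")
print([r["Name"] for r in kept])  # → ['Acme GmbH', 'Beta']
print([r["Name"] for r in dups])  # → ['Acme']
```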

Config Reference (Input, Join, Output)

input:
  sap:
    directory: "input"
    file_name: "sap_accounts.csv"
  sf:
    directory: "input"
    file_name: "sf_accounts.csv"

join:
  primary:
    sap_col: "SAP_Unique_ID"
    sf_col:  "BP_PowerCerv_Account_Id__c"
  fallback:
    enabled: false
    sap_col: "SAP_Unique_ID"
    sf_col:  "WC_SAP_Identification__c"

output:
  formats: ["excel", "html"]
  report:
    directory: "output"
    file_name: "reconciliation_report"

Notes:

  • Set join.fallback.enabled: false for strict primary-key-only matching (default).
  • Set join.fallback.enabled: true only when you explicitly want fallback-key matching.
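To make the primary/fallback behaviour concrete, here is a minimal pure-Python sketch of a two-pass key join. It is an illustration under stated assumptions, not the utility's implementation: match_accounts is a hypothetical name, rows are dicts as produced by csv.DictReader, keys are compared after stripping whitespace, blank keys never match, and the fallback pass only sees rows the primary pass left unmatched. The unmatched rows returned for each side roughly correspond to what the SAP_Only and SF_Only tabs report.

```python
def match_accounts(sap_rows, sf_rows, primary, fallback=None):
    """Link SAP rows to SF rows by key; return (matched, sap_only, sf_only).

    `primary` and `fallback` are (sap_col, sf_col) pairs. The fallback pair
    is tried only when given (mirroring join.fallback.enabled: true), and
    only against rows the primary pass left unmatched on both sides.
    """
    matched = []
    sap_left, sf_left = list(sap_rows), list(sf_rows)
    key_pairs = [primary] + ([fallback] if fallback else [])

    for sap_col, sf_col in key_pairs:
        # Index the remaining SF rows by this pass's key (first row wins).
        sf_by_key = {}
        for pos, row in enumerate(sf_left):
            key = str(row.get(sf_col) or "").strip()
            if key:  # blank keys never match
                sf_by_key.setdefault(key, pos)

        taken = set()  # positions in sf_left consumed during this pass
        unmatched_sap = []
        for row in sap_left:
            key = str(row.get(sap_col) or "").strip()
            pos = sf_by_key.get(key) if key else None
            if pos is not None and pos not in taken:
                matched.append((row, sf_left[pos]))
                taken.add(pos)
            else:
                unmatched_sap.append(row)

        sap_left = unmatched_sap
        sf_left = [row for pos, row in enumerate(sf_left) if pos not in taken]

    return matched, sap_left, sf_left


primary = ("SAP_Unique_ID", "BP_PowerCerv_Account_Id__c")
fallback = ("SAP_Unique_ID", "WC_SAP_Identification__c")
sap = [{"SAP_Unique_ID": "1001"}, {"SAP_Unique_ID": "1002"}, {"SAP_Unique_ID": "1003"}]
sf = [
    {"BP_PowerCerv_Account_Id__c": "1001", "WC_SAP_Identification__c": ""},
    {"BP_PowerCerv_Account_Id__c": "", "WC_SAP_Identification__c": "1002"},
    {"BP_PowerCerv_Account_Id__c": "9999", "WC_SAP_Identification__c": ""},
]

m, s, f = match_accounts(sap, sf, primary)            # strict (default)
print(len(m), len(s), len(f))                          # → 1 2 2
m, s, f = match_accounts(sap, sf, primary, fallback)  # fallback enabled
print(len(m), len(s), len(f))                          # → 2 1 1
```

In this toy data, SAP 1002 is linked only through WC_SAP_Identification__c, so it moves from the unmatched side to the matched side once the fallback pass is enabled.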

Report Tabs

Tab                     Content
Summary                 KPI counts, match rate, exception rate
Exact_Matches           Records found in both systems
Field_Mismatches        Field-level diffs (CRITICAL / HIGH / INFO)
SAP_Only                SAP records missing from Salesforce
SF_Only                 Salesforce records missing from SAP
SAP_Duplicates          Duplicate SAP rows before dedup
SF_Duplicates           Duplicate SF rows before dedup
Fuzzy_Match_Candidates  Likely-same accounts not linked by ID
Data_Quality_Issues     Null IDs, bad formats, validation failures
Action_Plan             P1–P4 prioritised remediation table
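The Fuzzy_Match_Candidates tab lists likely-same accounts that the ID join did not link. The similarity measure the utility actually uses is not documented here, so this sketch swaps in the standard library's difflib.SequenceMatcher on normalised account names. The helper names, the "Name" field, and the 0.85 threshold are all hypothetical stand-ins for the configured fuzzy fields and threshold. Because the naive search compares every SAP-only row with every SF-only row, its cost grows with the product of the two list sizes, which illustrates why --no-fuzzy speeds up large runs.

```python
import difflib
import re


def normalise_name(name: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def fuzzy_candidates(sap_only, sf_only, field="Name", threshold=0.85):
    """Return (sap_row, sf_row, score) tuples whose names score >= threshold.

    Naive all-pairs search: the work grows with len(sap_only) * len(sf_only).
    """
    candidates = []
    for sap_row in sap_only:
        a = normalise_name(sap_row.get(field) or "")
        if not a:
            continue
        for sf_row in sf_only:
            b = normalise_name(sf_row.get(field) or "")
            if not b:
                continue
            score = difflib.SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                candidates.append((sap_row, sf_row, round(score, 3)))
    candidates.sort(key=lambda c: c[2], reverse=True)  # best candidates first
    return candidates


sap_only = [{"Name": "Acme Corp."}]
sf_only = [{"Name": "ACME Corp"}, {"Name": "Globex"}]
print([(s["Name"], f["Name"], score) for s, f, score in fuzzy_candidates(sap_only, sf_only)])
# → [('Acme Corp.', 'ACME Corp', 1.0)]
```

Normalising before scoring is what lets "Acme Corp." and "ACME Corp" score as identical; a production matcher would typically also add blocking (for example, comparing only within the same country or postcode) to avoid the all-pairs cost.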

Run Tests

pip install pytest
python -m pytest tests/ -v

Distribution (Business Rollout)

# Build wheel + source distribution (requires the build package: pip install build)
python -m build

# Install locally from wheel
pip install dist/phani_data_recon-1.0.1-py3-none-any.whl

If reconcile-accounts is not on PATH, run:

python -m phani_data_recon.cli --dry-run

Project Structure

reconciliation_project/
├── input/           ← Place source CSVs here
├── config/          ← rules.yaml + schema
├── src/             ← All Python modules
├── templates/       ← Jinja2 HTML template
├── tests/           ← pytest test suite
├── output/          ← Reports generated here
└── run_reconciliation.py

Download files

Source Distribution

phani_data_recon-1.0.1.tar.gz (33.3 kB)

Built Distribution

phani_data_recon-1.0.1-py3-none-any.whl (31.9 kB)

File details

Details for the file phani_data_recon-1.0.1.tar.gz.

File metadata

  • Download URL: phani_data_recon-1.0.1.tar.gz
  • Size: 33.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for phani_data_recon-1.0.1.tar.gz
Algorithm Hash digest
SHA256 680542a34ba6fdacfe3bbaee7f7e5f2a4f039463b496343453e86c883132ed12
MD5 dc6ff47162a000693ab7472f855c4b84
BLAKE2b-256 56262030e00885e5828bb0133b75a6d534a25df50d2ce0e5e83a721bd1648efd

File details

Details for the file phani_data_recon-1.0.1-py3-none-any.whl.

File hashes

Hashes for phani_data_recon-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 06c04ac3095369be5217a6de2ab4106480693416bea760372acc9dfa887dabc7
MD5 e047504274e5e1e3d0c1b1b893d1fb8f
BLAKE2b-256 fc239123787ab243bd206b6e6b592c8054ad7fc42fa965e6277fdb3a823c891b
