
Extract and normalize financial data from OHADA-compliant Excel financial statements (DSF, Déclaration Statistique et Fiscale)

OHADA Financial Extractor

Extract and normalize financial data from OHADA-compliant Excel financial statements.

Python 3.10+

License: MIT

Overview

This library automates financial data extraction from Excel files that follow the accounting standards of OHADA (Organisation for the Harmonization of Business Law in Africa), which are used by its 17 member states.

What It Does

  • Extracts Balance Sheets (Bilan Paysage), Income Statements (Compte de Résultat), and Cash Flow Statements (Tableau des Flux de Trésorerie)

  • Normalizes data with Gross/Amortization/Net decomposition for fixed assets

  • Consolidates multi-year financial data

  • Exports to JSON for downstream analysis
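
The Gross/Amortization/Net decomposition mentioned above follows the usual accounting identity, net = gross - amortization. The sketch below only illustrates that identity; `AssetLine` and `is_consistent` are hypothetical names introduced here, not part of the library's API.

```python
from dataclasses import dataclass


@dataclass
class AssetLine:
    """Hypothetical stand-in for one fixed-asset line (not the library's class)."""
    reference: str
    gross: float
    amort: float

    @property
    def net(self) -> float:
        # Net book value = gross value minus accumulated amortization.
        return self.gross - self.amort


def is_consistent(line: AssetLine, reported_net: float, tol: float = 0.01) -> bool:
    """Return True when a reported net value matches gross - amort within tol."""
    return abs(line.net - reported_net) <= tol


line = AssetLine(reference="AD", gross=100_000.0, amort=50_000.0)
print(line.net)  # 50000.0
```

A check of this kind is one plausible form of data validation: a reported net figure that disagrees with gross minus amortization usually points to a misread cell.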

Why It Matters

Financial institutions across OHADA zone countries spend significant time:

  • ✗ Manually retyping financial statement data

  • ✗ Restructuring Excel files line-by-line

  • ✗ Validating data integrity across years

This library eliminates those bottlenecks:

  • ✓ Automated extraction from standard Excel formats

  • ✓ Structured JSON output

  • ✓ Multi-year period aggregation

  • ✓ Data validation checks

Quick Start

Installation

pip install ohada-financial-extractor

Basic Usage

from ohada_extractor import FinancialExtractor
from ohada_extractor.formatters import OHADAJSONFormatter
import json

# Extract from Excel
extractor = FinancialExtractor()
statement = extractor.extract_from_excel('financial_statement.xlsx')

# Convert to JSON
json_output = OHADAJSONFormatter.to_json(
    statement=statement,
    indent=2
)

# Use or save
data = json.loads(json_output)
print(f"Total Assets: {data['balance_sheet']['assets'][-1]}")
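
For the "save" half of the last step, the JSON string can be written to disk with the standard library alone. In this sketch `json_output` is a hand-written stand-in for the formatter's return value, and the file name is arbitrary.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the string returned by the formatter (assumed shape).
json_output = json.dumps(
    {"balance_sheet": {"assets": [{"reference": "AD", "net": 50000.0}]}},
    indent=2,
)

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "statement.json"
    out_path.write_text(json_output, encoding="utf-8")

    # Read the file back to confirm the round trip preserves the data.
    reloaded = json.loads(out_path.read_text(encoding="utf-8"))

print(reloaded["balance_sheet"]["assets"][0]["reference"])  # AD
```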

Supported Statements

Balance Sheet Asset (Bilan Paysage)

  • 29 accounts (AD-BZ)
  • Tracks: Gross, Amortization, Net values
  • Assets split: Fixed assets, Current assets, Cash

Balance Sheet Liability (Bilan Paysage)

  • 28 accounts (CA-DZ)
  • Tracks: Net values
  • Liabilities split: Equity, Long-Term Debt, Current Liabilities, Short Term Debt

Income Statement (Compte de Résultat)

  • 42 accounts (TA-XI)
  • Revenue, expenses, tax, net income
  • Tracks: Operating, financial, and extraordinary results

Cash Flow Statement (Tableau des Flux de Trésorerie)

  • 25 accounts (ZA-ZH)
  • Operating, investing, financing activities
  • Beginning and ending cash positions

Metadata Extraction (NEW)

The extractor now automatically parses company metadata from DSF notes and headers, including:

  • Legal form
  • Fiscal regime
  • Country (headquarters)
  • Year of creation
  • Currency
  • etc.

Metadata is available immediately after extraction:

from ohada_extractor.core.extractor import FinancialExtractor
from ohada_extractor.core.metadata_extractor import CompanyMetadataExtractor

# Initialize extractor
extractor = FinancialExtractor()

# Extract data
statement = extractor.extract_from_excel("financial_statement.xlsx")

print("\n--- Building company metadata from statement ---")
statement.metadata = CompanyMetadataExtractor.extract_from_statement(statement)

metadata = statement.metadata

print(metadata.currency)
print(metadata.legal_form)
print(metadata.regime_fiscal)

Metadata is fully JSON‑serializable and can be exported:

metadata_dict = metadata.to_dict()

This enables automated KYC and regulatory reporting workflows.
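
As a rough illustration of how a `to_dict()` result can feed such a workflow, the sketch below serializes a metadata-like object to JSON. `CompanyMetadataSketch` is a hypothetical stand-in with only the three attributes printed earlier; the real metadata object may expose more fields, and the values are placeholders.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CompanyMetadataSketch:
    """Hypothetical stand-in; the real metadata object may carry more fields."""
    currency: str
    legal_form: str
    regime_fiscal: str

    def to_dict(self) -> dict:
        # dataclasses.asdict converts the instance into a plain dict.
        return asdict(self)


# Placeholder values for illustration only.
metadata = CompanyMetadataSketch(
    currency="XAF", legal_form="SA", regime_fiscal="Réel normal"
)

# ensure_ascii=False keeps accented characters readable in the output.
metadata_json = json.dumps(metadata.to_dict(), ensure_ascii=False, indent=2)
print(metadata_json)
```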

Notes Extraction (NEW)

The extractor now includes an engine for 23 OHADA Notes (Annexes) that parses structured and unstructured notes such as:

  • Fiche R2: Company identity
  • Note 31: Répartition du résultat et autres éléments sur les dernières années (allocation of profit and other items over recent years)
  • Accounting policies
  • Commitments & guarantees
  • Tax regime
  • Share capital information
  • Workforce details

Notes are extracted automatically:

from ohada_extractor import FinancialExtractor

extractor = FinancialExtractor()
statement = extractor.extract_from_excel("financial_statement.xlsx")

# Retrieve by key
note = statement.get_note("note3a")

# Retrieve by human-readable name
note = statement.get_note_by_name("IMMOBILISATION BRUTE")

Each note includes:

  • name
  • raw_value
  • preprocess_value

Notes can be exported to JSON for auditing or BI tools.
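
The sketch below shows one way the three note fields could be held, looked up, and exported. `NoteSketch` and `find_by_name` are hypothetical, the case-insensitive matching rule is an assumption, and the note contents are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class NoteSketch:
    """Hypothetical note record with the three fields listed above."""
    name: str
    raw_value: Any
    preprocess_value: Any


# Keyed like the get_note("note3a") call; the values are invented.
notes = {
    "note3a": NoteSketch(
        name="IMMOBILISATION BRUTE",
        raw_value=[["Terrains", "1 000"]],
        preprocess_value=[{"label": "Terrains", "amount": 1000.0}],
    )
}


def find_by_name(name: str) -> Optional[NoteSketch]:
    """Case-insensitive lookup by human-readable name (assumed matching rule)."""
    wanted = name.strip().lower()
    return next((n for n in notes.values() if n.name.lower() == wanted), None)


# Export every note as a plain dict so json.dumps can serialize it.
notes_json = json.dumps(
    {key: asdict(note) for key, note in notes.items()}, ensure_ascii=False
)
print(find_by_name("immobilisation brute").name)  # IMMOBILISATION BRUTE
```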

Features

  • ✅ Multi-file period aggregation (2-5 years)
  • ✅ Automatic data validation
  • ✅ JSON-serializable output
  • ✅ Account code standardization (OHADA)
  • ✅ Gross/Amort/Net decomposition for assets
  • ✅ Support for the 17 OHADA zone countries
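
To make the multi-file period aggregation more concrete, here is a minimal sketch that orders single-year extracts chronologically. `aggregate_periods` is a hypothetical helper, not the library's function; enforcing the 2-5 period bound inside it is an assumption, and the figures are invented.

```python
from datetime import date


def aggregate_periods(extracts: dict[str, dict]) -> dict:
    """Merge single-year extracts keyed by ISO period-end date (hypothetical helper)."""
    # The 2-5 bound mirrors the feature list; enforcing it here is an assumption.
    if not 2 <= len(extracts) <= 5:
        raise ValueError("expected between 2 and 5 periods")
    # Sort chronologically by parsing the ISO date keys.
    periods = sorted(extracts, key=date.fromisoformat)
    return {"periods": periods, "statements": [extracts[p] for p in periods]}


combined = aggregate_periods({
    "2024-12-31": {"AD": {"net": 55000.0}},
    "2023-12-31": {"AD": {"net": 50000.0}},
})
print(combined["periods"])  # ['2023-12-31', '2024-12-31']
```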

📊 Visualization Layer (NEW)

The library now includes a visualization module for OHADA financial statements.

Static 4×2 Overview Dashboard

Grouped, stacked, waterfall, and cashflow charts in a single figure:

from ohada_extractor.visualization import plot_overview_dashboard_clean
plot_overview_dashboard_clean(statement)

Dynamic Tabbed Dashboard

Interactive dashboard with tabs for:

  • Assets
  • Liabilities
  • Income
  • Cashflow

from ohada_extractor.visualization import plot_ohada_tabs_dynamic
plot_ohada_tabs_dynamic(statement)

Streamlit Integration

A ready‑to‑use Streamlit app is included:

streamlit run examples/example_visualization_streamlit.py

This enables instant deployment of dashboards for analysts, auditors, and credit officers.

Documentation

  • OHADA Standards: account codes and structures for the 17 member countries
  • Output Schema: JSON output format specification
  • Examples: sample extraction workflows

Example Output

{
  "extraction_metadata": {
    "periods": ["2023-12-31", "2024-12-31"],
    "statement_types": ["balance_sheet_assets", "income_statement", "cashflow", "notes"]
  },
  "balance_sheet": {
    "assets": [
      {
        "reference": "AD",
        "label": "Immobilisations incorporelles",
        "gross": 100000.0,
        "amort": 50000.0,
        "net": 50000.0,
        "gross1": 110000.0,
        "amort1": 55000.0,
        "net1": 55000.0
      }
    ]
  }
}
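
Each asset record above carries two sets of values, one unsuffixed and one with the suffix `1`. The sketch below groups the fields by suffix; `split_by_suffix` is a hypothetical helper. How suffixes map to the dates in `periods` is not stated here, so the sketch keeps them as raw suffix keys rather than guessing.

```python
import re

record = {
    "reference": "AD",
    "label": "Immobilisations incorporelles",
    "gross": 100000.0, "amort": 50000.0, "net": 50000.0,
    "gross1": 110000.0, "amort1": 55000.0, "net1": 55000.0,
}

# Matches gross/amort/net with an optional numeric suffix.
FIELD = re.compile(r"^(gross|amort|net)(\d*)$")


def split_by_suffix(rec: dict) -> dict[str, dict[str, float]]:
    """Group value fields by numeric suffix ('' for the unsuffixed set)."""
    groups: dict[str, dict[str, float]] = {}
    for key, value in rec.items():
        match = FIELD.match(key)
        if match:
            field, suffix = match.groups()
            groups.setdefault(suffix, {})[field] = value
    return groups


groups = split_by_suffix(record)
print(groups["1"]["net"])  # 55000.0
```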

📁 Examples (NEW)

The repository now includes a full examples/ directory:

examples/
    example_metadata_extraction.py
    example_notes_usage.py
    example_visualization_streamlit.py

Metadata Example

Shows how to extract and export company metadata.

Notes Example

Demonstrates how to retrieve raw and processed OHADA notes.

Visualization Example

Runs static dashboards, dynamic dashboards, and a Streamlit UI.

These examples make onboarding fast for banks, auditors, fintechs, and researchers.

Testing

python -m pytest tests/


Use Cases

  1. Credit Processing: Accelerate loan analysis for SMEs by automating financial statement data entry.

  2. Portfolio Management: Consolidate financials from multiple companies for real-time portfolio analytics.

  3. Regulatory Reporting: Standardize extraction for compliance with OHADA zone banking regulations.

  4. Financial Analytics: Feed cleaned, structured data into analytics and forecasting models.

OHADA Zone Coverage

Supported in: Benin, Burkina Faso, Cameroon, Central African Republic, Chad, Comoros, Congo (DR), Congo, Côte d'Ivoire, Equatorial Guinea, Gabon, Guinea, Guinea-Bissau, Mali, Niger, Senegal, Togo.

Contribution

Contributions welcome! Areas for expansion:

  • PDF extraction support
  • Additional statement types
  • Data validation rules repository
  • Performance optimizations

License

MIT License. See LICENSE.

Citation

If you use this library in your research or production system, please cite:

Kamguia Wabo, L. B., & Ndayou, R. V. (2026). OHADA Financial Extractor. B.K. Research & Analytics. Retrieved from https://github.com/bomyrk/ohada-financial-extractor

@software{ohada_extractor_2026,
  title={OHADA Financial Extractor},
  author={Kamguia Wabo, L. Bomyr},
  year={2026},
  url={https://github.com/bomyrk/ohada-financial-extractor}
}

Author

Kamguia Wabo, L. B.
B.K. Research & Analytics
bomyr.kamguia@bkresearchandanalytics.com


Democratizing financial data extraction for African financial institutions.
