Extract and normalize financial data from OHADA-compliant Excel financial statements (DSF)

OHADA Financial Extractor

Extract and normalize financial data from OHADA-compliant Excel financial statements.

Python 3.10+

License: MIT

Overview

This library automates financial data extraction from Excel files that follow the accounting standards of OHADA (Organization for the Harmonization of Business Law in Africa), which are used by its 17 member states.

What It Does

  • Extracts Balance Sheets (Bilan Paysage), Income Statements (Compte de Résultat), and Cash Flow Statements (Tableau des Flux de Trésorerie)

  • Normalizes data with Gross/Amortization/Net decomposition for fixed assets

  • Consolidates multi-year financial data

  • Exports to JSON for downstream analysis

Why It Matters

Financial institutions across OHADA zone countries spend significant time:

  • ✗ Manually retyping financial statement data

  • ✗ Restructuring Excel files line-by-line

  • ✗ Validating data integrity across years

This library eliminates those bottlenecks:

  • ✓ Automated extraction from standard Excel formats

  • ✓ Structured JSON output

  • ✓ Multi-year period aggregation

  • ✓ Data validation checks

Quick Start

Installation

You can install the OHADA Financial Extractor directly from PyPI:

pip install ohada-financial-extractor

Basic Usage

from ohada_extractor import FinancialExtractor
from ohada_extractor.formatters import OHADAJSONFormatter
import json

# Extract from Excel
extractor = FinancialExtractor()
statement = extractor.extract_from_excel('financial_statement.xlsx')

# Convert to JSON
json_output = OHADAJSONFormatter.to_json(
    statement=statement,
    indent=2
)

# Use or save
data = json.loads(json_output)
print(f"Total Assets: {data['balance_sheet']['assets'][-1]}")
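
To keep the output for later analysis, the JSON string can be written to disk with the standard library. The following is a minimal sketch, assuming `json_output` is a JSON string as in the snippet above. The `save_json` helper and the sample payload are illustrative and are not part of the library.

```python
import json
import tempfile
from pathlib import Path


def save_json(json_text: str, path: Path) -> Path:
    """Check that json_text parses, then write it to path as UTF-8."""
    json.loads(json_text)  # raises json.JSONDecodeError on malformed input
    path.write_text(json_text, encoding="utf-8")
    return path


# Stand-in for the string returned by OHADAJSONFormatter.to_json(...)
sample_output = json.dumps(
    {"balance_sheet": {"assets": [{"reference": "AD", "net": 50000.0}]}},
    indent=2,
)

out_path = save_json(sample_output, Path(tempfile.gettempdir()) / "statement.json")

# Reload the file to confirm the round trip
reloaded = json.loads(out_path.read_text(encoding="utf-8"))
print(reloaded["balance_sheet"]["assets"][-1]["reference"])
```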

Supported Statements

Balance Sheet Asset (Bilan Paysage)

  • 29 accounts (AD-BZ)
  • Tracks: Gross, Amortization, Net values
  • Assets split: Fixed assets, Current assets, Cash

Balance Sheet Liability (Bilan Paysage)

  • 28 accounts (CA-DZ)
  • Tracks: Net values
  • Liabilities split: Equity, Long-Term Debt, Current Liabilities, Short Term Debt

Income Statement (Compte de Résultat)

  • 42 accounts (TA-XI)
  • Revenue, expenses, tax, net income
  • Tracks: Operating, financial, and extraordinary results

Cash Flow Statement (Tableau des Flux de Trésorerie)

  • 25 accounts (ZA-ZH)
  • Operating, investing, financing activities
  • Beginning and ending cash positions
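
The reference ranges listed above can be used to route an account code to its statement. The sketch below is only an illustration: it assumes references are two uppercase letters, that the ranges are inclusive, and that alphabetical comparison matches the intended ordering. The `classify_reference` helper and the `balance_sheet_liabilities` key are hypothetical names, not part of the library.

```python
from typing import Optional

# Ranges copied from the lists above; inclusive bounds are an assumption.
STATEMENT_RANGES = {
    "balance_sheet_assets": ("AD", "BZ"),
    "balance_sheet_liabilities": ("CA", "DZ"),
    "income_statement": ("TA", "XI"),
    "cashflow": ("ZA", "ZH"),
}


def classify_reference(reference: str) -> Optional[str]:
    """Return the statement whose range contains the code, or None."""
    code = reference.strip().upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        return None
    for statement, (first, last) in STATEMENT_RANGES.items():
        # Two-letter strings compare alphabetically, letter by letter
        if first <= code <= last:
            return statement
    return None


print(classify_reference("AD"))
print(classify_reference("ta"))
print(classify_reference("ZZ"))
```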

Metadata Extraction (NEW)

The extractor now automatically parses company metadata from DSF notes and headers, including:

  • Legal form
  • Fiscal regime
  • Country (headquarters)
  • Year of creation
  • Currency
  • etc.

Metadata is available immediately after extraction:

from ohada_extractor.core.extractor import FinancialExtractor
from ohada_extractor.core.metadata_extractor import CompanyMetadataExtractor

# Initialize extractor
extractor = FinancialExtractor()

# Extract data
statement = extractor.extract_from_excel("financial_statement.xlsx")

print("\n--- Building company metadata from statement ---")
statement.metadata = CompanyMetadataExtractor.extract_from_statement(statement)

metadata = statement.metadata

print(metadata.currency)
print(metadata.legal_form)
print(metadata.regime_fiscal)

Metadata is fully JSON‑serializable and can be exported:

metadata_dict = metadata.to_dict()

This enables automated KYC and regulatory reporting workflows.
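
A KYC workflow typically needs to know which fields are missing before a file can be processed. The sketch below shows one way to check this on the exported dictionary. It assumes the keys of `metadata.to_dict()` mirror the attribute names shown above (`currency`, `legal_form`, `regime_fiscal`); the `missing_fields` helper and the sample values are hypothetical.

```python
# Fields a downstream workflow might require; adjust to your own policy.
REQUIRED_FIELDS = ("currency", "legal_form", "regime_fiscal")


def missing_fields(metadata_dict, required=REQUIRED_FIELDS):
    """Return the required keys that are absent, None, or empty strings."""
    return [key for key in required if metadata_dict.get(key) in (None, "")]


# Stand-in for metadata.to_dict(); values are illustrative only.
sample_metadata = {"currency": "XAF", "legal_form": "SA", "regime_fiscal": ""}

print(missing_fields(sample_metadata))
```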

Notes Extraction (NEW)

The extractor now includes an engine for 23 OHADA Notes (Annexes) that parses structured and unstructured notes such as:

  • Fiche R2: Company identity
  • Note 31: Répartition du résultat et autres éléments sur les dernières années
  • Accounting policies
  • Commitments & guarantees
  • Tax regime
  • Share capital information
  • Workforce details

Notes are extracted automatically:

from ohada_extractor import FinancialExtractor

extractor = FinancialExtractor()
statement = extractor.extract_from_excel("financial_statement.xlsx")

# Retrieve by key
note = statement.get_note("note3a")

# Retrieve by human-readable name
note = statement.get_note_by_name("IMMOBILISATION BRUTE")

Each note includes:

  • name
  • raw_value
  • preprocess_value

Notes can be exported to JSON for auditing or BI tools.
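
As a rough illustration of such an export, the sketch below serializes notes carrying the three fields listed above. The `NoteRecord` dataclass is a hypothetical stand-in for the library's note object, and the sample key, name, and values are made up for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class NoteRecord:
    """Stand-in with the three fields listed above (not the library class)."""
    name: str
    raw_value: Any
    preprocess_value: Any


def notes_to_json(notes: Dict[str, NoteRecord]) -> str:
    """Serialize notes keyed by note id; non-JSON types fall back to str()."""
    payload = {key: asdict(note) for key, note in notes.items()}
    return json.dumps(payload, ensure_ascii=False, indent=2, default=str)


# Illustrative content only
notes = {
    "note3a": NoteRecord(
        name="IMMOBILISATION BRUTE",
        raw_value=[["AD", "100 000"]],
        preprocess_value=[{"reference": "AD", "gross": 100000.0}],
    )
}

exported = notes_to_json(notes)
print(exported)
```

Setting `ensure_ascii=False` keeps accented French labels readable in the output file.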

Features

  • ✅ Multi-file period aggregation (2-5 years)
  • ✅ Automatic data validation
  • ✅ JSON-serializable output
  • ✅ Account code standardization (OHADA)
  • ✅ Gross/Amort/Net decomposition for assets
  • ✅ Support for the 17 OHADA zone countries
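
To make the idea of period aggregation concrete, the sketch below pivots per-period records into one time series per account reference and enforces the 2-5 year window mentioned above. It is not the library's implementation: the `aggregate_periods` helper, the input layout, and the sample figures are assumptions chosen for illustration.

```python
from collections import defaultdict


def aggregate_periods(per_period):
    """Pivot {period: [records]} into {reference: {period: net}}.

    Each record is assumed to be a dict with "reference" and "net" keys.
    """
    if not 2 <= len(per_period) <= 5:
        raise ValueError("expected between 2 and 5 periods")
    series = defaultdict(dict)
    for period in sorted(per_period):  # ISO dates sort chronologically
        for record in per_period[period]:
            series[record["reference"]][period] = record["net"]
    return dict(series)


# Illustrative figures, one list of records per closing date
per_period = {
    "2024-12-31": [{"reference": "AD", "net": 55000.0}],
    "2023-12-31": [{"reference": "AD", "net": 50000.0}],
}

series = aggregate_periods(per_period)
print(series["AD"])
```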

📊 Visualization Layer (NEW)

The library now includes a visualization module for OHADA financial statements.

Static 4×2 Overview Dashboard

Grouped, stacked, waterfall, and cashflow charts in a single figure:

from ohada_extractor.visualization import plot_overview_dashboard_clean
plot_overview_dashboard_clean(statement)

Dynamic Tabbed Dashboard

Interactive dashboard with tabs for:

  • Assets
  • Liabilities
  • Income
  • Cashflow

from ohada_extractor.visualization import plot_ohada_tabs_dynamic
plot_ohada_tabs_dynamic(statement)

Streamlit Integration

A ready‑to‑use Streamlit app is included:

streamlit run examples/example_visualization_streamlit.py

This enables instant deployment of dashboards for analysts, auditors, and credit officers.

Documentation

  • OHADA Standards: Account codes and structures for the 17 member countries
  • Output Schema: JSON output format specification
  • Examples: Sample extraction workflows

Example Output

{
  "extraction_metadata": {
    "periods": ["2023-12-31", "2024-12-31"],
    "statement_types": ["balance_sheet_assets", "income_statement", "cashflow", "notes"]
  },
  "balance_sheet": {
    "assets": [
      {
        "reference": "AD",
        "label": "Immobilisations incorporelles",
        "gross": 100000.0,
        "amort": 50000.0,
        "net": 50000.0,
        "gross1": 110000.0,
        "amort1": 55000.0,
        "net1": 55000.0
      }
    ]
  }
}
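
Records shaped like the asset entry above lend themselves to a simple consistency check, since a net value should equal gross minus amortization. The sketch below applies that identity to both the unsuffixed fields and the `1`-suffixed fields. The `decomposition_errors` helper and the tolerance are assumptions, not part of the library.

```python
import math


def decomposition_errors(record, suffixes=("", "1"), tolerance=0.01):
    """List the field groups where net differs from gross - amort."""
    errors = []
    for suffix in suffixes:
        gross, amort, net = (
            record.get(name + suffix) for name in ("gross", "amort", "net")
        )
        if None in (gross, amort, net):
            continue  # skip groups that are not fully populated
        if not math.isclose(gross - amort, net, abs_tol=tolerance):
            errors.append(
                f"{record['reference']}: net{suffix} != gross{suffix} - amort{suffix}"
            )
    return errors


# The asset record from the example output above
record = {
    "reference": "AD",
    "label": "Immobilisations incorporelles",
    "gross": 100000.0, "amort": 50000.0, "net": 50000.0,
    "gross1": 110000.0, "amort1": 55000.0, "net1": 55000.0,
}

print(decomposition_errors(record))
```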

📁 Examples (New)

The repository now includes a full examples/ directory:

examples/
    example_metadata_extraction.py
    example_notes_usage.py
    example_visualization_streamlit.py

Metadata Example

Shows how to extract and export company metadata.

Notes Example

Demonstrates how to retrieve raw and processed OHADA notes.

Visualization Example

Runs static dashboards, dynamic dashboards, and a Streamlit UI.

These examples make onboarding fast for banks, auditors, fintechs, and researchers.

Testing

python -m pytest tests/

Use Cases

  1. Credit Processing: Accelerate loan analysis for SMEs by automating financial statement data entry.

  2. Portfolio Management: Consolidate financials from multiple companies for real-time portfolio analytics.

  3. Regulatory Reporting: Standardize extraction for compliance with OHADA zone banking regulations.

  4. Financial Analytics: Feed cleaned, structured data into analytics and forecasting models.

OHADA Zone Coverage

Supported in: Benin, Burkina Faso, Cameroon, Central African Republic, Chad, Comoros, Congo (DR), Congo, Côte d'Ivoire, Equatorial Guinea, Gabon, Guinea, Guinea-Bissau, Mali, Niger, Senegal, Togo.

Contribution

Contributions welcome! Areas for expansion:

  • PDF extraction support
  • Additional statement types
  • Data validation rules repository
  • Performance optimizations

License

MIT License. See LICENSE.

Citation

If you use this library in your research or production system, please cite:

Kamguia Wabo, L. B., & Ndayou, R. V. (2026). OHADA Financial Extractor. B.K. Research & Analytics. Retrieved from https://github.com/bomyrk/ohada-financial-extractor

@software{ohada_extractor_2026,
  title={OHADA Financial Extractor},
  author={Kamguia Wabo, L. Bomyr and Ndayou, R. V.},
  year={2026},
  url={https://github.com/bomyrk/ohada-financial-extractor}
}

Author

Kamguia Wabo, L. B.
B.K. Research & Analytics
bomyr.kamguia@bkresearchandanalytics.com


Democratizing financial data extraction for African financial institutions.
