
A Python framework for hierarchical B2B sales quota cascading and pipeline reconciliation.


B2B Revenue Forecasting (b2b_revenue_forecasting)

PyPI version Tests License: MIT Python 3.8+

An open-source Python framework for quota planning and forecast reconciliation, built for Enterprise RevOps and Data Strategy teams.

Most bottom-up time-series libraries are built for B2C retail and inventory forecasting and extrapolate from historical averages. This package is instead designed around the realities of B2B enterprise sales: Hierarchical Quotas, Managerial Cascading, Pipeline Health Analysis, and "Sandbagging" Biases.


🚀 Features

| Module | Purpose |
| --- | --- |
| SalesHierarchy | Build flexible org charts as DAGs from flat CRM data; supports 3-level startups to 10-level enterprises |
| QuotaCascader | Distribute macro-targets top-down using rolling N-quarter capacity models with configurable managerial hedges |
| MetricSpec | Declare which historical metrics (NetNewACV, CloudSeats, DC seats, LTM expansion, …) drive cascading, in what direction (proportional or inverse), and at what weight, with auto-suggested weights from correlation analysis |
| CommitReconciler | Detect sandbagging and "happy ears" bias via historical Bias Quotients, then auto-correct forecasts |
| PipelineAdjuster | Diagnose pipeline health with per-region thresholds and redistribute IC quotas using zero-sum logic |

What's New in v0.4.0

  • Gate metrics — hard kill-switches. cascade_quota(..., gate_metrics=[...]) excludes any node whose rolled-up gate value is at or below a threshold from the cascade entirely (quota = 0), redistributing its share among non-gated siblings. Designed for white-space planning: e.g., gating "migration NetNewACV" on Unmigrated_Seats zeros out territories with nothing left to migrate. Gates propagate upward naturally — a manager whose whole team fails the gate gets $0 too. Composes with AND across multiple gates. CRO overrides win over gates.
  • Two planning philosophies, both supported. See the section below.
  • is_gated column in quotas_to_dataframe when gates were used, so analysts can distinguish "$0 because gated" from "$0 because no signal."
  • cascader.gated_nodes — the set of gated nodes from the most recent cascade, stored for inspection.
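
The gate behaviour described above can be pictured with a small standalone sketch. It does not call the package; `split_with_gates` and the `(signal, gate_value)` tuples are hypothetical names used only for illustration, and the real implementation may differ in detail.

```python
def split_with_gates(parent_quota, children, threshold=0.0):
    """Split parent_quota among siblings, zeroing any gated child.

    children maps child name -> (signal, gate_value). A child is gated
    when gate_value <= threshold; its share is then redistributed among
    the remaining siblings in proportion to their signal.
    """
    gated = {name for name, (_, gate) in children.items() if gate <= threshold}
    open_signal = sum(sig for name, (sig, _) in children.items() if name not in gated)
    quotas = {}
    for name, (sig, _) in children.items():
        if name in gated or open_signal == 0:
            # Gated nodes get $0; if every sibling is gated, all get $0.
            quotas[name] = 0.0
        else:
            quotas[name] = parent_quota * sig / open_signal
    return quotas, gated


# Illustrative data: (signal, Unmigrated_Seats) per IC.
team = {
    "IC_A": (300.0, 120),
    "IC_B": (100.0, 0),    # nothing left to migrate, so gated
    "IC_C": (100.0, 40),
}
quotas, gated = split_with_gates(1_000_000.0, team)
print(quotas)
print(gated)
```

The team total is unchanged: the gated IC's share flows to its non-gated siblings rather than disappearing.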

Two Planning Philosophies

The package supports two philosophically distinct ways of building a quota plan. Both use the same primitives — pick the one that matches how your org thinks about fairness.

Earned planning: "who has proven they can sell this?"

Cascade on historical signals (past NetNewACV attainment, past cloud-seat adds, LTM expansion). Reconcile against forward pipeline (open opps + late-stage commit + best-case). Best when historical attainment is a clean signal of forward capacity (mature business, low churn in territories, stable rep tenure).

historical_metrics = [
    MetricSpec('NetNewACV',  direction='proportional', weight=1.0, lookback=4),
    MetricSpec('CloudSeats', direction='proportional', weight=0.6, lookback=4),
    MetricSpec('DCSeats',    direction='inverse',      weight=0.4, lookback=4),
]
quotas = cascader.cascade_quota('Global_Corp', macro_target, metrics=historical_metrics)

# Reconcile against forward pipeline
adjuster = PipelineAdjuster(hierarchy, quotas,
                            pipeline_attr=['Open_Pipeline', 'Late_Stage_Commit'])

White-space planning: "what can be achieved if we look at the opportunity in front of us?"

Cascade on forward-looking signals (current installed seats, knowledge-worker counts, white-space indicators), with dampeners (LTM spend) and hard gates (unmigrated seats). Reconcile against historical attainment to flag where the plan asks for a step-up. Best when past performance is noisy (rapid growth, territory shuffles, recent re-orgs) and the org wants every rep to be measured against the opportunity in front of them.

forward_metrics = [
    MetricSpec('Current_Seats_ProductX',  direction='proportional', weight=1.0,
               columns=['Current_Seats_ProductX']),
    MetricSpec('Knowledge_Workers_Count', direction='proportional', weight=0.7,
               columns=['Knowledge_Workers_Count']),
    MetricSpec('LTM_ExpansionSpent',      direction='inverse',      weight=0.5,
               columns=['LTM_ExpansionSpent']),
]
gate_metrics = [
    MetricSpec('Unmigrated_Seats', columns=['Unmigrated_Seats']),  # threshold defaults to 0
]
quotas = cascader.cascade_quota(
    'Global_Corp', macro_target,
    metrics=forward_metrics, gate_metrics=gate_metrics,
)

# Reconcile against historical attainment
adjuster = PipelineAdjuster(hierarchy, quotas, pipeline_attr=[
    'Q1_NetNewACV', 'Q2_NetNewACV', 'Q3_NetNewACV', 'Q4_NetNewACV',
])
diagnosis = adjuster.diagnose(coverage_thresholds={
    '_default': {'healthy': 1.0, 'at_risk': 0.75},   # ratios near 1.0, not 1.5–3x
})

Neither philosophy is "correct" — they answer different questions. The package supports either as a first-class flow, and you can blend them (some metrics historical, some forward) by mixing them in a single metrics= list.

What's New in v0.3.x

  • Multi-metric cascading via the new MetricSpec API — blend historical NetNewACV with any number of secondary signals (cloud seats, on-prem seats, LTM expansion spend, customer-sat scores, certification flags, anything else the analyst tracks), each marked as proportional or inverse, with per-metric weights and lookbacks
  • Direction is always a user input. Domain knowledge ("more cloud seats means more ACV") trumps statistical sign. The package surfaces correlations and warns on mismatch but never overrides the analyst's call
  • MetricSpec.suggest_weights(...) suggests weights (magnitude of correlation) for user-declared directions. For exploratory use, MetricSpec.suggest_directions_and_weights(...) infers both
  • Normalized-weights view: MetricSpec.normalized_weights(specs) shows the post-normalization share each metric actually contributes; auto-printed before every multi-metric cascade and accessible via cascader.weights_report
  • Brand-new IC handling — either-or: flag brand-new ICs in the same CSV the analyst already uploads (brand_new_col='Is_Brand_New' on SalesHierarchy.from_dataframe, then new_ic_attr='_is_brand_new' on cascade_quota), OR pick a rule (new_ic_rule='all_metrics_zero' / 'primary_metric_zero'). Passing both raises ValueError
  • Any metric name, any numeric type — including booleans (Has_Active_Cert: True/False). Boolean / 0-1 sparse metrics are auto-detected and excluded from zero-imputation so False isn't mistaken for missing data
  • PipelineAdjuster accepts multiple pipeline columns: pipeline_attr=['Open_Pipeline', 'Late_Stage_Commit', 'Best_Case_Adds'] sums them per IC into a combined dollar amount for the coverage ratio
  • CSV / SQL / dashboard exports — every output converts to a DataFrame via cascader.quotas_to_dataframe(...), cascader.quotas_diff_to_dataframe(...), or reconciler.reconcile_all(...). From there .to_csv(), .to_sql(), or cascader.to_html_dashboard(...) writes wherever you need
  • Hedge audit columns — pass unhedged_quotas= to quotas_to_dataframe for unhedged_quota, hedge_buffer, and overassignment_pct columns showing exactly how much of each quota is hedge buffer
  • Fully backward compatible: cascade_quota(...) without metrics= behaves exactly as in v0.2.x

What's New in v0.2.0

  • PipelineAdjuster: Post-cascade pipeline health analyzer with diagnose() and adjust() modes
  • Flexible quarter support: QuotaCascader now auto-discovers any number of _Attainment columns (4, 8, 12 quarters)
  • New IC handling: Partial-history imputation and equal-share allocation for brand-new hires
  • CRO overrides: Lock specific IC quotas via new_ic_overrides to bypass the algorithm
  • Per-node hedging: Apply different hedge multipliers to different regions/managers
  • GitHub Actions CI/CD: Automated testing on Python 3.9–3.12

📦 Installation

pip install b2b-revenue-forecasting

💻 Quickstart

1. Build the Org Hierarchy

import pandas as pd
from b2b_revenue_forecasting.hierarchy import SalesHierarchy

# ⚠️ Use keep_default_na=False if your data has 'NA' as a region name
df = pd.read_csv('your_crm_data.csv', keep_default_na=False)

# Works with any depth: 3 levels or 10 levels
hierarchy = SalesHierarchy()
hierarchy.from_dataframe(
    df, 
    path_cols=['Global', 'Region', 'RVP', 'Director', 'Manager', 'IC'], 
    metrics_cols=['Q1_Attainment', 'Q2_Attainment', 'Q3_Attainment', 'Q4_Attainment',
                  'Current_Pipeline']
)

print(f"Nodes: {len(hierarchy.graph.nodes)}")
print(f"ICs:   {len(hierarchy.get_leaves('Global_Corp'))}")

2. Cascade Quotas Top-Down

from b2b_revenue_forecasting.quota_cascader import QuotaCascader

cascader = QuotaCascader(hierarchy)

# Basic: distribute $100M in proportion to historical capacity
quotas = cascader.cascade_quota('Global_Corp', 100_000_000.0)

# With 5% hedge at every management level (compounds: 1.05^5 ≈ 27.6% overassignment)
quotas = cascader.cascade_quota('Global_Corp', 100_000_000.0, hedge_multiplier=1.05)

# Per-node hedge: NA gets aggressive 10%, others standard 5%
quotas = cascader.cascade_quota('Global_Corp', 100_000_000.0, hedge_multiplier={
    'Global_Corp': 1.05, 'NA': 1.10, 'EMEA': 1.05, 'APAC': 1.05
})

# CRO override: strategic hire gets exactly $500K regardless of history
quotas = cascader.cascade_quota('Global_Corp', 100_000_000.0,
    hedge_multiplier=1.05,
    new_ic_overrides={'IC_Strategic_Hire': 500_000.0}
)
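
To make the top-down mechanics concrete, here is a minimal standalone sketch of a capacity-proportional cascade with a per-level hedge. It is not the package's code: the `tree` and `history` dictionaries and both function names are hypothetical, and it assumes the hedge is applied once at each level that has children.

```python
def capacity(tree, history, node):
    """Rolled-up historical capacity: a leaf's own history, else the sum of its children."""
    children = tree.get(node, [])
    if not children:
        return history[node]
    return sum(capacity(tree, history, child) for child in children)


def cascade(tree, history, node, target, hedge=1.0, quotas=None):
    """Assign target to node, then split target * hedge among its children."""
    quotas = {} if quotas is None else quotas
    quotas[node] = target
    children = tree.get(node, [])
    if children:
        pool = target * hedge  # overassignment at this management level
        total = sum(capacity(tree, history, child) for child in children)
        for child in children:
            share = capacity(tree, history, child) / total
            cascade(tree, history, child, pool * share, hedge, quotas)
    return quotas


# Illustrative three-level org and attainment history.
tree = {"Global_Corp": ["NA", "EMEA"], "NA": ["IC_1", "IC_2"], "EMEA": ["IC_3"]}
history = {"IC_1": 3.0, "IC_2": 1.0, "IC_3": 4.0}

plan = cascade(tree, history, "Global_Corp", 100.0, hedge=1.05)
leaf_total = plan["IC_1"] + plan["IC_2"] + plan["IC_3"]
print(plan)
print(leaf_total)
```

With two management levels above the ICs, the IC quotas add up to 100 × 1.05², which is the compounding effect noted in the hedge comment above.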

3. Multi-Metric Cascading (v0.3+)

For real B2B planning, the metric you're cascading (e.g., NetNewACV) is rarely the only signal that should drive its allocation. Cloud-seat counts predict more new ACV; on-prem (DC) seat counts predict less; high LTM expansion spend means the account is already saturated. The MetricSpec API lets you mix any number of these into a single cascade.

Direction is always your call. You declare whether each metric is proportional (more → more quota) or inverse (more → less quota) up front. The package surfaces correlations and warns when the data sign disagrees, but never overrides your domain knowledge.

from b2b_revenue_forecasting import MetricSpec

# Declare each metric's role — direction is required, weight is your knob
metrics = [
    MetricSpec('NetNewACV',     direction='proportional', weight=1.0, lookback=4),
    MetricSpec('CloudSeats',    direction='proportional', weight=0.5, lookback=4),
    MetricSpec('DCSeats',       direction='inverse',      weight=0.4, lookback=4),
    MetricSpec('ExpansionSpent',direction='inverse',      weight=0.7,
               columns=['LTM_ExpansionSpent']),  # single LTM column
]

quotas = cascader.cascade_quota(
    'Global_Corp', 100_000_000.0,
    hedge_multiplier=1.05,
    metrics=metrics,
)

Any metric name, any data type works. Customer_Sat_Score, MQLs_Sourced_via_Outbound, Has_Active_Cert (boolean), Renewals_Caught_Up (0/1 counter) — anything numeric, with any column name. Boolean and 0/1 sparse metrics are auto-detected and excluded from zero-imputation so False isn't treated as a missing value.

How the blend works. At every level, each child gets a share of the parent's quota equal to a weighted sum of its per-metric shares-of-siblings. Proportional metrics use raw shares; inverse metrics flip via reciprocal-then-normalize. The final per-child share is Σ_m (weight_m × share_m(child)), where the weights are first normalized to sum to 1, so the shares sum to 1 across siblings.
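
A minimal sketch of that blend, written from the description above rather than from the package source. The function names and the tuple layout are hypothetical, and the sketch assumes every inverse metric value is strictly positive (a zero would need special handling).

```python
def shares(values, direction):
    """Turn sibling values into shares that sum to 1."""
    if direction == "inverse":
        # Reciprocal-then-normalize: smaller values earn larger shares.
        values = {child: 1.0 / v for child, v in values.items()}
    total = sum(values.values())
    return {child: v / total for child, v in values.items()}


def blended_shares(metrics):
    """metrics: list of (weight, direction, {child: value}) tuples."""
    weight_sum = sum(weight for weight, _, _ in metrics)
    blended = {}
    for weight, direction, values in metrics:
        normalized_weight = weight / weight_sum
        for child, share in shares(values, direction).items():
            blended[child] = blended.get(child, 0.0) + normalized_weight * share
    return blended


# Illustrative sibling data for two ICs.
metrics = [
    (1.0, "proportional", {"IC_A": 600.0, "IC_B": 400.0}),  # NetNewACV
    (0.5, "inverse",      {"IC_A": 50.0,  "IC_B": 150.0}),  # DCSeats
]
result = blended_shares(metrics)
print(result)
```

Here IC_A holds 60% of the proportional metric and, after the reciprocal flip, 75% of the inverse one; with normalized weights of 2/3 and 1/3 its blended share is 0.65.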

Don't know the weights? Pass direction= on each candidate and let suggest_weights() propose magnitudes via Pearson correlation:

suggestions, report = MetricSpec.suggest_weights(
    df,
    target_column='NetNewACV_4Q_sum',
    candidate_metrics=[
        {'name': 'CloudSeats',     'column': 'CloudSeats_4Q_sum',
         'direction': 'proportional', 'lookback': 4},
        {'name': 'DCSeats',        'column': 'DCSeats_4Q_sum',
         'direction': 'inverse',      'lookback': 4},
        {'name': 'ExpansionSpent', 'column': 'LTM_ExpansionSpent',
         'columns': ['LTM_ExpansionSpent'],
         'direction': 'inverse',      'lookback': 1},
    ],
)
# e.g. report['CloudSeats']['weight'] might be 0.62; ['rationale'] explains why,
# and ['direction_matches_data'] tells you if your call agrees with the sign

quotas = cascader.cascade_quota('Global_Corp', 100_000_000.0, metrics=suggestions)

For pure exploration (you don't yet have a domain opinion), use MetricSpec.suggest_directions_and_weights(...) — it infers both from data. This is a sanity-check helper, not a production-planning API.
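
The idea behind correlation-based weights can be sketched in a few lines of plain Python. This is an illustration of the technique only: `pearson` and `suggest_weight` are hypothetical helpers, not the package's functions, and the result keys simply mirror the report fields mentioned above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)  # assumes neither series is constant


def suggest_weight(target, candidate, direction):
    """Weight = |r|; the declared direction is kept, only checked against the sign."""
    r = pearson(target, candidate)
    expected_sign = 1 if direction == "proportional" else -1
    return {"weight": abs(r), "direction_matches_data": r * expected_sign > 0}


# Illustrative per-IC totals.
acv = [100.0, 200.0, 300.0, 400.0]
cloud_seats = [10.0, 20.0, 30.0, 40.0]
dc_seats = [40.0, 30.0, 20.0, 10.0]

cloud = suggest_weight(acv, cloud_seats, "proportional")
dc = suggest_weight(acv, dc_seats, "inverse")
mismatch = suggest_weight(acv, dc_seats, "proportional")
print(cloud, dc, mismatch)
```

The last call shows the mismatch case: the declared direction is left untouched, and only the flag reports that the data disagrees.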

Brand-new ICs — either-or, your choice of where they're listed. The cleanest option keeps everything in the same CSV the analyst already uploads:

# CSV has a column Is_Brand_New with True / 1 / "yes" for each new hire
hierarchy = SalesHierarchy()
hierarchy.from_dataframe(
    df, path_cols=[...], metrics_cols=[...],
    brand_new_col='Is_Brand_New',     # ingested as node attribute _is_brand_new
)

quotas = cascader.cascade_quota(
    'Global_Corp', 100_000_000.0,
    metrics=metrics,
    new_ic_attr='_is_brand_new',       # read the flag from the CSV
)

Or, if you don't want a separate column, pick an auto-detection rule:

quotas = cascader.cascade_quota(
    'Global_Corp', 100_000_000.0,
    metrics=metrics,
    new_ic_rule='all_metrics_zero',    # or 'primary_metric_zero'
)

You pick one or the other — passing both an explicit identifier (new_ic_attr or new_ic_ids) AND new_ic_rule in the same call raises ValueError, because the two would silently disagree.

Brand-new ICs get an equal-share carve-out of the team target before the remainder is split proportionally — just like the single-metric path.
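
As a rough sketch of the carve-out, assume "equal share" means the team target divided by team headcount (an assumption; the README does not spell out the formula). `split_team_target` is a hypothetical helper, and it assumes at least one tenured IC has non-zero capacity.

```python
def split_team_target(team_target, history, brand_new):
    """history: {ic: historical capacity}; brand_new: set of IC names."""
    equal_share = team_target / len(history)
    # Step 1: carve out an equal share for each brand-new IC.
    quotas = {ic: equal_share for ic in brand_new}
    remainder = team_target - equal_share * len(brand_new)
    # Step 2: split what is left among tenured ICs by capacity.
    tenured = {ic: cap for ic, cap in history.items() if ic not in brand_new}
    total_capacity = sum(tenured.values())
    for ic, cap in tenured.items():
        quotas[ic] = remainder * cap / total_capacity
    return quotas


team_quotas = split_team_target(
    1_200_000.0,
    {"IC_A": 900_000.0, "IC_B": 300_000.0, "IC_New": 0.0},
    brand_new={"IC_New"},
)
print(team_quotas)
```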

4. Detect & Fix Forecasting Bias

from b2b_revenue_forecasting.commit_reconciler import CommitReconciler

historical = pd.DataFrame({
    'Manager_ID':              ['Mgr_A', 'Mgr_A', 'Mgr_B', 'Mgr_B'],
    'Historical_Commit':       [200_000,  250_000, 300_000,  350_000],
    'Historical_Actual_Closed': [300_000,  375_000, 270_000,  280_000],
})

reconciler = CommitReconciler(historical)

# Mgr_A is a sandbagger (bias = 1.5x) — commit inflated automatically
adjusted = reconciler.reconcile_forecast('Mgr_A', current_commit=100_000)
# → $150,000

# Blend with ML baseline (50/50 average)
blended = reconciler.reconcile_forecast('Mgr_A', 100_000, machine_forecast=120_000)
# → $135,000

5. Export to CSV, SQL, or an Interactive Dashboard

Every output is a pandas DataFrame, so the same code writes anywhere:

# CSV: analyst-ready, one row per node at every level
# (taxonomy holds your own level names; it is not defined in this snippet)
cascaded_df = cascader.quotas_to_dataframe(quotas, level_names=taxonomy)
cascaded_df.to_csv('cascaded_quotas.csv', index=False)

# CSV with hedge audit: also include the unhedged baseline
# (cascade_metrics is the MetricSpec list used for the hedged cascade)
quotas_unhedged = cascader.cascade_quota(
    'Global_Corp', 100_000_000.0, hedge_multiplier=1.0,
    metrics=cascade_metrics, verbose=False,
)
cascader.quotas_to_dataframe(
    quotas, level_names=taxonomy, unhedged_quotas=quotas_unhedged,
).to_csv('cascaded_quotas_with_audit.csv', index=False)
# → adds unhedged_quota, hedge_buffer, overassignment_pct columns

# SQL — same DataFrames, any SQLAlchemy-compatible database
import sqlite3
with sqlite3.connect('cascade.db') as conn:
    cascaded_df.to_sql('cascaded_quotas', conn, if_exists='replace', index=False)
    cascader.weights_report.to_sql('normalized_weights', conn,
                                    if_exists='replace', index=False)
# Postgres / Snowflake / BigQuery: swap conn for a SQLAlchemy engine

# Interactive HTML dashboard: Chart.js, self-contained, shareable
# (adjusted and diagnosis come from PipelineAdjuster; see step 6)
cascader.to_html_dashboard(
    quotas, output_path='cascade_dashboard.html',
    title='Q1 Cascade: $100M Plan',
    unhedged_quotas=quotas_unhedged,
    adjusted_quotas=adjusted, diagnosis=diagnosis,
)

6. Pipeline Health Diagnosis & Redistribution

from b2b_revenue_forecasting.pipeline_adjuster import PipelineAdjuster

# Single pipeline column (backward compat)
adjuster = PipelineAdjuster(hierarchy, quotas, pipeline_attr='Current_Pipeline')

# Or sum multiple dollar-denominated pipeline columns from the same CSV
adjuster = PipelineAdjuster(hierarchy, quotas, pipeline_attr=[
    'Open_Pipeline', 'Late_Stage_Commit', 'Best_Case_Adds',
])

# Configure per-region coverage thresholds (ICs inherit from ancestors)
thresholds = {
    'NA':       {'healthy': 1.5, 'at_risk': 0.8},
    'EMEA':     {'healthy': 2.5, 'at_risk': 1.2},
    'APAC':     {'healthy': 3.0, 'at_risk': 1.5},
    '_default': {'healthy': 2.0, 'at_risk': 1.0}
}

# Diagnose — returns a DataFrame with risk status for every node
diagnosis = adjuster.diagnose(thresholds)
print(diagnosis.groupby('Risk_Status')['Node'].count())

# Flag-only mode — returns original quotas unchanged (for pre-approval review)
flagged = adjuster.adjust(mode='flag_only', coverage_thresholds=thresholds)

# Redistribute mode — zero-sum IC adjustment within each manager's team
adjusted = adjuster.adjust(
    mode='redistribute',
    coverage_thresholds=thresholds,
    max_adjustment_pct=0.20,                          # ±20% cap per IC
    locked_nodes={'IC_Protected': 500_000.0}           # CRO-locked ICs excluded
)
# ✅ Manager totals preserved | ✅ Donors give, receivers get | ✅ 20% cap enforced
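
One way to picture a zero-sum, capped redistribution inside a single manager's team is sketched below. This is an illustrative algorithm, not the one the package implements: `redistribute` is a hypothetical helper, and it assumes ICs below the at_risk threshold donate while ICs at or above the healthy threshold receive.

```python
def redistribute(quotas, coverage, healthy, at_risk, max_adjustment_pct=0.20):
    """Shift quota from under-covered ICs to well-covered ICs, preserving the total."""
    donors = [ic for ic in quotas if coverage[ic] < at_risk]
    receivers = [ic for ic in quotas if coverage[ic] >= healthy]
    if not donors or not receivers:
        return dict(quotas)  # nothing to move
    # Each IC can move at most max_adjustment_pct of its own quota.
    can_give = sum(quotas[ic] * max_adjustment_pct for ic in donors)
    can_take = sum(quotas[ic] * max_adjustment_pct for ic in receivers)
    moved = min(can_give, can_take)
    adjusted = dict(quotas)
    for ic in donors:
        adjusted[ic] -= moved * (quotas[ic] * max_adjustment_pct) / can_give
    for ic in receivers:
        adjusted[ic] += moved * (quotas[ic] * max_adjustment_pct) / can_take
    return adjusted


team = {"IC_A": 500_000.0, "IC_B": 300_000.0, "IC_C": 200_000.0}
cov = {"IC_A": 2.4, "IC_B": 1.3, "IC_C": 0.6}
new_quotas = redistribute(team, cov, healthy=2.0, at_risk=1.0)
print(new_quotas)
```

Because the amount removed from donors equals the amount added to receivers, the manager's total is preserved, and no IC moves by more than the cap.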

🧠 Key Concepts

Managerial Hedge (Overassignment Buffer)

A multiplier applied at each management level to build in an overassignment buffer. A 5% hedge across 5 layers compounds to ~27.6% total overassignment (1.05⁵ ≈ 1.276), so the enterprise can still hit its number even if some ICs miss.
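
The compounding arithmetic is easy to check directly. The helper names below are hypothetical and exist only for this illustration.

```python
def total_overassignment(hedge_multiplier, levels):
    """Fraction by which the sum of IC quotas exceeds the macro target."""
    return hedge_multiplier ** levels - 1.0


def leaf_total(macro_target, hedge_multiplier, levels):
    """Sum of IC quotas after hedging at every management level."""
    return macro_target * hedge_multiplier ** levels


pct = total_overassignment(1.05, 5)
ic_sum = leaf_total(100_000_000.0, 1.05, 5)
print(f"{pct:.1%}")  # 27.6%
print(round(ic_sum))
```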

Bias Quotient

Bias Quotient = Σ(Actual Closed) / Σ(Committed)
  • > 1.0 = Sandbagger (closes more than committed → inflate their forecast)
  • = 1.0 = Neutral
  • < 1.0 = Happy Ears (over-promises → deflate their forecast)
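
The formula can be verified by hand with the Quickstart numbers. The sketch below is standalone and does not use CommitReconciler; `bias_quotient` and `reconcile` are hypothetical helpers, and the 50/50 blend follows the Quickstart comment.

```python
def bias_quotient(commits, actuals):
    """Sum of actual closed divided by sum of committed."""
    return sum(actuals) / sum(commits)


def reconcile(current_commit, bq, machine_forecast=None):
    """Scale the commit by the bias quotient, optionally averaging with an ML baseline."""
    adjusted = current_commit * bq
    if machine_forecast is None:
        return adjusted
    return (adjusted + machine_forecast) / 2.0


bq_a = bias_quotient([200_000, 250_000], [300_000, 375_000])  # Mgr_A
bq_b = bias_quotient([300_000, 350_000], [270_000, 280_000])  # Mgr_B
adjusted_a = reconcile(100_000, bq_a)
blended_a = reconcile(100_000, bq_a, machine_forecast=120_000)
print(bq_a, round(bq_b, 3), adjusted_a, blended_a)
```

Mgr_A closes 1.5x what they commit (a sandbagger), while Mgr_B's quotient of about 0.85 indicates happy ears.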

Pipeline Coverage Ratio

Coverage = Current Pipeline / Cascaded Quota

| Coverage | Status | Action |
| --- | --- | --- |
| ≥ healthy threshold | 🟢 Healthy | May receive quota |
| ≥ at_risk threshold | 🟡 Moderate | No action |
| ≥ 1.0 | 🟠 At Risk | May donate quota |
| < 1.0 | 🔴 Critical | Urgent: pipeline below target (may donate quota) |
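
The table can be read as a series of checks applied from top to bottom. The sketch below assumes exactly that ordering and assumes that a node takes the thresholds of its nearest ancestor with an entry; both functions are hypothetical and the package may resolve these cases differently.

```python
def resolve_thresholds(path, thresholds):
    """Walk from the node up through its ancestors; the nearest entry wins."""
    for name in reversed(path):
        if name in thresholds:
            return thresholds[name]
    return thresholds["_default"]


def risk_status(pipeline, quota, healthy, at_risk):
    """Classify coverage using the table's rows in order."""
    coverage = pipeline / quota
    if coverage >= healthy:
        return "Healthy"
    if coverage >= at_risk:
        return "Moderate"
    if coverage >= 1.0:
        return "At Risk"
    return "Critical"


thresholds = {
    "APAC":     {"healthy": 3.0, "at_risk": 1.5},
    "_default": {"healthy": 2.0, "at_risk": 1.0},
}
apac = resolve_thresholds(["Global_Corp", "APAC", "Mgr_X", "IC_1"], thresholds)
other = resolve_thresholds(["Global_Corp", "LATAM", "Mgr_Y", "IC_2"], thresholds)

print(risk_status(3_500_000, 1_000_000, **apac))
print(risk_status(2_000_000, 1_000_000, **apac))
print(risk_status(1_200_000, 1_000_000, **apac))
print(risk_status(500_000, 1_000_000, **apac))
```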

New IC Handling

| Scenario | Behavior |
| --- | --- |
| Full history | Proportional by total capacity |
| Partial history (e.g., 1 of 4 quarters) | Zero quarters imputed with own non-zero average |
| Brand new (all zeros) | Equal share of team target |
| CRO override | Fixed amount, excluded from pool |
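
The partial-history row can be illustrated with a short standalone function. `impute_partial_history` is a hypothetical name, and the sketch treats a zero quarter as missing, which is how the table describes it.

```python
def impute_partial_history(quarters):
    """Replace zero quarters with the IC's own average over non-zero quarters."""
    nonzero = [q for q in quarters if q != 0]
    if not nonzero or len(nonzero) == len(quarters):
        # Brand new (all zeros) or full history: nothing to impute.
        return list(quarters)
    fill = sum(nonzero) / len(nonzero)
    return [q if q != 0 else fill for q in quarters]


one_quarter = impute_partial_history([0, 0, 0, 120_000.0])
two_quarters = impute_partial_history([0, 100_000.0, 0, 200_000.0])
brand_new = impute_partial_history([0, 0, 0, 0])
print(one_quarter, two_quarters, brand_new)
```

An IC with a single strong quarter is therefore treated as if they had sold at that rate all year, instead of being penalized for quarters before they were hired.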

🧪 Testing

# Run all tests
cd hierarchical_sales_forecasting
pip install -e .
python -m pytest tests/ -v

# Run the full demo
python demo_full_pipeline.py

📄 Publications

This framework is the subject of the following research and technical publications:

| Publication | Venue | Status |
| --- | --- | --- |
| Hierarchical Sales Target Cascading using DAGs in Python | Towards AI | ✅ Published |
| Graph-Theoretic Approaches to Hierarchical Revenue Target Allocation in B2B Enterprises | SSRN (Preprint) | ✅ Published |
| Graph-Theoretic Approaches to Hierarchical Revenue Target Allocation in B2B Enterprises | Journal of Revenue and Pricing Management (Springer) | ⏳ Under Review |

If you use this package in your research, please cite:

Karwa, S. (2026). Graph-Theoretic Approaches to Hierarchical Revenue Target Allocation
in B2B Enterprises: A Methodological Framework. SSRN Working Paper. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6456999

📋 Requirements

  • Python ≥ 3.8
  • pandas ≥ 1.0.0
  • networkx ≥ 2.5
  • numpy ≥ 1.19.0

🤝 Contributing

Built for RevOps analysts, data scientists, and VPs of Revenue Operations scaling go-to-market strategies. Contributions, issues, and pull requests are welcome!


📄 License

MIT License — see LICENSE for details.
