
Cross-backend binscatter plots.


Dataframe-agnostic binscatter plots

TL;DR: Fast binscatter plots for all kinds of dataframes.

  • Built on the Narwhals dataframe abstraction, so pandas, Polars, DuckDB, Dask, and PySpark inputs all work out of the box.
    • All other Narwhals backends fall back to a generic quantile handler if a native path is unavailable
  • Lightweight: few dependencies
  • Just works: by default, the number of bins is picked automatically via the rule-of-thumb selector from Cattaneo et al. (2024), so no manual tuning is needed
  • Efficiently avoids materializing large intermediate datasets
  • Optional polynomial regression overlay computed directly from the raw data (and any controls) for quick visual comparison
  • Uses Plotly as the graphics backend because (1) it's great and (2) it also uses Narwhals, which keeps dependencies to a minimum
  • Pythonic alternative to the excellent binsreg package

What are binscatter plots?

Binscatter plots group the x-axis into bins and plot the average outcome for each bin, giving a cleaner view of the relationship between two variables, possibly controlling for confounders. They show an estimate of the conditional mean rather than all the underlying data, as a classical scatter plot does.
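
To make the idea concrete, here is a minimal plain-Python sketch of the binning step: sort by x, cut the observations into equal-count (quantile) bins, and average x and y within each bin. The helper name binscatter_points and the toy data are made up for illustration. This is not the package's implementation, and it ignores controls and automatic bin selection.

```python
from statistics import fmean


def binscatter_points(x, y, num_bins):
    """Return one (mean_x, mean_y) pair per equal-count bin of x.

    Hypothetical helper for illustration only; not part of the binscatter API.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 1 <= num_bins <= len(x):
        raise ValueError("num_bins must be between 1 and len(x)")

    # Sort observations by x so that consecutive slices are quantile bins.
    pairs = sorted(zip(x, y))
    n = len(pairs)

    points = []
    for b in range(num_bins):
        # Integer arithmetic spreads any remainder evenly across the bins.
        start = b * n // num_bins
        stop = (b + 1) * n // num_bins
        chunk = pairs[start:stop]
        mean_x = fmean([p[0] for p in chunk])
        mean_y = fmean([p[1] for p in chunk])
        points.append((mean_x, mean_y))
    return points


# Toy data: y = 2x plus an alternating +1 / -1 disturbance.
xs = list(range(100))
ys = [2 * v + (1 if v % 2 == 0 else -1) for v in xs]

points = binscatter_points(xs, ys, num_bins=5)
for mean_x, mean_y in points:
    print(f"{mean_x:6.1f} {mean_y:7.1f}")
```

Plotting these five points instead of all 100 observations averages the disturbance out within each bin, which is why a binscatter looks much cleaner than the raw scatter.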

Installation

pip install binscatter

Example

A binscatter plot showing patenting activity against the 3-year net of tax rate controlling for several state-level covariates.

[Figure: scatter and binscatter]

See code below:

from binscatter import binscatter

binscatter(
    df,
    "mtr90_lag3",
    "lnpat",
    controls=[
        "top_corp_lag3",
        "real_gdp_pc",
        "population_density",
        "rd_credit_lag3",
        "statenum",
        "year",
    ],
    # num_bins="rule-of-thumb",  # optional: let the selector choose the bin count
    # return_type="native",  # optional: get the aggregated dataframe instead of a Plotly figure
    # poly_line=2,  # optional: overlay a degree-2 polynomial fit using the raw data plus controls
).update_layout(  # binscatter returns a Plotly figure, so you can tweak labels, colors, etc.
    xaxis_title="Log net of tax rate := log(1 - tax rate)",
    yaxis_title="Log number of patents",
)

This is what a classical scatter plot of the same data looks like, clearly showing a lot of noise:

[Figure: scatter]

This package implements binscatter plots following:

  • Cattaneo, Matias D.; Crump, Richard K.; Farrell, Max H.; Feng, Yingjie (2024), “On Binscatter,” American Economic Review, 114(5), 1488–1514. DOI: 10.1257/aer.20221576

Data for the example originates from:

  • Akcigit, Ufuk; Grigsby, John; Nicholas, Tom; Stantcheva, Stefanie (2021), “Replication Data for: ‘Taxation and Innovation in the 20th Century’,” Harvard Dataverse, V1. DOI: 10.7910/DVN/SR410I

Tests

  • Run the full backend matrix, including PySpark: just test
  • Use the faster run without PySpark: just ftest
