Cross-backend binscatter plots.
Dataframe agnostic binscatter plots
TL;DR: Fast binscatter plots for all kinds of dataframes.
- Built on the narwhals dataframe abstraction, so pandas, Polars, DuckDB, Dask, and PySpark inputs all work out of the box.
- All other Narwhals backends fall back to a generic quantile handler if a native path is unavailable
- Lightweight: few dependencies
- Just works: by default, the number of bins is picked automatically via the rule-of-thumb selector from Cattaneo et al. (2024), so no manual tuning is needed
- Efficiently avoids materializing large intermediate datasets
- Optional polynomial regression overlay computed directly from the raw data (and any controls) for quick visual comparison
- Uses plotly as graphics backend because (1) it's great and (2) it uses narwhals as well, minimizing dependencies
- Pythonic alternative to the excellent binsreg package
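To illustrate the point about not materializing large intermediate datasets, here is a minimal standard-library sketch of a single-pass aggregation. It is not the package's implementation: the function name `streaming_bin_means` and the assumption that the bin edges are already known are both hypothetical. The sketch only shows that per-bin means need nothing more than a few running totals per bin.

```python
from bisect import bisect_right


def streaming_bin_means(rows, edges):
    """Aggregate (x, y) rows into per-bin means in a single pass.

    `edges` are the interior bin boundaries in ascending order, so
    len(edges) + 1 bins are produced. Only three running totals per bin
    are kept in memory, regardless of how many rows are consumed.
    """
    num_bins = len(edges) + 1
    counts = [0] * num_bins
    sum_x = [0.0] * num_bins
    sum_y = [0.0] * num_bins

    for x, y in rows:
        b = bisect_right(edges, x)  # index of the bin that contains x
        counts[b] += 1
        sum_x[b] += x
        sum_y[b] += y

    # Skip empty bins to avoid dividing by zero.
    return [
        (sum_x[b] / counts[b], sum_y[b] / counts[b])
        for b in range(num_bins)
        if counts[b]
    ]


# A generator is used on purpose: the rows are consumed once and never stored.
rows = ((x, 2 * x + 1) for x in range(10))
result = streaming_bin_means(rows, edges=[3, 6])
print(result)
```

With a real dataframe backend, the same idea would typically be expressed as a group-by aggregation that the engine evaluates lazily.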
What are binscatter plots?
Binscatter plots group the x-axis into bins and plot average outcomes for each bin, giving a cleaner view of the relationship between two variables—possibly controlling for confounders. They show an estimate of the conditional mean, rather than all the underlying data as in a classical scatter plot.
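The core idea can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the package's code: the helper `binned_means` is a hypothetical name, it uses equal-count (quantile) bins, and it ignores controls entirely.

```python
from statistics import fmean


def binned_means(x, y, num_bins):
    """Return one (mean_x, mean_y) point per equal-count bin of x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 1 <= num_bins <= len(x):
        raise ValueError("num_bins must be between 1 and len(x)")

    # Sort observations by x so that consecutive slices are quantile bins.
    pairs = sorted(zip(x, y))
    n = len(pairs)

    points = []
    for b in range(num_bins):
        # Integer arithmetic spreads any remainder evenly across the bins.
        lo = b * n // num_bins
        hi = (b + 1) * n // num_bins
        chunk = pairs[lo:hi]
        points.append(
            (fmean([p[0] for p in chunk]), fmean([p[1] for p in chunk]))
        )
    return points


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 5, 9, 10, 14, 13, 17]
points = binned_means(xs, ys, num_bins=4)
print(points)
```

Plotting `points` instead of the eight raw observations gives four markers, one per bin, that trace the estimated conditional mean of y given x.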
Installation
pip install binscatter
Example
A binscatter plot showing patenting activity against the 3-year net of tax rate controlling for several state-level covariates.
See code below:
from binscatter import binscatter

# df: any narwhals-supported dataframe (pandas, Polars, ...) holding the columns below
binscatter(
    df,
    "mtr90_lag3",
    "lnpat",
    controls=[
        "top_corp_lag3",
        "real_gdp_pc",
        "population_density",
        "rd_credit_lag3",
        "statenum",
        "year",
    ],
    # num_bins="rule-of-thumb",  # optional: let the selector choose the bin count
    # return_type="native",  # optional: get the aggregated dataframe instead of a Plotly figure
    # poly_line=2,  # optional: overlay a degree-2 polynomial fit using the raw data plus controls
).update_layout(  # binscatter returns a Plotly figure, so you can tweak labels, colors, etc.
    xaxis_title="Log net of tax rate := log(1 - tax rate)",
    yaxis_title="Log number of patents",
)
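The `poly_line` option in the example overlays a polynomial fit computed from the raw data. As a rough illustration of what a degree-2 least-squares fit involves, here is a standard-library sketch. It is an assumption-laden simplification, not the package's implementation: `polyfit` is a hypothetical helper, it handles no controls, and it solves the normal equations directly, which is fine for small degrees but not numerically robust in general.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    k = degree + 1
    # Normal equations: (X'X) beta = X'y with X[i][j] = x_i ** j.
    a = [[sum(xi ** (r + c) for xi in x) for c in range(k)] for r in range(k)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(k)]

    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more distinct x values")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    beta = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(a[r][c] * beta[c] for c in range(r + 1, k))
        beta[r] = (b[r] - tail) / a[r][r]
    return beta


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * v + 3 * v * v for v in xs]  # exact quadratic, no noise
coefs = polyfit(xs, ys, degree=2)
print(coefs)  # approximately [1.0, 2.0, 3.0]
```

Because the fit uses every raw observation rather than the binned points, it serves as an independent visual check on the shape suggested by the bins.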
This is what a classical scatter plot of the same data looks like; it is clearly much noisier:
This package implements binscatter plots following:
- Cattaneo, Matias D.; Crump, Richard K.; Farrell, Max H.; Feng, Yingjie (2024), “On Binscatter,” American Economic Review, 114(5), 1488–1514. DOI: 10.1257/aer.20221576
Data for the example originates from:
- Akcigit, Ufuk; Grigsby, John; Nicholas, Tom; Stantcheva, Stefanie (2021), “Replication Data for: ‘Taxation and Innovation in the 20th Century’,” Harvard Dataverse, V1. DOI: 10.7910/DVN/SR410I
Tests
- Run the full backend matrix, including PySpark: just test
- Use the faster run without PySpark: just ftest