
A package for Regression in compressed representation powered by DuckDB


duckreg: very fast out-of-memory regressions with DuckDB

duckreg is a Python package for running stratified/saturated regressions out-of-memory with DuckDB. It wraps the duckdb package and provides a simple interface for running regressions on very large datasets that do not fit in memory: it reduces the data to a set of summary statistics and then runs weighted least squares with frequency weights on the compressed data. Robust standard errors are computed from sufficient statistics, while clustered standard errors are computed using the cluster bootstrap.
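To make the compression idea concrete, here is a minimal sketch in plain NumPy (not duckreg's own API, and not its internal code). It assumes a toy dataset with two discrete regressors; all variable names are illustrative. The data is reduced to one row per unique regressor combination, holding a count and a sum of y, and weighted least squares on the stratum means with the counts as frequency weights recovers the same coefficients as OLS on the full data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy data (assumption): two discrete regressors, so there are few unique strata.
x1 = rng.integers(0, 2, size=n)
x2 = rng.integers(0, 3, size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

# Uncompressed OLS on all n rows.
X = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Compress: one row per unique regressor combination, with a count and sum of y.
# In SQL this is roughly: SELECT x1, x2, COUNT(*), SUM(y) FROM t GROUP BY x1, x2
strata, inverse = np.unique(X, axis=0, return_inverse=True)
inverse = inverse.ravel()
counts = np.bincount(inverse)
sum_y = np.bincount(inverse, weights=y)
mean_y = sum_y / counts

# Weighted least squares on stratum means, using counts as frequency weights.
# Scaling rows by sqrt(weight) turns WLS into an ordinary least-squares problem.
sw = np.sqrt(counts)
beta_compressed, *_ = np.linalg.lstsq(strata * sw[:, None], mean_y * sw, rcond=None)

print(len(strata), "compressed rows instead of", n)
print(np.allclose(beta_full, beta_compressed))
```

The equality is exact rather than approximate because the normal equations depend on the data only through the per-stratum counts and sums of y.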

See examples in notebooks/introduction.ipynb.

  • install with
      pip install duckreg
  • dev install (preferably in a venv) with
      (uv) pip install git+https://github.com/apoorvalal/duckreg.git

    or git clone this repository and install in editable mode.


Currently supports the following regression specifications:

  1. DuckRegression: general linear regression, which compresses the data to y averages stratified by all unique values of the x variables
  2. DuckMundlak: One- or Two-Way Mundlak regression, which compresses the data to the following RHS and avoids the need to incorporate unit (and time) fixed effects (FEs)

$$ y \sim 1, w, \bar{w}_{i, .}, \bar{w}_{., t} $$

  3. DuckDoubleDemeaning: Double demeaning regression, which compresses the data to y averages by all values of $w$ after demeaning. This also eliminates unit and time FEs

$$ y \sim (w_{it} - \bar{w}_{i, .} - \bar{w}_{., t} + \bar{w}_{., .}) $$

  4. DuckMundlakEventStudy: Two-way Mundlak with dynamic treatment effects. This incorporates treatment-cohort FEs ($\psi_i$), time-period FEs ($\gamma_t$) and dynamic treatment effects $\tau_k$ given by cohort × time interactions.

$$ y \sim \psi_i + \gamma_t + \sum_{k=1}^{T} \tau_{k} D_i 1(t = k) $$
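As a rough illustration of the event-study specification, the sketch below uses plain NumPy (not duckreg's API) with a single treated cohort and a never-treated group. One assumption goes beyond the formula above: period 1 is taken as the reference period and its interaction is omitted, since interacting $D_i$ with every period would be collinear with the cohort effect. Because every regressor is a function of cohort and period only, the data compresses to cohort-by-period cell means, and each $\tau_k$ equals a difference-in-differences of cell means relative to period 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, T = 200, 5

# Toy panel (assumption): half the units form one treated cohort, D_i = 1.
treated = np.arange(n_units) % 2
D = np.repeat(treated, T)                    # long format, unit-major
t = np.tile(np.arange(1, T + 1), n_units)    # periods 1..T
y = 0.5 * D + 0.1 * t + 1.0 * D * (t >= 3) + rng.normal(size=n_units * T)

# Compress to 2*T cohort-by-period cells: count and mean of y per cell.
cell = D * T + (t - 1)
counts = np.bincount(cell, minlength=2 * T)
mean_y = np.bincount(cell, weights=y, minlength=2 * T) / counts

# Design matrix on the compressed cells:
# intercept, cohort dummy, period dummies (k = 2..T), cohort x period interactions.
d_cell = np.repeat([0, 1], T)
t_cell = np.tile(np.arange(1, T + 1), 2)
time_dummies = (t_cell[:, None] == np.arange(2, T + 1)).astype(float)
X = np.column_stack(
    [np.ones(2 * T), d_cell, time_dummies, d_cell[:, None] * time_dummies]
)

# Frequency-weighted least squares on the cell means.
sw = np.sqrt(counts)
beta, *_ = np.linalg.lstsq(X * sw[:, None], mean_y * sw, rcond=None)
tau = beta[-(T - 1):]                        # dynamic effects for k = 2..T

# Cross-check: each tau_k is a difference-in-differences against period 1.
m = mean_y.reshape(2, T)
did = (m[1, 1:] - m[0, 1:]) - (m[1, 0] - m[0, 0])
print(np.allclose(tau, did))
```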

All the above regressions are run in compressed fashion with duckdb.
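The claim that the Mundlak and double-demeaning specifications remove the need for unit and time fixed effects can be checked numerically. The following is a minimal sketch in plain NumPy, assuming a balanced toy panel with a single binary regressor $w$; it does not use duckreg itself, and the helper `group_mean` is a hypothetical name introduced here. In this setting both specifications give the same coefficient on $w$ as a regression with explicit unit and time dummies.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, T = 50, 6

# Balanced toy panel (assumption) in long format, unit-major.
unit = np.repeat(np.arange(n_units), T)
time = np.tile(np.arange(T), n_units)
w = rng.integers(0, 2, size=n_units * T).astype(float)
unit_fe = rng.normal(size=n_units)[unit]
time_fe = rng.normal(size=T)[time]
y = 2.0 * w + unit_fe + time_fe + rng.normal(size=n_units * T)


def group_mean(v, g):
    """Mean of v within each group of g, broadcast back to every row."""
    return (np.bincount(g, weights=v) / np.bincount(g))[g]


w_i = group_mean(w, unit)   # \bar{w}_{i, .}
w_t = group_mean(w, time)   # \bar{w}_{., t}

# Mundlak: y ~ 1, w, unit mean of w, time mean of w (no dummies).
X_m = np.column_stack([np.ones_like(w), w, w_i, w_t])
beta_mundlak = np.linalg.lstsq(X_m, y, rcond=None)[0][1]

# Double demeaning: y ~ (w - unit mean - time mean + grand mean).
w_dd = w - w_i - w_t + w.mean()
beta_dd = (w_dd @ y) / (w_dd @ w_dd)

# Benchmark: two-way fixed effects with explicit unit and time dummies.
U = (unit[:, None] == np.arange(n_units)).astype(float)
P = (time[:, None] == np.arange(1, T)).astype(float)  # first period omitted
X_fe = np.column_stack([w, U, P])
beta_fe = np.linalg.lstsq(X_fe, y, rcond=None)[0][0]

print(beta_mundlak, beta_dd, beta_fe)
```

The benchmark needs 56 columns here, while the Mundlak design needs four and the double-demeaned design one; that reduction is what lets the regressors be grouped into a small number of unique strata.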


