Reproducible regression workflow: loaders → dependency tracking → codegen → execution.
Regression Monkey
Regression Monkey is a reproducible regression workflow for empirical research. It connects structured data loading, dependency-aware refreshing, templated code generation, batch execution, and a Textual-based TUI for curating final tables. The goal is to replace ad-hoc notebooks with a traceable, automation-friendly stack. (For a Chinese introduction, see README_zh.md.)
Highlights
- Deterministic data refresh – DataLoader modules produce single artifacts; DataManager tracks ArcticDB/PKL/DataLoader sources, semantic hashes, and dependency propagation.
- Task-centric modeling – StandardRegTask captures Y/X/control/fixed-effect specs, cluster options, incremental controls, and classification filters; tasks serialize cleanly and carry fingerprints for auditing.
- Code generation and execution – CodeGenerator renders Jinja2 templates (currently R) with dependency injection; CodeExecutor orchestrates task trees via rpy2, wiring datasets and capturing normalized results (including stepwise regressions).
- Table editing TUI – Textual UI lets you search tasks, attach columns (including stepwise variants), reorder/rename columns, and export reproducibility bundles (main.R, datasets).
- International-ready messaging – All runtime prompts, logs, and TUI notifications are in English for cross-team collaboration.
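The dependency propagation mentioned under deterministic data refresh can be pictured as a graph walk: when one source changes, every dataset built from it, directly or indirectly, needs refreshing. The following is a minimal sketch of that idea only, not DataManager's actual implementation; the function name `stale_after_change`, the shape of the `dependencies` mapping, and the dataset names are all hypothetical.

```python
from collections import deque


def stale_after_change(dependencies, changed):
    """Return every dataset that must be refreshed when `changed` is updated.

    `dependencies` maps a dataset name to the names it is built from
    (a hypothetical structure chosen for illustration).
    """
    # Invert the mapping: for each upstream, which datasets consume it?
    consumers = {}
    for name, upstreams in dependencies.items():
        for upstream in upstreams:
            consumers.setdefault(upstream, set()).add(name)

    # Breadth-first walk from the changed dataset through its consumers.
    stale = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, ()):
            if consumer not in stale:
                stale.add(consumer)
                queue.append(consumer)
    return stale


deps = {
    "users": [],
    "firms": [],
    "panel": ["users", "firms"],
    "panel_winsorized": ["panel"],
}
print(sorted(stale_after_change(deps, "users")))
# prints ['panel', 'panel_winsorized', 'users']
```

A leaf dataset such as `panel_winsorized` only invalidates itself, which is what keeps refreshes narrow.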
Components at a Glance
| Component | Purpose |
|---|---|
| DataLoader | Minimal class for defining clean_data() → DataFrame/PKL/Arctic output with declared dependencies. |
| DataManager | Orchestrates multi-source loading (Arctic ↔ DataLoader ↔ PKL), semantic fingerprinting, cost-aware refresh decisions, and caching. |
| StandardRegTask | Declarative regression spec with serialization, subset filters, incremental controls, and acceptance tests. |
| CodeGenerator | Jinja2 macro toolkit that emits R code (OLS/FE/RE, stepwise, etc.) and dependency stubs. |
| CodeExecutor | rpy2-based runner that feeds datasets, executes generated code, captures python_output, and records stepwise metadata. |
| Planner | Builds task trees (sections/nodes) and coordinates downstream rendering/execution. |
| tui/* | Textual UI for browsing tasks, selecting columns, editing tables, and exporting reproducibility bundles. |
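Several components above rely on fingerprints of a declarative spec. One common way to get a stable fingerprint is to serialize the spec canonically and hash the bytes. The sketch below illustrates that approach under stated assumptions: `spec_fingerprint` and the plain-dict spec are hypothetical stand-ins, and nothing here is taken from how StandardRegTask actually computes its fingerprints.

```python
import hashlib
import json


def spec_fingerprint(spec):
    """Hash a regression spec so identical specs always get the same ID."""
    # sort_keys and fixed separators give a canonical byte representation,
    # so the order in which dict keys were written does not change the hash.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


spec_a = {"y": "y", "X": ["treatment"], "controls": ["size", "age"], "model": "OLS"}
spec_b = {"model": "OLS", "controls": ["size", "age"], "X": ["treatment"], "y": "y"}
spec_c = {**spec_a, "controls": ["size"]}

print(spec_fingerprint(spec_a) == spec_fingerprint(spec_b))  # True: same content
print(spec_fingerprint(spec_a) == spec_fingerprint(spec_c))  # False: controls differ
```

Note that list order still matters in this sketch: `["size", "age"]` and `["age", "size"]` hash differently, which is reasonable when control order drives stepwise output.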
Installation
Requires Python 3.14+.
pip install regression_monkey
For development extras (testing, linting, packaging):
pip install "regression_monkey[dev]"
External Requirements
- R runtime if you plan to execute generated R code.
- rpy2 is installed automatically on non-Windows platforms (you can install it manually on Windows if R is available).
- ArcticDB requires system dependencies compatible with LMDB.
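Before running generated R code, it can help to confirm that the external pieces listed above are present. This is a small standalone check using only the Python standard library; the helper names are hypothetical and are not part of the package.

```python
import importlib.util
import shutil


def r_runtime_available():
    """Return True if an Rscript executable can be found on PATH."""
    return shutil.which("Rscript") is not None


def rpy2_importable():
    """Return True if the rpy2 package is installed in this environment."""
    return importlib.util.find_spec("rpy2") is not None


print("R runtime found:", r_runtime_available())
print("rpy2 installed:", rpy2_importable())
```

Both checks only test for presence; they do not verify that the installed R and rpy2 versions are compatible with each other.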
Quick Start
1. Define a DataLoader
# data_loader/users.py
import pandas as pd

from reg_monkey.data_loader import DataLoader


class UsersLoader(DataLoader):
    # File name of the pickle artifact this loader produces.
    output_pkl_name = "users.pkl"

    def clean_data(self):
        df = pd.read_csv("source_data/users_raw.csv")
        # Drop rows without a firm ID and rename the timestamp column.
        df = df.dropna(subset=["firm_id"]).rename(columns={"signup_time": "ts"})
        self.df = df
        return df
2. Refresh/load datasets
from reg_monkey.data_manager import DataManager
dm = DataManager(target_symbols=["users"], project_root=".")
df_users = dm.get("users") # hits Arctic/Pickle/DataLoader according to priority
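Loading "according to priority" can be read as a fallback chain: try each source in order and return the first one that has the symbol. The sketch below shows that pattern in isolation. It is an assumption-laden illustration, not DataManager's logic: the function `first_available`, the dict-backed stand-in stores, and the specific order (Arctic, then pickle, then loader) are all hypothetical.

```python
def first_available(symbol, sources):
    """Try each (label, fetch) pair in priority order; return the first hit."""
    for label, fetch in sources:
        value = fetch(symbol)
        if value is not None:
            return label, value
    raise KeyError(f"no source could provide {symbol!r}")


# Stand-in sources backed by plain dicts (hypothetical; real backends
# would query ArcticDB, read a pickle file, or run a DataLoader).
arctic_store = {}
pickle_store = {"users": [{"firm_id": 1}]}
loader_store = {"users": [{"firm_id": 1}], "firms": [{"firm_id": 1}]}

sources = [
    ("arctic", arctic_store.get),
    ("pickle", pickle_store.get),
    ("loader", loader_store.get),
]

print(first_available("users", sources)[0])  # prints pickle
print(first_available("firms", sources)[0])  # prints loader
```

Keeping the chain as an ordered list makes the priority explicit and easy to reorder per project.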
3. Describe a regression task
from reg_monkey.task_obj import StandardRegTask
from reg_monkey.code_generator import CodeGenerator
task = StandardRegTask(
    name="baseline",
    dataset="users",
    y="y",
    X=["treatment"],
    controls=["size", "age"],
    category_controls=["industry", "year"],
    model="OLS",
    incremental_controls=True,
)
cg = CodeGenerator(task)
segments = cg.assembly(internal_output=True)
print(segments["combined"]) # rendered R script
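One plausible reading of `incremental_controls=True` is a stepwise sequence in which controls enter the model one at a time. The sketch below expands a spec into R-style formula strings under that assumption. It does not reproduce what CodeGenerator renders; `incremental_formulas` is a hypothetical helper written for this example.

```python
def incremental_formulas(y, x, controls):
    """Build one R-style formula per step, adding one control at a time."""
    formulas = []
    for step in range(len(controls) + 1):
        # Step 0 has no controls; step k includes the first k controls.
        rhs = list(x) + list(controls[:step])
        formulas.append(f"{y} ~ {' + '.join(rhs)}")
    return formulas


for formula in incremental_formulas("y", ["treatment"], ["size", "age"]):
    print(formula)
# prints:
# y ~ treatment
# y ~ treatment + size
# y ~ treatment + size + age
```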
4. Execute and inspect results
from reg_monkey.code_executor import CodeExecutor
executor = CodeExecutor(plan=None, datasets={"users": df_users})
executor.run_single_task(task, segments["combined"]) # custom helper you implement
print(task.exec_result["forward_res"]["coefficients"].head())
5. Launch the TUI
from reg_monkey.tui import run_app
run_app(code_executor=executor, config_path="output_mapping.json")
Use the TUI (Table List → Table Editor → Result Browser) to add columns, toggle stepwise results,
rename labels, and export reproducibility bundles (main.R + datasets + metadata).
Reproducibility Exports
ExportService bundles:
- main.R with dependency installation, dataset loading, preparation sections, and regression execution (deduplicated by code hash).
- Feather/CSV datasets referenced in tables.
- Stepwise columns honoring user selections (enable columns via TUI and choose steps in the modal).
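Deduplicating by code hash, as described for main.R above, can be sketched as: hash each code section, keep the first occurrence of each digest, and preserve order. The example below is a minimal illustration of that technique, not ExportService's implementation; `dedupe_by_hash` is a hypothetical name and the R snippets are made up for the demonstration.

```python
import hashlib


def dedupe_by_hash(snippets):
    """Keep the first occurrence of each snippet, comparing by SHA-256."""
    seen = set()
    unique = []
    for code in snippets:
        # Strip surrounding whitespace so a trailing newline does not
        # make two otherwise identical sections look different.
        digest = hashlib.sha256(code.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(code)
    return unique


sections = [
    "df <- read.csv('users.csv')",
    "m1 <- lm(y ~ treatment, data = df)",
    "df <- read.csv('users.csv')\n",
    "m2 <- lm(y ~ treatment + size, data = df)",
]
print(len(dedupe_by_hash(sections)))  # prints 3
```

Preserving first-seen order matters here because later sections may depend on objects created by earlier ones.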
Project Layout
src/
  reg_monkey/
    data_loader.py
    data_manager.py
    task_obj.py
    code_generator.py
    code_executor.py
    planner.py
    export_service.py
    tui/
    r_template.jinja
prd/  # design docs (Chinese allowed)
bk/   # backups / historical references
Development Tips
- Run pytest for unit tests; TUI flows are best verified manually.
- Use ruff + black for lint/format.
- When touching the TUI, ensure output_mapping.json remains backward compatible (columns carry controls, parent_task_id, etc.).
- All user-facing text must remain in English.
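A backward-compatibility check for output_mapping.json could be as simple as asserting that every column entry still carries the keys older files relied on. The sketch below shows one way to write such a check. The file layout it assumes (`{"tables": {name: {"columns": [...]}}}`) is a guess made for this example, and `missing_column_keys` is a hypothetical helper; only the key names `controls` and `parent_task_id` come from the tip above.

```python
import json

# Keys named in the development tips; extend as the format evolves.
REQUIRED_COLUMN_KEYS = {"controls", "parent_task_id"}


def missing_column_keys(mapping_text):
    """Return {(table, column index): missing keys} for a mapping document.

    Assumes a layout of {"tables": {name: {"columns": [{...}, ...]}}},
    which is a guess made for this example.
    """
    mapping = json.loads(mapping_text)
    problems = {}
    for table_name, table in mapping.get("tables", {}).items():
        for index, column in enumerate(table.get("columns", [])):
            missing = REQUIRED_COLUMN_KEYS - set(column)
            if missing:
                problems[(table_name, index)] = sorted(missing)
    return problems


sample = json.dumps({
    "tables": {
        "main": {
            "columns": [
                {"controls": ["size"], "parent_task_id": "t1"},
                {"controls": []},
            ]
        }
    }
})
print(missing_column_keys(sample))  # prints {('main', 1): ['parent_task_id']}
```

A check like this fits naturally into a pytest case that loads a fixture file saved from an earlier release.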
Contributing
Pull requests are welcome. Please include:
- A clear description of the change.
- Tests or manual verification steps for regression-critical paths.
- Documentation updates if behavior changes.
License
MIT