Reproducible regression workflow: loaders → dependency tracking → codegen → execution.

Regression Monkey

Regression Monkey is a reproducible regression workflow for empirical research. It connects structured data loading, dependency-aware refreshing, templated code generation, batch execution, and a Textual-based TUI for curating final tables. The goal is to replace ad-hoc notebooks with a traceable, automation-friendly stack. (A Chinese introduction is available in README_zh.md.)

Highlights

Deterministic data refresh – DataLoader modules produce single artifacts; DataManager tracks ArcticDB/PKL/DataLoader sources, semantic hashes, and dependency propagation.
Task-centric modeling – StandardRegTask captures Y/X/control/fixed-effect specs, cluster options, incremental controls, and classification filters; tasks serialize cleanly and carry fingerprints for auditing.
Code generation and execution – CodeGenerator renders Jinja2 templates (currently R) with dependency injection; CodeExecutor orchestrates task trees via rpy2, wiring datasets and capturing normalized results (including stepwise regressions).
Table editing TUI – Textual UI lets you search tasks, attach columns (including stepwise variants), reorder/rename columns, and export reproducibility bundles (main.R, datasets).
International-ready messaging – All runtime prompts, logs, and TUI notifications are in English for cross-team collaboration.
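To make "dependency propagation" concrete, here is a minimal sketch using only the Python standard library. It is not DataManager's actual implementation: the `DEPENDENCIES` map and the `stale_artifacts` helper are hypothetical names invented for illustration. The idea is that when one source changes, every artifact built from it, directly or indirectly, is marked for refresh in a safe rebuild order.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: artifact -> artifacts it is built from.
DEPENDENCIES = {
    "users": set(),
    "firms": set(),
    "panel": {"users", "firms"},
    "regression_sample": {"panel"},
}


def stale_artifacts(dependencies, changed):
    """Return the artifacts that need a refresh, in a safe rebuild order."""
    stale = set(changed)
    # static_order() yields each node after all of its dependencies,
    # so a single pass is enough to push staleness downstream.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        if dependencies.get(name, set()) & stale:
            stale.add(name)
    return [name for name in order if name in stale]


print(stale_artifacts(DEPENDENCIES, {"users"}))
# ['users', 'panel', 'regression_sample']
```

Note that `firms` is left alone: it does not depend on `users`, so it is not rebuilt.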

Components at a Glance

Component	Purpose
`DataLoader`	Minimal class for defining `clean_data()` → DataFrame/PKL/Arctic output with declared dependencies.
`DataManager`	Orchestrates multi-source loading (Arctic ↔ DataLoader ↔ PKL), semantic fingerprinting, cost-aware refresh decisions, and caching.
`StandardRegTask`	Declarative regression spec with serialization, subset filters, incremental controls, and acceptance tests.
`CodeGenerator`	Jinja2 macro toolkit that emits R code (OLS/FE/RE, stepwise, etc.) and dependency stubs.
`CodeExecutor`	rpy2-based runner that feeds datasets, executes generated code, captures `python_output`, and records stepwise metadata.
`Planner`	Builds task trees (sections/nodes) and coordinates downstream rendering/execution.
`tui/*`	Textual UI for browsing tasks, selecting columns, editing tables, and exporting reproducibility bundles.
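The fingerprinting mentioned for tasks can be pictured as hashing a canonical serialization of the spec. The sketch below is an assumption about the general technique, not the library's code: `TaskSpec` and `fingerprint` are hypothetical stand-ins, and the real `StandardRegTask` may serialize and hash differently.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical stand-in for a regression spec (not StandardRegTask)."""
    name: str
    dataset: str
    y: str
    X: tuple = ()
    controls: tuple = ()
    model: str = "OLS"


def fingerprint(spec):
    """Hash a canonical JSON form so that equal specs hash equally."""
    # sort_keys and fixed separators remove incidental formatting differences.
    canonical = json.dumps(asdict(spec), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = TaskSpec("baseline", "users", "y", ("treatment",), ("size", "age"))
print(fingerprint(base)[:12])
```

The order of `controls` is deliberately left as given, since reordering controls could change what a stepwise run produces.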

Installation

Requires Python 3.14+.

pip install regression_monkey

For development extras (testing, linting, packaging):

pip install "regression_monkey[dev]"

External Requirements

R runtime if you plan to execute generated R code.
rpy2 is installed automatically on non-Windows platforms (you can install it manually on Windows if R is available).
ArcticDB requires system dependencies compatible with LMDB.
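Before running generated R code, it can help to check that the external pieces are present. The helper below is a hypothetical convenience, not part of Regression Monkey. It only tests whether an `Rscript` executable is on the PATH and whether `rpy2` can be imported, which is a rough proxy for a working R setup.

```python
import importlib.util
import shutil


def check_r_environment():
    """Report whether an R executable and the rpy2 package can be found."""
    return {
        "Rscript_on_path": shutil.which("Rscript") is not None,
        "rpy2_importable": importlib.util.find_spec("rpy2") is not None,
    }


for item, ok in check_r_environment().items():
    print(f"{item}: {'ok' if ok else 'missing'}")
```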

Quick Start

1. Define a DataLoader

# data_loader/users.py
from reg_monkey.data_loader import DataLoader
import pandas as pd

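class UsersLoader(DataLoader):
    # File name of the single artifact this loader produces.
    output_pkl_name = "users.pkl"

    def clean_data(self):
        # Read the raw export, drop rows without a firm identifier,
        # and rename the timestamp column.
        df = pd.read_csv("source_data/users_raw.csv")
        df = df.dropna(subset=["firm_id"]).rename(columns={"signup_time": "ts"})
        self.df = df
        return df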

2. Refresh/load datasets

from reg_monkey.data_manager import DataManager

dm = DataManager(target_symbols=["users"], project_root=".")
df_users = dm.get("users")  # hits Arctic/Pickle/DataLoader according to priority
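The comment about hitting sources "according to priority" can be read as a first-hit lookup over an ordered list of sources. The sketch below illustrates that reading with plain dictionaries. The `resolve` function, the in-memory stores, and the order Arctic, then pickle, then loader are all assumptions for illustration, not DataManager's documented behavior.

```python
def resolve(symbol, sources):
    """Try each (label, fetch) pair in order and return the first hit.

    `fetch` returns the dataset, or None when that source has no copy.
    """
    for label, fetch in sources:
        data = fetch(symbol)
        if data is not None:
            return label, data
    raise KeyError(f"no source could provide {symbol!r}")


# Hypothetical in-memory stand-ins for the three kinds of source.
arctic_store = {}
pickle_cache = {"users": [{"firm_id": 1}]}
loaders = {"users": lambda: [{"firm_id": 1}, {"firm_id": 2}]}

sources = [
    ("arctic", arctic_store.get),
    ("pickle", pickle_cache.get),
    ("loader", lambda s: loaders[s]() if s in loaders else None),
]

print(resolve("users", sources)[0])  # pickle
```

Here the Arctic stand-in is empty, so the cached pickle wins and the loader is never called.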

3. Describe a regression task

from reg_monkey.task_obj import StandardRegTask
from reg_monkey.code_generator import CodeGenerator

task = StandardRegTask(
    name="baseline",
    dataset="users",
    y="y",
    X=["treatment"],
    controls=["size", "age"],
    category_controls=["industry", "year"],
    model="OLS",
    incremental_controls=True,
)

cg = CodeGenerator(task)
segments = cg.assembly(internal_output=True)
print(segments["combined"])  # rendered R script
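One plausible reading of `incremental_controls=True` is that the task is estimated several times, adding one control per step. The sketch below shows which R-style formulas such a stepwise run would cover. The `incremental_formulas` helper is hypothetical, and the actual templates may order or group controls differently.

```python
def incremental_formulas(y, X, controls):
    """Build one R-style formula per step, adding one control at a time."""
    formulas = []
    for n in range(len(controls) + 1):
        rhs = list(X) + list(controls[:n])
        formulas.append(f"{y} ~ " + " + ".join(rhs))
    return formulas


for formula in incremental_formulas("y", ["treatment"], ["size", "age"]):
    print(formula)
# y ~ treatment
# y ~ treatment + size
# y ~ treatment + size + age
```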

4. Execute and inspect results

from reg_monkey.code_executor import CodeExecutor

executor = CodeExecutor(plan=None, datasets={"users": df_users})
# run_single_task is a custom helper you implement yourself
executor.run_single_task(task, segments["combined"])
print(task.exec_result["forward_res"]["coefficients"].head())

5. Launch the TUI

from reg_monkey.tui import run_app

run_app(code_executor=executor, config_path="output_mapping.json")

Use the TUI (Table List → Table Editor → Result Browser) to add columns, toggle stepwise results, rename labels, and export reproducibility bundles (main.R + datasets + metadata).

Reproducibility Exports

ExportService bundles:

main.R with dependency installation, dataset loading, preparation sections, and regression execution (deduplicated by code hash).
Feather/CSV datasets referenced in tables.
Stepwise columns honoring user selections (enable columns via TUI and choose steps in the modal).
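Deduplication "by code hash" can be sketched as keeping the first copy of each code section whose digest has not been seen yet. This is an illustration of the general technique under assumed names (`dedupe_by_hash` is hypothetical). How ExportService normalizes code before hashing is not stated in this README.

```python
import hashlib


def dedupe_by_hash(snippets):
    """Keep the first occurrence of each snippet, comparing by SHA-256."""
    seen = set()
    unique = []
    for code in snippets:
        # Strip surrounding whitespace so trivially different copies match.
        digest = hashlib.sha256(code.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(code)
    return unique


sections = ["library(fixest)", "df <- read.csv('users.csv')", "library(fixest)\n"]
print(len(dedupe_by_hash(sections)))  # 2
```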

Project Layout

src/
  reg_monkey/
    data_loader.py
    data_manager.py
    task_obj.py
    code_generator.py
    code_executor.py
    planner.py
    export_service.py
    tui/
    r_template.jinja
  prd/        # design docs (Chinese allowed)
  bk/         # backups / historical references

Development Tips

Run pytest for unit tests; TUI flows are best verified manually.
Use ruff + black for lint/format.
When touching the TUI, ensure output_mapping.json remains backward compatible (columns carry controls, parent_task_id, etc.).
All user-facing text must remain in English.
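One common way to keep a JSON config backward compatible is to fill in defaults for keys that older files lack. The sketch below assumes a top-level `"columns"` list and default values for `controls` and `parent_task_id`. The real schema of output_mapping.json is not documented here, so treat every key layout and default below as hypothetical.

```python
import copy
import json

# Assumed defaults for keys that older files may lack (hypothetical schema).
COLUMN_DEFAULTS = {"controls": [], "parent_task_id": None}


def load_columns(text):
    """Parse a mapping file's text and fill in missing column keys."""
    raw = json.loads(text)
    # deepcopy gives each column its own default list instead of a shared one.
    return [{**copy.deepcopy(COLUMN_DEFAULTS), **col} for col in raw.get("columns", [])]


old_file = '{"columns": [{"label": "(1)"}]}'
print(load_columns(old_file))
# [{'controls': [], 'parent_task_id': None, 'label': '(1)'}]
```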

Contributing

Pull requests are welcome. Please include:

A clear description of the change.
Tests or manual verification steps for regression-critical paths.
Documentation updates if behavior changes.

License

MIT

Release history

0.2.1 (this version): Feb 28, 2026
0.2.0: Feb 26, 2026
0.1.2: Jan 16, 2026
0.1.1: Jan 15, 2026
0.1.0: Oct 20, 2025

Download files

Source Distribution

regression_monkey-0.2.1.tar.gz (134.7 kB)

Uploaded Feb 28, 2026 Source

Built Distribution

regression_monkey-0.2.1-py3-none-any.whl (147.6 kB)

Uploaded Feb 28, 2026 Python 3

File details

Details for the file regression_monkey-0.2.1.tar.gz.

File metadata

Download URL: regression_monkey-0.2.1.tar.gz
Upload date: Feb 28, 2026
Size: 134.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for regression_monkey-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`a96e838272e8fcf7ee610b09b95d36c84a5de3d3522db16610eb0603a1604666`
MD5	`0ed709cc0cfa49fec18f0f096032579e`
BLAKE2b-256	`12d496c3c0879e4f3bfd2eeeae4e8aaab460d6e2aed0c494c678f3e309174ed4`
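A downloaded file can be checked against the SHA256 digest listed above with the standard library alone. The `sha256_of` helper below is a generic sketch, not something shipped with the package. It reads the file in chunks, so large archives do not need to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the result for the downloaded `regression_monkey-0.2.1.tar.gz` with the SHA256 value in the table. A mismatch means the file differs from the published one.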

Provenance

The following attestation bundles were made for regression_monkey-0.2.1.tar.gz:

Publisher: publish.yml on guanzd88/regression_monkey

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: regression_monkey-0.2.1.tar.gz
- Subject digest: a96e838272e8fcf7ee610b09b95d36c84a5de3d3522db16610eb0603a1604666
- Sigstore transparency entry: 1004937865
- Sigstore integration time: Feb 28, 2026
Source repository:
- Permalink: guanzd88/regression_monkey@8e1765cc9bd79a0f7596703aa2dab07d65ec01eb
- Branch / Tag: refs/tags/0.2.1
- Owner: https://github.com/guanzd88
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8e1765cc9bd79a0f7596703aa2dab07d65ec01eb
- Trigger Event: release

File details

Details for the file regression_monkey-0.2.1-py3-none-any.whl.

File metadata

Download URL: regression_monkey-0.2.1-py3-none-any.whl
Upload date: Feb 28, 2026
Size: 147.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for regression_monkey-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7380c185dbe12407f6ac348d8c35813ac8e4b06833894161f3ed6cad329adc5f`
MD5	`a9765c05cdcc5d853c3dea9e9e08b761`
BLAKE2b-256	`1392a55a046bee18866ea52cc289d20f5e72375fbd1b3a163e62748e10801403`

Provenance

The following attestation bundles were made for regression_monkey-0.2.1-py3-none-any.whl:

Publisher: publish.yml on guanzd88/regression_monkey

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: regression_monkey-0.2.1-py3-none-any.whl
- Subject digest: 7380c185dbe12407f6ac348d8c35813ac8e4b06833894161f3ed6cad329adc5f
- Sigstore transparency entry: 1004937866
- Sigstore integration time: Feb 28, 2026
Source repository:
- Permalink: guanzd88/regression_monkey@8e1765cc9bd79a0f7596703aa2dab07d65ec01eb
- Branch / Tag: refs/tags/0.2.1
- Owner: https://github.com/guanzd88
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8e1765cc9bd79a0f7596703aa2dab07d65ec01eb
- Trigger Event: release
