Reproducible regression workflow: loaders → dependency tracking → codegen → execution.

regmonkey

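An integrated pipeline for regression-based research (econometrics / empirical work): data loading → dependency tracking and recomputation → code generation (R/Stata/other) → execution and standardized output.

The goal is to turn one-off analysis scripts into reproducible, reusable, auditable workflows.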

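Features

DataLoader: a minimal base class for loading and cleaning data; each DataLoader corresponds to one artifact (for example a DataFrame, a PKL file, or an ArcticDB table).
DataManager: a unified scheduler for loading, recomputation, and persistence; supports ArcticDB, local PKL files, and dynamically imported DataLoaders, with semantic fingerprints and dependency propagation.
StandardRegTask: a standard regression task object that describes variables, model type, fixed effects, clustering, PSM, and so on; it can be serialized and fingerprinted.
CodeGenerator: a jinja2-based template renderer that turns a StandardRegTask into an R script (or a script in another language); it collects the required packages automatically and inserts the install/load section.
CodeExecutor: an orchestrator that executes the task tree (via rpy2 and similar bridges) and produces standardized results.
Planner: tree-structured task planning (chapters / tags / nodes) for organizing groups of regressions.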
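
The "serializable and fingerprintable" property of a task object can be illustrated with a minimal sketch. The RegTaskSketch class and its fields below are hypothetical stand-ins, not regmonkey's actual StandardRegTask; the sketch only assumes that a fingerprint is a hash of a canonical serialization.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RegTaskSketch:
    """Hypothetical stand-in for a regression task description."""
    task_id: str
    y: str
    X: tuple
    model: str = "OLS"
    fe: tuple = ()
    cluster: str = ""

    def to_json(self) -> str:
        # sort_keys gives a canonical form, so equal tasks serialize identically
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        # SHA-256 of the canonical JSON text
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


a = RegTaskSketch("T1", "y", ("x1", "x2"), fe=("industry", "year"))
b = RegTaskSketch("T1", "y", ("x1", "x2"), fe=("industry", "year"))
c = RegTaskSketch("T1", "y", ("x1", "x3"), fe=("industry", "year"))
print(a.fingerprint() == b.fingerprint())  # same specification, same fingerprint
print(a.fingerprint() == c.fingerprint())  # a changed regressor changes the fingerprint
```

Hashing a canonical serialization rather than the object itself keeps the fingerprint stable across processes and machines.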

Installation

Option 1: local install (development)

git clone <your-fork-or-path>/regmonkey
cd regmonkey
pip install -e ".[dev]"

Option 2: build and install from source

pip install build
python -m build
pip install dist/*.whl

Key dependencies: pandas, numpy, jinja2, rpy2, arcticdb (optional), pyyaml. See pyproject.toml for the full list.

Quick start

1) Define a DataLoader

from reg_monkey import DataLoader
import pandas as pd

class MyUsersLoader(DataLoader):
    output_pkl_name = "users.pkl"
    dependency = ["raw/users.csv"]

    def clean_data(self):
        df = pd.read_csv("raw/users.csv")
        # minimal cleaning …
        df = df.dropna(subset=["id"]).rename(columns={"signup_time":"ts"})
        self.df = df
        return df

2) Load or recompute with DataManager

from reg_monkey import DataManager
dm = DataManager(project_root=".", arctic_uri=None)   # arctic_uri may be None when ArcticDB is not used

# The first call dynamically imports and runs the DataLoader; later calls hit the cache / PKL / ArcticDB in priority order
users = dm.get("users.pkl", loader_module="my_loaders.users_loader")
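
The "run the loader once, then serve later calls from a cache" behaviour can be sketched with the standard library alone. This is not DataManager's implementation: the helper get_or_build, the lookup order (memory, then pickle, then loader), and the omission of ArcticDB are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def get_or_build(pkl_path: Path, build: Callable[[], Any], memo: dict) -> Any:
    """Return the artifact from memory, then from disk, else build and persist it."""
    key = str(pkl_path)
    if key in memo:                      # 1) in-memory cache
        return memo[key]
    if pkl_path.exists():                # 2) pickle on disk
        with pkl_path.open("rb") as fh:
            memo[key] = pickle.load(fh)
        return memo[key]
    value = build()                      # 3) run the loader
    pkl_path.parent.mkdir(parents=True, exist_ok=True)
    with pkl_path.open("wb") as fh:
        pickle.dump(value, fh)
    memo[key] = value
    return value


calls = []


def build_users():
    calls.append(1)                      # count how often the loader really runs
    return [{"id": 1}, {"id": 2}]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cache" / "users.pkl"
    first = get_or_build(path, build_users, memo={})
    second = get_or_build(path, build_users, memo={})   # fresh memo: served from disk
```

The second call starts with an empty in-memory cache, so it exercises the pickle path rather than rerunning the loader.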

3) Declare a regression task and generate R code

from reg_monkey import StandardRegTask, CodeGenerator, PublicConfig

task = StandardRegTask(
    task_id="T1",
    data_key="users.pkl",
    y="y",
    X=["x1","x2","x3"],
    model="OLS",
    fe=["industry","year"],
    cluster="firm_id"
)

cg = CodeGenerator(public_config=PublicConfig())
code = cg.render(task)      # R script as a string
with open("out/T1.R", "w", encoding="utf-8") as f:
    f.write(code)
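
To give a feel for what rendering a task into R might produce, here is a small sketch. It uses plain Python string formatting instead of jinja2 and targets the fixest::feols() call syntax; both choices, as well as the helper name render_feols, are assumptions, and the script regmonkey's own template emits may look different.

```python
def render_feols(y, X, fe=(), cluster=None, data="df"):
    """Build an R fixest::feols() call from task-like fields (illustrative only)."""
    formula = f"{y} ~ " + " + ".join(X)
    if fe:
        # fixest puts fixed effects after a vertical bar in the formula
        formula += " | " + " + ".join(fe)
    args = [formula, f"data = {data}"]
    if cluster:
        # cluster is passed as a one-sided formula
        args.append(f"cluster = ~{cluster}")
    return "library(fixest)\nfit <- feols(" + ", ".join(args) + ")\nsummary(fit)\n"


script = render_feols(
    "y", ["x1", "x2", "x3"], fe=["industry", "year"], cluster="firm_id"
)
print(script)
```

A template engine becomes worthwhile once the script has optional sections (package installation, PSM steps, several model types); for a single formula, string formatting shows the mapping more directly.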

4) Execute (optional, requires rpy2)

from reg_monkey import CodeExecutor
executor = CodeExecutor(r_home=None)   # set R_HOME here if needed
res = executor.run_script_text(code)   # returns a standardized result dict/table

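Design notes

Three-source priority and fallback: ArcticDB ↔ DataLoader (dynamic import) ↔ PKL; a failed source falls back to the next automatically.
Semantic fingerprints and dependency propagation: the AST of clean_data() and the dependency list are hashed; any change triggers recomputation, which propagates along the reverse-dependency closure.
Budget and interaction: the cost of a recomputation chain is estimated from historical run times; a threshold can separate "run automatically" from "ask for confirmation".
Standardized results: output from multiple models (coefficients, robustness checks, PSM, Heckman, etc.) is unified into structured tables for comparison and table building.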
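
The fingerprint and propagation ideas can be sketched as follows. This is a minimal illustration under stated assumptions, not regmonkey's code: the function names semantic_fingerprint and stale_closure are hypothetical, the dependency list is sorted before hashing (so its order does not matter), and the dependency graph is a plain dict.

```python
import ast
import hashlib
from collections import deque


def semantic_fingerprint(source: str, dependencies: list) -> str:
    """Hash the AST dump of the source plus the sorted dependency list."""
    tree = ast.parse(source)
    # ast.dump omits line/column info by default, and comments never reach the AST,
    # so formatting-only edits leave the fingerprint unchanged
    payload = ast.dump(tree) + "|" + ",".join(sorted(dependencies))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stale_closure(depends_on: dict, changed: set) -> set:
    """Return the changed nodes plus everything that transitively depends on them."""
    reverse = {}
    for node, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(node)
    stale, queue = set(changed), deque(changed)
    while queue:                         # breadth-first walk over reverse edges
        for child in reverse.get(queue.popleft(), ()):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale


v1 = "def clean_data(df):\n    return df.dropna()\n"
v2 = "def clean_data(df):\n    # drop missing rows\n    return df.dropna()\n"
v3 = "def clean_data(df):\n    return df.fillna(0)\n"
deps = ["raw/users.csv"]
print(semantic_fingerprint(v1, deps) == semantic_fingerprint(v2, deps))  # comment only
print(semantic_fingerprint(v1, deps) == semantic_fingerprint(v3, deps))  # logic changed

graph = {"users.pkl": ["raw/users.csv"], "panel.pkl": ["users.pkl"], "other.pkl": []}
print(sorted(stale_closure(graph, {"raw/users.csv"})))
```

Hashing the AST rather than the raw text means that adding a comment or reformatting does not force an expensive recomputation, while any change to the logic does.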

Directory layout (suggested)

regmonkey/
├─ pyproject.toml
├─ README.md
└─ src/
   └─ regmonkey/
      ├─ __init__.py
      ├─ data_loader.py
      ├─ data_manager.py
      ├─ task_obj.py
      ├─ code_generator.py
      ├─ code_executor.py
      ├─ planner.py
      ├─ util.py
      └─ r_template.jinja

Configuration

Place a config.json in the project root (arctic_uri is optional):

{
  "arctic_uri": "lmdb:///path/to/arctic",
  "data_root": "./data",
  "pkl_root": "./cache"
}
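
Reading such a file with fallbacks for missing keys might look like the sketch below. The helper load_config and the DEFAULTS values are hypothetical; only the three key names come from the example above. Note that Python's json module accepts strict JSON only, so any comments inside the file would have to be removed first.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical defaults; arctic_uri=None stands for "no ArcticDB configured"
DEFAULTS = {"arctic_uri": None, "data_root": "./data", "pkl_root": "./cache"}


def load_config(project_root) -> dict:
    """Read config.json from project_root, falling back to DEFAULTS for missing keys."""
    path = Path(project_root) / "config.json"
    user = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return {**DEFAULTS, **user}          # user values override the defaults


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text('{"pkl_root": "./my_cache"}', encoding="utf-8")
    cfg = load_config(tmp)
    empty_cfg = load_config(Path(tmp) / "missing")   # no config.json there
```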

Minimal runnable example

from reg_monkey import DataManager, DataLoader

class Demo(DataLoader):
    output_pkl_name = "demo.pkl"
    def clean_data(self):
        import pandas as pd
        self.df = pd.DataFrame({"x":[1,2,3],"y":[2,4,6]})
        return self.df

dm = DataManager(project_root=".")
df = dm.get("demo.pkl", loader_module="demo_loader.Demo")  # point this at your own module path
print(df.head())
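
A string such as "demo_loader.Demo" has to be resolved to a class at run time. How regmonkey does this internally is not shown here; a common approach, sketched below with the hypothetical helper resolve_dotted, is to split off the last component and import the rest with importlib.

```python
import importlib


def resolve_dotted(path: str):
    """Import 'package.module.Name' and return the attribute Name."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Name', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Demonstrated on a standard-library class so the sketch runs anywhere
OrderedDict = resolve_dotted("collections.OrderedDict")
print(OrderedDict.__name__)
```

For a project-local loader, the module (for example demo_loader.py) must be importable, which usually means running from the project root or adding it to sys.path.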

Contributing

PRs are welcome:

New language templates (e.g. Stata, Python statsmodels)
More model types (RE/IV/PSM/Heckman, etc.)
Backend enhancements for DataManager (DuckDB/Delta/Glue…)

License

MIT

Note: the package name on PyPI is regression_monkey, while the import name is reg_monkey (import reg_monkey).

Release history

0.2.1 (Feb 28, 2026)
0.2.0 (Feb 26, 2026)
0.1.2 (Jan 16, 2026)
0.1.1 (Jan 15, 2026)
0.1.0 (Oct 20, 2025), this version

Download files

Source Distribution

regression_monkey-0.1.0.tar.gz (68.6 kB)

Uploaded Oct 20, 2025 Source

Built Distribution

regression_monkey-0.1.0-py3-none-any.whl (71.4 kB)

Uploaded Oct 20, 2025 Python 3

File details

Details for the file regression_monkey-0.1.0.tar.gz.

File metadata

Download URL: regression_monkey-0.1.0.tar.gz
Upload date: Oct 20, 2025
Size: 68.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for regression_monkey-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`d80dad0be6555b8b135eb584a66d2700f8677ad5fd96feb0e3d8fdb76a0b6c20`
MD5	`86076364528161750b4559899b00472d`
BLAKE2b-256	`10139e371fa1e126779b7a935036c2871bf0ee7e64d0f5bf3809e5a2ea0e4b38`

Provenance

The following attestation bundles were made for regression_monkey-0.1.0.tar.gz:

Publisher: publish.yml on guanzd88/regression_monkey

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: regression_monkey-0.1.0.tar.gz
- Subject digest: d80dad0be6555b8b135eb584a66d2700f8677ad5fd96feb0e3d8fdb76a0b6c20
- Sigstore transparency entry: 623411464
- Sigstore integration time: Oct 20, 2025
Source repository:
- Permalink: guanzd88/regression_monkey@108df74b8d747f4c42327354b5afa88b98adf848
- Branch / Tag: refs/tags/V0.1.0
- Owner: https://github.com/guanzd88
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@108df74b8d747f4c42327354b5afa88b98adf848
- Trigger Event: release

File details

Details for the file regression_monkey-0.1.0-py3-none-any.whl.

File metadata

Download URL: regression_monkey-0.1.0-py3-none-any.whl
Upload date: Oct 20, 2025
Size: 71.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for regression_monkey-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7d71c44a0a9cc8e357f3a5bf8e004826426505e8388c7d3c90b20f5484d6e03f`
MD5	`39bdd0758fe21f1e42df31b60ffb29bb`
BLAKE2b-256	`c12eb8a3ecde87db181b94d5c18d8582112bff9d65ec08355daa5e5ed847e5e8`

Provenance

The following attestation bundles were made for regression_monkey-0.1.0-py3-none-any.whl:

Publisher: publish.yml on guanzd88/regression_monkey

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: regression_monkey-0.1.0-py3-none-any.whl
- Subject digest: 7d71c44a0a9cc8e357f3a5bf8e004826426505e8388c7d3c90b20f5484d6e03f
- Sigstore transparency entry: 623411466
- Sigstore integration time: Oct 20, 2025
Source repository:
- Permalink: guanzd88/regression_monkey@108df74b8d747f4c42327354b5afa88b98adf848
- Branch / Tag: refs/tags/V0.1.0
- Owner: https://github.com/guanzd88
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@108df74b8d747f4c42327354b5afa88b98adf848
- Trigger Event: release
