DSL → Polars Expr compile engine for quantitative factor mining
CodeStr
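CodeStr is a DSL → Polars Expr expression engine designed for quantitative factor mining. It provides efficient expression transpilation, caching, and execution.
Installation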
git clone https://github.com/huangbogeng/codestr.git
cd codestr
uv sync --extra dev
Quick start
import polars as pl
from codestr import CodeStr
# Standard panel data (time, entity)
df = pl.DataFrame({
"datetime": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
"asset": ["A", "B", "A", "B"],
"close": [100.0, 200.0, 101.0, 198.0],
"volume": [1000.0, 2000.0, 1100.0, 1900.0],
})
cs = CodeStr(df, index=("datetime", "asset"))
# Interactive query: results are cached automatically
result = cs.sql(
"ts_mean(close, 5) as ma5",
"cs_rank(close) as rank",
"close / ts_delay(close, 1) - 1 as ret",
)
print(result)
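To make the query above concrete, here is a dependency-free sketch of what two of its expressions are assumed to compute on the sample panel. It assumes that ts_delay(close, 1) takes the previous value within each asset ordered by datetime, and that cs_rank ranks ascending from 1 within each datetime. Neither assumption is confirmed here, and CodeStr's actual null handling and rank method may differ. The helper names ts_return and cs_rank_reference are hypothetical.

```python
from collections import defaultdict

# Same sample panel as above, as plain rows: (datetime, asset, close).
rows = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 200.0),
    ("2024-01-02", "A", 101.0),
    ("2024-01-02", "B", 198.0),
]


def ts_return(rows):
    """close / previous close - 1 per asset, ordered by datetime (assumed semantics)."""
    previous = {}  # asset -> last close seen so far
    out = {}
    for dt, asset, close in sorted(rows):  # ISO date strings sort chronologically
        prev = previous.get(asset)
        out[(dt, asset)] = None if prev is None else close / prev - 1
        previous[asset] = close
    return out


def cs_rank_reference(rows):
    """Ascending 1-based rank of close within each datetime (assumed semantics)."""
    by_date = defaultdict(list)
    for dt, asset, close in rows:
        by_date[dt].append((close, asset))
    out = {}
    for dt, members in by_date.items():
        for position, (_, asset) in enumerate(sorted(members), start=1):
            out[(dt, asset)] = position
    return out


ret = ts_return(rows)
rank = cs_rank_reference(rows)
```

Under these assumptions the first date has no return for either asset, because there is no earlier close to divide by.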
Two API modes
| Mode | API | Behavior |
|---|---|---|
| Pure compile | cs.compile(expr) -> pl.Expr | No side effects; returns a Polars expression |
| Interactive | cs.sql(expr, lazy=False) -> pl.DataFrame | Stateful; caches and reuses results automatically |
# Pure compile: the expression can be consumed by any DataFrame
expr = cs.compile("ts_mean(close, 5) as ma5")
other_df.with_columns(expr)
# Interactive: suited to building factors step by step
cs.sql("close + volume as total")
cs.sql("ts_mean(total, 5) as total_ma5") # reuses total from the previous step
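The stateful mode can be pictured as a column store that computes each aliased result once and lets later expressions refer to it by name. The toy class below is a hypothetical illustration of that idea only; it is not CodeStr's implementation, and keying the cache purely by alias is a simplification.

```python
class CachedSession:
    """Toy stand-in for a stateful session (hypothetical, not CodeStr's code)."""

    def __init__(self, columns):
        self.columns = dict(columns)  # column name -> list of values
        self.evaluations = 0          # number of real computations performed

    def add(self, alias, left, right):
        """Store left + right under alias, computing it at most once."""
        if alias not in self.columns:
            self.evaluations += 1
            self.columns[alias] = [
                a + b for a, b in zip(self.columns[left], self.columns[right])
            ]
        return self.columns[alias]


session = CachedSession({"close": [100.0, 101.0], "volume": [1000.0, 1100.0]})
total = session.add("total", "close", "volume")
total_again = session.add("total", "close", "volume")  # served from the cache
doubled = session.add("doubled", "total", "total")     # reuses the cached column
```

Only two computations run here: the repeated request for total hits the cache, and doubled is built from the stored total column rather than from close and volume.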
Window configuration
CodeStr uses partition_by (the entity grouping axis) and order_by (the time ordering axis) to control window operators:
# Default configuration
cs = CodeStr(df)
# index=("datetime", "asset")
# → TS: over(partition_by=["asset"], order_by=["datetime"])
# → CS: over(partition_by=["datetime"], order_by=["asset"])
# Custom column names
cs = CodeStr(df, index=("trade_date", "stock_code"))
# Multi-column windows: group by industry + stock, order by date + tick sequence number
cs = CodeStr(df,
index=("trade_date", "stock_code"),
partition_by=["industry", "stock_code"],
order_by=["trade_date", "tick"],
)
| Operator category | Window rule |
|---|---|
| TS (time series) | over(partition_by=partition_by, order_by=order_by) |
| CS (cross-section) | over(partition_by=order_by, order_by=partition_by) |
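The rule in the table amounts to swapping the two axes for cross-sectional operators. A minimal sketch of that rule follows; the function name window_kwargs is hypothetical, and it only builds the keyword arguments rather than calling Polars.

```python
def window_kwargs(category, partition_by, order_by):
    """Return the over(...) keyword arguments for an operator category."""
    if category == "ts":
        # Time-series operators: group by entity, order by time.
        return {"partition_by": list(partition_by), "order_by": list(order_by)}
    if category == "cs":
        # Cross-sectional operators swap the two axes.
        return {"partition_by": list(order_by), "order_by": list(partition_by)}
    raise ValueError(f"unknown operator category: {category!r}")


ts_window = window_kwargs("ts", ["asset"], ["datetime"])
cs_window = window_kwargs("cs", ["asset"], ["datetime"])
```

With the default index of ("datetime", "asset"), this reproduces the two over(...) calls listed in the default configuration comments.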
Custom operators
from codestr.udf.registry import udf
import polars as pl
@udf(category="ts")
def ts_ewm(expr: pl.Expr, windows, partition_by=None, order_by=None):
"""Exponentially weighted moving average"""
return expr.ewm_mean(halflife=windows).over(
partition_by=partition_by, order_by=order_by
)
cs.sql("ts_ewm(close, 10) as ewm10")
Built-in operators
Basic operators (base_udf): abs, log, sqrt, square, cube, sin, cos, tan, exp, sigmoid, sign, clip, trunc, between, cast, max, min, sum, mean, arg_max, arg_min, if_, fib, and others
Cross-sectional operators (cs_udf): cs_rank, cs_zscore, cs_demean, cs_mean, cs_std, cs_var, cs_skew, cs_ic, cs_corr, cs_slope, cs_resid, cs_qcut, cs_midby, cs_meanby, and others
Time-series operators (ts_udf): ts_mean, ts_sum, ts_std, ts_var, ts_skew, ts_kurt, ts_max, ts_min, ts_mid, ts_delay, ts_delta, ts_mad, and others
Project structure
src/codestr/
├── engine.py      # CodeStr engine entry point
├── compiler.py    # AST → Polars Expr compiler
├── parser.py      # DSL parser (Lark LALR grammar)
├── syntax.py      # AST node definitions
├── tokens.py      # Token definitions
├── errors.py      # Exception types
└── udf/
    ├── registry.py    # UDF registry (@udf decorator)
    ├── base_udf.py    # Basic operators
    ├── cs_udf.py      # Cross-sectional operators (Cross-Section)
    └── ts_udf.py      # Time-series operators (Time-Series)
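The layout suggests a parser → AST → compiler pipeline. As a rough illustration of the first two stages, the sketch below turns a DSL string into a nested tuple tree. It swaps in Python's standard ast module for the Lark LALR grammar the project uses, and it assumes the DSL's arithmetic and call syntax matches Python's for the subset shown; the node shapes and the names parse and lower are hypothetical.

```python
import ast

BINARY_OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}


def lower(node):
    """Map a Python AST node to a small DSL node (nested tuples)."""
    if isinstance(node, ast.Name):
        return ("col", node.id)
    if isinstance(node, ast.Constant):
        return ("lit", node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        return (BINARY_OPS[type(node.op)], lower(node.left), lower(node.right))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return ("call", node.func.id, tuple(lower(arg) for arg in node.args))
    raise SyntaxError(f"unsupported syntax: {ast.dump(node)}")


def parse(expression):
    """Split 'body as alias', then lower the body to a tuple tree."""
    body, _, alias = expression.rpartition(" as ")
    if not body:  # no " as " present, so the whole string is the body
        body, alias = expression, None
    tree = ast.parse(body.strip(), mode="eval").body
    return lower(tree), (alias.strip() if alias else None)


tree, alias = parse("close / ts_delay(close, 1) - 1 as ret")
```

A compiler stage would then walk this tree, mapping col nodes to column references and call nodes to registered operators.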