A data-orchestration framework that simplifies performance optimization and development through a unified approach
Project description
| - | - |
|---|---|
| Library distributions | scalim, scalim-cli, scalim-yaml-dsl-lsp |
| Documentation generator | |
| Project tooling | |
| Companion frontend | |
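Introduction
Scalim is a data-orchestration framework built on field dependencies and data-source loading relationships. It controls memory usage and resource scheduling in a unified way, lowering the barrier to performance optimization and reducing development effort.
- Demands can be written in Python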
from scalim.execution.engine import ScalimEngine
from scalim.execution.runtime_bindings import RuntimeBindings
from scalim.planning import PlanBuilder
from scalim.sinks.memory import InMemoryRowDataSink
from scalim.spec.ir import CallBySpecIr, CallByValueIr, DemandIr, DerivedFieldIr, FieldIr, MainSourceIr, RuntimeHandleIdIr


def load_orders(**_kwargs):
    # Placeholder loader; supply a real implementation before running.
    raise NotImplementedError


def calc_amount_x2(amount):
    return amount * 2


# Main source: its loader is referenced by a runtime handle ID.
orders = MainSourceIr(source_id="orders", loader_ref=RuntimeHandleIdIr(handle_id="orders.loader"))

# The demand: two fields read from the main source plus one derived field.
demand = DemandIr.from_irs(
    sources=[],
    main_source=orders,
    fields=(
        FieldIr(field_id="order_id", name="Order ID", source=orders),
        FieldIr(field_id="amount", name="Amount", source=orders),
        DerivedFieldIr(
            field_id="amount_x2",
            name="Amount*2",
            dependencies=("amount",),
            call_by=CallBySpecIr(
                reference=RuntimeHandleIdIr(handle_id="amount_x2.calculator"),
                kwargs=(("amount", CallByValueIr(kind="field", value="amount")),),
                field_names=("amount",),
            ),
        ),
    ),
    name="orders_report",
)

plan = PlanBuilder(demand).build()

# Bind the loader and the calculator to concrete Python callables.
runtime_bindings = RuntimeBindings(
    main_source_loaders={"orders": load_orders},
    derived_calculators={"amount_x2": calc_amount_x2},
)

engine = ScalimEngine(demand=demand, plan=plan, runtime_bindings=runtime_bindings, batch_size=1000, parallel_mode="seq")
sink = InMemoryRowDataSink()
engine.run(sink=sink)
rows = sink.get_data()
- Demands can also be configured with the YAML DSL
name: orders_report
main_source:
  source_id: orders
  loader: "myapp.loaders:load_orders"
  fields:
    order_id:
      name: Order ID
    # Main-source field, used by the derived computation
    amount:
      name: Amount
    # Join-key field
    pay_id:
      name: Payment ID
sources:
  payments:
    loader: "myapp.loaders:load_payments"
    key: id
    params:
      ids: {$keys: {as: set}}
    fields:
      method:
        name: Payment method
        extract: payment_method
        relation: orders_to_payments
relations:
  orders_to_payments:
    steps:
      - from: orders.pay_id
        to: payments.id
fields:
  total_amount:
    name: Total amount
    compute: "amount * 2"
outputs:
  - name: detail
    to: {file: detail_csv}
    write: {header_fields_output_by: name}
    fields: [order_id, method, total_amount]
resources:
  files:
    detail_csv:
      csv_file:
        path: ./output
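To make the configuration above more concrete, here is a minimal plain-Python sketch of roughly what such a demand asks for: load the orders, collect the `pay_id` join keys, load the matching payments in one call, join them, and compute the derived field. It does not use Scalim's API. The loader bodies and the sample rows are hypothetical stand-ins, and the reading of `$keys` as "the set of join keys passed to the secondary loader" is an assumption, not something the source spells out.

```python
# Hypothetical stand-ins for myapp.loaders; the sample rows are invented for illustration.
def load_orders():
    return [
        {"order_id": 1, "amount": 10, "pay_id": "p1"},
        {"order_id": 2, "amount": 25, "pay_id": "p2"},
    ]


def load_payments(ids):
    table = {
        "p1": {"id": "p1", "payment_method": "card"},
        "p2": {"id": "p2", "payment_method": "cash"},
    }
    return [table[i] for i in sorted(ids) if i in table]


def build_detail_rows():
    orders = load_orders()
    # Assumed meaning of `ids: {$keys: {as: set}}`: pass the join keys as a set.
    keys = {row["pay_id"] for row in orders}
    payments_by_id = {p["id"]: p for p in load_payments(keys)}  # key: id
    rows = []
    for order in orders:
        # Relation step: orders.pay_id -> payments.id
        payment = payments_by_id.get(order["pay_id"], {})
        rows.append({
            "order_id": order["order_id"],
            "method": payment.get("payment_method"),  # extract: payment_method
            "total_amount": order["amount"] * 2,      # compute: "amount * 2"
        })
    return rows


detail_rows = build_detail_rows()
```

Loading all payments for a batch in a single keyed call, rather than one call per order, is the usual reason a framework collects keys first; whether Scalim batches exactly this way is not stated in the source.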
Quick start
# Add to your project
uv add scalim
# Install into your environment
uv pip install scalim
# Interactive tutorial
just notebook
Key features
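- Configurable adaptive concurrent execution: in most cases no manual tuning is needed; the runtime automatically finds the best execution path for you
  - Automatic detection of concurrency opportunities: topological analysis of the dependency graph
  - Fan-out/fan-in orchestration: independent tasks run in parallel, dependent tasks are serialized
  - Resource-aware scheduling: adjusts dynamically to the number of tasks, the data volume, and the available CPU resources
  - Fail-fast fallback: a failed concurrent run automatically degrades to serial mode
- Production-grade observability: 16+ event types and 4 preset Observers
  - PerformanceObserver: throughput and latency statistics
  - MemoryOptimizationObserver: memory-release tracking
  - RelationObserver: relation-lookup hit rate
  - ExecutionTraceObserver: full execution-trace tracking
- Runtime guardrails: a built-in Guardrails system provides strategy-pattern error handling (quiet / fast_fail); error policies can be customized per Loader for fine-grained fault tolerance
- Low-memory mode: built-in field pruning, field release, and row-level release keep, as far as possible, only the data the current batch still needs, reducing memory usage
- Multiple authoring styles: describe computation logic directly in Python, or write configuration in the YAML DSL, backed by JSON Schema completion/validation, scalim-cli semantic validation, and LSP/IDE integration, which makes configuration easier to complete, check, and ship
- Multiple output modes: batch execution, streaming output, and row-oriented/column-oriented sinks let you trade off throughput, memory, and output format
- AI development environment integration: supports agent skill integration
- Online visualization tool: an online visual tool for replay and troubleshooting ties together the execution plan, event stream, and trace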
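The dependency-graph analysis and fan-out/fan-in orchestration named in the feature list can be illustrated with the standard library alone. The sketch below is not Scalim's scheduler: `run_in_waves`, the task names, and the toy task bodies are all hypothetical, and it relies on `graphlib.TopologicalSorter` (in the standard library since Python 3.9) to find tasks whose dependencies are already satisfied.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_in_waves(dependencies, tasks, max_workers=4):
    """Run tasks wave by wave; tasks within one wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every task whose dependencies have all been marked done.
            ready = sorted(sorter.get_ready())
            waves.append(ready)
            # Fan-out: run the wave in parallel. Fan-in: wait for all of it.
            values = list(pool.map(lambda name: tasks[name](results), ready))
            results.update(zip(ready, values))
            sorter.done(*ready)
    return waves, results


# Hypothetical graph: each key maps to the set of tasks it depends on.
dependencies = {
    "orders": set(),
    "payments": {"orders"},
    "amount_x2": {"orders"},
    "detail": {"payments", "amount_x2"},
}
tasks = {
    "orders": lambda done: [10, 25],
    "payments": lambda done: ["card", "cash"],
    "amount_x2": lambda done: [a * 2 for a in done["orders"]],
    "detail": lambda done: list(zip(done["payments"], done["amount_x2"])),
}
waves, results = run_in_waves(dependencies, tasks)
```

Here `payments` and `amount_x2` land in the same wave because both depend only on `orders`, while `detail` has to wait for both of them.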
For more, see the reference documentation
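The field-release idea behind the low-memory mode in the feature list can be sketched as a last-use analysis: once no later step reads a field and it is not part of the output, it can be dropped from the row. This is an illustration under assumptions, not Scalim's implementation; the step tuples, function names, and sample row are hypothetical.

```python
def plan_field_release(steps, output_fields):
    """Map each step index to the fields that can be dropped once that step has run."""
    last_use = {}
    for index, (_name, reads, _writes) in enumerate(steps):
        for field in reads:
            last_use[field] = index  # later steps overwrite earlier ones
    release_after = {index: [] for index in range(len(steps))}
    for field, index in last_use.items():
        if field not in output_fields:  # output fields must survive to the sink
            release_after[index].append(field)
    return release_after


def run_steps(row, steps, funcs, release_after):
    """Run each step on the row, releasing fields as soon as they are no longer needed."""
    for index, (name, _reads, _writes) in enumerate(steps):
        row.update(funcs[name](row))
        for field in release_after[index]:
            row.pop(field, None)
    return row


# Hypothetical plan: (step name, fields read, fields written).
steps = [
    ("join_payments", ["pay_id"], ["method"]),
    ("double_amount", ["amount"], ["total_amount"]),
]
funcs = {
    "join_payments": lambda row: {"method": "card"},
    "double_amount": lambda row: {"total_amount": row["amount"] * 2},
}
release_plan = plan_field_release(steps, {"order_id", "method", "total_amount"})
final_row = run_steps({"order_id": 1, "amount": 10, "pay_id": "p1"}, steps, funcs, release_plan)
```

The join key `pay_id` is released right after the join and `amount` right after the derived computation, so the row that reaches the sink carries only the output fields.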
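Quality assurance
- 100% core test coverage (CI fails if coverage drops below 100%)
- Type checking based on pyright: src/scalim/ uses the stricter basedpyright rules by default, with the Phase 1 + Phase 2 core rules enabled; boundary areas such as notebooks and packages/scalim-cli are selectively relaxed under a tiered policy
- Python 3.6 compatibility: in addition to syntax checks, typing-extensions==4.1.1 is verified in an isolated environment
- All Ruff rules pass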
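Design philosophy
- Core First: the core runtime is decoupled from the dialects and the CLI
- Type Safety: complete type annotations, with support for static analysis
- Observable: observable by default rather than patched in after the fact
- Extensible: customization through three extension points: Hook, Observer, and Policy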
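As a rough picture of how Observer and Policy extension points typically fit together, the sketch below runs a calculator over rows, notifies observers of each outcome, and applies an error policy. Only the policy names `quiet` and `fast_fail` come from the feature list; `CountingObserver`, `process_rows`, the `on_event` signature, and the event names are hypothetical and are not Scalim's interfaces.

```python
class CountingObserver:
    """Hypothetical observer that counts events by type."""

    def __init__(self):
        self.counts = {}

    def on_event(self, event_type, payload):
        self.counts[event_type] = self.counts.get(event_type, 0) + 1


def process_rows(rows, calculator, observers, error_policy="fast_fail"):
    """Apply `calculator` to each row, notify observers, and honor the error policy."""
    results = []
    for row in rows:
        try:
            results.append(calculator(row))
            event = ("row_done", row)
        except Exception as exc:
            if error_policy == "fast_fail":
                raise  # stop at the first error
            # "quiet": record the failure and keep going
            event = ("row_failed", {"row": row, "error": repr(exc)})
        for observer in observers:
            observer.on_event(*event)
    return results


observer = CountingObserver()
values = process_rows([1, 0, 4], lambda x: 8 // x, [observer], error_policy="quiet")
```

With `error_policy="quiet"` the division by zero on the second row is recorded as an event and skipped; with `"fast_fail"` the same input would raise `ZeroDivisionError` immediately.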
Download files
Source Distribution
scalim-0.9.10.tar.gz (1.8 MB)
Built Distribution
File details
Details for the file scalim-0.9.10.tar.gz.
File metadata
- Download URL: scalim-0.9.10.tar.gz
- Upload date:
- Size: 1.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90353ace90b178015f49a6b21d7d5c62c65985675902e46fa75f43acd2dc5cd9 |
| MD5 | f2780f9119d02d8cd916e81955937fa6 |
| BLAKE2b-256 | a4a09914494af0ded2f475602df2e3f38aa3bc7d40a0888c1a9453b03872ebd3 |
File details
Details for the file scalim-0.9.10-py3-none-any.whl.
File metadata
- Download URL: scalim-0.9.10-py3-none-any.whl
- Upload date:
- Size: 2.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0f8804c2aabcb053f85fe8a791a243eda91be947f330ee878a7e9e7b971904c |
| MD5 | 0763b95a03edc53ea448f9a5256d72db |
| BLAKE2b-256 | 7962f194eeeb32ee1efc742bad0bcb5d7500b46ed6ecc8ae67db90cf207c5bf7 |