Defensive monitoring layer for finlab data.get API - detect unexpected data changes

finlab-sentinel

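finlab-sentinel is a defensive layer for the finlab package. It monitors the data returned by the data.get API and keeps unexpected data changes from silently affecting backtest or stock-selection results.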

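Features

  • Automatic comparison: every data.get call is compared against the stored history
  • Rolling backups: backups are retained for 7 days (configurable)
  • Smart detection:
    • Numeric comparison with tolerance (configurable rtol/atol)
    • dtype change detection
    • NA type difference detection (pd.NA vs np.nan vs None)
  • Flexible policies:
    • append_only: new data may be added, but history may not be deleted or modified
    • threshold: small changes are allowed (for example, within 10%)
    • Blacklist configuration: list the datasets whose history is allowed to change
  • Configurable behavior:
    • Raise an exception (default)
    • Warn and return the cached data
    • Warn and return the new data
  • Preprocess hook: preprocess data before comparison (for example, rounding); wildcard patterns are supported
  • Notifications: custom callbacks are supported (for example, LINE or email notifications)
  • CLI tool: manage backups, inspect differences, accept new data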
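
To make the append_only policy and the rtol/atol tolerance concrete, here is a minimal sketch of such a check written with pandas and NumPy. It is an illustration only, not finlab-sentinel's actual implementation; the function name violates_append_only and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def violates_append_only(old: pd.DataFrame, new: pd.DataFrame,
                         rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Return True if `new` deletes or modifies anything present in `old`.

    Hypothetical helper for illustration; not part of finlab-sentinel.
    """
    # Any row or column that disappeared counts as a deletion.
    if not old.index.isin(new.index).all() or not old.columns.isin(new.columns).all():
        return True
    # Only the overlapping history is compared; appended rows/columns are allowed.
    overlap = new.loc[old.index, old.columns]
    same = np.isclose(overlap.to_numpy(dtype=float), old.to_numpy(dtype=float),
                      rtol=rtol, atol=atol, equal_nan=True)
    return not bool(same.all())


old = pd.DataFrame({"2330": [100.0, 101.0]}, index=["2024-01-02", "2024-01-03"])
appended = pd.DataFrame({"2330": [100.0, 101.0, 102.0]},
                        index=["2024-01-02", "2024-01-03", "2024-01-04"])
revised = appended.copy()
revised.loc["2024-01-02", "2330"] = 99.0  # a historical value was rewritten

print(violates_append_only(old, appended))  # False: only a new row was added
print(violates_append_only(old, revised))   # True: history was modified
```

np.isclose treats two values as equal when their absolute difference is at most atol + rtol * |old value|, so tiny floating-point noise does not count as a modification.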

Installation

pip install finlab-sentinel

Or with uv:

uv add finlab-sentinel

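Quick start

import finlab_sentinel

# Enable sentinel
finlab_sentinel.enable()

# Use finlab as usual
from finlab import data
close = data.get('price:收盤價')  # closing prices; backed up and compared automatically

# If the data looks anomalous, an exception or a warning is raised, depending on the configuration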
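
The idea of intercepting a fetch function and checking each result against the previous one can be sketched as follows. This is a conceptual illustration under simplifying assumptions (in-memory backups, exact equality, a plain ValueError); the names with_sentinel and guarded are hypothetical and do not describe how finlab-sentinel is actually implemented.

```python
from typing import Callable, Dict

import pandas as pd


def with_sentinel(fetch: Callable[[str], pd.DataFrame]) -> Callable[[str], pd.DataFrame]:
    """Wrap `fetch` so each result is checked against the previous one."""
    backups: Dict[str, pd.DataFrame] = {}  # in-memory stand-in for on-disk backups

    def guarded(name: str) -> pd.DataFrame:
        fresh = fetch(name)
        previous = backups.get(name)
        if previous is not None:
            # Restrict the fresh data to the rows/columns seen last time.
            history = fresh.reindex(index=previous.index, columns=previous.columns)
            if not history.equals(previous):
                raise ValueError(f"historical values changed for {name!r}")
        backups[name] = fresh.copy()
        return fresh

    return guarded


# A fake data source that we mutate between calls.
source = {"demo": pd.DataFrame({"A": [1.0, 2.0]})}
get = with_sentinel(lambda name: source[name])

get("demo")                                            # first call: stored as the baseline
source["demo"] = pd.DataFrame({"A": [1.0, 2.0, 3.0]})
get("demo")                                            # appended row: accepted
source["demo"] = pd.DataFrame({"A": [1.0, 9.0, 3.0]})
# Calling get("demo") now would raise ValueError, because row 1 changed.
```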

Configuration

Create a sentinel.toml file:

[storage]
path = "~/.finlab-sentinel/"
retention_days = 7

[comparison]
rtol = 1e-5
change_threshold = 0.10

[comparison.policies]
default_mode = "append_only"
history_modifiable = ["fundamental_features:某些財報資料"]  # placeholder dataset name

[anomaly]
behavior = "raise"  # raise | warn_return_cached | warn_return_new
save_reports = true

# Optional: set a notification callback
# callback = "myproject.notifications:send_line_notification"

CLI usage

# List all backups
sentinel list

# Clean up expired backups
sentinel cleanup --days 14

# Show data differences
sentinel diff "price:收盤價"

# Accept the new data as the baseline
sentinel accept "price:收盤價" --reason "confirmed data correction"

# Export a backup
sentinel export "price:收盤價" -o ./backup.parquet

Handling data anomalies

When a data anomaly is detected:

from finlab import data
from finlab_sentinel import DataAnomalyError

try:
    close = data.get('price:收盤價')
except DataAnomalyError as e:
    print(f"Data anomaly: {e.report.summary}")
    # Inspect the report details
    print(f"Change ratio: {e.report.comparison_result.change_ratio:.1%}")

    # If you have confirmed that the new data should be accepted
    from finlab_sentinel.core.interceptor import accept_current_data
    accept_current_data('price:收盤價', reason="confirmed data correction")

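Preprocess Hook

A preprocess hook lets you transform the data before it is compared, for example by rounding values or sorting columns. This is especially useful for absorbing expected floating-point precision differences.

Note: preprocessing is used only for the comparison; the data returned to the caller is always the original data.

import finlab_sentinel

# Register a preprocess hook for a specific dataset
finlab_sentinel.register_preprocess_hook(
    "price:收盤價",
    lambda df: df.round(2)  # round to two decimal places
)

# Wildcard patterns are supported
finlab_sentinel.register_preprocess_hook(
    "price:*",  # matches every dataset whose name starts with price:
    lambda df: df.round(2)
)

# The ? wildcard (matches exactly one character) is also supported
finlab_sentinel.register_preprocess_hook(
    "price:?",
    lambda df: df.round(2)
)

finlab_sentinel.enable()

# Use finlab
from finlab import data
close = data.get('price:收盤價')  # compared after round(2), but the original data is returned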
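
The "compare on preprocessed data, return the original" behavior can be illustrated with plain pandas. This is a minimal sketch with a hypothetical helper (compare_with_hook) and made-up values; it is not the library's internal code.

```python
import pandas as pd


def compare_with_hook(cached: pd.DataFrame, fresh: pd.DataFrame, hook):
    """Compare hook(cached) with hook(fresh); hand back the untouched fresh data."""
    is_equal = hook(cached).equals(hook(fresh))
    return is_equal, fresh


cached = pd.DataFrame({"2330": [593.004, 595.0]})
fresh = pd.DataFrame({"2330": [593.001, 595.0]})  # differs only in the third decimal

equal, returned = compare_with_hook(cached, fresh, lambda df: df.round(2))

print(cached.equals(fresh))   # False: the raw values differ
print(equal)                  # True: identical after rounding to 2 decimals
print(returned.iloc[0, 0])    # 593.001: the caller still gets the unrounded value
```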

Advanced usage

import finlab_sentinel

# Custom preprocessing function
def normalize_for_comparison(df):
    """Normalize a DataFrame so that expected differences are ignored."""
    df = df.copy()
    # Round the floating-point columns
    numeric_cols = df.select_dtypes(include=['float64', 'float32']).columns
    df[numeric_cols] = df[numeric_cols].round(4)
    # Sort the columns (ignore differences in column order)
    df = df[sorted(df.columns)]
    return df

finlab_sentinel.register_preprocess_hook("fundamental_features:*", normalize_for_comparison)

# Unregister a hook
finlab_sentinel.unregister_preprocess_hook("price:收盤價")

# Remove all hooks
finlab_sentinel.clear_preprocess_hooks()

Priority

When more than one pattern matches a dataset, an exact match takes priority over a wildcard match:

finlab_sentinel.register_preprocess_hook("price:*", lambda df: df.round(1))
finlab_sentinel.register_preprocess_hook("price:收盤價", lambda df: df.round(2))

# "price:收盤價" uses round(2) (exact match)
# "price:開盤價" uses round(1) (wildcard match)
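
The exact-before-wildcard rule can be sketched with the standard fnmatch module, whose * and ? wildcards behave as described above. This is an illustration only: whether finlab-sentinel uses fnmatch is an assumption, and resolve_hook is a hypothetical name. The document does not say how ties between several matching wildcard patterns are broken; this sketch simply takes the first one registered.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, Optional


def resolve_hook(dataset: str, registry: Dict[str, Callable]) -> Optional[Callable]:
    """Pick the hook for `dataset`: exact name first, then the first matching wildcard."""
    if dataset in registry:
        return registry[dataset]
    for pattern, hook in registry.items():  # dicts preserve registration order
        if fnmatchcase(dataset, pattern):
            return hook
    return None


def round1(df):
    return df.round(1)


def round2(df):
    return df.round(2)


registry = {"price:*": round1, "price:收盤價": round2}

print(resolve_hook("price:收盤價", registry).__name__)  # round2 (exact match)
print(resolve_hook("price:開盤價", registry).__name__)  # round1 (wildcard match)
print(resolve_hook("etl:adj_close", registry))          # None (no pattern matches)
print(fnmatchcase("price:開盤價", "price:?"))            # False: ? matches one character only
```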

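Custom notifications

import os

import requests

# Assumption: the LINE access token is provided through an environment variable
LINE_TOKEN = os.environ["LINE_TOKEN"]

def send_line_notification(report):
    """Send a LINE notification when an anomaly is detected."""
    requests.post(
        "https://notify-api.line.me/api/notify",
        headers={"Authorization": f"Bearer {LINE_TOKEN}"},
        data={"message": f"finlab data anomaly: {report.summary}"},
        timeout=10,  # avoid hanging forever if the service is unreachable
    )

# Configure it in sentinel.toml
# [anomaly]
# callback = "myproject.notifications:send_line_notification"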
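
The callback setting uses the common "module.path:function_name" notation. One way such a string can be turned into a callable is sketched below with importlib; this is an assumption about the mechanism, not the library's confirmed code, and load_callback is a hypothetical name. A standard-library target (json:dumps) stands in for myproject.notifications so the example runs anywhere.

```python
from importlib import import_module
from typing import Callable


def load_callback(spec: str) -> Callable:
    """Resolve a 'package.module:attribute' string to a callable object."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    obj = getattr(import_module(module_name), attr)
    if not callable(obj):
        raise TypeError(f"{spec!r} does not refer to a callable")
    return obj


dumps = load_callback("json:dumps")
print(dumps({"ok": True}))  # {"ok": true}
```

For the setting to resolve, the module named before the colon must be importable from the environment in which finlab runs.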

Development

# Clone the project
git clone https://github.com/yourusername/finlab-sentinel
cd finlab-sentinel

# Install development dependencies with uv
uv sync --dev

# Run the tests
uv run pytest

# Run the linters
uv run ruff check src/ tests/
uv run mypy src/

License

MIT License
