
A performance monitoring and alerting tool for Flask applications, built on pyinstrument


Web Performance Monitor

PyPI version Python Support License: MIT

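A performance monitoring and alerting tool for Flask applications, built on pyinstrument. It provides a non-intrusive ("zero-intrusion") monitoring solution.

✨ Features

  • 🚀 Zero-intrusion monitoring: integrates through middleware and decorators, without changes to business code
  • Performance first: the monitor's own overhead is kept within 5%
  • 🔧 Flexible configuration: supports environment variables, configuration files, and in-code configuration
  • 📊 Detailed reports: generates detailed HTML profiling reports with pyinstrument
  • 🔔 Multiple notification channels: supports local files and Mattermost
  • 🛡️ Fault tolerance: errors in monitoring or notification never affect the host application
  • 📈 Smart alerting: duplicate alerts are suppressed within a time window
  • 🔍 Parameter tracking: request parameters are extracted and recorded automatically, with sensitive values masked
  • 🏷️ Tracing support: supports distributed tracing identifiers such as TraceID and SpanID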
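
The time-window deduplication listed among the features can be pictured with a minimal sketch. The class name `AlertDeduplicator`, its `should_alert` method, and the choice of keying alerts by endpoint are assumptions made for illustration; they are not taken from the library's source.

```python
from datetime import datetime, timedelta


class AlertDeduplicator:
    """Hypothetical helper: suppress repeat alerts for the same key within a window."""

    def __init__(self, window_days=10):
        self.window = timedelta(days=window_days)
        self._last_sent = {}  # alert key -> time the last alert was sent

    def should_alert(self, key, now=None):
        """Return True and record the alert unless one was sent within the window."""
        now = now or datetime.now()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # duplicate inside the window: suppress it
        self._last_sent[key] = now
        return True


dedup = AlertDeduplicator(window_days=7)
start = datetime(2024, 1, 1)
first = dedup.should_alert("GET /api/users", now=start)
repeat = dedup.should_alert("GET /api/users", now=start + timedelta(days=3))
later = dedup.should_alert("GET /api/users", now=start + timedelta(days=8))
```

In this sketch the first alert is sent, the one three days later is suppressed, and the one eight days later is sent again because the seven-day window has passed.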

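🚀 Quick start

Installation

# Basic installation
pip install web-performance-monitor

# With Mattermost support
pip install "web-performance-monitor[mattermost]"

# With Sanic framework support
pip install "web-performance-monitor[sanic]"

# With all optional features
pip install "web-performance-monitor[all]"

# Development installation
pip install "web-performance-monitor[dev]"

5-minute quick integration

Supported frameworks

  • Flask - WSGI middleware mode (recommended)
  • Django - WSGI middleware mode
  • FastAPI - ASGI middleware mode
  • Sanic - dedicated middleware mode (📖 detailed documentation)
  • Other WSGI/ASGI frameworks - generic middleware mode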
  • Any function - decorator mode

1. Flask middleware mode (recommended)

The simplest way to integrate. All HTTP requests are monitored automatically:

from flask import Flask
from web_performance_monitor import PerformanceMonitor, Config

app = Flask(__name__)

# Basic configuration
config = Config(
    threshold_seconds=1.0,              # Response time threshold
    enable_local_file=True,             # Enable local file notifications
    local_output_dir="/tmp/reports",    # Output directory
)

monitor = PerformanceMonitor(config)

# Zero-intrusion integration - this single line is all that is needed
app.wsgi_app = monitor.create_middleware()(app.wsgi_app)

@app.route('/api/users')
def get_users():
    # Business logic - monitored automatically
    return {"users": []}

if __name__ == '__main__':
    app.run()
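
To show what wrapping `app.wsgi_app` amounts to, here is a minimal standard-library sketch of a WSGI timing middleware. It is not the library's implementation: the names `timing_middleware`, `on_slow`, and `demo_app` are hypothetical, and it only measures wall-clock time, whereas the real tool profiles requests with pyinstrument.

```python
import time


def timing_middleware(threshold_seconds, on_slow):
    """Return a WSGI middleware factory that reports requests slower than the threshold."""

    def wrap(wsgi_app):
        def middleware(environ, start_response):
            started = time.perf_counter()
            try:
                # Time spent streaming the body after this call returns is not measured.
                return wsgi_app(environ, start_response)
            finally:
                elapsed = time.perf_counter() - started
                if elapsed > threshold_seconds:
                    try:
                        on_slow(environ.get("PATH_INFO", ""), elapsed)
                    except Exception:
                        pass  # a failing reporter must never break the request
        return middleware

    return wrap


def demo_app(environ, start_response):
    """Tiny WSGI application that simulates slow work."""
    time.sleep(0.05)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


slow_calls = []
wrapped = timing_middleware(0.01, lambda path, secs: slow_calls.append(path))(demo_app)
body = wrapped({"PATH_INFO": "/api/users"}, lambda status, headers: None)
```

Because the wrapper has the same call signature as the application it wraps, it can be swapped in with a single assignment, which is why this integration style needs no changes to route handlers.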

2. Decorator mode

Monitor specific critical functions:

from web_performance_monitor import PerformanceMonitor, Config

config = Config(threshold_seconds=0.5)
monitor = PerformanceMonitor(config)

# `database` and `process_complex_data` below are placeholders for your own code

@monitor.create_decorator()
def slow_database_query(user_id):
    # Critical business logic - monitored independently
    return database.query_user_data(user_id)

@monitor.create_decorator()
def complex_calculation(data):
    # Complex computation
    return process_complex_data(data)
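
As a rough illustration of the decorator idea, the sketch below times a call and reports it when it exceeds a threshold, for both regular and `async` functions. The names `monitor_calls` and `on_slow` are hypothetical, and the code uses plain timing rather than the pyinstrument profiling the library performs.

```python
import asyncio
import functools
import inspect
import time


def monitor_calls(threshold_seconds, on_slow):
    """Hypothetical decorator factory: report calls that exceed the threshold."""

    def report(func, started):
        elapsed = time.perf_counter() - started
        if elapsed > threshold_seconds:
            on_slow(func.__name__, elapsed)

    def decorator(func):
        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                started = time.perf_counter()
                try:
                    return await func(*args, **kwargs)
                finally:
                    report(func, started)
            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            started = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                report(func, started)
        return sync_wrapper

    return decorator


slow = []
watch = monitor_calls(0.01, lambda name, secs: slow.append(name))


@watch
def slow_sync():
    time.sleep(0.05)
    return "sync done"


@watch
async def slow_async():
    await asyncio.sleep(0.05)
    return "async done"


sync_result = slow_sync()
async_result = asyncio.run(slow_async())
```

Checking `inspect.iscoroutinefunction` once at decoration time keeps the wrapper cheap and ensures the decorated coroutine is still awaitable.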

3. Sanic integration

Dedicated integration for the asynchronous Sanic framework:

import asyncio

from sanic import Sanic
from sanic.response import json
from web_performance_monitor import PerformanceMonitor, Config
from web_performance_monitor.adapters.sanic import SanicAdapter

app = Sanic("MyApp")

# Configure performance monitoring
config = Config(
    threshold_seconds=0.5,
    enable_local_file=True,
    local_output_dir="./sanic_reports"
)

monitor = PerformanceMonitor(config)

# Create the Sanic adapter
sanic_adapter = SanicAdapter(monitor)

# Request middleware - start monitoring
@app.middleware('request')
async def monitor_request(request):
    return sanic_adapter._monitor_sanic_request(request)

# Response middleware - finish monitoring
@app.middleware('response')
async def monitor_response(request, response):
    sanic_adapter.process_response(request, response)

@app.route('/api/users')
async def get_users(request):
    # Business logic - monitored automatically
    return json({"users": []})

# Decorator mode also supports async functions
@monitor.create_decorator()
async def async_database_query(user_id):
    # Simulated asynchronous database query
    await asyncio.sleep(0.1)
    return {"id": user_id, "name": f"User {user_id}"}

if __name__ == '__main__':
    app.run(host="127.0.0.1", port=8000)

4. Environment variable configuration

Environment variables are the recommended way to configure production deployments:

# Basic settings
export WPM_THRESHOLD_SECONDS=2.0
export WPM_ALERT_WINDOW_DAYS=7
export WPM_ENABLE_LOCAL_FILE=true
export WPM_LOCAL_OUTPUT_DIR=/var/log/performance

# Mattermost notification settings
export WPM_ENABLE_MATTERMOST=true
export WPM_MATTERMOST_SERVER_URL=https://mattermost.example.com
export WPM_MATTERMOST_TOKEN=your-bot-token
export WPM_MATTERMOST_CHANNEL_ID=your-channel-id

Then load the configuration in Python:

from flask import Flask
from web_performance_monitor import Config, PerformanceMonitor

app = Flask(__name__)

# Load the configuration from environment variables
config = Config.from_env()
monitor = PerformanceMonitor(config)

# Apply to the Flask application
app.wsgi_app = monitor.create_middleware()(app.wsgi_app)
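
The following sketch shows one way prefixed environment variables could be mapped onto typed configuration fields. `SketchConfig` and `env_bool` are hypothetical stand-ins that model only four options; how `Config.from_env()` parses values internally is not documented here, so treat the parsing rules (for example, which strings count as true) as assumptions.

```python
import os
from dataclasses import dataclass


def env_bool(name, default):
    """Read a boolean flag; "1", "true", "yes" and "on" count as true (assumed rule)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


@dataclass
class SketchConfig:
    """Hypothetical stand-in for Config that models only four options."""
    threshold_seconds: float = 1.0
    alert_window_days: int = 10
    enable_local_file: bool = True
    local_output_dir: str = "/tmp"

    @classmethod
    def from_env(cls, prefix="WPM_"):
        defaults = cls()
        env = os.environ
        return cls(
            threshold_seconds=float(env.get(prefix + "THRESHOLD_SECONDS", defaults.threshold_seconds)),
            alert_window_days=int(env.get(prefix + "ALERT_WINDOW_DAYS", defaults.alert_window_days)),
            enable_local_file=env_bool(prefix + "ENABLE_LOCAL_FILE", defaults.enable_local_file),
            local_output_dir=env.get(prefix + "LOCAL_OUTPUT_DIR", defaults.local_output_dir),
        )


# Simulate the exported variables, then load them.
os.environ["WPM_THRESHOLD_SECONDS"] = "2.0"
os.environ["WPM_ENABLE_LOCAL_FILE"] = "false"
os.environ.pop("WPM_ALERT_WINDOW_DAYS", None)
loaded = SketchConfig.from_env()
```

Variables that are not set fall back to the defaults, so a deployment only needs to export the values it wants to override.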

📋 Detailed integration guide

Flask integration

Option 1: Application factory

from flask import Flask
from web_performance_monitor import PerformanceMonitor, Config

def create_app():
    app = Flask(__name__)

    # Configure monitoring
    config = Config(
        threshold_seconds=1.0,
        enable_local_file=True,
        local_output_dir="/var/log/performance"
    )

    monitor = PerformanceMonitor(config)
    app.wsgi_app = monitor.create_middleware()(app.wsgi_app)

    return app

app = create_app()

Option 2: Blueprints

from flask import Flask, Blueprint
from web_performance_monitor import PerformanceMonitor, Config

# Create a blueprint
api_bp = Blueprint('api', __name__, url_prefix='/api')

@api_bp.route('/users')
def get_users():
    return {"users": []}

# Main application
app = Flask(__name__)
app.register_blueprint(api_bp)

# Apply monitoring (covers the routes of every registered blueprint)
config = Config.from_env()
monitor = PerformanceMonitor(config)
app.wsgi_app = monitor.create_middleware()(app.wsgi_app)

Option 3: Conditional monitoring

import os
from flask import Flask
from web_performance_monitor import PerformanceMonitor, Config

app = Flask(__name__)

# Enable monitoring only in production
if os.getenv('FLASK_ENV') == 'production':
    config = Config(
        threshold_seconds=2.0,  # Higher threshold in production
        enable_mattermost=True,
        mattermost_server_url=os.getenv('MATTERMOST_URL'),
        mattermost_token=os.getenv('MATTERMOST_TOKEN'),
        mattermost_channel_id=os.getenv('MATTERMOST_CHANNEL')
    )
    monitor = PerformanceMonitor(config)
    app.wsgi_app = monitor.create_middleware()(app.wsgi_app)

Function monitoring

Database operations

from web_performance_monitor import PerformanceMonitor, Config

config = Config(threshold_seconds=0.5)
monitor = PerformanceMonitor(config)

# `db`, `User` and `DataModel` are placeholders for your own SQLAlchemy session and models

@monitor.create_decorator()
def query_user_data(user_id):
    """Monitor database query performance."""
    return db.session.query(User).filter_by(id=user_id).first()

@monitor.create_decorator()
def bulk_insert_data(data_list):
    """Monitor bulk insert performance."""
    return db.session.bulk_insert_mappings(DataModel, data_list)

External API calls

import requests
from web_performance_monitor import PerformanceMonitor, Config

config = Config(threshold_seconds=3.0)  # Higher threshold for API calls
monitor = PerformanceMonitor(config)

@monitor.create_decorator()
def call_external_api(endpoint, data):
    """Monitor an external API call."""
    response = requests.post(f"https://api.example.com/{endpoint}", json=data)
    return response.json()

@monitor.create_decorator()
def fetch_user_profile(user_id):
    """Monitor user profile retrieval."""
    return call_external_api(f"users/{user_id}", {})

CPU-intensive tasks

# Reuses a `monitor` created as in the examples above;
# `complex_algorithm` and `report_generator` are placeholders for your own code

@monitor.create_decorator()
def calculate_risk_score(data):
    """Monitor risk score calculation."""
    # Complex computation
    return complex_algorithm(data)

@monitor.create_decorator()
def generate_report(report_type, filters):
    """Monitor report generation."""
    return report_generator.create_report(report_type, filters)

⚙️ Configuration options

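Full configuration reference

| Option | Environment variable | Default | Description |
|---|---|---|---|
| threshold_seconds | WPM_THRESHOLD_SECONDS | 1.0 | Response time threshold (seconds) |
| alert_window_days | WPM_ALERT_WINDOW_DAYS | 10 | Time window for suppressing duplicate alerts (days) |
| max_performance_overhead | WPM_MAX_PERFORMANCE_OVERHEAD | 0.05 | Maximum monitoring overhead (0.05 = 5%) |
| enable_local_file | WPM_ENABLE_LOCAL_FILE | true | Enable local file notifications |
| local_output_dir | WPM_LOCAL_OUTPUT_DIR | /tmp | Output directory for local files |
| enable_mattermost | WPM_ENABLE_MATTERMOST | false | Enable Mattermost notifications |
| mattermost_server_url | WPM_MATTERMOST_SERVER_URL | - | Mattermost server URL |
| mattermost_token | WPM_MATTERMOST_TOKEN | - | Mattermost access token |
| mattermost_channel_id | WPM_MATTERMOST_CHANNEL_ID | - | Mattermost channel ID |
| url_blacklist | WPM_URL_BLACKLIST | [] | URL blacklist (comma-separated, regular expressions supported) |
| enable_url_blacklist | WPM_ENABLE_URL_BLACKLIST | true | Enable the URL blacklist |
| log_level | WPM_LOG_LEVEL | INFO | Log level |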

Configuration examples

Development

config = Config(
    threshold_seconds=0.5,      # Lower threshold in development
    alert_window_days=1,        # Short deduplication window
    enable_local_file=True,
    local_output_dir="./dev_reports",
    enable_mattermost=False,    # No notifications in development
    log_level="DEBUG"
)

Testing

import os

config = Config(
    threshold_seconds=1.0,
    alert_window_days=3,
    enable_local_file=True,
    local_output_dir="/var/log/test_performance",
    enable_mattermost=True,
    mattermost_server_url="https://test-mattermost.company.com",
    mattermost_token=os.getenv('TEST_MATTERMOST_TOKEN'),
    mattermost_channel_id="test-alerts",
    log_level="INFO"
)

Production

import os

config = Config(
    threshold_seconds=2.0,      # Higher threshold in production
    alert_window_days=7,        # Longer deduplication window
    max_performance_overhead=0.03,  # Stricter overhead limit
    enable_local_file=True,
    local_output_dir="/var/log/performance",
    enable_mattermost=True,
    mattermost_server_url=os.getenv('MATTERMOST_URL'),
    mattermost_token=os.getenv('MATTERMOST_TOKEN'),
    mattermost_channel_id="production-alerts",
    log_level="WARNING"
)

URL blacklist

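Permanently silencing endpoints that cannot be optimized

In real production environments, some endpoints cannot be optimized quickly, whether for historical reasons or because of their complexity. The URL blacklist permanently suppresses alerts for them.

Basic configuration

from web_performance_monitor import Config, PerformanceMonitor

config = Config(
    threshold_seconds=1.0,
    url_blacklist=[
        '/api/legacy/.*',           # Legacy APIs (regex match)
        '/health',                  # Health check (exact match)
        '.*\\.(jpg|png|gif)$',     # Image assets (regex match)
        '/api/slow-report/.*'       # Known slow endpoint
    ],
    enable_url_blacklist=True
)

monitor = PerformanceMonitor(config)

Environment variable configuration

# Separate multiple URLs with commas; regular expressions are supported
export WPM_URL_BLACKLIST="/api/legacy/.*,/health,/metrics,.*\\.(css|js)$"
export WPM_ENABLE_URL_BLACKLIST="true"

Managing the blacklist at runtime

# Add a blacklist rule
config.add_blacklist_url('/api/temp/.*')

# Remove a blacklist rule
config.remove_blacklist_url('/api/temp/.*')

# Check whether a URL is blacklisted
is_blocked = config.is_url_blacklisted('/api/legacy/old-function')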

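Common blacklist patterns

# Recommended configuration for production
url_blacklist = [
    # Legacy system endpoints
    '/api/legacy/.*',
    '/api/v1/old/.*',

    # System monitoring endpoints
    '/health',
    '/metrics',
    '/status',
    '/ping',

    # Static assets
    '.*\\.(jpg|png|gif|ico|svg)$',
    '.*\\.(css|js|woff|ttf|eot)$',

    # Admin endpoints (known to be slow)
    '/admin/.*',
    '/management/.*',

    # Report and export endpoints (required by the business, known to be slow)
    '/api/reports/generate/.*',
    '/api/export/.*',
    '/api/download/.*',

    # Third-party callback endpoints
    '/webhook/.*',
    '/callback/.*',

    # Debug and development endpoints
    '/debug/.*',
    '/dev/.*'
]

Blacklist matching rules

  • Patterns are regular expressions, which allows flexible matching
  • Both the full URL and the endpoint path are checked
  • Matching requests skip alerting but are still monitored and counted in the statistics
  • Patterns are validated automatically; invalid regular expressions are ignored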
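
The matching rules above can be sketched with the standard `re` module. This is an illustration under assumptions, not the library's code: the helper names `compile_blacklist` and `is_blacklisted` are hypothetical, and the use of `fullmatch` is a guess chosen because it makes `/health` behave as an exact match while `/api/legacy/.*` matches by prefix.

```python
import re


def compile_blacklist(patterns):
    """Compile patterns, skipping any that are not valid regular expressions."""
    compiled = []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern))
        except re.error:
            continue  # invalid pattern: ignore it rather than fail
    return compiled


def is_blacklisted(compiled, path, full_url=None):
    """Return True if the path or the full URL matches any pattern in full."""
    candidates = [path] if full_url is None else [path, full_url]
    return any(rx.fullmatch(c) for rx in compiled for c in candidates)


rules = compile_blacklist([
    "/api/legacy/.*",
    "/health",
    r".*\.(jpg|png|gif)$",
    "([unclosed",  # invalid on purpose, so it is dropped
])

legacy_hit = is_blacklisted(rules, "/api/legacy/old-function")
health_hit = is_blacklisted(rules, "/health")
healthz_hit = is_blacklisted(rules, "/healthz")
image_hit = is_blacklisted(rules, "/static/logo.png")
users_hit = is_blacklisted(rules, "/api/users")
```

Compiling the patterns once up front keeps the per-request check to a handful of regex matches and surfaces invalid patterns at configuration time.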

🔧 Advanced features

Monitoring statistics

# Fetch monitoring statistics
stats = monitor.get_stats()
print(f"Total requests: {stats['total_requests']}")
print(f"Slow requests: {stats['slow_requests']}")
print(f"Slow request rate: {stats['slow_request_rate']:.1f}%")
print(f"Alerts sent: {stats['alerts_sent']}")

# Fetch overhead statistics
overhead_stats = stats.get('overhead_stats', {})
print(f"Average overhead: {overhead_stats.get('average_overhead', 0):.2%}")

Testing the alert system

# Check that the alert configuration works
test_results = monitor.test_alert_system()
if test_results['success']:
    print("✅ Alert system is configured correctly")
    for notifier, result in test_results['notifier_results'].items():
        print(f"  {notifier}: {'✅' if result else '❌'}")
else:
    print(f"❌ Alert system configuration error: {test_results['error']}")

Resetting monitoring data

# Reset all statistics
monitor.reset_stats()
print("Monitoring statistics have been reset")

Updating configuration at runtime

# Update the configuration while the application is running
monitor.update_config(
    threshold_seconds=3.0,
    enable_mattermost=False
)

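📊 Alert reports

HTML report contents

Each generated HTML report contains:

  • Basic information: request URL, method, status code, response time
  • Request parameters: JSON body, query parameters, form data (sensitive values are masked automatically)
  • Request headers: User-Agent, Accept, TraceID and other tracing information
  • Performance analysis: detailed profiling charts produced by pyinstrument
  • Call stack: function call hierarchy and time distribution
  • System information: server time, Python version and other environment details

Sensitive data protection

The following kinds of sensitive fields are detected and masked automatically:

  • Password fields (password, passwd, pwd, etc.)
  • Token fields (token, auth, authorization, etc.)
  • Key fields (key, secret, credential, etc.)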
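
A minimal sketch of this kind of masking is shown below. The marker list is taken from the field names above, but the function names, the `***` placeholder, and the case-insensitive substring rule are assumptions for illustration; the library's actual matching may differ.

```python
SENSITIVE_MARKERS = (
    "password", "passwd", "pwd",
    "token", "auth",
    "key", "secret", "credential",
)


def is_sensitive(name):
    """Assumed rule: a field is sensitive if its name contains any marker."""
    lowered = str(name).lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def mask_sensitive(value, mask="***"):
    """Return a copy of value with sensitive dict entries replaced by the mask."""
    if isinstance(value, dict):
        return {
            k: mask if is_sensitive(k) else mask_sensitive(v, mask)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [mask_sensitive(item, mask) for item in value]
    return value


params = {
    "username": "alice",
    "password": "hunter2",
    "profile": {"api_key": "abc", "age": 30},
    "items": [{"Auth-Token": "t"}],
}
masked = mask_sensitive(params)
```

Substring matching is deliberately broad: it catches variants such as `api_key` or `Authorization`, at the cost of occasionally masking harmless fields whose names happen to contain a marker.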

🎯 Best practices

1. Choosing thresholds

import os
from web_performance_monitor import Config

# Recommended thresholds per environment
THRESHOLDS = {
    'development': 0.5,    # Development: surface problems early
    'testing': 1.0,        # Testing: simulate realistic conditions
    'staging': 1.5,        # Staging: close to production
    'production': 2.0,     # Production: avoid false alarms
}

config = Config(
    threshold_seconds=THRESHOLDS.get(os.getenv('ENV', 'development'), 1.0)
)

2. Controlling monitoring scope

# Monitor only critical APIs
@app.route('/api/critical-operation')
def critical_operation():
    # This endpoint is monitored
    return process_critical_data()

# Exclude high-frequency endpoints such as health checks
@app.route('/health')
def health_check():
    # Endpoints like this can be excluded by path, e.g. by adding '/health' to url_blacklist
    return {"status": "ok"}

3. Production deployment

import os
from web_performance_monitor import Config, PerformanceMonitor

# Production configuration
config = Config(
    threshold_seconds=float(os.getenv('WPM_THRESHOLD', '2.0')),
    alert_window_days=int(os.getenv('WPM_WINDOW_DAYS', '7')),
    enable_local_file=True,
    local_output_dir=os.getenv('WPM_LOG_DIR', '/var/log/performance'),
    enable_mattermost=os.getenv('WPM_ENABLE_MATTERMOST', 'false').lower() == 'true',
    mattermost_server_url=os.getenv('MATTERMOST_URL'),
    mattermost_token=os.getenv('MATTERMOST_TOKEN'),
    mattermost_channel_id=os.getenv('MATTERMOST_CHANNEL'),
    log_level=os.getenv('WPM_LOG_LEVEL', 'WARNING')
)

monitor = PerformanceMonitor(config)

4. Log management

# Set up log rotation (logrotate is recommended)
# /etc/logrotate.d/web-performance-monitor
/var/log/performance/*.html {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    notifempty
    create 644 www-data www-data
}

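🔍 Troubleshooting

Common problems

1. Mattermost connection fails

# Load the configuration
config = Config.from_env()
monitor = PerformanceMonitor(config)

# Test the connection
test_results = monitor.test_alert_system()
if not test_results['success']:
    print(f"Connection failed: {test_results['error']}")

# Common fixes:
# - Make sure server_url includes the full scheme (https://)
# - Verify that the token and channel_id are correct
# - Check network connectivity and firewall settings

2. Monitoring overhead is too high

# Check the monitoring overhead
stats = monitor.get_stats()
overhead = stats.get('overhead_stats', {}).get('average_overhead', 0)

if overhead > 0.05:  # More than 5%
    print(f"⚠️ Monitoring overhead is too high: {overhead:.2%}")
    # Suggestion: raise the threshold or monitor less frequently
    monitor.update_config(threshold_seconds=3.0)

3. Too many alert files

# Delete alert files older than 30 days
find /var/log/performance -name "*.html" -mtime +30 -delete

# Or configure a longer alert window so duplicate alerts are suppressed for longer
export WPM_ALERT_WINDOW_DAYS=30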

📚 Example projects

See the examples/ directory for more examples:

  • quick_start.py - 5-minute quick start
  • flask_middleware_example.py - complete Flask middleware example
  • decorator_example.py - decorator usage example
  • production_example.py - production configuration example
  • advanced_usage.py - advanced feature usage example

🤝 Contributing

Issues and pull requests are welcome!

Development setup

# Clone the project
git clone https://github.com/your-repo/web-performance-monitor.git
cd web-performance-monitor

# Install development dependencies
pip install -e ".[dev]"

# Run the tests
pytest

# Format the code
black web_performance_monitor/ tests/
isort web_performance_monitor/ tests/

# Type checking
mypy web_performance_monitor/

Building and releasing

# Using the Makefile
make clean build test

# Or using the scripts
python scripts/build_and_test.py
python scripts/release.py 1.0.1 --test  # Publish to TestPyPI

📄 License

MIT License - see the LICENSE file for details
