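eagle-eye-scraper is an efficient Python data collection framework that supports distributed deployment and is suited to complex pages and large-scale data collection.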
Eagle-Eye Scraper
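Eagle-Eye Scraper is an efficient, flexible Python data collection framework with native distributed support. It collects data from static and dynamic web pages as well as APIs, and its modular architecture fully decouples collection logic from business logic, which makes it a good fit for building maintainable, extensible scraping systems.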
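✨ Key features
- Native distributed design: built-in support for distributed task scheduling makes it easy to scale out to concurrent collection across multiple nodes, which suits large-scale crawling jobs.
- General-purpose collection: supports static web pages, JavaScript-rendered pages, and API endpoints, covering a wide range of business needs.
- Decoupled architecture: collection-engine logic is fully separated from business-processing logic, which simplifies testing, maintenance, and feature evolution.
- High-performance task scheduling: integrates APScheduler for efficient asynchronous scheduled execution and supports complex task management.
- Modular, plug-in design: supports custom collectors, filters, parsers, and other components, which eases extension and integration.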
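
To make the decoupling and plug-in ideas above concrete, here is a minimal standard-library sketch. It does not use Eagle-Eye Scraper's actual API: `Pipeline`, `fetch`, `keep`, `parse`, and `fake_fetch` are hypothetical names chosen for illustration, and the fetch step is simulated so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical stand-ins for illustration; not part of eagle-eye-scraper's API.
Fetcher = Callable[[str], str]   # URL -> raw payload
Filter = Callable[[str], bool]   # raw payload -> keep it?
Parser = Callable[[str], dict]   # raw payload -> structured record


@dataclass
class Pipeline:
    """Wires independent components together without knowing any site details."""
    fetch: Fetcher
    keep: Filter
    parse: Parser

    def run(self, urls: Iterable[str]) -> List[dict]:
        records = []
        for url in urls:
            raw = self.fetch(url)
            if self.keep(raw):          # filtering is a separate, swappable step
                records.append(self.parse(raw))
        return records


def fake_fetch(url: str) -> str:
    # Simulated fetch: returns an empty payload for URLs ending in "/empty".
    return "" if url.endswith("/empty") else f"payload from {url}"


pipeline = Pipeline(
    fetch=fake_fetch,
    keep=lambda raw: bool(raw),                       # drop empty payloads
    parse=lambda raw: {"length": len(raw), "text": raw},
)
results = pipeline.run(["https://example.com/a", "https://example.com/empty"])
print(results)
```

Because each component is a plain callable, any one of them can be replaced or unit-tested on its own, which is the practical benefit the decoupled design aims for.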
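📦 Installation
Basic installation
pip install eagle-eye-scraper
Installing optional dependencies
Depending on your use case, you can install any of the following extras: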
| Component | Install command |
|---|---|
| Redis | pip install "eagle-eye-scraper[redis]" |
| MongoDB | pip install "eagle-eye-scraper[mongodb]" |
| MySQL | pip install "eagle-eye-scraper[mysql]" |
| MinIO | pip install "eagle-eye-scraper[minio]" |
| Pulsar MQ | pip install "eagle-eye-scraper[mq]" |
| Several components at once | pip install "eagle-eye-scraper[redis,mongodb,minio]" |
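💡 If you use an older pip, or a shell such as zsh that treats square brackets as glob patterns, wrap the requirement including its [] in quotes, for example: pip install "eagle-eye-scraper[mongodb,redis]"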
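
If you are unsure which extra names an installed release actually declares, the package metadata can be inspected with the standard library. The sketch below is one way to do that; `declared_extras` is a hypothetical helper name, and the output depends on the version you have installed.

```python
from importlib import metadata
from typing import List


def declared_extras(dist_name: str) -> List[str]:
    """Return the extras a distribution declares, or [] if it is not installed."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # "Provides-Extra" appears once per extra; get_all returns None if absent.
    return sorted(meta.get_all("Provides-Extra") or [])


# Prints an empty list when the package is not installed in this environment.
print(declared_extras("eagle-eye-scraper"))
```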
🧰 Example usage

    from eagle_eye_scraper import Spider


    class SimpleSpider(Spider):
        def crawl(self, **kwargs):
            # Simulate fetching data from the network
            self.raw_data = "<html><title>Example page</title><body>Hello World</body></html>"
            print("Crawl finished")

        def parse(self, **kwargs):
            # Simulate parsing the fetched data: slice out the text between the title tags
            title_start = self.raw_data.find("<title>") + len("<title>")
            title_end = self.raw_data.find("</title>")
            title = self.raw_data[title_start:title_end]
            print(f"Parsed title: {title}")


    if __name__ == "__main__":
        spider = SimpleSpider()
        spider.run()

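
The string-offset parsing in the example above is fine for a demo, but it silently returns the wrong slice when the page has no title (`find` returns -1) or when the tag carries attributes. As a hedged alternative, the sketch below extracts the title with the standard library's `html.parser`. `TitleParser` and `extract_title` are hypothetical names, not part of Eagle-Eye Scraper; the function could be called from a `parse` method like the one shown.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleParser(HTMLParser):
    """Collects the text content of the <title> element, if any."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title = (self.title or "") + data


def extract_title(html: str) -> Optional[str]:
    """Return the page title, or None when the document has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


page = '<html><title lang="en">Example page</title><body>Hello World</body></html>'
print(extract_title(page))
```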
📄 License
MIT License