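eagle-eye-scraper is an efficient Python data collection framework that supports distributed deployment and is suited to complex pages and large-scale data collection.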

Project description

Eagle-Eye Scraper

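Eagle-Eye Scraper is an efficient, flexible Python data collection framework with native distributed support. It collects data from static and dynamic web pages and from APIs, and its modular architecture fully decouples collection logic from business logic, which makes it a good fit for building maintainable, scalable scraping systems.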


✨ Key Features

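  • Native distributed design: built-in support for distributed task scheduling lets you scale out to concurrent collection across multiple nodes, which suits large-scale crawling jobs.

  • General-purpose collection: supports static web pages, JavaScript-rendered pages, and API endpoints as data sources, covering a wide range of business needs.

  • Decoupled architecture: the collection engine is fully separated from business processing logic, which simplifies testing, maintenance, and feature evolution.

  • High-performance task scheduling: integrates APScheduler for efficient asynchronous scheduled execution, with support for complex task management.

  • Modular, plugin-based design: supports custom collectors, filters, parsers, and other components, which simplifies extension and integration.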
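
To make the decoupling idea concrete, here is a minimal sketch of how a fetch stage and a parse stage can be kept independent and swapped separately. It does not use the eagle-eye-scraper API: `Pipeline`, `fake_fetch`, and `length_parser` are hypothetical names invented for this illustration, and the fetcher returns a canned string instead of making a network request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not part of eagle-eye-scraper.
Fetcher = Callable[[str], str]             # collection: URL -> raw payload
Parser = Callable[[str], Dict[str, str]]   # business logic: raw payload -> record


@dataclass
class Pipeline:
    fetch: Fetcher
    parse: Parser

    def run(self, urls: List[str]) -> List[Dict[str, str]]:
        # The pipeline only wires the two stages together; it knows nothing
        # about how data is fetched or what a record means.
        return [self.parse(self.fetch(url)) for url in urls]


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP or browser-based fetcher.
    return f"payload from {url}"


def length_parser(raw: str) -> Dict[str, str]:
    # Stand-in for real business parsing.
    return {"raw": raw, "length": str(len(raw))}


records = Pipeline(fetch=fake_fetch, parse=length_parser).run(["https://example.com/a"])
print(records)
```

Because each stage is a plain callable, the parser can be unit-tested against fixed strings without any network access, and the fetcher can be replaced without touching the parser.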


📦 Installation

Basic installation

pip install eagle-eye-scraper

Installing optional dependencies

Depending on your use case, you can install any of the following extras:

Component            Install command
Redis                pip install "eagle-eye-scraper[redis]"
MongoDB              pip install "eagle-eye-scraper[mongodb]"
MySQL                pip install "eagle-eye-scraper[mysql]"
MinIO                pip install "eagle-eye-scraper[minio]"
Pulsar MQ            pip install "eagle-eye-scraper[mq]"
Multiple components  pip install "eagle-eye-scraper[redis,mongodb,minio]"

💡 If your shell treats square brackets as glob characters (zsh, for example), wrap the requirement in quotes, for example:

pip install "eagle-eye-scraper[mongodb,redis]"
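
After installing, it can be handy to confirm from Python that the distribution is actually present in the current environment. The sketch below uses only the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is made up for this example. It reports the installed version but cannot tell which extras were selected at install time.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("eagle-eye-scraper"))
```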

🧰 Example Usage

from eagle_eye_scraper import Spider


class SimpleSpider(Spider):
    def crawl(self, **kwargs):
        # Simulate fetching data from the network
        self.raw_data = "<html><title>Example Page</title><body>Hello World</body></html>"
        print("Crawl finished")

    def parse(self, **kwargs):
        # Simulate parsing the fetched data: slice out the text between the <title> tags
        title_start = self.raw_data.find("<title>") + len("<title>")
        title_end = self.raw_data.find("</title>")
        title = self.raw_data[title_start:title_end]
        print(f"Parsed title: {title}")


if __name__ == "__main__":
    spider = SimpleSpider()
    spider.run()
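
The `parse` step in the example locates the title with `str.find`, which is fine for a fixed string but yields a wrong slice when the tag is missing, because `find` returns -1. As a hedged alternative, the sketch below extracts the title with the standard library's `html.parser`. It is independent of eagle-eye-scraper; `TitleParser` and `extract_title` are hypothetical helpers that could be called from inside a `parse` method.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Collects the text content of <title> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._seen_title = False
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self._seen_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> Optional[str]:
        return "".join(self._parts).strip() if self._seen_title else None


def extract_title(html: str) -> Optional[str]:
    """Return the page title, or None when the document has no <title> tag."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


print(extract_title("<html><title>Example Page</title><body>Hello World</body></html>"))
```

Returning `None` for a missing title makes the failure case explicit to the caller instead of producing a misleading substring.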

📄 License

MIT License

Download files

Source Distribution

eagle_eye_scraper-1.3.3.tar.gz (28.9 kB)

Uploaded Source

Built Distribution

eagle_eye_scraper-1.3.3-py3-none-any.whl (41.0 kB)

Uploaded Python 3

File details

Details for the file eagle_eye_scraper-1.3.3.tar.gz.

File metadata

  • Download URL: eagle_eye_scraper-1.3.3.tar.gz
  • Upload date:
  • Size: 28.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.8.10

File hashes

Hashes for eagle_eye_scraper-1.3.3.tar.gz
Algorithm Hash digest
SHA256 787c2c24278bfac7f4fc4ea0dc074ebf9dc71c264a77480dd0a9618fb8773fde
MD5 cb7242d513045f8d89a9f21e61a2748c
BLAKE2b-256 f2d2f149524efacfd4b1d2c15caf66a69ad9fa097e09929e5ac077bb8af5d653

File details

Details for the file eagle_eye_scraper-1.3.3-py3-none-any.whl.

File hashes

Hashes for eagle_eye_scraper-1.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 9663a39252327ef583017eb9e0a23f8e49d64243661e15329d3c8262ebac479d
MD5 ee663ff417634a19fad23f5704ade6f1
BLAKE2b-256 9640a78d0a16affb18865aff5bf5090a1b204eb2d29fff0ce5a470c0cf95306d
