Automatic solver for the Cloudflare 5-second challenge plus a batch page-fetching tool (built on CloakBrowser)

CF Killer

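A tool built on CloakBrowser (a Chromium build with anti-detection patches applied at the C++ source level) that automatically solves the Cloudflare 5-second challenge and fetches pages in batches.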

1. Runtime environment

Item	Description
OS	Windows 10+ / Linux / macOS
Python	3.9+ (3.11 recommended)
Browser	CloakBrowser's dedicated Chromium build (downloaded automatically, ~200 MB)

2. Installing dependencies

# 1. Install cloakbrowser (includes Playwright)
pip install cloakbrowser

# 2. Download the custom Chromium binary (run once before first use)
python -c "import cloakbrowser; cloakbrowser.ensure_binary()"

Core dependency chain:

cloakbrowser (Chromium with C++ source-level anti-detection patches)
  ├── playwright >= 1.40        # browser automation
  ├── httpx >= 0.24             # HTTP client
  └── greenlet >= 3.1.1         # coroutine support
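To confirm that an environment satisfies the minimum versions listed in the dependency chain, a small check along these lines may help. It is a sketch rather than part of the package: `MINIMUMS`, `parse`, and `check` are names made up for this example, and the version comparison is deliberately simple (leading digits of each dotted component only).

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Minimum versions copied from the dependency chain above.
MINIMUMS = {"playwright": "1.40", "httpx": "0.24", "greenlet": "3.1.1"}


def parse(ver):
    """Turn '1.40.0' into (1, 40, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in ver.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check(minimums=MINIMUMS):
    """Return a {package: status} report for the installed distributions."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse(installed) >= parse(minimum)
        report[name] = f"{installed} (ok)" if ok else f"{installed} (needs >= {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check().items():
        print(f"{package}: {status}")
```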

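3. Feature overview

3.1 Automatic Cloudflare challenge solving (`CFSolver`)

Automatically detects and solves Cloudflare Turnstile challenges. Several challenge types are supported:

Type	Strategy
`non-interactive`	Poll only, waiting for CF to pass the request automatically
`managed`	Wait for the iframe → click the checkbox → poll until it disappears
`interactive`	Same as above, with a more involved click path
`embedded`	Solve the embedded Turnstile widget

Clicking escalates through four paths in order: a precise selector inside the iframe → a coordinate click inside the iframe → a coordinate click on the container in the main page → Tab+Space as a last resort.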
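The escalation described above follows a common "try each path in order, stop at the first that works" pattern. The sketch below shows only that control flow. `first_success` and the stand-in strategies are hypothetical and do not touch a page; the package's real click logic is not reproduced here.

```python
from typing import Callable, Iterable, Optional, Tuple

Strategy = Tuple[str, Callable[[], bool]]


def first_success(strategies: Iterable[Strategy]) -> Optional[str]:
    """Run strategies in order and return the name of the first one that succeeds."""
    for name, attempt in strategies:
        try:
            if attempt():
                return name
        except Exception:
            # A failing path is not fatal: fall through to the next one.
            continue
    return None


def broken() -> bool:
    raise RuntimeError("element not found")


if __name__ == "__main__":
    # Stand-ins for the four paths; each returns True when its click "worked".
    paths = [
        ("iframe selector", broken),
        ("iframe coordinates", lambda: False),
        ("container coordinates", lambda: True),
        ("tab+space fallback", lambda: True),
    ]
    print(first_success(paths))
```

Ordering the paths from most to least precise means the blunt fallback only runs when every more precise path has failed or raised.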

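3.2 Batch page fetching (`CFPageFetcher`)

Built on a CloakBrowser persistent context, so the browser fingerprint and cookies are reused
Built-in CF detection (handles sites that set the title late via JS, such as ScienceDirect)
Automatic context recycling: the browser context is rebuilt after N pages to prevent memory leaks
Deferred recycling: under concurrency, recycling waits until all active pages have finished, avoiding race-condition crashes
Proxy support (three modes: single instance, multiple instances, callable)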
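The deferred recycling idea can be illustrated with plain asyncio. This is a sketch under assumptions, not the package's code: `RecyclingGate` and `rebuild` are hypothetical names, and page loads are simulated with sleeps. The gate stops admitting new pages once the per-context quota is used, and rebuilds only when the last in-flight page has finished.

```python
import asyncio


class RecyclingGate:
    """Caps pages per context and recycles only when no page is active."""

    def __init__(self, max_pages, rebuild):
        # Create this inside a running event loop (required on Python 3.9).
        self.max_pages = max_pages
        self.rebuild = rebuild  # hypothetical async callable that recreates the context
        self.started = 0        # pages opened on the current context
        self.active = 0         # pages still in flight
        self.recycles = 0
        self._cond = asyncio.Condition()

    async def run(self, job):
        async with self._cond:
            # Once the quota is used up, new pages wait for the recycle.
            await self._cond.wait_for(lambda: self.started < self.max_pages)
            self.started += 1
            self.active += 1
        try:
            return await job()
        finally:
            async with self._cond:
                self.active -= 1
                if self.started >= self.max_pages and self.active == 0:
                    await self.rebuild()  # safe: nothing is using the old context
                    self.started = 0
                    self.recycles += 1
                self._cond.notify_all()


async def demo(total=7, max_pages=3):
    async def rebuild():
        await asyncio.sleep(0)  # stand-in for closing and reopening a context

    async def page(i):
        await asyncio.sleep(0.01)  # stand-in for loading a page
        return i

    gate = RecyclingGate(max_pages, rebuild)
    results = await asyncio.gather(*(gate.run(lambda i=i: page(i)) for i in range(total)))
    return results, gate.recycles


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With 7 pages and a quota of 3, the context is rebuilt twice; the seventh page runs on the third context, which never reaches its quota.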

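3.3 File download (`download_file`)

After passing CF, binary files (PDFs, images, etc.) are downloaded directly with an in-page fetch(), reusing the browser's cookies and TLS fingerprint so that anti-scraping restrictions do not block the request.

3.4 Multi-instance parallelism (`fetch_all`)

URLs are distributed evenly across multiple browser instances. Each instance has its own event loop and its own proxy, and the instances run in parallel under a ThreadPoolExecutor to maximize throughput.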
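A minimal sketch of this layout, assuming round-robin sharding (the package may split URLs differently): each worker thread calls `asyncio.run`, which gives it a fresh event loop. `shard`, `fetch_shard`, `run_shard`, and `fetch_in_parallel` are illustrative names, and `fetch_shard` is a placeholder that does no real fetching.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def shard(items, instances):
    """Round-robin split: shard k gets items[k], items[k + n], ..."""
    n = max(1, min(instances, len(items)))
    return [items[k::n] for k in range(n)]


async def fetch_shard(indexed_urls):
    # Placeholder for one browser instance working through its shard.
    await asyncio.sleep(0)
    return [(i, {"url": url, "success": True}) for i, url in indexed_urls]


def run_shard(indexed_urls):
    # asyncio.run creates and closes a dedicated event loop in this thread.
    return asyncio.run(fetch_shard(indexed_urls))


def fetch_in_parallel(urls, instances=2):
    shards = shard(list(enumerate(urls)), instances)
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        parts = list(pool.map(run_shard, shards))
    # Merge the shards and restore the original input order.
    flat = sorted((pair for part in parts for pair in part), key=lambda pair: pair[0])
    return [result for _, result in flat]


if __name__ == "__main__":
    demo_urls = [f"https://example.com/page/{i}" for i in range(5)]
    for row in fetch_in_parallel(demo_urls, instances=2):
        print(row)
```

Carrying the original index through each shard lets the merged results come back in input order even when the list contains duplicate URLs.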

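3.5 Main entry points

Function	Purpose
`fetch_url(url, ...)`	Fetch a single URL synchronously
`fetch_urls(urls, ...)`	Fetch a batch synchronously (alias of `fetch_all`)
`fetch_all(urls, ...)`	Fetch in parallel across multiple instances, with per-shard proxies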

4. Test cases

Case A: Batch page fetching

Tests 31 mixed URLs (the Gut medical journal, American Football Wiki, and ScienceDirect) to verify CF challenge solving and page fetching.

# -*- coding: utf-8 -*-
"""Automatic CF challenge solving + page fetching: test script"""
import os
import sys

if sys.platform == "win32":
    import io
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace")
    sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding="utf-8", errors="replace")

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from cf_killer import fetch_all

HEADLESS               = True
PROXY                  = None
CONCURRENCY            = 3
INSTANCES              = 2
MAX_PAGES_PER_CONTEXT  = 10
RETURN_COOKIES         = False

URLS = [
    "https://gut.bmj.com/content/75/6/1085",
    "https://gut.bmj.com/content/75/6/1087",
    "https://gut.bmj.com/content/75/6/1090",
    "https://gut.bmj.com/content/75/6/1092",
    "https://gut.bmj.com/content/75/6/1094",
    "https://gut.bmj.com/content/75/6/1097",
    "https://gut.bmj.com/content/75/6/1110",
    "https://gut.bmj.com/content/75/6/1123",
    "https://gut.bmj.com/content/75/6/1136",
    "https://gut.bmj.com/content/75/6/1147",
    "https://gut.bmj.com/content/75/6/1160",
    "https://gut.bmj.com/content/75/6/1169",
    "https://gut.bmj.com/content/75/6/1186",
    "https://gut.bmj.com/content/75/6/1201",
    "https://gut.bmj.com/content/75/6/1211",
    "https://gut.bmj.com/content/75/6/1226",
    "https://gut.bmj.com/content/75/6/1237",
    "https://gut.bmj.com/content/75/6/1248",
    "https://gut.bmj.com/content/75/6/1264",
    "https://gut.bmj.com/content/75/6/1266.1",
    "https://gut.bmj.com/content/75/6/1266.2",
    "https://gut.bmj.com/content/75/6/1267",
    "https://gut.bmj.com/content/75/6/1109",
    "http://americanfootball.fandom.com/1993_Kentucky_vs._Mississippi",
    "http://americanfootball.fandom.com/Isaiah_Foskey",
    "http://americanfootball.fandom.com/wiki/2014_Susquehanna_Crusaders",
    "http://americanfootball.fandom.com/wiki/2015_Lake_Forest_Foresters",
    "http://americanfootball.fandom.com/wiki/2023_Colorado_State_Rams",
    "http://americanfootballdatabase.fandom.com/Paul_Hackett_(American_football)",
    "http://americanfootballdatabase.fandom.com/wiki/100th_Grey_Cup",
    "https://www.sciencedirect.com/science/article/pii/S0039606025002491",
]

if __name__ == "__main__":
    print(f"Testing: {len(URLS)} URLs")

    results = fetch_all(
        URLS,
        instances=INSTANCES,
        concurrency=CONCURRENCY,
        max_pages_per_context=MAX_PAGES_PER_CONTEXT,
        headless=HEADLESS,
        solve_cf=True,
        proxy=PROXY,
        return_cookies=RETURN_COOKIES,
        verbose=False,
    )

    # Count the successes, then print one line per result with its page title.
    ok = sum(1 for r in results if r["success"])
    print(f"\n{'='*50}")
    for r in results:
        status = "✓" if r["success"] else "✗"
        print(f"  {status}  {(r['title'] or 'FAILED')[:60]}")
    print(f"{'='*50}")
    print(f"Result: {ok}/{len(results)} succeeded")

Run: python test.py

Expected output:

Testing: 31 URLs

==================================================
  ✓  Gut-peritoneal-multisystem axis in endometriosis | Gut
  ✓  Hitting the mitotic spot of fibrolamellar carcinoma | Gut
  ...
  ✓  100th Grey Cup | American Football Database | Fandom
  ✓  Guidelines for perioperative care in elective colorectal sur
==================================================
Result: 31/31 succeeded
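When a run is not fully successful, grouping the failed URLs by host shows quickly whether one site is responsible. The helper below is a sketch built on two assumptions that the description does not confirm: results come back in the same order as the input URLs, and each result is a dict with a `success` key as used in the test script. `failures_by_host` is a hypothetical name.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def failures_by_host(urls, results):
    """Map each hostname to the URLs whose result is not marked successful."""
    failed = defaultdict(list)
    # Assumption: results[i] belongs to urls[i].
    for url, result in zip(urls, results):
        if not result.get("success"):
            failed[urlsplit(url).hostname].append(url)
    return dict(failed)


if __name__ == "__main__":
    sample_urls = [
        "https://a.example/1",
        "https://b.example/2",
        "https://a.example/3",
    ]
    sample_results = [
        {"success": False, "title": None},
        {"success": True, "title": "Page 2"},
        {"success": False, "title": None},
    ]
    print(failures_by_host(sample_urls, sample_results))
```

The returned lists can be passed back to `fetch_all` for a retry pass.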

Case B: PDF file download

After passing CF, the browser page issues a fetch() to download the PDF, reusing the TLS fingerprint and cookies.

# -*- coding: utf-8 -*-
"""Test for the download_file method: PDF download"""
import asyncio
import os
import sys

if sys.platform == "win32":
    import io
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace")
    sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding="utf-8", errors="replace")

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from cf_killer import CFPageFetcher

PDF_URL = "https://www.myavls.org/assets/pdf/SuperficialVenousDiseaseGuidelinesPMS313-02.03.16.pdf"
OUTPUT = os.path.join(os.path.dirname(os.path.abspath(__file__)), "SuperficialVenousDiseaseGuidelines.pdf")


async def main():
    print(f"Target: {PDF_URL}")
    print(f"Save to: {OUTPUT}")

    async with CFPageFetcher(
        headless=True,
        verbose=True,
        solve_cf=True,
    ) as fetcher:
        ok = await fetcher.download_file(PDF_URL, OUTPUT)
        if ok:
            size_kb = os.path.getsize(OUTPUT) / 1024
            print(f"\n✅ Download succeeded! File: {OUTPUT} ({size_kb:.0f} KB)")
        else:
            print("\n❌ Download failed")


if __name__ == "__main__":
    asyncio.run(main())

Run: python test_download.py

Expected output (the library's own verbose log lines are shown as emitted; they report that the context was created, a warm-up request was made, the page was not a CF challenge, and the file was saved):

Target: https://www.myavls.org/assets/pdf/SuperficialVenousDiseaseGuidelinesPMS313-02.03.16.pdf
Save to: ...\SuperficialVenousDiseaseGuidelines.pdf
[上下文] 已创建
[下载] 预热: https://www.myavls.org/
非 CF url=https://www.myavls.org/
[下载] 已保存: ...\SuperficialVenousDiseaseGuidelines.pdf (121KB)

✅ Download succeeded! ... (121 KB)
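A successful return value does not by itself prove that the saved bytes are a PDF; an HTML error page saved under a `.pdf` name would have a nonzero size too. A quick sanity check is to look for the `%PDF-` signature at the start of the file. `looks_like_pdf` is a hypothetical helper, not part of the package.

```python
import os
import tempfile


def looks_like_pdf(path):
    """Return True when the file starts with the PDF signature bytes."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"%PDF-"


if __name__ == "__main__":
    # Demonstrate on two throwaway files instead of a real download.
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.pdf")
        bad = os.path.join(tmp, "bad.pdf")
        with open(good, "wb") as fh:
            fh.write(b"%PDF-1.7\n...")
        with open(bad, "wb") as fh:
            fh.write(b"<!DOCTYPE html><html>blocked</html>")
        print(looks_like_pdf(good), looks_like_pdf(bad))
```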

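5. Main API parameters

`CFPageFetcher`

Parameter	Type	Default	Description
`headless`	bool	True	Headless mode
`humanize`	bool	False	Human-like mouse trajectories and keyboard timing
`solve_cf`	bool	True	Automatically solve CF challenges
`cf_max_retries`	int	5	Maximum number of retries for CF solving
`timeout`	int	90000	Page navigation timeout (ms)
`proxy`	str	None	Proxy URL
`max_pages_per_context`	int	20	Recycle the browser context every N pages
`return_cookies`	bool	False	Whether to include cookies in the results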
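`fetch_all`

Parameter	Type	Default	Description
`urls`	list	-	List of URLs
`instances`	int	1	Number of parallel browser instances
`concurrency`	int	3	Concurrent tabs per instance
`max_pages_per_context`	int	20	Recycle automatically every N pages
`proxy`	str/list/callable	None	Single proxy / list of proxies / proxy factory function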
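The three accepted shapes of `proxy` could be resolved to one proxy per instance roughly as follows. The exact semantics are assumptions made for this sketch: a list is cycled by instance index, and a callable receives the instance index. The package may behave differently. `resolve_proxy` is a hypothetical name and the proxy URLs are placeholders.

```python
def resolve_proxy(proxy, index):
    """Pick the proxy for browser instance `index` from a str, list, or callable."""
    if proxy is None:
        return None
    if callable(proxy):
        return proxy(index)  # assumption: the factory takes the instance index
    if isinstance(proxy, str):
        return proxy  # one proxy shared by every instance
    proxies = list(proxy)
    if not proxies:
        return None
    return proxies[index % len(proxies)]  # assumption: cycle through the list


if __name__ == "__main__":
    pool = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]
    for i in range(3):
        print(i, resolve_proxy(pool, i))
    print(resolve_proxy(lambda i: f"http://127.0.0.1:{9000 + i}", 2))
```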

Release history

This version: 0.1.0 (Jun 2, 2026)

Download files

Source Distribution

cf_killer-0.1.0.tar.gz (18.5 kB), uploaded Jun 2, 2026, Source

Built Distribution

cf_killer-0.1.0-py3-none-any.whl (15.2 kB), uploaded Jun 2, 2026, Python 3

File details

Details for the file cf_killer-0.1.0.tar.gz.

File metadata

Download URL: cf_killer-0.1.0.tar.gz
Upload date: Jun 2, 2026
Size: 18.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for cf_killer-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`411cd9f09991cd037e6f8f61cbe98219aed4acdaad6e5649b80e9e44c0904f19`
MD5	`3930a388ea5edc2d73b3419884002aea`
BLAKE2b-256	`f4d3e68982e5ae22cc20807b5bdeaff392b3cc00a1f866fa65c60ac7b4cc9ccc`

File details

Details for the file cf_killer-0.1.0-py3-none-any.whl.

File metadata

Download URL: cf_killer-0.1.0-py3-none-any.whl
Upload date: Jun 2, 2026
Size: 15.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for cf_killer-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`77d59c8a2c2a1f20132f38a489501d7830e51d64535bf256327fedf75116d254`
MD5	`9a6facb6fd5992914a7cc3db4c0556b3`
BLAKE2b-256	`93e058ea65019f4db9265754c029c75afec34ccb6ae6190ee34476f8245a6430`
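The published SHA256 digests can be compared against a downloaded file with the standard library. The sketch below streams the file through `hashlib` and uses a constant-time comparison. `sha256_of` and `verify` are names made up for this example, and the demo hashes a throwaway file instead of a real download.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA-256 matches the published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.bin")
        with open(sample, "wb") as fh:
            fh.write(b"abc")
        print(sha256_of(sample))
```

For the real artifacts, pass the downloaded file path and the SHA256 value from the matching table above to `verify`.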

