A tool for scraping user data from X (Twitter)

X (Twitter) Scraper

An X (Twitter) scraping tool built on twscrape. It collects user profiles, tweets, follower lists, and engagement data, and exports the results as JSON or CSV.

Features

  • User profile scraping
  • User tweet timeline (up to 3,200 tweets, a hard X API limit)
  • Follower and following lists
  • Tweet replies and retweeters
  • Automatic rotation across multiple accounts, with cookie import to bypass CAPTCHA
  • JSON / CSV output

Prerequisites

  • Python >= 3.11
  • The uv package manager
  • Cookies from at least one X account (not your main account)

Installation

git clone https://github.com/ChaNg1o1/x-scraper.git
cd x-scraper
uv sync

Configuration

Copy the example config and fill in your account cookies:

cp config.example.toml config.toml

Then edit config.toml:

[scraper]
output_dir = "./output"
request_delay = 1.5
max_tweets = 3200

[[accounts]]
username = "your_account"
cookies = "ct0=YOUR_CT0; auth_token=YOUR_AUTH_TOKEN"
# proxy = "http://user:pass@host:port"  # optional

Getting Cookies

  1. Log in to X in your browser
  2. Open DevTools (F12) -> Application -> Cookies -> https://x.com
  3. Copy the values of the ct0 and auth_token cookies
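The two values are pasted into config.toml as a single "name=value; name=value" string. A minimal sketch of splitting such a string (parse_cookie_header is an illustrative helper, not part of x-scraper):

```python
def parse_cookie_header(raw: str) -> dict[str, str]:
    """Split a 'name=value; name=value' cookie string into a dict."""
    # split("=", 1) keeps any '=' inside the value intact
    pairs = (item.split("=", 1) for item in raw.split(";") if "=" in item)
    return {name.strip(): value.strip() for name, value in pairs}

cookies = parse_cookie_header("ct0=YOUR_CT0; auth_token=YOUR_AUTH_TOKEN")
print(cookies)  # {'ct0': 'YOUR_CT0', 'auth_token': 'YOUR_AUTH_TOKEN'}
```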

Usage

# Scrape all data for a user (profile + tweets + followers + following)
x-scraper scrape --user elonmusk --all

# Scrape only tweets and followers
x-scraper scrape --user elonmusk --tweets --followers

# Scrape engagement data for a tweet
x-scraper scrape --tweet 1234567890 --replies --retweeters

# Output as CSV
x-scraper scrape --user elonmusk --all --format csv

# Limit the number of items scraped
x-scraper scrape --user elonmusk --tweets --limit 100

# Use a specific config file
x-scraper scrape --config ./my_config.toml --user elonmusk --all

# List configured accounts
x-scraper accounts
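To scrape several users in one run, the CLI can be driven from a short script. A sketch using only the flags documented above (build_cmd itself is a hypothetical wrapper, not part of the tool):

```python
import subprocess

def build_cmd(user: str, limit: int) -> list[str]:
    """Assemble an x-scraper invocation from the documented flags."""
    return ["x-scraper", "scrape", "--user", user, "--tweets",
            "--limit", str(limit)]

for user in ["elonmusk", "jack"]:
    cmd = build_cmd(user, 100)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually scrape
```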

Output Structure

output/
  {username}/
    profile.json
    tweets.json
    followers.json
    following.json
  tweet_{id}/
    replies.json
    retweeters.json
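Downstream processing can read the per-user JSON files straight from this layout. A minimal sketch (load_user_output is an illustrative reader, not part of the tool):

```python
import json
from pathlib import Path

def load_user_output(output_dir: str, username: str) -> dict:
    """Read whichever per-user JSON files exist under output/{username}/."""
    base = Path(output_dir) / username
    data = {}
    for name in ("profile", "tweets", "followers", "following"):
        path = base / f"{name}.json"
        if path.exists():  # not every run produces every file
            data[name] = json.loads(path.read_text(encoding="utf-8"))
    return data
```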

Project Structure

x-scraper/
  pyproject.toml
  config.example.toml
  src/x_scraper/
    cli.py          # CLI entry point (click)
    config.py       # TOML config loading
    auth.py         # Account management and cookie import
    scraper.py      # Core scraping logic
    models.py       # Data models (dataclasses)
    export.py       # JSON/CSV export
  tests/
    test_models.py  # Unit tests for the data models

Notes

  • Do not use your main account; any account used carries a risk of being banned
  • X's Terms of Service prohibit unauthorized automated access; use at your own risk
  • X's anti-scraping measures change roughly every 2-4 weeks, so the tool may need updating to keep pace
  • A monkey-patch for a twscrape xclid script-parsing error is built in (vladkens/twscrape#284)

License

MIT
