X-CrawlFox 🦊
A free, high-anonymity, human-like X (Twitter) scraping CLI tool built on Camoufox, for personal use.
🌐 English | 中文
🚀 Key Features
Free, highly customizable, and incremental, with built-in human-like behavior to evade bot detection.
- Human-like Interaction: Integrates Camoufox fingerprint obfuscation to simulate real human scrolling, random delays, and typing interactions, significantly reducing the risk of detection.
- Timeline Scraping: Supports crawling "Following" and "For you" feeds with configurable item limits.
- Deep News Scraping: Automatically scrapes the "Today's News" sidebar, with support for clicking into details to extract Grok summaries and related popular posts.
- Keyword Search: Simulates real keyboard input for search queries to bypass anti-bot detection.
- Incremental Account Monitoring: Supports multi-account monitoring with automatic tracking of the last crawled tweet ID to only fetch new content.
- One-click Composite Tasks: Launch composite tasks (Timeline, News, Monitoring, Search) via a unified JSON configuration file.
- Automatic State Management: Automatically saves login sessions (Cookie) and crawling progress (Crawler State).
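The human-like behavior above comes down to randomized scroll distances and irregular pauses rather than fixed-interval requests. A minimal sketch of such a scroll plan (the helper name and parameters are hypothetical, not part of x-crawlfox's actual API):

```python
import random

def human_scroll_plan(n_steps, base_px=600, jitter=0.4):
    """Produce (scroll_px, pause_s) pairs that mimic uneven human scrolling.

    Hypothetical illustration only -- not x-crawlfox's internal code.
    Each step scrolls a jittered distance, then pauses for an
    irregular "reading" interval.
    """
    plan = []
    for _ in range(n_steps):
        # Vary the scroll distance by +/- jitter around the base distance.
        px = int(base_px * random.uniform(1 - jitter, 1 + jitter))
        # Pause between steps, like a human skimming the feed.
        pause = round(random.uniform(0.8, 2.5), 2)
        plan.append((px, pause))
    return plan
```

In a Playwright-style browser session (which Camoufox exposes), each pair would drive something like `page.mouse.wheel(0, px)` followed by a sleep of `pause` seconds.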
📦 Quick Start
Installation
- Install from PyPI:
  pip install x-crawlfox
- Build from source (this project uses uv for package management):
  git clone https://github.com/Jiutwo/x-crawlfox.git
  cd x-crawlfox
  uv sync
How to Use
1. Initialize Config Directory
Before first use, run the following command to generate the .x-crawlfox configuration folder and default settings in the current directory:
x-crawlfox init
# To save the configuration to the user home directory (Global Mode):
x-crawlfox init --global
2. Account Login or Cookie Export (Required)
You must have a logged-in session (Cookie) before scraping.
Note: Scraping immediately with a newly registered account is risky; it is recommended to use the account normally for a while first.
Method 1: Export via Cookie Editor Extension (Recommended)
Use the browser extension Cookie Editor to export your current session cookies as JSON and save them to .x-crawlfox/x_cookies.json.
The .x-crawlfox folder can be located in the current directory or the user home directory. X-CrawlFox will automatically recognize and convert the Cookie Editor format to the required internal format upon loading.
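Cookie Editor's JSON export uses slightly different field names than the cookie shape Playwright-based tools consume (for example `expirationDate` vs. `expires`). X-CrawlFox performs this conversion for you on load; a hedged sketch of what such a conversion involves (field defaults here are assumptions, not the tool's exact logic):

```python
import json

def cookie_editor_to_playwright(raw):
    """Convert a Cookie Editor JSON export (a list of cookie dicts) into the
    cookie shape Playwright's storage state expects.

    Hypothetical converter for illustration; x-crawlfox does an equivalent
    conversion internally.
    """
    same_site_map = {
        "lax": "Lax",
        "strict": "Strict",
        "none": "None",
        "no_restriction": "None",  # Cookie Editor's spelling for SameSite=None
    }
    out = []
    for c in raw:
        out.append({
            "name": c["name"],
            "value": c["value"],
            "domain": c.get("domain", ".x.com"),
            "path": c.get("path", "/"),
            # Cookie Editor: 'expirationDate'; Playwright: 'expires' (-1 = session cookie)
            "expires": c.get("expirationDate", -1),
            "httpOnly": c.get("httpOnly", False),
            "secure": c.get("secure", True),
            "sameSite": same_site_map.get(str(c.get("sameSite", "lax")).lower(), "Lax"),
        })
    return out
```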
Method 2: Command Line Login
x-crawlfox x login
Complete the login in the popup browser window, then return to the terminal and press Enter to save the state. The login state will be automatically saved to .x-crawlfox/x_cookies.json.
If X blocks the login as a "suspicious attempt," please switch to Method 1.
3. Scrape Personal Timeline
# Scrape the first 20 items from the Following feed
# Add --no-headless to visualize the process
x-crawlfox x timeline --type Following --max-items 20
# Scrape the For You feed
x-crawlfox x timeline --type "For you" --max-items 50
4. Scrape Today's News
# Scrape sidebar list only
x-crawlfox x news
# Deep scraping: Enter details to get summaries and related posts
x-crawlfox x news --detail --max-items 3
5. Scrape/Monitor Specific User
# Fetch the latest 20 tweets from a specific user
x-crawlfox x user elonmusk --max-tweets 20
# Incremental fetch: Only get new content since the last run
x-crawlfox x user elonmusk --only-new
Run multi-account monitoring independently (reads x.monitor from crawl_config.json):
x-crawlfox x monitor
You can also specify a custom config file (flat list format):
x-crawlfox x monitor --config my_accounts.json
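The flat list format is presumably just an array of monitor entries. A plausible my_accounts.json, with field names inferred from the x.monitor entries shown in the crawl_config.json example below:

```json
[
  { "username": "elonmusk", "only_new": true, "max_tweets": 10 },
  { "username": "OpenAI", "only_new": true, "max_tweets": 10 }
]
```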
6. One-click Composite Tasks
Edit .x-crawlfox/crawl_config.json, then run:
x-crawlfox x all
You can also specify a different config file path via --config:
x-crawlfox x all --config /path/to/crawl_config.json
Example crawl_config.json format:
{
"global": {
"output_dir": "output",
"headless": true
},
"x": {
"timeline": [
{ "type": "For you", "max_scrolls": 2, "max_items": 10 },
{ "type": "Following", "max_scrolls": 3, "max_items": 10 }
],
"news": {
"enabled": true,
"detail": true,
"max_items": 5
},
"monitor": [
{ "username": "elonmusk", "only_new": true, "max_tweets": 10 },
{ "username": "OpenAI", "only_new": true, "max_tweets": 10 }
]
}
}
📂 Storage & Configuration (.x-crawlfox)
To protect privacy and support persistence, X-CrawlFox uses the .x-crawlfox folder to store sensitive data:
- Storage Location:
  - Local Mode: The program first checks whether .x-crawlfox exists in the current working directory. If found, all data is stored there (ideal for account isolation).
  - Global Mode: If the local directory does not exist, it falls back to ~/.x-crawlfox in the user home directory (Windows: %USERPROFILE%\.x-crawlfox).
- Stored Content:
  - x_cookies.json: Stores X login cookies and auth tokens. Do not share this file.
  - crawl_config.json: Unified configuration file for the all and monitor commands.
  - x_crawl_state.json: Stores the last tweet ID fetched for each monitored account to enable incremental fetching.
- Output Location: All scraping results are saved in .jsonl format in the output/ directory for easy analysis or database import.
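Incremental fetching via x_crawl_state.json can be pictured as a per-account ID watermark: remember the newest tweet ID seen, and on the next run keep only tweets above it. A hedged sketch (the state layout and the tweet `id` field are assumptions, not the tool's documented schema):

```python
import json
from pathlib import Path

# Path taken from the storage layout described above.
STATE_FILE = Path(".x-crawlfox/x_crawl_state.json")

def load_state():
    """Load the per-account watermark map, e.g. {"elonmusk": "1234567890"}."""
    return json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}

def filter_new_tweets(username, tweets, state):
    """Keep only tweets newer than the last recorded ID, then advance the
    watermark. `tweets` is assumed newest-first, each with a numeric-string
    'id' (X tweet IDs increase over time).

    Hypothetical sketch of incremental fetching, not x-crawlfox internals.
    """
    last_id = int(state.get(username, 0))
    new = [t for t in tweets if int(t["id"]) > last_id]
    if new:
        state[username] = new[0]["id"]  # newest tweet becomes the new watermark
    return new
```

After a run, the updated state map would be written back to STATE_FILE so the next invocation only fetches content published since.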
🙏 Acknowledgments
This project is deeply inspired by the open-source community and integrates excellent open-source projects such as Camoufox. Sincere thanks to all the open-source libraries and developers who provide foundational support for this project.
⚠️ Disclaimer
This tool is for educational and research purposes only. Please comply with the X (Twitter) Terms of Service. The developers are not responsible for any account restrictions or legal issues resulting from the use of this tool.