A CLI for crawling X (Twitter) data, built on Camoufox, intended for personal use.
X-CrawlFox 🦊
A free, highly anonymous, human-like scraping CLI tool for X (Twitter) and search engines.
🌐 English | 中文
🚀 Key Features
Free and highly customizable, with incremental crawling and built-in human-like behavior for anti-bot protection.
- Human-like Interaction: Integrates Camoufox fingerprint obfuscation to simulate real human scrolling, random delays, and typing interactions, significantly reducing the risk of detection.
- Timeline Scraping: Supports crawling "Following" and "For you" feeds with configurable item limits.
- Deep News Scraping: Automatically scrapes the "Today's News" sidebar, with support for clicking into details to extract Grok summaries and related popular posts.
- Incremental Account Monitoring: Supports multi-account monitoring with automatic tracking of the last crawled tweet ID to only fetch new content.
- One-click Composite Tasks: Launch composite tasks (Timeline, News, Monitoring, Search) via a unified JSON configuration file.
- Automatic State Management: Automatically saves login sessions (Cookie) and crawling progress (Crawler State).
- Multi-Search Engine Support: Supports 18 different search engines, including Google, Bing, Baidu, Brave, DuckDuckGo, and more.
📦 Quick Start
Installation
Install from PyPI:
pip install x-crawlfox
Or build from source (this project uses uv for package management):
git clone https://github.com/Jiutwo/x-crawlfox.git
cd x-crawlfox
uv sync
How to Use
1. Initialize Config Directory
Before first use, run the following command to generate the .x-crawlfox configuration folder and default settings in the current directory:
x-crawlfox init
# To save the configuration to the user home directory (Global Mode):
x-crawlfox init --global
2. Account Login or Cookie Export (Required)
You must have a logged-in session (Cookie) before scraping.
Note: Scraping immediately with a newly registered account is risky; it is recommended to use the account normally for a while first.
Method 1: Export via Cookie Editor Extension (Recommended)
Use the browser extension Cookie Editor to export your current session cookies as JSON and save them to .x-crawlfox/x_cookies.json.
The .x-crawlfox folder can be located in the current directory or the user home directory. X-CrawlFox will automatically recognize and convert the Cookie Editor format to the required internal format upon loading.
Method 2: Command Line Login
x-crawlfox x login
Complete the login in the popup browser window, then return to the terminal and press Enter to save the state. The login state will be automatically saved to .x-crawlfox/x_cookies.json.
If X blocks the login as a "suspicious attempt," please switch to Method 1.
3. Scrape Personal Timeline
# Scrape the first 20 items from the Following feed
# Add --no-headless to visualize the process
x-crawlfox x timeline --type Following --max-items 20
# Scrape the For You feed
x-crawlfox x timeline --type "For you" --max-items 50
4. Scrape Today's News
# Scrape sidebar list only
x-crawlfox x news
# Deep scraping: Enter details to get summaries and related posts
x-crawlfox x news --detail --max-items 3
5. Scrape/Monitor Specific User
# Fetch the latest 20 tweets from a specific user
x-crawlfox x user elonmusk --max-tweets 20
# Incremental fetch: Only get new content since the last run
x-crawlfox x user elonmusk --only-new
Run multi-account monitoring independently (reads x.monitor from crawl_config.json):
x-crawlfox x monitor
You can also specify a custom config file (flat list format):
x-crawlfox x monitor --config my_accounts.json
6. Search Engine Scraping
X-CrawlFox supports scraping search results from 18 search engines (8 CN + 10 Global) via the se subcommand. No login is required.
Single engine search
# Fast mode: navigate directly to the search URL (default)
x-crawlfox se search "LangGraph" --engine google --max-results 10
# Simulate mode: open homepage and type like a human (better anti-detection)
x-crawlfox se search "LangGraph" --engine google --mode simulate
# Time filter: hour | day | week | month | year
x-crawlfox se search "AI news" --engine bing --time-range day
# Domain restriction
x-crawlfox se search "python async" --engine google --site github.com
# File type filter
x-crawlfox se search "machine learning" --engine baidu --filetype pdf
# Exact phrase match
x-crawlfox se search "anything" --engine duckduckgo --exact-phrase "large language model"
# Disable headless mode (useful when bot detection is triggered)
x-crawlfox se search "隐私工具" --engine qwant --no-headless
Multi-engine search — query multiple engines in one run and merge results into a single .jsonl file:
x-crawlfox se multi "AI Agent" --engines google,bing,duckduckgo --max-results 10
x-crawlfox se multi "量化投资" --engines baidu,sogou,jisilu,wechat
x-crawlfox se multi "rust async" --engines google,bing --time-range month
Available engines
| Region | Engines |
|---|---|
| CN | baidu bing-cn bing-int 360 sogou wechat toutiao jisilu |
| Global | google google-hk bing duckduckgo yahoo startpage brave ecosia qwant wolframalpha |
Results are saved as .jsonl to the output/ directory (e.g. output/se_google_LangGraph_20260419_120000.jsonl).
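Because every result file is line-delimited JSON, post-processing needs only the standard library. A minimal sketch (the sample file and its field names are made up for illustration; real records depend on the engine and command):

```python
import json
from pathlib import Path

def load_jsonl(path):
    """Read a .jsonl file into a list of dicts, one JSON object per line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                records.append(json.loads(line))
    return records

# Demo with a made-up file; real results live under output/.
sample = Path("sample_results.jsonl")
sample.write_text('{"title": "A", "url": "https://a.example"}\n'
                  '{"title": "B", "url": "https://b.example"}\n',
                  encoding="utf-8")
rows = load_jsonl(sample)
print(len(rows), rows[0]["title"])  # → 2 A
```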
7. One-click Composite Tasks
Edit .x-crawlfox/crawl_config.json, then run:
x-crawlfox x all
You can also specify a different config file path via --config:
x-crawlfox x all --config /path/to/crawl_config.json
Example crawl_config.json format:
{
"global": {
"output_dir": "output",
"headless": true
},
"x": {
"timeline": [
{ "type": "For you", "max_scrolls": 2, "max_items": 10 },
{ "type": "Following", "max_scrolls": 3, "max_items": 10 }
],
"news": {
"enabled": true,
"detail": true,
"max_items": 5
},
"monitor": [
{ "username": "elonmusk", "only_new": true, "max_tweets": 10 },
{ "username": "OpenAI", "only_new": true, "max_tweets": 10 }
]
}
}
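To run the composite task on a schedule, one option (assuming a Unix host with cron and x-crawlfox on PATH; the project path is a placeholder) is a crontab entry such as:

```shell
# Run the composite crawl daily at 08:00, from the directory
# holding the local .x-crawlfox configuration folder.
0 8 * * * cd /path/to/project && x-crawlfox x all >> crawl.log 2>&1
```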
📂 Storage & Configuration (.x-crawlfox)
To protect privacy and support persistence, X-CrawlFox uses the .x-crawlfox folder to store sensitive data:
- Storage Location:
  - Local Mode: The program first checks whether .x-crawlfox exists in the current working directory. If found, all data is stored there (ideal for account isolation).
  - Global Mode: If the local directory does not exist, it defaults to ~/.x-crawlfox in the user home directory (Windows: %USERPROFILE%\.x-crawlfox).
- Stored Content:
  - x_cookies.json: Stores X login cookies and auth tokens. Do not share this file.
  - crawl_config.json: Unified configuration file for the all and monitor commands.
  - x_crawl_state.json: Stores the last tweet ID fetched for each monitored account to enable incremental fetching.
- Output Location: All scraping results are saved in .jsonl format in the output/ directory for easy analysis or database import.
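The lookup order described above can be sketched as a small helper. This mirrors the documented behavior only and is not X-CrawlFox's actual code (resolve_config_dir is a hypothetical name):

```python
import tempfile
from pathlib import Path

def resolve_config_dir(cwd=None):
    """Prefer ./.x-crawlfox if it exists, else fall back to ~/.x-crawlfox."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    local = base / ".x-crawlfox"
    if local.is_dir():
        return local                        # Local Mode (per-project isolation)
    return Path.home() / ".x-crawlfox"      # Global Mode

# Demo: a fresh temp dir has no local folder, so the global path wins;
# creating .x-crawlfox flips the result to Local Mode.
tmp = Path(tempfile.mkdtemp())
assert resolve_config_dir(tmp) == Path.home() / ".x-crawlfox"
(tmp / ".x-crawlfox").mkdir()
assert resolve_config_dir(tmp) == tmp / ".x-crawlfox"
```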
🙏 Acknowledgments
This project is deeply inspired by the open-source community and integrates excellent open-source projects such as Camoufox. Sincere thanks to all the open-source libraries and developers who provide foundational support for this project.
⚠️ Disclaimer
This tool is for educational and research purposes only. Please comply with the X (Twitter) Terms of Service. The developers are not responsible for any account restrictions or legal issues resulting from the use of this tool.
File details
Details for the file x_crawlfox-0.1.2.tar.gz (source distribution):
- Download URL: x_crawlfox-0.1.2.tar.gz
- Size: 53.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.22 (publish) on Ubuntu 22.04 (jammy)
File hashes:
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7219aaf21264f831f34084a27717f4bb93f6b9082f72735cd081afe0765f277 |
| MD5 | 2ace7f71cf5e01d3753f9d9d5f098422 |
| BLAKE2b-256 | 50eebbc93fa09f0c0ce761781a3526c4944779fff03d0127a7176c584182fa33 |