red-crawler
Xiaohongshu contact lead crawler for fashion creators
CLI crawler for collecting Xiaohongshu creator contact leads from profile bios and recommendation chains, with SQLite persistence and nightly automation.
Usage
Install the published CLI:
uv tool install red-crawler==0.1.3
Install the Playwright browser runtime:
red-crawler install-browsers
For local development from a checkout:
uv sync
uv run playwright install chromium
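To confirm the development environment resolved correctly, run the CLI's help through uv (this assumes the red-crawler console entry point is exposed by the checkout, which the install steps above imply):
uv run red-crawler --help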
The crawler runs without logging in. If Xiaohongshu shows a login popup, it tries to dismiss the popup and continue.
Collect creators from the Xiaohongshu fashion homefeed:
red-crawler crawl-homefeed \
--max-accounts 20 \
--db-path "./data/red_crawler.db" \
--output-dir "./output"
The default homefeed URL is https://www.xiaohongshu.com/explore?channel_id=homefeed.fashion_v3. The crawler reads each card's author link and opens the user profile, not the note page.
Run a manual crawl from a known user profile:
red-crawler crawl-seed \
--seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
--max-accounts 20 \
--max-depth 2 \
--gender-filter "女" \
--db-path "./data/red_crawler.db" \
--output-dir "./output"
crawl-seed defaults to safe mode, which slows request pacing and adds dwell/scroll delays so the traffic looks more like a normal browsing session. Use --no-safe-mode only if you explicitly want a faster run.
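If you do want that faster run, the flag composes with the usual options; the values below are illustrative:
red-crawler crawl-seed \
--seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
--max-accounts 10 \
--no-safe-mode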
Use --gender-filter "男" or --gender-filter "女" to keep only inferred male or female accounts in the exported and persisted crawl results.
Use Bright Data Browser API instead of launching local Chromium:
export BRIGHT_DATA_BROWSER_API_AUTH="SBR_ZONE_FULL_USERNAME:SBR_ZONE_PASSWORD"
red-crawler crawl-search \
--search-term "抗痘博主" \
--browser-mode bright-data \
--output-dir "./output"
You can also pass the full CDP endpoint directly:
red-crawler crawl-seed \
--seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
--browser-mode bright-data \
--browser-endpoint "wss://SBR_ZONE_FULL_USERNAME:SBR_ZONE_PASSWORD@brd.superproxy.io:9222"
crawl-seed now does both:
- exports accounts.csv, contact_leads.csv, and run_report.json
- upserts the same results into SQLite
Browser IP rotation
Bright Data Browser API mode rotates by opening a fresh browser session on retry. If your Bright Data username, password, or endpoint contains {session}, red-crawler replaces it with a random session id for each browser session:
red-crawler crawl-homefeed \
--browser-mode bright-data \
--browser-auth "brd-customer-xxx-zone-xxx-session-{session}:PASSWORD" \
--rotation-mode session \
--rotation-retries 2
In local browser mode, red-crawler cannot rotate the machine's real IP by itself. Provide a single proxy with --proxy or a newline-delimited proxy pool with --proxy-list; session rotation then launches a new Chromium session with the next proxy after a 403 or 429 response.
red-crawler crawl-homefeed \
--proxy-list "./proxies.txt" \
--rotation-mode session \
--rotation-retries 3 \
--output-dir "./output"
Proxy entries can be host:port, http://user:pass@host:port, or socks5://host:port. By default, each proxy maps deterministically to one User-Agent, Accept-Language, and Sec-CH-UA header set, so the same outbound IP does not appear with a different browser fingerprint on a later retry. Direct local mode uses one stable local fingerprint. Use --no-randomize-headers only for debugging.
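For reference, a minimal proxies.txt mixing the accepted entry formats might look like this (hosts, ports, and credentials are placeholders):
203.0.113.10:8080
http://user:pass@198.51.100.7:3128
socks5://203.0.113.99:1080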
Run a manual crawl for one explicit search term without a seed_url:
red-crawler crawl-search \
--search-term "抗痘博主" \
--max-accounts 20 \
--search-scroll-rounds 8 \
--creator-only \
--min-followers 5000 \
--min-relevance-score 0.7 \
--db-path "./data/red_crawler.db" \
--output-dir "./output"
To cover as many creators as possible for a given search term, raise --search-scroll-rounds and --max-accounts, and narrow the results with --creator-only, --min-followers, and --min-relevance-score. "Full" coverage here is best-effort only: the platform's search results are not a stable, exhaustive interface, and scrolling too deep noticeably raises the risk of triggering anti-bot controls.
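For example, a wider sweep of one term could look like this; the larger budget values are illustrative, not tuned recommendations:
red-crawler crawl-search \
--search-term "抗痘博主" \
--max-accounts 100 \
--search-scroll-rounds 20 \
--creator-only \
--min-followers 5000 \
--min-relevance-score 0.7 \
--db-path "./data/red_crawler.db" \
--output-dir "./output"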
Optional note-page expansion:
red-crawler crawl-seed \
--seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
--include-note-recommendations
List high-quality contactable creators from the SQLite database:
red-crawler list-contactable \
--db-path "./data/red_crawler.db" \
--min-relevance-score 0.7 \
--limit 20
Run nightly auto-collection with queue, search bootstrap, seed promotion, and daily report output:
red-crawler collect-nightly \
--db-path "./data/red_crawler.db" \
--report-dir "./reports" \
--cache-dir "./.cache/red-crawler" \
--crawl-budget 12 \
--daily-account-budget 12 \
--daily-search-term-budget 2
collect-nightly now enforces a daily budget across all runs started on the same UTC day. If you schedule multiple slots, later runs automatically shrink or skip once the daily profile/search-term budget is used up.
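As a sketch, two nightly slots on a Linux host could be scheduled with the crontab entries below; the times and paths are placeholders, and the second run shrinks or skips once the shared daily budget is spent. On macOS, prefer the launchd template described later.
# crontab -e
30 1 * * * cd /path/to/workdir && red-crawler collect-nightly --db-path ./data/red_crawler.db --report-dir ./reports --cache-dir ./.cache/red-crawler
30 4 * * * cd /path/to/workdir && red-crawler collect-nightly --db-path ./data/red_crawler.db --report-dir ./reports --cache-dir ./.cache/red-crawler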
Run the same discovery flow manually without a seed_url:
red-crawler crawl-discover \
--db-path "./data/red_crawler.db" \
--report-dir "./reports" \
--cache-dir "./.cache/red-crawler" \
--crawl-budget 6
Export weekly growth report and a contactable creator CSV:
red-crawler report-weekly \
--db-path "./data/red_crawler.db" \
--report-dir "./reports" \
--days 7
Key outputs:
- manual crawl: accounts.csv, contact_leads.csv, run_report.json
- nightly automation: reports/daily-run-report.json, reports/weekly-growth-report.json, reports/contactable_creators.csv
- SQLite database: data/red_crawler.db
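Because the database is a plain SQLite file, the standard sqlite3 shell can inspect it; the table layout is defined by red-crawler's own schema, so .tables simply lists whatever is present:
sqlite3 ./data/red_crawler.db ".tables"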
OpenClaw
The OpenClaw skill for this project lives at openclaw-skills/red-crawler-ops/.
To install it from a local path, point OpenClaw at that folder, or copy the skill directory into your OpenClaw skills location and register the same path.
Use the OpenClaw skill actions in this order:
- bootstrap validates a local working directory and can run Chromium installation when explicitly requested.
- crawl_homefeed collects from the default fashion homefeed without requiring login.
- login explicitly creates an optional Playwright storage state.
- crawl_seed and collect_nightly can run without --storage-state; pass one only when you want to reuse an authenticated session.
- report_weekly and list_contactable run from the SQLite database and do not require --storage-state.
For long crawls, pass run_mode: background. The skill returns a job_id immediately, writes job state under ./.openclaw/red-crawler, and maintains HEARTBEAT.md for OpenClaw heartbeat polling. Use job_status, job_logs, or job_stop with the returned job_id for manual follow-up. After OpenClaw reports a pending heartbeat event to the user, call ack_event with its event_id to avoid duplicate notifications.
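To peek at job state outside OpenClaw, look under the directory mentioned above; the exact file layout inside it is an implementation detail, and the assumption here is that HEARTBEAT.md sits at its top level:
ls ./.openclaw/red-crawler
cat ./.openclaw/red-crawler/HEARTBEAT.md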
The skill does not clone repositories or create login sessions implicitly. Install the red-crawler CLI package first, point workspace_path at a local working directory, and run bootstrap only for reviewed local setup steps.
Publishing
The package builds as a standard Python wheel and source distribution:
uv build
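uv build writes both artifacts into dist/; assuming the current version is 0.1.3, a quick sanity check of the output looks like:
ls dist/
# red_crawler-0.1.3-py3-none-any.whl  red_crawler-0.1.3.tar.gz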
See docs/publishing.md for the release checklist and PyPI/TestPyPI commands.
launchd
For macOS local scheduling, use the template at docs/launchd/red-crawler.collect-nightly.plist.
Replace the placeholder paths:
__WORKDIR__, __UV_BIN__, __STORAGE_STATE__, __DB_PATH__, __REPORT_DIR__, __CACHE_DIR__, __LOG_DIR__
Then load it with:
launchctl unload ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist 2>/dev/null || true
cp docs/launchd/red-crawler.collect-nightly.plist ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist
launchctl load ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist
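To confirm the agent registered, list it; a matching row in the output means the job loaded (this uses only stock launchctl behavior):
launchctl list | grep com.red-crawler.collect-nightly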