red-crawler
Xiaohongshu contact lead crawler for fashion creators
CLI crawler for collecting Xiaohongshu creator contact leads from profile bios and recommendation chains, with SQLite persistence and nightly automation.
Usage
Install the published CLI:
uv tool install red-crawler==0.1.3
Install the Playwright browser runtime:
red-crawler install-browsers
For local development from a checkout:
uv sync
uv run playwright install chromium
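To confirm the development environment resolved correctly, run the CLI's help through uv (this assumes the red-crawler console entry point is exposed by the checkout, which the install steps above imply):
uv run red-crawler --help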
The crawler runs without logging in. If Xiaohongshu shows a login popup, it tries to dismiss the popup and continue.
Collect creators from the Xiaohongshu fashion homefeed:
red-crawler crawl-homefeed \
--max-accounts 20 \
--db-path "./data/red_crawler.db" \
--output-dir "./output"
The default homefeed URL is https://www.xiaohongshu.com/explore?channel_id=homefeed.fashion_v3. The crawler reads each card's author link and opens the user profile, not the note page.
Run a manual crawl from a known user profile:
red-crawler crawl-seed \
--seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
--max-accounts 20 \
--max-depth 2 \
--gender-filter "女" \
--db-path "./data/red_crawler.db" \
--output-dir "./output"
crawl-seed defaults to safe mode, which slows request pacing and adds dwell/scroll delays so the traffic looks more like a normal browsing session. Use --no-safe-mode only if you explicitly want a faster run.
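If you do want that faster run, the flag composes with the usual options; the values below are illustrative:
red-crawler crawl-seed \
--seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
--max-accounts 10 \
--no-safe-mode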
Use --gender-filter "男" or --gender-filter "女" to keep only inferred male or female accounts in the exported and persisted crawl results.
Use Bright Data Browser API instead of launching local Chromium:
export BRIGHT_DATA_BROWSER_API_AUTH="SBR_ZONE_FULL_USERNAME:SBR_ZONE_PASSWORD"
red-crawler crawl-search \
--search-term "抗痘博主" \
--browser-mode bright-data \
--output-dir "./output"
You can also pass the full CDP endpoint directly:
red-crawler crawl-seed \
--seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
--browser-mode bright-data \
--browser-endpoint "wss://SBR_ZONE_FULL_USERNAME:SBR_ZONE_PASSWORD@brd.superproxy.io:9222"
crawl-seed now does both:
- exports accounts.csv, contact_leads.csv, and run_report.json
- upserts the same results into SQLite
Browser IP rotation
Bright Data Browser API mode rotates by opening a fresh browser session on retry. If your Bright Data username, password, or endpoint contains {session}, red-crawler replaces it with a random session id for each browser session:
red-crawler crawl-homefeed \
--browser-mode bright-data \
--browser-auth "brd-customer-xxx-zone-xxx-session-{session}:PASSWORD" \
--rotation-mode session \
--rotation-retries 2
In local browser mode, red-crawler cannot rotate the machine's real IP by itself. Provide a single proxy with --proxy or a newline-delimited proxy pool with --proxy-list; session rotation then launches a new Chromium session with the next proxy after a 403 or 429 response.
red-crawler crawl-homefeed \
--proxy-list "./proxies.txt" \
--rotation-mode session \
--rotation-retries 3 \
--output-dir "./output"
Proxy entries can be host:port, http://user:pass@host:port, or socks5://host:port. By default, each proxy maps deterministically to one User-Agent, Accept-Language, and Sec-CH-UA header set, so the same outbound IP does not appear with a different browser fingerprint on a later retry. Direct local mode uses one stable local fingerprint. Use --no-randomize-headers only for debugging.
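For reference, a minimal proxies.txt mixing the accepted entry formats might look like this (hosts, ports, and credentials are placeholders):
203.0.113.10:8080
http://user:pass@198.51.100.7:3128
socks5://203.0.113.99:1080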
Run a manual crawl for one explicit search term without a seed_url:
red-crawler crawl-search \
--search-term "抗痘博主" \
--max-accounts 20 \
--search-scroll-rounds 8 \
--creator-only \
--min-followers 5000 \
--min-relevance-score 0.7 \
--db-path "./data/red_crawler.db" \
--output-dir "./output"
To cover as many creators as possible for a given search term, raise --search-scroll-rounds and --max-accounts, and narrow the results with --creator-only, --min-followers, and --min-relevance-score. "Full" coverage here is best-effort only: the platform's search results are not a stable, exhaustive interface, and scrolling too deep noticeably raises the risk of triggering anti-bot controls.
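For example, a wider sweep of one term could look like this; the larger budget values are illustrative, not tuned recommendations:
red-crawler crawl-search \
--search-term "抗痘博主" \
--max-accounts 100 \
--search-scroll-rounds 20 \
--creator-only \
--min-followers 5000 \
--min-relevance-score 0.7 \
--db-path "./data/red_crawler.db" \
--output-dir "./output"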
Optional note-page expansion:
red-crawler crawl-seed \
--seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
--include-note-recommendations
List high-quality contactable creators from the SQLite database:
red-crawler list-contactable \
--db-path "./data/red_crawler.db" \
--min-relevance-score 0.7 \
--limit 20
Run nightly auto-collection with queue, search bootstrap, seed promotion, and daily report output:
red-crawler collect-nightly \
--db-path "./data/red_crawler.db" \
--report-dir "./reports" \
--cache-dir "./.cache/red-crawler" \
--crawl-budget 12 \
--daily-account-budget 12 \
--daily-search-term-budget 2
collect-nightly now enforces a daily budget across all runs started on the same UTC day. If you schedule multiple slots, later runs automatically shrink or skip once the daily profile/search-term budget is used up.
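As a sketch, two nightly slots on a Linux host could be scheduled with the crontab entries below; the times and paths are placeholders, and the second run shrinks or skips once the shared daily budget is spent. On macOS, prefer the launchd template described later.
# crontab -e
30 1 * * * cd /path/to/workdir && red-crawler collect-nightly --db-path ./data/red_crawler.db --report-dir ./reports --cache-dir ./.cache/red-crawler
30 4 * * * cd /path/to/workdir && red-crawler collect-nightly --db-path ./data/red_crawler.db --report-dir ./reports --cache-dir ./.cache/red-crawler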
Run the same discovery flow manually without a seed_url:
red-crawler crawl-discover \
--db-path "./data/red_crawler.db" \
--report-dir "./reports" \
--cache-dir "./.cache/red-crawler" \
--crawl-budget 6
Export weekly growth report and a contactable creator CSV:
red-crawler report-weekly \
--db-path "./data/red_crawler.db" \
--report-dir "./reports" \
--days 7
Key outputs:
- manual crawl: accounts.csv, contact_leads.csv, run_report.json
- nightly automation: reports/daily-run-report.json, reports/weekly-growth-report.json, reports/contactable_creators.csv
- SQLite database: data/red_crawler.db
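Because the database is a plain SQLite file, the standard sqlite3 shell can inspect it; the table layout is defined by red-crawler's own schema, so .tables simply lists whatever is present:
sqlite3 ./data/red_crawler.db ".tables"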
OpenClaw
The OpenClaw skill for this project lives at openclaw-skills/red-crawler-ops/.
To install it from a local path, point OpenClaw at that folder, or copy the skill directory into your OpenClaw skills location and register the same path.
Use the OpenClaw skill actions in this order:
- bootstrap validates a local working directory and can run Chromium installation when explicitly requested.
- crawl_homefeed collects from the default fashion homefeed without requiring login.
- login explicitly creates an optional Playwright storage state.
- crawl_seed and collect_nightly can run without --storage-state; pass one only when you want to reuse an authenticated session.
- report_weekly and list_contactable run from the SQLite database and do not require --storage-state.
For long crawls, pass run_mode: background. The skill returns a job_id immediately, writes job state under ./.openclaw/red-crawler, and maintains HEARTBEAT.md for OpenClaw heartbeat polling. Use job_status, job_logs, or job_stop with the returned job_id for manual follow-up. After OpenClaw reports a pending heartbeat event to the user, call ack_event with its event_id to avoid duplicate notifications.
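To peek at job state outside OpenClaw, look under the directory mentioned above; the exact file layout inside it is an implementation detail, and the assumption here is that HEARTBEAT.md sits at its top level:
ls ./.openclaw/red-crawler
cat ./.openclaw/red-crawler/HEARTBEAT.md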
The skill does not clone repositories or create login sessions implicitly. Install the red-crawler CLI package first, point workspace_path at a local working directory, and run bootstrap only for reviewed local setup steps.
Publishing
The package builds as a standard Python wheel and source distribution:
uv build
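uv build writes both artifacts into dist/; assuming the current version is 0.1.3, a quick sanity check of the output looks like:
ls dist/
# red_crawler-0.1.3-py3-none-any.whl  red_crawler-0.1.3.tar.gz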
See docs/publishing.md for the release checklist and PyPI/TestPyPI commands.
launchd
For macOS local scheduling, use the template at docs/launchd/red-crawler.collect-nightly.plist.
Replace the placeholder paths:
__WORKDIR__, __UV_BIN__, __STORAGE_STATE__, __DB_PATH__, __REPORT_DIR__, __CACHE_DIR__, __LOG_DIR__
Then load it with:
launchctl unload ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist 2>/dev/null || true
cp docs/launchd/red-crawler.collect-nightly.plist ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist
launchctl load ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist
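To confirm the agent registered, list it; a matching row in the output means the job loaded (this uses only stock launchctl behavior):
launchctl list | grep com.red-crawler.collect-nightly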