Xiaohongshu contact lead crawler for fashion creators

Project description

red-crawler

CLI crawler for collecting Xiaohongshu beauty creator contact leads from profile bios and recommendation chains, with SQLite persistence and nightly automation.

Usage

Install the published CLI:

uv tool install red-crawler==0.1.2

Install the Playwright browser runtime:

red-crawler install-browsers

For local development from a checkout:

uv sync
uv run playwright install chromium

Save a reusable login session first:

red-crawler login --save-state "./state.json"

This opens a visible browser window. Log in to Xiaohongshu there, then return to the terminal and press Enter to save the session file.

Run a manual crawl with an existing Playwright storage state file:

red-crawler crawl-seed \
  --seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
  --storage-state "./state.json" \
  --max-accounts 20 \
  --max-depth 2 \
  --db-path "./data/red_crawler.db" \
  --output-dir "./output"

crawl-seed defaults to safe mode, which slows request pacing and adds dwell/scroll delays so the crawl looks more like a normal browsing session. Use --no-safe-mode only if you explicitly want a faster run.
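
For example, a faster (and less stealthy) run is the same invocation as above with --no-safe-mode appended:

red-crawler crawl-seed \
  --seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
  --storage-state "./state.json" \
  --max-accounts 20 \
  --max-depth 2 \
  --db-path "./data/red_crawler.db" \
  --output-dir "./output" \
  --no-safe-mode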

A single crawl-seed run produces both kinds of output (a quick inspection example follows this list):

  • exports accounts.csv, contact_leads.csv, and run_report.json
  • upserts the same results into SQLite
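
To verify the upserts, you can open the database in the sqlite3 shell; the dot-commands below simply list whatever tables and schema the CLI actually created, so they assume nothing about table names:

sqlite3 ./data/red_crawler.db ".tables"
sqlite3 ./data/red_crawler.db ".schema"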

Optional note-page expansion:

red-crawler crawl-seed \
  --seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
  --storage-state "./state.json" \
  --include-note-recommendations

List high-quality contactable creators from the SQLite database:

red-crawler list-contactable \
  --db-path "./data/red_crawler.db" \
  --min-relevance-score 0.7 \
  --limit 20

Run nightly auto-collection with queue, search bootstrap, seed promotion, and daily report output:

red-crawler collect-nightly \
  --storage-state "./state.json" \
  --db-path "./data/red_crawler.db" \
  --report-dir "./reports" \
  --cache-dir "./.cache/red-crawler" \
  --crawl-budget 30

Export weekly growth report and a contactable creator CSV:

red-crawler report-weekly \
  --db-path "./data/red_crawler.db" \
  --report-dir "./reports" \
  --days 7

Key outputs:

  • manual crawl:
    • accounts.csv
    • contact_leads.csv
    • run_report.json
  • nightly automation:
    • reports/daily-run-report.json
    • reports/weekly-growth-report.json
    • reports/contactable_creators.csv
  • SQLite database:
    • data/red_crawler.db
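
The JSON reports can be pretty-printed for a quick look without assuming anything about their field layout, for example with Python's built-in json.tool:

python -m json.tool ./output/run_report.json
python -m json.tool ./reports/daily-run-report.json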

OpenClaw

The OpenClaw skill for this project lives at openclaw-skills/red-crawler-ops/.

To install it from a local path, point OpenClaw at that folder, or copy the skill directory into your OpenClaw skills location and register the same path.
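
A minimal sketch of the copy-and-register route, assuming ~/.openclaw/skills/ is where your OpenClaw installation looks for skills (adjust the destination to your actual skills location):

mkdir -p ~/.openclaw/skills
cp -R openclaw-skills/red-crawler-ops ~/.openclaw/skills/red-crawler-ops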

Use the OpenClaw skill actions in this order:

  • bootstrap validates a local working directory and can run Chromium installation when explicitly requested.
  • login creates the Playwright storage state explicitly.
  • crawl_seed and collect_nightly require an authenticated Playwright storage state file.
  • report_weekly and list_contactable run from the SQLite database and do not require --storage-state.

The skill does not clone repositories or create login sessions implicitly. Install the red-crawler CLI package first, point workspace_path at a local working directory, run bootstrap only for reviewed local setup steps, then run login when you are ready to create state.json.

Publishing

The package builds as a standard Python wheel and source distribution:

uv build

See docs/publishing.md for the release checklist and PyPI/TestPyPI commands.
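
The release metadata further down this page shows the artifacts were uploaded with uv's publish subcommand; a minimal sketch of that flow (defer to docs/publishing.md for the actual checklist and TestPyPI handling):

uv build
uv publish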

launchd

For macOS local scheduling, use the template at docs/launchd/red-crawler.collect-nightly.plist.

Replace the placeholder paths (a sed sketch follows this list):

  • __WORKDIR__
  • __UV_BIN__
  • __STORAGE_STATE__
  • __DB_PATH__
  • __REPORT_DIR__
  • __CACHE_DIR__
  • __LOG_DIR__
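
One way to fill them in is an in-place sed pass over the template (BSD sed syntax, since this targets macOS; every path below is illustrative, so substitute your own):

sed -i '' \
  -e "s|__WORKDIR__|$HOME/red-crawler|g" \
  -e "s|__UV_BIN__|$HOME/.local/bin/uv|g" \
  -e "s|__STORAGE_STATE__|$HOME/red-crawler/state.json|g" \
  -e "s|__DB_PATH__|$HOME/red-crawler/data/red_crawler.db|g" \
  -e "s|__REPORT_DIR__|$HOME/red-crawler/reports|g" \
  -e "s|__CACHE_DIR__|$HOME/red-crawler/.cache/red-crawler|g" \
  -e "s|__LOG_DIR__|$HOME/red-crawler/logs|g" \
  docs/launchd/red-crawler.collect-nightly.plist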

Then load it with:

launchctl unload ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist 2>/dev/null || true
cp docs/launchd/red-crawler.collect-nightly.plist ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist
launchctl load ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist
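
To confirm the agent registered, look for its label in the current launchd session (the label is assumed here to match the plist filename):

launchctl list | grep com.red-crawler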

Download files

Download the file for your platform.

Source Distribution

red_crawler-0.1.2.tar.gz (40.0 kB)

Built Distribution

red_crawler-0.1.2-py3-none-any.whl (32.9 kB)

File details

Details for the file red_crawler-0.1.2.tar.gz.

File metadata

  • Download URL: red_crawler-0.1.2.tar.gz
  • Upload date:
  • Size: 40.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for red_crawler-0.1.2.tar.gz:

  • SHA256: 14e785f21d3e1f55a86c88ed41bf5dfed52d0dc2f939d8aaeed515d2b03305a9
  • MD5: 2065b8041445b99b7524d6c81986b3e1
  • BLAKE2b-256: 82c2ea96510c7e541a98bf3d8def0d8b06777c635debcab89e071e8dc2b0e2d0


File details

Details for the file red_crawler-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: red_crawler-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 32.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for red_crawler-0.1.2-py3-none-any.whl:

  • SHA256: d5a210ea370a2145c849eb20edfafedeeabb1fc62e6762c5da335341a90d491f
  • MD5: f174f8533ea3446ac142e875a52dd8ea
  • BLAKE2b-256: 3d4d92c6166fe8c46f063bc261500ce57ced8f7c749bed3701f5785d76bd38a7

