Xiaohongshu contact lead crawler for beauty creators

red-crawler

A CLI crawler that collects Xiaohongshu beauty-creator contact leads from profile bios and recommendation chains, with SQLite persistence and nightly automation.

Usage

Install the published CLI:

uv tool install red-crawler==0.1.1

Install the Playwright browser runtime:

red-crawler install-browsers

For local development from a checkout:

uv sync
uv run playwright install chromium

Save a reusable login session first:

red-crawler login --save-state "./state.json"

This opens a visible browser window. Log in to Xiaohongshu there, then return to the terminal and press Enter to save the session file.
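Before crawling, it can help to sanity-check the saved session file. A minimal sketch; the check_state helper name is ours, and it only verifies that the file is non-empty, parseable JSON (the actual storage-state contents are not validated):

```shell
# check_state: sanity-check a Playwright storage-state file before crawling.
# Helper name is illustrative, not part of the red-crawler CLI.
check_state() {
  local f="$1"
  # File must exist, be non-empty, and parse as JSON.
  [ -s "$f" ] && python3 -c "import json, sys; json.load(open(sys.argv[1]))" "$f"
}

check_state ./state.json && echo "storage state looks usable" \
  || echo "re-run: red-crawler login --save-state ./state.json" >&2
```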

Run a manual crawl with an existing Playwright storage state file:

red-crawler crawl-seed \
  --seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
  --storage-state "./state.json" \
  --max-accounts 20 \
  --max-depth 2 \
  --db-path "./data/red_crawler.db" \
  --output-dir "./output"

crawl-seed defaults to safe mode, which slows request pacing and adds dwell/scroll delays so traffic looks more like a normal browsing session. Pass --no-safe-mode only if you explicitly want a faster run.

Each crawl-seed run does both of the following:

  • exports accounts.csv, contact_leads.csv, run_report.json
  • upserts the same result into SQLite
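After a run, run_report.json can be inspected quickly from the shell. The key names below (accounts_crawled, leads_found) are guesses at the report schema, not documented fields, so the sketch falls back to "?" when a key is absent:

```shell
# summarize_run: print a couple of summary fields from run_report.json.
# Helper name and key names are assumptions about the report schema.
summarize_run() {
  python3 - "$1" <<'PY'
import json, sys

report = json.load(open(sys.argv[1]))
# Fall back gracefully if the assumed keys are absent.
print("accounts:", report.get("accounts_crawled", "?"))
print("leads:", report.get("leads_found", "?"))
PY
}
```

Usage: summarize_run ./output/run_report.json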

Optional note-page expansion:

red-crawler crawl-seed \
  --seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
  --storage-state "./state.json" \
  --include-note-recommendations

List high-quality contactable creators from the SQLite database:

red-crawler list-contactable \
  --db-path "./data/red_crawler.db" \
  --min-relevance-score 0.7 \
  --limit 20
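If you prefer raw SQL, roughly the same filter can be run directly against the database. The table and column names below (contact_leads, user_id, relevance_score) are assumptions about the schema, not documented names; check the real layout with sqlite3's .schema first:

```shell
# top_leads: approximate list-contactable with a raw query.
# Helper name and the contact_leads(user_id, relevance_score) schema
# are assumptions for illustration.
top_leads() {  # usage: top_leads DB_PATH [MIN_SCORE]
  python3 - "$1" "${2:-0.7}" <<'PY'
import sqlite3, sys

con = sqlite3.connect(sys.argv[1])
rows = con.execute(
    "SELECT user_id, relevance_score FROM contact_leads "
    "WHERE relevance_score >= ? ORDER BY relevance_score DESC LIMIT 20",
    (float(sys.argv[2]),),
).fetchall()
for user_id, score in rows:
    print(f"{score:.2f}\t{user_id}")
PY
}
```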

Run nightly auto-collection with queue, search bootstrap, seed promotion, and daily report output:

red-crawler collect-nightly \
  --storage-state "./state.json" \
  --db-path "./data/red_crawler.db" \
  --report-dir "./reports" \
  --cache-dir "./.cache/red-crawler" \
  --crawl-budget 30

Export weekly growth report and a contactable creator CSV:

red-crawler report-weekly \
  --db-path "./data/red_crawler.db" \
  --report-dir "./reports" \
  --days 7

Key outputs:

  • manual crawl:
    • accounts.csv
    • contact_leads.csv
    • run_report.json
  • nightly automation:
    • reports/daily-run-report.json
    • reports/weekly-growth-report.json
    • reports/contactable_creators.csv
  • SQLite database:
    • data/red_crawler.db

OpenClaw

The OpenClaw skill for this project lives at openclaw-skills/red-crawler-ops/.

To install it from a local path, point OpenClaw at that folder, or copy the skill directory into your OpenClaw skills location and register the same path.

Use the OpenClaw skill actions in this order:

  • bootstrap validates a local working directory and can run Chromium installation when explicitly requested.
  • login creates the Playwright storage state explicitly.
  • crawl_seed and collect_nightly require an authenticated Playwright storage state file.
  • report_weekly and list_contactable run from the SQLite database and do not require --storage-state.

The skill does not clone repositories or create login sessions implicitly. Install the red-crawler CLI package first, point workspace_path at a local working directory, run bootstrap only for reviewed local setup steps, then run login when you are ready to create state.json.

Publishing

The package builds as a standard Python wheel and source distribution:

uv build

See docs/publishing.md for the release checklist and PyPI/TestPyPI commands.

launchd

For macOS local scheduling, use the template at docs/launchd/red-crawler.collect-nightly.plist.

Replace the placeholder paths:

  • __WORKDIR__
  • __UV_BIN__
  • __STORAGE_STATE__
  • __DB_PATH__
  • __REPORT_DIR__
  • __CACHE_DIR__
  • __LOG_DIR__
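The substitutions can be scripted with sed. A sketch; the fill_plist helper name and the concrete paths derived from WORKDIR are illustrative assumptions, so adjust them to your layout:

```shell
# fill_plist: substitute the launchd template placeholders.
# Helper name and the derived paths are assumptions for illustration.
fill_plist() {  # usage: fill_plist TEMPLATE OUTPUT WORKDIR
  local tpl="$1" out="$2" wd="$3"
  sed -e "s|__WORKDIR__|$wd|g" \
      -e "s|__UV_BIN__|$HOME/.local/bin/uv|g" \
      -e "s|__STORAGE_STATE__|$wd/state.json|g" \
      -e "s|__DB_PATH__|$wd/data/red_crawler.db|g" \
      -e "s|__REPORT_DIR__|$wd/reports|g" \
      -e "s|__CACHE_DIR__|$wd/.cache/red-crawler|g" \
      -e "s|__LOG_DIR__|$wd/logs|g" \
      "$tpl" > "$out"
}
```

Usage: fill_plist docs/launchd/red-crawler.collect-nightly.plist ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist "$HOME/red-crawler"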

Then load it with:

launchctl unload ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist 2>/dev/null || true
cp docs/launchd/red-crawler.collect-nightly.plist ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist
launchctl load ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist
