Xiaohongshu contact lead crawler for beauty creators

Project description

red-crawler

CLI crawler for collecting Xiaohongshu beauty creator contact leads from profile bios and recommendation chains, with SQLite persistence and nightly automation.

Usage

Install the published CLI:

uv tool install red-crawler==0.1.0

Install the Playwright browser runtime:

red-crawler install-browsers

For local development from a checkout:

uv sync
uv run playwright install chromium

Save a reusable login session first:

red-crawler login --save-state "./state.json"

This opens a visible browser window. Log in to Xiaohongshu there, then return to the terminal and press Enter to save the session file.

Run a manual crawl with an existing Playwright storage state file:

red-crawler crawl-seed \
  --seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
  --storage-state "./state.json" \
  --max-accounts 20 \
  --max-depth 2 \
  --db-path "./data/red_crawler.db" \
  --output-dir "./output"

crawl-seed defaults to safe mode, which slows request pacing and adds dwell/scroll delays so the crawl looks more like a normal browsing session. Pass --no-safe-mode only if you explicitly want a faster run.
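
For example, a deliberately faster run over a small account budget (the flags are the same ones shown above, minus safe mode):

red-crawler crawl-seed \
  --seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
  --storage-state "./state.json" \
  --max-accounts 5 \
  --no-safe-mode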

Each crawl-seed run does both:

  • exports accounts.csv, contact_leads.csv, run_report.json
  • upserts the same result into SQLite
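
To spot-check what a run wrote, the stock sqlite3 shell works; the table layout is whatever the CLI created, so list the tables first rather than assuming a schema:

sqlite3 ./data/red_crawler.db ".tables"
sqlite3 ./data/red_crawler.db ".schema"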

Optional note-page expansion:

red-crawler crawl-seed \
  --seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
  --storage-state "./state.json" \
  --include-note-recommendations

List high-quality contactable creators from the SQLite database:

red-crawler list-contactable \
  --db-path "./data/red_crawler.db" \
  --min-relevance-score 0.7 \
  --limit 20

Run nightly auto-collection with queue, search bootstrap, seed promotion, and daily report output:

red-crawler collect-nightly \
  --storage-state "./state.json" \
  --db-path "./data/red_crawler.db" \
  --report-dir "./reports" \
  --cache-dir "./.cache/red-crawler" \
  --crawl-budget 30

Export a weekly growth report and a contactable-creator CSV:

red-crawler report-weekly \
  --db-path "./data/red_crawler.db" \
  --report-dir "./reports" \
  --days 7

Key outputs:

  • manual crawl:
    • accounts.csv
    • contact_leads.csv
    • run_report.json
  • nightly automation:
    • reports/daily-run-report.json
    • reports/weekly-growth-report.json
    • reports/contactable_creators.csv
  • SQLite database:
    • data/red_crawler.db
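
The run report's schema isn't documented here, so a quick way to see what a given run recorded is to print the top-level keys of run_report.json (this assumes a JSON object at the top level and the --output-dir used above):

python -c "import json; print(sorted(json.load(open('./output/run_report.json'))))"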

OpenClaw

The OpenClaw skill for this project lives at openclaw-skills/red-crawler-ops/.

To install it from a local path, point OpenClaw at that folder, or copy the skill directory into your OpenClaw skills location and register the same path.

Use the OpenClaw skill actions in this order:

  • bootstrap validates a local working directory and can install the Chromium browser runtime when explicitly requested.
  • login creates the Playwright storage state explicitly.
  • crawl_seed and collect_nightly require an authenticated Playwright storage state file.
  • report_weekly and list_contactable run from the SQLite database and do not require --storage-state.

The skill does not clone repositories or create login sessions implicitly. Install the red-crawler CLI package first, point workspace_path at a local working directory, run bootstrap only for setup steps you have reviewed, then run login when you are ready to create state.json.

Publishing

The package builds as a standard Python wheel and source distribution:

uv build

See docs/publishing.md for the release checklist and PyPI/TestPyPI commands.
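
As a rough sketch of the upload step (docs/publishing.md remains the authoritative checklist; the TestPyPI-first order is an assumption about your release flow):

# assumed flow: upload to TestPyPI first, see docs/publishing.md
uv publish --publish-url https://test.pypi.org/legacy/
# then publish to PyPI proper
uv publish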

launchd

For macOS local scheduling, use the template at docs/launchd/red-crawler.collect-nightly.plist.

Replace the placeholder paths (a sed sketch for filling them in follows the list):

  • __WORKDIR__
  • __UV_BIN__
  • __STORAGE_STATE__
  • __DB_PATH__
  • __REPORT_DIR__
  • __CACHE_DIR__
  • __LOG_DIR__
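
One way to fill the placeholders is to edit the checked-out template in place with BSD sed (macOS syntax; every substituted path below is an example, not a requirement):

sed -i '' \
  -e "s|__WORKDIR__|$HOME/red-crawler|g" \
  -e "s|__UV_BIN__|$HOME/.local/bin/uv|g" \
  -e "s|__STORAGE_STATE__|$HOME/red-crawler/state.json|g" \
  -e "s|__DB_PATH__|$HOME/red-crawler/data/red_crawler.db|g" \
  -e "s|__REPORT_DIR__|$HOME/red-crawler/reports|g" \
  -e "s|__CACHE_DIR__|$HOME/.cache/red-crawler|g" \
  -e "s|__LOG_DIR__|$HOME/red-crawler/logs|g" \
  docs/launchd/red-crawler.collect-nightly.plist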

Then load it with:

launchctl unload ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist 2>/dev/null || true
cp docs/launchd/red-crawler.collect-nightly.plist ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist
launchctl load ~/Library/LaunchAgents/com.red-crawler.collect-nightly.plist
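
To confirm the agent registered, look for its label in launchctl's job list:

launchctl list | grep com.red-crawler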

Project details


Download files

Source Distribution

red_crawler-0.1.0.tar.gz (39.9 kB)

Built Distribution

red_crawler-0.1.0-py3-none-any.whl (32.9 kB)

File details

Details for the file red_crawler-0.1.0.tar.gz.

File metadata

  • Download URL: red_crawler-0.1.0.tar.gz
  • Size: 39.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 (macOS)

File hashes

Hashes for red_crawler-0.1.0.tar.gz

  • SHA256: 0246d31ee71cbf51057b6755723c27abc75a0d4ef288a8742e7ca23cdb098ec5
  • MD5: 1ed2628202249a582c073bf8af9b6355
  • BLAKE2b-256: dd96013af1fd229ded1838de7784c9eb6c8e0626d74f2e2acf4a6bf8eeade574

File details

Details for the file red_crawler-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: red_crawler-0.1.0-py3-none-any.whl
  • Size: 32.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 (macOS)

File hashes

Hashes for red_crawler-0.1.0-py3-none-any.whl

  • SHA256: e1c4117ea5b863f8d2f85f152112f9046a6405d359ab2e10b342b9719f7864a5
  • MD5: c37f19baa8a2b25123daed0cacdedc10
  • BLAKE2b-256: a0d1194f23ad4db3173749f2cda78d1c9206a0f6110f5ec048fc5f686bdfc2d4
