Skip to main content

Search Xiaohongshu and generate AI-powered research reports

Project description

xhs-research

Search Xiaohongshu (小红书) and generate AI-powered research reports from the command line.

Instead of scrolling through dozens of posts one by one, type one command and get a structured summary with recommendations, price comparisons, and user sentiment — all powered by AI.

Features

  • One command, full report — search a keyword → scrape posts → AI generates a structured Markdown report
  • Multi-model support — OpenAI, Claude, DeepSeek, or run locally with llama.cpp / Ollama (zero cost)
  • Smart summarization — chunks large result sets and merges summaries to fit any model's context window
  • Structured output — recommendations table, buying advice, red flags, sentiment breakdown
  • Login state persistence — scan QR code once, cookies saved for reuse

Quick Start

Option A: Install from GitHub (recommended)

pip install git+https://github.com/yongsinfok/xhs-research.git
playwright install firefox

Option B: Clone and install

git clone https://github.com/yongsinfok/xhs-research.git
cd xhs-research
pip install -e .
playwright install firefox

2. Configure AI model

mkdir -p ~/.xhs-research
cp config.example.yaml ~/.xhs-research/config.yaml

Edit ~/.xhs-research/config.yaml:

ai:
  api_key: sk-your-key        # not needed for local models
  base_url: null              # local models: http://localhost:11434/v1
  model: gpt-4o               # or deepseek-chat, llama3, etc.

3. Run

xhs-research search "马来西亚高性价比扫地机器人"

A browser window opens. Scan the QR code with the Xiaohongshu app to log in (only needed the first time). The tool then scrapes posts and generates a report.

Usage

# Basic search (default 20 posts)
xhs-research search "吉隆坡美食推荐"

# More posts for better coverage
xhs-research search "新加坡PR申请攻略" --limit 30

# Use a specific model
xhs-research search "MacBook Pro M4 值得买吗" --model deepseek-chat

# Save to a specific path
xhs-research search "装修避坑指南" --output ./my-report.md

# Also export raw data as JSON
xhs-research search "搬家攻略" --json

# View config file location
xhs-research config-path

Report Example

# 马来西亚扫地机器人推荐 调研报告

> 基于 20 篇小红书帖子 · 2026-05-25

## 核心发现

- 小米 X20+ 关注度最高(447赞),有实测背书
- Dreame D20/Ultra 为热门候选,性价比讨论多
- Mova E40 作为竞品出现

## 推荐清单

| 品牌/型号   | 提及次数 | 最高赞 | 定位       |
|-------------|---------|--------|------------|
| 小米 X20+   | 1+      | 447    | 高关注实测 |
| Dreame D20 Ultra | 2  | 131   | 热门候选   |
| Dreame D20  | 1       | 28     | 性价比讨论 |
| Mova E40    | 1       | 28     | 竞品对比   |

## 购买建议 / 踩坑提醒 / 观点分布
...

Supported AI Models

Provider base_url model example Cost
OpenAI (default) gpt-4o Paid
Anthropic (default) claude-sonnet-4-6 Paid
DeepSeek https://api.deepseek.com/v1 deepseek-chat Low cost
Ollama http://localhost:11434/v1 llama3, qwen2 Free
llama.cpp http://localhost:8080/v1 (local model) Free
Any OpenAI-compatible API (your endpoint) (your model) Varies

Any endpoint that exposes an OpenAI-compatible /v1/chat/completions API works out of the box.

Project Structure

xhs-research/
├── xhs_research/
│   ├── cli.py              # CLI entry point (typer)
│   ├── config.py           # YAML config loader
│   ├── models/post.py      # Post / Comment data models
│   ├── ai/
│   │   ├── client.py       # Unified AI client (OpenAI SDK)
│   │   └── summarizer.py   # Chunk + merge summarization
│   └── scraper/
│       ├── browser.py      # Playwright browser manager
│       ├── login.py        # QR code login handler
│       └── parser.py       # Search result scraper
├── config.example.yaml
├── requirements.txt
└── README.md

Limitations

  • Xiaohongshu web restrictions — post detail pages are often blocked on web, so reports are primarily based on search result titles and card summaries. Increasing --limit improves coverage.
  • Anti-scraping — uses standard Playwright Firefox. For better evasion, consider camoufox.
  • Login expiration — cookies may expire; re-scan QR code when prompted.
  • Personal use only — respect Xiaohongshu's terms of service. Do not use for commercial scraping.

Contributing

Issues and pull requests are welcome. Areas to contribute:

  • Mobile API support for full post content
  • Better anti-detection (camoufox integration)
  • Web UI or API server mode
  • Support for other platforms (Douyin, Bilibili, etc.)

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xhs_research-0.1.0.tar.gz (14.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

xhs_research-0.1.0-py3-none-any.whl (15.6 kB view details)

Uploaded Python 3

File details

Details for the file xhs_research-0.1.0.tar.gz.

File metadata

  • Download URL: xhs_research-0.1.0.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for xhs_research-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5b33e543e84ff8595ba48bfee621285b84b8e4a59382eb4eaf0ce1031d004a9e
MD5 03ec3a9e5a4851df378c1be500f66c8f
BLAKE2b-256 7c6bf8cd9b1fa43f847f1b52a75a2d681c8f047ab8471146461403b30166ad90

See more details on using hashes here.

File details

Details for the file xhs_research-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: xhs_research-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 15.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for xhs_research-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3230af213cd91e12cf3f7830f1e6bf55bd27ef4989fedd67f2acc234cc5dd02d
MD5 2810f62e2e46e31d958d5f9fd8a5ac2f
BLAKE2b-256 77fc1752d338783c4e4f616c3a6eb19a7c0987bbbf6de9c12ccd69a6cdc70211

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page