Skip to main content

Search Xiaohongshu and generate AI-powered research reports

Project description

xhs-research

Search Xiaohongshu (小红书) and generate AI-powered research reports from the command line.

Instead of scrolling through dozens of posts one by one, type one command and get a structured summary with recommendations, price comparisons, and user sentiment — all powered by AI.

Features

  • One command, full report — search a keyword → scrape posts → AI generates a structured Markdown report
  • Multi-model support — OpenAI, Claude, DeepSeek, or run locally with llama.cpp / Ollama (zero cost)
  • Smart summarization — chunks large result sets and merges summaries to fit any model's context window
  • Structured output — recommendations table, buying advice, red flags, sentiment breakdown
  • Login state persistence — scan QR code once, cookies saved for reuse

Quick Start

Option A: Install from PyPI (recommended)

pip install xhs-research
playwright install firefox

Option B: Install from GitHub

pip install git+https://github.com/yongsinfok/xhs-research.git
playwright install firefox

Option C: Clone for development

git clone https://github.com/yongsinfok/xhs-research.git
cd xhs-research
pip install -e .
playwright install firefox

2. Configure AI model

mkdir -p ~/.xhs-research
cp config.example.yaml ~/.xhs-research/config.yaml

Edit ~/.xhs-research/config.yaml:

ai:
  api_key: sk-your-key        # not needed for local models
  base_url: null              # local models: http://localhost:11434/v1
  model: gpt-4o               # or deepseek-chat, llama3, etc.

3. Run

xhs-research search "马来西亚高性价比扫地机器人"

A browser window opens. Scan the QR code with the Xiaohongshu app to log in (only needed the first time). The tool then scrapes posts and generates a report.

Usage

# Basic search (default 20 posts)
xhs-research search "吉隆坡美食推荐"

# More posts for better coverage
xhs-research search "新加坡PR申请攻略" --limit 30

# Use a specific model
xhs-research search "MacBook Pro M4 值得买吗" --model deepseek-chat

# Save to a specific path
xhs-research search "装修避坑指南" --output ./my-report.md

# Also export raw data as JSON
xhs-research search "搬家攻略" --json

# View config file location
xhs-research config-path

Report Example

# 马来西亚扫地机器人推荐 调研报告

> 基于 20 篇小红书帖子 · 2026-05-25

## 核心发现

- 小米 X20+ 关注度最高(447赞),有实测背书
- Dreame D20/Ultra 为热门候选,性价比讨论多
- Mova E40 作为竞品出现

## 推荐清单

| 品牌/型号   | 提及次数 | 最高赞 | 定位       |
|-------------|---------|--------|------------|
| 小米 X20+   | 1+      | 447    | 高关注实测 |
| Dreame D20 Ultra | 2  | 131   | 热门候选   |
| Dreame D20  | 1       | 28     | 性价比讨论 |
| Mova E40    | 1       | 28     | 竞品对比   |

## 购买建议 / 踩坑提醒 / 观点分布
...

Supported AI Models

Provider base_url model example Cost
OpenAI (default) gpt-4o Paid
Anthropic (default) claude-sonnet-4-6 Paid
DeepSeek https://api.deepseek.com/v1 deepseek-chat Low cost
Ollama http://localhost:11434/v1 llama3, qwen2 Free
llama.cpp http://localhost:8080/v1 (local model) Free
Any OpenAI-compatible API (your endpoint) (your model) Varies

Any endpoint that exposes an OpenAI-compatible /v1/chat/completions API works out of the box.

Project Structure

xhs-research/
├── xhs_research/
│   ├── cli.py              # CLI entry point (typer)
│   ├── config.py           # YAML config loader
│   ├── models/post.py      # Post / Comment data models
│   ├── ai/
│   │   ├── client.py       # Unified AI client (OpenAI SDK)
│   │   └── summarizer.py   # Chunk + merge summarization
│   └── scraper/
│       ├── browser.py      # Playwright browser manager
│       ├── login.py        # QR code login handler
│       └── parser.py       # Search result scraper
├── config.example.yaml
├── requirements.txt
└── README.md

Limitations

  • Xiaohongshu web restrictions — post detail pages are often blocked on web, so reports are primarily based on search result titles and card summaries. Increasing --limit improves coverage.
  • Anti-scraping — uses standard Playwright Firefox. For better evasion, consider camoufox.
  • Login expiration — cookies may expire; re-scan QR code when prompted.
  • Personal use only — respect Xiaohongshu's terms of service. Do not use for commercial scraping.

Contributing

Issues and pull requests are welcome. Areas to contribute:

  • Mobile API support for full post content
  • Better anti-detection (camoufox integration)
  • Web UI or API server mode
  • Support for other platforms (Douyin, Bilibili, etc.)

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xhs_research-0.2.0.tar.gz (18.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

xhs_research-0.2.0-py3-none-any.whl (20.8 kB view details)

Uploaded Python 3

File details

Details for the file xhs_research-0.2.0.tar.gz.

File metadata

  • Download URL: xhs_research-0.2.0.tar.gz
  • Upload date:
  • Size: 18.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for xhs_research-0.2.0.tar.gz
Algorithm Hash digest
SHA256 350c83411a340c46eacb9f2c61d804239a7af5bfaa9dbc8d70c7e4f98f609206
MD5 6caadcd3b652cb91eb5bd0f9759888b9
BLAKE2b-256 dcccfc77bdba271e2d7e5e76e81a75bc9899a41ba9b2502b5e28395851f420a1

See more details on using hashes here.

Provenance

The following attestation bundles were made for xhs_research-0.2.0.tar.gz:

Publisher: publish.yml on yongsinfok/xhs-research

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file xhs_research-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: xhs_research-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 20.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for xhs_research-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a833ca7ca2d0a8c5df18028bfee312f3ae95f848a1dddd636ba14d6f364291f3
MD5 61ecd3c36bc6f09451eb606bf24a5aa2
BLAKE2b-256 17b69467ddf0529369892dfd06b25315da77302509a9c037b003566189c32158

See more details on using hashes here.

Provenance

The following attestation bundles were made for xhs_research-0.2.0-py3-none-any.whl:

Publisher: publish.yml on yongsinfok/xhs-research

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page