Search Xiaohongshu and generate AI-powered research reports
Project description
xhs-research
Search Xiaohongshu (小红书) and generate AI-powered research reports from the command line.
Instead of scrolling through dozens of posts one by one, type one command and get a structured summary with recommendations, price comparisons, and user sentiment — all powered by AI.
Features
- One command, full report — search a keyword → scrape posts → AI generates a structured Markdown report
- Multi-model support — OpenAI, Claude, DeepSeek, or run locally with llama.cpp / Ollama (zero cost)
- Smart summarization — chunks large result sets and merges summaries to fit any model's context window
- Structured output — recommendations table, buying advice, red flags, sentiment breakdown
- Login state persistence — scan QR code once, cookies saved for reuse
Quick Start
Option A: Install from PyPI (recommended)
pip install xhs-research
playwright install firefox
Option B: Install from GitHub
pip install git+https://github.com/yongsinfok/xhs-research.git
playwright install firefox
Option C: Clone for development
git clone https://github.com/yongsinfok/xhs-research.git
cd xhs-research
pip install -e .
playwright install firefox
2. Configure AI model
mkdir -p ~/.xhs-research
cp config.example.yaml ~/.xhs-research/config.yaml
Edit ~/.xhs-research/config.yaml:
ai:
api_key: sk-your-key # not needed for local models
base_url: null # local models: http://localhost:11434/v1
model: gpt-4o # or deepseek-chat, llama3, etc.
3. Run
xhs-research search "马来西亚高性价比扫地机器人"
A browser window opens. Scan the QR code with the Xiaohongshu app to log in (only needed the first time). The tool then scrapes posts and generates a report.
Usage
# Basic search (default 20 posts)
xhs-research search "吉隆坡美食推荐"
# More posts for better coverage
xhs-research search "新加坡PR申请攻略" --limit 30
# Use a specific model
xhs-research search "MacBook Pro M4 值得买吗" --model deepseek-chat
# Save to a specific path
xhs-research search "装修避坑指南" --output ./my-report.md
# Also export raw data as JSON
xhs-research search "搬家攻略" --json
# View config file location
xhs-research config-path
Report Example
# 马来西亚扫地机器人推荐 调研报告
> 基于 20 篇小红书帖子 · 2026-05-25
## 核心发现
- 小米 X20+ 关注度最高(447赞),有实测背书
- Dreame D20/Ultra 为热门候选,性价比讨论多
- Mova E40 作为竞品出现
## 推荐清单
| 品牌/型号 | 提及次数 | 最高赞 | 定位 |
|-------------|---------|--------|------------|
| 小米 X20+ | 1+ | 447 | 高关注实测 |
| Dreame D20 Ultra | 2 | 131 | 热门候选 |
| Dreame D20 | 1 | 28 | 性价比讨论 |
| Mova E40 | 1 | 28 | 竞品对比 |
## 购买建议 / 踩坑提醒 / 观点分布
...
Supported AI Models
| Provider | base_url |
model example |
Cost |
|---|---|---|---|
| OpenAI | (default) | gpt-4o |
Paid |
| Anthropic | (default) | claude-sonnet-4-6 |
Paid |
| DeepSeek | https://api.deepseek.com/v1 |
deepseek-chat |
Low cost |
| Ollama | http://localhost:11434/v1 |
llama3, qwen2 |
Free |
| llama.cpp | http://localhost:8080/v1 |
(local model) | Free |
| Any OpenAI-compatible API | (your endpoint) | (your model) | Varies |
Any endpoint that exposes an OpenAI-compatible /v1/chat/completions API works out of the box.
Project Structure
xhs-research/
├── xhs_research/
│ ├── cli.py # CLI entry point (typer)
│ ├── config.py # YAML config loader
│ ├── models/post.py # Post / Comment data models
│ ├── ai/
│ │ ├── client.py # Unified AI client (OpenAI SDK)
│ │ └── summarizer.py # Chunk + merge summarization
│ └── scraper/
│ ├── browser.py # Playwright browser manager
│ ├── login.py # QR code login handler
│ └── parser.py # Search result scraper
├── config.example.yaml
├── requirements.txt
└── README.md
Limitations
- Xiaohongshu web restrictions — post detail pages are often blocked on web, so reports are primarily based on search result titles and card summaries. Increasing
--limitimproves coverage. - Anti-scraping — uses standard Playwright Firefox. For better evasion, consider camoufox.
- Login expiration — cookies may expire; re-scan QR code when prompted.
- Personal use only — respect Xiaohongshu's terms of service. Do not use for commercial scraping.
Contributing
Issues and pull requests are welcome. Areas to contribute:
- Mobile API support for full post content
- Better anti-detection (camoufox integration)
- Web UI or API server mode
- Support for other platforms (Douyin, Bilibili, etc.)
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file xhs_research-0.2.0.tar.gz.
File metadata
- Download URL: xhs_research-0.2.0.tar.gz
- Upload date:
- Size: 18.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
350c83411a340c46eacb9f2c61d804239a7af5bfaa9dbc8d70c7e4f98f609206
|
|
| MD5 |
6caadcd3b652cb91eb5bd0f9759888b9
|
|
| BLAKE2b-256 |
dcccfc77bdba271e2d7e5e76e81a75bc9899a41ba9b2502b5e28395851f420a1
|
Provenance
The following attestation bundles were made for xhs_research-0.2.0.tar.gz:
Publisher:
publish.yml on yongsinfok/xhs-research
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
xhs_research-0.2.0.tar.gz -
Subject digest:
350c83411a340c46eacb9f2c61d804239a7af5bfaa9dbc8d70c7e4f98f609206 - Sigstore transparency entry: 1627162604
- Sigstore integration time:
-
Permalink:
yongsinfok/xhs-research@c20213ca2ca11119c55b9ca72b32310199661704 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/yongsinfok
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@c20213ca2ca11119c55b9ca72b32310199661704 -
Trigger Event:
push
-
Statement type:
File details
Details for the file xhs_research-0.2.0-py3-none-any.whl.
File metadata
- Download URL: xhs_research-0.2.0-py3-none-any.whl
- Upload date:
- Size: 20.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a833ca7ca2d0a8c5df18028bfee312f3ae95f848a1dddd636ba14d6f364291f3
|
|
| MD5 |
61ecd3c36bc6f09451eb606bf24a5aa2
|
|
| BLAKE2b-256 |
17b69467ddf0529369892dfd06b25315da77302509a9c037b003566189c32158
|
Provenance
The following attestation bundles were made for xhs_research-0.2.0-py3-none-any.whl:
Publisher:
publish.yml on yongsinfok/xhs-research
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
xhs_research-0.2.0-py3-none-any.whl -
Subject digest:
a833ca7ca2d0a8c5df18028bfee312f3ae95f848a1dddd636ba14d6f364291f3 - Sigstore transparency entry: 1627162735
- Sigstore integration time:
-
Permalink:
yongsinfok/xhs-research@c20213ca2ca11119c55b9ca72b32310199661704 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/yongsinfok
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@c20213ca2ca11119c55b9ca72b32310199661704 -
Trigger Event:
push
-
Statement type: