A Python library to scrape Feishu wiki pages and convert them to Markdown

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language

Project description

Feishu Wiki Scraper

A Python library to scrape Feishu (飞书) wiki pages and convert them to Markdown format, similar to Firecrawl. This tool can scrape entire wiki sites by following sidebar links and extracting all content.

Features

🚀 Scrape single Feishu wiki pages or entire wiki sites
📝 Convert HTML content to clean Markdown format
🔗 Automatically follow sidebar links to scrape related pages
🍪 Support for authentication via cookies and custom headers
⚙️ Configurable scraping options (delays, max pages, etc.)
💾 Export to Markdown files or JSON format
📂 Directory output mode — save each page as a separate .md file preserving wiki tree structure
🎯 Command-line interface for easy usage
🔥 Firecrawl-compatible JSON output with metadata

Installation

From source

git clone https://github.com/rwifeng/feishu-wiki-scrape.git
cd feishu-wiki-scrape
pip install -e .

Dependencies

pip install -r requirements.txt

Usage

Command Line Interface

Basic usage to scrape a Feishu wiki:

# Save to a single Markdown file
feishu-wiki-scrape https://zcn3fx96oxg4.feishu.cn/wiki/H5V5wMczPif5A5khSG3cWx65nbc -o output.md

# Save as a directory tree (one .md file per page, preserving wiki structure)
feishu-wiki-scrape https://zcn3fx96oxg4.feishu.cn/wiki/H5V5wMczPif5A5khSG3cWx65nbc -o ./wiki-docs/

Options

-o, --output: Output path (default: output.md). If the path ends with /, names an existing directory, or has no file extension, each page is saved as a separate .md file in a nested directory tree that mirrors the wiki structure
--max-pages: Maximum number of pages to scrape (default: unlimited)
--no-sidebar: Don't follow sidebar links (scrape only the given URL)
--delay: Delay between requests in seconds (default: 1.0)
--cookies: Cookies as JSON string for authentication
--headers: Custom headers as JSON string
--json-output: Output as JSON instead of Markdown file
--firecrawl-format: Output in Firecrawl-compatible JSON format with metadata
-v, --verbose: Enable verbose logging
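The rule that decides when -o switches to directory mode can be summarized in a few lines. The sketch below only illustrates the rule as documented above; `is_directory_output` is a hypothetical helper and not a function exported by the library, whose real implementation may differ.

```python
import os


def is_directory_output(path: str) -> bool:
    """Hypothetical helper mirroring the documented -o rule (not library code)."""
    # A trailing slash always means directory mode.
    if path.endswith("/") or path.endswith(os.sep):
        return True
    # So does a path that already exists as a directory.
    if os.path.isdir(path):
        return True
    # Otherwise, a final path component without a file extension means directory mode.
    return os.path.splitext(os.path.basename(path))[1] == ""


for candidate in ["output.md", "./wiki-docs/", "docs", "result.json"]:
    mode = "directory tree" if is_directory_output(candidate) else "single file"
    print(f"{candidate!r}: {mode}")
```

A practical consequence is that a path such as `docs` (no extension) produces a directory tree, so a single-file output needs an explicit extension such as `.md`.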

Examples

Scrape a single page without following links:

feishu-wiki-scrape https://example.feishu.cn/wiki/page --no-sidebar -o single_page.md

Scrape with authentication cookies:

feishu-wiki-scrape https://example.feishu.cn/wiki/page \
  --cookies '{"session_id": "your-session-id"}' \
  -o authenticated_output.md

Limit to 10 pages with custom delay:

feishu-wiki-scrape https://example.feishu.cn/wiki/page \
  --max-pages 10 \
  --delay 2.0 \
  -o limited_output.md

Output as JSON:

feishu-wiki-scrape https://example.feishu.cn/wiki/page --json-output > output.json

Save as directory tree preserving wiki structure:

feishu-wiki-scrape https://example.feishu.cn/wiki/page -o ./docs/

This produces a directory tree like:

docs/
  🚀 Introduction/
    index.md          # parent page with children
    Getting Started.md
  FAQ/
    index.md
    Common Errors.md
  Claude.md           # leaf page (no children)

Python API

from feishu_wiki_scrape import FeishuWikiScraper

# Create scraper instance
scraper = FeishuWikiScraper(
    cookies={"session_id": "your-session-id"},  # Optional
    headers={"Custom-Header": "value"},          # Optional
    delay=1.0                                    # Delay between requests
)

# Scrape a single page
page = scraper.scrape_page("https://example.feishu.cn/wiki/page")
print(page["title"])
print(page["markdown"])

# Scrape entire wiki (follows sidebar links)
results = scraper.scrape_wiki(
    start_url="https://example.feishu.cn/wiki/page",
    max_pages=50,           # Optional: limit number of pages
    include_sidebar=True    # Follow sidebar links
)

for page in results:
    print(f"Title: {page['title']}")
    print(f"URL: {page['url']}")
    print(f"Content:\n{page['markdown']}\n")

# Save to file
scraper.scrape_to_file(
    start_url="https://example.feishu.cn/wiki/page",
    output_file="output.md",
    max_pages=None,         # No limit
    include_sidebar=True
)

# Save to directory tree (preserves wiki sidebar structure)
count = scraper.scrape_wiki_to_directory(
    start_url="https://example.feishu.cn/wiki/page",
    output_dir="./docs/",
    max_pages=None          # No limit
)
print(f"Saved {count} pages")

Firecrawl-Compatible Output

This library supports Firecrawl-compatible JSON output with rich metadata, making it easy to build API-compatible tools.

Using CLI

feishu-wiki-scrape https://example.feishu.cn/wiki/page \
  --firecrawl-format \
  --max-pages 10 > output.json

Output format:

{
  "success": true,
  "status": "completed",
  "completed": 10,
  "total": 10,
  "data": [
    {
      "markdown": "# Page Title\n\nPage content...",
      "metadata": {
        "url": "https://example.feishu.cn/wiki/page",
        "title": "Page Title",
        "keywords": "keyword1, keyword2",
        "language": "zh-CN",
        "sourceURL": "https://example.feishu.cn/wiki/page",
        "statusCode": 200,
        "contentType": "text/html; charset=utf-8",
        "description": "Page description"
      }
    }
  ]
}
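Downstream code often needs only the URL, title, and Markdown of each page. The sketch below unwraps a response shaped like the example above into a flat list. `flatten_firecrawl` is a hypothetical helper written for illustration, and the choice to prefer `sourceURL` over `url` is an assumption.

```python
from typing import Any, Dict, List


def flatten_firecrawl(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: reduce a Firecrawl-style response to url/title/markdown."""
    pages = []
    for item in response.get("data", []):
        meta = item.get("metadata", {})
        pages.append(
            {
                # Assumption: prefer sourceURL and fall back to url.
                "url": meta.get("sourceURL") or meta.get("url"),
                "title": meta.get("title"),
                "markdown": item.get("markdown", ""),
            }
        )
    return pages


# Sample response using the field names from the output format shown above.
sample = {
    "success": True,
    "status": "completed",
    "completed": 1,
    "total": 1,
    "data": [
        {
            "markdown": "# Page Title\n\nPage content...",
            "metadata": {
                "url": "https://example.feishu.cn/wiki/page",
                "title": "Page Title",
                "sourceURL": "https://example.feishu.cn/wiki/page",
                "statusCode": 200,
            },
        }
    ],
}

for page in flatten_firecrawl(sample):
    print(page["title"], page["url"])
```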

Using Python API

from feishu_wiki_scrape import FeishuWikiScraper

scraper = FeishuWikiScraper()

# URL of the wiki page to start from
start_url = "https://example.feishu.cn/wiki/page"

# Scrape with metadata
results = scraper.scrape_wiki_with_metadata(
    start_url=start_url,
    max_pages=10,
    include_sidebar=True
)

# Format as Firecrawl response (automatically handles metadata format)
firecrawl_response = scraper.format_as_firecrawl(results, start_url)

print(firecrawl_response)

For a complete example of building a Firecrawl-compatible API, see example_firecrawl.py.

How It Works

Page Fetching: Uses requests to fetch wiki pages with configurable headers and cookies
Content Extraction: Parses HTML with BeautifulSoup to extract the main content area
Link Discovery: Finds all wiki links in sidebars and navigation elements
Markdown Conversion: Converts HTML to clean Markdown using html2text
Crawling: Follows links breadth-first to scrape entire wiki sites
Rate Limiting: Respects configurable delays between requests
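The crawling and rate-limiting steps can be illustrated without any network access. The following is a minimal sketch of a breadth-first crawl with a page limit and a delay, not the library's actual code: `crawl_breadth_first` and the `get_links` callback are hypothetical names, and the toy link graph stands in for real sidebar links.

```python
import time
from collections import deque
from typing import Callable, Iterable, List, Optional


def crawl_breadth_first(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: Optional[int] = None,
    delay: float = 0.0,
) -> List[str]:
    """Visit pages breadth-first and return the URLs in visit order."""
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, so each page is visited once
    visited: List[str] = []
    while queue and (max_pages is None or len(visited) < max_pages):
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        # Pause between requests only when more pages remain.
        if queue and delay > 0:
            time.sleep(delay)
    return visited


# Toy link graph standing in for sidebar links (made-up page names).
links = {
    "root": ["a", "b"],
    "a": ["c"],
    "b": ["c", "root"],
    "c": [],
}

print(crawl_breadth_first("root", lambda url: links.get(url, [])))
# ['root', 'a', 'b', 'c']
print(crawl_breadth_first("root", lambda url: links.get(url, []), max_pages=2))
# ['root', 'a']
```

Tracking `seen` at enqueue time rather than at visit time keeps a page that is linked from several places from being queued more than once.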

Authentication

Feishu wikis may require authentication. You can provide cookies or custom headers.

Getting Cookies

Open the Feishu wiki in your browser
Open Developer Tools (F12)
Go to Application/Storage > Cookies
Copy the relevant cookie values
Pass them using the --cookies option or the cookies argument in Python code

Example:

feishu-wiki-scrape https://example.feishu.cn/wiki/page \
  --cookies '{"session_id": "abc123", "other_cookie": "value"}'
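Browsers usually show cookies as a single `name=value; name=value` header string, while --cookies expects JSON. The sketch below converts one form to the other using only the standard library. `cookie_header_to_json` is a hypothetical helper, and the cookie names are the placeholders from the example above, not real Feishu cookie names.

```python
import json
import shlex
from http.cookies import SimpleCookie


def cookie_header_to_json(header: str) -> str:
    """Hypothetical helper: turn a browser Cookie header into a JSON object string."""
    jar = SimpleCookie()
    jar.load(header)
    return json.dumps({name: morsel.value for name, morsel in jar.items()})


header = "session_id=abc123; other_cookie=value"
cookies_json = cookie_header_to_json(header)

# shlex.quote makes the JSON safe to paste into a POSIX shell command line.
print("--cookies", shlex.quote(cookies_json))
```

`SimpleCookie` is strict about cookie syntax, so unusual values may be dropped during parsing. It is worth comparing the resulting JSON against the browser's cookie list before relying on it.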

Output Format

Single Markdown File (`-o output.md`)

Pages are separated by horizontal rules (---) with each page containing:

Page title as H1 heading
Source URL
Markdown content
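As an illustration of this layout, the sketch below joins page dictionaries into one Markdown document. It is an approximation under stated assumptions: `join_pages` is a hypothetical helper, and the `Source:` label and exact blank-line spacing are guesses, since the text above specifies only the order of the three parts and the `---` separator.

```python
from typing import Dict, List


def join_pages(pages: List[Dict[str, str]]) -> str:
    """Hypothetical helper: combine pages into a single Markdown string."""
    sections = []
    for page in pages:
        sections.append(
            f"# {page['title']}\n\n"
            f"Source: {page['url']}\n\n"  # assumption: label for the source URL
            f"{page['markdown'].strip()}\n"
        )
    # A horizontal rule separates consecutive pages.
    return "\n---\n\n".join(sections)


pages = [
    {
        "title": "Page Title",
        "url": "https://example.feishu.cn/wiki/page1",
        "markdown": "Page content in markdown...",
    },
    {
        "title": "Another Page",
        "url": "https://example.feishu.cn/wiki/page2",
        "markdown": "More content...",
    },
]

print(join_pages(pages))
```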

Directory Tree (`-o dir/`)

Each wiki page is saved as a separate .md file. The directory structure mirrors the wiki's sidebar tree:

Pages with children become a directory containing index.md (the page content) plus child pages
Leaf pages (no children) are saved as {title}.md in the parent directory
The wiki space root container is skipped so the output directory maps directly to the top-level pages
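The mapping rules above can be sketched as a small recursive function. This is an illustration only: `Node` and `plan_paths` are hypothetical names, the tree is built by hand rather than scraped, and titles are used as file names without the sanitizing (for example of `/` characters) that a real implementation would need.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical wiki tree node: a page title plus its child pages."""
    title: str
    children: List["Node"] = field(default_factory=list)


def plan_paths(nodes: List[Node], base: PurePosixPath) -> Dict[str, str]:
    """Map each page title to the file path it would be saved under."""
    paths: Dict[str, str] = {}
    for node in nodes:
        if node.children:
            # A page with children becomes a directory holding index.md.
            folder = base / node.title
            paths[node.title] = str(folder / "index.md")
            paths.update(plan_paths(node.children, folder))
        else:
            # A leaf page is saved as {title}.md in the parent directory.
            paths[node.title] = str(base / f"{node.title}.md")
    return paths


# Top-level pages only: the wiki space root container is skipped.
tree = [
    Node("Introduction", [Node("Getting Started")]),
    Node("FAQ", [Node("Common Errors")]),
    Node("Claude"),
]

for title, path in plan_paths(tree, PurePosixPath("docs")).items():
    print(f"{title} -> {path}")
```

Keying the result by title keeps the sketch short, but two pages with the same title would collide. A real implementation would key by URL or page token instead.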

JSON Format

[
  {
    "url": "https://example.feishu.cn/wiki/page1",
    "title": "Page Title",
    "markdown": "# Content\n\nPage content in markdown..."
  },
  {
    "url": "https://example.feishu.cn/wiki/page2",
    "title": "Another Page",
    "markdown": "# Content\n\nMore content..."
  }
]

Troubleshooting

Pages not loading

Check if authentication is required (try with cookies)
Verify the URL is accessible in a browser
Increase delay between requests

Missing content

Some content may be loaded dynamically with JavaScript, which a plain requests fetch does not execute
Try using cookies from an authenticated session
Check the verbose output with the -v flag
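One quick way to spot missing content is to scan the JSON output for pages whose Markdown is empty or very short. The sketch below assumes the JSON format shown earlier (a list of objects with `url`, `title`, and `markdown`). `find_thin_pages` and the 50-character threshold are hypothetical choices for illustration.

```python
import json
from typing import Dict, List


def find_thin_pages(pages: List[Dict[str, str]], min_chars: int = 50) -> List[str]:
    """Hypothetical helper: return URLs of pages with little or no Markdown."""
    return [
        page["url"]
        for page in pages
        if len(page.get("markdown", "").strip()) < min_chars
    ]


# Inline sample standing in for the contents of a --json-output file.
raw = """
[
  {"url": "https://example.feishu.cn/wiki/page1", "title": "Page Title",
   "markdown": "A page with enough real content to pass the length threshold easily."},
  {"url": "https://example.feishu.cn/wiki/page2", "title": "Another Page",
   "markdown": ""}
]
"""

for url in find_thin_pages(json.loads(raw)):
    print("Possibly missing content:", url)
```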

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Release history

This version

0.1.0

Feb 7, 2026

Download files

Source Distribution

feishu_wiki_scrape-0.1.0.tar.gz (21.2 kB)

Uploaded Feb 7, 2026 Source

Built Distribution

feishu_wiki_scrape-0.1.0-py3-none-any.whl (19.4 kB)

Uploaded Feb 7, 2026 Python 3

File details

Details for the file feishu_wiki_scrape-0.1.0.tar.gz.

File metadata

Download URL: feishu_wiki_scrape-0.1.0.tar.gz
Upload date: Feb 7, 2026
Size: 21.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for feishu_wiki_scrape-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a90e36105fe43dcc117955e2fab8cf384fb3704cbc818da93884f738c46adc7b`
MD5	`718d38136b2c068acaebbc6e72d72a1d`
BLAKE2b-256	`ce78033b65cbb1e0ec7303d9e9a180fae44993c4284f127b330cf77838808318`
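A downloaded file can be checked against the SHA256 digest listed above with the standard library. The sketch below is one way to do that, not an official procedure. `file_sha256` and `digest_matches` are hypothetical helper names, and the demonstration hashes in-memory bytes so that it runs without the actual archive.

```python
import hashlib
import hmac


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hypothetical helper: hex SHA256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(actual_hex: str, expected_hex: str) -> bool:
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return hmac.compare_digest(actual_hex.strip().lower(), expected_hex.strip().lower())


# Digest copied from the table above for feishu_wiki_scrape-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "a90e36105fe43dcc117955e2fab8cf384fb3704cbc818da93884f738c46adc7b"

# In-memory demonstration of the comparison logic with placeholder bytes.
demo_digest = hashlib.sha256(b"not the real archive").hexdigest()
print(digest_matches(demo_digest, EXPECTED_SDIST_SHA256))
```

With the archive downloaded locally, the check becomes `digest_matches(file_sha256("feishu_wiki_scrape-0.1.0.tar.gz"), EXPECTED_SDIST_SHA256)`.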

File details

Details for the file feishu_wiki_scrape-0.1.0-py3-none-any.whl.

File metadata

Download URL: feishu_wiki_scrape-0.1.0-py3-none-any.whl
Upload date: Feb 7, 2026
Size: 19.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for feishu_wiki_scrape-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a4d18ce30f28173194890c6460c7d31d15f4ffe4d5fd1b190f4da898cbf3dac0`
MD5	`85a44684e46bb6695bd1c4f3529b5342`
BLAKE2b-256	`39b923f7bac8ae539d5391006d3ba753867ec41abe3ce9ad92f7a7443da9e45a`
