Scrape learn.liferay.com/w/dxp into local Markdown docs (raw/{capability}/*.md) for the liferay-expert Claude Code skill.


liferay-docs-scraper

Scrape learn.liferay.com/w/dxp/* into local Markdown, then let Claude Code answer Liferay DXP questions by searching those files. No bundled Liferay content, no embeddings, no vector DB.

Quickstart

From zero to asking Liferay questions in Claude Code:

# 1. One-time browser setup for crawl4ai/Playwright
uvx --from crawl4ai crawl4ai-setup

# 2. Scrape the official Liferay DXP docs (~30-40 min)
uvx liferay-docs-scraper

# 3. Install the Claude Code skill in your project
npx skills add mordonez/liferay-docs-scraper --skill liferay-expert -a claude-code

# 4. Check that docs and skill are ready
uvx --from liferay-docs-scraper liferay-docs-scraper-doctor

Then ask Claude Code something like:

How do I configure synonym sets in Liferay Search?

The skill searches the local Markdown, reads the matching page, and cites the source URL from that file's frontmatter.
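As a rough illustration of the citation step, the sketch below pulls a `url:` value out of a page's frontmatter. It assumes the frontmatter is a leading block delimited by `---` lines with a plain `url: ...` entry; the function name `frontmatter_url` is hypothetical and not part of the package.

```python
from __future__ import annotations


def frontmatter_url(markdown_text: str) -> str | None:
    """Return the `url:` value from a leading '---' frontmatter block, if any.

    Assumption: frontmatter is delimited by '---' lines and uses simple
    `key: value` pairs. This is not a full YAML parser.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; never read into the article body
        key, sep, value = line.partition(":")
        if sep and key.strip() == "url":
            return value.strip().strip("'\"") or None
    return None
```

Only the first colon splits key from value, so the `https://` in the URL is left intact.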

Keep the -a claude-code flag in the install command. It avoids edge cases in the interactive installer where the skill appears to be installed but never lands in .claude/skills/.

What this repo does not do

It does not ship Liferay documentation text. The package contains the scraper and skill only; each user fetches their own local copy from learn.liferay.com.
It does not use embeddings, RAG infrastructure, or a vector database. The skill uses normal file search and reads Markdown directly.
It does not scrape automatically from the skill. If docs are missing, the skill tells you what command to run instead of starting a long crawl in the middle of a conversation.

Requirements

Python 3.10-3.13
uv
Node/npm for npx skills add

crawl4ai drives Playwright, so run this once before the first scrape:

uvx --from crawl4ai crawl4ai-setup

Scraper Reference

Run the official-docs scraper:

uvx liferay-docs-scraper

It crawls https://learn.liferay.com/w/dxp/index with crawl4ai's BFS crawler, keeps URLs under /w/dxp/*, extracts .learn-article-content, classifies each page into one of 14 Liferay capabilities, and writes Markdown to one shared docs directory.
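The "keeps URLs under /w/dxp/*" rule can be pictured as a simple predicate. The sketch below uses only the standard library and is not crawl4ai's filter API or the scraper's actual code; `in_scope`, `ALLOWED_HOST`, and `ALLOWED_PREFIX` are hypothetical names.

```python
from urllib.parse import urlsplit

# Assumed scope, taken from the start URL described above.
ALLOWED_HOST = "learn.liferay.com"
ALLOWED_PREFIX = "/w/dxp/"


def in_scope(url: str) -> bool:
    """Return True if the URL is an http(s) page under learn.liferay.com/w/dxp/."""
    parts = urlsplit(url)
    return (
        parts.scheme in ("http", "https")
        and parts.hostname == ALLOWED_HOST
        and parts.path.startswith(ALLOWED_PREFIX)
    )
```

Matching on the prefix with its trailing slash keeps look-alike paths such as /w/dxp-other/ out of scope.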

Default docs directory:

~/.liferay-docs

Override it when needed:

export LIFERAY_DOCS_DIR="$PWD/.liferay-docs"
uvx liferay-docs-scraper

Directory layout:

~/.liferay-docs/
  raw/{capability}/*.md
  raw/_navigation/{capability}/*.md
  raw/_removed/{capability}/*.md
  reports/filtered/
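To sanity-check this layout after a run, a short script can count pages per capability. This is an illustrative sketch, not something the package ships; `count_pages` is a hypothetical helper, and any capability name you see in its output depends on your own crawl.

```python
from collections import Counter
from pathlib import Path


def count_pages(docs_dir: Path) -> Counter:
    """Count raw/{capability}/*.md files, skipping underscore-prefixed folders."""
    counts: Counter = Counter()
    raw = docs_dir / "raw"
    if not raw.is_dir():
        return counts  # nothing scraped yet
    for capability_dir in sorted(raw.iterdir()):
        if not capability_dir.is_dir() or capability_dir.name.startswith("_"):
            continue  # _navigation and _removed are not primary pages
        # Only Markdown files directly inside the capability folder are counted.
        pages = sum(1 for _ in capability_dir.glob("*.md"))
        if pages:
            counts[capability_dir.name] = pages
    return counts
```

For example, `count_pages(Path.home() / ".liferay-docs")` returns a `Counter` keyed by capability folder name.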

Useful commands:

# Smaller smoke run
uvx liferay-docs-scraper --max-pages 200

# Check local docs and current-project skill installation
uvx --from liferay-docs-scraper liferay-docs-scraper-doctor

The scraper writes files atomically, retries page fetches through crawl4ai, uses bounded concurrency, and exits non-zero if page fetches or the crawl stream fail. If the crawl is interrupted, already written pages remain usable, but the run is marked failed and orphan quarantine is skipped so a partial crawl cannot move good pages to raw/_removed/.
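Two of these guarantees are easy to picture in code. The sketch below shows one common way to write a file atomically (temporary file in the same directory, then `os.replace`) and a guard that skips orphan quarantine after a failed run. It is an assumption about the general technique, not the scraper's actual implementation; `write_atomic` and `orphans_to_quarantine` are hypothetical names.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def write_atomic(path: Path, text: str) -> None:
    """Write text so readers see either the old file or the complete new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_name, path)  # atomic rename over the destination
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)  # do not leave half-written temp files behind
        raise


def orphans_to_quarantine(existing: set[str], seen: set[str], run_ok: bool) -> set[str]:
    """Pages on disk that the crawl no longer saw; empty if the run failed."""
    if not run_ok:
        return set()  # a partial crawl must not move good pages to raw/_removed/
    return existing - seen
```

The guard matters because an interrupted crawl has an incomplete `seen` set, so every page it did not reach would otherwise look like an orphan.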

Community Articles

The community scraper is optional. Its output is larger and less authoritative than the official docs:

uvx --from liferay-docs-scraper liferay-docs-scraper-community

This fetches Liferay community How-To and Troubleshooting articles from learn.liferay.com/kb-article/*. It writes them separately:

raw/community-howto/{capability}/*.md
raw/community-troubleshooting/{capability}/*.md

Many community articles are not tagged with a capability by the site; those go to _uncategorized/. The liferay-expert skill treats community content as a secondary source and says so in answers.
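The routing rule above can be sketched as a small path builder. Two details are assumptions here: that `_uncategorized/` sits inside each community folder in place of `{capability}`, and that files are named by a slug. `community_path` and its parameters are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def community_path(docs_dir: Path, resource_type: str, capability: str | None, slug: str) -> Path:
    """Build the output path for a community article.

    resource_type is e.g. "howto" or "troubleshooting" (folder suffixes from
    the layout above). Untagged articles fall back to "_uncategorized".
    """
    folder = capability or "_uncategorized"  # assumed placement of untagged articles
    return docs_dir / "raw" / f"community-{resource_type}" / folder / f"{slug}.md"
```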

Useful options:

uvx --from liferay-docs-scraper liferay-docs-scraper-community --resource-type howto
uvx --from liferay-docs-scraper liferay-docs-scraper-community --limit 100

Skill Reference

Install into the current project:

npx skills add mordonez/liferay-docs-scraper --skill liferay-expert -a claude-code

Manual install also works: copy skills/liferay-expert/SKILL.md to:

.claude/skills/liferay-expert/SKILL.md

The skill resolves the docs directory the same way the scraper does:

$LIFERAY_DOCS_DIR, if set.
~/.liferay-docs, otherwise.
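That resolution order is small enough to sketch in a few lines. This is an illustration rather than the package's code: `resolve_docs_dir` is a hypothetical name, and treating an empty `LIFERAY_DOCS_DIR` as unset is an assumption.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def resolve_docs_dir(env: Mapping[str, str] | None = None) -> Path:
    """Return $LIFERAY_DOCS_DIR if set, otherwise ~/.liferay-docs."""
    env = os.environ if env is None else env
    override = env.get("LIFERAY_DOCS_DIR")
    if override:  # assumption: an empty value counts as "not set"
        return Path(override).expanduser()
    return Path.home() / ".liferay-docs"
```

Passing `env` explicitly makes the function easy to test without touching the real environment.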

When answering, it searches raw/{capability}/*.md, reads the best matching files, and cites their url: frontmatter. It skips raw/_navigation/ unless there is no better source.
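In practice the skill relies on Claude Code's ordinary file search, but the idea can be approximated in a few lines. The sketch below ranks pages by how often the query terms occur; the scoring is a deliberately naive assumption, and `search_docs` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path


def search_docs(docs_dir: Path, query: str, limit: int = 5) -> list[Path]:
    """Rank raw/{capability}/*.md files by total occurrences of the query terms."""
    terms = [term.lower() for term in query.split()]
    scored: list[tuple[int, Path]] = []
    # The one-level glob only reaches raw/{folder}/*.md, so pages nested under
    # raw/_navigation/{capability}/ and raw/_removed/{capability}/ are never matched.
    for path in (docs_dir / "raw").glob("*/*.md"):
        if path.parent.name.startswith("_"):
            continue  # extra guard for stray files directly in underscore folders
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        score = sum(text.count(term) for term in terms)
        if score:
            scored.append((score, path))
    scored.sort(key=lambda item: (-item[0], str(item[1])))  # best score first
    return [path for _, path in scored[:limit]]
```

A call such as `search_docs(Path.home() / ".liferay-docs", "synonym sets")` returns the highest-scoring pages, whose frontmatter can then be read for the source URL.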

Development

uv sync --group dev
uv run ruff check .
uv run pytest
uv build

CI runs lint, tests, and package build on Python 3.10, 3.11, 3.12, and 3.13. It does not run a real scrape. Release publishing is documented in docs/release.md.

License

MIT applies to this tool and skill only. Liferay documentation content remains Liferay's content and is fetched locally by each user.

Project details

Release history

0.6.1 (this version): Jul 3, 2026
0.5.0: Jul 2, 2026
0.4.1: Jul 2, 2026
0.4.0: Jul 2, 2026
0.3.0: Jul 2, 2026
0.2.0: Jul 2, 2026
0.1.2: Jul 2, 2026
0.1.1: Jul 1, 2026
0.1.0: Jul 1, 2026

Download files

Source Distribution

liferay_docs_scraper-0.6.1.tar.gz (43.3 kB)

Uploaded Jul 3, 2026 Source

Built Distribution

liferay_docs_scraper-0.6.1-py3-none-any.whl (24.3 kB)

Uploaded Jul 3, 2026 Python 3

File details

Details for the file liferay_docs_scraper-0.6.1.tar.gz.

File metadata

Download URL: liferay_docs_scraper-0.6.1.tar.gz
Upload date: Jul 3, 2026
Size: 43.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.26

File hashes

Hashes for liferay_docs_scraper-0.6.1.tar.gz
Algorithm	Hash digest
SHA256	`fc98d53df0fa6d1bd717a007923c6da4b31f7c80a34f6d3c2ec90562dee7cd7e`
MD5	`d369f035270f79913d09961d5523b0fd`
BLAKE2b-256	`3577a371f172a9788300c119f708211d390e1de56062aa0251a3aa3cac9bdc8f`

File details

Details for the file liferay_docs_scraper-0.6.1-py3-none-any.whl.

File metadata

Download URL: liferay_docs_scraper-0.6.1-py3-none-any.whl
Upload date: Jul 3, 2026
Size: 24.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.26

File hashes

Hashes for liferay_docs_scraper-0.6.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4eb55475961f59ca1764a6c9ee061086686068ba29ae8e294994277893f79ee9`
MD5	`3cc7a22ac2a9297df0b8e5ed83a2ce81`
BLAKE2b-256	`0d7f3e762ca73b0a7d538f4767ea5cfeb4eb46090d3c60b4315e25e242e3ae15`
