Turn any topic into an evidence-backed skills library – automatically
# 🚀 LangSkills – Evidence-Backed Skills for Vibe Research & Vibe Coding

📚 101K Skills from 62K+ Papers & 23K+ Tech Sources – Search, Generate, Reuse

Quick Start · Skill Library · Pipeline · Installation · OpenClaw · CLI Reference · Configuration
## 📰 News

- 2026-02-28 – v0.1.0: 101,330 skills across 21 domain bundles officially released
- 2026-02-27 – Pre-built SQLite bundles with FTS5 full-text search ready for download
- 2026-02-27 – Journal pipeline online: PMC, PLOS, Nature, eLife, and arXiv fully covered
## ✨ Key Features

- 📚 Massive Pre-Built Skill Library: 101,330 evidence-backed skills covering 62K+ research papers and 23K+ coding/tech sources – all searchable offline via FTS5-powered SQLite bundles.
- 🔧 Fully Automated Skill Pipeline: Give it a topic and it discovers sources → fetches & extracts text → generates skills with an LLM → validates quality → publishes. One command, zero manual work.
- 🔬 Evidence-First, Never Hallucination-Only: Every skill traces back to real web pages, academic papers, or code repositories with full provenance chains – metadata, quality scores, and source links included.
- 🌐 Multi-Source Intelligence: Integrates Tavily, GitHub, Baidu, Zhihu, XHS, StackOverflow, arXiv, PMC, PLOS, Nature, and eLife – 10+ data source providers for comprehensive coverage.
- 🧠 LLM-Powered Quality Gates: Each skill is generated, validated, and scored by LLMs with configurable quality thresholds – ensuring high-signal, low-noise output at scale.
- ⚡ Drop-In Reusability: Download domain-specific SQLite bundles, `skill-search` any keyword, and get structured Markdown ready to feed into any AI agent, RAG pipeline, or knowledge base.
- 🏗️ Extensible Architecture: Modular source providers, LLM backends (OpenAI / Ollama), queue-based batch processing, and configurable domain rules – built to scale.
- 📦 21 Domain Bundles: From Linux sysadmin to PLOS biology, from web development to machine learning – organized, versioned, and individually installable.
## 🚀 Quick Start

```bash
pip install langskills-rai

# Auto-detect your project and install only matching bundles (~50-200 MB)
langskills-rai bundle-install --auto

# Search the pre-built skill library (Vibe Research)
langskills-rai skill-search "kubernetes networking" --top 5

# Generate new skills from any topic (Vibe Coding)
cp .env.example .env   # fill OPENAI_API_KEY + OPENAI_BASE_URL
langskills-rai capture "Docker networking@15"
```
Full setup details → Installation
## 📚 The Skill Library

62,582 research skills distilled from academic papers + 23,765 coding/tech skills from GitHub, StackOverflow, and the web – all searchable offline.
| Domain | Skills | Sources |
|---|---|---|
| 📄 research-plos-* | 35,505 | PLOS ONE, Biology, CompBio, Medicine, Genetics, NTD, Pathogens |
| 📄 research-arxiv | 3,483 | arXiv papers |
| 📄 research-elife | 391 | eLife journal |
| 📄 research-other | 23,203 | Other academic sources |
| 💻 linux | 7,455 | Linux / sysadmin |
| 💻 web | 6,029 | Web development |
| 💻 programming | 4,071 | General programming |
| 💻 devtools | 2,243 | Developer tools |
| 💻 security | 1,182 | Security |
| 💻 cloud / data / ml / llm / observability | 2,785 | Infra & ML |
| 🗂️ other | 14,983 | Uncategorized |
| Total | 101,330 | 21 SQLite bundles |
## 🔍 How to Use the Library

```bash
# Install a domain bundle (downloads from GitHub Releases)
langskills-rai bundle-install --domain linux

# Or auto-detect your project type and install matching bundles
langskills-rai bundle-install --auto

# Search skills offline (FTS5 full-text search)
langskills-rai skill-search "container orchestration" --top 10

# Filter by domain and minimum quality score
langskills-rai skill-search "CRISPR" --domain research --min-score 4.0

# Get full skill content as Markdown
langskills-rai skill-search "React hooks" --content --format markdown
```
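Under the hood, the bundles are SQLite databases searched via FTS5 virtual tables. The self-contained sketch below shows the kind of query involved; the table name and columns (`skills_fts`, `title`, `body`) are invented for illustration and are not the actual bundle schema.

```python
import sqlite3

# In-memory FTS5 table standing in for a skill bundle (schema is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE skills_fts USING fts5(title, body)")
conn.executemany(
    "INSERT INTO skills_fts VALUES (?, ?)",
    [
        ("Kubernetes networking basics", "Pods, services, and CNI plugins."),
        ("React hooks cheat sheet", "useState and useEffect patterns."),
    ],
)

# MATCH runs tokenized full-text search; bm25() ranks hits (lower is better).
rows = conn.execute(
    "SELECT title FROM skills_fts WHERE skills_fts MATCH ? "
    "ORDER BY bm25(skills_fts) LIMIT 5",
    ("kubernetes",),
).fetchall()
print(rows)  # [('Kubernetes networking basics',)]
```

Because FTS5 ships inside SQLite itself, this kind of search works fully offline, which is what makes the downloadable bundles self-contained.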
## 📦 Skill Package Structure

Each skill is a structured Markdown package with full traceability:

```
skills/by-skill/<domain>/<topic>/
├── skill.md        # The skill content (tutorial / how-to / protocol)
├── metadata.yaml   # Provenance, tags, quality score, LLM model used
└── source.json     # Evidence trail back to original web/paper source
```

Every skill traces to real sources – never hallucination-only.
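Because the layout is a plain `<domain>/<topic>/` directory tree, downstream tooling can enumerate packages with a simple glob. A minimal sketch (the file contents and field names written here are illustrative, not the documented schema):

```python
import json
import tempfile
from pathlib import Path

# Build a throwaway package in the documented layout.
root = Path(tempfile.mkdtemp()) / "skills" / "by-skill"
pkg = root / "linux" / "journalctl"
pkg.mkdir(parents=True)
(pkg / "skill.md").write_text("# journalctl basics\n")
(pkg / "metadata.yaml").write_text("quality_score: 4.5\n")
(pkg / "source.json").write_text(json.dumps({"url": "https://example.com"}))

# Enumerate every published skill as (domain, topic) pairs.
found = sorted(
    (p.parent.parent.name, p.parent.name)
    for p in root.glob("*/*/skill.md")
)
print(found)  # [('linux', 'journalctl')]
```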
## 🔧 The Pipeline

### 📝 Step-by-Step Usage
1. Explore sources (optional)

```bash
langskills-rai search tavily "Linux journalctl" --limit 20
langskills-rai search github "journalctl" --limit 10
```

2. Capture skills from a topic

```bash
# Basic
langskills-rai capture "journalctl@15"

# Target a specific domain
langskills-rai capture "React hooks@20" --domain web

# All domains
langskills-rai capture "Kubernetes" --all --total 30
```

`@N` is shorthand for `--total N`. The pipeline auto-runs: search → fetch → generate → dedupe → improve → validate.
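The `@N` shorthand could be parsed roughly like this; a hypothetical sketch, and the real CLI's parsing (including its default count) may differ:

```python
# Split "topic@N" into (topic, count); fall back to a default when "@N"
# is absent or malformed. rpartition keeps any earlier "@" in the topic.
def parse_topic(arg: str, default_total: int = 10) -> tuple[str, int]:
    topic, sep, count = arg.rpartition("@")
    if sep and count.isdigit():
        return topic, int(count)
    return arg, default_total

print(parse_topic("journalctl@15"))  # ('journalctl', 15)
print(parse_topic("Kubernetes"))     # ('Kubernetes', 10)
```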
3. Validate & publish

```bash
langskills-rai validate --strict --package
langskills-rai reindex-skills --root skills/by-skill
```

4. Build bundles & site

```bash
langskills-rai build-site
langskills-rai build-bundle --split-by-domain
```

5. Batch processing (large-scale)

```bash
langskills-rai queue-seed                       # seed from config
langskills-rai topics-capture topics/arxiv.txt  # or from file
langskills-rai runner                           # start worker
langskills-rai queue-watch                      # monitor
```
## 📁 Pipeline Output

```
captures/<run-id>/
├── manifest.json       # Run metadata
├── sources/            # Fetched evidence per source
├── skills/             # Generated skill packages
│   └── <domain>/<topic>/
│       └── skill.md
└── quality_report.md   # Validation summary
```
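Since each run carries a `manifest.json`, a post-run script can summarize captures without touching the skill files. A sketch under assumed field names (`run_id`, `topic`, `skills_generated` are not the documented manifest schema):

```python
import json
import tempfile
from pathlib import Path

# Fabricate a capture directory with a sample manifest (fields are assumptions).
run_dir = Path(tempfile.mkdtemp()) / "captures" / "run-001"
run_dir.mkdir(parents=True)
(run_dir / "manifest.json").write_text(
    json.dumps({"run_id": "run-001", "topic": "journalctl", "skills_generated": 15})
)

# Read the manifest back and format a one-line run summary.
manifest = json.loads((run_dir / "manifest.json").read_text())
summary = (
    f"{manifest['run_id']}: {manifest['skills_generated']} "
    f"skills for '{manifest['topic']}'"
)
print(summary)  # run-001: 15 skills for 'journalctl'
```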
## 📦 Installation

LangSkills supports Linux, macOS, and Windows. Python 3.10+ is required.

### Option A: pip install (recommended)

```bash
pip install langskills-rai

# Download skill bundles (auto-detect your project type)
langskills-rai bundle-install --auto

# Or install a specific domain
langskills-rai bundle-install --domain linux

# Verify
langskills-rai self-check --skip-remote
```

### Option B: From source (for development / skill generation)

🐧 Linux / 🍎 macOS

```bash
git clone https://github.com/LabRAI/LangSkills.git && cd LangSkills
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
playwright install chromium   # optional: Baidu/Zhihu/XHS sources
cp .env.example .env          # fill OPENAI_API_KEY + OPENAI_BASE_URL
langskills-rai self-check --skip-remote
```

💻 Windows

```bash
git clone https://github.com/LabRAI/LangSkills.git && cd LangSkills
python -m venv .venv && .venv\Scripts\activate
pip install -e ".[dev]"
copy .env.example .env        # fill OPENAI_API_KEY + OPENAI_BASE_URL
langskills-rai self-check --skip-remote
```
### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI-compatible API key for skill generation |
| `OPENAI_BASE_URL` | Yes | API base URL (e.g., https://api.openai.com/v1) |
| `OPENAI_MODEL` | No | Model name (default: gpt-4.1-mini) |
| `LLM_PROVIDER` | No | `openai` (default) or `ollama` |
| `GITHUB_TOKEN` | No | Recommended for GitHub search (avoids rate limits) |
| `TAVILY_API_KEY` | No | Required for Tavily web search |
| `LANGSKILLS_WORKDIR` | No | Runtime data directory (default: var/) |

More variables → Configuration
## 🤖 AI CLI One-Liner – Auto Setup

Copy the prompt below and paste it into Claude Code / Codex / Cursor / Windsurf – the AI agent will automatically install, configure, and verify LangSkills for you.

```
Do the following steps in order. Do NOT skip any step.

1. Install langskills-rai from PyPI:
   pip install langskills-rai

2. Auto-detect my project and install matching skill bundles:
   langskills-rai bundle-install --auto

3. Run the self-check to verify everything is working:
   langskills-rai self-check --skip-remote

4. If self-check passes, run a quick smoke test - search the built-in library:
   langskills-rai skill-search "machine learning" --top 3

5. If I want to generate NEW skills (not just search), ask me for my
   OPENAI_API_KEY and OPENAI_BASE_URL, then set them as environment variables.

Done. Report the results of steps 3 and 4.
```
## 🦞 OpenClaw Integration

LangSkills is available as an OpenClaw skill – giving any OpenClaw-powered agent access to 101K+ evidence-backed skills.

Install from Claw Hub (coming soon):

```bash
clawhub install langskills-search
```

Manual install – save the block below as `~/.openclaw/skills/langskills-search/SKILL.md`:
---
name: langskills-search
version: 0.1.0
description: Search 101K evidence-backed skills from 62K+ papers & 23K+ tech sources
author: LabRAI
tags: [research, skills, knowledge-base, search, evidence]
requires:
  bins: ["python3"]
metadata: {"source": "https://github.com/LabRAI/LangSkills", "license": "MIT", "min_python": "3.10"}
---
# LangSkills Search
Search 101,330 evidence-backed skills covering 62K+ research papers and 23K+ coding/tech sources – all offline via FTS5 SQLite.
## When to Use
- User asks for best practices, how-tos, or techniques on a technical topic
- You need evidence-backed knowledge (not LLM-generated guesses)
- Research tasks that benefit from academic or real-world source citations
## First-Time Setup
```bash
pip install langskills-rai
# Install all bundles (~1 GB) or pick a domain:
langskills-rai bundle-install --auto
```
## Search Command
```bash
langskills-rai skill-search "<query>" [options]
```
### Parameters
| Flag | Description | Default |
|:---|:---|:---|
| `--top N` | Number of results | 5 |
| `--domain <d>` | Filter by domain | all |
| `--min-score N` | Minimum quality score (0-5) | 0 |
| `--content` | Include full skill body | off |
| `--format markdown` | Output as Markdown | text |
### Example
```bash
langskills-rai skill-search "CRISPR gene editing" --domain research --top 3 --content --format markdown
```
## Reading Results
Each result includes: **title**, **domain**, **quality score** (0-5), **source URL**, and optionally the full skill body. Higher scores indicate stronger evidence chains.
## Available Domains
`linux` · `web` · `programming` · `devtools` · `security` · `cloud` · `data` · `ml` · `llm` · `observability` · `research-arxiv` · `research-plos-*` · `research-elife` · `research-other`
## Tips
- Use `--content --format markdown` to get copy-paste-ready skill text
- Combine `--domain` with `--min-score 4.0` for high-quality results
- Run `bundle-install --auto` in a project directory to install only relevant domains
## 🖥️ CLI Reference

All commands:

```bash
langskills-rai <command>   # or, from source: python3 langskills_cli.py <command>
```
### ⚡ Core Commands

| Command | What It Does |
|---|---|
| `capture "<topic>@N"` | Full pipeline: discover → fetch → generate → validate N skills |
| `skill-search "<query>"` | Search the local skill library (FTS5 full-text) |
| `search <engine> "<query>"` | Search URLs via a specific provider (tavily / github / baidu) |
| `validate --strict --package` | Run quality gates on generated skills |
| `improve <run-dir>` | Re-improve an existing capture run in place |
### 🔄 Batch Pipelines

| Command | What It Does |
|---|---|
| `runner` | Resumable background worker: queue → generate → publish |
| `arxiv-pipeline` | arXiv papers: discover → download PDF → generate skills |
| `journal-pipeline` | Journals: crawl PMC / PLOS / Nature / eLife → generate |
| `topics-capture <file>` | Enqueue topics from a text file into the persistent queue |
| `queue-seed` | Auto-seed the queue from config-defined topic lists |
### 📚 Library Management

| Command | What It Does |
|---|---|
| `bundle-install --domain <d>` | Download a pre-built SQLite bundle from GitHub Releases |
| `bundle-install --auto` | Auto-detect project type and install matching bundles |
| `build-bundle --split-by-domain` | Build self-contained SQLite bundles from skills/ |
| `build-site` | Generate dist/index.json + dist/index.html |
| `reindex-skills` | Rebuild skills/index.json from the by-skill directory |
### 🔧 More: Utilities & Diagnostics

| Command | What It Does |
|---|---|
| `self-check --skip-remote` | Local environment sanity check |
| `auth zhihu\|xhs` | Interactive Playwright login helper |
| `sources-audit` | Audit source providers (speed, auth, failures) |
| `auto-pr` | Create a commit/branch and optionally push + open a PR |
| `queue-stats` | Show queue counts by stage / status / source |
| `queue-watch` | Live queue stats dashboard (rich) |
| `queue-gc` | Reclaim expired leases |
| `repo-index` | Traverse + statically index a repo into captures |
| `repo-query "<query>"` | Evidence-backed search over the symbol index |
| `backfill-package-v2` | Generate missing package v2 files |
| `backfill-verification` | Ensure Verification sections include fenced code |
| `backfill-sources` | Backfill sources/by-id from existing artifacts |
## ⚙️ Configuration

Master config: `config/langskills.json` – domains, URL rules, quality gates, license policy.
### 🤖 LLM & API Keys

| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI-compatible API key for skill generation |
| `OPENAI_BASE_URL` | Yes | API base URL (e.g., https://api.openai.com/v1) |
| `OPENAI_MODEL` | No | Model name (default: gpt-4.1-mini) |
| `LLM_PROVIDER` | No | `openai` (default) or `ollama` |
| `OLLAMA_BASE_URL` | No | Ollama server URL |
| `OLLAMA_MODEL` | No | Ollama model name |
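The `LLM_PROVIDER` switch suggests a small factory routing between the two backends, in the spirit of the `core/llm/factory.py` module listed under Project Structure. The sketch below is hypothetical: the class and function names are illustrative, not the real API.

```python
# Hypothetical provider routing keyed on LLM_PROVIDER; defaults mirror the
# values documented in the tables above (gpt-4.1-mini, openai provider).
class OpenAIClient:
    def __init__(self, base_url: str, model: str):
        self.base_url, self.model = base_url, model

class OllamaClient:
    def __init__(self, base_url: str, model: str):
        self.base_url, self.model = base_url, model

def make_client(env: dict[str, str]):
    provider = env.get("LLM_PROVIDER", "openai")
    if provider == "ollama":
        return OllamaClient(
            env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
            env.get("OLLAMA_MODEL", "llama3"),  # placeholder default
        )
    return OpenAIClient(
        env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        env.get("OPENAI_MODEL", "gpt-4.1-mini"),
    )

client = make_client({"LLM_PROVIDER": "openai"})
print(type(client).__name__, client.model)  # OpenAIClient gpt-4.1-mini
```

Passing the environment as a plain dict keeps the routing testable without mutating `os.environ`.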
### 🔍 Search & Data Sources

| Variable | Required | Description |
|---|---|---|
| `TAVILY_API_KEY` | No | Required for Tavily web search |
| `GITHUB_TOKEN` | No | Recommended for GitHub search (avoids rate limits) |
| `LANGSKILLS_WEB_SEARCH_PROVIDERS` | No | Comma-separated list (default: tavily,baidu,zhihu,xhs) |
### 🎭 Playwright & Auth (optional)

| Variable | Description |
|---|---|
| `LANGSKILLS_PLAYWRIGHT_HEADLESS` | 0 (visible browser) or 1 (headless, default) |
| `LANGSKILLS_PLAYWRIGHT_USER_DATA_DIR` | Custom Chromium user data directory |
| `LANGSKILLS_PLAYWRIGHT_AUTH_DIR` | Auth state dir (default: var/runs/playwright_auth) |
| `LANGSKILLS_ZHIHU_LOGIN_TYPE` | `qrcode` or `cookie` |
| `LANGSKILLS_ZHIHU_COOKIES` | Zhihu cookie string (when login type = cookie) |
| `LANGSKILLS_XHS_LOGIN_TYPE` | `qrcode`, `cookie`, or `phone` |
| `LANGSKILLS_XHS_COOKIES` | XHS cookie string (when login type = cookie) |

Zhihu and XHS support is limited due to platform restrictions; fuller coverage is planned for a future release.
## 📁 Project Structure

### 🎯 Core System

| Module | Description |
|---|---|
| `langskills_cli.py` | CLI entry point (auto-detects venv) |
| `core/cli.py` | All CLI commands & arg parsing |
| `core/config.py` | Configuration management |
| `core/search.py` | Multi-provider search orchestration |
| `core/domain_config.py` | Domain rules & classification |
| `core/detect_project.py` | Auto-detect project type |
### 🤖 LLM Backends (core/llm/)

| Module | Description |
|---|---|
| `openai_client.py` | OpenAI-compatible client |
| `ollama_client.py` | Ollama local model client |
| `factory.py` | Client factory & routing |
| `base.py` | Base LLM interface |
### 🔌 Source Providers (core/sources/)

| Module | Description |
|---|---|
| `web_search.py` | Tavily web search |
| `github.py` | GitHub repository search |
| `stackoverflow.py` | StackOverflow Q&A |
| `arxiv.py` | arXiv paper fetcher |
| `baidu.py` | Baidu search (Playwright) |
| `zhihu.py` | Zhihu (Playwright) |
| `xhs.py` | XHS / RedNote (Playwright) |
| `journals/` | PMC, PLOS, Nature, eLife |
### 📦 Data & Output

| Directory | Description |
|---|---|
| `skills/by-skill/` | Published skills by domain/topic |
| `skills/by-source/` | Published skills by source |
| `dist/` | Pre-built SQLite bundles + site |
| `captures/` | Per-run capture artifacts |
| `config/` | Master config + schedules |
## 🤝 Contributing

Contributions are welcome! Please follow these steps:

- Open an issue to discuss the proposed change
- Fork the repository and create your feature branch
- Submit a pull request with a clear description

## 📄 License

This project is licensed under the MIT License.

Copyright (c) 2026 Responsible AI (RAI) Lab @ Florida State University
## 🙏 Credits
- Authors: Tianming Sha (Stony Brook University), Dr. Yue Zhao (University of Southern California), Dr. Lichao Sun (Lehigh University), Dr. Yushun Dong (Florida State University)
- Design: Modular pipeline architecture with multi-source intelligence, built for extensibility and offline-first search
- Skills: 101,330 evidence-backed skills generated from 62K+ papers and 23K+ tech sources via LLM-powered quality gates
- Sources: Every skill traces to real web pages, academic papers, or code repositories (arXiv, PMC, PLOS, Nature, eLife, GitHub, etc.)