Turn any topic into an evidence-backed skills library – automatically
# 🚀 LangSkills – Evidence-Backed Skills for Vibe Research & Vibe Coding

📚 101K Skills from 62K+ Papers & 23K+ Tech Sources – Search, Generate, Reuse

Quick Start · Skill Library · Pipeline · Installation · OpenClaw · CLI Reference · Configuration
## 📰 News

- 2026-02-28 – v0.1.0: 101,330 skills across 21 domain bundles officially released
- 2026-02-27 – Pre-built SQLite bundles with FTS5 full-text search ready for download
- 2026-02-27 – Journal pipeline online: PMC, PLOS, Nature, eLife, and arXiv fully covered
## ✨ Key Features

- 📚 Massive Pre-Built Skill Library: 101,330 evidence-backed skills covering 62K+ research papers and 23K+ coding/tech sources – all searchable offline via FTS5-powered SQLite bundles.
- 🔧 Fully Automated Skill Pipeline: Give it a topic and it discovers sources → fetches & extracts text → generates skills with an LLM → validates quality → publishes. One command, zero manual work.
- 🔬 Evidence-First, Never Hallucination-Only: Every skill traces back to real web pages, academic papers, or code repositories with full provenance chains – metadata, quality scores, and source links included.
- 🌐 Multi-Source Intelligence: Integrates Tavily, GitHub, Baidu, Zhihu, XHS, StackOverflow, arXiv, PMC, PLOS, Nature, and eLife – 10+ data source providers for comprehensive coverage.
- 🧠 LLM-Powered Quality Gates: Each skill is generated, validated, and scored by LLMs with configurable quality thresholds – ensuring high-signal, low-noise output at scale.
- ⚡ Drop-In Reusability: Download domain-specific SQLite bundles, `skill-search` any keyword, and get structured Markdown ready to feed into any AI agent, RAG pipeline, or knowledge base.
- 🏗️ Extensible Architecture: Modular source providers, LLM backends (OpenAI / Ollama), queue-based batch processing, and configurable domain rules – built to scale.
- 📦 21 Domain Bundles: From Linux sysadmin to PLOS biology, from web development to machine learning – organized, versioned, and individually installable.
## 🚀 Quick Start

```bash
pip install langskills-rai

# Auto-detect your project and install only matching bundles (~50-200 MB)
langskills-rai bundle-install --auto

# Search the pre-built skill library (Vibe Research)
langskills-rai skill-search "kubernetes networking" --top 5

# Generate new skills from any topic (Vibe Coding)
cp .env.example .env   # fill OPENAI_API_KEY + OPENAI_BASE_URL
langskills-rai capture "Docker networking@15"
```
Full setup details → Installation
## 📚 The Skill Library

62,582 research skills distilled from academic papers + 23,765 coding/tech skills from GitHub, StackOverflow, and the web – all searchable offline.
| Domain | Skills | Sources |
|---|---|---|
| 📄 research-plos-* | 35,505 | PLOS ONE, Biology, CompBio, Medicine, Genetics, NTD, Pathogens |
| 📄 research-arxiv | 3,483 | arXiv papers |
| 📄 research-elife | 391 | eLife journal |
| 📄 research-other | 23,203 | Other academic sources |
| 💻 linux | 7,455 | Linux / sysadmin |
| 💻 web | 6,029 | Web development |
| 💻 programming | 4,071 | General programming |
| 💻 devtools | 2,243 | Developer tools |
| 💻 security | 1,182 | Security |
| 💻 cloud / data / ml / llm / observability | 2,785 | Infra & ML |
| 🗂️ other | 14,983 | Uncategorized |
| Total | 101,330 | 21 SQLite bundles |
## 🔍 How to Use the Library

```bash
# Install a domain bundle (downloads from GitHub Releases)
langskills-rai bundle-install --domain linux

# Or auto-detect your project type and install matching bundles
langskills-rai bundle-install --auto

# Search skills offline (FTS5 full-text search)
langskills-rai skill-search "container orchestration" --top 10

# Filter by domain and minimum quality score
langskills-rai skill-search "CRISPR" --domain research --min-score 4.0

# Get full skill content as Markdown
langskills-rai skill-search "React hooks" --content --format markdown
```
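Under the hood, the bundles are SQLite databases searched via FTS5 virtual tables. The self-contained sketch below shows the kind of query involved; the table name and columns (`skills_fts`, `title`, `body`) are invented for illustration and are not the actual bundle schema.

```python
import sqlite3

# In-memory FTS5 table standing in for a skill bundle (schema is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE skills_fts USING fts5(title, body)")
conn.executemany(
    "INSERT INTO skills_fts VALUES (?, ?)",
    [
        ("Kubernetes networking basics", "Pods, services, and CNI plugins."),
        ("React hooks cheat sheet", "useState and useEffect patterns."),
    ],
)

# MATCH runs tokenized full-text search; bm25() ranks hits (lower is better).
rows = conn.execute(
    "SELECT title FROM skills_fts WHERE skills_fts MATCH ? "
    "ORDER BY bm25(skills_fts) LIMIT 5",
    ("kubernetes",),
).fetchall()
print(rows)  # [('Kubernetes networking basics',)]
```

Because FTS5 ships inside SQLite itself, this kind of search works fully offline, which is what makes the downloadable bundles self-contained.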
## 📦 Skill Package Structure

Each skill is a structured Markdown package with full traceability:

```
skills/by-skill/<domain>/<topic>/
├── skill.md        # The skill content (tutorial / how-to / protocol)
├── metadata.yaml   # Provenance, tags, quality score, LLM model used
└── source.json     # Evidence trail back to original web/paper source
```

Every skill traces to real sources – never hallucination-only.
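Because the layout is a plain `<domain>/<topic>/` directory tree, downstream tooling can enumerate packages with a simple glob. A minimal sketch (the file contents and field names written here are illustrative, not the documented schema):

```python
import json
import tempfile
from pathlib import Path

# Build a throwaway package in the documented layout.
root = Path(tempfile.mkdtemp()) / "skills" / "by-skill"
pkg = root / "linux" / "journalctl"
pkg.mkdir(parents=True)
(pkg / "skill.md").write_text("# journalctl basics\n")
(pkg / "metadata.yaml").write_text("quality_score: 4.5\n")
(pkg / "source.json").write_text(json.dumps({"url": "https://example.com"}))

# Enumerate every published skill as (domain, topic) pairs.
found = sorted(
    (p.parent.parent.name, p.parent.name)
    for p in root.glob("*/*/skill.md")
)
print(found)  # [('linux', 'journalctl')]
```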
## 🔧 The Pipeline

### 📝 Step-by-Step Usage
1. Explore sources (optional)

```bash
langskills-rai search tavily "Linux journalctl" --limit 20
langskills-rai search github "journalctl" --limit 10
```

2. Capture skills from a topic

```bash
# Basic
langskills-rai capture "journalctl@15"

# Target a specific domain
langskills-rai capture "React hooks@20" --domain web

# All domains
langskills-rai capture "Kubernetes" --all --total 30
```

`@N` is shorthand for `--total N`. The pipeline auto-runs: search → fetch → generate → dedupe → improve → validate.
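The `@N` shorthand could be parsed roughly like this; a hypothetical sketch, and the real CLI's parsing (including its default count) may differ:

```python
# Split "topic@N" into (topic, count); fall back to a default when "@N"
# is absent or malformed. rpartition keeps any earlier "@" in the topic.
def parse_topic(arg: str, default_total: int = 10) -> tuple[str, int]:
    topic, sep, count = arg.rpartition("@")
    if sep and count.isdigit():
        return topic, int(count)
    return arg, default_total

print(parse_topic("journalctl@15"))  # ('journalctl', 15)
print(parse_topic("Kubernetes"))     # ('Kubernetes', 10)
```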
3. Validate & publish

```bash
langskills-rai validate --strict --package
langskills-rai reindex-skills --root skills/by-skill
```

4. Build bundles & site

```bash
langskills-rai build-site
langskills-rai build-bundle --split-by-domain
```

5. Batch processing (large-scale)

```bash
langskills-rai queue-seed                       # seed from config
langskills-rai topics-capture topics/arxiv.txt  # or from file
langskills-rai runner                           # start worker
langskills-rai queue-watch                      # monitor
```
## 📁 Pipeline Output

```
captures/<run-id>/
├── manifest.json       # Run metadata
├── sources/            # Fetched evidence per source
├── skills/             # Generated skill packages
│   └── <domain>/<topic>/
│       └── skill.md
└── quality_report.md   # Validation summary
```
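Since each run carries a `manifest.json`, a post-run script can summarize captures without touching the skill files. A sketch under assumed field names (`run_id`, `topic`, `skills_generated` are not the documented manifest schema):

```python
import json
import tempfile
from pathlib import Path

# Fabricate a capture directory with a sample manifest (fields are assumptions).
run_dir = Path(tempfile.mkdtemp()) / "captures" / "run-001"
run_dir.mkdir(parents=True)
(run_dir / "manifest.json").write_text(
    json.dumps({"run_id": "run-001", "topic": "journalctl", "skills_generated": 15})
)

# Read the manifest back and format a one-line run summary.
manifest = json.loads((run_dir / "manifest.json").read_text())
summary = (
    f"{manifest['run_id']}: {manifest['skills_generated']} "
    f"skills for '{manifest['topic']}'"
)
print(summary)  # run-001: 15 skills for 'journalctl'
```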
## 📦 Installation

LangSkills supports Linux, macOS, and Windows. Python 3.10+ is required.

### Option A: pip install (recommended)

```bash
pip install langskills-rai

# Download skill bundles (auto-detect your project type)
langskills-rai bundle-install --auto

# Or install a specific domain
langskills-rai bundle-install --domain linux

# Verify
langskills-rai self-check --skip-remote
```

### Option B: From source (for development / skill generation)

🐧 Linux / 🍎 macOS

```bash
git clone https://github.com/LabRAI/LangSkills.git && cd LangSkills
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
playwright install chromium   # optional: Baidu/Zhihu/XHS sources
cp .env.example .env          # fill OPENAI_API_KEY + OPENAI_BASE_URL
langskills-rai self-check --skip-remote
```

💻 Windows

```bash
git clone https://github.com/LabRAI/LangSkills.git && cd LangSkills
python -m venv .venv && .venv\Scripts\activate
pip install -e ".[dev]"
copy .env.example .env        # fill OPENAI_API_KEY + OPENAI_BASE_URL
langskills-rai self-check --skip-remote
```
### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI-compatible API key for skill generation |
| `OPENAI_BASE_URL` | Yes | API base URL (e.g., https://api.openai.com/v1) |
| `OPENAI_MODEL` | No | Model name (default: gpt-4.1-mini) |
| `LLM_PROVIDER` | No | `openai` (default) or `ollama` |
| `GITHUB_TOKEN` | No | Recommended for GitHub search (avoids rate limits) |
| `TAVILY_API_KEY` | No | Required for Tavily web search |
| `LANGSKILLS_WORKDIR` | No | Runtime data directory (default: var/) |

More variables → Configuration
## 🤖 AI CLI One-Liner – Auto Setup

Copy the prompt below and paste it into Claude Code / Codex / Cursor / Windsurf – the AI agent will automatically install, configure, and verify LangSkills for you.

```
Do the following steps in order. Do NOT skip any step.

1. Install langskills-rai from PyPI:
   pip install langskills-rai

2. Auto-detect my project and install matching skill bundles:
   langskills-rai bundle-install --auto

3. Run the self-check to verify everything is working:
   langskills-rai self-check --skip-remote

4. If self-check passes, run a quick smoke test - search the built-in library:
   langskills-rai skill-search "machine learning" --top 3

5. If I want to generate NEW skills (not just search), ask me for my
   OPENAI_API_KEY and OPENAI_BASE_URL, then set them as environment variables.

Done. Report the results of steps 3 and 4.
```
## 🦞 OpenClaw Integration

LangSkills is available as an OpenClaw skill – giving any OpenClaw-powered agent access to 101K+ evidence-backed skills.

Install from Claw Hub (coming soon):

```bash
clawhub install langskills-search
```

Manual install – save the block below as `~/.openclaw/skills/langskills-search/SKILL.md`:
---
name: langskills-search
version: 0.1.0
description: Search 101K evidence-backed skills from 62K+ papers & 23K+ tech sources
author: LabRAI
tags: [research, skills, knowledge-base, search, evidence]
requires:
  bins: ["python3"]
metadata: {"source": "https://github.com/LabRAI/LangSkills", "license": "MIT", "min_python": "3.10"}
---
# LangSkills Search
Search 101,330 evidence-backed skills covering 62K+ research papers and 23K+ coding/tech sources – all offline via FTS5 SQLite.
## When to Use
- User asks for best practices, how-tos, or techniques on a technical topic
- You need evidence-backed knowledge (not LLM-generated guesses)
- Research tasks that benefit from academic or real-world source citations
## First-Time Setup
```bash
pip install langskills-rai
# Install all bundles (~1 GB) or pick a domain:
langskills-rai bundle-install --auto
```
## Search Command
```bash
langskills-rai skill-search "<query>" [options]
```
### Parameters
| Flag | Description | Default |
|:---|:---|:---|
| `--top N` | Number of results | 5 |
| `--domain <d>` | Filter by domain | all |
| `--min-score N` | Minimum quality score (0-5) | 0 |
| `--content` | Include full skill body | off |
| `--format markdown` | Output as Markdown | text |
### Example
```bash
langskills-rai skill-search "CRISPR gene editing" --domain research --top 3 --content --format markdown
```
## Reading Results
Each result includes: **title**, **domain**, **quality score** (0-5), **source URL**, and optionally the full skill body. Higher scores indicate stronger evidence chains.
## Available Domains
`linux` · `web` · `programming` · `devtools` · `security` · `cloud` · `data` · `ml` · `llm` · `observability` · `research-arxiv` · `research-plos-*` · `research-elife` · `research-other`
## Tips
- Use `--content --format markdown` to get copy-paste-ready skill text
- Combine `--domain` with `--min-score 4.0` for high-quality results
- Run `bundle-install --auto` in a project directory to install only relevant domains
## 🖥️ CLI Reference

All commands:

```bash
langskills-rai <command>   # or, from source: python3 langskills_cli.py <command>
```
### ⚡ Core Commands

| Command | What It Does |
|---|---|
| `capture "<topic>@N"` | Full pipeline: discover → fetch → generate → validate N skills |
| `skill-search "<query>"` | Search the local skill library (FTS5 full-text) |
| `search <engine> "<query>"` | Search URLs via a specific provider (tavily / github / baidu) |
| `validate --strict --package` | Run quality gates on generated skills |
| `improve <run-dir>` | Re-improve an existing capture run in place |
### 🔄 Batch Pipelines

| Command | What It Does |
|---|---|
| `runner` | Resumable background worker: queue → generate → publish |
| `arxiv-pipeline` | arXiv papers: discover → download PDF → generate skills |
| `journal-pipeline` | Journals: crawl PMC / PLOS / Nature / eLife → generate |
| `topics-capture <file>` | Enqueue topics from a text file into the persistent queue |
| `queue-seed` | Auto-seed the queue from config-defined topic lists |
### 📚 Library Management

| Command | What It Does |
|---|---|
| `bundle-install --domain <d>` | Download a pre-built SQLite bundle from GitHub Releases |
| `bundle-install --auto` | Auto-detect project type and install matching bundles |
| `build-bundle --split-by-domain` | Build self-contained SQLite bundles from skills/ |
| `build-site` | Generate dist/index.json + dist/index.html |
| `reindex-skills` | Rebuild skills/index.json from the by-skill directory |
### 🔧 More: Utilities & Diagnostics

| Command | What It Does |
|---|---|
| `self-check --skip-remote` | Local environment sanity check |
| `auth zhihu\|xhs` | Interactive Playwright login helper |
| `sources-audit` | Audit source providers (speed, auth, failures) |
| `auto-pr` | Create a commit/branch and optionally push + open a PR |
| `queue-stats` | Show queue counts by stage / status / source |
| `queue-watch` | Live queue stats dashboard (rich) |
| `queue-gc` | Reclaim expired leases |
| `repo-index` | Traverse + statically index a repo into captures |
| `repo-query "<query>"` | Evidence-backed search over the symbol index |
| `backfill-package-v2` | Generate missing package v2 files |
| `backfill-verification` | Ensure Verification sections include fenced code |
| `backfill-sources` | Backfill sources/by-id from existing artifacts |
## ⚙️ Configuration

Master config: `config/langskills.json` – domains, URL rules, quality gates, license policy.
### 🤖 LLM & API Keys

| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI-compatible API key for skill generation |
| `OPENAI_BASE_URL` | Yes | API base URL (e.g., https://api.openai.com/v1) |
| `OPENAI_MODEL` | No | Model name (default: gpt-4.1-mini) |
| `LLM_PROVIDER` | No | `openai` (default) or `ollama` |
| `OLLAMA_BASE_URL` | No | Ollama server URL |
| `OLLAMA_MODEL` | No | Ollama model name |
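The `LLM_PROVIDER` switch suggests a small factory routing between the two backends, in the spirit of the `core/llm/factory.py` module listed under Project Structure. The sketch below is hypothetical: the class and function names are illustrative, not the real API.

```python
# Hypothetical provider routing keyed on LLM_PROVIDER; defaults mirror the
# values documented in the tables above (gpt-4.1-mini, openai provider).
class OpenAIClient:
    def __init__(self, base_url: str, model: str):
        self.base_url, self.model = base_url, model

class OllamaClient:
    def __init__(self, base_url: str, model: str):
        self.base_url, self.model = base_url, model

def make_client(env: dict[str, str]):
    provider = env.get("LLM_PROVIDER", "openai")
    if provider == "ollama":
        return OllamaClient(
            env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
            env.get("OLLAMA_MODEL", "llama3"),  # placeholder default
        )
    return OpenAIClient(
        env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        env.get("OPENAI_MODEL", "gpt-4.1-mini"),
    )

client = make_client({"LLM_PROVIDER": "openai"})
print(type(client).__name__, client.model)  # OpenAIClient gpt-4.1-mini
```

Passing the environment as a plain dict keeps the routing testable without mutating `os.environ`.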
### 🔍 Search & Data Sources

| Variable | Required | Description |
|---|---|---|
| `TAVILY_API_KEY` | No | Required for Tavily web search |
| `GITHUB_TOKEN` | No | Recommended for GitHub search (avoids rate limits) |
| `LANGSKILLS_WEB_SEARCH_PROVIDERS` | No | Comma-separated list (default: tavily,baidu,zhihu,xhs) |
### 🎭 Playwright & Auth (optional)

| Variable | Description |
|---|---|
| `LANGSKILLS_PLAYWRIGHT_HEADLESS` | 0 (visible browser) or 1 (headless, default) |
| `LANGSKILLS_PLAYWRIGHT_USER_DATA_DIR` | Custom Chromium user data directory |
| `LANGSKILLS_PLAYWRIGHT_AUTH_DIR` | Auth state dir (default: var/runs/playwright_auth) |
| `LANGSKILLS_ZHIHU_LOGIN_TYPE` | `qrcode` or `cookie` |
| `LANGSKILLS_ZHIHU_COOKIES` | Zhihu cookie string (when login type = cookie) |
| `LANGSKILLS_XHS_LOGIN_TYPE` | `qrcode`, `cookie`, or `phone` |
| `LANGSKILLS_XHS_COOKIES` | XHS cookie string (when login type = cookie) |

Zhihu and XHS support is limited due to platform restrictions; fuller coverage is planned for a future release.
## 📁 Project Structure

### 🎯 Core System

| Module | Description |
|---|---|
| `langskills_cli.py` | CLI entry point (auto-detects venv) |
| `core/cli.py` | All CLI commands & arg parsing |
| `core/config.py` | Configuration management |
| `core/search.py` | Multi-provider search orchestration |
| `core/domain_config.py` | Domain rules & classification |
| `core/detect_project.py` | Auto-detect project type |
### 🤖 LLM Backends (core/llm/)

| Module | Description |
|---|---|
| `openai_client.py` | OpenAI-compatible client |
| `ollama_client.py` | Ollama local model client |
| `factory.py` | Client factory & routing |
| `base.py` | Base LLM interface |
### 🔌 Source Providers (core/sources/)

| Module | Description |
|---|---|
| `web_search.py` | Tavily web search |
| `github.py` | GitHub repository search |
| `stackoverflow.py` | StackOverflow Q&A |
| `arxiv.py` | arXiv paper fetcher |
| `baidu.py` | Baidu search (Playwright) |
| `zhihu.py` | Zhihu (Playwright) |
| `xhs.py` | XHS / RedNote (Playwright) |
| `journals/` | PMC, PLOS, Nature, eLife |
### 📦 Data & Output

| Directory | Description |
|---|---|
| `skills/by-skill/` | Published skills by domain/topic |
| `skills/by-source/` | Published skills by source |
| `dist/` | Pre-built SQLite bundles + site |
| `captures/` | Per-run capture artifacts |
| `config/` | Master config + schedules |
## 🤝 Contributing

Contributions are welcome! Please follow these steps:

- Open an issue to discuss the proposed change
- Fork the repository and create your feature branch
- Submit a pull request with a clear description

## 📄 License

This project is licensed under the MIT License.

Copyright (c) 2026 Responsible AI (RAI) Lab @ Florida State University
## 🙏 Credits
- Authors: Tianming Sha (Stony Brook University), Dr. Yue Zhao (University of Southern California), Dr. Lichao Sun (Lehigh University), Dr. Yushun Dong (Florida State University)
- Design: Modular pipeline architecture with multi-source intelligence, built for extensibility and offline-first search
- Skills: 101,330 evidence-backed skills generated from 62K+ papers and 23K+ tech sources via LLM-powered quality gates
- Sources: Every skill traces to real web pages, academic papers, or code repositories (arXiv, PMC, PLOS, Nature, eLife, GitHub, etc.)