
Turn any topic into an evidence-backed skills library, automatically


LangSkills: Evidence-Backed Skills for Vibe Research & Vibe Coding

PyPI Downloads Python 3.10+ License: MIT GitHub stars Skills: 101k+ Bundles: 21 Papers: 62k+

๐ŸŒ LangSkills โ€” Evidence-Backed Skills for AI Agents

📄 101K Skills from 62K+ Papers & 23K+ Tech Sources: Search, Generate, Reuse

Quick Start · Skill Library · Pipeline · Installation · OpenClaw · CLI Reference · Configuration


📰 News

  • 2026-02-28: v0.1.0 released, with 101,330 skills across 21 domain bundles
  • 2026-02-27: Pre-built SQLite bundles with FTS5 full-text search ready for download
  • 2026-02-27: Journal pipeline online, with full coverage of PMC, PLOS, Nature, eLife, and arXiv

✨ Key Features

  • 📚 Massive Pre-Built Skill Library: 101,330 evidence-backed skills covering 62K+ research papers and 23K+ coding/tech sources, all searchable offline via FTS5-powered SQLite bundles.

  • 🔧 Fully Automated Skill Pipeline: Give it a topic → it discovers sources → fetches & extracts text → generates skills with an LLM → validates quality → publishes. One command, zero manual work.

  • 🔬 Evidence-First, Never Hallucination-Only: Every skill traces back to real web pages, academic papers, or code repositories with full provenance chains, including metadata, quality scores, and source links.

  • 🌐 Multi-Source Intelligence: Integrates Tavily, GitHub, Baidu, Zhihu, XHS, StackOverflow, arXiv, PMC, PLOS, Nature, and eLife: 10+ data source providers for comprehensive coverage.

  • 🧠 LLM-Powered Quality Gates: Each skill is generated, validated, and scored by LLMs with configurable quality thresholds, ensuring high-signal, low-noise output at scale.

  • ⚡ Drop-In Reusability: Download domain-specific SQLite bundles, skill-search any keyword, and get structured Markdown ready to feed into any AI agent, RAG pipeline, or knowledge base.

  • 🏗️ Extensible Architecture: Modular source providers, LLM backends (OpenAI / Ollama), queue-based batch processing, and configurable domain rules, built to scale.

  • 📦 21 Domain Bundles: From Linux sysadmin to PLOS biology, from web development to machine learning, each organized, versioned, and individually installable.


🚀 Quick Start

pip install langskills-rai

# Auto-detect your project and install only matching bundles (~50-200 MB)
langskills-rai bundle-install --auto

# Search the pre-built skill library (Vibe Research)
langskills-rai skill-search "kubernetes networking" --top 5

# Generate new skills from any topic (Vibe Coding)
cp .env.example .env   # fill OPENAI_API_KEY + OPENAI_BASE_URL
langskills-rai capture "Docker networking@15"

Full setup details → Installation


📄 The Skill Library

62,582 research skills distilled from academic papers plus 23,765 coding/tech skills from GitHub, StackOverflow, and the web, all searchable offline.

| Domain | Skills | Sources |
|:---|---:|:---|
| 📄 research-plos-* | 35,505 | PLOS ONE, Biology, CompBio, Medicine, Genetics, NTD, Pathogens |
| 📄 research-arxiv | 3,483 | arXiv papers |
| 📄 research-elife | 391 | eLife journal |
| 📄 research-other | 23,203 | Other academic sources |
| 💻 linux | 7,455 | Linux / sysadmin |
| 💻 web | 6,029 | Web development |
| 💻 programming | 4,071 | General programming |
| 💻 devtools | 2,243 | Developer tools |
| 💻 security | 1,182 | Security |
| 💻 cloud / data / ml / llm / observability | 2,785 | Infra & ML |
| 🗂️ other | 14,983 | Uncategorized |
| **Total** | **101,330** | **21 SQLite bundles** |
๐Ÿ” How to Use the Library
# Install a domain bundle (downloads from GitHub Releases)
langskills-rai bundle-install --domain linux

# Or auto-detect your project type and install matching bundles
langskills-rai bundle-install --auto

# Search skills offline (FTS5 full-text search)
langskills-rai skill-search "container orchestration" --top 10

# Filter by domain and minimum quality score
langskills-rai skill-search "CRISPR" --domain research --min-score 4.0

# Get full skill content as Markdown
langskills-rai skill-search "React hooks" --content --format markdown
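
Under the hood, the bundles are plain SQLite files with an FTS5 index, which is why `skill-search` works offline. The snippet below is a minimal, self-contained sketch of that mechanism; the `skills` table and its columns here are illustrative assumptions, not the real bundle schema.

```python
import sqlite3

# Build a toy FTS5 index in memory; real bundles ship pre-built.
# Table and column names are illustrative, not the actual bundle schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE skills USING fts5(title, body)")
conn.executemany(
    "INSERT INTO skills VALUES (?, ?)",
    [
        ("Kubernetes networking basics", "Pods, Services, and CNI plugins"),
        ("journalctl essentials", "Querying the systemd journal"),
    ],
)

# Ranked full-text match, conceptually what `skill-search "networking"` does.
rows = conn.execute(
    "SELECT title FROM skills WHERE skills MATCH ? ORDER BY rank",
    ("networking",),
).fetchall()
print(rows)  # [('Kubernetes networking basics',)]
```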
📦 Skill Package Structure

Each skill is a structured Markdown package with full traceability:

skills/by-skill/<domain>/<topic>/
├── skill.md          # The skill content (tutorial / how-to / protocol)
├── metadata.yaml     # Provenance, tags, quality score, LLM model used
└── source.json       # Evidence trail back to original web/paper source

Every skill traces to real sources, never hallucination-only.
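
A consumer can read such a package with nothing but the standard library. This is a sketch under two stated assumptions: that `metadata.yaml` is flat enough (key: value lines) to parse without PyYAML, and that `source.json` holds a JSON list; the field names are not the documented schema.

```python
import json
from pathlib import Path


def load_skill(pkg_dir: str) -> dict:
    """Load one skills/by-skill/<domain>/<topic>/ package (layout from the docs,
    field handling assumed)."""
    root = Path(pkg_dir)
    # Naive flat "key: value" parse; use PyYAML for real-world metadata files.
    meta = {}
    for line in (root / "metadata.yaml").read_text().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return {
        "content": (root / "skill.md").read_text(),
        "metadata": meta,
        "sources": json.loads((root / "source.json").read_text()),
    }
```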


🔧 The Pipeline

📋 Step-by-Step Usage

1. Explore sources (optional)

langskills-rai search tavily "Linux journalctl" --limit 20
langskills-rai search github "journalctl" --limit 10

2. Capture skills from a topic

# Basic
langskills-rai capture "journalctl@15"

# Target a specific domain
langskills-rai capture "React hooks@20" --domain web

# All domains
langskills-rai capture "Kubernetes" --all --total 30

@N is shorthand for --total N. The pipeline auto-runs: search → fetch → generate → dedupe → improve → validate.
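
The `<topic>@N` shorthand can be pictured as a split on the last `@`. This is an illustrative sketch, not the project's actual parser, and the default of 10 is an assumption:

```python
def parse_topic_spec(spec: str, default_total: int = 10) -> tuple[str, int]:
    """Split 'Docker networking@15' into ('Docker networking', 15)."""
    topic, sep, count = spec.rpartition("@")
    if sep and count.isdigit():
        return topic, int(count)
    return spec, default_total  # no @N suffix: fall back to the default


print(parse_topic_spec("journalctl@15"))  # ('journalctl', 15)
print(parse_topic_spec("Kubernetes"))     # ('Kubernetes', 10)
```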

3. Validate & publish

langskills-rai validate --strict --package
langskills-rai reindex-skills --root skills/by-skill

4. Build bundles & site

langskills-rai build-site
langskills-rai build-bundle --split-by-domain

5. Batch processing (large-scale)

langskills-rai queue-seed                     # seed from config
langskills-rai topics-capture topics/arxiv.txt # or from file
langskills-rai runner                          # start worker
langskills-rai queue-watch                     # monitor
📂 Pipeline Output
captures/<run-id>/
├── manifest.json          # Run metadata
├── sources/               # Fetched evidence per source
├── skills/                # Generated skill packages
│   └── <domain>/<topic>/
│       └── skill.md
└── quality_report.md      # Validation summary
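
A run directory with this layout can be inspected with a few lines of Python. Only the file layout above comes from the docs; the `topic` manifest field used below is hypothetical:

```python
import json
from pathlib import Path


def summarize_run(run_dir: str) -> dict:
    """Count generated skill.md files in one captures/<run-id>/ directory."""
    root = Path(run_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    skill_files = sorted((root / "skills").rglob("skill.md"))
    return {
        "topic": manifest.get("topic"),  # hypothetical manifest field
        "skills_generated": len(skill_files),
        "has_quality_report": (root / "quality_report.md").exists(),
    }
```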

📦 Installation

LangSkills supports Linux, macOS, and Windows. Python 3.10+ required.

Option A: pip install (recommended)

pip install langskills-rai

# Download skill bundles (auto-detect your project type)
langskills-rai bundle-install --auto

# Or install a specific domain
langskills-rai bundle-install --domain linux

# Verify
langskills-rai self-check --skip-remote

Option B: From source (for development / skill generation)

๐Ÿง Linux / ๐ŸŽ macOS
git clone https://github.com/LabRAI/LangSkills.git && cd LangSkills
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
playwright install chromium          # optional: Baidu/Zhihu/XHS sources
cp .env.example .env                 # fill OPENAI_API_KEY + OPENAI_BASE_URL
langskills-rai self-check --skip-remote
💻 Windows
git clone https://github.com/LabRAI/LangSkills.git && cd LangSkills
python -m venv .venv && .venv\Scripts\activate
pip install -e ".[dev]"
copy .env.example .env               # fill OPENAI_API_KEY + OPENAI_BASE_URL
langskills-rai self-check --skip-remote
Environment Variables

| Variable | Required | Description |
|:---|:---|:---|
| `OPENAI_API_KEY` | Yes | OpenAI-compatible API key for skill generation |
| `OPENAI_BASE_URL` | Yes | API base URL (e.g., https://api.openai.com/v1) |
| `OPENAI_MODEL` | No | Model name (default: gpt-4.1-mini) |
| `LLM_PROVIDER` | No | openai (default) or ollama |
| `GITHUB_TOKEN` | No | Recommended for GitHub search (avoids rate limits) |
| `TAVILY_API_KEY` | No | Required for Tavily web search |
| `LANGSKILLS_WORKDIR` | No | Runtime data directory (default: var/) |

More variables → Configuration
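
Putting the two required variables together, a minimal `.env` might look like the fragment below (placeholder values; the optional lines can be dropped):

```
OPENAI_API_KEY=sk-...your-key...
OPENAI_BASE_URL=https://api.openai.com/v1
# optional overrides
OPENAI_MODEL=gpt-4.1-mini
GITHUB_TOKEN=ghp_...optional...
```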


🤖 AI CLI One-Liner: Auto Setup

Copy the prompt below and paste it into Claude Code / Codex / Cursor / Windsurf; the AI agent will automatically install, configure, and verify LangSkills for you.

Do the following steps in order. Do NOT skip any step.

1. Install langskills-rai from PyPI:
   pip install langskills-rai

2. Auto-detect my project and install matching skill bundles:
   langskills-rai bundle-install --auto

3. Run the self-check to verify everything is working:
   langskills-rai self-check --skip-remote

4. If self-check passes, run a quick smoke test by searching the built-in library:
   langskills-rai skill-search "machine learning" --top 3

5. If I want to generate NEW skills (not just search), ask me for my
   OPENAI_API_KEY and OPENAI_BASE_URL, then set them as environment variables.

Done. Report the results of steps 3 and 4.

🦞 OpenClaw Integration

LangSkills is available as an OpenClaw skill, giving any OpenClaw-powered agent access to 101K+ evidence-backed skills.

Install from Claw Hub (coming soon):

clawhub install langskills-search

Manual install: save the block below as ~/.openclaw/skills/langskills-search/SKILL.md:

---
name: langskills-search
version: 0.1.0
description: Search 101K evidence-backed skills from 62K+ papers & 23K+ tech sources
author: LabRAI
tags: [research, skills, knowledge-base, search, evidence]
requires:
  bins: ["python3"]
metadata: {"source": "https://github.com/LabRAI/LangSkills", "license": "MIT", "min_python": "3.10"}
---

# LangSkills Search

Search 101,330 evidence-backed skills covering 62K+ research papers and 23K+ coding/tech sources, all offline via FTS5 SQLite.

## When to Use

- User asks for best practices, how-tos, or techniques on a technical topic
- You need evidence-backed knowledge (not LLM-generated guesses)
- Research tasks that benefit from academic or real-world source citations

## First-Time Setup

```bash
pip install langskills-rai
# Install all bundles (~1 GB) or pick a domain:
langskills-rai bundle-install --auto
```

## Search Command

```bash
langskills-rai skill-search "<query>" [options]
```

### Parameters

| Flag | Description | Default |
|:---|:---|:---|
| `--top N` | Number of results | 5 |
| `--domain <d>` | Filter by domain | all |
| `--min-score N` | Minimum quality score (0-5) | 0 |
| `--content` | Include full skill body | off |
| `--format markdown` | Output as Markdown | text |

### Example

```bash
langskills-rai skill-search "CRISPR gene editing" --domain research --top 3 --content --format markdown
```

## Reading Results

Each result includes: **title**, **domain**, **quality score** (0-5), **source URL**, and optionally the full skill body. Higher scores indicate stronger evidence chains.

## Available Domains

`linux` · `web` · `programming` · `devtools` · `security` · `cloud` · `data` · `ml` · `llm` · `observability` · `research-arxiv` · `research-plos-*` · `research-elife` · `research-other`

## Tips

- Use `--content --format markdown` to get copy-paste-ready skill text
- Combine `--domain` with `--min-score 4.0` for high-quality results
- Run `bundle-install --auto` in a project directory to install only relevant domains

๐Ÿ–ฅ๏ธ CLI Reference

All commands: langskills-rai <command> (or python3 langskills_cli.py <command> from source)

⚡ Core Commands

| Command | What It Does |
|:---|:---|
| `capture "<topic>@N"` | Full pipeline: discover → fetch → generate → validate N skills |
| `skill-search "<query>"` | Search the local skill library (FTS5 full-text) |
| `search <engine> "<query>"` | Search URLs via a specific provider (tavily / github / baidu) |
| `validate --strict --package` | Run quality gates on generated skills |
| `improve <run-dir>` | Re-improve an existing capture run in place |

🔄 Batch Pipelines

| Command | What It Does |
|:---|:---|
| `runner` | Resumable background worker: queue → generate → publish |
| `arxiv-pipeline` | arXiv papers: discover → download PDF → generate skills |
| `journal-pipeline` | Journals: crawl PMC / PLOS / Nature / eLife → generate |
| `topics-capture <file>` | Enqueue topics from a text file into the persistent queue |
| `queue-seed` | Auto-seed the queue from config-defined topic lists |

📚 Library Management

| Command | What It Does |
|:---|:---|
| `bundle-install --domain <d>` | Download a pre-built SQLite bundle from GitHub Releases |
| `bundle-install --auto` | Auto-detect project type and install matching bundles |
| `build-bundle --split-by-domain` | Build self-contained SQLite bundles from skills/ |
| `build-site` | Generate dist/index.json + dist/index.html |
| `reindex-skills` | Rebuild skills/index.json from the by-skill directory |

🔧 More: Utilities & Diagnostics

| Command | What It Does |
|:---|:---|
| `self-check --skip-remote` | Local environment sanity check |
| `auth zhihu\|xhs` | Interactive Playwright login helper |
| `sources-audit` | Audit source providers (speed, auth, failures) |
| `auto-pr` | Create a commit/branch and optionally push + open a PR |
| `queue-stats` | Show queue counts by stage / status / source |
| `queue-watch` | Live queue stats dashboard (rich) |
| `queue-gc` | Reclaim expired leases |
| `repo-index` | Traverse + statically index a repo into captures |
| `repo-query "<query>"` | Evidence-backed search over the symbol index |
| `backfill-package-v2` | Generate missing package v2 files |
| `backfill-verification` | Ensure Verification sections include fenced code |
| `backfill-sources` | Backfill sources/by-id from existing artifacts |

โš™๏ธ Configuration

Master config: config/langskills.json โ€” domains, URL rules, quality gates, license policy.

🤖 LLM & API Keys

| Variable | Required | Description |
|:---|:---|:---|
| `OPENAI_API_KEY` | Yes | OpenAI-compatible API key for skill generation |
| `OPENAI_BASE_URL` | Yes | API base URL (e.g., https://api.openai.com/v1) |
| `OPENAI_MODEL` | No | Model name (default: gpt-4.1-mini) |
| `LLM_PROVIDER` | No | openai (default) or ollama |
| `OLLAMA_BASE_URL` | No | Ollama server URL |
| `OLLAMA_MODEL` | No | Ollama model name |

🔐 Search & Data Sources

| Variable | Required | Description |
|:---|:---|:---|
| `TAVILY_API_KEY` | No | Required for Tavily web search |
| `GITHUB_TOKEN` | No | Recommended for GitHub search (avoids rate limits) |
| `LANGSKILLS_WEB_SEARCH_PROVIDERS` | No | Comma-separated list (default: tavily,baidu,zhihu,xhs) |

🎭 Playwright & Auth (optional)

| Variable | Description |
|:---|:---|
| `LANGSKILLS_PLAYWRIGHT_HEADLESS` | 0 (visible browser) or 1 (headless, default) |
| `LANGSKILLS_PLAYWRIGHT_USER_DATA_DIR` | Custom Chromium user data directory |
| `LANGSKILLS_PLAYWRIGHT_AUTH_DIR` | Auth state dir (default: var/runs/playwright_auth) |
| `LANGSKILLS_ZHIHU_LOGIN_TYPE` | qrcode or cookie |
| `LANGSKILLS_ZHIHU_COOKIES` | Zhihu cookie string (when login type = cookie) |
| `LANGSKILLS_XHS_LOGIN_TYPE` | qrcode, cookie, or phone |
| `LANGSKILLS_XHS_COOKIES` | XHS cookie string (when login type = cookie) |

Zhihu and XHS support is currently limited due to platform restrictions; full coverage is planned for a future release.


๐Ÿ“ Project Structure

๐ŸŽฏ Core System
Module Description
langskills_cli.py CLI entry point (auto-detects venv)
core/cli.py All CLI commands & arg parsing
core/config.py Configuration management
core/search.py Multi-provider search orchestration
core/domain_config.py Domain rules & classification
core/detect_project.py Auto-detect project type
๐Ÿค– LLM Backends (core/llm/)
Module Description
openai_client.py OpenAI-compatible client
ollama_client.py Ollama local model client
factory.py Client factory & routing
base.py Base LLM interface
๐ŸŒ Source Providers (core/sources/)
Module Description
web_search.py Tavily web search
github.py GitHub repository search
stackoverflow.py StackOverflow Q&A
arxiv.py arXiv paper fetcher
baidu.py Baidu search (Playwright)
zhihu.py Zhihu (Playwright)
xhs.py XHS / RedNote (Playwright)
journals/ PMC, PLOS, Nature, eLife
๐Ÿ“ฆ Data & Output
Directory Description
skills/by-skill/ Published skills by domain/topic
skills/by-source/ Published skills by source
dist/ Pre-built SQLite bundles + site
captures/ Per-run capture artifacts
config/ Master config + schedules

๐Ÿค Contributing

Contributions are welcome! Please follow these steps:

  1. Open an issue to discuss the proposed change
  2. Fork the repository and create your feature branch
  3. Submit a pull request with a clear description

📄 License

This project is licensed under the MIT License.

Copyright (c) 2026 Responsible AI (RAI) Lab @ Florida State University


๐Ÿ™ Credits

  • Authors: Tianming Sha (Stony Brook University), Dr. Yue Zhao (University of Southern California), Dr. Lichao Sun (Lehigh University), Dr. Yushun Dong (Florida State University)
  • Design: Modular pipeline architecture with multi-source intelligence, built for extensibility and offline-first search
  • Skills: 101,330 evidence-backed skills generated from 62K+ papers and 23K+ tech sources via LLM-powered quality gates
  • Sources: Every skill traces to real web pages, academic papers, or code repositories (arXiv, PMC, PLOS, Nature, eLife, GitHub, etc.)

