PinRAG

A powerful RAG (Retrieval-Augmented Generation) system built with LangChain, designed as an MCP (Model Context Protocol) server for Cursor, VS Code (GitHub Copilot), and other AI assistants.

Overview

PinRAG provides intelligent document querying and retrieval capabilities for PDFs, YouTube transcripts, Discord exports, and GitHub repositories. Index documents, ask questions, and get answers with source citations—all via MCP tools in your editor.

Features

Multi-format indexing — PDF (.pdf), YouTube (URL or video ID), Discord export (.txt), plain text (.txt), GitHub repo (URL)
RAG with citations — Ask questions, get answers with source (document + page for PDFs, timestamp for YouTube)
Document tags — Tag documents at index time (e.g. AMIGA, PI_PICO) for filtered search
Metadata filtering — Query by document, page range (PDF only), or tag
MCP tools — add_document_tool, query_tool, list_documents_tool, remove_document_tool
MCP resource — Read-only list of indexed documents (pinrag://documents); click in Cursor’s MCP panel to view
MCP prompt — ask_about_documents (parameter: question) for guided RAG queries
Configurable LLM — Anthropic (default) or OpenAI; set via PINRAG_LLM_PROVIDER and model in .env
Configurable embeddings — OpenAI (default) or Cohere; set via PINRAG_EMBEDDING_PROVIDER. Use the same provider for indexing and querying (e.g. re-index after switching).
Built with — LangChain, Chroma; optional OpenAI, Anthropic, Cohere

Installation

pipx install pinrag
# or: uv tool install pinrag

Requires Python 3.12+. Both pipx and uv tool install create an isolated environment and put pinrag-mcp on your PATH.

Updating

pipx upgrade pinrag
# or: uv cache clean && uv tool install pinrag --force

Restart your editor after updating so the MCP server picks up the new version.

Quick Start

1. Create config

mkdir -p ~/.pinrag
# Minimum required (defaults: Anthropic for LLM, OpenAI for embeddings)
echo "OPENAI_API_KEY=sk-..." > ~/.pinrag/.env
echo "ANTHROPIC_API_KEY=sk-ant-..." >> ~/.pinrag/.env
# Optional: Cohere for re-ranking (COHERE_API_KEY + PINRAG_USE_RERANK=true); see Configuration below

Minimum required env vars:

Default setup (Anthropic LLM + OpenAI embeddings): set both OPENAI_API_KEY and ANTHROPIC_API_KEY in ~/.pinrag/.env (or ~/.config/pinrag/.env). The server checks for OPENAI_API_KEY at startup; the LLM needs ANTHROPIC_API_KEY when you run a query.
OpenAI only: set PINRAG_LLM_PROVIDER=openai and only OPENAI_API_KEY (one key for both embeddings and chat).
Cohere embeddings: set PINRAG_EMBEDDING_PROVIDER=cohere and COHERE_API_KEY; you still need an LLM key (OpenAI or Anthropic) per above.
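
The key requirements above can be checked before you start the server. The following is a minimal sketch, not part of PinRAG: `required_api_keys` and `missing_api_keys` are hypothetical helper names, and the logic only models the LLM, embedding, and re-rank settings described in this README. It ignores the evaluator settings and does not reproduce the server's own startup check.

```python
# Hypothetical helper: maps provider settings to the API keys they appear to need.
KEY_FOR_PROVIDER = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "cohere": "COHERE_API_KEY",
}


def required_api_keys(env: dict[str, str]) -> set[str]:
    """Return the API key names implied by the provider settings in env."""
    llm = env.get("PINRAG_LLM_PROVIDER", "anthropic").lower()
    embeddings = env.get("PINRAG_EMBEDDING_PROVIDER", "openai").lower()
    for provider in (llm, embeddings):
        if provider not in KEY_FOR_PROVIDER:
            raise ValueError(f"unknown provider: {provider}")

    needed = {KEY_FOR_PROVIDER[llm], KEY_FOR_PROVIDER[embeddings]}
    # Re-ranking uses Cohere regardless of the embedding provider.
    if env.get("PINRAG_USE_RERANK", "false").lower() == "true":
        needed.add("COHERE_API_KEY")
    return needed


def missing_api_keys(env: dict[str, str]) -> list[str]:
    """Return the required key names that are absent or empty in env."""
    return sorted(key for key in required_api_keys(env) if not env.get(key))


if __name__ == "__main__":
    import os

    print(missing_api_keys(dict(os.environ)))
```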

2. Add MCP server

Cursor: Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "pinrag": {
      "command": "pinrag-mcp"
    }
  }
}

VS Code (GitHub Copilot): Run MCP: Open User Configuration from the Command Palette, then add:

{
  "servers": {
    "pinrag": {
      "command": "pinrag-mcp"
    }
  }
}

Or create .vscode/mcp.json in your workspace for project-specific setup. Restart VS Code or Cursor after editing.

Where the MCP server finds .env: In addition to ~/.pinrag/ and ~/.config/pinrag/, the server looks for .env in the current working directory (cwd) of the MCP process, which is usually the workspace folder you have open. If you use a global ~/.cursor/mcp.json and open a different project, cwd is that project, so the server will not see a .env that lives only in another folder (e.g. a pinrag project). To make sure your settings are always found, either put .env in ~/.pinrag/ or ~/.config/pinrag/, or add an env block to your MCP config and set all required variables there (API keys, PINRAG_*, etc.). Do not put secrets in a project-level .cursor/mcp.json if that file is committed to git.
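
As an illustration of the env block option, the sketch below generates both config shapes with an `env` entry added to the server definition. The helper name `mcp_config` and the example values (placeholder keys, the absolute persist path) are assumptions for illustration only; substitute your own.

```python
import json


def mcp_config(top_level_key: str, env: dict[str, str]) -> str:
    """Build an MCP config with an env block for the pinrag server.

    top_level_key is "mcpServers" for Cursor and "servers" for VS Code,
    matching the two snippets shown earlier in this section.
    """
    config = {
        top_level_key: {
            "pinrag": {
                "command": "pinrag-mcp",
                "env": env,
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Placeholder values; never commit real keys to a tracked file.
    example_env = {
        "OPENAI_API_KEY": "sk-...",
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "PINRAG_PERSIST_DIR": "/home/me/.pinrag/chroma_db",
    }
    print(mcp_config("mcpServers", example_env))  # Cursor
    print(mcp_config("servers", example_env))     # VS Code
```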

Backup: Back up ~/.pinrag/chroma_db (or your PINRAG_PERSIST_DIR) if your indexed documents are important — deleting it removes all indexes.

Note: MCP in VS Code requires GitHub Copilot and VS Code 1.102+. Enterprise users may need an admin to enable "MCP servers in Copilot."

3. Use in chat

Action	Tool
Add files or YouTube videos	`add_document_tool` — path(s) as list (e.g. `paths=["/path/to/file.pdf"]` or `paths=["https://youtu.be/xyz"]`); optionally `tags` (one per path)
List indexed documents	`list_documents_tool` — shows documents, chunk counts, tags, upload times
Query with filters	`query_tool` — filter by `document_id`, `page_min`/`page_max` (PDF only), or `tag`
Remove a document	`remove_document_tool`
View indexed documents (read-only)	Click Resources → `_documents_resource` in the MCP panel

Ask in chat: "Add /path/to/amiga-book.pdf with tag AMIGA", "Index https://youtu.be/xyz and ask what it says", or "Index https://github.com/owner/repo and ask about the codebase". The AI will invoke the tools for you. Citations show page numbers for PDFs, timestamps (e.g. t. 1:23) for YouTube, and file paths for GitHub.

GitHub indexing

Index a GitHub repository to ask questions about its code and docs. Use add_document_tool with a GitHub URL:

https://github.com/owner/repo
https://github.com/owner/repo/tree/branch
github.com/owner/repo (no scheme)

Optional parameters for GitHub URLs: branch, include_patterns (e.g. ["*.md", "src/**/*.py"]), exclude_patterns. Set GITHUB_TOKEN in .env for private repos or higher API rate limits. Large files (>512 KB by default) and binaries are skipped.

YouTube indexing and IP blocking

YouTube often blocks transcript requests from IPs that have made too many requests or from cloud provider IPs (AWS, GCP, Azure, etc.). When indexing playlists or many videos, you may see errors like "YouTube is blocking requests from your IP".

Workaround: Use an HTTP/HTTPS proxy. Add to .env:

PINRAG_YT_PROXY_HTTP_URL=http://user:pass@proxy.example.com:80
PINRAG_YT_PROXY_HTTPS_URL=http://user:pass@proxy.example.com:80

Rotating proxy services (e.g. Webshare) work well; residential proxies are often more reliable than datacenter IPs for avoiding YouTube blocks. The proxy is used only for fetching transcripts via youtube-transcript-api.

When indexing fails, add_document_tool returns a fail_summary with counts by reason: blocked (IP blocking), disabled (transcripts disabled by creator), missing_transcript, and other.
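
The exact shape of fail_summary is not documented here, so the following is only a sketch of how counts by reason could be tallied. It assumes failures arrive as hypothetical `(source, reason)` pairs; `summarize_failures` is an illustrative name, not a PinRAG function.

```python
from collections import Counter

# The four reasons named in this README.
REASONS = ("blocked", "disabled", "missing_transcript", "other")


def summarize_failures(failures: list[tuple[str, str]]) -> dict[str, int]:
    """Count failures per reason; unrecognised reasons are folded into "other"."""
    counts = Counter(
        reason if reason in REASONS else "other" for _source, reason in failures
    )
    return {reason: counts.get(reason, 0) for reason in REASONS}


if __name__ == "__main__":
    summary = summarize_failures(
        [("video_a", "blocked"), ("video_b", "blocked"), ("video_c", "disabled")]
    )
    print(summary)
    if summary["blocked"]:
        print("Some transcripts were blocked; consider configuring a proxy.")
```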

Configuration

.env is loaded from (first existing file wins):

~/.config/pinrag/.env
~/.pinrag/.env
{cwd}/.env (current working directory of the process)
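
The "first existing file wins" rule can be sketched as a simple ordered lookup. This is an illustration of the documented order, not PinRAG's loader; `find_env_file` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def find_env_file(home: Path, cwd: Path) -> Optional[Path]:
    """Return the first existing .env in the documented search order, or None."""
    candidates = [
        home / ".config" / "pinrag" / ".env",
        home / ".pinrag" / ".env",
        cwd / ".env",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_env_file(Path.home(), Path.cwd()))
```

Because only the first match is used, a .env in ~/.config/pinrag/ hides one in the project folder under this reading.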

Environment variables:

Variable	Default	Description
LLM
`PINRAG_LLM_PROVIDER`	`anthropic`	`openai` or `anthropic`
`PINRAG_LLM_MODEL`	(provider default)	e.g. `claude-haiku-4-5`, `claude-sonnet-4-6`, `gpt-4o-mini`
`OPENAI_API_KEY`	(required for OpenAI)	OpenAI API key (LLM or embeddings)
`ANTHROPIC_API_KEY`	(required for Anthropic)	Anthropic API key (when `PINRAG_LLM_PROVIDER=anthropic` or `PINRAG_EVALUATOR_PROVIDER=anthropic`)
Evaluators (LLM-as-judge)
`PINRAG_EVALUATOR_PROVIDER`	`openai`	`openai` or `anthropic` — which LLM grades correctness/relevance/groundedness/retrieval
`PINRAG_EVALUATOR_MODEL`	(provider default)	Model for correctness/relevance (e.g. `gpt-4o`, `claude-sonnet-4-6`)
`PINRAG_EVALUATOR_MODEL_CONTEXT`	(provider default)	Model for groundedness/retrieval (context-heavy; e.g. `gpt-4o-mini`, `claude-haiku-4-5`)
Embeddings
`PINRAG_EMBEDDING_PROVIDER`	`openai`	`openai` or `cohere`
`PINRAG_EMBEDDING_MODEL`	(provider default)	e.g. `text-embedding-3-small`, `embed-english-v3.0`
`COHERE_API_KEY`	(required for Cohere)	Cohere API key; install with `pip install pinrag[cohere]` when using Cohere embeddings or re-ranking
Storage & chunking
`PINRAG_PERSIST_DIR`	`chroma_db`	Chroma vector store directory (project-local by default; use `~/.pinrag/chroma_db` for global)
`PINRAG_CHUNK_SIZE`	`1000`	Text chunk size
`PINRAG_CHUNK_OVERLAP`	`200`	Chunk overlap
`PINRAG_COLLECTION_NAME`	`pinrag`	Chroma collection name. Single shared collection by default.
Parent-child retrieval
`PINRAG_USE_PARENT_CHILD`	`false`	Set to `true` to embed small chunks (precise matching) and return larger parent chunks (rich context). Requires re-indexing.
`PINRAG_PARENT_CHUNK_SIZE`	`2000`	Parent chunk size (chars) when `PINRAG_USE_PARENT_CHILD=true`.
`PINRAG_CHILD_CHUNK_SIZE`	`800`	Child chunk size (chars) when `PINRAG_USE_PARENT_CHILD=true`.
Retrieval
`PINRAG_RETRIEVE_K`	`20`	Number of chunks to retrieve. When rerank is on, this is the fallback for the pre-rerank fetch if `PINRAG_RERANK_RETRIEVE_K` is unset.
Re-ranking
`PINRAG_USE_RERANK`	`false`	Set to `true` to enable Cohere Re-Rank: fetch more chunks, re-score with Cohere, pass top N to the LLM. Requires `pip install pinrag[cohere]` and `COHERE_API_KEY`.
`PINRAG_RERANK_RETRIEVE_K`	`20`	Chunks to fetch before reranking when `PINRAG_USE_RERANK=true`. If unset, uses `PINRAG_RETRIEVE_K`.
`PINRAG_RERANK_TOP_N`	`10`	Number of chunks the reranker returns to the LLM (only when `PINRAG_USE_RERANK=true`).
Multi-query
`PINRAG_USE_MULTI_QUERY`	`false`	Set to `true` to generate 3–5 query variants via LLM, retrieve per variant, merge (unique union). Improves recall for terse or ambiguous queries.
`PINRAG_MULTI_QUERY_COUNT`	`4`	Number of alternative queries to generate when `PINRAG_USE_MULTI_QUERY=true`.
Response style
`PINRAG_RESPONSE_STYLE`	`thorough`	RAG answer style: `thorough` (detailed) or `concise`. Used by the evaluation target and as the default when `query_tool` is called without `response_style`.
GitHub indexing
`GITHUB_TOKEN`	(optional)	Personal access token for GitHub API. Required for private repos; increases rate limits for public repos.
`PINRAG_GITHUB_MAX_FILE_BYTES`	`524288` (512 KB)	Skip files larger than this when indexing GitHub repos.
`PINRAG_GITHUB_DEFAULT_BRANCH`	`main`	Default branch when not specified in the GitHub URL.
Plain text indexing
`PINRAG_PLAINTEXT_MAX_FILE_BYTES`	`524288` (512 KB)	Skip plain .txt files larger than this when indexing.
YouTube transcript proxy
`PINRAG_YT_PROXY_HTTP_URL`	(none)	HTTP proxy URL for transcript fetches (e.g. `http://user:pass@proxy:80`). Use when YouTube blocks your IP.
`PINRAG_YT_PROXY_HTTPS_URL`	(none)	HTTPS proxy URL for transcript fetches. Same as HTTP when using a generic proxy.
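
The retrieval, re-ranking, and multi-query rows in the table above interact. The sketch below shows one reading of them: the pre-rerank fetch size falls back to `PINRAG_RETRIEVE_K`, and multi-query results are merged as a unique union. The function names are hypothetical and chunks are represented by plain string ids; PinRAG's internals may differ.

```python
def pre_rerank_k(env: dict[str, str]) -> int:
    """Chunks to fetch before re-ranking: PINRAG_RERANK_RETRIEVE_K if set,
    otherwise PINRAG_RETRIEVE_K (documented default 20)."""
    retrieve_k = int(env.get("PINRAG_RETRIEVE_K", "20"))
    return int(env.get("PINRAG_RERANK_RETRIEVE_K", retrieve_k))


def merge_unique(result_lists: list[list[str]]) -> list[str]:
    """Unique union of per-variant results, keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for results in result_lists:
        for chunk_id in results:
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged


def top_n_after_rerank(scored: list[tuple[str, float]], env: dict[str, str]) -> list[str]:
    """Keep the PINRAG_RERANK_TOP_N highest-scoring chunk ids (default 10)."""
    top_n = int(env.get("PINRAG_RERANK_TOP_N", "10"))
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [chunk_id for chunk_id, _score in ranked[:top_n]]


if __name__ == "__main__":
    print(pre_rerank_k({"PINRAG_RETRIEVE_K": "30"}))
    print(merge_unique([["c1", "c2"], ["c2", "c3"]]))
```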

Re-indexing when changing embedding provider: Changing PINRAG_EMBEDDING_PROVIDER requires re-indexing existing documents, because indexes use provider-specific embedding dimensions. Alternatively, keep a separate collection per provider by setting PINRAG_COLLECTION_NAME (the default is a single shared collection) and index into each when needed.

Re-indexing when enabling parent-child: Setting PINRAG_USE_PARENT_CHILD=true requires re-indexing; the new structure (child chunks in Chroma, parent chunks in docstore) is created only during indexing.
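
The parent-child idea can be illustrated with a plain character-window splitter. This is a simplified sketch under stated assumptions: it uses fixed-size character windows rather than whatever splitter PinRAG actually uses, and `split_text` and `parent_child_chunks` are hypothetical names. The sizes are the documented defaults.

```python
def split_text(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of at most `size` characters with `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    chunks: list[str] = []
    start = 0
    step = size - overlap
    while True:
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end
            break
        start += step
    return chunks


def parent_child_chunks(
    text: str, parent_size: int = 2000, child_size: int = 800
) -> tuple[list[str], list[tuple[int, str]]]:
    """Return (parents, children); each child carries the index of its parent."""
    parents = split_text(text, parent_size)
    children = [
        (parent_index, child)
        for parent_index, parent in enumerate(parents)
        for child in split_text(parent, child_size)
    ]
    return parents, children


if __name__ == "__main__":
    parents, children = parent_child_chunks("x" * 4500)
    # Small children are what would be embedded; a match on a child
    # returns the larger parent it came from.
    parent_index, _child = children[3]
    print(len(parents), len(children), len(parents[parent_index]))
```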

Monitoring & Observability

For query performance metrics (latency, timing, token usage) and debugging, use LangSmith. Set LANGSMITH_TRACING=true and LANGSMITH_API_KEY in .env; traces are sent automatically. See notes/langsmith-setup.md for setup. With PINRAG_LOG_TO_STDERR=true, tool completion timing is also logged to stderr.

Multiple providers and collections

Embedding dimension depends on the provider and model (e.g. 1536 for OpenAI text-embedding-3-small, 1024 for Cohere embed-english-v3.0). To avoid dimension mismatches:

Default: Collection name is pinrag. Use one embedding provider; if you switch provider, re-index or you will get dimension errors.
Per-provider collections: Set PINRAG_COLLECTION_NAME to a provider-specific name (e.g. pinrag_openai, pinrag_cohere) when indexing, and use the same name when querying with that provider. You can index the same PDFs into multiple collections (switch env and index again) and switch by changing PINRAG_EMBEDDING_PROVIDER and PINRAG_COLLECTION_NAME in .env.
MCP tools: The server uses PINRAG_COLLECTION_NAME (default pinrag) for all tools. Collection is not configurable per call; change it via .env to target a different collection.
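
A small sketch of the per-provider collection convention follows. The naming scheme (`pinrag_openai`, `pinrag_cohere`) and the dimensions come from this section; the helper names `provider_env` and `check_dimension` are hypothetical and only illustrate why a mismatch fails.

```python
# Dimensions as stated in this section.
EMBEDDING_DIMS = {"openai": 1536, "cohere": 1024}


def provider_env(provider: str, base: str = "pinrag") -> dict[str, str]:
    """Return the two .env settings to switch together for a provider."""
    if provider not in EMBEDDING_DIMS:
        raise ValueError(f"unknown embedding provider: {provider}")
    return {
        "PINRAG_EMBEDDING_PROVIDER": provider,
        "PINRAG_COLLECTION_NAME": f"{base}_{provider}",
    }


def check_dimension(provider: str, collection_dim: int) -> None:
    """Raise if the provider's vectors do not fit an existing collection."""
    expected = EMBEDDING_DIMS[provider]
    if expected != collection_dim:
        raise ValueError(
            f"{provider} embeddings have dimension {expected}, "
            f"but the collection stores dimension {collection_dim}; re-index "
            "or use a separate collection"
        )


if __name__ == "__main__":
    for key, value in provider_env("cohere").items():
        print(f"{key}={value}")
```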

Query Filtering

query_tool supports optional filters to narrow retrieval:

Parameter	Description
`document_id`	Search only in this document (e.g. `mybook.pdf` or video ID from `list_documents_tool`)
`page_min`, `page_max`	Restrict to page range (PDF only; single page: `page_min=16`, `page_max=16`)
`tag`	Search only documents with this tag (e.g. `AMIGA`, `PI_PICO`)
`document_type`	Search only by type: `pdf`, `youtube`, `discord`, `github`, or `plaintext`
`response_style`	Answer style: `thorough` (default) or `concise`

Filters can be combined. Sources include page for PDFs and start (timestamp in seconds) for YouTube. Example: "What is OpenOCD? In the Pico doc, pages 16–17 only" →
query_tool(query="...", document_id="RP-008276-DS-1-getting-started-with-pico.pdf", page_min=16, page_max=17).
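
Since the index is stored in Chroma, combined filters plausibly translate into a Chroma metadata `where` clause. The sketch below builds such a clause with Chroma's `$eq`, `$gte`, `$lte`, and `$and` operators. The metadata field names (`document_id`, `page`, `tag`, `document_type`) and the function `build_where` are assumptions for illustration; PinRAG's stored metadata keys may differ.

```python
from typing import Any, Optional


def build_where(
    document_id: Optional[str] = None,
    page_min: Optional[int] = None,
    page_max: Optional[int] = None,
    tag: Optional[str] = None,
    document_type: Optional[str] = None,
) -> Optional[dict[str, Any]]:
    """Combine the optional filters into one Chroma-style where clause."""
    clauses: list[dict[str, Any]] = []
    if document_id is not None:
        clauses.append({"document_id": {"$eq": document_id}})
    if page_min is not None:
        clauses.append({"page": {"$gte": page_min}})
    if page_max is not None:
        clauses.append({"page": {"$lte": page_max}})
    if tag is not None:
        clauses.append({"tag": {"$eq": tag}})
    if document_type is not None:
        clauses.append({"document_type": {"$eq": document_type}})

    if not clauses:
        return None  # no filtering
    if len(clauses) == 1:
        return clauses[0]  # $and needs at least two operands
    return {"$and": clauses}


if __name__ == "__main__":
    print(
        build_where(
            document_id="RP-008276-DS-1-getting-started-with-pico.pdf",
            page_min=16,
            page_max=17,
        )
    )
```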

Development

git clone https://github.com/ndjordjevic/pinrag.git
cd pinrag
uv sync --extra dev
uv run pytest

Run MCP server from source:

uv run pinrag-mcp

For local development, point the MCP config to your venv:

Cursor (.cursor/mcp.json):

{
  "mcpServers": {
    "pinrag": {
      "command": "/path/to/pinrag/.venv/bin/pinrag-mcp"
    }
  }
}

VS Code (.vscode/mcp.json):

{
  "servers": {
    "pinrag": {
      "command": "/path/to/pinrag/.venv/bin/pinrag-mcp"
    }
  }
}

License

MIT License. See LICENSE for details.

Project details

Release history: this page describes version 0.8.3 (Mar 12, 2026). Releases listed range from 0.7.1 (Mar 10, 2026) to 0.9.27 (Apr 9, 2026).

Download files

Source distribution: pinrag-0.8.3.tar.gz (92.9 kB), uploaded Mar 12, 2026
Built distribution: pinrag-0.8.3-py3-none-any.whl (75.2 kB, Python 3), uploaded Mar 12, 2026

Both files were uploaded via uv 0.10.9 from CI (Ubuntu 24.04), without Trusted Publishing.

Hashes for pinrag-0.8.3.tar.gz

Algorithm	Hash digest
SHA256	`64657b59096b0ab26ec57d161ae4c9a09a4b218f708f09062aecb83ecf10c662`
MD5	`72743d3f0dbcd0d82c216b09779c8a96`
BLAKE2b-256	`e7335a811b0c5dfb1b98abc1b348a6413d6d5a2c23be07acc5144b2d6c8cfbfa`

Hashes for pinrag-0.8.3-py3-none-any.whl

Algorithm	Hash digest
SHA256	`9efd94ccfa7c9a80f2950f6d1afba246e49b5d8df9a1ae1d6660e49cf793dd8a`
MD5	`6ba5cea0e8fa0937fcca394c83121b46`
BLAKE2b-256	`154b8919af6c7dd800225a9c0602573a57bedb6088b9e95a4790f8b4c434f7ca`
