Turn any GitHub repository into a searchable knowledge base for AI agents

Maintainers

AmberLee2427

Nancy Brain

Turn any GitHub repository into a searchable knowledge base for AI agents.

Load the complete source code, documentation, examples, and notebooks from any package you're working with. Nancy Brain gives AI assistants instant access to:

Full source code - actual Python classes, methods, implementation details
Live documentation - tutorials, API docs, usage examples
Real examples - Jupyter notebooks, test cases, configuration files
Smart weighting - boost important docs; weights saved in config persist across sessions

The AI can now answer questions like "How do I initialize this class?" or "Show me an example of fitting a light curve" with actual code from the repositories you care about.

🚀 Quick Start

# Install anywhere
pip install nancy-brain

# Initialize a new project
nancy-brain init my-ai-project
cd my-ai-project

# Add some repositories  
nancy-brain add-repo https://github.com/scikit-learn/scikit-learn.git

# Build the knowledge base
nancy-brain build

# Search it!
nancy-brain search "machine learning algorithms"

# Or launch the web interface
nancy-brain ui

🌐 Web Admin Interface

Launch the visual admin interface for easy knowledge base management:

nancy-brain ui

Features:

🔍 Live Search - Test your knowledge base with instant results
📚 Repository Management - Add/remove GitHub repos with visual forms
📄 Article Management - Add/remove PDF articles with visual forms
🏗️ Build Control - Trigger knowledge base builds with options
📊 System Status - Check embeddings, configuration, and health

Perfect for non-technical users and rapid prototyping!

🖥️ Command Line Interface

nancy-brain init <project>        # Initialize new project
nancy-brain add-repo <url>        # Add GitHub repositories  
nancy-brain add-article <url> <name>  # Add PDF articles
nancy-brain build                 # Build knowledge base
nancy-brain search "query"        # Search knowledge base
nancy-brain serve                 # Start HTTP API server
nancy-brain ui                    # Launch web admin interface

Technical Architecture

A lightweight Retrieval-Augmented Generation (RAG) knowledge base with:

Embedding + search pipeline (txtai / FAISS based)
HTTP API connector (FastAPI)
Model Context Protocol (MCP) server connector (tools for search / retrieve / tree / weight)
Dynamic weighting system (extension/path weights + runtime doc preferences)

Designed to power AI assistants in Slack, IDEs, Claude Desktop, custom GPTs, and any MCP-capable client.

1. Installation & Quick Setup

For Users (Recommended)

# Install the package
pip install nancy-brain

# Initialize a new project
nancy-brain init my-knowledge-base
cd my-knowledge-base

# Add repositories and build
nancy-brain add-repo https://github.com/your-org/repo.git
nancy-brain add-article "https://arxiv.org/pdf/paper.pdf" "paper_name" --description "Important paper"
nancy-brain build

# Launch web interface
nancy-brain ui

For Developers

# Clone and install in development mode
git clone <repo-url>
cd nancy-brain
pip install -e ".[dev]"

# Test installation
pytest -q
nancy-brain --help

2. Project Layout (Core Parts)

nancy_brain/                    # Main Python package
├── cli.py                      # Command line interface
├── admin_ui.py                 # Streamlit web admin interface
└── __init__.py                 # Package initialization

connectors/http_api/app.py      # FastAPI app
connectors/mcp_server/          # MCP server implementation
rag_core/                       # Core service, search, registry, store, types
scripts/                        # KB build & management scripts
config/repositories.yml         # Source repository list (input KB)
config/weights.yaml             # Extension + path weighting config
config/model_weights.yaml       # (Optional) static per-doc multipliers

3. Configuration

3.1 Repositories (`config/repositories.yml`)

Structure (categories map to lists of repos):

<category_name>:
  - name: repoA
    url: https://github.com/org/repoA.git
  - name: repoB
    url: https://github.com/org/repoB.git

Categories become path prefixes inside the knowledge base (e.g. cat1/repoA/...).
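
As a rough illustration of how categories turn into path prefixes, the sketch below walks a parsed repositories.yml and lists the prefix each repo would get. The config is written as a plain dict so the example runs without a YAML library, and the function name `kb_prefixes` is hypothetical (it is not part of nancy-brain).

```python
def kb_prefixes(config: dict) -> list[str]:
    """Return '<category>/<repo name>' for every repo in a parsed repositories.yml."""
    prefixes = []
    for category, repos in config.items():
        for repo in repos or []:  # tolerate an empty category
            prefixes.append(f"{category}/{repo['name']}")
    return prefixes


# Same shape as the YAML structure above, already parsed into Python objects.
example_config = {
    "cat1": [
        {"name": "repoA", "url": "https://github.com/org/repoA.git"},
        {"name": "repoB", "url": "https://github.com/org/repoB.git"},
    ]
}

print(kb_prefixes(example_config))
```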

3.2 Weight Config (`config/weights.yaml`)

extensions: base multipliers by file extension (.py, .md, etc.)
path_includes: when a listed substring appears in a doc_id, its multiplier is multiplied into that document's weight (several matches compound).
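
To make the two sections concrete, here is a minimal sketch of how an extension multiplier and any matching path multipliers might combine. The dict layout, the numeric values, and the helper `config_weight` are assumptions for illustration; the actual file layout and the code in rag_core may differ.

```python
from pathlib import PurePosixPath

# Hypothetical contents of config/weights.yaml after parsing; the real keys
# and values in your project may differ.
weights = {
    "extensions": {".py": 1.0, ".md": 1.5},
    "path_includes": {"tests": 0.5, "docs": 2.0},
}


def config_weight(doc_id: str, weights: dict) -> float:
    """Combine the extension multiplier with every matching path multiplier."""
    suffix = PurePosixPath(doc_id).suffix
    weight = weights.get("extensions", {}).get(suffix, 1.0)
    for substring, multiplier in weights.get("path_includes", {}).items():
        if substring in doc_id:
            weight *= multiplier  # path multipliers stack multiplicatively
    return weight


print(config_weight("cat1/repoA/docs/guide.md", weights))      # 1.5 * 2.0
print(config_weight("cat1/repoA/tests/test_fit.py", weights))  # 1.0 * 0.5
```

Unlisted extensions and unmatched paths fall back to a neutral multiplier of 1.0 in this sketch.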

3.3 Model Weights (`config/model_weights.yaml`)

Optional static per-document multipliers (legacy / seed). Runtime updates made through the /weight endpoint or the MCP set_weight tool override or augment these values in memory.

3.4 Environment Variables

Var	Purpose	Default
`USE_DUAL_EMBEDDING`	Enable dual (general + code) embedding scoring	true
`CODE_EMBEDDING_MODEL`	Model name for code index (if dual)	microsoft/codebert-base
`KMP_DUPLICATE_LIB_OK`	Set to TRUE to avoid OpenMP macOS clash	TRUE
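
If you want to mirror these settings in your own scripts, one possible way to read them with the defaults from the table is sketched below. How nancy-brain itself parses the values is not shown here, so the set of accepted "true" spellings and the function `read_settings` are assumptions.

```python
import os


def read_settings(env=os.environ) -> dict:
    """Read the documented variables, falling back to the defaults in the table."""
    dual = env.get("USE_DUAL_EMBEDDING", "true").strip().lower() in {"1", "true", "yes"}
    return {
        "use_dual_embedding": dual,
        "code_embedding_model": env.get("CODE_EMBEDDING_MODEL", "microsoft/codebert-base"),
        "kmp_duplicate_lib_ok": env.get("KMP_DUPLICATE_LIB_OK", "TRUE"),
    }


print(read_settings({}))                               # defaults only
print(read_settings({"USE_DUAL_EMBEDDING": "false"}))  # dual embedding off
```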

4. Building the Knowledge Base

Embeddings must be built before search can return meaningful results.

Using the CLI (Recommended)

# Basic build (repositories only)
nancy-brain build

# Build with PDF articles (if configured)
nancy-brain build --articles-config config/articles.yml

# Force update all repositories
nancy-brain build --force-update

# Or use the web interface
nancy-brain ui  # Go to "Build Knowledge Base" page

Using the Python Script Directly

conda activate nancy-brain
cd src/nancy-brain
# Basic build (repositories only)
python scripts/build_knowledge_base.py \
  --config config/repositories.yml \
  --embeddings-path knowledge_base/embeddings

# Full build including optional PDF articles (if config/articles.yml exists)
python scripts/build_knowledge_base.py \
  --config config/repositories.yml \
  --articles-config config/articles.yml \
  --base-path knowledge_base/raw \
  --embeddings-path knowledge_base/embeddings \
  --force-update \
  --dirty
# Omit the --dirty flag to automatically remove
# source material after indexing is complete

Run python scripts/build_knowledge_base.py -h for all options.

4.1 PDF Articles (Optional Quick Setup)

Create config/articles.yml (example):

journal_articles:
  - name: Paczynski_1986_ApJ_304_1
    url: https://ui.adsabs.harvard.edu/link_gateway/1986ApJ...304....1P/PUB_PDF
    description: Paczynski (1986) – Gravitational microlensing

Install Java (for Tika PDF extraction) – macOS:

brew install openjdk
export JAVA_HOME="/opt/homebrew/opt/openjdk"
export PATH="$JAVA_HOME/bin:$PATH"

(Optional fallback only) Install lightweight PDF libs if you skip Java:

pip install PyPDF2 pdfplumber

Build with articles (explicit):

python scripts/build_knowledge_base.py --config config/repositories.yml --articles-config config/articles.yml

Keep raw PDFs for inspection: add --dirty.

Notes:

If Java/Tika is not available, the script attempts fallback extraction (needs PyPDF2/pdfplumber or fitz).
Cleanup removes raw PDFs unless --dirty is supplied.
Article docs are indexed under journal_articles/<category>/<name>.

Key flags:

--config path to repositories YAML (was --repositories in older docs)
--articles-config optional PDF articles YAML
--base-path where raw repos/PDFs live (default knowledge_base/raw)
--embeddings-path output index directory
--force-update re-pull repos / re-download PDFs
--category <name> limit to one category
--dry-run show actions without performing
--dirty keep raw sources (skip cleanup)

This will:

Clone / update listed repos under knowledge_base/raw/<category>/<repo>
(Optionally) download PDFs into category directories
Convert notebooks (*.ipynb -> *.nb.txt) if nb4llm is available
Extract and normalize text + (optionally) PDF text
Build / update embeddings index at knowledge_base/embeddings (and code_index if dual embeddings enabled)

Re-run when repositories or articles change.

5. Running Services

Web Admin Interface (Recommended for Getting Started)

nancy-brain ui
# Opens Streamlit interface at http://localhost:8501
# Features: search, repo management, build control, status

HTTP API Server

# Using CLI
nancy-brain serve

# Or directly with uvicorn
uvicorn connectors.http_api.app:app --host 0.0.0.0 --port 8000

MCP Server (for AI Assistants)

# Run MCP stdio server
python run_mcp_server.py

Initialize service programmatically (example pattern):

from pathlib import Path
from connectors.http_api.app import initialize_rag_service
initialize_rag_service(
    config_path=Path('config/repositories.yml'),
    embeddings_path=Path('knowledge_base/embeddings'),
    weights_path=Path('config/weights.yaml'),
    use_dual_embedding=True
)

The FastAPI dependency layer will then serve requests.

Command Line Search

# Quick search from command line
nancy-brain search "machine learning algorithms" --limit 5

# Search with custom paths
nancy-brain search "neural networks" \
  --embeddings-path custom/embeddings \
  --config custom/repositories.yml

5.1 Endpoints (Bearer auth placeholder)

Method	Path	Description
GET	`/health`	Service status
GET	`/version`	Index / build meta
GET	`/search?query=...&limit=N`	Search documents
POST	`/retrieve`	Retrieve passage (doc_id + line range)
POST	`/retrieve/batch`	Batch retrieve
GET	`/tree?prefix=...`	List KB tree
POST	`/weight`	Set runtime doc weight

Example:

curl -H "Authorization: Bearer TEST" 'http://localhost:8000/search?query=light%20curve&limit=5'

Set a document weight (boost factor 0.5–2.0 typical):

curl -X POST -H 'Authorization: Bearer TEST' \
  -H 'Content-Type: application/json' \
  -d '{"doc_id":"cat1/repoA/path/file.py","multiplier":2.0}' \
  http://localhost:8000/weight
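
The same two calls can be made from Python. The sketch below only builds the requests with the standard library, so it runs without a server; the base URL and token are taken from the curl examples above, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: host/port from the curl examples
TOKEN = "TEST"                      # placeholder token, as in the curl examples


def search_request(query: str, limit: int = 5) -> urllib.request.Request:
    """Build the GET /search request."""
    params = urllib.parse.urlencode(
        {"query": query, "limit": limit}, quote_via=urllib.parse.quote
    )
    return urllib.request.Request(
        f"{BASE_URL}/search?{params}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )


def weight_request(doc_id: str, multiplier: float) -> urllib.request.Request:
    """Build the POST /weight request with a JSON body."""
    body = json.dumps({"doc_id": doc_id, "multiplier": multiplier}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/weight",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )


req = search_request("light curve")
print(req.get_method(), req.full_url)
# To actually send it (requires the server to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```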

6. MCP Server

Run the MCP stdio server:

python run_mcp_server.py

Tools exposed (operation names):

search (query, limit)
retrieve (doc_id, start, end)
retrieve_batch
tree (prefix, depth)
set_weight (doc_id, multiplier)
status / version

6.1 VS Code Integration

Install a Model Context Protocol client extension (e.g. "MCP Explorer" or equivalent).
Add a server entry pointing to the script, stdio transport. Example config snippet:

{
  "mcpServers": {
    "nancy-brain": {
      "command": "python",
      "args": ["/absolute/path/to/src/nancy-brain/run_mcp_server.py"],
      "env": {
        "PYTHONPATH": "/absolute/path/to/src/nancy-brain" 
      }
    }
  }
}

Specific mamba environment example:

{
	"servers": {
		"nancy-brain": {
			"type": "stdio",
			"command": "/Users/malpas.1/.local/share/mamba/envs/nancy-brain/bin/python",
			"args": [
				"/Users/malpas.1/Code/slack-bot/src/nancy-brain/run_mcp_server.py"
			],
			"env": {
				"PYTHONPATH": "/Users/malpas.1/Code/slack-bot/src/nancy-brain",
				"KMP_DUPLICATE_LIB_OK": "TRUE"
			}
		}
	},
	"inputs": []
}

Reload VS Code. The provider should list the tools; invoke search to test.
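
Since relative paths are a common reason for tools not showing up, a small generator for the first config shape above can help. This is only a sketch: `mcp_server_entry` is a hypothetical helper, it assumes POSIX-style paths as in the examples, and your MCP client may expect the "servers" layout instead.

```python
import json
from pathlib import PurePosixPath


def mcp_server_entry(repo_dir: str, python_cmd: str = "python") -> dict:
    """Build an "mcpServers" config entry, insisting on an absolute repo path."""
    repo = PurePosixPath(repo_dir)
    if not repo.is_absolute():
        raise ValueError(f"repo_dir must be an absolute path, got {repo_dir!r}")
    return {
        "mcpServers": {
            "nancy-brain": {
                "command": python_cmd,
                "args": [str(repo / "run_mcp_server.py")],
                "env": {"PYTHONPATH": str(repo)},
            }
        }
    }


print(json.dumps(mcp_server_entry("/absolute/path/to/src/nancy-brain"), indent=2))
```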

6.2 Claude Desktop

Claude supports MCP config in its settings file. Add an entry similar to above (command + args). Restart Claude Desktop; tools appear in the prompt tools menu.

7. Use Cases & Examples

For Researchers

# Add astronomy packages
nancy-brain add-repo https://github.com/astropy/astropy.git
nancy-brain add-repo https://github.com/rpoleski/MulensModel.git

# Add key research papers
nancy-brain add-article \
  "https://ui.adsabs.harvard.edu/link_gateway/1986ApJ...304....1P/PUB_PDF" \
  "Paczynski_1986_microlensing" \
  --category "foundational_papers" \
  --description "Paczynski (1986) - Gravitational microlensing by the galactic halo"

nancy-brain build

# AI can now answer: "How do I model a microlensing event?"
nancy-brain search "microlensing model fit"

For ML Engineers

# Add ML frameworks
nancy-brain add-repo https://github.com/scikit-learn/scikit-learn.git
nancy-brain add-repo https://github.com/pytorch/pytorch.git
nancy-brain build

# AI can now answer: "Show me gradient descent implementation"
nancy-brain search "gradient descent optimizer"

For Teams

# Launch web interface for non-technical users
nancy-brain ui
# Point team to http://localhost:8501
# They can search, add repos, manage articles, trigger builds visually
# Repository Management tab: Add GitHub repos
# Articles tab: Add PDF papers and documents

8. Slack Bot (Nancy)

The Slack-facing assistant lives outside this submodule (see parent repository). High-level steps:

Ensure HTTP API running and reachable (or embed service directly in bot process).
Bot receives user message -> constructs query -> calls /search and selected /retrieve for context.
Bot composes answer including source references (doc_id and GitHub URL) before sending back.
Optional: adaptively call /weight when feedback indicates a source should be boosted or dampened.

Check root-level nancy_bot.py or Slack integration docs (SLACK.md) for token setup and event subscription details.
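
The search, retrieve, and compose steps above can be sketched as a single function. This is not the bot's actual code: the hit shape (`doc_id`, `score`), the fixed line range, and all function names are assumptions, and the search and retrieve calls are injected so the example runs without a server.

```python
from typing import Callable


def answer_with_sources(
    question: str,
    search: Callable[[str, int], list[dict]],
    retrieve: Callable[[str, int, int], str],
    top_k: int = 2,
) -> str:
    """Search, pull a passage for each top hit, and append source references."""
    hits = search(question, top_k)
    if not hits:
        return "No matching documents found."
    sections = []
    for hit in hits:
        passage = retrieve(hit["doc_id"], 1, 20)  # assumed line range
        sections.append(f"{passage}\n(source: {hit['doc_id']})")
    return "\n\n".join(sections)


# Stand-ins for real calls to /search and /retrieve.
def fake_search(query: str, limit: int) -> list[dict]:
    return [{"doc_id": "cat1/repoA/README.md", "score": 0.9}][:limit]


def fake_retrieve(doc_id: str, start: int, end: int) -> str:
    return f"lines {start}-{end} of {doc_id}"


print(answer_with_sources("How do I fit a light curve?", fake_search, fake_retrieve))
```

In a real bot, the two callables would wrap HTTP requests to the API and the composed text would be posted back to Slack.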

9. Custom GPT (OpenAI Actions / Function Calls)

Define OpenAI tool specs mapping to HTTP endpoints:

searchDocuments(query, limit) -> GET /search
retrievePassage(doc_id, start, end) -> POST /retrieve
listTree(prefix, depth) -> GET /tree
setWeight(doc_id, multiplier) -> POST /weight

Use an API gateway or direct URL. Include auth header. Provide JSON schemas matching request/response models.
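
For function calling, the four operations could be described roughly as follows. The envelope is the OpenAI "tools" function format; which parameters are required and the description strings are assumptions, and a Custom GPT Action would need an equivalent OpenAPI document rather than these dicts.

```python
def tool_spec(name: str, description: str, properties: dict, required: list[str]) -> dict:
    """Wrap a JSON Schema in the OpenAI function-calling "tools" envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


STRING = {"type": "string"}
INTEGER = {"type": "integer"}

TOOLS = [
    tool_spec("searchDocuments", "Search the knowledge base (GET /search).",
              {"query": STRING, "limit": INTEGER}, ["query"]),
    tool_spec("retrievePassage", "Retrieve a line range from a document (POST /retrieve).",
              {"doc_id": STRING, "start": INTEGER, "end": INTEGER}, ["doc_id"]),
    tool_spec("listTree", "List the knowledge base tree (GET /tree).",
              {"prefix": STRING, "depth": INTEGER}, []),
    tool_spec("setWeight", "Set a runtime document weight (POST /weight).",
              {"doc_id": STRING, "multiplier": {"type": "number"}}, ["doc_id", "multiplier"]),
]

print([tool["function"]["name"] for tool in TOOLS])
```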

10. Dynamic Weighting Flow

Base score from embeddings (dual or single).
Extension multiplier (from weights.yaml).
Path multiplier(s) (cumulative).
Model weight (static config + runtime overrides via /weight).
Adjusted score = base * extension_weight * model_weight (and any path multipliers folded into extension weight step).

Runtime /weight takes effect immediately on subsequent searches.
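
A minimal sketch of this flow, assuming each raw hit already carries its base score and an extension weight with any path multipliers folded in. The hit fields and function names are illustrative, not the rag_core implementation.

```python
def adjusted_score(base: float, extension_weight: float, model_weight: float = 1.0) -> float:
    """Adjusted score = base * extension_weight * model_weight."""
    return base * extension_weight * model_weight


def rerank(hits: list[dict], runtime_weights: dict[str, float]) -> list[dict]:
    """Apply weights to raw hits and sort best-first."""
    rescored = []
    for hit in hits:
        # Documents without a runtime/static model weight stay neutral (1.0).
        model_weight = runtime_weights.get(hit["doc_id"], 1.0)
        score = adjusted_score(hit["base"], hit["extension_weight"], model_weight)
        rescored.append({**hit, "score": score})
    return sorted(rescored, key=lambda h: h["score"], reverse=True)


hits = [
    {"doc_id": "cat1/repoA/a.py", "base": 0.8, "extension_weight": 1.0},
    {"doc_id": "cat1/repoA/b.md", "base": 0.5, "extension_weight": 1.0},
]
# Boosting b.md by 2.0 lifts it above a.py (0.5 * 1.0 * 2.0 = 1.0 > 0.8).
print([h["doc_id"] for h in rerank(hits, {"cat1/repoA/b.md": 2.0})])
```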

11. Updating / Rebuilding

Action	Command
Pull repo updates	`nancy-brain build --force-update` or re-run build script
Change extension weights	Edit `config/weights.yaml`, then restart the service (or rebuild) if the old weights appear to be cached
Change embedding model	Delete / rename existing `knowledge_base/embeddings` and rebuild with new env vars

12. Deployment Notes

Containerize: build image with pre-built embeddings baked or mount a persistent volume.
Health probe: /health returns 200 once rag_service is initialized, and 503 until then.
Concurrency: FastAPI's async handlers are safe here; weight updates are simple dict writes (low contention). Under heavy load, consider adding a lock if races appear.
Persistence of runtime weights: currently in-memory; persist manually if needed (extend set_weight).
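
One way to address both the locking and the persistence notes is a small wrapper that guards the weight dict with a lock and mirrors it to a JSON file. This is a sketch of a possible extension, not existing nancy-brain code; the class name and the file name are hypothetical.

```python
import json
import tempfile
import threading
from pathlib import Path


class PersistentWeights:
    """In-memory doc weights guarded by a lock and mirrored to a JSON file."""

    def __init__(self, path: Path):
        self._path = Path(path)
        self._lock = threading.Lock()
        self._weights: dict[str, float] = {}
        if self._path.exists():
            self._weights = json.loads(self._path.read_text(encoding="utf-8"))

    def set_weight(self, doc_id: str, multiplier: float) -> None:
        with self._lock:
            self._weights[doc_id] = multiplier
            # Write to a temp file first, then swap it in, so a crash mid-write
            # does not leave a truncated weights file behind.
            tmp = self._path.with_suffix(".tmp")
            tmp.write_text(json.dumps(self._weights, indent=2), encoding="utf-8")
            tmp.replace(self._path)

    def get_weight(self, doc_id: str) -> float:
        with self._lock:
            return self._weights.get(doc_id, 1.0)


with tempfile.TemporaryDirectory() as tmp_dir:
    weights_file = Path(tmp_dir) / "runtime_weights.json"
    PersistentWeights(weights_file).set_weight("cat1/repoA/path/file.py", 2.0)
    # A fresh instance (e.g. after a restart) reloads the saved value.
    print(PersistentWeights(weights_file).get_weight("cat1/repoA/path/file.py"))
```

Writing the whole file on every update keeps the sketch simple; a busy deployment might batch writes instead.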

13. Troubleshooting

Symptom	Cause	Fix
503 RAG service not initialized	`initialize_rag_service` not called / wrong paths	Call initializer with correct embeddings path
Empty search results	Embeddings not built / wrong path	Re-run `nancy-brain build`, verify index directory
macOS OpenMP crash	MKL / libomp duplicate	`KMP_DUPLICATE_LIB_OK=TRUE` (set early by default; export it yourself if the crash persists)
MCP tools not visible	Wrong path or PYTHONPATH	Use absolute paths in MCP config
CLI command not found	Package not installed	`pip install nancy-brain`

Enable debug logging:

export LOG_LEVEL=DEBUG

(LOG_LEVEL has an effect only if you add logic to read it; alternatively, run uvicorn with --log-level debug.)

14. Development & Contributing

# Clone and set up development environment
git clone <repo-url>
cd nancy-brain
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
black nancy_brain/ 
flake8 nancy_brain/

# Test CLI locally
nancy-brain --help

Releasing

Nancy Brain uses automated versioning and PyPI publishing:

# Bump patch version (0.1.0 → 0.1.1)
./release.sh patch

# Bump minor version (0.1.0 → 0.2.0)  
./release.sh minor

# Bump major version (0.1.0 → 1.0.0)
./release.sh major

This automatically:

Updates version numbers in pyproject.toml and nancy_brain/__init__.py
Creates a git commit and tag
Pushes to GitHub, triggering PyPI publication via GitHub Actions

Manual version management:

# See current version and bump options
bump-my-version show-bump

# Dry run (see what would change)
bump-my-version bump --dry-run patch

15. Roadmap (Optional)

Persistence layer for runtime weights
Additional retrieval filters (e.g. semantic rerank)
Auth plugin / token validation
VS Code extension
Package publishing to PyPI

16. License

See parent repository license.

17. Minimal Verification Script

# After build & run
curl -H 'Authorization: Bearer TEST' 'http://localhost:8000/health'

Expect JSON with status + trace_id.
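
For scripted checks, the same verification can be wrapped in a small polling loop that waits out the 503 phase during startup. The probe is injected so the sketch runs without a server; the function name, retry counts, and the demo payload values are assumptions.

```python
import json
import time
from typing import Callable, Tuple


def wait_until_healthy(
    probe: Callable[[], Tuple[int, str]],
    attempts: int = 5,
    delay: float = 1.0,
) -> dict:
    """Poll a health probe until it returns 200 with 'status' and 'trace_id'."""
    for attempt in range(attempts):
        code, body = probe()
        if code == 200:
            payload = json.loads(body)
            missing = {"status", "trace_id"} - payload.keys()
            if missing:
                raise ValueError(f"health payload missing fields: {sorted(missing)}")
            return payload
        if attempt < attempts - 1:
            time.sleep(delay)
    raise TimeoutError(f"service not healthy after {attempts} attempts")


# Fake probe: 503 on the first call, then 200, mimicking a service that is
# still initializing. The payload values are made up for the demo.
responses = iter([(503, ""), (200, '{"status": "ok", "trace_id": "abc123"}')])
print(wait_until_healthy(lambda: next(responses), delay=0.0))
```

A real probe would issue the GET /health request and return the status code and response body.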

Happy searching.

Release history

0.3.0 - Mar 8, 2026
0.2.1 - Mar 8, 2026
0.2.0 - Jan 17, 2026
0.1.10 - Nov 11, 2025
0.1.9 - Nov 11, 2025
0.1.8 - Nov 11, 2025
0.1.7 - Nov 3, 2025
0.1.6 - Nov 3, 2025
0.1.5 - Aug 28, 2025
0.1.3 - Aug 24, 2025
0.1.2 - Aug 24, 2025 (this version)
0.1.1 - Aug 24, 2025

File details

Details for the file nancy_brain-0.1.2.tar.gz.

File metadata

Download URL: nancy_brain-0.1.2.tar.gz
Upload date: Aug 24, 2025
Size: 2.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for nancy_brain-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`1cdc0c2fa80457b5a05a017aa43f19dff8c24dbd76fe11e16cca19586f510f37`
MD5	`ec7ce282166e56f2f4810d7fe08ffe76`
BLAKE2b-256	`e34095aa9e4db9020bf5a9e71e457267f101892073e698b4086f59df3e27452f`

Provenance

The following attestation bundles were made for nancy_brain-0.1.2.tar.gz:

Publisher: publish.yml on AmberLee2427/nancy-brain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nancy_brain-0.1.2.tar.gz
- Subject digest: 1cdc0c2fa80457b5a05a017aa43f19dff8c24dbd76fe11e16cca19586f510f37
- Sigstore transparency entry: 428046233
- Sigstore integration time: Aug 24, 2025
Source repository:
- Permalink: AmberLee2427/nancy-brain@fefd36da8668ff9226b45d18f7543cd7bba74ad6
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/AmberLee2427
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fefd36da8668ff9226b45d18f7543cd7bba74ad6
- Trigger Event: push

File details

Details for the file nancy_brain-0.1.2-py3-none-any.whl.

File metadata

Download URL: nancy_brain-0.1.2-py3-none-any.whl
Upload date: Aug 24, 2025
Size: 52.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for nancy_brain-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`79fe646b06d5cf1f06f4ae0aacd7eb0df0de60e3e3fd162eef018e0223e7136a`
MD5	`f3eaf0ab894361e1f685a366ade87224`
BLAKE2b-256	`9a0ecac6174478bb65e09313caf36dbd888f7e059b309b81ad292625d01618a5`

Provenance

The following attestation bundles were made for nancy_brain-0.1.2-py3-none-any.whl:

Publisher: publish.yml on AmberLee2427/nancy-brain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nancy_brain-0.1.2-py3-none-any.whl
- Subject digest: 79fe646b06d5cf1f06f4ae0aacd7eb0df0de60e3e3fd162eef018e0223e7136a
- Sigstore transparency entry: 428046242
- Sigstore integration time: Aug 24, 2025
Source repository:
- Permalink: AmberLee2427/nancy-brain@fefd36da8668ff9226b45d18f7543cd7bba74ad6
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/AmberLee2427
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fefd36da8668ff9226b45d18f7543cd7bba74ad6
- Trigger Event: push
