Bookmark Lens
A local-first MCP server for intelligent bookmark management with semantic search.
Save, search, and organize your bookmarks using AI-powered semantic search. Core features run on your machine with a local embedding model; no LLM is required.
Features
- Semantic Search - Find bookmarks by meaning, not just keywords
- Rich Metadata - Automatic extraction of titles, descriptions, and content
- Smart Mode - LLM-powered summaries, auto-tags, and topic classification (optional)
- Smart Tagging - Manual tags + auto-generated tags (Smart Mode)
- Topic Classification - Automatic categorization (Smart Mode)
- Date Filtering - Search by time ranges (natural language supported via LLM)
- Domain Filtering - Filter by website
- Local-First - All data stored locally (DuckDB + LanceDB)
- MCP Native - Works with Claude Desktop and other MCP clients
- Fast - Local embeddings with sentence-transformers
Quick Setup
Claude Desktop
- Open your Claude Desktop config file:
  - macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  - Windows: %APPDATA%\Claude\claude_desktop_config.json
- Add bookmark-lens to the mcpServers section:
{
  "mcpServers": {
    "bookmark-lens": {
      "command": "uvx",
      "args": ["bookmark-lens"]
    }
  }
}
- Restart Claude Desktop
That's it. uvx downloads and runs bookmark-lens on demand, so there is no separate installation step (this assumes uv is already installed).
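If you would rather script the config change than edit the file by hand, a minimal Python sketch might look like the following. The helper names `add_bookmark_lens` and `update_config_file` are hypothetical and not part of bookmark-lens; the entry itself is the one shown above.

```python
import json
from pathlib import Path


def add_bookmark_lens(config: dict) -> dict:
    """Return a copy of config with a bookmark-lens entry under mcpServers."""
    updated = dict(config)
    # Copy the existing servers so other entries are preserved untouched.
    servers = dict(updated.get("mcpServers", {}))
    servers["bookmark-lens"] = {"command": "uvx", "args": ["bookmark-lens"]}
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path) -> None:
    """Read the config (or start from an empty one), add the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_bookmark_lens(config), indent=2))
```

Calling `update_config_file` with the macOS or Windows path listed above would apply the change; keeping a backup copy of the file first is a sensible precaution.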
Other MCP Clients
For other MCP-compatible clients, use:
uvx bookmark-lens
Example Conversations
Save a bookmark:
You: Save https://docs.anthropic.com/en/docs/build-with-claude/model-context-protocol
with note "MCP documentation for building servers"
Claude: [Saves bookmark with automatic title/content extraction]
Search bookmarks:
You: Find bookmarks about AI agents from last week
Claude: [Converts "last week" to date range, searches semantically]
Search with filters:
You: Show me GitHub bookmarks about React
Claude: [Searches with domain filter and semantic query]
Update bookmarks:
You: Add tag "tutorial" to that bookmark
Claude: [Updates tags while preserving existing ones]
Architecture
bookmark-lens/
├── src/bookmark_lens/
│   ├── server.py                  # MCP server (stdio transport)
│   ├── config.py                  # Configuration management
│   ├── database/
│   │   ├── duckdb_client.py       # Relational data (bookmarks, tags)
│   │   └── lancedb_client.py      # Vector embeddings
│   ├── models/
│   │   └── bookmark.py            # Pydantic models
│   └── services/
│       ├── content_fetcher.py     # Web page fetching
│       ├── embedding_service.py   # Text → vectors
│       ├── bookmark_service.py    # Orchestration
│       └── search_service.py      # Hybrid search
├── data/                          # Local databases (gitignored)
└── tests/
    └── manual_test.py             # End-to-end testing
Technology Stack
- MCP SDK - Model Context Protocol for AI integration
- DuckDB - Relational database (bookmarks, metadata, tags)
- LanceDB - Vector database (embeddings for semantic search)
- sentence-transformers - Local embedding model (all-MiniLM-L6-v2)
- readability-lxml - Content extraction from web pages
- Pydantic - Data validation and serialization
MCP Tools
save_bookmark
Save a URL with optional note and tags.
Parameters:
- url (required): URL to bookmark
- note (optional): Context or reason for saving
- tags (optional): List of tags
Example:
{
"url": "https://example.com/article",
"note": "Great explanation of embeddings",
"tags": ["ai", "ml", "tutorial"]
}
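For context, MCP clients send tool arguments like these inside a JSON-RPC 2.0 `tools/call` request. The sketch below only shows how such a message is shaped; the helper name `tool_call_request` is hypothetical, and in practice an MCP client library handles the handshake and transport for you.

```python
import json
from itertools import count

# Simple incrementing request ids; any unique id per request would do.
_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


request = tool_call_request(
    "save_bookmark",
    {
        "url": "https://example.com/article",
        "note": "Great explanation of embeddings",
        "tags": ["ai", "ml", "tutorial"],
    },
)
```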
search_bookmarks
Search bookmarks semantically with optional filters.
Parameters:
- query (required): What to search for
- domain (optional): Filter by domain (e.g., "github.com")
- tags (optional): Filter by tags
- from_date (optional): ISO 8601 date string
- to_date (optional): ISO 8601 date string
- limit (optional): Max results (default: 10)
Example:
{
"query": "machine learning tutorials",
"domain": "github.com",
"tags": ["python"],
"from_date": "2024-11-07T00:00:00Z",
"limit": 5
}
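A client that builds these arguments itself may want to check them before calling the tool. The following is a sketch under assumptions: `search_arguments` and `parse_iso8601` are hypothetical helpers, and the checks shown are illustrative rather than a description of how the server validates input.

```python
from datetime import datetime


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat only accepts 'Z' from Python 3.11 onward,
    # so normalize it to an explicit offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def search_arguments(query, domain=None, tags=None,
                     from_date=None, to_date=None, limit=10):
    """Build a search_bookmarks argument dict, omitting unset optional fields."""
    if not query:
        raise ValueError("query is required")
    for value in (from_date, to_date):
        if value is not None:
            parse_iso8601(value)  # raises ValueError on malformed input
    if from_date and to_date and parse_iso8601(from_date) > parse_iso8601(to_date):
        raise ValueError("from_date must not be after to_date")
    optional = {"domain": domain, "tags": tags,
                "from_date": from_date, "to_date": to_date}
    args = {"query": query, "limit": limit}
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```

Passing a phrase such as "last week" as `from_date` raises `ValueError` here, which mirrors the requirement that dates arrive as ISO 8601 strings.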
get_bookmark
Get full details about a bookmark by ID.
Parameters:
- id (required): Bookmark ID
update_bookmark
Update note and/or tags for a bookmark.
Parameters:
- id (required): Bookmark ID
- note (optional): New note
- tags (optional): Tags to add/replace
- tag_mode (optional): "replace" or "append" (default: "replace")
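The difference between the two tag_mode values can be sketched as follows. This is an illustration, not the server's code: `apply_tags` is a hypothetical helper, and de-duplicating while keeping first-seen order is an assumption about reasonable behavior.

```python
def apply_tags(existing: list[str], new: list[str], tag_mode: str = "replace") -> list[str]:
    """Combine tags according to tag_mode, dropping duplicates but keeping order."""
    if tag_mode == "replace":
        # The new tags become the complete tag list.
        return list(dict.fromkeys(new))
    if tag_mode == "append":
        # Existing tags are kept and new ones are added after them.
        return list(dict.fromkeys([*existing, *new]))
    raise ValueError(f"unknown tag_mode: {tag_mode!r}")
```

For a request like "Add tag tutorial to that bookmark", "append" is the mode that keeps the tags already present, since the default "replace" would discard them.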
delete_bookmark
Delete a bookmark and all its associated data.
Parameters:
- id (required): Bookmark ID
Example:
{
"id": "bkm_abc123"
}
list_tags
List all tags with their usage counts.
Parameters: None
Example Response:
{
  "success": true,
  "count": 5,
  "tags": [
    {"tag": "ai", "count": 20},
    {"tag": "python", "count": 15},
    {"tag": "tutorial", "count": 8}
  ]
}
get_bookmark_stats
Get statistics about your bookmark collection with optional filters.
Parameters:
- stat_type (optional): Type of statistics
  - "total" - Total count (default)
  - "by_domain" - Breakdown by domain
  - "by_topic" - Breakdown by topic
  - "by_tag" - Breakdown by tag
  - "by_date" - Activity over time
- domain (optional): Filter by domain
- topic (optional): Filter by topic
- tags (optional): Filter by tags
- from_date (optional): Filter after date (ISO 8601)
- to_date (optional): Filter before date (ISO 8601)
- limit (optional): For breakdown stats, top N results (default: 10)
Examples:
Total bookmarks:
{
"stat_type": "total"
}
Bookmarks saved this week:
{
"stat_type": "total",
"from_date": "2024-11-07T00:00:00Z"
}
Top domains:
{
"stat_type": "by_domain",
"limit": 5
}
AI bookmarks by domain:
{
"stat_type": "by_domain",
"topic": "AI"
}
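To make the breakdown semantics concrete, here is a rough sketch of what a "by_domain" query with a topic filter and a limit computes. The function name `stats_by_domain`, the in-memory bookmark dicts, and the output shape (modeled on the list_tags response) are all assumptions; the server presumably runs an equivalent aggregation in DuckDB.

```python
from collections import Counter
from urllib.parse import urlparse


def stats_by_domain(bookmarks, topic=None, limit=10):
    """Count bookmarks per domain, optionally restricted to one topic."""
    selected = (b for b in bookmarks if topic is None or b.get("topic") == topic)
    counts = Counter(urlparse(b["url"]).netloc for b in selected)
    # most_common returns the top `limit` domains, highest count first.
    return [{"domain": domain, "count": n} for domain, n in counts.most_common(limit)]
```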
Configuration
All configuration is via environment variables (.env file):
# Database paths
BOOKMARK_LENS_DB_PATH=./data/bookmark_lens.db
LANCE_DB_PATH=./data/embeddings.lance
# Embedding model
EMBEDDING_MODEL_NAME=all-MiniLM-L6-v2
EMBEDDING_DIMENSION=384
# Content fetching
BOOKMARK_LENS_FETCH_TIMEOUT=30
BOOKMARK_LENS_USER_AGENT=bookmark-lens/0.1.0
MAX_CONTENT_LENGTH=50000
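As an illustration of how these variables could be read, the sketch below loads them into a typed settings object. It is not the project's config.py: the `Settings` class and `load_settings` function are hypothetical, treating the values above as defaults is an assumption, and the timeout is assumed to be in seconds. It reads from the process environment only; loading a .env file would need an extra step such as the python-dotenv package.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_path: str
    lance_db_path: str
    embedding_model_name: str
    embedding_dimension: int
    fetch_timeout: int  # assumed to be seconds
    user_agent: str
    max_content_length: int


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    return Settings(
        db_path=env.get("BOOKMARK_LENS_DB_PATH", "./data/bookmark_lens.db"),
        lance_db_path=env.get("LANCE_DB_PATH", "./data/embeddings.lance"),
        embedding_model_name=env.get("EMBEDDING_MODEL_NAME", "all-MiniLM-L6-v2"),
        embedding_dimension=int(env.get("EMBEDDING_DIMENSION", "384")),
        fetch_timeout=int(env.get("BOOKMARK_LENS_FETCH_TIMEOUT", "30")),
        user_agent=env.get("BOOKMARK_LENS_USER_AGENT", "bookmark-lens/0.1.0"),
        max_content_length=int(env.get("MAX_CONTENT_LENGTH", "50000")),
    )
```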
Smart Mode (LLM Enhancements)
Enable Smart Mode to get automatic summaries, tags, and topic classification for your bookmarks.
Setup
- Choose an LLM model (see LiteLLM providers)
- Get an API key from your provider
- Add to .env:
  LLM_MODEL=claude-3-haiku-20240307
  LLM_API_KEY=your-api-key-here
- Restart the server
Recommended Models
- claude-3-haiku-20240307 - Fast, cheap, good quality (Anthropic) [Recommended]
- gpt-4o-mini - Fast, cheap (OpenAI)
- gpt-4o - Better quality, more expensive (OpenAI)
- claude-3-5-sonnet-20241022 - Best quality (Anthropic)
See LiteLLM documentation for 100+ supported models.
What Smart Mode Adds
- Auto-summaries: Short (1-2 sentences) and long (1 paragraph) summaries
- Auto-tags: 3-5 relevant tags automatically generated
- Topic classification: High-level category (AI, Cloud, Programming, Data, Security, DevOps, Design, Business, Science, Other)
- Better search: Summaries and topics included in embeddings for improved relevance
- Markdown extraction: Full content extracted as Markdown (preserves structure)
Cost Estimate
With claude-3-haiku-20240307: ~$0.0005 per bookmark (very cheap!)
Performance
- Core Mode (no LLM): Fast saves, only title/description extracted
- Smart Mode (with LLM): Slower saves (~5-10s), full content + enhancements
Note: Smart Mode is completely optional. All core features work without any LLM configuration.
Embedding Models
Default: all-MiniLM-L6-v2 (384 dimensions, fast, good quality)
Alternatives:
- all-mpnet-base-v2 (768 dimensions, better quality, slower)
- paraphrase-multilingual-MiniLM-L12-v2 (384 dimensions, multilingual)
Change in .env:
EMBEDDING_MODEL_NAME=all-mpnet-base-v2
EMBEDDING_DIMENSION=768
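Because the model name and the dimension have to agree, a small consistency check can catch a mismatched .env early. The sketch below uses only the three models and dimensions listed in this section; `check_embedding_config` is a hypothetical helper, not something bookmark-lens is known to provide.

```python
# Dimensions for the models named in this section.
KNOWN_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "paraphrase-multilingual-MiniLM-L12-v2": 384,
}


def check_embedding_config(model_name: str, dimension: int) -> None:
    """Raise ValueError if a known model is paired with the wrong dimension."""
    expected = KNOWN_DIMENSIONS.get(model_name)
    if expected is None:
        return  # unknown model: nothing to compare against
    if expected != dimension:
        raise ValueError(
            f"{model_name} produces {expected}-dimensional vectors, "
            f"but EMBEDDING_DIMENSION is {dimension}"
        )
```

Vectors from different models are not comparable, so switching models presumably also means re-embedding bookmarks that were saved earlier.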
How It Works
Saving a Bookmark
- Fetch - Downloads the web page
- Extract - Pulls out title, description, main content (Markdown in Smart Mode)
- Enhance - Generates summaries, tags, topic (Smart Mode only)
- Embed - Converts text to vector using local model
- Store - Saves to DuckDB (metadata) and LanceDB (vector)
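To give a feel for the Extract step, here is a simplified sketch that pulls a page title and meta description out of HTML using only the standard library. It is a stand-in for illustration: the project lists readability-lxml for content extraction, and the `MetadataParser` class and `extract_metadata` function are hypothetical names.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect <title> text and the description <meta> tag from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_metadata(html: str) -> dict:
    """Return the title and description found in an HTML document."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "description": parser.description}
```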
Searching Bookmarks
- Embed Query - Converts search text to vector
- Vector Search - Finds similar bookmarks (LanceDB)
- Filter - Applies domain/tag/date filters (DuckDB)
- Rank - Sorts by similarity score
- Return - Top N results with relevance scores
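The search steps above can be sketched with plain Python and toy vectors. This is a simplified model under assumptions: the real server delegates vector search to LanceDB and filtering to DuckDB, cosine similarity is assumed as the similarity measure, requiring all requested tags to match is an assumption, and the `search` function and bookmark dicts are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, bookmarks, domain=None, tags=None, limit=10):
    """Score every bookmark, apply filters, rank by score, return the top N."""
    # Vector search: score each bookmark against the embedded query.
    scored = [
        (cosine_similarity(query_vector, b["vector"]), b) for b in bookmarks
    ]
    # Filter: keep bookmarks matching the domain and carrying all requested tags.
    kept = [
        (score, b)
        for score, b in scored
        if (domain is None or b["domain"] == domain)
        and (not tags or set(tags) <= set(b["tags"]))
    ]
    # Rank and return: highest similarity first, truncated to `limit`.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [{"id": b["id"], "score": score} for score, b in kept[:limit]]
```

Real embeddings from all-MiniLM-L6-v2 have 384 dimensions rather than the two or three you would use in a toy example, but the ranking logic is the same.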
Natural Language Dates
The LLM (via the bookmark_search_guide prompt) converts natural language to ISO dates. For example, if today is 2024-11-14:
- "yesterday" → 2024-11-13T00:00:00Z
- "last week" → 2024-11-07T00:00:00Z
- "last month" → 2024-10-14T00:00:00Z
The server only accepts ISO 8601 format - the LLM does the conversion.
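A client that has no LLM in the loop would need to do this conversion itself. The sketch below is one way to do it, not what the bookmark_search_guide prompt does internally: `relative_start` and `months_ago` are hypothetical helpers, only three phrases are handled, and the reference time is assumed to be in UTC.

```python
import calendar
from datetime import datetime, timedelta, timezone


def months_ago(moment: datetime, months: int = 1) -> datetime:
    """Step back whole calendar months, clamping the day to the month's length."""
    month_index = moment.year * 12 + (moment.month - 1) - months
    year, month = divmod(month_index, 12)
    month += 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def relative_start(phrase: str, now: datetime) -> str:
    """Convert a supported phrase to an ISO 8601 from_date (assumes now is UTC)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    starts = {
        "yesterday": midnight - timedelta(days=1),
        "last week": midnight - timedelta(weeks=1),
        "last month": months_ago(midnight),
    }
    return starts[phrase].strftime("%Y-%m-%dT%H:%M:%SZ")


now = datetime(2024, 11, 14, 9, 30, tzinfo=timezone.utc)
print(relative_start("last week", now))  # 2024-11-07T00:00:00Z
```

The returned string can be passed directly as from_date to search_bookmarks or get_bookmark_stats.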
Development
Want to contribute? See CONTRIBUTING.md for setup instructions.
Running Tests
# Clone the repository
git clone https://github.com/yourusername/bookmark-lens.git
cd bookmark-lens
# Install in development mode
pip install -e ".[dev]"
# Run tests
python tests/test_simple.py
Troubleshooting
"Model not found" error
The first run downloads the embedding model (~80MB). This is normal and happens once.
"Database locked" error
Close any other processes using the database. DuckDB allows only one process at a time to open a database file for writing.
Search returns no results
- Check if bookmarks were saved successfully
- Try a broader query
- Verify embedding model loaded correctly
Slow first search
The embedding model loads on first use. Subsequent searches are fast.
Roadmap
Phase 2 (Smart Mode - Future)
- LLM-powered summaries
- Auto-tagging
- Topic classification
- Query expansion
Future Features
- Browser history import
- Browser extension
- Export/import bookmarks
- Bookmark collections
- Sharing capabilities
License
MIT License - see LICENSE file for details.
Contributing
Contributions welcome! Please:
- Check TASKS.md for current status
- Follow existing code style (minimal, focused implementations)
- Add tests for new features
- Update documentation
Credits
Built with:
- MCP SDK by Anthropic
- DuckDB - Fast analytical database
- LanceDB - Vector database
- sentence-transformers - Embedding models
- readability-lxml - Content extraction