
Bookmark Lens

A local-first MCP server for intelligent bookmark management with semantic search.

Save, search, and organize your bookmarks using AI-powered semantic search. Core features run locally and need no LLM; network access is only needed to fetch the pages you save and, on first run, to download the embedding model.


Features

  • 🔍 Semantic Search - Find bookmarks by meaning, not just keywords
  • 📝 Rich Metadata - Automatic extraction of titles, descriptions, and content
  • 🤖 Smart Mode - LLM-powered summaries, auto-tags, and topic classification (optional)
  • 🏷️ Smart Tagging - Manual tags + auto-generated tags (Smart Mode)
  • 📊 Topic Classification - Automatic categorization (Smart Mode)
  • 📅 Date Filtering - Search by time ranges (natural language supported via LLM)
  • 🌐 Domain Filtering - Filter by website
  • 💾 Local-First - All data stored locally (DuckDB + LanceDB)
  • 🤖 MCP Native - Works with Claude Desktop and other MCP clients
  • ⚡ Fast - Local embeddings with sentence-transformers

Quick Setup

Claude Desktop

  1. Open your Claude Desktop config file:

    • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
    • Windows: %APPDATA%\Claude\claude_desktop_config.json
  2. Add bookmark-lens to the mcpServers section:

{
  "mcpServers": {
    "bookmark-lens": {
      "command": "uvx",
      "args": ["bookmark-lens"]
    }
  }
}
  3. Restart Claude Desktop

That's it! There is no separate install step: uvx (part of the uv package manager, which must be installed) downloads and runs bookmark-lens on demand.
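If you prefer to script step 2, the sketch below merges the same mcpServers entry into an existing config file while keeping any other servers and settings already there. `add_bookmark_lens` is a hypothetical helper, not part of bookmark-lens; back up the file first and pass the path for your OS listed above.

```python
import json
from pathlib import Path


def add_bookmark_lens(config_path: Path) -> dict:
    """Add the bookmark-lens entry to a Claude Desktop config (hypothetical helper).

    Existing top-level keys and other MCP servers are preserved.
    """
    config = {}
    if config_path.exists() and config_path.read_text().strip():
        config = json.loads(config_path.read_text())
    servers = config.setdefault("mcpServers", {})
    # Same entry as the manual JSON snippet: run the package via uvx.
    servers["bookmark-lens"] = {"command": "uvx", "args": ["bookmark-lens"]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Usage (macOS):
# add_bookmark_lens(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")
```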

Other MCP Clients

For other MCP-compatible clients, use:

uvx bookmark-lens

Example Conversations

Save a bookmark:

You: Save https://docs.anthropic.com/en/docs/build-with-claude/model-context-protocol 
     with note "MCP documentation for building servers"

Claude: [Saves bookmark with automatic title/content extraction]

Search bookmarks:

You: Find bookmarks about AI agents from last week

Claude: [Converts "last week" to date range, searches semantically]

Search with filters:

You: Show me GitHub bookmarks about React

Claude: [Searches with domain filter and semantic query]

Update bookmarks:

You: Add tag "tutorial" to that bookmark

Claude: [Appends the tag (tag_mode "append") so existing tags are preserved]

Architecture

bookmark-lens/
├── src/bookmark_lens/
│   ├── server.py              # MCP server (stdio transport)
│   ├── config.py              # Configuration management
│   ├── database/
│   │   ├── duckdb_client.py   # Relational data (bookmarks, tags)
│   │   └── lancedb_client.py  # Vector embeddings
│   ├── models/
│   │   └── bookmark.py        # Pydantic models
│   └── services/
│       ├── content_fetcher.py # Web page fetching
│       ├── embedding_service.py # Text → vectors
│       ├── bookmark_service.py # Orchestration
│       └── search_service.py  # Hybrid search
├── data/                      # Local databases (gitignored)
└── tests/
    └── manual_test.py         # End-to-end testing

Technology Stack

  • MCP SDK - Model Context Protocol for AI integration
  • DuckDB - Relational database (bookmarks, metadata, tags)
  • LanceDB - Vector database (embeddings for semantic search)
  • sentence-transformers - Local embedding model (all-MiniLM-L6-v2)
  • readability-lxml - Content extraction from web pages
  • Pydantic - Data validation and serialization

MCP Tools

save_bookmark

Save a URL with optional note and tags.

Parameters:

  • url (required): URL to bookmark
  • note (optional): Context or reason for saving
  • tags (optional): List of tags

Example:

{
  "url": "https://example.com/article",
  "note": "Great explanation of embeddings",
  "tags": ["ai", "ml", "tutorial"]
}
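MCP clients invoke these tools with a JSON-RPC 2.0 tools/call request, and over the stdio transport each message is a single line of JSON. The sketch below builds that line for the example above. The helper name `tool_call_line` and the request id are illustrative; in practice your MCP client (for example Claude Desktop) sends this for you after the initialize handshake.

```python
import json


def tool_call_line(request_id: int, tool: str, arguments: dict) -> str:
    """Build one newline-delimited JSON-RPC 2.0 tools/call message (illustrative)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # stdio transport: one compact JSON object per line, no embedded newlines.
    return json.dumps(message, separators=(",", ":")) + "\n"


line = tool_call_line(
    1,
    "save_bookmark",
    {
        "url": "https://example.com/article",
        "note": "Great explanation of embeddings",
        "tags": ["ai", "ml", "tutorial"],
    },
)
```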

search_bookmarks

Search bookmarks semantically with optional filters.

Parameters:

  • query (required): What to search for
  • domain (optional): Filter by domain (e.g., "github.com")
  • tags (optional): Filter by tags
  • from_date (optional): ISO 8601 date string
  • to_date (optional): ISO 8601 date string
  • limit (optional): Max results (default: 10)

Example:

{
  "query": "machine learning tutorials",
  "domain": "github.com",
  "tags": ["python"],
  "from_date": "2024-11-07T00:00:00Z",
  "limit": 5
}
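Because the date filters must be ISO 8601 strings, a client can validate arguments before calling the tool. The sketch below is a hypothetical client-side helper (`build_search_args` is not part of bookmark-lens). It assumes timestamps without an offset mean UTC, rejects a from_date later than to_date, and drops unset filters.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_iso(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Assumption: timestamps without an offset are UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def build_search_args(query: str, domain: Optional[str] = None,
                      tags: Optional[list] = None, from_date: Optional[str] = None,
                      to_date: Optional[str] = None, limit: int = 10) -> dict:
    """Assemble search_bookmarks arguments, dropping unset filters."""
    if not query.strip():
        raise ValueError("query is required")
    if limit < 1:
        raise ValueError("limit must be positive")
    # parse_iso raises ValueError for anything that is not ISO 8601.
    parsed = {name: parse_iso(value)
              for name, value in (("from_date", from_date), ("to_date", to_date))
              if value}
    if len(parsed) == 2 and parsed["from_date"] > parsed["to_date"]:
        raise ValueError("from_date must not be after to_date")
    args = {"query": query, "domain": domain, "tags": tags,
            "from_date": from_date, "to_date": to_date, "limit": limit}
    return {key: value for key, value in args.items() if value is not None}
```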

get_bookmark

Get full details about a bookmark by ID.

Parameters:

  • id (required): Bookmark ID

update_bookmark

Update note and/or tags for a bookmark.

Parameters:

  • id (required): Bookmark ID
  • note (optional): New note
  • tags (optional): Tags to add/replace
  • tag_mode (optional): "replace" or "append" (default: "replace")
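The two tag_mode values differ only in how new tags combine with the existing ones. A minimal sketch of that merge, assuming tags are compared case-sensitively and duplicates are dropped while keeping first-seen order (the server's exact behaviour may differ; `merge_tags` is a hypothetical name):

```python
def merge_tags(existing: list, new: list, tag_mode: str = "replace") -> list:
    """Combine tags according to update_bookmark's tag_mode (illustrative)."""
    if tag_mode == "replace":
        combined = list(new)
    elif tag_mode == "append":
        combined = list(existing) + list(new)
    else:
        raise ValueError('tag_mode must be "replace" or "append"')
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(combined))
```

Because the default is "replace", adding "tutorial" to a bookmark tagged ["ai", "ml"] without losing those tags requires tag_mode "append".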

delete_bookmark

Delete a bookmark and all its associated data.

Parameters:

  • id (required): Bookmark ID

Example:

{
  "id": "bkm_abc123"
}

list_tags

List all tags with their usage counts.

Parameters: None

Example Response:

{
  "success": true,
  "count": 3,
  "tags": [
    {"tag": "ai", "count": 20},
    {"tag": "python", "count": 15},
    {"tag": "tutorial", "count": 8}
  ]
}
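Conceptually, the response is a per-tag count over all bookmarks, sorted by frequency. The sketch below reproduces that shape from an in-memory list of bookmarks. The input format (dicts with a "tags" key) and reading the top-level "count" as the number of distinct tags are assumptions; the server computes this in DuckDB.

```python
from collections import Counter


def list_tags(bookmarks: list) -> dict:
    """Count tag usage and return it in the list_tags response shape."""
    # set() so a tag repeated on one bookmark is counted once for it.
    counts = Counter(tag for b in bookmarks for tag in set(b.get("tags", [])))
    # Most used first; ties broken alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {
        "success": True,
        "count": len(ordered),
        "tags": [{"tag": tag, "count": n} for tag, n in ordered],
    }
```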

get_bookmark_stats

Get statistics about your bookmark collection with optional filters.

Parameters:

  • stat_type (optional): Type of statistics
    • "total" - Total count (default)
    • "by_domain" - Breakdown by domain
    • "by_topic" - Breakdown by topic
    • "by_tag" - Breakdown by tag
    • "by_date" - Activity over time
  • domain (optional): Filter by domain
  • topic (optional): Filter by topic
  • tags (optional): Filter by tags
  • from_date (optional): Filter after date (ISO 8601)
  • to_date (optional): Filter before date (ISO 8601)
  • limit (optional): For breakdown stats, top N results (default: 10)

Examples:

Total bookmarks:

{
  "stat_type": "total"
}

Bookmarks saved in the past week:

{
  "stat_type": "total",
  "from_date": "2024-11-07T00:00:00Z"
}

Top domains:

{
  "stat_type": "by_domain",
  "limit": 5
}

AI bookmarks by domain:

{
  "stat_type": "by_domain",
  "topic": "AI"
}
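A by_domain breakdown amounts to grouping bookmarks by host name and keeping the `limit` largest groups. The sketch below shows this over plain URLs; stripping a leading "www." and the output key names are assumptions about how the server normalises and reports domains.

```python
from collections import Counter
from urllib.parse import urlparse


def by_domain(urls: list, limit: int = 10) -> list:
    """Top-`limit` domains by bookmark count (illustrative)."""
    def domain(url: str) -> str:
        host = (urlparse(url).hostname or "").lower()
        return host[4:] if host.startswith("www.") else host

    counts = Counter(domain(u) for u in urls)
    return [{"domain": d, "count": n} for d, n in counts.most_common(limit)]
```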

Configuration

All configuration is done through environment variables, which can be placed in a .env file:

# Database paths
BOOKMARK_LENS_DUCKDB_PATH=./data/bookmark_lens.db
BOOKMARK_LENS_LANCEDB_PATH=./data/embeddings.lance

# Embedding model
EMBEDDING_MODEL_NAME=all-MiniLM-L6-v2
EMBEDDING_DIMENSION=384

# Content fetching
BOOKMARK_LENS_FETCH_TIMEOUT=30
BOOKMARK_LENS_USER_AGENT=bookmark-lens/0.1.0
MAX_CONTENT_LENGTH=50000
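For reference, settings like these can be read with defaults as shown below. This is a sketch, not bookmark-lens's own config.py: it reads only process environment variables (loading the .env file itself would need a helper such as python-dotenv), and it assumes the values listed above are the defaults.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    duckdb_path: str
    lancedb_path: str
    embedding_model: str
    embedding_dimension: int
    fetch_timeout: int
    user_agent: str
    max_content_length: int


def load_settings(env: Mapping = os.environ) -> Settings:
    """Read the variables above, falling back to the values shown (assumed defaults)."""
    return Settings(
        duckdb_path=env.get("BOOKMARK_LENS_DUCKDB_PATH", "./data/bookmark_lens.db"),
        lancedb_path=env.get("BOOKMARK_LENS_LANCEDB_PATH", "./data/embeddings.lance"),
        embedding_model=env.get("EMBEDDING_MODEL_NAME", "all-MiniLM-L6-v2"),
        embedding_dimension=int(env.get("EMBEDDING_DIMENSION", "384")),
        fetch_timeout=int(env.get("BOOKMARK_LENS_FETCH_TIMEOUT", "30")),
        user_agent=env.get("BOOKMARK_LENS_USER_AGENT", "bookmark-lens/0.1.0"),
        max_content_length=int(env.get("MAX_CONTENT_LENGTH", "50000")),
    )
```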

Smart Mode (LLM Enhancements)

Enable Smart Mode to get automatic summaries, tags, and topic classification for your bookmarks.

Setup

  1. Choose an LLM model (see LiteLLM providers)
  2. Get an API key from your provider
  3. Add to .env:
    LLM_MODEL=claude-3-haiku-20240307
    LLM_API_KEY=your-api-key-here
    
  4. Restart the server

Recommended Models

  • claude-3-haiku-20240307 - Fast, cheap, good quality (Anthropic) [Recommended]
  • gpt-4o-mini - Fast, cheap (OpenAI)
  • gpt-4o - Better quality, more expensive (OpenAI)
  • claude-3-5-sonnet-20241022 - Best quality (Anthropic)

See LiteLLM documentation for 100+ supported models.

What Smart Mode Adds

  • Auto-summaries: Short (1-2 sentences) and long (1 paragraph) summaries
  • Auto-tags: 3-5 relevant tags automatically generated
  • Topic classification: High-level category (AI, Cloud, Programming, Data, Security, DevOps, Design, Business, Science, Other)
  • Better search: Summaries and topics included in embeddings for improved relevance
  • Markdown extraction: Full content extracted as Markdown (preserves structure)
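Since the topic must come from a fixed set, a model's free-text answer has to be mapped onto it. A minimal sketch of such a guard, assuming case-insensitive matching and a fallback to "Other" (how bookmark-lens actually validates the model's output is not documented here; `normalize_topic` is hypothetical):

```python
TOPICS = ("AI", "Cloud", "Programming", "Data", "Security",
          "DevOps", "Design", "Business", "Science", "Other")
_BY_LOWER = {topic.lower(): topic for topic in TOPICS}


def normalize_topic(raw: str) -> str:
    """Map a model's topic answer onto the allowed categories."""
    # Drop surrounding whitespace, trailing periods, and quotes before matching.
    cleaned = raw.strip().strip(".\"'").lower()
    return _BY_LOWER.get(cleaned, "Other")
```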

Cost Estimate

With claude-3-haiku-20240307: ~$0.0005 per bookmark (very cheap!)
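This figure is consistent with Claude 3 Haiku's list pricing of $0.25 per million input tokens and $1.25 per million output tokens. The sketch below shows the arithmetic; the per-bookmark token counts (about 1,500 in, 100 out) are assumptions for illustration, and real costs depend on page length and current pricing.

```python
def smart_mode_cost(input_tokens: int, output_tokens: int,
                    usd_per_m_in: float = 0.25, usd_per_m_out: float = 1.25) -> float:
    """Estimated USD cost of one LLM call (defaults: Claude 3 Haiku list prices)."""
    return (input_tokens * usd_per_m_in + output_tokens * usd_per_m_out) / 1_000_000


# Assumed usage per bookmark: ~1,500 prompt tokens and ~100 completion tokens.
per_bookmark = smart_mode_cost(1_500, 100)   # 0.000375 + 0.000125 = 0.0005
per_thousand = 1_000 * per_bookmark          # about $0.50 per 1,000 bookmarks
```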

Performance

  • Core Mode (no LLM): Fast saves, only title/description extracted
  • Smart Mode (with LLM): Slower saves (~5-10s), full content + enhancements

Note: Smart Mode is completely optional. All core features work without any LLM configuration.


Embedding Models

Default: all-MiniLM-L6-v2 (384 dimensions, fast, good quality)

Alternatives:

  • all-mpnet-base-v2 (768 dimensions, better quality, slower)
  • paraphrase-multilingual-MiniLM-L12-v2 (384 dimensions, multilingual)

Change in .env:

EMBEDDING_MODEL_NAME=all-mpnet-base-v2
EMBEDDING_DIMENSION=768
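EMBEDDING_DIMENSION must match the model's output size. The hypothetical check below encodes the dimensions of the three models named above; with sentence-transformers installed, SentenceTransformer(name).get_sentence_embedding_dimension() reports the same number. Vectors from different models are not comparable, so switching models likely means re-embedding existing bookmarks.

```python
KNOWN_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "paraphrase-multilingual-MiniLM-L12-v2": 384,
}


def check_embedding_config(model_name: str, dimension: int) -> None:
    """Raise if EMBEDDING_DIMENSION disagrees with a known model's output size."""
    expected = KNOWN_DIMENSIONS.get(model_name)
    # Unknown models are not checked here.
    if expected is not None and expected != dimension:
        raise ValueError(
            f"{model_name} produces {expected}-dimensional vectors, "
            f"but EMBEDDING_DIMENSION={dimension}"
        )
```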

How It Works

Saving a Bookmark

  1. Fetch - Downloads the web page
  2. Extract - Pulls out title, description, main content (Markdown in Smart Mode)
  3. Enhance - Generates summaries, tags, topic (Smart Mode only)
  4. Embed - Converts text to vector using local model
  5. Store - Saves to DuckDB (metadata) and LanceDB (vector)
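As an illustration of the Extract step, the sketch below pulls the page title and meta description using only the standard library's html.parser. bookmark-lens itself uses readability-lxml for content extraction, so treat this as a simplified stand-in rather than its implementation.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the <title> text and <meta name="description"> content."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html: str) -> dict:
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "description": parser.description}
```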

Searching Bookmarks

  1. Embed Query - Converts search text to vector
  2. Vector Search - Finds similar bookmarks (LanceDB)
  3. Filter - Applies domain/tag/date filters (DuckDB)
  4. Rank - Sorts by similarity score
  5. Return - Top N results with relevance scores
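Steps 2 to 5 can be sketched in memory with plain Python: cosine similarity stands in for LanceDB's vector search and list filtering for the DuckDB step, while step 1 is assumed done (the query vector is passed in). The record fields, the helper names, and the rule that a bookmark must carry all requested tags are assumptions for illustration.

```python
import math
from typing import Optional


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list, bookmarks: list, domain: Optional[str] = None,
           tags: Optional[list] = None, limit: int = 10) -> list:
    """Score, filter, rank, and return the top `limit` (score, url) pairs."""
    # Step 2: similarity of the query embedding to every stored vector.
    scored = [(cosine(query_vec, b["vector"]), b) for b in bookmarks]
    # Step 3: metadata filters (exact domain; bookmark must carry all tags).
    if domain:
        scored = [(s, b) for s, b in scored if b["domain"] == domain]
    if tags:
        scored = [(s, b) for s, b in scored if set(tags) <= set(b["tags"])]
    # Step 4: highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Step 5: top N results with their relevance scores.
    return [(round(s, 4), b["url"]) for s, b in scored[:limit]]
```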

Natural Language Dates

The LLM, guided by the server's bookmark_search_guide prompt, converts natural-language dates to ISO 8601 (the examples assume today is 2024-11-14):

  • "yesterday" → 2024-11-13T00:00:00Z
  • "last week" → 2024-11-07T00:00:00Z
  • "last month" → 2024-10-14T00:00:00Z

The server only accepts ISO 8601 dates; the LLM does the conversion.
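To illustrate the conversion the LLM is asked to perform, the sketch below maps the three phrases to UTC start-of-day timestamps relative to a given day. The phrase table and the "same day last month" rule are assumptions chosen to reproduce the examples above; the server itself never runs this.

```python
from datetime import date, timedelta


def phrase_to_iso(phrase: str, today: date) -> str:
    """Convert a few relative phrases to ISO 8601 UTC start-of-day strings."""
    phrase = phrase.strip().lower()
    if phrase == "yesterday":
        start = today - timedelta(days=1)
    elif phrase == "last week":
        start = today - timedelta(days=7)
    elif phrase == "last month":
        # Same day of the previous month, clamped for shorter months.
        year, month = (today.year, today.month - 1) if today.month > 1 else (today.year - 1, 12)
        first_of_next = date(year + month // 12, month % 12 + 1, 1)
        last_day = (first_of_next - timedelta(days=1)).day
        start = date(year, month, min(today.day, last_day))
    else:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    return start.strftime("%Y-%m-%dT00:00:00Z")
```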


Development

Want to contribute? See CONTRIBUTING.md for setup instructions.

Running Tests

# Clone the repository
git clone https://github.com/yourusername/bookmark-lens.git
cd bookmark-lens

# Install in development mode
pip install -e ".[dev]"

# Run tests
python tests/test_simple.py

Troubleshooting

"Model not found" error

The first run downloads the embedding model (~80MB). This is normal and happens once.

"Database locked" error

DuckDB allows only one process at a time to open a database file for writing. Close any other processes using the database, such as a second bookmark-lens instance.

Search returns no results

  • Check if bookmarks were saved successfully
  • Try a broader query
  • Verify that the embedding model loaded correctly

Slow first search

The embedding model loads on first use. Subsequent searches are fast.


Roadmap

Phase 2 (Smart Mode - Future)

  • LLM-powered summaries
  • Auto-tagging
  • Topic classification
  • Query expansion

Future Features

  • Browser history import
  • Browser extension
  • Export/import bookmarks
  • Bookmark collections
  • Sharing capabilities

License

MIT License - see LICENSE file for details.


Contributing

Contributions welcome! Please:

  1. Check TASKS.md for current status
  2. Follow existing code style (minimal, focused implementations)
  3. Add tests for new features
  4. Update documentation

Credits

Built with:



Download files


Source Distribution

bookmark_lens-0.0.3.tar.gz (39.5 kB)

Uploaded Source

Built Distribution


bookmark_lens-0.0.3-py3-none-any.whl (32.6 kB)

Uploaded Python 3

File details

Details for the file bookmark_lens-0.0.3.tar.gz.

File metadata

  • Download URL: bookmark_lens-0.0.3.tar.gz
  • Upload date:
  • Size: 39.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bookmark_lens-0.0.3.tar.gz
Algorithm Hash digest
SHA256 23763d54da34c9178d05c68918cbf083a2e350773df1deff0c03cc4a2285bfc7
MD5 6330d082c2dcf5665c4204152e247b1d
BLAKE2b-256 208b6c600e32f71eeaceb10e1f9c83fbe8e3311e1d3b1f71c9089973968d41ad
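To check a downloaded file against the SHA256 digests listed here, hash it locally and compare. A minimal sketch with hashlib; the file name in the usage comment assumes the sdist was downloaded to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_SDIST = "23763d54da34c9178d05c68918cbf083a2e350773df1deff0c03cc4a2285bfc7"
# Usage: sha256_of(Path("bookmark_lens-0.0.3.tar.gz")) == EXPECTED_SDIST
```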


File details

Details for the file bookmark_lens-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: bookmark_lens-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 32.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bookmark_lens-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 1534c07fa11c2f6ec76a84759608f3219573d33768dd285ea40de00d7629c776
MD5 b91c4e2a49cd508b5902d5da1aa02130
BLAKE2b-256 5aded0c27e607e05193bb79139b3b36f58b4f3679a4990860cc8b34613ee7faa

