BookmarkSummarizer
BookmarkSummarizer is a tool that crawls your browsers' bookmarks, generates summaries using large language models, and turns them into a personal knowledge base. Search and reuse all your bookmarked web resources without manual organization. It supports all common desktop browsers (Chrome, Firefox, Edge, Safari) as well as less common ones (Chromium, Brave, Vivaldi, Opera, etc.).
✨ Key Features
- 🔍 Smart Bookmark Crawling: Automatically extract content from your browsers' bookmarks by fetching the web page behind each bookmarked URL.
- 🤖 AI Summary Generation: Create high-quality summaries for each bookmark using large language models
- 🚀 Fast and scalable full-text fuzzy search: Fuzzy search indexing and retrieval based on Whoosh, supporting millions of bookmarks, all offline!
- 🔄 Parallel Processing: Efficient multi-threaded crawling to significantly reduce processing time
- 🌐 Multiple Model Support: Compatible with OpenAI, Deepseek, Qwen, and Ollama offline models
- 💾 Incremental Update And Checkpoint Recovery: Update the database with new bookmarks or continue processing after interruptions without losing completed work
- 📊 Detailed Logging: Clear progress and status reports for monitoring and debugging
- Made to scale: Start small with hundreds of bookmarks in an LMDB database under 10 MB, then grow through incremental updates to thousands of bookmarks (a few GB) or even millions of bookmarks (several TB). Because the LMDB database is out-of-core and stored on disk, crawling only loads a fraction of it in RAM, a few GB at most. The fuzzy search engine builds a separate, much smaller Whoosh index, so searching bookmark content, URLs, titles or summaries stays fast with a negligible RAM footprint.
- Modular architecture: Custom parsers can be added without modifying the core logic by adding Python files in custom_parsers. For example, the provided custom parsers extract YouTube transcripts as content to summarize, and transparently unsuspend bookmarked suspended tabs to fetch the true target page content.
🚀 Quick Start
Prerequisites
- Python 3.6+
- A supported desktop browser (Chrome, Firefox, Edge, Safari, etc.)
- Internet connection
- Large language model API key (optional)
Installation
Portable binaries
Head to the GitHub Releases page and pick the latest release. You will find precompiled binaries for Windows, macOS and Linux.
From PyPI
If you already have a Python installation, you can install this app with:
pip install --upgrade bookmark-summarizer
From source
- Clone the repository:
git clone https://github.com/lrq3000/BookmarkSummarizer.git
cd BookmarkSummarizer
- Install dependencies:
pip install -e .
- Make a TOML configuration file to fine-tune behavior (create a .toml file):
model_type="ollama" # options: openai, deepseek, qwen, ollama
api_key="your_api_key_here"
api_base="http://localhost:11434" # ollama local endpoint or other model api address
model_name="qwen3:1.7b" # or other supported model
max_tokens=1000
temperature=0.3
Usage
Fetch Bookmarks from Browsers
Fetch bookmarks from all browsers (default):
python index.py
This fetches bookmarks from all installed browsers (Chrome, Firefox, Edge, Safari, Opera, Brave, Vivaldi, etc.) using the browser-history module and saves them to bookmarks.json.
Fetch bookmarks from a specific browser:
python index.py --browser chrome
Supported browsers: chrome, firefox, edge, opera, opera_gx, safari, vivaldi, brave.
Fetch bookmarks from a custom profile path:
python index.py --browser chrome --profile-path "C:\Users\Username\AppData\Local\Google\Chrome\User Data\Profile 1"
This is useful when you have multiple Chrome profiles or custom browser installations.
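To illustrate what a profile path points to: Chromium-family browsers (Chrome, Edge, Brave, and similar) keep a JSON file named Bookmarks inside each profile folder. The sketch below walks that file with the standard library only. It is not how BookmarkSummarizer reads bookmarks (the tool relies on the browser-history module), and the function names are hypothetical. Firefox and Safari use different storage formats that this sketch does not cover.

```python
import json
from pathlib import Path


def iter_chromium_bookmarks(node, folder=""):
    """Recursively yield (folder, title, url) tuples from a bookmark tree node."""
    if node.get("type") == "url":
        yield folder, node.get("name", ""), node.get("url", "")
    for child in node.get("children", []):
        # Folder nodes carry children; extend the folder path while descending.
        sub = f"{folder}/{node.get('name', '')}".strip("/")
        yield from iter_chromium_bookmarks(child, sub)


def read_profile_bookmarks(profile_path):
    """Load the Bookmarks JSON file of a Chromium-style profile directory."""
    raw = (Path(profile_path) / "Bookmarks").read_text(encoding="utf-8")
    data = json.loads(raw)
    results = []
    for root in data.get("roots", {}).values():
        if isinstance(root, dict):  # skip non-folder entries stored next to the roots
            results.extend(iter_chromium_bookmarks(root))
    return results
```

Calling read_profile_bookmarks with the same folder you would pass to --profile-path returns a flat list of tuples, with the folder hierarchy preserved as a slash-separated string.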
Crawl and Summarize Bookmarks
Basic usage (crawl and summarize from all browsers):
python crawl.py
This fetches bookmarks from all browsers, crawls their content, generates AI summaries, and saves the results. Use the same command to update crawled bookmarks incrementally or to resume after an interruption: bookmarks that were already processed are skipped.
Crawl from a specific browser:
python crawl.py --browser firefox
Fetches and crawls bookmarks only from Firefox.
Crawl from a custom profile path:
python crawl.py --browser chrome --profile-path "/home/user/.config/google-chrome/Profile 1"
Combines browser selection with custom profile path.
Limit the number of bookmarks:
python crawl.py --limit 10
Processes only the first 10 bookmarks.
Set the number of parallel processing threads:
python crawl.py --workers 10
Uses 10 worker threads for parallel crawling (default: 20).
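As a rough illustration of what the --workers option controls, here is a minimal sketch of thread-pool crawling with the standard library. The names crawl_all and fetch are hypothetical and not taken from the project; fetch stands for any function that downloads one URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def crawl_all(urls, fetch, max_workers=20):
    """Fetch every URL with a pool of worker threads.

    `fetch` is any callable taking a URL and returning its content.
    Returns two dicts keyed by URL: successful results and failure reasons.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the reason instead of aborting the whole crawl.
                failures[url] = str(exc)
    return results, failures
```

Threads suit this workload because crawling is dominated by waiting on the network, so more workers mostly means more requests in flight at once. Lowering the count can help when sites start rate limiting.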
Skip summary generation:
python crawl.py --no-summary
Crawls content but skips AI summary generation.
Generate summaries from already crawled content:
python crawl.py --from-json
Generates summaries for existing bookmarks_with_content.json without re-crawling.
Search Through Bookmarks
Once your bookmarks are crawled, a bookmarks_with_content.json file will be present in the current folder. Then you can search through it with a fuzzy search engine:
Launch the search interface with index updates:
python fuzzy_bookmark_search.py --update-index
This launches a local web server with the search engine accessible through http://localhost:8132/ (the port can be changed via --port xxx). The search engine uses Whoosh to build a fast, on-disk, fuzzy searchable index.
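To give an intuition for what "fuzzy" means here, the sketch below matches misspelled query words against bookmark text using difflib from the standard library. This is a stand-in for illustration only: the actual search engine is built on Whoosh with an on-disk index, whereas this version scans every bookmark in memory. The function names and the 0.75 similarity cutoff are assumptions.

```python
import difflib
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fuzzy_search(query, bookmarks, cutoff=0.75):
    """Return the bookmarks in which every query word approximately matches a word."""
    query_words = tokenize(query)
    hits = []
    for bm in bookmarks:
        # Search across all fields (title, URL, content, summary, ...).
        words = set(tokenize(" ".join(str(v) for v in bm.values())))
        if all(difflib.get_close_matches(q, words, n=1, cutoff=cutoff)
               for q in query_words):
            hits.append(bm)
    return hits
```

With this sketch, a query such as "pyhton" still finds a bookmark titled "Python tutorial", because the two words are similar enough to pass the cutoff.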
Launch the search interface without updating the index:
python fuzzy_bookmark_search.py
Uses the existing index without rebuilding it.
Output Files
- bookmarks.json: Filtered bookmark list, compiled from all bookmarks fetched directly from the browsers.
- bookmark_index.lmdb: Folder of bookmark data with crawled content and AI-generated summaries, stored in an LMDB database.
- failed_urls.json: URLs that failed to crawl, with reasons.
- crawl_errors.log: Error log for the crawler. It records all errors, including those unrelated to unreachable bookmark content (e.g., software logic bugs).
- whoosh_index/: Directory containing the Whoosh search index files for the search engine.
📋 Detailed Features
Bookmark Crawling
BookmarkSummarizer automatically reads all bookmarks from your browsers' bookmark files and filters out ineligible URLs. It crawls web content with two strategies, complemented by site-specific custom parsers:
- Regular Crawling: Uses the Requests library to capture content from most web pages
- Dynamic Content Crawling: For dynamic webpages (such as Zhihu and other platforms), automatically switches to Selenium
- Modular architecture with custom parsers: For specific websites or content such as YouTube, custom parsers / adapters can be implemented in custom_parsers/ as separate .py modules that are automatically called to filter and process every bookmark. Each custom parser gets a full copy of the bookmark's metadata and can filter on any criterion: not only the URL, but also the content, the title, etc. For example, for YouTube, the transcript is downloaded and used as the content for summarization.
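The exact interface expected from a module in custom_parsers/ is not documented here, so the following is only a sketch of the idea under assumed conventions: a hypothetical parse_bookmark function that receives a bookmark's metadata as a dict, recognizes YouTube URLs, and returns a possibly modified copy. The function names and the video_id key are illustrative; check the parsers shipped with the project for the real signature.

```python
from urllib.parse import urlparse, parse_qs

YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def youtube_video_id(url):
    """Return the video ID of a YouTube watch or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def parse_bookmark(bookmark):
    """Hypothetical parser entry point.

    Bookmarks this parser does not handle are returned unchanged.
    """
    video_id = youtube_video_id(bookmark.get("url", ""))
    if video_id is None:
        return bookmark
    updated = dict(bookmark)  # work on a copy of the metadata
    # A real parser would download the transcript here and store it as content.
    updated["video_id"] = video_id
    return updated
```

Because the parser sees the whole metadata dict, the same pattern could filter on the title or on already crawled content instead of the URL.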
Summary Generation
BookmarkSummarizer uses advanced large language models to generate high-quality summaries for each bookmark content, including:
- Extracting key information and important concepts
- Preserving technical terms and key data
- Generating structured summaries for easier retrieval
- Supporting various mainstream large language models
- Supporting 100% offline generation via ollama for complete privacy
Tip: When using ollama, set the context window to 128k and pick a model that supports such a wide context, so that summaries cover the bookmark's whole full-text content without truncation. qwen3:4b supports a 256k context; for less powerful machines, qwen3:1.7b or qwen3:0.6b offer a 40k context. gemma3:1b (32k context) can also be interesting, but it tends to hallucinate when there is little full-text content.
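One way to request a larger context window from ollama is the num_ctx option of its REST API. The sketch below builds such a request for the /api/generate endpoint using only the standard library. It is not BookmarkSummarizer's own code: the function names and the prompt wording are assumptions, and 131072 is simply 128k expressed in tokens.

```python
import json
import urllib.request


def build_ollama_request(content, model="qwen3:4b", num_ctx=131072,
                         max_tokens=1000, temperature=0.3):
    """Build the JSON body for ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": f"Summarize the following web page:\n\n{content}",
        "stream": False,  # ask for one JSON response instead of a stream
        "options": {
            "num_ctx": num_ctx,         # context window size, in tokens
            "num_predict": max_tokens,  # cap on the number of generated tokens
            "temperature": temperature,
        },
    }


def summarize(content, api_base="http://localhost:11434", **kwargs):
    """Send the request to a running ollama server and return the summary text."""
    body = json.dumps(build_ollama_request(content, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{api_base}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

A larger num_ctx increases memory use on the machine running ollama, which is why the smaller models with a 40k context are a reasonable fallback on modest hardware.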
Checkpoint Recovery
- Saves progress immediately after processing each bookmark
- Automatically skips previously processed bookmarks when restarted
- Ensures data safety even when processing large numbers of bookmarks
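The checkpoint behavior listed above can be sketched as follows. This is a simplified illustration, not the project's implementation: BookmarkSummarizer stores its data in LMDB, while this version uses a plain JSON file as a stand-in, and the names process_with_checkpoint and process are hypothetical.

```python
import json
import os


def process_with_checkpoint(bookmarks, process, checkpoint_path):
    """Process bookmarks one by one, saving progress after each item."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            done = json.load(fh)  # results from a previous, possibly interrupted run
    for bm in bookmarks:
        url = bm["url"]
        if url in done:
            continue  # already processed in an earlier run: skip it
        done[url] = process(bm)
        # Write to a temporary file and rename it, so that a crash
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(done, fh)
        os.replace(tmp_path, checkpoint_path)
    return done
```

Rewriting the whole file after every bookmark becomes slow as the collection grows. A key-value store such as LMDB avoids this by writing only the new record, which is one reason it suits large bookmark sets better.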
📁 Output Files
- bookmarks.json: Filtered bookmark list
- bookmarks_with_content.json: Bookmark data with content and summaries
- failed_urls.json: Failed URLs and reasons
🔧 Custom Configuration
In addition to command-line parameters, you can set the following parameters through a .toml configuration file:
# model type settings
model_type="ollama" # openai, deepseek, qwen, ollama
api_key="your_api_key_here"
api_base="http://localhost:11434"
model_name="gemma3:1b"
# content processing settings
max_tokens=1024 # maximum number of tokens for summary generation
max_input_content_length=6000 # maximum length of input content
temperature=0.3 # randomness of summary generation
# crawler settings
bookmark_limit=0 # no limit by default
max_workers=20 # number of parallel worker threads
generate_summary=true # whether to generate summaries
🤝 Contributing
Pull Requests are welcome! For any issues or suggestions, please create an Issue.
Author
Originally created by wyj/sologuy.
Since November 2025, development of new features and maintenance have been carried out by Stephen Karl Larroque.
📄 License
This project is licensed under the Apache License 2.0.
Suggested complementary 3rd-party bookmarks tools
Here is a non-exhaustive list of open-source 3rd-party extensions and tools that complement BookmarkSummarizer:
- Search Bookmarks, History and Tabs: Fast bookmarks fuzzy search engine on URL and bookmark's title (not the full-page content). Chrome extension.
- Full text tabs forever (FTTF): Full-text search of historically visited pages. This has the advantage of causing no network overhead (no additional HTTP request is done, the pages you access are indexed on-the-fly), hence no risk of rate limiting/IP banning. Chrome extension.
- Floccus: Autosync bookmarks (and hence sessions if using InfiniTabs) between browsers (also works on mobile via native Floccus app on F-Droid or Mises or Cromite). Chrome extension.
- TidyMark: Reorganize/group bookmarks (supports cloud or offline ollama). Chrome extension.
- Wherewasi: Temporal and semantic tabs clustering into sessions using cloud Gemini AI. Chrome extension.
- LinkWarden or ArchiveBox: alternatives to BookmarkSummarizer to index/archive the full-text content pointed at by the bookmarks.