Skip to main content

Bookmark Toolkit. Helps manage, analyze, and visualize bookmarks

Project description

Bookmark Toolkit (btk)

A modern, database-first bookmark manager with powerful features for organizing, searching, and analyzing your bookmarks.

Features

  • ๐Ÿ—„๏ธ SQLite-based storage - Fast, reliable, and portable
  • ๐Ÿ“ฅ Multi-format import - HTML (Netscape), JSON, CSV, Markdown, plain text
  • ๐Ÿ“ค Multi-format export - HTML (hierarchical folders), JSON, CSV, Markdown
  • ๐Ÿ” Advanced search - Full-text search including cached content
  • ๐Ÿท๏ธ Hierarchical tags - Organize with nested tags (e.g., programming/python)
  • ๐Ÿค– Auto-tagging - NLP-powered automatic tag generation
  • ๐Ÿ“„ Content caching - Stores compressed HTML and markdown for offline access
  • ๐Ÿ“‘ PDF support - Extracts and indexes text from PDF bookmarks
  • ๐Ÿ”Œ Plugin system - Extensible architecture for custom features
  • ๐ŸŒ Browser integration - Import bookmarks and history from Chrome, Firefox, Safari
  • ๐Ÿ“Š Statistics & analytics - Track usage, duplicates, health scores
  • โšก Parallel processing - Fast bulk operations with multi-threading

Installation

pip install bookmark-tk

Quick Start

# Initialize default database
btk init

# Import bookmarks from HTML
btk import html bookmarks.html

# Search bookmarks
btk search "python"

# Add a bookmark
btk add https://example.com --title "Example Site" --tags tutorial,web

# List all bookmarks
btk list

# Export to various formats
btk export bookmarks.html html --hierarchical
btk export bookmarks.json json
btk export bookmarks.csv csv

Database Management

BTK uses a single SQLite database file (default: btk.db) instead of directory-based storage:

# Use default database (btk.db in current directory)
btk list

# Specify a different database
btk --db ~/bookmarks.db list

# Set default database in config
btk config set database.path ~/bookmarks.db

# Database operations
btk db info              # Show database statistics
btk db vacuum            # Optimize database
btk db export backup.db  # Export to new database

Core Commands

Import & Export

# Import from various formats
btk import html bookmarks.html          # Netscape HTML format
btk import json bookmarks.json          # JSON format
btk import csv bookmarks.csv            # CSV format
btk import markdown notes.md            # Extract links from markdown
btk import text urls.txt                # Plain text URLs

# Auto-detect format
btk import bookmarks.html               # Automatically detects format

# Import browser bookmarks
btk import chrome                       # Import from Chrome
btk import firefox --profile default    # Import from specific Firefox profile

# Export to various formats
btk export output.html html             # HTML with hierarchical folders
btk export output.json json             # JSON format
btk export output.csv csv               # CSV format
btk export output.md markdown           # Markdown with tag sections

Search & Query

# Basic search (title, URL, description)
btk search "machine learning"

# Search in cached content
btk search "neural networks" --in-content

# Advanced queries with JMESPath
btk query "[?stars == \`true\`].title"                    # Starred bookmarks
btk query "[?visit_count > \`5\`].{title: title, url: url}" # Frequently visited

# Filter by tags
btk tags filter programming/python      # Bookmarks with tag prefix

# List tags with statistics
btk tags list
btk tags tree                           # Show tag hierarchy
btk tags stats                          # Tag usage statistics

Bookmark Management

# Add bookmarks
btk add https://example.com --title "Example" --tags tutorial,reference
btk add https://paper.pdf --tags research,ml  # Automatically extracts PDF text

# Get bookmark details
btk get 42                              # Simple view
btk get 42 --details                    # Full details with metadata
btk get 42 --format json                # JSON output

# Update bookmarks
btk update 42 --title "New Title" --tags python,tutorial --stars true
btk update 42 --add-tags advanced --remove-tags beginner

# Delete bookmarks
btk delete 42
btk delete --tag-prefix old/            # Delete by tag prefix

Content & Caching

# Refresh cached content
btk refresh --id 42                     # Refresh specific bookmark
btk refresh --all                       # Refresh all bookmarks
btk refresh --all --workers 50          # Use 50 parallel workers

# View cached content
btk view 42                             # View markdown in terminal
btk view 42 --html                      # Open HTML in browser

# Auto-tag using cached content
btk auto-tag --id 42                    # Preview tags for bookmark
btk auto-tag --id 42 --apply            # Apply suggested tags
btk auto-tag --all --workers 100 --apply # Tag all bookmarks in parallel

Organization

# Tag operations
btk tags rename "old-tag" "new-tag"
btk tags merge tag1 tag2 tag3 --into merged-tag

# Deduplication
btk dedupe --strategy merge             # Merge duplicate metadata
btk dedupe --strategy keep_first        # Keep oldest bookmark
btk dedupe --strategy keep_most_visited # Keep most visited

# Statistics
btk stats                               # Database statistics
btk stats --tags                        # Tag statistics

Configuration

BTK supports configuration files for persistent settings:

# Show configuration
btk config show

# Set configuration values
btk config set database.path ~/bookmarks.db
btk config set output.format json
btk config set import.fetch_titles true

# Configuration file location: ~/.config/btk/config.toml

Advanced Features

PDF Support

BTK automatically extracts text from PDF bookmarks for search and auto-tagging:

btk add https://arxiv.org/pdf/2301.00001.pdf --tags research,ml
btk search "neural network" --in-content  # Searches PDF text
btk view 42                                # View extracted PDF text

Hierarchical Tags & Export

Organize bookmarks with hierarchical tags and export to browser-compatible HTML:

# Add bookmarks with hierarchical tags
btk add https://docs.python.org --tags programming/python/docs
btk add https://flask.palletsprojects.com --tags programming/python/web

# Export with folder structure
btk export bookmarks.html html --hierarchical

# Result: Nested folders in browser
# ๐Ÿ“ programming
#   ๐Ÿ“ python
#     ๐Ÿ“ docs
#       ๐Ÿ”– Python Documentation
#     ๐Ÿ“ web
#       ๐Ÿ”– Flask Documentation

Content Caching

BTK caches webpage content for offline access and full-text search:

  • Fetches HTML and converts to markdown
  • Compresses with zlib (70-80% compression ratio)
  • Extracts text from PDFs
  • Enables content-based search and auto-tagging
# Content is cached automatically when adding bookmarks
btk add https://example.com

# Manually refresh content
btk refresh --all --workers 50

# Search within cached content
btk search "specific phrase" --in-content

Plugin System

BTK has an extensible plugin architecture:

from btk.plugins import Plugin, PluginMetadata, PluginPriority

class MyPlugin(Plugin):
    def get_metadata(self) -> PluginMetadata:
        return PluginMetadata(
            name="my-plugin",
            version="1.0.0",
            description="Custom functionality",
            priority=PluginPriority.NORMAL
        )

    def on_bookmark_added(self, bookmark):
        # Custom logic when bookmark is added
        pass

Architecture

Modern Stack

  • Database: SQLAlchemy ORM with SQLite backend
  • Models: Bookmark, Tag, ContentCache, BookmarkHealth
  • CLI: argparse with Rich for beautiful terminal output
  • Testing: pytest with 59% code coverage (317 tests)
  • Content: HTML/Markdown conversion, zlib compression, PDF extraction

Database Schema

bookmarks
โ”œโ”€โ”€ id (primary key)
โ”œโ”€โ”€ unique_id (hash)
โ”œโ”€โ”€ url
โ”œโ”€โ”€ title
โ”œโ”€โ”€ description
โ”œโ”€โ”€ added (timestamp)
โ”œโ”€โ”€ stars (boolean)
โ”œโ”€โ”€ visit_count
โ”œโ”€โ”€ last_visited
โ””โ”€โ”€ reachable (boolean)

tags
โ”œโ”€โ”€ id
โ”œโ”€โ”€ name (unique)
โ”œโ”€โ”€ description
โ””โ”€โ”€ color

bookmark_tags (many-to-many)
โ”œโ”€โ”€ bookmark_id
โ””โ”€โ”€ tag_id

content_cache
โ”œโ”€โ”€ id
โ”œโ”€โ”€ bookmark_id (foreign key)
โ”œโ”€โ”€ html_content (compressed)
โ”œโ”€โ”€ markdown_content
โ”œโ”€โ”€ content_hash
โ”œโ”€โ”€ fetched_at
โ””โ”€โ”€ status_code

Code Organization

btk/
โ”œโ”€โ”€ cli.py              # Command-line interface
โ”œโ”€โ”€ db.py               # Database operations
โ”œโ”€โ”€ models.py           # SQLAlchemy models
โ”œโ”€โ”€ importers.py        # Import from various formats
โ”œโ”€โ”€ exporters.py        # Export to various formats
โ”œโ”€โ”€ content_fetcher.py  # Web content fetching & caching
โ”œโ”€โ”€ content_cache.py    # Content cache management
โ”œโ”€โ”€ plugins.py          # Plugin system
โ”œโ”€โ”€ tag_utils.py        # Tag operations
โ”œโ”€โ”€ dedup.py            # Deduplication
โ”œโ”€โ”€ archiver.py         # Web archive integration
โ””โ”€โ”€ browser_import.py   # Browser bookmark import

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=btk --cov-report=term-missing

# Run specific test file
pytest tests/test_db.py -v

Test Coverage

  • Overall: 59.40% (317 tests, all passing)
  • Core modules: >80% coverage
    • db.py: 90.77%
    • exporters.py: 92.45%
    • importers.py: 82.35%
    • utils.py: 88.57%
    • tag_utils.py: 95.67%
    • dedup.py: 88.24%
    • plugins.py: 90.07%

Roadmap

Recently Completed โœ…

  • SQLAlchemy-based database architecture
  • Content caching with compression
  • PDF text extraction
  • Auto-tagging with NLP
  • Hierarchical tag export
  • Parallel processing for bulk operations
  • Browser bookmark import
  • Plugin system

In Progress ๐Ÿšง

  • Enhanced search capabilities
  • Reading list management
  • Link rot detection with Wayback Machine

Planned Features ๐ŸŽฏ

  • Browser extensions (Chrome, Firefox)
  • MCP integration for AI-powered queries
  • Static site generator for bookmark collections
  • Similarity detection and recommendations
  • Full-text search with ranking
  • Bookmark relationship graphs
  • Social features (shared collections)

Migration from Legacy JSON Format

If you're upgrading from an older JSON-based version of BTK:

  1. The new version uses SQLite databases instead of JSON files
  2. Use btk import json old-bookmarks.json to migrate your data
  3. Legacy commands and directory-based storage are no longer supported
  4. All functionality is now database-first with improved performance

Contributing

Contributions are welcome! Areas for contribution:

  • Adding new importers/exporters
  • Creating plugins for custom functionality
  • Improving test coverage
  • Documentation improvements
  • Performance optimizations

See the plugin system for the easiest way to extend BTK without modifying core code.

License

MIT License - see LICENSE file for details.

Author

Developed by Alex Towell

Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bookmark_tk-0.6.0.tar.gz (101.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bookmark_tk-0.6.0-py3-none-any.whl (79.4 kB view details)

Uploaded Python 3

File details

Details for the file bookmark_tk-0.6.0.tar.gz.

File metadata

  • Download URL: bookmark_tk-0.6.0.tar.gz
  • Upload date:
  • Size: 101.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for bookmark_tk-0.6.0.tar.gz
Algorithm Hash digest
SHA256 872b509f65c7b42b9dae5c4cae30611884932352cb1b272977fce4bffcf7f2a7
MD5 a46b1b6c6d448f0b5a4eda7aba09b72f
BLAKE2b-256 b87ddf28aad4a1d5bc03098e400f6a20be8ef3bc28e9b1bbc240071ea693c34e

See more details on using hashes here.

File details

Details for the file bookmark_tk-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: bookmark_tk-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 79.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for bookmark_tk-0.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ccc8f266b872e2b270bc3ad792e05764c0e7f7f082feecc2139a732c59a3bce8
MD5 a445becf0aedf5d3051c4e13bbe2a269
BLAKE2b-256 96f1b91e2fb40a335e6db9d1ca96b0e2b8fea8ae923668c89a484e8ad395cbd1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page