Bookmark Toolkit. Helps manage, analyze, and visualize bookmarks

Bookmark Toolkit (btk)

A modern, database-first bookmark manager with powerful features for organizing, searching, and analyzing your bookmarks.

Features

  • 🗄️ SQLite-based storage - Fast, reliable, and portable
  • 📥 Multi-format import - HTML (Netscape), JSON, CSV, Markdown, plain text
  • 📤 Multi-format export - HTML (hierarchical folders), JSON, CSV, Markdown
  • 🔍 Advanced search - Full-text search including cached content
  • 🏷️ Hierarchical tags - Organize with nested tags (e.g., programming/python)
  • 🤖 Auto-tagging - NLP-powered automatic tag generation
  • 📄 Content caching - Stores compressed HTML and markdown for offline access
  • 📑 PDF support - Extracts and indexes text from PDF bookmarks
  • 🔌 Plugin system - Extensible architecture for custom features
  • 🌐 Browser integration - Import bookmarks and history from Chrome, Firefox, Safari
  • 📊 Statistics & analytics - Track usage, duplicates, health scores
  • ⚡ Parallel processing - Fast bulk operations with multi-threading

Installation

pip install bookmark-tk

Quick Start

# Start the interactive shell (recommended for exploration)
btk shell

# Or use direct CLI commands
btk bookmark add https://example.com --title "Example" --tags tutorial,web
btk bookmark list
btk bookmark search "python"

# Import and export
btk import html bookmarks.html
btk export bookmarks.html html --hierarchical

# Tag management
btk tag add my-tag 42          # Add tag to bookmark #42
btk tag list                   # List all tags
btk tag tree                   # Show tag hierarchy

Interactive Shell

BTK includes a powerful interactive shell with a virtual filesystem interface:

$ btk shell

btk:/$ ls
bookmarks  tags  starred  archived  recent  domains

btk:/$ cd tags
btk:/tags$ ls
programming/  research/  tutorial/  web/

btk:/tags$ cd programming/python
btk:/tags/programming/python$ ls
3298  4095  5124  5789  (bookmark IDs with this tag)

btk:/tags/programming/python$ cat 4095/title
Advanced Python Techniques

btk:/tags/programming/python$ star 4095
★ Starred bookmark #4095

btk:/tags/programming/python$ recent
# Shows recently visited bookmarks in this context

btk:/tags/programming/python$ cd /bookmarks/4095
btk:/bookmarks/4095$ pwd
/bookmarks/4095

btk:/bookmarks/4095$ tag data-science machine-learning
✓ Added tags to bookmark #4095

Shell Features

  • Virtual filesystem - Navigate bookmarks like files and directories
  • Hierarchical tags - Tags like programming/python/django create navigable folders
  • Context-aware commands - Commands adapt based on your current location
  • Unix-like interface - Familiar cd, ls, pwd, mv, cp commands
  • Tab completion - (planned) Auto-complete for commands and paths
  • Tag operations - Rename tags with mv old-tag new-tag
  • Bulk operations - Copy tags to multiple bookmarks with cp

Database Management

BTK uses a single SQLite database file (default: btk.db) instead of directory-based storage:

# Use default database (btk.db in current directory)
btk bookmark list

# Specify a different database
btk --db ~/bookmarks.db bookmark list

# Set default database in config
btk config set database.path ~/bookmarks.db

# Database operations
btk db info              # Show database statistics
btk db vacuum            # Optimize database
btk db export backup.db  # Export to new database

CLI Commands

BTK organizes commands into logical groups. Use btk <group> <command> syntax:

Bookmark Operations

# Add bookmarks
btk bookmark add https://example.com --title "Example" --tags tutorial,reference
btk bookmark add https://paper.pdf --tags research,ml  # Auto-extracts PDF text

# List and search
btk bookmark list                       # List all bookmarks
btk bookmark list --limit 10            # List first 10
btk bookmark search "machine learning"  # Search bookmarks
btk bookmark search "python" --in-content  # Search cached content

# Get bookmark details
btk bookmark get 42                     # Simple view
btk bookmark get 42 --details           # Full details
btk bookmark get 42 --format json       # JSON output

# Update bookmarks
btk bookmark update 42 --title "New Title" --tags python,tutorial --stars
btk bookmark update 42 --add-tags advanced --remove-tags beginner

# Delete bookmarks
btk bookmark delete 42
btk bookmark delete --filter-tags old/  # Delete by tag prefix

# Query with JMESPath
btk bookmark query "[?stars == \`true\`].title"  # Starred bookmarks
btk bookmark query "[?visit_count > \`5\`]"      # Frequently visited

Tag Management

# List tags
btk tag list                            # All tags
btk tag tree                            # Hierarchical tree view
btk tag stats                           # Usage statistics

# Tag operations
btk tag add my-tag 42 43 44             # Add tag to bookmarks
btk tag remove old-tag 42               # Remove tag from bookmark
btk tag rename old-tag new-tag          # Rename tag everywhere
btk tag copy source-tag 42              # Copy tag to bookmark
btk tag filter programming/python       # Filter by tag prefix

Import & Export

# Import from various formats
btk import html bookmarks.html          # Netscape HTML format
btk import json bookmarks.json          # JSON format
btk import csv bookmarks.csv            # CSV format
btk import markdown notes.md            # Extract links from markdown
btk import text urls.txt                # Plain text URLs

# Import browser bookmarks
btk import chrome                       # Import from Chrome
btk import firefox --profile default    # Import from Firefox profile

# Export to various formats
btk export output.html html --hierarchical  # HTML with folder structure
btk export output.json json                 # JSON format
btk export output.csv csv                   # CSV format
btk export output.md markdown               # Markdown with sections
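
To give a feel for what importing the Netscape HTML format involves, here is a minimal sketch that uses only Python's standard library. It is not BTK's importer: the class `NetscapeBookmarkParser`, the function `parse_netscape`, and the sample data are hypothetical names chosen for illustration. It assumes folders appear as an `<H3>` followed by a nested `<DL>`, which is the usual layout of browser exports.

```python
from html.parser import HTMLParser


class NetscapeBookmarkParser(HTMLParser):
    """Collect bookmarks and their folder path (hypothetical helper, not BTK API)."""

    def __init__(self):
        super().__init__()
        self.folders = []        # stack of currently open folder names
        self.bookmarks = []      # collected results
        self._pending = None     # folder name from <H3>, waiting for its <DL>
        self._dl_is_folder = []  # per open <DL>: did it open a folder?
        self._capture = None     # "h3" or "a" while collecting text
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "h3":
            self._capture, self._text = "h3", []
        elif tag == "a":
            self._capture, self._text = "a", []
            self._href = dict(attrs).get("href")
        elif tag == "dl":
            opens_folder = self._pending is not None
            if opens_folder:
                self.folders.append(self._pending)
                self._pending = None
            self._dl_is_folder.append(opens_folder)

    def handle_data(self, data):
        if self._capture:
            self._text.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._text).strip()
        if tag == "h3" and self._capture == "h3":
            self._pending, self._capture = text, None
        elif tag == "a" and self._capture == "a":
            self.bookmarks.append(
                {"url": self._href, "title": text, "folder": "/".join(self.folders)}
            )
            self._capture = None
        elif tag == "dl" and self._dl_is_folder:
            if self._dl_is_folder.pop():
                self.folders.pop()


def parse_netscape(html):
    parser = NetscapeBookmarkParser()
    parser.feed(html)
    parser.close()
    return parser.bookmarks


SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL><p>
  <DT><H3>programming</H3>
  <DL><p>
    <DT><H3>python</H3>
    <DL><p>
      <DT><A HREF="https://docs.python.org">Python Documentation</A>
    </DL><p>
  </DL><p>
  <DT><A HREF="https://example.com">Example</A>
</DL><p>
"""

for bm in parse_netscape(SAMPLE):
    print(bm["folder"] or "(no folder)", "->", bm["title"], bm["url"])
```

The folder path is joined with `/` so that it lines up with the hierarchical tag style used elsewhere in this document (for example programming/python).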

Content Operations

# Refresh cached content
btk content refresh --id 42             # Refresh specific bookmark
btk content refresh --all               # Refresh all bookmarks
btk content refresh --all --workers 50  # Use 50 parallel workers

# View cached content
btk content view 42                     # View markdown in terminal
btk content view 42 --html              # Open HTML in browser

# Auto-tag using content
btk content auto-tag --id 42            # Preview suggested tags
btk content auto-tag --id 42 --apply    # Apply suggested tags
btk content auto-tag --all --workers 100  # Tag all bookmarks
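
The code organization section describes auto-tagging as NLP/TF-IDF based. As a rough illustration of the TF-IDF idea only, the sketch below scores the words of one document against a small collection and returns the highest-scoring ones as tag candidates. The function names, the tokenizer, and the smoothed IDF formula are assumptions for this example and are not taken from BTK's auto_tag.py.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words of three or more letters."""
    return re.findall(r"[a-z]{3,}", text.lower())


def suggest_tags(docs, doc_id, top_n=3):
    """Rank the words of docs[doc_id] by TF-IDF (hypothetical helper).

    docs maps a bookmark ID to its cached text.
    """
    tokenized = {key: tokenize(text) for key, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    tf = Counter(tokenized[doc_id])
    total = sum(tf.values()) or 1

    # Smoothed IDF: a term found in every document scores exactly 0.
    scores = {
        term: (count / total) * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:top_n]


docs = {
    42: "Python decorators tutorial: decorators wrap functions in Python",
    43: "Baking bread tutorial with sourdough starter",
    44: "Gardening tutorial for tomato plants",
}
print(suggest_tags(docs, 42, top_n=2))
```

Words that occur in every document (here "tutorial") carry no weight, which is why they make poor tags for telling bookmarks apart.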

Database Operations

# Database info
btk db info                             # Show statistics
btk db stats                            # Detailed stats
btk db vacuum                           # Optimize database

# Deduplication
btk db dedupe --strategy merge          # Merge duplicate metadata
btk db dedupe --strategy keep_first     # Keep oldest bookmark
btk db dedupe --preview                 # Preview changes
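
The difference between the two strategies can be sketched in plain Python. This is an illustration under assumptions, not BTK's dedup.py: it treats bookmarks as dictionaries, considers two bookmarks duplicates when their URLs are identical, and assumes that merge means combining tags and filling in a missing title.

```python
from collections import defaultdict


def dedupe(bookmarks, strategy="merge"):
    """Collapse bookmarks that share a URL (hypothetical helper).

    Each bookmark is a dict with the keys id, url, title, tags, added.
    """
    if strategy not in ("merge", "keep_first"):
        raise ValueError(f"unknown strategy: {strategy}")

    groups = defaultdict(list)
    for bm in bookmarks:
        groups[bm["url"]].append(bm)

    result = []
    for dupes in groups.values():
        dupes = sorted(dupes, key=lambda bm: bm["added"])
        keeper = dict(dupes[0])  # the oldest entry survives in both strategies
        if strategy == "merge":
            # Union of all tags, plus the first non-empty title in the group.
            tags = set()
            for bm in dupes:
                tags.update(bm["tags"])
            keeper["tags"] = sorted(tags)
            keeper["title"] = next((bm["title"] for bm in dupes if bm["title"]), "")
        result.append(keeper)
    return result


sample = [
    {"id": 1, "url": "https://example.com", "title": "",
     "tags": ["web"], "added": "2024-01-01"},
    {"id": 2, "url": "https://example.com", "title": "Example",
     "tags": ["tutorial"], "added": "2024-02-01"},
    {"id": 3, "url": "https://docs.python.org", "title": "Python Docs",
     "tags": ["python"], "added": "2024-01-15"},
]

for strategy in ("merge", "keep_first"):
    print(strategy, dedupe(sample, strategy))
```

With keep_first the older entry is kept unchanged and the newer duplicate's metadata is discarded; with merge the surviving entry gains the tags and title of its duplicates.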

Configuration

btk config show                         # Show current config
btk config set database.path ~/bookmarks.db
btk config set output.format json

Shell

btk shell                               # Start interactive shell
btk shell --db ~/bookmarks.db           # Use specific database

Configuration

BTK supports configuration files for persistent settings:

# Show configuration
btk config show

# Set configuration values
btk config set database.path ~/bookmarks.db
btk config set output.format json
btk config set import.fetch_titles true

# Configuration file location: ~/.config/btk/config.toml

Advanced Features

PDF Support

BTK automatically extracts text from PDF bookmarks for search and auto-tagging:

btk bookmark add https://arxiv.org/pdf/2301.00001.pdf --tags research,ml
btk bookmark search "neural network" --in-content  # Searches PDF text
btk content view 42                                 # View extracted PDF text

Hierarchical Tags & Export

Organize bookmarks with hierarchical tags and export to browser-compatible HTML:

# Add bookmarks with hierarchical tags
btk bookmark add https://docs.python.org --tags programming/python/docs
btk bookmark add https://flask.palletsprojects.com --tags programming/python/web

# Export with folder structure
btk export bookmarks.html html --hierarchical

# Result: Nested folders in browser
# 📁 programming
#   📁 python
#     📁 docs
#       🔖 Python Documentation
#     📁 web
#       🔖 Flask Documentation
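
The mapping from slash-separated tags to nested folders can be sketched as a small tree builder. This is an illustrative sketch, not BTK's exporter; `build_tag_tree`, `render`, and the node layout are hypothetical.

```python
def new_node():
    """A folder node: named subfolders plus the bookmark titles filed here."""
    return {"folders": {}, "items": []}


def build_tag_tree(bookmarks):
    """Build a folder tree from (title, tags) pairs with slash-separated tags."""
    root = new_node()
    for title, tags in bookmarks:
        for tag in tags:
            node = root
            # Walk (and create) one folder level per tag component.
            for part in tag.split("/"):
                node = node["folders"].setdefault(part, new_node())
            node["items"].append(title)
    return root


def render(node, depth=0):
    """Return the tree as indented lines, folders first, then bookmarks."""
    pad = "  " * depth
    lines = []
    for name in sorted(node["folders"]):
        lines.append(f"{pad}{name}/")
        lines.extend(render(node["folders"][name], depth + 1))
    lines.extend(f"{pad}{title}" for title in node["items"])
    return lines


bookmarks = [
    ("Python Documentation", ["programming/python/docs"]),
    ("Flask Documentation", ["programming/python/web"]),
]
print("\n".join(render(build_tag_tree(bookmarks))))
```

A bookmark with several tags appears once under each tag's folder in this sketch; whether BTK's export does the same is not stated here.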

Content Caching

BTK caches webpage content for offline access and full-text search:

  • Fetches HTML and converts to markdown
  • Compresses with zlib (70-80% compression ratio)
  • Extracts text from PDFs
  • Enables content-based search and auto-tagging

# Content is cached automatically when adding bookmarks
btk bookmark add https://example.com

# Manually refresh content
btk content refresh --all --workers 50

# Search within cached content
btk bookmark search "specific phrase" --in-content
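
The compression step can be illustrated with Python's standard zlib module. This is a minimal sketch rather than BTK's content_cache.py: the function names are hypothetical, the dictionary keys borrow column names from the content_cache table in the schema section, and the use of SHA-256 for the content hash is an assumption.

```python
import hashlib
import zlib


def cache_entry(html):
    """Compress a page and record how much space was saved (hypothetical helper)."""
    raw = html.encode("utf-8")
    compressed = zlib.compress(raw, 9)  # 9 = best compression, slowest
    return {
        "html_content": compressed,
        # Assumption: SHA-256 of the uncompressed bytes, for change detection.
        "content_hash": hashlib.sha256(raw).hexdigest(),
        # Fraction of the original size that compression removed.
        "saved": 1 - len(compressed) / len(raw) if raw else 0.0,
    }


def load_html(entry):
    """Restore the original page text from a cache entry."""
    return zlib.decompress(entry["html_content"]).decode("utf-8")


page = "<html><body>" + "<p>hello bookmark</p>" * 200 + "</body></html>"
entry = cache_entry(page)
print(f"{len(page)} characters -> {len(entry['html_content'])} bytes stored")
print(f"space saved: {entry['saved']:.0%}")
```

The sample page is highly repetitive, so it compresses far better than typical web pages would; actual savings depend on the content.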

Plugin System

BTK has an extensible plugin architecture:

from btk.plugins import Plugin, PluginMetadata, PluginPriority

class MyPlugin(Plugin):
    def get_metadata(self) -> PluginMetadata:
        return PluginMetadata(
            name="my-plugin",
            version="1.0.0",
            description="Custom functionality",
            priority=PluginPriority.NORMAL
        )

    def on_bookmark_added(self, bookmark):
        # Custom logic when bookmark is added
        pass

Architecture

Modern Stack

  • Database: SQLAlchemy ORM with SQLite backend
  • Models: Bookmark, Tag, ContentCache, BookmarkHealth, Collection
  • CLI: Grouped argparse structure with Rich for beautiful terminal output
  • Shell: Interactive REPL with virtual filesystem and context-aware commands
  • Testing: pytest with 515 tests, >80% coverage on core modules
  • Content: HTML/Markdown conversion, zlib compression, PDF extraction

Database Schema

bookmarks
├── id (primary key)
├── unique_id (hash)
├── url
├── title
├── description
├── added (timestamp)
├── stars (boolean)
├── visit_count
├── last_visited
└── reachable (boolean)

tags
├── id
├── name (unique)
├── description
└── color

bookmark_tags (many-to-many)
├── bookmark_id
└── tag_id

content_cache
├── id
├── bookmark_id (foreign key)
├── html_content (compressed)
├── markdown_content
├── content_hash
├── fetched_at
└── status_code
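
To show how the many-to-many relationship between bookmarks and tags can be queried, the sketch below recreates three of the tables with Python's sqlite3 module and looks up bookmarks by tag prefix. BTK defines its schema through SQLAlchemy models, so the column types, constraints, and the helper `urls_under_tag` here are assumptions for illustration; only the table and column names come from the schema above.

```python
import sqlite3

# Column types and constraints are guesses; names follow the schema listing.
SCHEMA = """
CREATE TABLE bookmarks (
    id INTEGER PRIMARY KEY,
    unique_id TEXT,
    url TEXT NOT NULL,
    title TEXT,
    description TEXT,
    added TEXT,
    stars INTEGER DEFAULT 0,
    visit_count INTEGER DEFAULT 0,
    last_visited TEXT,
    reachable INTEGER
);
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    description TEXT,
    color TEXT
);
CREATE TABLE bookmark_tags (
    bookmark_id INTEGER REFERENCES bookmarks(id),
    tag_id INTEGER REFERENCES tags(id),
    PRIMARY KEY (bookmark_id, tag_id)
);
"""


def urls_under_tag(conn, prefix):
    """URLs of bookmarks tagged with `prefix` or any tag nested below it.

    Note: '%' and '_' inside `prefix` would act as LIKE wildcards.
    """
    rows = conn.execute(
        """
        SELECT DISTINCT b.url
        FROM bookmarks b
        JOIN bookmark_tags bt ON bt.bookmark_id = b.id
        JOIN tags t ON t.id = bt.tag_id
        WHERE t.name = ? OR t.name LIKE (? || '/%')
        ORDER BY b.url
        """,
        (prefix, prefix),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO bookmarks (id, url, title) VALUES (?, ?, ?)",
    [
        (1, "https://docs.python.org", "Python Documentation"),
        (2, "https://flask.palletsprojects.com", "Flask Documentation"),
        (3, "https://example.com", "Example"),
    ],
)
conn.executemany(
    "INSERT INTO tags (id, name) VALUES (?, ?)",
    [(1, "programming/python/docs"), (2, "programming/python/web"), (3, "tutorial")],
)
conn.executemany("INSERT INTO bookmark_tags VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])

print(urls_under_tag(conn, "programming/python"))
```

Because hierarchical tags are stored as plain names such as programming/python/docs, a prefix match on the tag name is enough to select a whole subtree in this sketch.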

Code Organization

btk/
├── cli.py              # Grouped command-line interface
├── shell.py            # Interactive shell with virtual filesystem
├── db.py               # Database operations
├── models.py           # SQLAlchemy models
├── graph.py            # Bookmark relationship graphs
├── importers.py        # Import from various formats
├── exporters.py        # Export to various formats
├── content_fetcher.py  # Web content fetching
├── content_cache.py    # Content cache management
├── content_extractor.py # Content extraction & parsing
├── auto_tag.py         # Auto-tagging with NLP/TF-IDF
├── plugins.py          # Plugin system
├── tag_utils.py        # Tag operations & hierarchies
├── dedup.py            # Deduplication strategies
├── archiver.py         # Web archive integration
└── browser_import.py   # Browser bookmark import

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=btk --cov-report=term-missing

# Run specific test file
pytest tests/test_db.py -v

Test Coverage

  • Overall: 515 tests, all passing ✅
  • Core modules: >80% coverage
    • graph.py: 97.28%
    • models.py: 96.62%
    • tag_utils.py: 95.67%
    • content_extractor.py: 93.63%
    • exporters.py: 92.45%
    • plugins.py: 90.07%
    • dedup.py: 88.24%
    • utils.py: 88.57%
    • db.py: 86.91%
  • Interface modules:
    • shell.py: 53.12% (69 tests)
    • cli.py: 23.11% (41 tests)
    • Lower coverage is expected for interactive/CLI code

Roadmap

Recently Completed ✅

  • Smart Collections & Time-Based Recent (v0.7.1)
    • 5 auto-updating smart collections (/unread, /popular, /broken, /untagged, /pdfs)
    • Time-based navigation with 6 periods × 3 activity types
    • Enhanced /recent with hierarchical structure
    • Collection counts in ls output
  • Interactive Shell with Virtual Filesystem (v0.7.0)
    • Unix-like navigation (cd, ls, pwd)
    • Hierarchical tag browsing
    • Context-aware commands
    • Tag operations (mv, cp)
  • Grouped CLI Structure - Organized commands by functionality
  • Comprehensive Test Suite - 515 tests with >50% shell coverage
  • SQLAlchemy-based database architecture
  • Content caching with compression
  • PDF text extraction
  • Auto-tagging with NLP
  • Hierarchical tag export
  • Parallel processing for bulk operations
  • Browser bookmark import
  • Plugin system

In Progress 🚧

  • Enhanced search capabilities
  • Reading list management
  • Link rot detection with Wayback Machine

Planned Features 🎯

  • Enhanced Domain Organization - Improved domain-based browsing and filtering
  • Bookmark Notes/Annotations - Rich text notes and annotations on bookmarks
  • User-Defined Collections - Custom smart collections via configuration
  • Browser extensions (Chrome, Firefox)
  • MCP integration for AI-powered queries
  • Static site generator for bookmark collections
  • Similarity detection and recommendations
  • Full-text search with ranking
  • Bookmark relationship graphs
  • Social features (shared collections)

Migration from Legacy JSON Format

If you're upgrading from an older JSON-based version of BTK:

  1. The new version uses SQLite databases instead of JSON files
  2. Use btk import json old-bookmarks.json to migrate your data
  3. Legacy commands and directory-based storage are no longer supported
  4. All functionality is now database-first with improved performance

Contributing

Contributions are welcome! Areas for contribution:

  • Adding new importers/exporters
  • Creating plugins for custom functionality
  • Improving test coverage
  • Documentation improvements
  • Performance optimizations

See the Plugin System section above for the easiest way to extend BTK without modifying core code.

License

MIT License - see LICENSE file for details.

Author

Developed by Alex Towell
