Bookworm - An LLM-powered bookmark search engine

bookworm 📖

LLM-powered bookmark search engine

bookworm lets you search your local browser bookmarks using natural language. It is meant for times when you have a large collection of bookmarks and can't quite remember where you saved the one website you need.

asciicast

In the example above, we search for the term “Japan.” While some results don’t explicitly mention the word, terms like “Osaka” appear because they are closely related to the search term based on OpenAI embeddings.
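To illustrate why a related term such as "Osaka" can rank highly for the query "Japan", here is a minimal sketch of similarity ranking over embedding vectors. The three-dimensional vectors are hand-made stand-ins chosen for illustration, not real OpenAI embeddings, and cosine similarity is assumed as the distance measure; the project may rank results differently.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (assumption): real embeddings have many more dimensions.
toy_embeddings = {
    "Japan": [0.9, 0.1, 0.0],
    "Osaka": [0.8, 0.3, 0.1],
    "pandas": [0.0, 0.2, 0.9],
}

query = toy_embeddings["Japan"]

# Rank every other term by how close its vector is to the query vector.
ranked = sorted(
    (name for name in toy_embeddings if name != "Japan"),
    key=lambda name: cosine_similarity(query, toy_embeddings[name]),
    reverse=True,
)
print(ranked)
```

With these toy values, "Osaka" ranks ahead of "pandas" because its vector points in nearly the same direction as the query vector, even though the two words share no text.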

Install

python -m pip install bookworm_genai

[!TIP] If you are using uvx then you can also just run this:

uvx --from bookworm_genai bookworm --help

Usage

export OPENAI_API_KEY=

# Run once, and then again any time bookmarks in a supported browser change
bookworm sync

# Sync bookmarks only from a specific browser
bookworm sync --browser-filter chrome

# Ask questions against the bookmark database
bookworm ask

# Ask questions against the bookmark database
# Specify the query when invoking the command
# If you omit this then you will be asked for a query when the tool is running
bookworm ask -q pandas

# Ask questions against the bookmark database and specify the number of results that should come back
bookworm ask -n 1

The sync process currently supports the following configurations:

Operating System Google Chrome Mozilla Firefox Brave Microsoft Edge
Linux
macOS
Windows

[!TIP] ✨ Want to contribute? See the adding an integration section.

Processes

bookworm sync

Vectorize your bookmarks across all supported browsers.

graph LR

subgraph Bookmarks
    Chrome(Chrome Bookmarks)
    Brave(Brave Bookmarks)
    Firefox(Firefox Bookmarks)
end

Bookworm(bookworm sync)

EmbeddingsService(Embeddings Service e.g OpenAIEmbeddings)

VectorStore(Vector Store e.g DuckDB)

Chrome -->|load bookmarks|Bookworm
Brave -->|load bookmarks|Bookworm
Firefox -->|load bookmarks|Bookworm

Bookworm -->|vectorize bookmarks|EmbeddingsService-->|store embeddings|VectorStore

The vector database depicted above is stored locally on your machine. You can check its location by running the following after installing this project:

from platformdirs import PlatformDirs

print(PlatformDirs('bookworm').user_data_dir)
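The load, vectorize, store flow described for bookworm sync can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `Bookmark`, `fake_embed`, and `sync` are hypothetical names, the embedding function is a toy character-bucket count standing in for an embeddings service, and a plain list stands in for the local vector store.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    title: str
    url: str
    browser: str


def fake_embed(text, dims=8):
    """Hypothetical stand-in for an embeddings service: character-bucket counts."""
    vector = [0.0] * dims
    for ch in text.lower():
        vector[ord(ch) % dims] += 1.0
    return vector


def sync(sources, embed=fake_embed):
    """Load bookmarks from every source, embed each one, and return the records."""
    store = []  # in-memory stand-in for the local vector store
    for browser, bookmarks in sources.items():
        for title, url in bookmarks:
            bookmark = Bookmark(title=title, url=url, browser=browser)
            store.append((bookmark, embed(f"{title} {url}")))
    return store


# Example input (assumption): bookmarks already read from each browser.
sources = {
    "chrome": [("pandas docs", "https://pandas.pydata.org")],
    "firefox": [("Osaka travel guide", "https://example.com/osaka")],
}
store = sync(sources)
print(len(store), "bookmarks stored")
```

Passing the embedding function as a parameter keeps the sketch testable offline; a real embeddings client could be swapped in at that point.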

bookworm ask

Search your bookmarks using natural language.

graph LR

query
Bookworm(bookworm ask)

subgraph _
    LLM(LLM e.g OpenAI)
    VectorStore(Vector Store e.g DuckDB)
end

query -->|user queries for information|Bookworm

Bookworm -->|similarity search|VectorStore -->|send similar docs + user query|LLM
LLM -->|send back response|Bookworm
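As a rough sketch of the ask flow in the diagram above, the following retrieves the closest stored entries for a query vector and passes them, together with the query, to a model. Everything here is illustrative: `similarity_search`, `build_prompt`, `ask`, and `echo_llm` are hypothetical names, the two-dimensional vectors are toy values, and the model is a stub that returns its prompt. The `n` parameter plays the role of the result count set with `-n`.

```python
def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


def similarity_search(query_vector, store, n=3):
    """Return the n stored documents whose vectors score highest against the query."""
    ranked = sorted(store, key=lambda item: dot(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:n]]


def build_prompt(query, docs):
    """Combine the retrieved bookmarks and the user query into one prompt string."""
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"Bookmarks:\n{context}\n\nQuestion: {query}"


def ask(query, query_vector, store, llm, n=3):
    """Retrieve similar documents, then hand them and the query to the model."""
    docs = similarity_search(query_vector, store, n=n)
    return llm(build_prompt(query, docs))


# Toy store (assumption): (document, vector) pairs with hand-made vectors.
store = [
    ("https://pandas.pydata.org", [0.0, 1.0]),
    ("https://example.com/osaka", [1.0, 0.0]),
    ("https://example.com/kyoto", [0.9, 0.1]),
]


def echo_llm(prompt):
    """Hypothetical stub: a real model would generate an answer here."""
    return prompt


answer = ask("places in Japan", [1.0, 0.0], store, echo_llm, n=2)
print(answer)
```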

bookworm export

Export your bookmarks from all supported browsers into an output file (e.g. CSV).

graph LR

VectorStore
Bookworm(bookworm export)
CSV(bookmarks.csv)

VectorStore -->|extract all bookmarks|Bookworm
Bookworm -->|export into file|CSV
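The export step in the diagram above amounts to writing the stored records out as rows. A minimal sketch using Python's standard `csv` module is shown below; the column names (`title`, `url`, `browser`) and the `export_bookmarks` helper are assumptions for illustration, and the actual export may use different fields.

```python
import csv
import io


def export_bookmarks(records, out):
    """Write bookmark records to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=["title", "url", "browser"])
    writer.writeheader()
    writer.writerows(records)


# Example records (assumption): what might be extracted from the vector store.
records = [
    {"title": "pandas docs", "url": "https://pandas.pydata.org", "browser": "chrome"},
    {"title": "Osaka travel guide", "url": "https://example.com/osaka", "browser": "firefox"},
]

# An in-memory buffer stands in for bookmarks.csv so the sketch has no side effects.
buffer = io.StringIO()
export_bookmarks(records, buffer)
print(buffer.getvalue())
```

To write a real file instead, open it with `open("bookmarks.csv", "w", newline="")` and pass the file object in place of the buffer.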

Developer Setup

# LLMs
export OPENAI_API_KEY=

# Langchain (optional, but useful for debugging)
export LANGCHAIN_API_KEY=
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_PROJECT=bookworm

# Misc (optional)
export LOGGING_LEVEL=INFO

Recommendations:

poetry env use 3.9 # or path to your 3.9 installation

poetry shell
poetry install

bookworm --help

Running Linux tests on macOS/Windows

If you are developing on a non-Linux machine, it may be helpful to use the provided Dockerfile to verify that the project works in a Linux environment.

You can build this via:

make docker_linux

You will need to have Docker installed to run this.

Adding an Integration

As you can see from the usage section, bookworm supports several integrations, but not all of them. If you find one that you want to support, then a change is needed inside integrations.py.

That file contains a variable called browsers that follows this structure:

browsers = {
    "BROWSER": {
        "PLATFORM": {
            ...
        }
    }
}

For example, to add Chrome support on Windows, you would go under the Chrome key and add a win32 key that holds all the details. You can refer to existing examples, but generally those details describe where to find the bookmarks on the user's system and how to interpret them.
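To make the nested browser and platform structure concrete, here is a small sketch of how such a mapping could be filled in and queried. The `bookmark_path` field, the placeholder paths, and the `find_integration` helper are all hypothetical; the real integrations.py defines its own keys and values, so check the existing entries before copying anything from here.

```python
import sys

# Hypothetical entries (assumption): placeholder details, not the project's real ones.
browsers = {
    "chrome": {
        "linux": {"bookmark_path": "/path/to/chrome/linux/Bookmarks"},
        # A new platform is added as another key under the browser.
        "win32": {"bookmark_path": "/path/to/chrome/windows/Bookmarks"},
    },
}


def find_integration(browser, platform=sys.platform, registry=browsers):
    """Return the details for a browser on a platform, or None if unsupported."""
    return registry.get(browser, {}).get(platform)


print(find_integration("chrome", "win32"))
print(find_integration("chrome", "darwin"))
```

Platform keys such as `win32`, `linux`, and `darwin` match the values Python reports in `sys.platform`, which is why the helper defaults to that value.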

You can also find a full list of the document loaders supported here.
