
A powerful personal search engine built on top of SQLite's FTS5.


Housaku (豊作)

Housaku is a powerful yet simple personal search engine built on top of SQLite's FTS5.

Features

  • Support for multiple file formats: Index files in a variety of formats, including:
    • Plain text files.
    • Markdown.
    • PDF.
    • EPUB.
    • DOCX.
  • Basic Web Scraping: In addition to personal files, you can also index posts from your favorite RSS/Atom feeds.
  • Parallel File Processing: Housaku uses multi-threading to process several files at once, which speeds up indexing.
  • Powered by SQLite's FTS5: Built on the full-text search capabilities of SQLite's FTS5 extension.
  • Relevant Results with BM25: Search results are ranked with the BM25 algorithm, so the most relevant matches appear first.
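To illustrate the last two features, here is a minimal sketch of an FTS5 table queried with BM25 ranking through Python's built-in sqlite3 module. The table name `docs`, its columns, and the helper functions are hypothetical and chosen for illustration only; this is not Housaku's actual schema or code. It assumes your Python's SQLite build includes the FTS5 extension.

```python
import sqlite3


def build_index(documents):
    """Create an in-memory FTS5 table and load (title, body) pairs into it.

    The table name and columns are illustrative, not Housaku's schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
    conn.executemany("INSERT INTO docs (title, body) VALUES (?, ?)", documents)
    conn.commit()
    return conn


def search(conn, query, limit=20):
    """Return the titles of matching documents, best match first.

    bm25() yields lower (more negative) scores for better matches,
    so an ascending sort puts the most relevant rows first.
    """
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]
```

Calling `search(conn, "search engine")` returns only documents that contain both terms, since FTS5 treats space-separated terms as an implicit AND.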

WIP

Housaku is an ongoing project, and several major features are in the pipeline, including:

  • A user-friendly Web interface.
  • A TUI for command-line enthusiasts.

Motivation

As someone who stores a wealth of documents on my hard drive—ranging from academic PDFs to personal notes in Obsidian—I often found it challenging to search across multiple applications and file types. I wanted a solution that would allow me to search not only my notes but also important books in my Calibre library and blog posts from my favorite feeds. This inspired me to build Housaku.

Install

Via pip

pip install housaku

Via pipx

pipx install housaku

Via uv

uv tool install housaku

# Or run it without installing

uvx housaku

Usage

Configuration

To start using Housaku, first edit the config.toml file located at $XDG_CONFIG_HOME/housaku/config.toml. This file is generated the first time you run housaku and looks something like this:

# Welcome! This is the configuration file for housaku.

[files]
# Directories to include for indexing.
# Example: include = ["/home/<user>/documents/notes"]
include = []

# Patterns to exclude from the indexing
# Example: exclude = ["*.tmp", "backup", "*.png"]
exclude = []

[feeds]
# List of RSS/Atom feeds to index
# Example: urls = ["https://example.com/feed", "https://anotherexample.com/rss"]
urls = []

Note: This directory also contains the SQLite database where all indexed data is stored.

To open your config.toml file, run the following command:

housaku config

Indexing

Once you have configured your directories and/or feeds, run the following command to start the indexing process.

housaku index

If you want to specify directories for indexing when running the index command, use the -i option. For example:

housaku index -i "/home/<user>/Documents/notes" -i "/home/<user>/Documents/vault/"
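As a rough picture of what multi-threaded file processing can look like, the sketch below reads several text files concurrently with a thread pool. It is a generic illustration with hypothetical function names, not Housaku's actual indexing code, and it handles plain text only.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_text_file(path):
    """Read one file and return (path, contents); undecodable bytes are replaced."""
    return path, Path(path).read_text(encoding="utf-8", errors="replace")


def read_files_in_parallel(paths, max_workers=4):
    """Read many files concurrently and return a {path: contents} mapping.

    Threads suit this workload because reading files is I/O-bound.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(read_text_file, paths))
```

A real indexer would parse each format (PDF, EPUB, DOCX) inside the worker and write the extracted text to the database afterwards.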

Search

To perform a search, run the following command:

housaku search --query "search engine"

# By default the limit is 20
housaku search --query "search engine" --limit 5
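To run searches from a script, you could wrap the command above with Python's subprocess module. This is a sketch with hypothetical helper names: it uses only the --query and --limit options shown above, assumes the housaku executable is on your PATH, and returns the raw output because the output format is not documented here.

```python
import subprocess


def build_search_command(query, limit=None):
    """Build the argument list for `housaku search`.

    Only the --query and --limit options shown in this README are used.
    """
    cmd = ["housaku", "search", "--query", query]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


def run_search(query, limit=None):
    """Run the search and return its raw standard output as text.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        build_search_command(query, limit),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the query contains spaces or quotes.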

Contributing

Contributions are welcome! If you have suggestions for improvements or new features, feel free to open an issue.
