A personal search engine.
Housaku (豊作)
Housaku is a powerful yet simple personal search engine that allows you to index and search a wide range of documents and posts.
Features
- Multi-Format Support: Index and search files in various formats, including:
  - Plain text files.
  - Markdown.
  - PDF.
  - EPUB.
  - DOCX.
- Feed Integration: Index content from your favorite RSS and Atom feeds.
- Fast search times: Optimized search algorithm for fast results.
- Incremental indexing: Housaku skips already indexed documents, allowing you to gradually build your corpus without having to worry about redundancy.
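The incremental indexing behaviour described above can be illustrated with a minimal sketch. It is not Housaku's actual implementation: the `documents` table, its columns, and the `index_documents` helper are hypothetical names chosen for the example, and the only assumption carried over from this README is that indexed data lives in SQLite.

```python
import sqlite3


def index_documents(conn: sqlite3.Connection, docs: dict[str, str]) -> int:
    """Store documents whose URI is not indexed yet; return how many were added."""
    # Hypothetical schema: one row per document, keyed by its URI.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (uri TEXT PRIMARY KEY, body TEXT)"
    )
    added = 0
    for uri, body in docs.items():
        seen = conn.execute(
            "SELECT 1 FROM documents WHERE uri = ?", (uri,)
        ).fetchone()
        if seen is not None:
            # Already indexed: skip it instead of storing a duplicate.
            continue
        conn.execute("INSERT INTO documents (uri, body) VALUES (?, ?)", (uri, body))
        added += 1
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
first = index_documents(conn, {"notes/a.md": "hello", "notes/b.md": "world"})
# The second run repeats notes/a.md, so only notes/c.md is new.
second = index_documents(conn, {"notes/a.md": "hello", "notes/c.md": "again"})
print(first, second)
```

Checking for the URI before inserting means a real indexer could also skip the expensive step of parsing a PDF or EPUB that is already in the database.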
WIP
Housaku is an ongoing project, and several major features are in the pipeline, including:
- A user-friendly Web UI.
- A TUI for command-line enthusiasts.
- Document and post update functionality for already indexed items.
- Optimization tweaks for faster indexing times.
- A better sorting algorithm for search results (maybe the BM25 algorithm).
- A query language for advanced search capabilities.
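For readers unfamiliar with BM25, which the list above mentions as a possible ranking algorithm, here is a small sketch of the standard Okapi BM25 formula. Housaku does not implement this yet; the `bm25_scores` function, the whitespace tokenizer, and the parameter defaults (`k1=1.5`, `b=0.75`) are assumptions made for illustration only.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document for a whitespace-tokenized query."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates with k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "search engine for personal notes",
    "a recipe for bread",
    "search search search engine",
]
scores = bm25_scores("search engine", docs)
print(scores)
```

Unlike plain keyword matching, BM25 rewards repeated query terms with diminishing returns and penalizes long documents, which tends to give a more useful ordering of results.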
Motivation
As someone who stores a wealth of documents on my hard drive—ranging from academic PDFs to personal notes in Obsidian—I often found it challenging to search across multiple applications and file types. I wanted a solution that would allow me to search not only my notes but also important books in my Calibre library and blog posts from my favorite feeds. This inspired me to build Housaku.
Install
Via pip
pip install housaku
Via pipx
pipx install housaku
Via uv
uv tool install housaku
# Or
uvx housaku
Usage
Configuration
To start using Housaku, first create a config.toml file at $XDG_CONFIG_HOME/housaku/config.toml. This folder will also contain the SQLite database where all the indexed data is stored.
Your configuration file should look something like this:
[files]
include = [
"/home/<your-username>/Documents/",
]
exclude = [
".git",
".obsidian",
".stfolder",
".stversions",
".trash",
"*.mobi"
]
[feeds]
urls = [
"http://blog.golang.org/feeds/posts/default",
"http://www.theverge.com/rss/full.xml",
"https://adrianroselli.com/feed",
"https://chriscoyier.net/feed/",
"https://dnlzrgz.com/rss/",
"https://textual.textualize.io/feed_rss_created.xml",
]
Indexing
Once you have configured your directories and/or feeds, run the following command to start the indexing process.
housaku index
Indexing may take anywhere from a few seconds to several minutes, depending on the size and type of files being indexed. You can begin searching as soon as some documents have been indexed, but it's advisable to allow some time for the initial indexing process. Housaku will skip documents and posts that have already been indexed, allowing you to build your corpus gradually.
Search
Currently, Housaku does not support a query language. It simply matches the keywords in your search terms against those saved in the database. To perform a search, use the following command:
housaku search --query "search engine"
# You can also limit the number of results
housaku search --query "search engine" --limit 5
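The keyword matching described above can be pictured with a rough sketch. This is not Housaku's real schema or ranking: the `postings` table, the `search` function, and the ordering by number of matched keywords are all assumptions for illustration, with `limit` playing the same role as the `--limit` option.

```python
import sqlite3


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[str]:
    """Return URIs of documents containing any query keyword, best matches first."""
    terms = sorted(set(query.lower().split()))
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    rows = conn.execute(
        f"""
        SELECT uri, COUNT(DISTINCT term) AS hits
        FROM postings
        WHERE term IN ({placeholders})
        GROUP BY uri
        ORDER BY hits DESC, uri
        LIMIT ?
        """,
        (*terms, limit),
    ).fetchall()
    return [uri for uri, _hits in rows]


# Hypothetical inverted index: one row per (keyword, document) occurrence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postings (term TEXT, uri TEXT)")
docs = {
    "notes/search.md": "building a personal search engine",
    "notes/bread.md": "a recipe for bread",
    "feeds/post-1": "engine maintenance tips",
}
for uri, body in docs.items():
    conn.executemany(
        "INSERT INTO postings VALUES (?, ?)",
        [(term, uri) for term in body.lower().split()],
    )

print(search(conn, "search engine"))
print(search(conn, "search engine", limit=1))
```

Only the placeholder list is interpolated into the SQL string; the keywords themselves are always passed as bound parameters.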
Contributing
Contributions are welcome! If you have suggestions for improvements or new features, feel free to open an issue.