Full-text search with zero thinking.

Roughsearch

A full-text search engine that tries to require as little thinking as possible.

Roughsearch is a lightweight full-text search engine built on DuckDB and BM25. It targets Japanese and English only. You can use it from the CLI or call it directly from Python.

So when you want to search a large pile of text files in a decent way, what do you do? Start by writing a Dockerfile? Spin up an Elasticsearch container? Install a morphological analysis plugin? Fire a huge number of API requests at it? Wait forever for indexing? No. Your life will end first.

Roughsearch gives up on flexibility completely. Fine-grained settings, grand scoring systems, and everything else are gone. This software has exactly one purpose: "search roughly." Run $ pip install roughsearch, and the environment is ready. Point a command at the target directory, and the indexing is done. That is all.

Architecture

Roughsearch indexes loaded documents with the following pipeline:

1. It reads the text and runs morphological analysis with Sudachi.
2. It extracts major terms from the result, mainly nouns, verbs, and adjectives.
3. It indexes the normalized form of each extracted term and a romaji version of that term.
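
The steps above can be sketched without Sudachi installed. This is only an illustration under stated assumptions: the `Token` tuples are hand-written stand-ins for analyzer output, and `KANA_TO_ROMAJI`, `to_romaji`, and `index_terms` are hypothetical names that are not part of Roughsearch.

```python
from typing import NamedTuple


class Token(NamedTuple):
    surface: str     # text as it appears in the document
    pos: str         # coarse part of speech
    normalized: str  # normalized (dictionary) form
    reading: str     # reading in hiragana (assumed to come from the analyzer)


# Parts of speech kept for indexing: nouns, verbs, and adjectives.
CONTENT_POS = {"noun", "verb", "adjective"}

# Deliberately tiny kana table; a real transliterator covers every kana and digraph.
KANA_TO_ROMAJI = {"あ": "a", "お": "o", "い": "i", "そ": "so", "ら": "ra", "か": "ka", "ぜ": "ze"}


def to_romaji(reading: str) -> str:
    """Transliterate one kana at a time; unknown characters pass through unchanged."""
    return "".join(KANA_TO_ROMAJI.get(ch, ch) for ch in reading)


def index_terms(tokens: list[Token]) -> list[tuple[str, str]]:
    """Return (normalized term, romaji term) pairs for content words only."""
    return [(t.normalized, to_romaji(t.reading)) for t in tokens if t.pos in CONTENT_POS]


# Hand-written tokens standing in for the analysis of "青いそらの風".
tokens = [
    Token("青い", "adjective", "青い", "あおい"),
    Token("そら", "noun", "空", "そら"),
    Token("の", "particle", "の", "の"),
    Token("風", "noun", "風", "かぜ"),
]
terms = index_terms(tokens)
print(terms)
```

The particle is dropped, and each remaining term is paired with its romaji form so that both can be indexed.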

At search time, matches on the original terms score higher than matches on their romaji forms, producing weighted best-match (BM25) results. All of this data is stored in a single .duckdb file, which makes the index easy to copy and move.
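
To make the weighting idea concrete, here is a minimal pure-Python sketch of BM25 over two term lists per document. It is not Roughsearch's implementation: the constants `K1` and `B` are common BM25 defaults, and the weights `1.0` and `0.5` are assumptions chosen only to show original terms outranking romaji terms.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # common BM25 defaults, not Roughsearch's settings


def bm25_scores(docs: dict[str, list[str]], query_terms: list[str]) -> dict[str, float]:
    """Score every document against the query terms with a common BM25 variant."""
    n_docs = len(docs)
    avg_len = sum(len(terms) for terms in docs.values()) / n_docs
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for terms in docs.values() for term in set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        score = 0.0
        for q in query_terms:
            tf = counts[q]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[q] + 0.5) / (doc_freq[q] + 0.5))
            norm = tf + K1 * (1 - B + B * len(terms) / avg_len)
            score += idf * tf * (K1 + 1) / norm
        scores[doc_id] = score
    return scores


def weighted_search(
    original: dict[str, list[str]],
    romaji: dict[str, list[str]],
    query_terms: list[str],
    w_original: float = 1.0,  # hypothetical weight for original terms
    w_romaji: float = 0.5,    # hypothetical, lower weight for romaji terms
) -> list[tuple[str, float]]:
    """Combine the per-field BM25 scores and sort by the combined score."""
    s_orig = bm25_scores(original, query_terms)
    s_roma = bm25_scores(romaji, query_terms)
    combined = {d: w_original * s_orig[d] + w_romaji * s_roma[d] for d in original}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Toy index: the same two documents, once as original terms and once as romaji.
original = {"doc-001": ["風", "透き通る"], "doc-002": ["空", "青い"]}
romaji = {"doc-001": ["kaze", "sukitooru"], "doc-002": ["sora", "aoi"]}

by_kanji = dict(weighted_search(original, romaji, ["風"]))
by_romaji = dict(weighted_search(original, romaji, ["kaze"]))
print(by_kanji["doc-001"], by_romaji["doc-001"])
```

With this toy data, the query "風" and the query "kaze" both find doc-001, but the romaji match contributes only half the score of the original-term match.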

Installation

$ pip install roughsearch

This software requires Python 3.11 or later.

Usage

Embedded Use

Use this when you want to add a simple full-text search engine to your own system.

import roughsearch

with roughsearch.Client("docs.duckdb", language="ja") as rs:
    rs.add("doc-001", title="いろはにほへと", body="あのイーハトーヴォのすきとおった風")
    rs.add("doc-002", title="ちりぬるを", body="夏でも底に冷たさをもつ青いそら")
    rs.reindex()

    results = rs.search("風")
    for hit in results.hits:
        print(hit.score, hit.title, hit.snippet)

CLI Server

Use this when you just want to get it running.
The server exposes a REST API that any frontend can use for search.

$ roughsearch init docs.duckdb --language ja
$ roughsearch add docs.duckdb ./docs
$ roughsearch serve docs.duckdb --port 8080

add, serve, search, and dump normally use the default language saved by init. If needed, you can temporarily override it with --language.

HTTP Client

Use this when you want to connect to a running Roughsearch server and query it.

import roughsearch

rs = roughsearch.HttpClient("http://localhost:8080")
results = rs.search("空")

CLI Reference

Commands

Command	Description
`init <db_path>`	Initialize and create a new database
`add <db_path> <path>`	Add documents from a directory and rebuild the index
`serve <db_path>`	Start the REST API server
`search <db_path> <query>`	Search from the command line
`reindex <db_path>`	Rebuild the FTS index, for example after adding documents
`reanalyze <db_path>`	Reanalyze stored documents with the current analyzer and rebuild the index, for example after a software update
`dump <db_path>`	Print stored documents as JSON to stdout
`stats <db_path>`	Show the document count
`inspect [text]`	Analyzer debugging command that prints tokenization results as JSON

Options

init

Option	Default	Description
`--language`	`ja`	Database analyzer language (`en` or `ja`)

add

Option	Default	Description
`--glob`	`None`	Glob pattern for target files such as `*.md`
`--language`	`None`	Temporarily override the language for added documents

serve

Option	Default	Description
`--language`	`None`	Temporarily override the default language used by the server
`--host`	`127.0.0.1`	Bind address
`--port`	`8080`	Port number

search

Option	Default	Description
`--language`	`None`	Language filter for the search
`--limit`	`20`	Maximum number of results

dump

Option	Default	Description
`--language`	`None`	Filter by language
`--limit`	`20`	Maximum number of output rows

inspect

Option	Default	Description
`--language`	`ja`	Analyzer language
`--title`	`""`	Text to analyze on the title side
`--file`	`None`	Read the body from a file. If set, it takes precedence over the positional `text` argument

Examples

Index and Search a Local Document Directory

$ pip install roughsearch

$ roughsearch init notes.duckdb --language ja
$ roughsearch add notes.duckdb ./notes --glob "*.md"
$ roughsearch search notes.duckdb "ニンジャ"

Embedded Python Use with Metadata and Filters

import roughsearch
from roughsearch.search.query import SearchFilters, SearchQuery

with roughsearch.Client("notes.duckdb", language="ja") as rs:
    # Store one document together with its metadata and source URI.
    rs.add(
        "note-001",
        title="いろはにほへと",
        body="あのイーハトーヴォのすきとおった風",
        metadata={"tags": ["note", "japanese"], "source": "handbook"},
        source_uri="handbook/note-001.md",
    )
    # Rebuild the index so the new document becomes searchable.
    rs.reindex()

    # Search with a tag filter, highlighted snippets, and at most 10 hits.
    results = rs.search(
        SearchQuery(
            query="風",
            filters=SearchFilters(tags=["note"]),
            highlight=True,
            limit=10,
        )
    )

Start the API Server and Search with curl

$ roughsearch serve docs.duckdb --port 8080 &

$ curl -s -X POST http://localhost:8080/documents \
  -H "Content-Type: application/json" \
  -d '{"id":"1","title":"いろはにほへと","body":"あのイーハトーヴォのすきとおった風"}'

$ curl -s -X POST http://localhost:8080/reindex

$ curl -s -X POST http://localhost:8080/search \
  -H "Content-Type: application/json" \
  -d '{"query":"風","limit":5}' | python -m json.tool
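
The same search request can also be sent from Python with only the standard library. The `/search` path and the JSON fields come from the curl example above; the helper names `build_search_request` and `search` are made up for this sketch, and `search` needs a running server.

```python
import json
import urllib.request


def build_search_request(base_url: str, query: str, limit: int = 5) -> urllib.request.Request:
    """Build the POST /search request shown in the curl example."""
    payload = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, query: str, limit: int = 5) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    request = build_search_request(base_url, query, limit)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network, so it can be inspected offline.
request = build_search_request("http://localhost:8080", "風")
print(request.get_method(), request.full_url)
```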

Bulk Add

import roughsearch

docs = [
    {"id": "1", "title": "いろはにほへと", "body": "あのイーハトーヴォのすきとおった風"},
    {"id": "2", "title": "ちりぬるを",  "body": "夏でも底に冷たさをもつ青いそら"},
]

with roughsearch.Client("bulk.duckdb") as rs:
    rs.add_documents(docs)
    rs.reindex()
    print(rs.search("風").total)

Output Format

{
  "query": "風",
  "total": 1,
  "hits": [
    {
      "id": "doc-001",
      "score": 8.512,
      "title": "いろはにほへと",
      "snippet": "あのイーハトーヴォのすきとおった<mark>風</mark>",
      "body": "あのイーハトーヴォのすきとおった風",
      "language": "ja",
      "source_uri": null,
      "heading_path": null,
      "parent_id": null,
      "chunk_id": null,
      "metadata": {}
    }
  ]
}
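
A response in this shape is easy to consume with the standard library. The sketch below picks out a few fields per hit and strips the `<mark>` tags for plain-text display; the `Hit` dataclass, `parse_hits`, and `strip_marks` are illustrative names, not part of the Roughsearch API, and the sample is a shortened copy of the output above.

```python
import json
from dataclasses import dataclass


@dataclass
class Hit:
    id: str
    score: float
    title: str
    snippet: str


def parse_hits(response_text: str) -> list[Hit]:
    """Pick a few fields out of each hit; all other keys are ignored."""
    data = json.loads(response_text)
    return [Hit(h["id"], h["score"], h["title"], h["snippet"]) for h in data["hits"]]


def strip_marks(snippet: str) -> str:
    """Remove the <mark> highlight tags for plain-text display."""
    return snippet.replace("<mark>", "").replace("</mark>", "")


sample = """{"query": "風", "total": 1, "hits": [{"id": "doc-001", "score": 8.512,
  "title": "いろはにほへと", "snippet": "あのイーハトーヴォのすきとおった<mark>風</mark>"}]}"""

hits = parse_hits(sample)
print(hits[0].id, hits[0].score, strip_marks(hits[0].snippet))
```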

REST API Endpoints

Method	Path	Description
`GET`	`/health`	Health check
`GET`	`/stats`	Document counts by language
`POST`	`/documents`	Add one document
`POST`	`/documents/bulk`	Add multiple documents
`GET`	`/documents/{id}`	Fetch a document by ID
`DELETE`	`/documents/{id}`	Soft-delete a document
`POST`	`/search`	Full-text search
`POST`	`/reindex`	Rebuild the FTS index
`POST`	`/optimize`	Run a DB checkpoint and compaction

Notes

Reindexing is required after writes. Documents added with add() are stored immediately, but they will not appear in search results until you call reindex(). This keeps bulk imports fast.
Assume a single writer. DuckDB does not allow multiple processes to write to the same database file at the same time. Run one server process and only one write operation at a time.
The server listens on localhost (127.0.0.1) by default. If you need external access, place it behind a reverse proxy such as nginx.
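
One way to follow the first two notes inside a single Python process is to funnel writes through a lock and reindex once per batch. The wrapper below is a hypothetical sketch, not something Roughsearch ships: `SerializedWriter` accepts any object with `add(...)` and `reindex()` methods, and `FakeClient` exists only so the example runs without a database.

```python
import threading


class SerializedWriter:
    """Serialize writes with one lock and reindex at most once per batch."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()
        self._dirty = False  # True when documents were added since the last reindex

    def add(self, doc_id, **fields):
        with self._lock:
            self._client.add(doc_id, **fields)
            self._dirty = True

    def flush(self):
        """Rebuild the index only if something changed."""
        with self._lock:
            if self._dirty:
                self._client.reindex()
                self._dirty = False


class FakeClient:
    """Stand-in client that records calls instead of touching a database."""

    def __init__(self):
        self.docs = {}
        self.reindex_calls = 0

    def add(self, doc_id, **fields):
        self.docs[doc_id] = fields

    def reindex(self):
        self.reindex_calls += 1


fake = FakeClient()
writer = SerializedWriter(fake)
threads = [
    threading.Thread(target=writer.add, args=(f"doc-{i}",), kwargs={"title": "t", "body": "b"})
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
writer.flush()
writer.flush()  # no-op: nothing changed since the last reindex
print(len(fake.docs), fake.reindex_calls)
```

The lock only protects writers inside one process. Separate processes writing to the same file still need to be avoided, as the note says.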

License

MIT. See LICENSE for details.

Release history

0.1.3 (Apr 30, 2026)
0.1.2 (Apr 29, 2026)
0.1.1 (Apr 29, 2026)
0.1.0 (Apr 29, 2026, this version)
