Apache Lucene-based in-memory local search engine for Python

nlp4j-local-search

Use Apache Lucene from Python without running Elasticsearch, OpenSearch, Solr, or Docker.

nlp4j-local-search is a lightweight in-memory full-text search library for Python. It exposes Apache Lucene-based search directly to Python code, so there is no search server to set up.

This library is designed for:

NLP experiments
RAG prototyping
Local full-text search
Jupyter Notebook and Google Colab experiments
Small search applications
Test code that needs temporary search indexes

Internally, it uses Java and Apache Lucene, but Python users do not need to write Java code.

Why this library?

Elasticsearch, OpenSearch, and Apache Solr are powerful search engines, and they are all built on Apache Lucene.

However, for small experiments, local prototypes, or notebook-based workflows, setting up a full search server can be too heavy.

With nlp4j-local-search, you can create a Lucene-based search index directly inside your Python process.

from nlp4j_local_search import SearchEngine

with SearchEngine("en") as engine:
    engine.add("1", "Developers are searching documents with a local search engine.")
    engine.add("2", "A developer searched many documents yesterday.")
    engine.add("3", "This tool searches local JSON documents.")

    engine.commit()

    for r in engine.search("search"):
        print(r.id, r.body, r.score)

No server.
No Docker.
No external search engine process.

Features

Python-first API
Apache Lucene-based full-text search
In-memory local search
No Elasticsearch required
No OpenSearch required
No Solr required
No Docker required
Japanese full-text search
English full-text search
JSON document input
Useful for NLP and RAG experiments

Installation

Note: PyPI release is under preparation.
For now, please install directly from GitHub.

pip install git+https://github.com/oyahiroki/nlp4j-local-search.git

For development:

git clone https://github.com/oyahiroki/nlp4j-local-search.git
cd nlp4j-local-search
pip install -e .

Requirements

Python 3.8 or later
Java runtime environment
jpype1
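
Because the library depends on a Java runtime, it can help to confirm that one is discoverable before importing the package. The following is a minimal sketch using only the Python standard library. The helper name `find_java` is hypothetical and not part of nlp4j-local-search, and the lookup that jpype1 itself performs may differ.

```python
import os
import shutil


def find_java():
    """Return a path to a `java` executable, or None if none is found.

    Checks JAVA_HOME first, then falls back to a PATH lookup.
    """
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        exe = "java.exe" if os.name == "nt" else "java"
        candidate = os.path.join(java_home, "bin", exe)
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("java")


java_path = find_java()
if java_path is None:
    print("No Java runtime found; install a JRE or JDK before using the library.")
else:
    print("Java runtime found at:", java_path)
```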

Quick Start

from nlp4j_local_search import SearchEngine

engine = SearchEngine("ja")

engine.add("1", "東京都は日本の都道府県のひとつです")
engine.add("2", "京都は日本の都市です")
engine.add("3", "京都市には任天堂の本社があります")

engine.commit()

results = engine.search("京都")

for r in results:
    print(r.id, r.body, r.score)

engine.close()

Japanese Analyzer Example: Avoiding Noisy Substring Matches

Japanese text search is different from simple substring matching.

For example, if you search for 京都 using simple substring matching, a sentence containing 東京都 may also match because 東京都 contains the characters 京都.

However, with Japanese full-text analysis, 東京都 and 京都 can be treated as different terms.

from nlp4j_local_search import SearchEngine

with SearchEngine("ja") as engine:
    engine.add("1", "東京都は日本の都道府県のひとつです")
    engine.add("2", "京都は日本の都市です")
    engine.add("3", "京都市には任天堂の本社があります")

    engine.commit()

    for r in engine.search("京都", limit=10):
        print(r.id, r.body, r.score)
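
The difference between the two matching strategies can be illustrated without Lucene. The sketch below is plain Python. The token lists are written by hand as an assumption about how a Japanese analyzer might segment these sentences; they are not the output of the analyzer used by this library.

```python
# Documents from the example above.
docs = {
    "1": "東京都は日本の都道府県のひとつです",
    "2": "京都は日本の都市です",
    "3": "京都市には任天堂の本社があります",
}

# Hand-written segmentation (an assumption for illustration only;
# a real analyzer may split these sentences differently).
tokens = {
    "1": ["東京", "都", "は", "日本", "の", "都道府県", "の", "ひとつ", "です"],
    "2": ["京都", "は", "日本", "の", "都市", "です"],
    "3": ["京都", "市", "に", "は", "任天堂", "の", "本社", "が", "あり", "ます"],
}


def substring_match(query):
    """IDs of documents whose raw text contains the query string."""
    return [doc_id for doc_id, text in docs.items() if query in text]


def term_match(query):
    """IDs of documents whose token list contains the query as a whole term."""
    return [doc_id for doc_id, terms in tokens.items() if query in terms]


print(substring_match("京都"))  # ['1', '2', '3']: 東京都 matches by accident
print(term_match("京都"))       # ['2', '3']: 東京都 is no longer a hit
```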


Recommended Usage

Using SearchEngine as a context manager is recommended.

from nlp4j_local_search import SearchEngine

with SearchEngine("ja") as engine:
    engine.add("1", "東京都は日本の都道府県のひとつです")
    engine.add("2", "京都は日本の都市です。")
    engine.add("3", "京都市には任天堂の本社があります")
    engine.add_json({"id": "4", "body": "京都府は広いです"})

    engine.commit()

    for r in engine.search("京都", limit=10):
        print(r.id, r.body, r.score)

Example output:

2 京都は日本の都市です。 0.18059490621089935
4 京都府は広いです 0.18059490621089935
3 京都市には任天堂の本社があります 0.16212496161460876

Adding Documents

You can add a document by specifying an ID and body text.

engine.add("1", "Kyoto is a historical city in Japan.")

Adding JSON Documents

You can also add a document as a Python dictionary.

engine.add_json({
    "id": "1",
    "body": "Kyoto is a historical city in Japan."
})

Or as a JSON string.

engine.add_json("""
{
  "id": "2",
  "body": "Osaka is a large city in western Japan."
}
""")

This is useful for NLP workflows where JSON and JSONL are commonly used as intermediate data formats.
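
For JSONL input, one approach is to parse each line and pass the resulting dictionary to add_json. The sketch below assumes only the add_json(dict) call shown above. `load_jsonl` and the `RecordingEngine` stand-in are hypothetical names, used so the example runs without a JVM; with the real library, a SearchEngine instance would take the stand-in's place.

```python
import io
import json


def load_jsonl(engine, lines):
    """Add one document per non-empty JSONL line; return the number added."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        engine.add_json(doc)
        count += 1
    return count


class RecordingEngine:
    """Stand-in that only records documents (not part of the library)."""

    def __init__(self):
        self.docs = []

    def add_json(self, doc):
        self.docs.append(doc)


# An in-memory file with two documents and one blank line.
jsonl = io.StringIO(
    '{"id": "1", "body": "Kyoto is a historical city in Japan."}\n'
    "\n"
    '{"id": "2", "body": "Osaka is a large city in western Japan."}\n'
)

engine = RecordingEngine()
added = load_jsonl(engine, jsonl)
print(added, [d["id"] for d in engine.docs])  # 2 ['1', '2']
```

As in the other examples, a commit() call would follow before searching.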

Searching

results = engine.search("Kyoto")

You can specify the maximum number of search results.

results = engine.search("Kyoto", limit=10)

Each result has the following attributes:

r.id
r.body
r.score

Language Settings

Japanese:

engine = SearchEngine("ja")

English:

engine = SearchEngine("en")

English Analyzer Example

When using SearchEngine("en"), English text is analyzed with an English analyzer.

This means that search can handle common English word variations such as:

search
searches
searched
searching

It can also handle cases such as:

document / documents
Lucene / Lucene's
uppercase / lowercase differences

This is useful when you want more than simple substring matching.

from nlp4j_local_search import SearchEngine

with SearchEngine("en") as engine:
    engine.add("1", "Developers are searching documents with a local search engine.")
    engine.add("2", "A developer searched many documents yesterday.")
    engine.add("3", "This tool searches local JSON documents.")
    engine.add("4", "Lucene's EnglishAnalyzer is useful for English full-text search.")
    engine.add("5", "The quick brown fox jumps over the lazy dog.")

    engine.commit()

    print("Query: search")
    for r in engine.search("search", limit=10):
        print(r.id, r.body, r.score)

    print("Query: document")
    for r in engine.search("document", limit=10):
        print(r.id, r.body, r.score)

    print("Query: lucene")
    for r in engine.search("lucene", limit=10):
        print(r.id, r.body, r.score)

Unlike simple substring matching, English full-text search can match related word forms such as search, searched, and searching.

This makes it useful for local search, NLP experiments, and search baseline evaluation.
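
To see why an analyzed index can match these variations, the idea can be sketched with a toy normalizer that maps word forms to a shared key. This is a deliberately simplified illustration, not the stemming algorithm Lucene uses; the function name `normalize` and its suffix list are assumptions made for the example.

```python
def normalize(word):
    """Lowercase a word, drop a possessive 's, and strip one common suffix.

    A toy rule set for illustration only; real stemmers handle many
    more cases.
    """
    word = word.lower()
    if word.endswith("'s"):
        word = word[:-2]
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


forms = ["search", "searches", "searched", "searching"]
print({normalize(w) for w in forms})                   # {'search'}
print(normalize("Documents"), normalize("Lucene's"))   # document lucene
```

When both the indexed text and the query are normalized the same way, a query for one form finds documents containing the others.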

Japanese Search Example

For Japanese text, use SearchEngine("ja").

from nlp4j_local_search import SearchEngine

with SearchEngine("ja") as engine:
    engine.add("1", "東京都は日本の都道府県のひとつです")
    engine.add("2", "京都は日本の都市です")
    engine.add("3", "京都市には任天堂の本社があります")
    engine.add("4", "大阪は関西の大都市です")

    engine.commit()

    for r in engine.search("京都", limit=10):
        print(r.id, r.body, r.score)

This is useful when you want to try Japanese full-text search locally without setting up a search server.

Google Colab

nlp4j-local-search can also be used in Google Colab.

!pip install git+https://github.com/oyahiroki/nlp4j-local-search.git

Then:

from nlp4j_local_search import SearchEngine

with SearchEngine("ja") as engine:
    engine.add("1", "東京都は日本の都道府県のひとつです")
    engine.add("2", "京都は日本の都市です")
    engine.add("3", "京都市には任天堂の本社があります")
    engine.add_json({"id": "4", "body": "京都府は広いです"})

    engine.commit()

    results = engine.search("京都", limit=10)

    for r in results:
        print(f"ID: {r.id}, Score: {r.score:.4f}")
        print(f"Body: {r.body}")
        print("-" * 50)

Notes:

The index is stored in memory.
If the Colab session is reset, the index will be lost.
JVM startup may take a few seconds on the first run.

Design Concept

Local Search

This library is not a search server.

You do not need to run:

Elasticsearch
OpenSearch
Solr
Docker

The search engine runs inside your Python process.

In-Memory Index

By default, the search index is created in memory.

This makes the library useful for:

Temporary experiments
Unit tests
Jupyter Notebook
Google Colab
Proof-of-concept development
Local NLP workflows

The index is not persisted to disk.

Python-First API

Although the internal implementation uses Java and Apache Lucene, the public API is designed for Python users.

engine = SearchEngine("en")

That is enough to start using Lucene-based search from Python.

Use Cases

NLP Experiments

You can quickly create a searchable index from text data, Wikipedia-derived datasets, dictionary data, or intermediate NLP results.

RAG Prototyping

Before building a full RAG system, you can test local keyword search behavior with small or medium-sized datasets.
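
One way to use keyword results in a RAG prototype is to join the top hits into a context string for a prompt. The sketch below relies only on the id and body result attributes described earlier. `build_context`, the `Hit` tuple, the made-up scores, and the character budget are hypothetical choices for illustration.

```python
from collections import namedtuple

# Stand-in for a search result; real results expose the same attributes.
Hit = namedtuple("Hit", ["id", "body", "score"])


def build_context(hits, max_chars=200):
    """Join hit bodies into a labeled context block within a size budget."""
    parts = []
    used = 0
    for hit in hits:
        entry = f"[{hit.id}] {hit.body}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the budget
        parts.append(entry)
        used += len(entry)
    return "\n".join(parts)


# Made-up hits for illustration.
hits = [
    Hit("2", "Kyoto is a historical city in Japan.", 0.42),
    Hit("7", "Kyoto hosts many temples and shrines.", 0.31),
]
context = build_context(hits)
prompt = (
    "Answer using only the context below.\n\n"
    f"{context}\n\n"
    "Question: What is Kyoto?"
)
print(prompt)
```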

Search Baseline for Embedding Experiments

When evaluating embedding models, it is often useful to compare vector search results with traditional keyword-based full-text search.
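
A simple way to make that comparison concrete is to measure how many of the top-k keyword hits also appear among the top-k vector hits. The sketch below is one possible metric, not something provided by this library; `overlap_at_k` and the ID lists are hypothetical.

```python
def overlap_at_k(keyword_ids, vector_ids, k):
    """Fraction of the top-k keyword hits also found in the top-k vector hits."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_keyword = set(keyword_ids[:k])
    top_vector = set(vector_ids[:k])
    if not top_keyword:
        return 0.0
    return len(top_keyword & top_vector) / len(top_keyword)


# Made-up ranked ID lists for illustration.
keyword_ids = ["2", "4", "3", "9"]
vector_ids = ["4", "7", "2", "1"]

# Top 3 keyword hits are 2, 4, 3; top 3 vector hits are 4, 7, 2.
print(overlap_at_k(keyword_ids, vector_ids, k=3))  # 2 of 3 are shared
```

A low overlap is not necessarily a problem; it indicates queries where the two retrieval methods disagree and are worth inspecting by hand.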

Test Code

Because the index is in memory, you can create and discard search indexes during automated tests.
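
The create-and-discard pattern might look like the following unittest sketch. `FakeEngine` is a hypothetical stand-in that mimics only the add, commit, search, and close calls used in this README, so the example runs without a JVM. It returns plain IDs and uses naive substring matching, unlike the real library, which returns result objects from an analyzed index.

```python
import unittest


class FakeEngine:
    """Minimal stand-in for illustration (not part of the library)."""

    def __init__(self):
        self._pending = {}
        self._docs = {}
        self.closed = False

    def add(self, doc_id, body):
        self._pending[doc_id] = body

    def commit(self):
        # Assumption: documents become searchable only after commit().
        self._docs.update(self._pending)
        self._pending.clear()

    def search(self, query):
        # Naive case-insensitive substring match.
        q = query.lower()
        return [i for i, body in self._docs.items() if q in body.lower()]

    def close(self):
        self.closed = True


class SearchIndexTest(unittest.TestCase):
    def setUp(self):
        # A fresh index per test keeps tests independent.
        self.engine = FakeEngine()
        self.engine.add("1", "Kyoto is a historical city in Japan.")
        self.engine.commit()

    def tearDown(self):
        # Discard the index after each test.
        self.engine.close()

    def test_finds_committed_document(self):
        self.assertEqual(self.engine.search("kyoto"), ["1"])

    def test_unknown_term_has_no_hits(self):
        self.assertEqual(self.engine.search("osaka"), [])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SearchIndexTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```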

Current Status

This project is currently in an early development stage.

Current focus:

Simple local full-text search from Python
Japanese search
English search
JSON document input
In-memory indexing

APIs may change in future versions.

Roadmap

Planned or considered features:

PyPI release
Improved Google Colab support
Vector search
Aggregation
JSON Query DSL
OpenSearch-compatible API

Project Information

Package name:

nlp4j-local-search

Python module name:

nlp4j_local_search

Current version:

0.1.0

License

Apache License 2.0

Author

Hiroki Oya

GitHub:

https://github.com/oyahiroki

Release history

0.2.0 (this version): Jun 14, 2026
