
Local and HTTP server embedding tools for multilingual E5

Project description

This README reflects the current structure of the project: the JSONL CLI in local_e5.py, the HTTP server in server_e5.py, the shared E5 embedding logic in e5_model.py, and the three endpoints defined in request_handler.py.

nlp4j-llm-embeddings-e5

Local and HTTP server tools for generating multilingual E5 embeddings.

This project provides command-line and HTTP server utilities for using the intfloat/multilingual-e5-large model locally. It is designed for embedding JSONL data, building local semantic search workflows, and exposing embedding functions over a simple HTTP API.

The implementation uses sentence-transformers internally and applies E5-style prefixes such as passage: and query: automatically.

Features

  • Generate embeddings for JSONL files
  • Add embedding vectors to a specified JSON attribute
  • Use multilingual E5 embeddings locally
  • Run a lightweight HTTP embedding server
  • Support document embeddings with passage: prefix
  • Support query/document semantic search with query: and passage: prefixes
  • Support cosine similarity calculation
  • Optional token count checking
  • Batch processing for local JSONL embedding
  • Model warmup support for server mode

Model

The default model is:

intfloat/multilingual-e5-large

E5 models are designed to work with explicit text prefixes.

For document embeddings:

passage: your document text

For query embeddings:

query: your search query

This project automatically adds these prefixes depending on the selected mode.

Project Structure

.
├── LICENSE.txt
├── README.md
├── README_ja.md
├── docker
│   ├── Dockerfile
│   └── README.md
├── examples
│   ├── index.html
│   ├── nlp4j-embedding-local-e5-bench-example_input_ja_1.txt
│   ├── nlp4j-embedding-local-e5-bench.py
│   ├── nlp4j-embedding-local-openai.py
│   ├── test2.txt
│   ├── test3.txt
│   └── test_json.txt
├── pyproject.toml
├── requirements.txt
└── src
    └── nlp4j_embedding
        ├── __init__.py
        ├── e5_model.py
        ├── local_e5.py
        ├── request_handler.py
        └── server_e5.py

Installation

Install from source

git clone https://github.com/oyahiroki/nlp4j-llm-embeddings-e5.git
cd nlp4j-llm-embeddings-e5
pip install .

For development:

pip install -e .

Install dependencies manually

pip install -r requirements.txt

If you want to use GPU acceleration, please install a PyTorch build suitable for your CUDA environment.

Commands

After installation, the following commands are available:

nlp4j-embedding-local-e5
nlp4j-embedding-server-e5

Local JSONL Embedding

The local command reads a JSONL file, embeds text from a specified attribute, and writes a new JSONL file with an embedding vector added.

Basic usage

nlp4j-embedding-local-e5 input.jsonl output.jsonl

By default, it reads text from the text attribute and writes the vector to the vector attribute.

Input example:

{"id": "1", "text": "Kyoto is a city in Japan."}
{"id": "2", "text": "Tokyo is the capital of Japan."}

Output example:

{"id": "1", "text": "Kyoto is a city in Japan.", "vector": [0.0123, -0.0456, ...]}
{"id": "2", "text": "Tokyo is the capital of Japan.", "vector": [0.0234, -0.0567, ...]}
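The transformation can be sketched in plain Python. This is a minimal illustration, not the code in local_e5.py: add_vectors_to_jsonl and iter_batches are hypothetical names, and embed_batch is a stand-in for the E5 prefixing and encoding step (any function that maps a list of strings to one vector per string). The text_attr, vector_attr, and batch_size parameters mirror the --text-attr, --vector-attr, and --batch-size options described below; the default of 32 here is arbitrary and not necessarily the CLI's default.

```python
import json
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-in for the model: maps a batch of texts to one vector per text.
EmbedBatch = Callable[[List[str]], List[List[float]]]


def iter_batches(items: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def add_vectors_to_jsonl(
    lines: Iterable[str],
    embed_batch: EmbedBatch,
    text_attr: str = "text",
    vector_attr: str = "vector",
    batch_size: int = 32,  # arbitrary default for this sketch
) -> List[str]:
    """Return output JSONL lines with a vector added under vector_attr."""
    # Skip blank lines; every other line must be one JSON object.
    records = [json.loads(line) for line in lines if line.strip()]
    output: List[str] = []
    for batch in iter_batches(records, batch_size):
        vectors = embed_batch([record[text_attr] for record in batch])
        for record, vector in zip(batch, vectors):
            record[vector_attr] = [float(x) for x in vector]
            # ensure_ascii=False keeps Japanese and other non-ASCII text readable
            output.append(json.dumps(record, ensure_ascii=False))
    return output


# Usage with files (embed_batch would wrap the real model):
# with open("input.jsonl", encoding="utf-8") as src:
#     out_lines = add_vectors_to_jsonl(src, embed_batch)
# with open("output.jsonl", "w", encoding="utf-8") as dst:
#     dst.write("\n".join(out_lines) + "\n")
```

For simplicity the sketch loads every record into memory before embedding; a very large file could instead be read and written one batch at a time.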

Specify input and output attributes

nlp4j-embedding-local-e5 input.jsonl output.jsonl \
  --text-attr body \
  --vector-attr embedding

Specify E5 text type

For document embeddings, use passage:

nlp4j-embedding-local-e5 input.jsonl output.jsonl \
  --text-type passage

For query embeddings, use query:

nlp4j-embedding-local-e5 queries.jsonl queries_with_vectors.jsonl \
  --text-type query

To disable automatic E5 prefixing:

nlp4j-embedding-local-e5 input.jsonl output.jsonl \
  --text-type none

Batch size

nlp4j-embedding-local-e5 input.jsonl output.jsonl \
  --batch-size 32

Token length

nlp4j-embedding-local-e5 input.jsonl output.jsonl \
  --max-length 512

Token count warning

nlp4j-embedding-local-e5 input.jsonl output.jsonl \
  --check-token-count

If the token count exceeds --max-length, a warning is printed.
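A check of this kind can be sketched as follows, assuming the count is taken on the (prefixed) input text with the model's own tokenizer. Whether local_e5.py counts tokens exactly this way is an assumption; check_token_count is a hypothetical name, and the tokenizer is passed in as a function so the sketch runs without downloading the model.

```python
import sys
from typing import Callable, Sequence

# Hypothetical tokenizer stand-in: maps a text to a sequence of tokens or token ids.
Tokenize = Callable[[str], Sequence]


def check_token_count(text: str, tokenize: Tokenize, max_length: int = 512) -> int:
    """Count tokens and print a warning to stderr if the count exceeds max_length."""
    n_tokens = len(tokenize(text))
    if n_tokens > max_length:
        print(
            f"warning: {n_tokens} tokens exceeds max_length={max_length}: {text[:40]!r}",
            file=sys.stderr,
        )
    return n_tokens


# With the real tokenizer (requires the transformers package and a model download):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-large")
# check_token_count("passage: ...", lambda t: tokenizer(t)["input_ids"])
```

Note that a Hugging Face tokenizer's input_ids include special tokens by default, so the count is slightly higher than the number of word pieces in the text.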

Verbose mode

nlp4j-embedding-local-e5 input.jsonl output.jsonl \
  --verbose

HTTP Embedding Server

Start the server:

nlp4j-embedding-server-e5

The default host is 127.0.0.1 and the default port is 8888.

nlp4j-embedding-server-e5 --host 127.0.0.1 --port 8888

By default, the model is loaded and warmed up at server startup.

To skip warmup:

nlp4j-embedding-server-e5 --no-warmup
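Warmup amounts to loading the model once at startup and running one throwaway encode, so the first request does not pay that cost. A minimal sketch of the pattern, assuming a model object whose encode method takes a list of strings (as sentence-transformers' SentenceTransformer does); ModelHolder is a hypothetical name, not the class used in server_e5.py. Skipping warmup() corresponds roughly to --no-warmup: the model is then loaded by the first get() call.

```python
import threading
from typing import Any, Callable, Optional


class ModelHolder:
    """Load a model once, either eagerly via warmup() or lazily on first get()."""

    def __init__(self, loader: Callable[[], Any]):
        self._loader = loader
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        # Double-checked locking: concurrent requests trigger only one load.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model

    def warmup(self, sample: str = "passage: warmup") -> None:
        # Load the model and run one encode so the first real request is fast.
        self.get().encode([sample])


# Hypothetical usage with sentence-transformers:
# from sentence_transformers import SentenceTransformer
# holder = ModelHolder(lambda: SentenceTransformer("intfloat/multilingual-e5-large"))
# holder.warmup()
```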

HTTP API

The server provides the following endpoints:

/embeddings
/semantic_search
/cos_sim

/embeddings

Generate an embedding for a single text.

This endpoint is intended for document embeddings and uses the E5 passage: prefix internally.

GET

curl "http://127.0.0.1:8888/embeddings?text=This%20is%20a%20test."

POST

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text":"This is a test."}' \
  http://127.0.0.1:8888/embeddings

Response example

{
  "message": "ok",
  "time": "2026-06-20T12:00:00",
  "text": "This is a test.",
  "embeddings": [0.0123, -0.0456, 0.0789]
}
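The same POST call can be made from Python using only the standard library. A minimal client sketch: build_embeddings_request and get_embedding are hypothetical names, and the request and response fields are taken from the examples above.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://127.0.0.1:8888"  # default host and port of nlp4j-embedding-server-e5


def build_embeddings_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /embeddings request with a JSON body of the form {"text": ...}."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_embedding(text: str, base_url: str = BASE_URL, timeout: float = 60.0) -> List[float]:
    """Send the request and return the "embeddings" field of the JSON response."""
    request = build_embeddings_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["embeddings"]
```

With the server running, get_embedding("This is a test.") returns the vector as a list of floats.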

Token count check

curl "http://127.0.0.1:8888/embeddings?text=This%20is%20a%20test.&checktokencount=true"
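When building GET URLs in code, percent-encode the text. A sketch with urllib.parse, where embeddings_get_url is a hypothetical helper: passing quote_via=quote encodes spaces as %20, matching the curl examples. The default (quote_plus) would produce + instead, which standard query-string parsers such as urllib.parse.parse_qs also decode as a space. Non-ASCII text is encoded as UTF-8.

```python
from urllib.parse import quote, urlencode


def embeddings_get_url(
    text: str,
    check_token_count: bool = False,
    base_url: str = "http://127.0.0.1:8888",
) -> str:
    """Build a GET /embeddings URL with the text (and optional flag) percent-encoded."""
    params = {"text": text}
    if check_token_count:
        params["checktokencount"] = "true"
    # quote_via=quote encodes spaces as %20 rather than +
    return f"{base_url}/embeddings?{urlencode(params, quote_via=quote)}"
```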

/semantic_search

Run semantic search between a query and one or more candidate texts.

The query is encoded with the E5 query: prefix. The candidate texts are encoded with the E5 passage: prefix.
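Conceptually, the search step scores each candidate vector against the query vector by cosine similarity and returns the hits sorted by score. The corpus_id and score fields in the response match the output format of sentence-transformers' util.semantic_search, which the server may use internally; that is an assumption, not something this README states. The plain-Python sketch below is illustrative only: rank_by_cosine is a hypothetical name, and it expects vectors already encoded with the query: and passage: prefixes.

```python
import math
from typing import Dict, List, Sequence, Union


def _normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def rank_by_cosine(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    top_k: int = 10,
) -> List[Dict[str, Union[int, float]]]:
    """Return the top_k passages as {"corpus_id", "score"} dicts, best first."""
    query_unit = _normalize(query_vec)
    hits = []
    for corpus_id, passage in enumerate(passage_vecs):
        passage_unit = _normalize(passage)
        score = sum(q * p for q, p in zip(query_unit, passage_unit))
        hits.append({"corpus_id": corpus_id, "score": score})
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:top_k]
```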

GET

The GET API supports one query text and one candidate text.

curl "http://127.0.0.1:8888/semantic_search?text1=This%20is%20a%20test.&text2=This%20is%20an%20exam."

POST

The POST API supports multiple candidate texts.

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text":"Japanese NLP","texts":["GiNZA is a Japanese NLP library.","This document is about image processing."]}' \
  http://127.0.0.1:8888/semantic_search

Response example

{
  "message": "ok",
  "time": "2026-06-20T12:00:00",
  "text": "Japanese NLP",
  "r": [
    {
      "corpus_id": 0,
      "score": 0.8234
    },
    {
      "corpus_id": 1,
      "score": 0.3123
    }
  ]
}
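A client usually wants the candidate texts back rather than indices. A minimal standard-library sketch, assuming corpus_id is the index of the candidate in the "texts" list that was sent (consistent with the example above, where corpus_id 0 is the GiNZA sentence); build_semantic_search_request, label_results, and search_via_server are hypothetical names.

```python
import json
import urllib.request
from typing import List, Tuple

BASE_URL = "http://127.0.0.1:8888"  # default host and port of nlp4j-embedding-server-e5


def build_semantic_search_request(
    query: str, texts: List[str], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST request: "text" is the query, "texts" are the candidates."""
    body = json.dumps({"text": query, "texts": texts}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/semantic_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def label_results(response: dict, texts: List[str]) -> List[Tuple[str, float]]:
    """Pair each hit in "r" with its candidate text (assumes corpus_id indexes texts)."""
    return [(texts[hit["corpus_id"]], hit["score"]) for hit in response["r"]]


def search_via_server(
    query: str, texts: List[str], base_url: str = BASE_URL, timeout: float = 60.0
) -> List[Tuple[str, float]]:
    """Send the request and return (text, score) pairs in the server's order."""
    request = build_semantic_search_request(query, texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return label_results(json.load(response), texts)
```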

/cos_sim

Calculate cosine similarity between two texts.

By default, this endpoint does not add an E5 prefix. It is intended as a simple compatibility endpoint for comparing two raw texts.

For retrieval-style search, /semantic_search is recommended because it applies query: and passage: prefixes correctly.
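For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it ranges from -1 to 1. A plain-Python sketch of the calculation on vectors; cosine_similarity is a hypothetical helper and differs from e5_model.cos_sim, which takes texts rather than vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|)"""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```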

GET

curl "http://127.0.0.1:8888/cos_sim?text1=This%20is%20a%20test.&text2=This%20is%20an%20exam."

POST

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text1":"This is a test.","text2":"This is an exam.","checktokencount":true}' \
  http://127.0.0.1:8888/cos_sim

Response example

{
  "text1": "This is a test.",
  "text2": "This is an exam.",
  "cosine_similarity": 0.8123
}

Python API

You can also use the internal Python functions directly.

from nlp4j_embedding import e5_model

vector, elapsed = e5_model.embed_text(
    "Kyoto is a city in Japan.",
    text_type="passage"
)

print(vector)
print(elapsed)

Batch embedding:

from nlp4j_embedding import e5_model

vectors, elapsed = e5_model.embed_texts(
    [
        "Kyoto is a city in Japan.",
        "Tokyo is the capital of Japan."
    ],
    text_type="passage"
)

print(vectors)

Semantic search:

from nlp4j_embedding import e5_model

results = e5_model.semantic_search(
    "Japanese city",
    [
        "Kyoto is a city in Japan.",
        "Python is a programming language."
    ]
)

print(results)

Cosine similarity:

from nlp4j_embedding import e5_model

score = e5_model.cos_sim(
    "This is a test.",
    "This is an exam."
)

print(score)

Notes on E5 Prefixes

E5 models expect input text to be prefixed depending on the task.

For search queries:

query: ...

For documents or passages:

passage: ...

This project automatically adds the prefix unless the text already starts with query: or passage:.
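The prefix rule can be sketched as follows. add_e5_prefix is a hypothetical helper, not necessarily how e5_model.py implements it; the text_type values "query", "passage", and "none" mirror the --text-type option.

```python
# Prefixes that mark a text as already prepared for E5.
E5_PREFIXES = ("query:", "passage:")


def add_e5_prefix(text: str, text_type: str = "passage") -> str:
    """Return text with the E5 prefix for text_type ("query", "passage", or "none")."""
    if text_type == "none":
        return text  # prefixing disabled, as with --text-type none
    if text_type not in ("query", "passage"):
        raise ValueError(f"unknown text_type: {text_type!r}")
    if text.startswith(E5_PREFIXES):
        return text  # already prefixed: never add a second prefix
    return f"{text_type}: {text}"
```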

For local JSONL embedding, the default text type is passage.

nlp4j-embedding-local-e5 input.jsonl output.jsonl

This is equivalent to:

nlp4j-embedding-local-e5 input.jsonl output.jsonl --text-type passage

Performance Notes

The first execution may take time because the model must be downloaded and loaded.

The server command warms up the model by default so that the first HTTP request does not need to load the model.

nlp4j-embedding-server-e5

To skip warmup:

nlp4j-embedding-server-e5 --no-warmup

For large JSONL files, adjust the batch size to the available memory and GPU capacity: larger batches usually improve throughput, while smaller batches reduce memory usage.

nlp4j-embedding-local-e5 input.jsonl output.jsonl --batch-size 64

Docker

A Dockerfile is provided in the docker directory.

cd docker

See docker/README.md for Docker-specific usage.

License

This project is licensed under the Apache License 2.0.

See LICENSE.txt for details.

Author

Hiroki OYA

Download files

Source Distribution

nlp4j_llm_embedding_e5-0.1.0.tar.gz (20.2 kB)

Uploaded Source

Built Distribution

nlp4j_llm_embedding_e5-0.1.0-py3-none-any.whl (17.9 kB)

Uploaded Python 3

File details

Details for the file nlp4j_llm_embedding_e5-0.1.0.tar.gz.

File metadata

  • Download URL: nlp4j_llm_embedding_e5-0.1.0.tar.gz
  • Upload date:
  • Size: 20.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for nlp4j_llm_embedding_e5-0.1.0.tar.gz
Algorithm Hash digest
SHA256 75ccf831df5e0d03de99805d896369ebc617c9ff90beca02301f7492e0beda14
MD5 43ab79dec33482b23fc87acca52394ce
BLAKE2b-256 26ca610e84bba2b06ecb8c5c6551a289ad63b823e4a6b56c9391cad511cde0c4

File details

Details for the file nlp4j_llm_embedding_e5-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for nlp4j_llm_embedding_e5-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 488340151f631c6fb6db13393f6c506215a19d0364a0307c7ffe228284a44cdb
MD5 d3dd3ad151ba3597bed8b382ca1978fb
BLAKE2b-256 b707a01ce5d8af448a42d47724151d7f9bf1db2c626e56c40c5b7ac3dfe3aa6d