Local and HTTP server embedding tools for multilingual E5
The following is a draft English README.md that matches the current layout. It reflects the JSONL CLI in local_e5.py, the HTTP server in server_e5.py, the shared E5 embedding logic in e5_model.py, and the three-endpoint layout in request_handler.py.
nlp4j-llm-embeddings-e5
Local and HTTP server tools for generating multilingual E5 embeddings.
This project provides command-line and HTTP server utilities for using the intfloat/multilingual-e5-large model locally. It is designed for embedding JSONL data, building local semantic search workflows, and exposing embedding functions over a simple HTTP API.
The implementation uses sentence-transformers internally and applies E5-style prefixes such as passage: and query: automatically.
Features
- Generate embeddings for JSONL files
- Add embedding vectors to a specified JSON attribute
- Use multilingual E5 embeddings locally
- Run a lightweight HTTP embedding server
- Support document embeddings with passage: prefix
- Support query/document semantic search with query: and passage: prefixes
- Support cosine similarity calculation
- Optional token count checking
- Batch processing for local JSONL embedding
- Model warmup support for server mode
Model
The default model is:
intfloat/multilingual-e5-large
E5 models are designed to work with explicit text prefixes.
For document embeddings:
passage: your document text
For query embeddings:
query: your search query
This project automatically adds these prefixes depending on the selected mode.
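As a minimal sketch of what prefixed encoding could look like with sentence-transformers: the helper name `embed_passages` is hypothetical, and `normalize_embeddings=True` is an assumption about typical E5 usage rather than a confirmed detail of this project. The `model` argument is any object with a `SentenceTransformer`-style `encode` method.

```python
from typing import Any, List, Sequence


def embed_passages(model: Any, texts: Sequence[str]) -> List[List[float]]:
    """Encode documents with the E5 "passage: " prefix (hypothetical helper)."""
    prefixed = [f"passage: {text}" for text in texts]
    # normalize_embeddings=True is an assumption; with unit-length vectors the
    # dot product of two embeddings equals their cosine similarity.
    vectors = model.encode(prefixed, normalize_embeddings=True)
    return [[float(x) for x in vector] for vector in vectors]


# Usage with the real model (downloads the weights on first run):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("intfloat/multilingual-e5-large")
#   vectors = embed_passages(model, ["Kyoto is a city in Japan."])
```

Passing the model in as an argument keeps the helper easy to try with a stub object before loading the full model.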
Project Structure
.
├── LICENSE.txt
├── README.md
├── README_ja.md
├── docker
│   ├── Dockerfile
│   └── README.md
├── examples
│   ├── index.html
│   ├── nlp4j-embedding-local-e5-bench-example_input_ja_1.txt
│   ├── nlp4j-embedding-local-e5-bench.py
│   ├── nlp4j-embedding-local-openai.py
│   ├── test2.txt
│   ├── test3.txt
│   └── test_json.txt
├── pyproject.toml
├── requirements.txt
└── src
    └── nlp4j_embedding
        ├── __init__.py
        ├── e5_model.py
        ├── local_e5.py
        ├── request_handler.py
        └── server_e5.py
Installation
Install from source
git clone https://github.com/oyahiroki/nlp4j-llm-embeddings-e5.git
cd nlp4j-llm-embeddings-e5
pip install .
For development:
pip install -e .
Install dependencies manually
pip install -r requirements.txt
If you want to use GPU acceleration, please install a PyTorch build suitable for your CUDA environment.
Commands
After installation, the following commands are available:
nlp4j-embedding-local-e5
nlp4j-embedding-server-e5
Local JSONL Embedding
The local command reads a JSONL file, embeds text from a specified attribute, and writes a new JSONL file with an embedding vector added.
Basic usage
nlp4j-embedding-local-e5 input.jsonl output.jsonl
By default, it reads text from the text attribute and writes the vector to the vector attribute.
Input example:
{"id": "1", "text": "Kyoto is a city in Japan."}
{"id": "2", "text": "Tokyo is the capital of Japan."}
Output example:
{"id": "1", "text": "Kyoto is a city in Japan.", "vector": [0.0123, -0.0456, ...]}
{"id": "2", "text": "Tokyo is the capital of Japan.", "vector": [0.0234, -0.0567, ...]}
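The read, embed, and write flow described above can be sketched roughly as follows. This is not the code in local_e5.py: the function name `embed_jsonl` and the `embed_batch` callable are hypothetical, and the sketch assumes every record contains the text attribute.

```python
import json
from typing import Callable, Iterable, List, TextIO


def embed_jsonl(
    src: TextIO,
    dst: TextIO,
    embed_batch: Callable[[List[str]], Iterable[List[float]]],
    text_attr: str = "text",
    vector_attr: str = "vector",
    batch_size: int = 32,
) -> int:
    """Copy JSONL records from src to dst, adding a vector to each one."""
    batch: List[dict] = []
    written = 0

    def flush() -> None:
        nonlocal written
        if not batch:
            return
        # Embed all texts of the current batch in one call.
        vectors = embed_batch([record[text_attr] for record in batch])
        for record, vector in zip(batch, vectors):
            record[vector_attr] = list(vector)
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
        batch.clear()

    for line in src:
        if not line.strip():
            continue  # skip blank lines
        batch.append(json.loads(line))
        if len(batch) >= batch_size:
            flush()
    flush()  # write the final partial batch
    return written
```

Writing with `ensure_ascii=False` keeps Japanese and other non-ASCII text readable in the output file.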
Specify input and output attributes
nlp4j-embedding-local-e5 input.jsonl output.jsonl \
--text-attr body \
--vector-attr embedding
Specify E5 text type
For document embeddings, use passage:
nlp4j-embedding-local-e5 input.jsonl output.jsonl \
--text-type passage
For query embeddings, use query:
nlp4j-embedding-local-e5 queries.jsonl queries_with_vectors.jsonl \
--text-type query
To disable automatic E5 prefixing:
nlp4j-embedding-local-e5 input.jsonl output.jsonl \
--text-type none
Batch size
nlp4j-embedding-local-e5 input.jsonl output.jsonl \
--batch-size 32
Token length
nlp4j-embedding-local-e5 input.jsonl output.jsonl \
--max-length 512
Token count warning
nlp4j-embedding-local-e5 input.jsonl output.jsonl \
--check-token-count
If the token count exceeds --max-length, a warning is printed.
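A check of this kind might look like the sketch below. The function name, the warning text, and the pluggable `tokenize` callable are all assumptions made for illustration; the actual command presumably counts tokens with the model's own tokenizer.

```python
import sys
from typing import Callable, List


def check_token_count(
    text: str,
    tokenize: Callable[[str], List[str]],
    max_length: int = 512,
) -> bool:
    """Return True if text fits in max_length tokens; otherwise print a warning."""
    count = len(tokenize(text))
    if count > max_length:
        # Hypothetical message format; the real CLI output may differ.
        print(
            f"WARNING: token count {count} exceeds max length {max_length}",
            file=sys.stderr,
        )
        return False
    return True


# str.split is only a stand-in tokenizer for illustration; subword tokenizers
# usually produce more tokens than whitespace splitting.
print(check_token_count("This is a test.", str.split, max_length=512))
```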
Verbose mode
nlp4j-embedding-local-e5 input.jsonl output.jsonl \
--verbose
HTTP Embedding Server
Start the server:
nlp4j-embedding-server-e5
The default host is 127.0.0.1 and the default port is 8888.
nlp4j-embedding-server-e5 --host 127.0.0.1 --port 8888
By default, the model is loaded and warmed up at server startup.
To skip warmup:
nlp4j-embedding-server-e5 --no-warmup
HTTP API
The server provides the following endpoints:
/embeddings
/semantic_search
/cos_sim
/embeddings
Generate an embedding for a single text.
This endpoint is intended for document embeddings and uses the E5 passage: prefix internally.
GET
curl "http://127.0.0.1:8888/embeddings?text=This%20is%20a%20test."
POST
curl -X POST \
-H "Content-Type: application/json" \
-d '{"text":"This is a test."}' \
http://127.0.0.1:8888/embeddings
Response example
{
"message": "ok",
"time": "2026-06-20T12:00:00",
"text": "This is a test.",
"embeddings": [0.0123, -0.0456, 0.0789]
}
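The same POST request can be issued from Python with the standard library alone. The sketch below assumes the server is running on the default host and port; the helper names are hypothetical.

```python
import json
import urllib.request


def build_embeddings_request(
    text: str, base_url: str = "http://127.0.0.1:8888"
) -> urllib.request.Request:
    """Build a JSON POST request for the /embeddings endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text: str) -> list:
    """Send the request and return the "embeddings" field (needs a running server)."""
    with urllib.request.urlopen(build_embeddings_request(text)) as response:
        return json.load(response)["embeddings"]
```

Separating request construction from sending makes it possible to inspect the URL and body without a live server.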
Token count check
curl "http://127.0.0.1:8888/embeddings?text=This%20is%20a%20test.&checktokencount=true"
/semantic_search
Run semantic search between a query and one or more candidate texts.
The query is encoded with the E5 query: prefix.
The candidate texts are encoded with the E5 passage: prefix.
GET
The GET API supports one query text and one candidate text.
curl "http://127.0.0.1:8888/semantic_search?text1=This%20is%20a%20test.&text2=This%20is%20an%20exam."
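Query strings for the GET API need percent-encoding. A small sketch using the standard library is shown here; the helper name is hypothetical, and it assumes text1 carries the query and text2 the candidate.

```python
from urllib.parse import quote, urlencode


def build_semantic_search_url(
    query: str, candidate: str, base_url: str = "http://127.0.0.1:8888"
) -> str:
    """Build a GET URL with text1 (query) and text2 (candidate) percent-encoded."""
    # quote_via=quote encodes spaces as %20 instead of the default "+".
    params = urlencode({"text1": query, "text2": candidate}, quote_via=quote)
    return f"{base_url}/semantic_search?{params}"


print(build_semantic_search_url("This is a test.", "This is an exam."))
```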
POST
The POST API supports multiple candidate texts.
curl -X POST \
-H "Content-Type: application/json" \
-d '{"text":"Japanese NLP","texts":["GiNZA is a Japanese NLP library.","This document is about image processing."]}' \
http://127.0.0.1:8888/semantic_search
Response example
{
"message": "ok",
"time": "2026-06-20T12:00:00",
"text": "Japanese NLP",
"r": [
{
"corpus_id": 0,
"score": 0.8234
},
{
"corpus_id": 1,
"score": 0.3123
}
]
}
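To turn such a response back into ranked texts, something like the following sketch could be used. It assumes that corpus_id is a zero-based index into the texts list sent in the request; the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def rank_candidates(response: Dict, texts: List[str]) -> List[Tuple[float, str]]:
    """Pair each score in response["r"] with its candidate text, best first."""
    pairs = [(hit["score"], texts[hit["corpus_id"]]) for hit in response["r"]]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


texts = [
    "GiNZA is a Japanese NLP library.",
    "This document is about image processing.",
]
response = {
    "r": [
        {"corpus_id": 0, "score": 0.8234},
        {"corpus_id": 1, "score": 0.3123},
    ]
}
for score, text in rank_candidates(response, texts):
    print(f"{score:.4f}  {text}")
```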
/cos_sim
Calculate cosine similarity between two texts.
By default, this endpoint does not add an E5 prefix. It is intended as a simple compatibility endpoint for comparing two raw texts.
For retrieval-style search, /semantic_search is recommended because it applies query: and passage: prefixes correctly.
GET
curl "http://127.0.0.1:8888/cos_sim?text1=This%20is%20a%20test.&text2=This%20is%20an%20exam."
POST
curl -X POST \
-H "Content-Type: application/json" \
-d '{"text1":"This is a test.","text2":"This is an exam.","checktokencount":true}' \
http://127.0.0.1:8888/cos_sim
Response example
{
"text1": "This is a test.",
"text2": "This is an exam.",
"cosine_similarity": 0.8123
}
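For reference, cosine similarity between two embedding vectors is their dot product divided by the product of their lengths. The plain-Python sketch below illustrates the formula only and is not the project's implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The result ranges from -1 (opposite directions) to 1 (same direction).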
Python API
You can also use the internal Python functions directly.
from nlp4j_embedding import e5_model

vector, elapsed = e5_model.embed_text(
    "Kyoto is a city in Japan.",
    text_type="passage"
)
print(vector)
print(elapsed)
Batch embedding:
from nlp4j_embedding import e5_model

vectors, elapsed = e5_model.embed_texts(
    [
        "Kyoto is a city in Japan.",
        "Tokyo is the capital of Japan."
    ],
    text_type="passage"
)
print(vectors)
Semantic search:
from nlp4j_embedding import e5_model

results = e5_model.semantic_search(
    "Japanese city",
    [
        "Kyoto is a city in Japan.",
        "Python is a programming language."
    ]
)
print(results)
Cosine similarity:
from nlp4j_embedding import e5_model

score = e5_model.cos_sim(
    "This is a test.",
    "This is an exam."
)
print(score)
Notes on E5 Prefixes
E5 models expect input text to be prefixed depending on the task.
For search queries:
query: ...
For documents or passages:
passage: ...
This project automatically adds the prefix unless the text already starts with query: or passage:.
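The rule above can be sketched as a small function. The name `add_e5_prefix` is hypothetical and the real logic in e5_model.py may differ in details; the `none` value mirrors the `--text-type none` option.

```python
def add_e5_prefix(text: str, text_type: str = "passage") -> str:
    """Add "query: " or "passage: " unless the text already has an E5 prefix."""
    if text_type == "none":
        return text  # prefixing disabled
    if text_type not in ("query", "passage"):
        raise ValueError(f"unsupported text type: {text_type}")
    if text.startswith(("query:", "passage:")):
        return text  # already prefixed, leave unchanged
    return f"{text_type}: {text}"


print(add_e5_prefix("Kyoto is a city in Japan."))
print(add_e5_prefix("query: Japanese city", text_type="passage"))
```

Leaving already-prefixed text untouched avoids double prefixes such as `passage: query: ...` when input data was prepared in advance.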
For local JSONL embedding, the default text type is passage.
nlp4j-embedding-local-e5 input.jsonl output.jsonl
This is equivalent to:
nlp4j-embedding-local-e5 input.jsonl output.jsonl --text-type passage
Performance Notes
The first execution may take time because the model must be downloaded and loaded.
The server command warms up the model by default so that the first HTTP request does not need to load the model.
nlp4j-embedding-server-e5
To skip warmup:
nlp4j-embedding-server-e5 --no-warmup
For large JSONL files, increase or decrease the batch size depending on available memory and GPU capacity.
nlp4j-embedding-local-e5 input.jsonl output.jsonl --batch-size 64
Docker
A Dockerfile is provided in the docker directory.
cd docker
See docker/README.md for Docker-specific usage.
License
This project is licensed under the Apache License 2.0.
See LICENSE.txt for details.
Author
Hiroki OYA