Local and HTTP server embedding tools for multilingual E5

nlp4j-llm-embeddings-e5

Local and HTTP server tools for generating multilingual E5 embeddings.

This project provides command-line and HTTP server utilities for using the intfloat/multilingual-e5-large model locally. It is designed for embedding JSONL data, building local semantic search workflows, and exposing embedding functions over a simple HTTP API.

The implementation uses sentence-transformers internally and applies E5-style prefixes such as passage: and query: automatically.

Features

Generate embeddings for JSONL files
Add embedding vectors to a specified JSON attribute
Use multilingual E5 embeddings locally
Run a lightweight HTTP embedding server
Support document embeddings with passage: prefix
Support query/document semantic search with query: and passage: prefixes
Support cosine similarity calculation
Optional token count checking
Batch processing for local JSONL embedding
Model warmup support for server mode

Model

The default model is:

intfloat/multilingual-e5-large

E5 models are designed to work with explicit text prefixes.

For document embeddings:

passage: your document text

For query embeddings:

query: your search query

This project automatically adds these prefixes depending on the selected mode.

Project Structure

.
├── LICENSE.txt
├── README.md
├── README_ja.md
├── pyproject.toml
├── requirements.txt
└── src
    └── nlp4j_embedding
        ├── __init__.py
        ├── e5_model.py
        ├── local_e5.py
        ├── request_handler.py
        └── server_e5.py

Installation

Install from source

git clone https://github.com/oyahiroki/nlp4j-llm-embeddings-e5.git
cd nlp4j-llm-embeddings-e5
pip install .

For development:

pip install -e .

Install dependencies manually

pip install -r requirements.txt

If you want to use GPU acceleration, please install a PyTorch build suitable for your CUDA environment.

Commands

After installation, the following commands are available:

nlp4j-embedding-local-e5
nlp4j-embedding-server-e5

Local JSONL Embedding

The local command reads a JSONL file, embeds text from a specified attribute, and writes a new JSONL file with an embedding vector added.

Basic usage

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl

Or using short options:

nlp4j-embedding-local-e5 -i input.jsonl -o output.jsonl

By default, it reads text from the text attribute and writes the vector to the vector attribute.

Input example:

{"id": "1", "text": "Kyoto is a city in Japan."}
{"id": "2", "text": "Tokyo is the capital of Japan."}

Output example:

{"id": "1", "text": "Kyoto is a city in Japan.", "vector": [0.0123, -0.0456, ...]}
{"id": "2", "text": "Tokyo is the capital of Japan.", "vector": [0.0234, -0.0567, ...]}

Specify input and output attributes

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --text-attr body \
  --vector-attr embedding

Specify E5 text type

For document embeddings, use --text-type passage:

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --text-type passage

For query embeddings, use --text-type query:

nlp4j-embedding-local-e5 --input queries.jsonl --output queries_with_vectors.jsonl \
  --text-type query

To disable automatic E5 prefixing:

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --text-type none

Batch size

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --batch-size 32

Token length

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --max-length 512

Token count warning

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --check-token-count

If the token count exceeds --max-length, a warning is printed.
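The idea behind the warning can be illustrated as follows. This is a sketch under stated assumptions: `check_token_count` and `whitespace_count` are hypothetical names, and how the project itself counts tokens is not shown in this document. The whitespace counter is only a placeholder; the model's real tokenizer splits text into subword tokens, so its counts will differ.

```python
import warnings


def check_token_count(text, max_length, count_tokens):
    """Warn when text has more tokens than max_length; return the count."""
    n = count_tokens(text)
    if n > max_length:
        warnings.warn(f"token count {n} exceeds max length {max_length}")
    return n


def whitespace_count(text):
    # Placeholder (assumption): stands in for the model tokenizer.
    return len(text.split())


# Four whitespace-separated tokens against a limit of three triggers a warning.
n = check_token_count("a b c d", max_length=3, count_tokens=whitespace_count)
print(n)
```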

Verbose mode

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --verbose

HTTP Embedding Server

Start the server:

nlp4j-embedding-server-e5

The default host is 127.0.0.1 and the default port is 8888.

nlp4j-embedding-server-e5 --host 127.0.0.1 --port 8888

By default, the model is loaded and warmed up at server startup.

To skip warmup:

nlp4j-embedding-server-e5 --no-warmup

HTTP API

The server provides the following endpoints:

/embeddings
/semantic_search
/cos_sim

`/embeddings`

Generate an embedding for a single text.

This endpoint is intended for document embeddings and uses the E5 passage: prefix internally.

GET

curl "http://127.0.0.1:8888/embeddings?text=This%20is%20a%20test."

POST

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text":"This is a test."}' \
  http://127.0.0.1:8888/embeddings

Response example

{
  "message": "ok",
  "time": "2026-06-20T12:00:00",
  "text": "This is a test.",
  "embeddings": [0.0123, -0.0456, 0.0789]
}
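The same POST request can be made from Python with only the standard library. The sketch below assumes the server is running on the default host and port and that the response has the shape shown above; `build_embeddings_request` and `fetch_embedding` are hypothetical helper names, not part of this package.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8888"


def build_embeddings_request(text, base_url=BASE_URL):
    """Build a POST request for /embeddings with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text, base_url=BASE_URL):
    """Send the request and return the vector from the "embeddings" field."""
    with urllib.request.urlopen(build_embeddings_request(text, base_url)) as resp:
        payload = json.load(resp)
    return payload["embeddings"]


# Building the request does not contact the server.
req = build_embeddings_request("This is a test.")
print(req.get_method(), req.full_url)
# With the server running: vector = fetch_embedding("This is a test.")
```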

Token count check

curl "http://127.0.0.1:8888/embeddings?text=This%20is%20a%20test.&checktokencount=true"

`/semantic_search`

Run semantic search between a query and one or more candidate texts.

The query is encoded with the E5 query: prefix. The candidate texts are encoded with the E5 passage: prefix.

GET

The GET API supports one query text and one candidate text.

curl "http://127.0.0.1:8888/semantic_search?text1=This%20is%20a%20test.&text2=This%20is%20an%20exam."

POST

The POST API supports multiple candidate texts.

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text":"Japanese NLP","texts":["GiNZA is a Japanese NLP library.","This document is about image processing."]}' \
  http://127.0.0.1:8888/semantic_search

Response example

{
  "message": "ok",
  "time": "2026-06-20T12:00:00",
  "text": "Japanese NLP",
  "r": [
    {
      "corpus_id": 0,
      "score": 0.8234
    },
    {
      "corpus_id": 1,
      "score": 0.3123
    }
  ]
}
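To turn a response like this back into readable results, each hit can be mapped to its candidate text. The sketch below assumes that corpus_id is the zero-based index into the texts list sent in the request, which the example suggests but this document does not state explicitly. `ranked_texts` is a hypothetical helper name, and the sample body is an abridged copy of the response above.

```python
import json

texts = [
    "GiNZA is a Japanese NLP library.",
    "This document is about image processing.",
]

# Abridged sample response body based on the example above.
response_body = """
{"message": "ok", "text": "Japanese NLP",
 "r": [{"corpus_id": 0, "score": 0.8234}, {"corpus_id": 1, "score": 0.3123}]}
"""


def ranked_texts(response, texts):
    """Pair each candidate text with its score, highest score first."""
    hits = sorted(response["r"], key=lambda h: h["score"], reverse=True)
    # Assumption: corpus_id indexes into the texts list from the request.
    return [(texts[h["corpus_id"]], h["score"]) for h in hits]


ranked = ranked_texts(json.loads(response_body), texts)
for text, score in ranked:
    print(f"{score:.4f}  {text}")
```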

`/cos_sim`

Calculate cosine similarity between two texts.

By default, this endpoint does not add an E5 prefix. It is intended as a simple compatibility endpoint for comparing two raw texts.

For retrieval-style search, /semantic_search is recommended because it applies query: and passage: prefixes correctly.

GET

curl "http://127.0.0.1:8888/cos_sim?text1=This%20is%20a%20test.&text2=This%20is%20an%20exam."

POST

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text1":"This is a test.","text2":"This is an exam.","checktokencount":true}' \
  http://127.0.0.1:8888/cos_sim

Response example

{
  "text1": "This is a test.",
  "text2": "This is an exam.",
  "cosine_similarity": 0.8123
}
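For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The pure-Python sketch below shows the formula on small hand-written vectors; it is an illustration only and is not taken from this project's code, and `cosine_similarity` is a hypothetical function name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Values close to 1 mean the two vectors point in nearly the same direction, and values near 0 mean they are nearly orthogonal.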

Python API

You can also use the internal Python functions directly.

from nlp4j_embedding import e5_model

vector, elapsed = e5_model.embed_text(
    "Kyoto is a city in Japan.",
    text_type="passage"
)

print(vector)
print(elapsed)

Batch embedding:

from nlp4j_embedding import e5_model

vectors, elapsed = e5_model.embed_texts(
    [
        "Kyoto is a city in Japan.",
        "Tokyo is the capital of Japan."
    ],
    text_type="passage"
)

print(vectors)

Semantic search:

from nlp4j_embedding import e5_model

results = e5_model.semantic_search(
    "Japanese city",
    [
        "Kyoto is a city in Japan.",
        "Python is a programming language."
    ]
)

print(results)

Cosine similarity:

from nlp4j_embedding import e5_model

score = e5_model.cos_sim(
    "This is a test.",
    "This is an exam."
)

print(score)

Notes on E5 Prefixes

E5 models expect input text to be prefixed depending on the task.

For search queries:

query: ...

For documents or passages:

passage: ...

This project automatically adds the prefix unless the text already starts with query: or passage:.
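The prefixing rule described above can be sketched as a small function. This is an illustration of the rule as stated in this document, not the project's actual code: `add_e5_prefix` is a hypothetical name, and the exact matching and whitespace handling in the real implementation may differ.

```python
def add_e5_prefix(text, text_type="passage"):
    """Return text with an E5 prefix, unless one is already present."""
    if text_type == "none":
        return text  # prefixing disabled
    if text_type not in ("query", "passage"):
        raise ValueError(f"unknown text type: {text_type}")
    if text.startswith(("query:", "passage:")):
        return text  # already prefixed, leave unchanged
    return f"{text_type}: {text}"


print(add_e5_prefix("Kyoto is a city in Japan."))
print(add_e5_prefix("Japanese city", text_type="query"))
print(add_e5_prefix("query: Japanese city"))
```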

For local JSONL embedding, the default text type is passage.

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl

This is equivalent to:

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl --text-type passage

Performance Notes

The first execution may take time because the model must be downloaded and loaded.

The server command warms up the model by default so that the first HTTP request does not need to load the model.

nlp4j-embedding-server-e5

To skip warmup:

nlp4j-embedding-server-e5 --no-warmup

For large JSONL files, increase or decrease the batch size depending on available memory and GPU capacity.

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl --batch-size 64
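Batching means the texts are split into groups and each group is encoded in one pass, so peak memory use grows with the batch size. A minimal sketch of the splitting step follows; `batched` is a hypothetical helper written for this example, not a function from this package.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = [f"passage: document {i}" for i in range(5)]
for batch in batched(texts, batch_size=2):
    # A real run would pass each batch to the model here.
    print(len(batch), batch)
```

The last batch may be smaller than the others when the number of records is not a multiple of the batch size.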

Docker

A Dockerfile is provided in the docker directory.

cd docker

See docker/README.md for Docker-specific usage.

License

This project is licensed under the Apache License 2.0.

See LICENSE.txt for details.

Author

Hiroki OYA

Release history

0.1.1 (this version): Jun 20, 2026
0.1.0: Jun 20, 2026
