
Local and HTTP server embedding tools for multilingual E5


nlp4j-llm-embeddings-e5

Local and HTTP server tools for generating multilingual E5 embeddings.

This project provides command-line and HTTP server utilities for using the intfloat/multilingual-e5-large model locally. It is designed for embedding JSONL data, building local semantic search workflows, and exposing embedding functions over a simple HTTP API.

The implementation uses sentence-transformers internally and applies E5-style prefixes such as passage: and query: automatically.

Features

  • Generate embeddings for JSONL files
  • Add embedding vectors to a specified JSON attribute
  • Use multilingual E5 embeddings locally
  • Run a lightweight HTTP embedding server
  • Support document embeddings with the passage: prefix
  • Support query/document semantic search with the query: and passage: prefixes
  • Support cosine similarity calculation
  • Optional token count checking
  • Batch processing for local JSONL embedding
  • Model warmup support for server mode

Model

The default model is:

intfloat/multilingual-e5-large

E5 models are designed to work with explicit text prefixes.

For document embeddings:

passage: your document text

For query embeddings:

query: your search query

This project automatically adds these prefixes depending on the selected mode.

Project Structure

.
├── LICENSE.txt
├── README.md
├── README_ja.md
├── pyproject.toml
├── requirements.txt
└── src
    └── nlp4j_embedding
        ├── __init__.py
        ├── e5_model.py
        ├── local_e5.py
        ├── request_handler.py
        └── server_e5.py

Installation

Install from source

git clone https://github.com/oyahiroki/nlp4j-llm-embeddings-e5.git
cd nlp4j-llm-embeddings-e5
pip install .

For development:

pip install -e .

Install dependencies manually

pip install -r requirements.txt

To use GPU acceleration, install a PyTorch build that matches your CUDA environment.

Commands

After installation, the following commands are available:

nlp4j-embedding-local-e5
nlp4j-embedding-server-e5

Local JSONL Embedding

The local command reads a JSONL file, embeds text from a specified attribute, and writes a new JSONL file with an embedding vector added.

Basic usage

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl

Or using short options:

nlp4j-embedding-local-e5 -i input.jsonl -o output.jsonl

By default, it reads text from the text attribute and writes the vector to the vector attribute.

Input example:

{"id": "1", "text": "Kyoto is a city in Japan."}
{"id": "2", "text": "Tokyo is the capital of Japan."}

Output example:

{"id": "1", "text": "Kyoto is a city in Japan.", "vector": [0.0123, -0.0456, ...]}
{"id": "2", "text": "Tokyo is the capital of Japan.", "vector": [0.0234, -0.0567, ...]}
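The transformation from input to output can be pictured with a minimal sketch. This is not the package's actual implementation: `add_vectors` and `fake_embed` are hypothetical names, and the stand-in embedder returns dummy numbers so that the example runs without downloading the model.

```python
import io
import json
from typing import Callable, Iterable, Iterator, List


def add_vectors(
    lines: Iterable[str],
    embed: Callable[[str], List[float]],
    text_attr: str = "text",
    vector_attr: str = "vector",
) -> Iterator[str]:
    """Yield one JSON line per input record with a vector attribute added."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        record[vector_attr] = embed(record[text_attr])
        yield json.dumps(record, ensure_ascii=False)


def fake_embed(text: str) -> List[float]:
    # Hypothetical stand-in for the real model: NOT a meaningful embedding.
    return [float(len(text)), float(text.count(" "))]


source = io.StringIO(
    '{"id": "1", "text": "Kyoto is a city in Japan."}\n'
    '{"id": "2", "text": "Tokyo is the capital of Japan."}\n'
)
output_lines = list(add_vectors(source, fake_embed))
for out in output_lines:
    print(out)
```

Every attribute of the input record is carried over unchanged; only the vector attribute is added.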

Specify input and output attributes

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --text-attr body \
  --vector-attr embedding

Specify E5 text type

For document embeddings, use passage:

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --text-type passage

For query embeddings, use query:

nlp4j-embedding-local-e5 --input queries.jsonl --output queries_with_vectors.jsonl \
  --text-type query

To disable automatic E5 prefixing:

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --text-type none

Batch size

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --batch-size 32
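As a rough illustration of what a batch size means here, the sketch below groups texts into chunks of at most 32 items, so that each chunk could be handed to the model in one call. The helper name `batched` is hypothetical, and the way the command actually batches its input may differ.

```python
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group items into lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


texts = [f"document {i}" for i in range(70)]
sizes = [len(b) for b in batched(texts, 32)]
print(sizes)
```

With 70 texts and a batch size of 32, the last batch holds the 6 leftover items.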

Token length

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --max-length 512

Token count warning

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --check-token-count

If the token count exceeds --max-length, a warning is printed.
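The check can be sketched as follows. Everything in this example is an assumption made for illustration: the function name `token_count_warning`, the wording of the warning, and the use of whitespace splitting in place of the model's real tokenizer.

```python
from typing import Callable, List, Optional


def token_count_warning(
    text: str,
    tokenize: Callable[[str], List[str]],
    max_length: int = 512,
) -> Optional[str]:
    """Return a warning message if text has more tokens than max_length."""
    count = len(tokenize(text))
    if count > max_length:
        return f"WARNING: token count {count} exceeds max length {max_length}"
    return None


# Assumption: whitespace splitting stands in for the model's real tokenizer,
# which would normally produce more (subword) tokens for the same text.
whitespace_tokenize = str.split

short_warning = token_count_warning("This is a test.", whitespace_tokenize)
long_warning = token_count_warning("word " * 600, whitespace_tokenize)
print(short_warning)
print(long_warning)
```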

Verbose mode

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl \
  --verbose

HTTP Embedding Server

Start the server:

nlp4j-embedding-server-e5

The default host is 127.0.0.1 and the default port is 8888.

nlp4j-embedding-server-e5 --host 127.0.0.1 --port 8888

By default, the model is loaded and warmed up at server startup.

To skip warmup:

nlp4j-embedding-server-e5 --no-warmup

HTTP API

The server provides the following endpoints:

/embeddings
/semantic_search
/cos_sim

/embeddings

Generate an embedding for a single text.

This endpoint is intended for document embeddings and uses the E5 passage: prefix internally.

GET

curl "http://127.0.0.1:8888/embeddings?text=This%20is%20a%20test."

POST

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text":"This is a test."}' \
  http://127.0.0.1:8888/embeddings

Response example

{
  "message": "ok",
  "time": "2026-06-20T12:00:00",
  "text": "This is a test.",
  "embeddings": [0.0123, -0.0456, 0.0789]
}

Token count check

curl "http://127.0.0.1:8888/embeddings?text=This%20is%20a%20test.&checktokencount=true"
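The same requests can be issued from Python with the standard library alone. The sketch below builds the GET URL and the POST request shown in the curl examples. The helper names are hypothetical, and `fetch_embedding` assumes a server is running on the default host and port and that the response has the shape shown in the response example.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8888"  # default host and port from this README


def embeddings_get_url(text: str, check_token_count: bool = False) -> str:
    """Build the GET URL for /embeddings with percent-encoded parameters."""
    params = {"text": text}
    if check_token_count:
        params["checktokencount"] = "true"
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{BASE_URL}/embeddings?{query}"


def embeddings_post_request(text: str) -> urllib.request.Request:
    """Build (but do not send) the JSON POST request for /embeddings."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text: str) -> list:
    """Send the POST request and return the vector; needs a running server."""
    with urllib.request.urlopen(embeddings_post_request(text)) as response:
        return json.loads(response.read().decode("utf-8"))["embeddings"]


url = embeddings_get_url("This is a test.", check_token_count=True)
request = embeddings_post_request("This is a test.")
print(url)
print(request.get_method(), request.full_url)
```

Passing `quote_via=urllib.parse.quote` encodes spaces as `%20`, matching the curl examples, rather than the `+` that `urlencode` produces by default.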

/semantic_search

Run semantic search between a query and one or more candidate texts.

The query is encoded with the E5 query: prefix. The candidate texts are encoded with the E5 passage: prefix.

GET

The GET API supports one query text and one candidate text.

curl "http://127.0.0.1:8888/semantic_search?text1=This%20is%20a%20test.&text2=This%20is%20an%20exam."

POST

The POST API supports multiple candidate texts.

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text":"Japanese NLP","texts":["GiNZA is a Japanese NLP library.","This document is about image processing."]}' \
  http://127.0.0.1:8888/semantic_search

Response example

{
  "message": "ok",
  "time": "2026-06-20T12:00:00",
  "text": "Japanese NLP",
  "r": [
    {
      "corpus_id": 0,
      "score": 0.8234
    },
    {
      "corpus_id": 1,
      "score": 0.3123
    }
  ]
}
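A client usually wants the matching texts rather than bare indices. The sketch below pairs each hit with its candidate text, assuming that `corpus_id` is the zero-based position of the candidate in the `texts` array of the request. That interpretation is an assumption based on the example above, and `ranked_texts` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

candidates = [
    "GiNZA is a Japanese NLP library.",
    "This document is about image processing.",
]

# Values taken from the response example above.
response = {
    "message": "ok",
    "text": "Japanese NLP",
    "r": [
        {"corpus_id": 0, "score": 0.8234},
        {"corpus_id": 1, "score": 0.3123},
    ],
}


def ranked_texts(response: Dict, texts: List[str]) -> List[Tuple[str, float]]:
    """Pair each hit with its candidate text, best score first."""
    hits = sorted(response["r"], key=lambda hit: hit["score"], reverse=True)
    # Assumption: corpus_id indexes into the list of candidate texts.
    return [(texts[hit["corpus_id"]], hit["score"]) for hit in hits]


ranked = ranked_texts(response, candidates)
for text, score in ranked:
    print(f"{score:.4f}  {text}")
```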

/cos_sim

Calculate cosine similarity between two texts.

By default, this endpoint does not add an E5 prefix to either text. It is intended as a simple compatibility endpoint for comparing two raw texts.

For retrieval-style search, /semantic_search is recommended because it applies query: and passage: prefixes correctly.

GET

curl "http://127.0.0.1:8888/cos_sim?text1=This%20is%20a%20test.&text2=This%20is%20an%20exam."

POST

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text1":"This is a test.","text2":"This is an exam.","checktokencount":true}' \
  http://127.0.0.1:8888/cos_sim

Response example

{
  "text1": "This is a test.",
  "text2": "This is an exam.",
  "cosine_similarity": 0.8123
}
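For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The pure-Python sketch below shows the calculation on small hand-made vectors. It illustrates the formula only and is not how the server computes the score.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Parallel vectors score close to 1; orthogonal vectors score 0.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```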

Python API

You can also use the internal Python functions directly.

from nlp4j_embedding import e5_model

vector, elapsed = e5_model.embed_text(
    "Kyoto is a city in Japan.",
    text_type="passage"
)

print(vector)
print(elapsed)

Batch embedding:

from nlp4j_embedding import e5_model

vectors, elapsed = e5_model.embed_texts(
    [
        "Kyoto is a city in Japan.",
        "Tokyo is the capital of Japan."
    ],
    text_type="passage"
)

print(vectors)

Semantic search:

from nlp4j_embedding import e5_model

results = e5_model.semantic_search(
    "Japanese city",
    [
        "Kyoto is a city in Japan.",
        "Python is a programming language."
    ]
)

print(results)

Cosine similarity:

from nlp4j_embedding import e5_model

score = e5_model.cos_sim(
    "This is a test.",
    "This is an exam."
)

print(score)

Notes on E5 Prefixes

E5 models expect input text to be prefixed depending on the task.

For search queries:

query: ...

For documents or passages:

passage: ...

This project automatically adds the prefix unless the text already starts with query: or passage:.
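The rule can be sketched in a few lines. This is a minimal illustration of the behavior described above, not the package's source code: the function name `apply_e5_prefix` is hypothetical, and the `none` value mirrors the `--text-type none` option of the local command.

```python
def apply_e5_prefix(text: str, text_type: str = "passage") -> str:
    """Add an E5 prefix unless disabled or already present."""
    if text_type == "none":
        return text  # prefixing disabled
    if text_type not in ("query", "passage"):
        raise ValueError(f"unknown text type: {text_type}")
    if text.startswith(("query:", "passage:")):
        return text  # already prefixed, leave untouched
    return f"{text_type}: {text}"


print(apply_e5_prefix("Kyoto is a city in Japan."))
print(apply_e5_prefix("Japanese city", text_type="query"))
print(apply_e5_prefix("query: Japanese city", text_type="passage"))
```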

For local JSONL embedding, the default text type is passage.

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl

This is equivalent to:

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl --text-type passage

Performance Notes

The first run can be slow because the model must be downloaded and loaded into memory.

The server command warms up the model by default so that the first HTTP request does not need to load the model.

nlp4j-embedding-server-e5

To skip warmup:

nlp4j-embedding-server-e5 --no-warmup

For large JSONL files, tune the batch size to fit the available memory and GPU capacity.

nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl --batch-size 64

Docker

A Dockerfile is provided in the docker directory.

cd docker

See docker/README.md for Docker-specific usage.

License

This project is licensed under the Apache License 2.0.

See LICENSE.txt for details.

Author

Hiroki OYA
