HTTP Server for GiNZA - Japanese NLP Library
ginzaserver
A high-performance, multi-threaded HTTP server that provides REST API access to GiNZA, a Japanese natural language processing library built on spaCy.
Features
- 🚀 Multi-threaded server using ThreadingMixIn for concurrent request handling
- 🎯 Dual model support: Choose between ja_ginza (fast) or ja_ginza_electra (accurate)
- 🔥 GPU acceleration support for enhanced performance
- 📊 Performance optimized with list comprehensions and efficient memory management
- 🌐 REST API with both GET and POST endpoints
- 📝 JSON response format with detailed token analysis
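As a rough illustration of the ThreadingMixIn approach listed above, the following standard-library sketch serves POST requests with one thread per request. It is not the ginzaserver source: `ThreadedHTTPServer`, `Handler`, `post_text`, and the whitespace-splitting `analyze` stand-in (used here in place of a real GiNZA pipeline) are all hypothetical names chosen for the example.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """HTTPServer that handles each request in its own thread."""
    daemon_threads = True  # worker threads do not block interpreter exit


def analyze(text):
    # Hypothetical stand-in for the NLP pipeline: splits on whitespace only.
    tokens = [{"i": i, "orth": word} for i, word in enumerate(text.split())]
    return {"type": "doc", "sents": [{"tokens": tokens}]}


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly Content-Length bytes and parse them as UTF-8 JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length).decode("utf-8"))
        body = json.dumps(analyze(payload["text"]), ensure_ascii=False).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def post_text(port, text):
    """Send one POST request to the local server and return the decoded JSON."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", "/", body=json.dumps({"text": text}).encode("utf-8"),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()


if __name__ == "__main__":
    server = ThreadedHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(post_text(server.server_address[1], "hello world"))
    server.shutdown()
    server.server_close()
```

Because each request runs in its own thread, a slow analysis does not block other clients from connecting.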
Installation
Prerequisites
Python 3.8 or higher is required.
Install GiNZA Models
Choose one or both models based on your needs:
# Fast model (recommended for production)
pip install -U ginza ja_ginza
# Accurate model (higher memory usage, ~16GB RAM recommended)
pip install -U ginza ja_ginza_electra
Install ginzaserver
Install directly from GitHub:
pip install git+https://github.com/oyahiroki/ginzaserver
Usage
Running the Server
ginzaserver <port> <option>
Parameters:
- port: Port number to listen on (e.g., 8888)
- option: Model selection
  - 0: Use ja_ginza (faster, 10-20ms per request)
  - 1: Use ja_ginza_electra (more accurate, 40-50ms per request)
Example:
ginzaserver 8888 0
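To make the two positional parameters concrete, here is a minimal sketch of how they could be validated and mapped to a model name. The function `parse_cli` and the `MODELS` table are hypothetical and written for illustration; the actual argument handling in ginzaserver may differ.

```python
# Option values and model names as listed under Parameters.
MODELS = {"0": "ja_ginza", "1": "ja_ginza_electra"}


def parse_cli(argv):
    """Return (port, model_name) from arguments shaped like ['8888', '0']."""
    if len(argv) != 2:
        raise ValueError("usage: ginzaserver <port> <option>")
    port_text, option = argv
    try:
        port = int(port_text)
    except ValueError:
        raise ValueError(f"port must be an integer, got {port_text!r}") from None
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    if option not in MODELS:
        raise ValueError(f"option must be 0 or 1, got {option!r}")
    return port, MODELS[option]


print(parse_cli(["8888", "0"]))  # → (8888, 'ja_ginza')
```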
API Endpoints
POST Request
Send JSON data with a text field:
curl -X POST -H "Content-Type: application/json" \
-d '{"text":"今日はいい天気です"}' \
http://localhost:8888/
GET Request
Pass text as a URL-encoded query parameter:
curl "http://localhost:8888/?text=%E4%BB%8A%E6%97%A5%E3%81%AF%E3%81%84%E3%81%84%E5%A4%A9%E6%B0%97%E3%81%A7%E3%81%99"
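The percent-encoded query string above can be produced with the Python standard library rather than by hand. In this small sketch, `build_get_url` is a hypothetical helper name and the base URL assumes the server from the earlier example is listening on port 8888.

```python
from urllib.parse import urlencode


def build_get_url(text, base_url="http://localhost:8888/"):
    """Return the GET URL with `text` percent-encoded as UTF-8."""
    return base_url + "?" + urlencode({"text": text})


print(build_get_url("今日はいい天気です"))
```

Note that urlencode encodes spaces as `+`, which is the usual form encoding for query strings.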
Response Format
The server returns JSON with the following structure:
{
"type": "doc",
"sents": [
{
"tokens": [
{
"i": 0,
"orth": "今日",
"tag": "名詞-普通名詞-副詞可能",
"pos": "NOUN",
"lemma": "今日",
"head.i": 3,
"dep": "obl"
},
...
]
}
]
}
Token Fields:
- i: Token index in the document
- orth: Original word form
- tag: Detailed part-of-speech tag
- pos: Universal part-of-speech tag
- lemma: Base form of the word
- head.i: Index of the syntactic head
- dep: Dependency relation
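The fields above can be read back from a parsed response with plain dictionary access. The sketch below uses a sample that contains only the first token from the response shown earlier; `dependency_edges` is a hypothetical helper name. Note that head.i is a literal key containing a dot, not a nested object.

```python
# Sample response holding only the first token from the example above.
sample = {
    "type": "doc",
    "sents": [
        {
            "tokens": [
                {
                    "i": 0,
                    "orth": "今日",
                    "tag": "名詞-普通名詞-副詞可能",
                    "pos": "NOUN",
                    "lemma": "今日",
                    "head.i": 3,
                    "dep": "obl",
                }
            ]
        }
    ],
}


def dependency_edges(doc):
    """Yield (token index, surface form, relation, head index) for every token."""
    for sent in doc["sents"]:
        for tok in sent["tokens"]:
            yield tok["i"], tok["orth"], tok["dep"], tok["head.i"]


for i, orth, dep, head in dependency_edges(sample):
    print(f"{i}\t{orth}\t{dep}\t-> {head}")
```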
Client Example
A sample client is included in examples/ginzaclient.py:
import urllib.request
import json

url = 'http://localhost:8888'
method = 'POST'
headers = {'Content-Type': 'application/json'}
obj = {'text': '今日はいい天気です'}
# Encode the request body as UTF-8 JSON.
requestbody = json.dumps(obj).encode('utf-8')
request = urllib.request.Request(url, data=requestbody, method=method, headers=headers)
with urllib.request.urlopen(request) as response:
    response_body = response.read().decode('utf-8')
# Parse the reply and pretty-print it without escaping Japanese characters.
response = json.loads(response_body)
print(json.dumps(response, indent=2, ensure_ascii=False))
Running as Python Script
You can also run the server directly as a Python script:
python ginzaserver/ginzaserver.py 8888 0
Performance Optimizations
Recent improvements include:
- ✅ List comprehensions for faster token processing
- ✅ Removed unnecessary del statements
- ✅ Direct JSON encoding without intermediate variables
- ✅ GPU acceleration support (automatically enabled if available)
- ✅ Removed unused imports
GPU Support
The server automatically detects and enables GPU acceleration if available:
if spacy.prefer_gpu():
    spacy.require_gpu()
For CUDA support, install the appropriate spaCy version:
# For CUDA 11.5
# Quote the extras so shells such as zsh do not expand the brackets
pip install -U 'spacy[cuda115]'
Uninstallation
pip uninstall ginzaserver
Troubleshooting
Memory Issues
If the server is killed due to out-of-memory errors, check system logs:
# Linux
dmesg -T | grep -E -i -B100 'killed process'
# Check available memory
free -h
Consider using the ja_ginza model (option 0) instead of ja_ginza_electra if memory is limited.
WSL/Container Localhost Access
When running in WSL or containers, you may need to bind to 0.0.0.0 instead of 127.0.0.1 to accept external connections. Modify the ip variable in ginzaserver.py if needed.
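The difference between the two bind addresses can be seen with a small standard-library sketch. It does not reproduce ginzaserver.py: `make_server` is a hypothetical helper, and the `ip` argument only mirrors the variable name mentioned above.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_server(ip, port):
    """Create (but do not start) an HTTP server bound to the given address."""
    return ThreadingHTTPServer((ip, port), BaseHTTPRequestHandler)


# "127.0.0.1" accepts connections from the same host only;
# "0.0.0.0" listens on every network interface of the machine.
server = make_server("0.0.0.0", 0)  # port 0: let the OS pick a free port
print(server.server_address)
server.server_close()
```

Binding to 0.0.0.0 makes the server reachable from other hosts on the network, so it is worth restricting access with a firewall or container port mapping when that is not intended.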
License
Apache License 2.0
Author
Hiroki Oya (oyahiroki@gmail.com)
File details
Details for the file ginzaserver-0.1.0.tar.gz.
File metadata
- Download URL: ginzaserver-0.1.0.tar.gz
- Upload date:
- Size: 10.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 35e93edd6020efa1d833af2f4146751938743fb64b35c76cbab906bf0f4e9b08 |
| MD5 | a5a6b54f9b6629c95f50099d2a5562eb |
| BLAKE2b-256 | 9dc051c73bda50cc037c51def3a3aa0bcf7ed49af26e8990aec1ce1005791654 |
File details
Details for the file ginzaserver-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ginzaserver-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91adb5d8a8fa65edc1f7665ea4866b021e782ee887ac41e8cf8837a7aabee1da |
| MD5 | 1c48420d5eb859bcc56cd572b09265ca |
| BLAKE2b-256 | f7936c0980594c210aac896e54a434abd7cef2e98dea6bf254b863b1dcae55af |