HTTP Server for GiNZA - Japanese NLP Library

ginzaserver

A high-performance, multi-threaded HTTP server that provides REST API access to GiNZA, a Japanese natural language processing library built on spaCy.

Features

  • 🚀 Multi-threaded server using ThreadingMixIn for concurrent request handling
  • 🎯 Dual model support: Choose between ja_ginza (fast) or ja_ginza_electra (accurate)
  • 🔥 GPU acceleration support for enhanced performance
  • 📊 Performance optimized with list comprehensions and efficient memory management
  • 🌐 REST API with both GET and POST endpoints
  • 📝 JSON response format with detailed token analysis
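
The threading model named in the feature list can be sketched with the standard library alone. This is a minimal illustration, not the project's actual source: `analyze`, `Handler`, `ThreadedHTTPServer`, and `start_server` are hypothetical names, and `analyze` is a stand-in that does not call GiNZA.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn
from urllib.parse import parse_qs, urlparse


def analyze(text):
    """Hypothetical stand-in for the GiNZA pipeline: reports the text length."""
    return {"type": "doc", "length": len(text)}


class Handler(BaseHTTPRequestHandler):
    def _reply(self, payload, status=200):
        # Serialize once, then send headers and body.
        body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Text arrives as a URL-encoded query parameter: /?text=...
        query = parse_qs(urlparse(self.path).query)
        self._reply(analyze(query.get("text", [""])[0]))

    def do_POST(self):
        # Text arrives as a JSON object with a "text" field.
        length = int(self.headers.get("Content-Length", 0))
        data = json.loads(self.rfile.read(length).decode("utf-8"))
        self._reply(analyze(data.get("text", "")))

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """ThreadingMixIn makes the server handle each request in its own thread."""
    daemon_threads = True


def start_server(port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = ThreadedHTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_server()` returns the server object; `server.server_address[1]` gives the chosen port and `server.shutdown()` stops it.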

Installation

Prerequisites

Python 3.8 or higher is required.

Install GiNZA Models

Choose one or both models based on your needs:

# Fast model (recommended for production)
pip install -U ginza ja_ginza

# Accurate model (higher memory usage, ~16GB RAM recommended)
pip install -U ginza ja_ginza_electra

Install ginzaserver

Install directly from GitHub:

pip install git+https://github.com/oyahiroki/ginzaserver

Usage

Running the Server

ginzaserver <port> <option>

Parameters:

  • port: Port number to listen on (e.g., 8888)
  • option: Model selection
    • 0: Use ja_ginza (faster, roughly 10-20 ms per request)
    • 1: Use ja_ginza_electra (more accurate, roughly 40-50 ms per request)

Actual latency depends on hardware and on the length of the input text.

Example:

ginzaserver 8888 0
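
The two positional arguments can be validated along these lines. This is a sketch of one way to do it, not the project's actual argument handling; `parse_cli` and `MODELS` are hypothetical names, and the option-to-model mapping is the one listed above.

```python
# Mapping taken from the option list above.
MODELS = {"0": "ja_ginza", "1": "ja_ginza_electra"}


def parse_cli(argv):
    """Return (port, model_name) for arguments shaped like ['8888', '0']."""
    if len(argv) != 2:
        raise SystemExit("usage: ginzaserver <port> <option>")
    port_text, option = argv
    try:
        port = int(port_text)
    except ValueError:
        raise SystemExit(f"invalid port: {port_text!r}")
    if not 0 < port < 65536:
        raise SystemExit(f"port out of range: {port}")
    if option not in MODELS:
        raise SystemExit(f"invalid option: {option!r} (expected 0 or 1)")
    return port, MODELS[option]


print(parse_cli(["8888", "0"]))  # (8888, 'ja_ginza')
```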

API Endpoints

POST Request

Send JSON data with a text field:

curl -X POST -H "Content-Type: application/json" \
  -d '{"text":"今日はいい天気です"}' \
  http://localhost:8888/

GET Request

Pass text as a URL-encoded query parameter:

curl "http://localhost:8888/?text=%E4%BB%8A%E6%97%A5%E3%81%AF%E3%81%84%E3%81%84%E5%A4%A9%E6%B0%97%E3%81%A7%E3%81%99"
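
The percent-encoded query string does not need to be written by hand. The sketch below builds the same URL with the standard library; `build_get_url` is a hypothetical helper name, and the base URL assumes the server from the example above is listening on port 8888.

```python
from urllib.parse import urlencode


def build_get_url(text, base="http://localhost:8888/"):
    """Return the base URL followed by ?text=<percent-encoded UTF-8 text>."""
    return base + "?" + urlencode({"text": text})


url = build_get_url("今日はいい天気です")
print(url)
```

With the server running, the resulting URL can be passed to `urllib.request.urlopen(url)` to fetch the JSON response.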

Response Format

The server returns JSON with the following structure:

{
  "type": "doc",
  "sents": [
    {
      "tokens": [
        {
          "i": 0,
          "orth": "今日",
          "tag": "名詞-普通名詞-副詞可能",
          "pos": "NOUN",
          "lemma": "今日",
          "head.i": 3,
          "dep": "obl"
        },
        ...
      ]
    }
  ]
}

Token Fields:

  • i: Token index in the document
  • orth: Original word form
  • tag: Detailed part-of-speech tag
  • pos: Universal part-of-speech tag
  • lemma: Base form of the word
  • head.i: Index of the syntactic head
  • dep: Dependency relation
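
As an illustration of how these fields fit together, the sketch below resolves each token's head.i to the head's surface form. `dependency_edges` is a hypothetical helper, and the sample holds only the single token shown in the response above, so its head (index 3) is not present and resolves to None.

```python
def dependency_edges(doc):
    """Return (orth, dep, head_orth) triples; head_orth is None if the head is absent."""
    # "i" is a document-level index, so tokens from all sentences share one lookup table.
    tokens = [tok for sent in doc["sents"] for tok in sent["tokens"]]
    by_index = {tok["i"]: tok for tok in tokens}
    edges = []
    for tok in tokens:
        head = by_index.get(tok["head.i"])
        edges.append((tok["orth"], tok["dep"], head["orth"] if head else None))
    return edges


# Truncated sample: only the first token from the documented response.
sample = {
    "type": "doc",
    "sents": [
        {
            "tokens": [
                {
                    "i": 0,
                    "orth": "今日",
                    "tag": "名詞-普通名詞-副詞可能",
                    "pos": "NOUN",
                    "lemma": "今日",
                    "head.i": 3,
                    "dep": "obl",
                }
            ]
        }
    ],
}

print(dependency_edges(sample))  # [('今日', 'obl', None)]
```

Against a full server response, every head.i points at a token in the same document, so the third element is the head word rather than None.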

Client Example

A sample client is included in examples/ginzaclient.py:

import json
import urllib.request

url = 'http://localhost:8888'
headers = {'Content-Type': 'application/json'}

# The server expects a JSON object with a "text" field, encoded as UTF-8.
payload = {'text': '今日はいい天気です'}
request_body = json.dumps(payload).encode('utf-8')

request = urllib.request.Request(url, data=request_body, method='POST', headers=headers)
with urllib.request.urlopen(request) as response:
    response_body = response.read().decode('utf-8')

# Parse the JSON reply and pretty-print it, keeping the Japanese text readable.
result = json.loads(response_body)
print(json.dumps(result, indent=2, ensure_ascii=False))

Running as Python Script

You can also run the server directly as a Python script:

python ginzaserver/ginzaserver.py 8888 0

Performance Optimizations

Recent improvements include:

  • ✅ List comprehensions for faster token processing
  • ✅ Removed unnecessary del statements
  • ✅ Direct JSON encoding without intermediate variables
  • ✅ GPU acceleration support (automatically enabled if available)
  • ✅ Removed unused imports

GPU Support

The server automatically detects and enables GPU acceleration if available:

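import spacy

# prefer_gpu() activates the GPU when one is usable and returns True;
# otherwise it returns False and spaCy stays on the CPU.
if spacy.prefer_gpu():
    # require_gpu() raises an error if the GPU cannot be used.
    spacy.require_gpu()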

For CUDA support, install spaCy with the extra that matches your CUDA version:

# For CUDA 11.5 (quote the extra so shells such as zsh do not expand the brackets)
pip install -U "spacy[cuda115]"

Uninstallation

pip uninstall ginzaserver

Troubleshooting

Memory Issues

If the server is killed due to out-of-memory errors, check system logs:

# Linux
dmesg -T | grep -E -i -B100 'killed process'

# Check available memory
free -h

Consider using the ja_ginza model (option 0) instead of ja_ginza_electra if memory is limited.
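
The choice between the two options can also be made from the memory that is currently available. The sketch below is Linux-only and makes two assumptions: `mem_available_gib` and `suggest_option` are hypothetical helpers, and the 16 GiB threshold is borrowed from the RAM recommendation in the installation notes rather than measured.

```python
def mem_available_gib(meminfo_text):
    """Parse the MemAvailable line of /proc/meminfo (reported in kB) into GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / (1024 ** 2)
    return None  # field not present


def suggest_option(available_gib, electra_min_gib=16.0):
    """Return '1' (ja_ginza_electra) only when memory clears the assumed threshold."""
    if available_gib is not None and available_gib >= electra_min_gib:
        return "1"
    return "0"


# Illustrative input in /proc/meminfo format, not real measurements.
example = "MemTotal:       16777216 kB\nMemAvailable:    8388608 kB\n"
print(suggest_option(mem_available_gib(example)))  # 0
```

On a Linux host, the real input can be read with `open("/proc/meminfo").read()`.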

WSL/Container Localhost Access

When running in WSL or a container, the server may need to bind to 0.0.0.0 instead of 127.0.0.1 to accept connections from outside the WSL instance or container. To change the bind address, edit the ip variable in ginzaserver.py.
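
The difference between the two addresses can be seen with a plain standard-library server. This is a sketch only: `make_server` and the two constants are hypothetical names, and the actual server is configured through its ip variable as described above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

LOOPBACK = "127.0.0.1"      # reachable only from the same host or network namespace
ALL_INTERFACES = "0.0.0.0"  # reachable through every network interface of the host


def make_server(ip, port=0):
    """Create (but do not start) an HTTP server bound to ip; port 0 picks a free port."""
    return HTTPServer((ip, port), BaseHTTPRequestHandler)


server = make_server(ALL_INTERFACES)
print(server.server_address[0])  # 0.0.0.0
server.server_close()
```

Binding to 0.0.0.0 exposes the port to every network the host is attached to, so it is best combined with a firewall rule or a container port mapping.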

License

Apache License 2.0

Author

Hiroki Oya (oyahiroki@gmail.com)
