KorT

Korean Translation Benchmark, LLM-as-a-judge

Abstract

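KorT is a benchmark that uses large language models (LLMs) to quantitatively evaluate translation quality.

Background

Many translation services exist today, but little research quantitatively evaluates and systematically compares their translation quality. Existing automatic metrics such as BLEU struggle to capture subtle linguistic differences such as slang and cultural context, while human evaluation is costly and time-consuming.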

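To address this, I propose KorT, a new benchmark designed to rigorously evaluate translation between Korean and other languages. KorT applies the LLM-as-a-judge paradigm, using the sophisticated language understanding of large language models (LLMs) for evaluation. To this end, I built a dataset of diverse sentences that are difficult to translate. The dataset covers multiple domains and linguistic phenomena (e.g., ambiguity, idiomatic expressions, and cultural references). Translations produced by various machine translation (MT) models and LLMs are then scored by a high-performing LLM using an evaluation prompt.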
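The LLM-as-a-judge step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the template wording, the 0 to 100 scale, the `SCORE:` reply format, and the helper names `build_judge_prompt` and `parse_score` are hypothetical and are not KorT's actual evaluation prompt or code.

```python
import re

# Hypothetical rubric text. KorT's real evaluation prompt is not reproduced here.
JUDGE_TEMPLATE = (
    "You are grading a translation.\n"
    "Source ({src_lang}): {source}\n"
    "Translation ({tgt_lang}): {translation}\n"
    "Rate the translation from 0 to 100 and answer as 'SCORE: <number>'."
)


def build_judge_prompt(source, translation, src_lang, tgt_lang):
    """Fill the (assumed) template with one source/translation pair."""
    return JUDGE_TEMPLATE.format(
        source=source,
        translation=translation,
        src_lang=src_lang,
        tgt_lang=tgt_lang,
    )


def parse_score(reply):
    """Extract the score from a judge reply.

    Returns None when no score is present or it falls outside 0..100,
    so that malformed replies can be retried instead of silently counted.
    """
    match = re.search(r"SCORE:\s*(\d{1,3})\b", reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 0 <= score <= 100 else None
```

Returning `None` for an unparseable reply keeps a formatting failure by the judge model separate from a genuinely low score.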

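KorT's core goal is to build an evaluation framework that is reliable, scalable, and fine-grained while correlating more strongly with human judgment than existing automatic metrics. I plan to run a public leaderboard that ranks MT systems based on KorT benchmark results. The aim is to offer insight into the strengths and weaknesses of current translation technology and to encourage better translation performance, especially in challenging linguistic contexts involving Korean. Ultimately, the goal is to contribute to the advancement of high-quality multilingual machine translation.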

About

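The current leaderboard is available here.
The judge LLM is claude-3-7-sonnet-20250219 (Reasoning), with support from Anthropic.
To request an evaluation of your model, please contact us by email here.
If you used your own evaluation prompt, please provide it as well.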

Usage

Install

Installing KorT

From PyPI

pip install -U "kort-cli[all]"

This also installs Transformers and Torch for local models. If you do not want them, install without [all].

pip install -U kort-cli
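To check which of the two install variants is active in the current environment, a small standard-library script along these lines may help. It assumes the distribution is named `kort-cli` (as in the commands above) and that the optional local-model dependencies are importable as `transformers` and `torch`; the helper names are hypothetical.

```python
from importlib import metadata, util


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_local_model_deps():
    """True when both optional local-model packages can be imported."""
    return all(
        util.find_spec(name) is not None for name in ("transformers", "torch")
    )


if __name__ == "__main__":
    print("kort-cli:", installed_version("kort-cli") or "not installed")
    print("local model dependencies present:", has_local_model_deps())
```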

From Source

You can also install from source!

git clone https://github.com/deveworld/kort
cd kort
pip install ".[all]"

Likewise, if you do not want Transformers and Torch installed, install without [all].

pip install .

With uvx

You can also use it without installing it directly!

Generate

List the available translators

python -m kort.scripts.generate -l

Select a translator and generate translations

python -m kort.scripts.generate \
    -t openai \
    -n gpt-4.1-mini \
    --api_key sk-xxx
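When scripting the generate step, it can be convenient to keep the API key out of shell history by reading it from an environment variable. The sketch below wraps the command shown above using only the flags that appear there (`-t`, `-n`, `--api_key`). The helper names and the `OPENAI_API_KEY` variable name are assumptions for illustration, not part of KorT.

```python
import os
import subprocess
import sys


def generate_command(translator_type, model_name, api_key):
    """Build the argument list for the generate script shown above."""
    return [
        sys.executable, "-m", "kort.scripts.generate",
        "-t", translator_type,
        "-n", model_name,
        "--api_key", api_key,
    ]


def run_generate(translator_type, model_name, env_var="OPENAI_API_KEY"):
    """Run generation with the key taken from an environment variable.

    env_var is an assumed convention; use whichever variable holds your key.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(
        generate_command(translator_type, model_name, api_key), check=True
    )
```

For example, `run_generate("openai", "gpt-4.1-mini")` would run the same command as above with the key read from the environment.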

Evaluation

List the available evaluator models

python -m kort.scripts.evaluate -l

Run the evaluation using the generated file as input

python -m kort.scripts.evaluate \
    -t gemini \
    -n gemini-2.5-pro-preview-03-25 \
    --api_key AIzaxxx \
    --input generated/openai_gpt-4.1-mini.json
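Because evaluation consumes the file written by the generate step, checking that the file exists and parses as JSON before spending judge API calls can save a failed run. The following is a minimal sketch under that assumption. It does not assume anything about the JSON schema, and the helper names are hypothetical.

```python
import json
import sys
from pathlib import Path


def load_generated(path):
    """Parse the generated file, failing early with a clear message."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(
            f"{file} does not exist; run the generate step first"
        )
    with file.open(encoding="utf-8") as handle:
        return json.load(handle)  # raises json.JSONDecodeError if malformed


def evaluate_command(judge_type, judge_model, api_key, input_path):
    """Build the evaluate command after validating the input file."""
    load_generated(input_path)
    return [
        sys.executable, "-m", "kort.scripts.evaluate",
        "-t", judge_type,
        "-n", judge_model,
        "--api_key", api_key,
        "--input", str(input_path),
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` to start the evaluation.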

Batch Evaluation

To evaluate using the Batch API:

Register the batch job

python -m kort.scripts.eval_batch \
    -t claudebatch \
    -n claude-3-7-sonnet-20250219 \
    --api_key sk-ant-api03-xxx \
    --input generated/openai_gpt-4.1-mini.json

After the batch job completes, collect the results using the job ID

python -m kort.scripts.eval_batch \
    -t claudebatch \
    -n claude-3-7-sonnet-20250219 \
    --api_key sk-ant-api03-xxx \
    --input generated/openai_gpt-4.1-mini.json \
    --job_id msgbatch_xxx
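The two batch commands above differ only in the trailing `--job_id` argument, so one helper can cover both phases. This sketch uses only the flags shown above; the function name `batch_command` is hypothetical, and how the job ID is reported after submission is not specified here, so it is passed in by hand.

```python
import sys


def batch_command(model, api_key, input_path, job_id=None):
    """Build the eval_batch command.

    Without job_id this is the submission command; with job_id it is the
    command that collects results for an already submitted batch job.
    """
    cmd = [
        sys.executable, "-m", "kort.scripts.eval_batch",
        "-t", "claudebatch",
        "-n", model,
        "--api_key", api_key,
        "--input", input_path,
    ]
    if job_id is not None:
        cmd += ["--job_id", job_id]
    return cmd


# Phase 1: submit the batch job.
submit = batch_command(
    "claude-3-7-sonnet-20250219",
    "sk-ant-api03-xxx",
    "generated/openai_gpt-4.1-mini.json",
)

# Phase 2: once the job has finished, collect results with its ID.
collect = batch_command(
    "claude-3-7-sonnet-20250219",
    "sk-ant-api03-xxx",
    "generated/openai_gpt-4.1-mini.json",
    job_id="msgbatch_xxx",
)
```

Keeping the model, key, and input path identical across both phases mirrors the commands above, where the second call repeats every argument of the first.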

LeaderBoard

Start the leaderboard web server with the command below

python -m kort.scripts.leaderboard

Or view it directly as text

python -m kort.scripts.leaderboard -t

Contribute

If you run into a problem, please do not hesitate to open a GitHub Issue.
Code fixes and improvement suggestions are welcome as Pull Requests (PRs), and they will be reviewed actively! ❤️

Release history

1.0.2 (this version): Jun 24, 2025
1.0.1: Jun 23, 2025
1.0.0: Jun 23, 2025
