
Korean Morphological Analyzer - MeCab-Ko Python Bindings


mecab-ko-python

Python bindings for MeCab-Ko (Korean morphological analyzer)

Overview

This package provides Python bindings for MeCab-Ko, a Korean morphological analyzer implemented in Rust. The API is compatible with KoNLPy's Mecab interface, providing high-performance Korean morphological analysis with a familiar API.

Features

  • Fast: Rust-based implementation with zero-copy parsing
  • Memory-efficient: Optimized data structures for Korean text processing
  • Thread-safe: Safe concurrent operations
  • KoNLPy-compatible: Drop-in replacement for KoNLPy's Mecab
  • Type hints: Full type annotation support for better IDE integration

Installation

From PyPI (Recommended)

pip install mecab-ko-python

Pre-built wheels are available for:

  • Linux (x86_64, aarch64)
  • macOS (x86_64, Apple Silicon)
  • Windows (x86_64)

From Source

If you need to build from source:

# Install Rust toolchain (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install maturin
pip install maturin

# Build and install
git clone https://github.com/hephaex/mecab-ko.git
cd mecab-ko/rust/crates/mecab-ko-python
maturin develop --release

Usage

from mecab_ko import Mecab

# Create tokenizer instance
mecab = Mecab()

# Extract morphemes
morphemes = mecab.morphs("안녕하세요")
print(morphemes)
# ['안녕', '하', '세요']

# Extract nouns
nouns = mecab.nouns("아버지가방에들어가신다")
print(nouns)
# ['아버지', '가방']

# Part-of-speech tagging
tagged = mecab.pos("나는 학생입니다")
print(tagged)
# [('나', 'NP'), ('는', 'JX'), ('학생', 'NNG'), ('이', 'VCP'), ('ㅂ니다', 'EF')]

# MeCab format output
result = mecab.parse("안녕하세요")
print(result)
# 안녕    NNG,*,*,안녕,*,*,*,*
# 하      XSV,*,*,하,*,*,*,*
# 세요    EF,*,*,세요,*,*,*,*
# EOS
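The methods above return plain Python lists, so they combine directly with the standard library. For example, noun frequencies across several documents can be counted with collections.Counter. This is a minimal sketch: noun_frequencies and SupportsNouns are hypothetical names, and any object with a nouns(text) method (such as a Mecab instance) can be passed in.

```python
from collections import Counter
from collections.abc import Iterable
from typing import Protocol


class SupportsNouns(Protocol):
    """Anything with a nouns(text) method, e.g. a Mecab instance (assumed)."""

    def nouns(self, text: str) -> list[str]: ...


def noun_frequencies(tokenizer: SupportsNouns, texts: Iterable[str]) -> Counter[str]:
    """Count how often each noun occurs across all texts."""
    counts: Counter[str] = Counter()
    for text in texts:
        counts.update(tokenizer.nouns(text))  # add this text's nouns to the tally
    return counts
```

With the package installed, noun_frequencies(Mecab(), documents).most_common(10) would list the ten most frequent nouns.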

API Reference

Mecab(dicpath=None)

Create a new Mecab tokenizer instance.

Parameters:

  • dicpath (str, optional): Path to dictionary directory

Returns:

  • Mecab: Tokenizer instance

mecab.morphs(text)

Extract morphemes from text.

Parameters:

  • text (str): Input text

Returns:

  • list[str]: List of morphemes

mecab.nouns(text)

Extract nouns from text.

Parameters:

  • text (str): Input text

Returns:

  • list[str]: List of nouns

mecab.pos(text)

Perform part-of-speech tagging.

Parameters:

  • text (str): Input text

Returns:

  • list[tuple[str, str]]: List of (surface, pos_tag) tuples

mecab.parse(text)

Parse text and return MeCab format output.

Parameters:

  • text (str): Input text

Returns:

  • str: MeCab format output: one line per morpheme (surface form, a tab, then comma-separated features), followed by an EOS line
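If you need the individual feature fields rather than one string, the output of parse() can be split back into tokens. The sketch below assumes the layout shown in the Usage example: one morpheme per line, the surface form followed by whitespace (a tab in standard MeCab output) and then comma-separated features with the POS tag first, plus EOS marker lines. MecabToken and parse_mecab_output are illustrative names, not part of mecab_ko.

```python
from typing import NamedTuple


class MecabToken(NamedTuple):
    surface: str
    features: list[str]

    @property
    def pos(self) -> str:
        # The first feature field holds the POS tag, e.g. "NNG".
        return self.features[0]


def parse_mecab_output(output: str) -> list[MecabToken]:
    """Split MeCab-format text into MecabToken(surface, features) records."""
    tokens: list[MecabToken] = []
    for line in output.splitlines():
        if not line.strip() or line.strip() == "EOS":
            continue  # skip blank lines and end-of-sentence markers
        # Surface form, then whitespace (a tab in standard MeCab output),
        # then the comma-separated feature string.
        parts = line.split(maxsplit=1)
        feature_str = parts[1] if len(parts) > 1 else ""
        tokens.append(MecabToken(parts[0], feature_str.split(",")))
    return tokens


sample = "안녕\tNNG,*,*,안녕,*,*,*,*\n하\tXSV,*,*,하,*,*,*,*\n세요\tEF,*,*,세요,*,*,*,*\nEOS\n"

for token in parse_mecab_output(sample):
    print(token.surface, token.pos)
# 안녕 NNG
# 하 XSV
# 세요 EF
```

In practice you would pass mecab.parse(text) instead of the hard-coded sample.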

Korean POS Tags

The analyzer uses the Sejong POS tag set:

  • NNG: General noun (일반 명사)
  • NNP: Proper noun (고유 명사)
  • NP: Pronoun (대명사)
  • VV: Verb (동사)
  • VA: Adjective (형용사)
  • JX: Auxiliary particle (보조사)
  • JKS: Subject particle (주격조사)
  • JKO: Object particle (목적격조사)
  • EF: Final ending (종결어미)
  • VCP: Positive copula (긍정 지정사)
  • XSV: Verb-derivational suffix (동사 파생 접미사)
  • And many more...
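Because pos() returns plain (surface, tag) tuples, selecting morphemes by tag needs only a list comprehension. This is a minimal sketch using the pos() output from the Usage example; the helper names are hypothetical. In case a tag is reported as a compound joined with "+" (an assumption about possible output, not something documented here), filter_by_tags compares only its first component.

```python
from collections.abc import Iterable

TaggedMorphs = Iterable[tuple[str, str]]


def leading_tag(tag: str) -> str:
    # Assumption: compound tags, if any, look like "VCP+EF"; keep the first part.
    return tag.split("+", 1)[0]


def filter_by_tags(tagged: TaggedMorphs, tags: set[str]) -> list[str]:
    """Return the surfaces whose (leading) tag is one of `tags`."""
    return [surface for surface, tag in tagged if leading_tag(tag) in tags]


def filter_by_prefix(tagged: TaggedMorphs, prefix: str) -> list[str]:
    """Return the surfaces whose tag starts with `prefix`, e.g. "N" or "J"."""
    return [surface for surface, tag in tagged if tag.startswith(prefix)]


# pos() output copied from the Usage example.
tagged = [("나", "NP"), ("는", "JX"), ("학생", "NNG"), ("이", "VCP"), ("ㅂ니다", "EF")]

print(filter_by_tags(tagged, {"NNG", "NNP"}))  # ['학생']
print(filter_by_prefix(tagged, "N"))           # ['나', '학생']
print(filter_by_prefix(tagged, "J"))           # ['는']
```

The prefix "N" also matches NP (pronoun), so pick the helper according to whether pronouns should count as nouns for your task.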

Performance

The Rust implementation is intended to improve on the original C++ implementation in these areas:

  • Fast tokenization with zero-copy parsing
  • Memory-efficient data structures
  • Thread-safe operations
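The Features list describes the tokenizer as thread-safe. Assuming that means one Mecab instance may be shared across threads, a batch of texts could be tokenized with a thread pool as sketched below. morphs_batch and SupportsMorphs are hypothetical helpers, and whether threads give a real speed-up depends on whether the binding releases the GIL during analysis, which this page does not state.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol


class SupportsMorphs(Protocol):
    """Anything with a morphs(text) method, e.g. a Mecab instance (assumed)."""

    def morphs(self, text: str) -> list[str]: ...


def morphs_batch(
    tokenizer: SupportsMorphs, texts: list[str], max_workers: int = 4
) -> list[list[str]]:
    """Tokenize texts concurrently with one shared tokenizer instance.

    Relies on the tokenizer being safe to call from several threads at once.
    Results are returned in the same order as `texts`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(tokenizer.morphs, texts))
```

With the package installed: morphs_batch(Mecab(), ["안녕하세요", "나는 학생입니다"]). Because map preserves order, result i always belongs to texts[i].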

Migration from KoNLPy

If you're currently using KoNLPy's Mecab, you can migrate with minimal changes:

# Before (KoNLPy)
from konlpy.tag import Mecab
mecab = Mecab()

# After (mecab-ko-python)
from mecab_ko import Mecab
mecab = Mecab()

# The same calls work unchanged
mecab.morphs("안녕하세요")
mecab.nouns("아버지가방에들어가신다")
mecab.pos("나는 학생입니다")
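During a gradual migration it can help to accept either package. This is a minimal sketch, assuming your code relies only on the methods shown above; load_mecab_class is a hypothetical helper. The two Mecab classes are separate implementations, so a dictionary path, if you pass one, may differ between them.

```python
def load_mecab_class():
    """Return a Mecab class, preferring mecab_ko and falling back to KoNLPy.

    Returns None when neither package can be imported.
    """
    try:
        from mecab_ko import Mecab  # Rust-based bindings (this package)
    except ImportError:
        try:
            from konlpy.tag import Mecab  # original KoNLPy wrapper
        except ImportError:
            return None
    return Mecab
```

Call it once at start-up, raise an error if it returns None, and otherwise instantiate the class it returns.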

Development Requirements

This crate uses PyO3 to create Python bindings. Building requires Python development headers.

System Dependencies

Ubuntu/Debian:

sudo apt install python3-dev

Fedora/RHEL:

sudo dnf install python3-devel

macOS (with Homebrew):

brew install python

Windows: Install Python from python.org with "Development headers" option selected.

Build Tools

# Install maturin (PyO3 build tool)
pip install maturin

Building and Testing

# Build and install in development mode
maturin develop

# Build release wheel
maturin build --release

# Run Python tests
maturin develop && pytest tests/

Note: A plain cargo test does not work for this crate, because a PyO3 cdylib needs the Python development headers and a working Python environment. Use maturin develop followed by pytest instead.
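A minimal smoke test for tests/ might look like the sketch below (the file name tests/test_smoke.py is hypothetical). It checks only the return types described in the API Reference and the trailing EOS line shown in the Usage example, and it skips itself when mecab_ko has not been built yet. pytest collects unittest.TestCase classes, so it runs under the pytest command above.

```python
import importlib.util
import unittest

# Skip cleanly when the extension has not been built with `maturin develop`.
HAS_MECAB_KO = importlib.util.find_spec("mecab_ko") is not None


@unittest.skipUnless(HAS_MECAB_KO, "mecab_ko is not installed; run `maturin develop` first")
class MecabSmokeTest(unittest.TestCase):
    def setUp(self) -> None:
        from mecab_ko import Mecab

        self.mecab = Mecab()

    def test_morphs_returns_list_of_str(self) -> None:
        result = self.mecab.morphs("안녕하세요")
        self.assertIsInstance(result, list)
        self.assertTrue(all(isinstance(m, str) for m in result))

    def test_pos_returns_surface_tag_pairs(self) -> None:
        for surface, tag in self.mecab.pos("나는 학생입니다"):
            self.assertIsInstance(surface, str)
            self.assertIsInstance(tag, str)

    def test_parse_ends_with_eos(self) -> None:
        lines = self.mecab.parse("안녕하세요").rstrip().splitlines()
        self.assertEqual(lines[-1], "EOS")
```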

Linting

# Clippy (requires Python dev headers installed)
cargo clippy

# Format
cargo fmt

Publishing to PyPI

This package uses GitHub Actions for automated publishing to PyPI. To publish a new version:

  1. Update the version in Cargo.toml and pyproject.toml
  2. Create a new git tag: git tag v0.1.0 && git push origin v0.1.0
  3. GitHub Actions will automatically build wheels and publish to PyPI

License

This project is licensed under either of:

at your option.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

References


Download files

Source Distribution

mecab_ko_python-0.4.0.tar.gz (385.2 kB)

Uploaded: Source

Built Distribution

mecab_ko_python-0.4.0-cp313-cp313-macosx_11_0_arm64.whl (358.5 kB)

Uploaded: CPython 3.13, macOS 11.0+ ARM64

File details

Details for the file mecab_ko_python-0.4.0.tar.gz.

File metadata

  • Download URL: mecab_ko_python-0.4.0.tar.gz
  • Upload date:
  • Size: 385.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.6

File hashes

Hashes for mecab_ko_python-0.4.0.tar.gz
Algorithm Hash digest
SHA256 68c8d13ce573da733a5bde6f485c914c8777150b6aeae695bb03da6316f7f58e
MD5 bd488f4225dd8cbbbceec90a258c9bec
BLAKE2b-256 033640b7d95935e208a2f9f38379ec9d714db6014cc4274b79b4fa47b447c8f8

File details

Details for the file mecab_ko_python-0.4.0-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for mecab_ko_python-0.4.0-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8f3e206cacc2fa72099c299f52b612245bcbe34fddd429f0c64b95ce96e20ec4
MD5 71833af2d6b9cf4da0f802b515a0a63e
BLAKE2b-256 35ed7df4c63068584b9d7eb36ece85ecf84a811678f2e46c40a0df74cedab07d
