CKIP NLP Wrappers

CKIP NLP Wrappers (Word Segmentation and Parser)

Introduction

Git

https://github.com/emfomy/pyckip

PyPI

https://pypi.org/project/pyckip

Author

Requirements

CkipWS (Optional)

CkipParser (Optional)

Installation

Let <ckipws-linux-root> denote the root path of the CKIPWS Linux version, and <ckipparser-linux-root> the root path of the CKIP-Parser Linux version.

Step 1: Set up the CKIPWS & CKIP-Parser environment

Add the following line to ~/.bashrc:

export LD_LIBRARY_PATH=<ckipws-linux-root>/lib:<ckipparser-linux-root>/lib:$LD_LIBRARY_PATH
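A quick way to confirm that the setting took effect is to inspect LD_LIBRARY_PATH from Python. The following is a minimal sketch, not part of pyckip: the helper name missing_lib_dirs and the /opt/... install roots are hypothetical and should be replaced with your own paths.

```python
import os


def missing_lib_dirs(required_dirs, environ=None):
    """Return the entries of required_dirs that are absent from LD_LIBRARY_PATH."""
    environ = os.environ if environ is None else environ
    # LD_LIBRARY_PATH is a colon-separated list; empty entries are skipped here.
    current = [p for p in environ.get("LD_LIBRARY_PATH", "").split(":") if p]
    normalized = {os.path.normpath(p) for p in current}
    return [d for d in required_dirs if os.path.normpath(d) not in normalized]


# Hypothetical install roots (assumption); substitute your own.
required = ["/opt/ckipws/lib", "/opt/ckipparser/lib"]
print(missing_lib_dirs(required))
```

An empty list means both library directories are already on the search path of the current process.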

Step 2: Install Using Pip

pip install pyckip \
   --install-option='--ws' \
   --install-option='--ws-dir=<ckipws-linux-root>' \
   --install-option='--parser' \
   --install-option='--parser-dir=<ckipparser-linux-root>'

Omit the ws or parser options if you do not have CKIPWS or CKIP-Parser installed.
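Because each component is optional, the pip command can be assembled from whichever roots are available. The sketch below only builds and prints the argument list; build_pip_command is a hypothetical helper and /opt/ckipws is an assumed path, neither of which comes from pyckip itself.

```python
import shlex


def build_pip_command(ws_dir=None, parser_dir=None):
    """Return the pip argument list, adding options only for the given roots."""
    cmd = ["pip", "install", "pyckip"]
    if ws_dir:
        cmd.append("--install-option=--ws")
        cmd.append(f"--install-option=--ws-dir={ws_dir}")
    if parser_dir:
        cmd.append("--install-option=--parser")
        cmd.append(f"--install-option=--parser-dir={parser_dir}")
    return cmd


# Assumed path for illustration: CKIPWS only, no CKIP-Parser.
print(shlex.join(build_pip_command(ws_dir="/opt/ckipws")))
```

The resulting list can be passed to subprocess.run as-is if you prefer to launch the installation from a script.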

Installation Options

Option                                  Detail                             Default Value
--[no-]ws                               Enable/disable CKIPWS.             False
--[no-]parser                           Enable/disable CKIP-Parser.        False
--ws-dir=<ws-dir>                       CKIPWS root directory.
--ws-lib-dir=<ws-lib-dir>               CKIPWS libraries directory.        <ws-dir>/lib
--ws-share-dir=<ws-share-dir>           CKIPWS share directory.            <ws-dir>
--parser-dir=<parser-dir>               CKIP-Parser root directory.
--parser-lib-dir=<parser-lib-dir>       CKIP-Parser libraries directory.   <parser-dir>/lib
--parser-share-dir=<parser-share-dir>   CKIP-Parser share directory.       <parser-dir>
--data2-dir=<data2-dir>                 “Data2” directory.                 <ws-share-dir>/Data2
--rule-dir=<rule-dir>                   “Rule” directory.                  <parser-share-dir>/Rule
--rdb-dir=<rdb-dir>                     “RDB” directory.                   <parser-share-dir>/RDB

API

CkipWS

class ckipws.CkipWS(
   logger                = False,
   inifile               = None,
   data2dir              = None,
   lexfile               = None,
   new_style_format      = False,
   show_category         = True,
   sentence_max_word_num = 80,
)

The CKIP word segmentation driver.

logger (bool)

enable logger.

inifile (str)

the path to the INI file.

data2dir (str)

the path to the folder “Data2/”. (Use $CKIPWS_DATA2 if unset or null.)

lexfile (str)

the path to the user-defined lexicon file.

new_style_format (bool)

split sentences at newline characters (“\n”) rather than at punctuation.

show_category (bool)

show part-of-speech tags.

sentence_max_word_num (int)

maximum number of words per sentence.
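The data2dir fallback described above can be pictured as follows. This is a sketch of the documented behavior under the assumption that "unset or null" means None or an empty string; it is not the library's actual implementation, and resolve_data2dir is a hypothetical name.

```python
import os


def resolve_data2dir(data2dir=None, environ=None):
    """Pick the explicit data2dir if given, else fall back to $CKIPWS_DATA2."""
    environ = os.environ if environ is None else environ
    # Assumption: both None and "" count as "unset or null".
    if data2dir:
        return data2dir
    # Returns None when the environment variable is also missing or empty.
    return environ.get("CKIPWS_DATA2") or None


print(resolve_data2dir("/data/Data2"))
```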


def ckipws.CkipWS.__call__(text, unicode=False)

Segment a sentence.

text (str)

the input sentence.

unicode (bool)

use Unicode for input/output encoding; otherwise use the system encoding.

return value (str)

the output sentence.
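A call might look like the sketch below. The wrapper function segment is hypothetical; the import is deferred so the snippet loads even where pyckip was installed without the --ws option, and the choice of unicode=True is an assumption about what most Python 3 callers want.

```python
def segment(text, ws=None):
    """Segment one sentence, creating a default CkipWS driver if none is given."""
    if ws is None:
        # Imported lazily: requires pyckip installed with the --ws option.
        from ckipws import CkipWS
        ws = CkipWS(show_category=True)
    # The driver object is callable, as documented for __call__ above.
    return ws(text, unicode=True)
```

When segmenting many sentences, it is probably better to create the driver once and pass it in through the ws argument rather than rebuilding it on every call.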


def ckipws.CkipWS.apply_list(ilist, unicode=False)

Segment a list of sentences.

ilist (list)

the list of input sentences (str).

unicode (bool)

use Unicode for input/output encoding; otherwise use the system encoding.

return value (list)

the list of output sentences (str).
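For batch input, one option is to split raw text into non-empty lines and hand them to apply_list in a single call. The helper below is a hypothetical sketch; it makes no assumption about how many output sentences come back for a given number of inputs.

```python
def segment_lines(ws, raw_text):
    """Segment every non-empty line of raw_text with one apply_list call."""
    # Drop blank lines and surrounding whitespace before segmentation.
    sentences = [line.strip() for line in raw_text.splitlines() if line.strip()]
    return ws.apply_list(sentences, unicode=True)
```

Here ws is expected to be an already constructed CkipWS instance.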


def ckipws.CkipWS.apply_file(ifile, ofile, uwfile)

Segment a file.

ifile (str)

the input file.

ofile (str)

the output file (will be overwritten).

uwfile (str)

the unknown word file (will be overwritten).
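Since apply_file overwrites both output files, deriving their names from the input file keeps runs from clobbering each other. The following is a sketch only: segment_to_files and the .seg.txt / .uw.txt suffixes are hypothetical conventions, not something pyckip prescribes.

```python
import os


def segment_to_files(ws, ifile, out_dir):
    """Run apply_file with output paths derived from the input file name."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(ifile))[0]
    # Hypothetical naming scheme (assumption): one file per kind of output.
    ofile = os.path.join(out_dir, base + ".seg.txt")
    uwfile = os.path.join(out_dir, base + ".uw.txt")
    ws.apply_file(ifile, ofile, uwfile)
    return ofile, uwfile
```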

CkipParser

class ckipparser.CkipParser(
   logger           = False,
   inifile          = None,
   wsinifile        = None,
   data2dir         = None,
   ruledir          = None,
   rdbdir           = None,
   do_ws            = True,
   do_parse         = True,
   do_role          = True,
   lexfile          = None,
   new_style_format = False,
   show_category    = True,
)

The CKIP parser driver.

logger (bool)

enable the logger (logging is not supported by the parser).

inifile (str)

the path to the INI file.

wsinifile (str)

the path to the INI file for CKIPWS.

data2dir (str)

the path to the folder “Data2/”. (Use $CKIPWS_DATA2 if unset or null.)

ruledir (str)

the path to the folder “Rule/”. (Use $CKIPPARSER_RULE if unset or null.)

rdbdir (str)

the path to the folder “RDB/”. (Use $CKIPPARSER_RDB if unset or null.)

do_ws (bool)

do word-segmentation.

do_parse (bool)

do parsing.

do_role (bool)

do role labeling.

lexfile (str)

the path to the user-defined lexicon file.

new_style_format (bool)

split sentences at newline characters (“\n”) rather than at punctuation.

show_category (bool)

show part-of-speech tags.
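Three of the constructor arguments fall back to environment variables, so it can help to check up front which ones would end up with no value at all. The sketch below is hypothetical (unresolved_dirs is not part of pyckip) and assumes all three directories are wanted; whether the parser needs each of them in every configuration is not stated here.

```python
import os

# Constructor argument -> fallback environment variable, as documented above.
FALLBACKS = {
    "data2dir": "CKIPWS_DATA2",
    "ruledir": "CKIPPARSER_RULE",
    "rdbdir": "CKIPPARSER_RDB",
}


def unresolved_dirs(environ=None, **given):
    """List the directory arguments that have neither a value nor a fallback."""
    environ = os.environ if environ is None else environ
    missing = []
    for arg, var in FALLBACKS.items():
        if not given.get(arg) and not environ.get(var):
            missing.append(arg)
    return missing


print(unresolved_dirs(data2dir="/data/Data2"))
```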


def ckipparser.CkipParser.__call__(text, unicode=False)

Parse a sentence.

text (str)

the input sentence.

unicode (bool)

use Unicode for input/output encoding; otherwise use the system encoding.

return value (str)

the output sentence.
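Parsing a sentence follows the same calling pattern as segmentation. In this sketch the wrapper name parse is hypothetical, the import is deferred because it needs pyckip installed with the --parser option, and the three do_* flags simply restate the documented defaults.

```python
def parse(text, parser=None):
    """Parse one sentence, creating a default CkipParser driver if none is given."""
    if parser is None:
        # Imported lazily: requires pyckip installed with the --parser option.
        from ckipparser import CkipParser
        parser = CkipParser(do_ws=True, do_parse=True, do_role=True)
    return parser(text, unicode=True)
```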


def ckipparser.CkipParser.apply_list(ilist, unicode=False)

Parse a list of sentences.

ilist (list)

the list of input sentences (str).

unicode (bool)

use Unicode for input/output encoding; otherwise use the system encoding.

return value (list)

the list of output sentences (str).


def ckipparser.CkipParser.apply_file(ifile, ofile)

Parse a file.

ifile (str)

the input file.

ofile (str)

the output file (will be overwritten).

FAQ

  • CKIPWS throws “what(): locale::facet::_S_create_c_locale name not valid”. What should I do?

Install the locales-all package:

apt-get install locales-all
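To see whether a given locale is installed on the system, you can ask Python to switch to it temporarily. This is a minimal sketch using the standard locale module; locale_available is a hypothetical helper, and zh_TW.UTF-8 is only an example name, since the locale that CKIPWS actually requests is not stated here.

```python
import locale


def locale_available(name):
    """Return True if the C library accepts the given locale name."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Always restore the original locale of the process.
        locale.setlocale(locale.LC_CTYPE, saved)


# Example name only (assumption); use the locale your system reports as missing.
print(locale_available("zh_TW.UTF-8"))
```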

License

