CKIP NLP Wrappers
CKIP NLP Wrappers (Word Segmentation and Parser)
Introduction
Git
PyPI
Requirements
CkipWS (Optional)
CKIP Word Segmentation Linux version (20190524+)
CkipParser (Optional)
CKIP Parser Linux version (20190506+)
Boost C++ Libraries 1.54.0
Installation
In the steps below, <ckipws-linux-root> denotes the root path of the CKIPWS Linux version, and <ckipparser-linux-root> denotes the root path of the CKIP-Parser Linux version.
Step 1: Setup CKIPWS & CKIP-Parser environment
Add the following line to ~/.bashrc:
export LD_LIBRARY_PATH=<ckipws-linux-root>/lib:<ckipparser-linux-root>/lib:$LD_LIBRARY_PATH
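A quick way to confirm that the export above took effect is to compare LD_LIBRARY_PATH against the two library directories. The following is a minimal sketch; the helper name `missing_lib_dirs` and the example paths are hypothetical and not part of pyckip.

```python
import os


def missing_lib_dirs(required_dirs, env=None):
    """Return the entries of required_dirs that LD_LIBRARY_PATH does not list.

    Hypothetical helper for illustration; it is not part of pyckip.
    """
    env = os.environ if env is None else env
    # LD_LIBRARY_PATH is a colon-separated list; skip empty entries.
    entries = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    present = {os.path.normpath(p) for p in entries}
    return [d for d in required_dirs if os.path.normpath(d) not in present]


if __name__ == "__main__":
    # Replace these placeholder paths with the real root directories.
    wanted = ["/opt/ckipws/lib", "/opt/ckipparser/lib"]
    print("Missing from LD_LIBRARY_PATH:", missing_lib_dirs(wanted))
```

An empty result means both directories are listed; it does not verify that the shared libraries themselves exist there.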
Step 2: Install Using Pip
pip install pyckip \
--install-option='--ws' \
--install-option='--ws-dir=<ckipws-linux-root>' \
--install-option='--parser' \
--install-option='--parser-dir=<ckipparser-linux-root>'
Omit the ws or parser options if you do not have CKIPWS or CKIP-Parser, respectively.
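For scripted setups, the same command can be assembled programmatically so that the options for a missing component are simply left out. This is a sketch only; `pip_install_command` is a hypothetical helper, and it assumes the installed pip still accepts `--install-option`.

```python
import sys


def pip_install_command(ws_dir=None, parser_dir=None):
    """Build the pip command from Step 2 as an argument list.

    Hypothetical helper; pass None for a component that is not installed.
    """
    cmd = [sys.executable, "-m", "pip", "install", "pyckip"]
    options = []
    if ws_dir:
        options += ["--ws", "--ws-dir=" + ws_dir]
    if parser_dir:
        options += ["--parser", "--parser-dir=" + parser_dir]
    # Each setup option is forwarded through its own --install-option flag.
    cmd += ["--install-option=" + opt for opt in options]
    return cmd


if __name__ == "__main__":
    # Placeholder path; only CKIPWS is enabled in this example.
    print(" ".join(pip_install_command(ws_dir="/opt/ckipws")))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to perform the installation.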
Installation Options
Option | Detail | Default Value |
---|---|---|
--[no-]ws | Enable/disable CKIPWS. | False |
--[no-]parser | Enable/disable CKIP-Parser. | False |
--ws-dir=<ws-dir> | CKIPWS root directory. | |
--ws-lib-dir=<ws-lib-dir> | CKIPWS libraries directory. | <ws-dir>/lib |
--ws-share-dir=<ws-share-dir> | CKIPWS share directory. | <ws-dir> |
--parser-dir=<parser-dir> | CKIP-Parser root directory. | |
--parser-lib-dir=<parser-lib-dir> | CKIP-Parser libraries directory. | <parser-dir>/lib |
--parser-share-dir=<parser-share-dir> | CKIP-Parser share directory. | <parser-dir> |
--data2-dir=<data2-dir> | “Data2” directory. | <ws-share-dir>/Data2 |
--rule-dir=<rule-dir> | “Rule” directory. | <parser-share-dir>/Rule |
--rdb-dir=<rdb-dir> | “RDB” directory. | <parser-share-dir>/RDB |
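The default values in the table chain off one another: the share directories default to the root directories, and the data directories default to subfolders of the share directories. The sketch below mirrors that chain for illustration. The function `resolve_install_dirs` and its keyword names are hypothetical, and the actual setup script may resolve the options differently.

```python
import posixpath


def resolve_install_dirs(ws_dir=None, parser_dir=None, **overrides):
    """Mirror the default-value chain from the options table.

    Hypothetical helper; explicit overrides win over derived defaults.
    """
    dirs = {}
    if ws_dir is not None:
        dirs["ws_lib_dir"] = overrides.get("ws_lib_dir") or posixpath.join(ws_dir, "lib")
        dirs["ws_share_dir"] = overrides.get("ws_share_dir") or ws_dir
        # Data2 defaults to a subfolder of the (possibly overridden) share dir.
        dirs["data2_dir"] = overrides.get("data2_dir") or posixpath.join(
            dirs["ws_share_dir"], "Data2")
    if parser_dir is not None:
        dirs["parser_lib_dir"] = overrides.get("parser_lib_dir") or posixpath.join(
            parser_dir, "lib")
        dirs["parser_share_dir"] = overrides.get("parser_share_dir") or parser_dir
        dirs["rule_dir"] = overrides.get("rule_dir") or posixpath.join(
            dirs["parser_share_dir"], "Rule")
        dirs["rdb_dir"] = overrides.get("rdb_dir") or posixpath.join(
            dirs["parser_share_dir"], "RDB")
    return dirs


if __name__ == "__main__":
    # Placeholder roots; the share directory override also moves Data2.
    for name, path in resolve_install_dirs("/opt/ckipws", "/opt/ckipparser",
                                           ws_share_dir="/srv/ckipws").items():
        print(name, "=", path)
```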
API
CkipWS
class ckipws.CkipWS(
logger = False,
inifile = None,
data2dir = None,
lexfile = None,
new_style_format = False,
show_category = True,
sentence_max_word_num = 80,
)
The CKIP word segmentation driver.
- logger (bool)
enable logger.
- inifile (str)
the path to the INI file.
- data2dir (str)
the path to the folder “Data2/”. (Use $CKIPWS_DATA2 if unset or null.)
- lexfile (str)
the path to the user-defined lexicon file.
- new_style_format (bool)
split sentences by newline characters ("\n") rather than by punctuation.
- show_category (bool)
show part-of-speech tags.
- sentence_max_word_num (int)
maximum number of words per sentence.
def ckipws.CkipWS.__call__(text, unicode=False)
Segment a sentence.
- text (str)
the input sentence.
- unicode (bool)
use Unicode for input/output encoding; otherwise use the system encoding.
- return value (str)
the output sentence.
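Because the return value is a single string, callers usually need to split it into tokens. The sketch below assumes that tokens are separated by a full-width space (U+3000) and, when show_category is enabled, that each token looks like `word(POS)`. That output format is an assumption and is not specified in this document, so check it against real output first. The name `split_tagged` is hypothetical.

```python
import re

# Assumed token shape: the word followed by a part-of-speech tag in parentheses.
TOKEN_RE = re.compile(r"^(?P<word>.*)\((?P<pos>[^()]*)\)$")


def split_tagged(output, sep="\u3000"):
    """Split a segmented sentence into (word, pos) pairs.

    Hypothetical helper; `sep` and the word(POS) shape are assumptions.
    Tokens without a tag are returned with pos set to None.
    """
    tokens = []
    for chunk in output.split(sep):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = TOKEN_RE.match(chunk)
        if match:
            tokens.append((match.group("word"), match.group("pos")))
        else:
            tokens.append((chunk, None))
    return tokens


if __name__ == "__main__":
    # Hand-written sample in the assumed format, not real driver output.
    print(split_tagged("中文(Na)\u3000斷詞(VC)"))
```

With a configured driver, this would be called as `split_tagged(ws(text))`, where `ws` is a `ckipws.CkipWS` instance.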
def ckipws.CkipWS.apply_list(ilist, unicode=False)
Segment a list of sentences.
- ilist (list)
the list of input sentences (str).
- unicode (bool)
use Unicode for input/output encoding; otherwise use the system encoding.
- return value (list)
the list of output sentences (str).
def ckipws.CkipWS.apply_file(ifile, ofile, uwfile)
Segment a file.
- ifile (str)
the input file.
- ofile (str)
the output file (will be overwritten).
- uwfile (str)
the unknown word file (will be overwritten).
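Since apply_file overwrites both output files, it can be convenient to run it inside a temporary directory. The following sketch wraps the call that way. The helper `segment_via_files`, the one-sentence-per-line layout, and the UTF-8 file encoding are assumptions, not documented behavior.

```python
import os
import tempfile


def segment_via_files(ws, sentences, encoding="utf-8"):
    """Segment sentences through apply_file using throwaway files.

    Hypothetical helper. `ws` is any object with an
    apply_file(ifile, ofile, uwfile) method, such as a CkipWS instance.
    The file encoding is an assumption; adjust it to match your setup.
    """
    with tempfile.TemporaryDirectory() as tmp:
        ifile = os.path.join(tmp, "input.txt")
        ofile = os.path.join(tmp, "output.txt")
        uwfile = os.path.join(tmp, "unknown_words.txt")
        # One input sentence per line.
        with open(ifile, "w", encoding=encoding) as handle:
            handle.write("\n".join(sentences) + "\n")
        ws.apply_file(ifile, ofile, uwfile)
        # Read the result before the temporary directory is removed.
        with open(ofile, encoding=encoding) as handle:
            return handle.read().splitlines()
```

The unknown word file is discarded here; read `uwfile` inside the `with` block if that output is needed.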
CkipParser
class ckipparser.CkipParser(
logger = False,
inifile = None,
wsinifile = None,
data2dir = None,
ruledir = None,
rdbdir = None,
do_ws = True,
do_parse = True,
do_role = True,
lexfile = None,
new_style_format = False,
show_category = True,
)
The CKIP parser driver.
- logger (bool)
enable logger (the logger is not supported in the parser).
- inifile (str)
the path to the INI file.
- wsinifile (str)
the path to the INI file for CKIPWS.
- data2dir (str)
the path to the folder “Data2/”. (Use $CKIPWS_DATA2 if unset or null.)
- ruledir (str)
the path to the folder “Rule/”. (Use $CKIPPARSER_RULE if unset or null.)
- rdbdir (str)
the path to the folder “RDB/”. (Use $CKIPPARSER_RDB if unset or null.)
- do_ws (bool)
do word-segmentation.
- do_parse (bool)
do parsing.
- do_role (bool)
do role labeling.
- lexfile (str)
the path to the user-defined lexicon file.
- new_style_format (bool)
split sentences by newline characters ("\n") rather than by punctuation.
- show_category (bool)
show part-of-speech tags.
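The three directory parameters fall back to environment variables when they are unset or null. The sketch below shows which paths would be in effect for a given set of arguments. The helper `effective_dirs` is hypothetical and only mirrors the fallback rule described above; it additionally treats an empty string as unset, which is an assumption.

```python
import os

# Environment variables named in the parameter descriptions.
ENV_FALLBACKS = {
    "data2dir": "CKIPWS_DATA2",
    "ruledir": "CKIPPARSER_RULE",
    "rdbdir": "CKIPPARSER_RDB",
}


def effective_dirs(data2dir=None, ruledir=None, rdbdir=None, env=None):
    """Return the directory each parameter would resolve to.

    Hypothetical helper; a value of None in the result means neither the
    argument nor the matching environment variable is set.
    """
    env = os.environ if env is None else env
    given = {"data2dir": data2dir, "ruledir": ruledir, "rdbdir": rdbdir}
    return {name: value or env.get(ENV_FALLBACKS[name])
            for name, value in given.items()}


if __name__ == "__main__":
    for name, path in effective_dirs().items():
        print(name, "->", path or "(not set)")
```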
def ckipparser.CkipParser.__call__(text, unicode=False)
Parse a sentence.
- text (str)
the input sentence.
- unicode (bool)
use Unicode for input/output encoding; otherwise use the system encoding.
- return value (str)
the output sentence.
def ckipparser.CkipParser.apply_list(ilist, unicode=False)
Parse a list of sentences.
- ilist (list)
the list of input sentences (str).
- unicode (bool)
use Unicode for input/output encoding; otherwise use the system encoding.
- return value (list)
the list of output sentences (str).
def ckipparser.CkipParser.apply_file(ifile, ofile)
Parse a file.
- ifile (str)
the input file.
- ofile (str)
the output file (will be overwritten).
FAQ
The CKIPWS throws “what(): locale::facet::_S_create_c_locale name not valid”. What should I do?
Install the missing locale data. On Debian or Ubuntu:
apt-get install locales-all
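This error generally means that a requested locale is not installed on the system. As a rough diagnostic, the sketch below checks whether the locale named by the environment (LANG, LC_ALL, and so on) can be activated. Which locale CKIPWS actually requests is not stated here, so testing the environment default is an assumption, and `locale_is_valid` is a hypothetical helper.

```python
import locale


def locale_is_valid(name=""):
    """Return True if the named locale can be activated on this system.

    Hypothetical helper. An empty name means "whatever the environment
    variables select". The previous locale is restored afterwards.
    """
    saved = locale.setlocale(locale.LC_ALL)
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    print("Environment locale usable:", locale_is_valid(""))
```

If this prints False, installing the locale data as shown above, or pointing LANG at an installed locale, is a reasonable next step.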
License