Coding Makes the Life Easier

Coding makes life easier. This is a factory that contains commonly used algorithms and useful links.


Online documentation is available for two versions: latest and develop.


Install oujago using pip:

$> pip install oujago

Install from source code:

$> python setup.py clean --all install

Download data from BaiDuYun:


Natural Language Processing

Hanzi Converter


>>> from oujago.nlp import FJConvert
>>> FJConvert.to_tradition('繁简转换器')
>>> FJConvert.to_simplify('繁簡轉換器')
>>> FJConvert.same('繁简转换器', '繁簡轉換器')
True
>>> FJConvert.same('繁简转换器', '繁簡轉換')
False
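
To make the idea behind the converter concrete, here is a minimal sketch of character-level conversion between simplified and traditional Hanzi. It is not oujago's implementation: the names `S2T`, `T2S`, `to_traditional`, `to_simplified`, and `same` are hypothetical, and the mapping table is a toy that only covers the characters used in the example above. A real converter would need a complete table and would have to handle characters that map one-to-many.

```python
# Toy simplified -> traditional table (an assumption for illustration only).
# It covers just the characters that differ in the example strings above.
S2T = {'简': '簡', '转': '轉', '换': '換'}
# Reverse table, traditional -> simplified.
T2S = {trad: simp for simp, trad in S2T.items()}


def to_traditional(text):
    """Replace each known simplified character; leave the rest unchanged."""
    return ''.join(S2T.get(ch, ch) for ch in text)


def to_simplified(text):
    """Replace each known traditional character; leave the rest unchanged."""
    return ''.join(T2S.get(ch, ch) for ch in text)


def same(a, b):
    """Treat two strings as equal if they match after normalising to simplified."""
    return to_simplified(a) == to_simplified(b)


if __name__ == '__main__':
    print(to_traditional('繁简转换器'))
    print(to_simplified('繁簡轉換器'))
    print(same('繁简转换器', '繁簡轉換器'))
    print(same('繁简转换器', '繁簡轉換'))
```

Normalising both inputs to one script before comparing is what lets `same` ignore the script difference while still detecting the missing final character in the second comparison.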

Chinese Segment

Supports public segmentation tools such as jieba, LTP, thulac, and pynlpir.

>>> from oujago.nlp import seg
>>> sentence = "这是一个伸手不见五指的黑夜。我叫孙悟空,我爱北京,我爱Python和C++。"
>>> seg(sentence, mode='ltp')
['这', '是', '一个', '伸手', '不', '见', '五', '指', '的', '黑夜', '。', '我', '叫', '孙悟空',
',', '我', '爱', '北京', ',', '我', '爱', 'Python', '和', 'C', '+', '+', '。']
>>> seg(sentence, mode='jieba')
['这是', '一个', '伸手不见五指', '的', '黑夜', '。', '我', '叫', '孙悟空', ',', '我', '爱',
'北京', ',', '我', '爱', 'Python', '和', 'C++', '。']
>>> seg(sentence, mode='thulac')
['这', '是', '一个', '伸手不见五指', '的', '黑夜', '。', '我', '叫', '孙悟空', ',',
'我', '爱', '北京', ',', '我', '爱', 'Python', '和', 'C', '+', '+', '。']
>>> seg(sentence, mode='nlpir')
['这', '是', '一个', '伸手', '不见', '五指', '的', '黑夜', '。', '我', '叫', '孙悟空',
',', '我', '爱', '北京', ',', '我', '爱', 'Python', '和', 'C++', '。']
>>> seg("这是一个伸手不见五指的黑夜。")
['这是', '一个', '伸手不见五指', '的', '黑夜', '。']
>>> seg("这是一个伸手不见五指的黑夜。", mode='ltp')
['这', '是', '一个', '伸手', '不', '见', '五', '指', '的', '黑夜', '。']
>>> seg('我不喜欢日本和服', mode='jieba')
['我', '不', '喜欢', '日本', '和服']
>>> seg('我不喜欢日本和服', mode='ltp')
['我', '不', '喜欢', '日本', '和服']
>>> from oujago.nlp.postag import pos
>>> pos('我不喜欢日本和服', mode='jieba')
['r', 'd', 'v', 'ns', 'nz']
>>> pos('我不喜欢日本和服', mode='ltp')
['r', 'd', 'v', 'ns', 'n']
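
The `mode` argument in the examples above selects which backend does the work. One common way to build such an interface is a registry that maps mode names to functions. The sketch below shows that pattern only. It is not oujago's code: `register`, `toy_seg`, and the two toy backends (`char` and `whitespace`) are hypothetical stand-ins, used so the example runs without jieba, LTP, thulac, or pynlpir installed.

```python
from typing import Callable, Dict, List

# Registry of backends keyed by mode name (hypothetical, for illustration).
_SEGMENTERS: Dict[str, Callable[[str], List[str]]] = {}


def register(mode: str):
    """Decorator that stores a segmentation function under a mode name."""
    def decorator(func: Callable[[str], List[str]]):
        _SEGMENTERS[mode] = func
        return func
    return decorator


@register('char')
def _by_char(sentence: str) -> List[str]:
    # Toy backend: every non-space character becomes its own token.
    return [ch for ch in sentence if not ch.isspace()]


@register('whitespace')
def _by_whitespace(sentence: str) -> List[str]:
    # Toy backend: split on runs of whitespace.
    return sentence.split()


def toy_seg(sentence: str, mode: str = 'char') -> List[str]:
    """Dispatch to the backend registered for `mode`."""
    try:
        backend = _SEGMENTERS[mode]
    except KeyError:
        known = ', '.join(sorted(_SEGMENTERS))
        raise ValueError(f'unknown mode {mode!r}; expected one of: {known}') from None
    return backend(sentence)


if __name__ == '__main__':
    print(toy_seg('黑夜。'))
    print(toy_seg('我爱 Python', mode='whitespace'))
```

With this layout, adding a backend means registering one more function, and callers keep the same call signature regardless of which tool runs underneath.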

Change Log

0.1.9 (2017.07.06)
  • NLP moran NER
  • NLP thulac segment
  • NLP thulac postag
0.1.8 (2017.06.26)
  • NLP moran segment
  • NLP moran postag
0.1.7 (2017.06.20)
  • NLP jieba segment
  • NLP LTP segment
  • NLP jieba POSTag
  • NLP LTP Dependency Parse
  • NLP LTP Semantic Role Labeling
0.1.6 (2017.06.19)
  • Hanzi Converter
  • Chinese Stopwords

