Unsupervised Korean Natural Language Processing Toolkits
Project description
soynlp contains unsupervised word extractors, tokenizers, and noun extractors.
These algorithms do not depend on a training corpus; they extract patterns from the data themselves.
The current version includes the following:
- Word extraction
  - Cohesion score
  - Branching Entropy
  - Accessor Variety
- Tokenizers
  - RegexTokenizer
  - LTokenizer
  - MaxScoreTokenizer
- Noun extractor
  - LRNounExtractor
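The three word-extraction scores listed above can be illustrated with a minimal pure-Python sketch. It does not use the soynlp API; the function names (`count_prefixes`, `cohesion_forward`, `next_char_counts`, `branching_entropy`, `accessor_variety`) and the toy word list are hypothetical and chosen only for illustration. It assumes the commonly used definitions: cohesion as the geometric mean of character-to-character transition probabilities from the left, branching entropy as the entropy of the next-character distribution, and accessor variety as the number of distinct next characters. Word boundaries are ignored here for simplicity, which a real implementation would likely handle.

```python
import math
from collections import Counter


def count_prefixes(words, max_len=6):
    """Count every prefix (up to max_len characters) of every word."""
    counts = Counter()
    for word in words:
        for end in range(1, min(len(word), max_len) + 1):
            counts[word[:end]] += 1
    return counts


def cohesion_forward(prefix_counts, word):
    """Assumed definition: (count(word) / count(first char)) ** (1 / (n - 1))."""
    n = len(word)
    if n < 2 or prefix_counts[word] == 0:
        return 0.0
    return (prefix_counts[word] / prefix_counts[word[0]]) ** (1 / (n - 1))


def next_char_counts(words, prefix):
    """Count the characters that directly follow `prefix` in the words."""
    following = Counter()
    for word in words:
        if word.startswith(prefix) and len(word) > len(prefix):
            following[word[len(prefix)]] += 1
    return following


def branching_entropy(following):
    """Shannon entropy (natural log) of the next-character distribution."""
    total = sum(following.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in following.values())


def accessor_variety(following):
    """Number of distinct characters that can follow the prefix."""
    return len(following)


# Toy data (hypothetical): each item stands for one space-separated token.
words = ["파스타", "파스타", "파스타가", "파스타는", "파도", "파일"]
prefix_counts = count_prefixes(words)

for candidate in ["파스", "파스타"]:
    following = next_char_counts(words, candidate)
    print(candidate,
          round(cohesion_forward(prefix_counts, candidate), 3),
          round(branching_entropy(following), 3),
          accessor_variety(following))
```

In this toy data, "파스타" scores higher than "파스" on all three measures: its characters co-occur more tightly, and the characters that follow it are more varied, which is the kind of signal these scores use to locate a word boundary.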
The following packages are also helpful:
- krwordrank: Unsupervised Korean word/keyword extractor
  - https://github.com/lovit/KR-WordRank
  - pip install krwordrank
- soyspacing: Korean spacing error corrector
  - https://github.com/lovit/soyspacing
  - pip install soyspacing
Download files
- Source distribution: soynlp-0.0.1.tar.gz (7.8 kB)
- Built distribution: soynlp-0.0.1-py3-none-any.whl (9.8 kB)