Named Entity Segmentation
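Introduction
This project is a library for splitting strings into token streams. Example: neseg -n 中国北京市联想科技有限公司 -d dict (the input is a sample organization name, "China, Beijing, Lenovo Technology Co., Ltd.").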
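Features
- Parses strings into tokens;
- Supports token streams;
- A parser can be a custom dictionary used for mechanical (dictionary-matching) segmentation, with a separate dictionary for each token;
- A parser can also be a regular expression;
- Segmentation can run forward or in reverse; both start from the beginning;
- Outputs tuples of (token name, parsed string); whatever is left over at the end is grouped as one final group;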
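The dictionary-based token stream described in the features list can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the package's own code: the names segment and longest_match, the "rest" label for the leftover group, and the sample dictionaries are hypothetical. It assumes tokens are tried in list order in both directions, that each token takes the longest entry of its own dictionary found at the start (forward) or end (reverse) of the remaining text, and that a token with no match is skipped.

```python
from typing import Iterable, List, Set, Tuple

# A token is (token name, that token's own dictionary); names here are hypothetical.
Token = Tuple[str, Set[str]]


def longest_match(text: str, words: Set[str], reverse: bool) -> str:
    """Return the longest word from words at the start of text (or its end if reverse), else ''."""
    max_len = max((len(w) for w in words), default=0)
    for size in range(min(len(text), max_len), 0, -1):  # longest candidate first
        piece = text[-size:] if reverse else text[:size]
        if piece in words:
            return piece
    return ""


def segment(text: str, tokens: Iterable[Token], reverse: bool = False,
            rest_name: str = "rest") -> List[Tuple[str, str]]:
    """Try each token's dictionary in list order; leftover text becomes one final group."""
    result: List[Tuple[str, str]] = []
    for name, words in tokens:
        piece = longest_match(text, words, reverse)
        if not piece:
            continue  # assumption: a token with no match is simply skipped
        result.append((name, piece))
        # Consume the matched piece from the front (forward) or the back (reverse).
        text = text[:-len(piece)] if reverse else text[len(piece):]
    if text:
        result.append((rest_name, text))
    return result


if __name__ == "__main__":
    name = "中国北京市联想科技有限公司"
    print(segment(name, [("country", {"中国"}), ("city", {"北京", "北京市"})]))
    # [('country', '中国'), ('city', '北京市'), ('rest', '联想科技有限公司')]
    print(segment(name, [("org_type", {"公司", "有限公司"}), ("industry", {"科技"})], reverse=True))
    # [('org_type', '有限公司'), ('industry', '科技'), ('rest', '中国北京市联想')]
```

Reverse mode peels suffixes such as 有限公司 off the end, which suits organization names whose distinctive part sits in the middle.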
Use cases
- Parsing many kinds of names, e.g. segmenting and labeling Chinese organization names, drug names, and addresses;
TODO
- Model the design on re.scanner;
- Implement it with generators (yield);
- The program returns a list of tuples;
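A generator-based scanner in the spirit of this TODO list could look like the sketch below. It roughly mirrors the behaviour of Python's undocumented re.Scanner class (try each pattern at the current position, first match wins, stop when nothing matches, hand back the unmatched remainder) but relies only on the documented Pattern.match(string, pos) call. The function name scan, the lexicon, its token names and patterns, and the "rest" label are hypothetical examples for address segmentation; wrapping the generator in list() gives the list of tuples the TODO asks for.

```python
from __future__ import annotations

import re
from typing import Iterator, Sequence, Tuple

# Hypothetical lexicon of (token name, compiled regex); order matters, first match wins.
LEXICON: Sequence[Tuple[str, re.Pattern[str]]] = [
    ("city", re.compile(r"[\u4e00-\u9fff]{2}市")),
    ("district", re.compile(r"[\u4e00-\u9fff]{2}区")),
    ("street", re.compile(r"[\u4e00-\u9fff]+?大街")),
    ("number", re.compile(r"\d+号")),
]


def scan(text: str, lexicon: Sequence[Tuple[str, re.Pattern[str]]],
         rest_name: str = "rest") -> Iterator[Tuple[str, str]]:
    """Yield (token name, matched text) pairs, then the unmatched remainder as one group."""
    pos = 0
    while pos < len(text):
        for name, pattern in lexicon:
            m = pattern.match(text, pos)  # match anchored at pos
            if m and m.end() > pos:  # skip empty matches so the scan always advances
                yield name, m.group()
                pos = m.end()
                break
        else:
            break  # no pattern matches at pos: stop scanning, as re.Scanner does
    if pos < len(text):
        yield rest_name, text[pos:]


if __name__ == "__main__":
    print(list(scan("北京市海淀区中关村大街1号院", LEXICON)))
    # [('city', '北京市'), ('district', '海淀区'), ('street', '中关村大街'), ('number', '1号'), ('rest', '院')]
```

Because it is a generator, a caller can stop early or stream tokens without building the whole list first.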
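Appendix - Source files
neseg
/lib
FMM.py forward segmentation
RMM.py reverse segmentation
seg.py
main.py main program: no GUI, options given on the command line
changelog.md changelog
readme.md usage and installation guide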
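FMM and RMM are the usual abbreviations for forward and reverse maximum matching, the classic dictionary-based ("mechanical") Chinese word segmentation methods. Assuming FMM.py and RMM.py follow the textbook versions, they behave roughly like this sketch; the function names, the single-character fallback for unknown text, and the sample dictionary are illustrative assumptions, not taken from the package.

```python
from typing import List, Set


def _max_len(words: Set[str]) -> int:
    # Longest dictionary entry, but at least 1 so the scan always advances.
    return max([1, *map(len, words)])


def fmm(text: str, words: Set[str]) -> List[str]:
    """Forward maximum matching: scan left to right, taking the longest dictionary word."""
    max_len, result, i = _max_len(words), [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in words:  # unknown characters fall back to single chars
                result.append(piece)
                i += size
                break
    return result


def rmm(text: str, words: Set[str]) -> List[str]:
    """Reverse maximum matching: scan right to left, taking the longest dictionary word."""
    max_len, result, j = _max_len(words), [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in words:
                result.append(piece)
                j -= size
                break
    return result[::-1]  # pieces were collected right to left


if __name__ == "__main__":
    words = {"研究", "研究生", "生命", "起源"}
    print(fmm("研究生命起源", words))  # ['研究生', '命', '起源']
    print(rmm("研究生命起源", words))  # ['研究', '生命', '起源']
```

On 研究生命起源 ("study the origin of life") the two directions disagree: the forward pass greedily takes 研究生 ("graduate student") and strands 命, while the reverse pass recovers 研究 / 生命 / 起源, which is why segmenters commonly offer both.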
Source Distribution
neseg-0.7.2.tar.gz (4.9 kB)
File details
Details for the file neseg-0.7.2.tar.gz.
File metadata
- Download URL: neseg-0.7.2.tar.gz
- Upload date:
- Size: 4.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | ae7f4b5bce95e431e96d1a1f114c67a4cfad9e87af180290f3514dffd759f6c6
MD5 | 67de008eb6fc5be2f9c1e4b2a5f64f73
BLAKE2b-256 | c16182aed97ab2820feca405170acdfaa5f0db9938afa1ff267911cf7440e825