A word segmentation tool for Ancient Chinese
Project description
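AnChinSeg: Word Segmentation and Part-of-Speech Tagging for Ancient Chinese
Building on a 2022 paper on word segmentation (see Citation), this tool performs word segmentation and part-of-speech tagging for Ancient Chinese. It is a very rough, simple tool.
Part-of-speech tagging results: P: 92.82 R: 92.85 F: 92.84
Word segmentation results: P: 97.19 R: 97.22 F: 97.20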
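P, R, and F here are precision, recall, and F-score. The evaluation script is not part of this description, so the following is only a minimal sketch of how word-level segmentation scores are commonly computed, assuming words are compared as character spans. `to_spans` and `prf` are hypothetical helper names, not functions provided by AnChinSeg.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def prf(gold_words, pred_words):
    """Return precision, recall, and F-score (as percentages) for one sentence."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)  # a word counts only if both boundaries match
    p = 100.0 * correct / len(pred) if pred else 0.0
    r = 100.0 * correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Illustrative data only: the prediction splits one gold word in two.
gold = ["端明殿", "学士", "兼", "翰林侍读", "学士"]
pred = ["端明殿", "学士", "兼", "翰林", "侍读", "学士"]
p, r, f = prf(gold, pred)
print(f"P: {p:.2f} R: {r:.2f} F: {f:.2f}")
```

A corpus-level score would sum the correct, predicted, and gold counts over all sentences before dividing, rather than averaging per-sentence values.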
Citation
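The part-of-speech tagging work has not been published as a paper. If you use this tool in academic research, you can cite the following paper, on which our implementation is based.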
@inproceedings{tang-su-2022-slepen,
title = "That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory",
author = "Tang, Xuemei and
Su, Qi",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.540",
doi = "10.18653/v1/2022.acl-long.540",
pages = "7830--7840",
}
Requirements
For environment setup, see requriments.txt.
How to use it
1) Download the model file model.dt from Baidu Netdisk or Google Drive and place it in the model folder. Baidu link: https://pan.baidu.com/s/1jIbqk5b4GYBEMAdBPVJwYg extraction code: dac4 Google Drive: https://drive.google.com/drive/folders/1zFK30h6PQYRDDZ2uEScLy0l5VoC7jXHU?usp=sharing
2) In the project folder, run: python segmenter.py --predict_data ./data/sample_data.txt --output_path ./data/output.txt (Replace ./data/sample_data.txt with the path to the file you want to segment, with one sentence per line, and replace ./data/output.txt with the path where the segmentation result should be stored.)
3) The segmented output has the following format: 端明殿_NA 学士_NA 兼_VT 翰林侍读_NA 学士_NA 朝散大夫_NA 右谏议大夫_NA 充_VT 集贤院_NA
4) The part-of-speech tags follow the tag sets of Academia Sinica, Taiwan: https://lingcorpus.iis.sinica.edu.tw/kiwi/dkiwi/middle_chinese_c_wordtype.html https://lingcorpus.iis.sinica.edu.tw/kiwi/akiwi/ancient_mandarin_chinese_c_wordtype.html
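The workflow above can be sketched in Python as follows. This is an illustration only: `write_sentences`, `build_command`, and `parse_tagged_line` are hypothetical helpers, not part of AnChinSeg. The only things taken from the instructions are the segmenter.py script, its --predict_data and --output_path options, and the word_TAG output format. The sketch assumes that output tokens are separated by whitespace and that the tag follows the last underscore.

```python
from pathlib import Path


def write_sentences(sentences, path):
    """Write one sentence per line, the input layout the segmenter expects."""
    text = "\n".join(s.strip() for s in sentences) + "\n"
    Path(path).write_text(text, encoding="utf-8")


def build_command(input_path, output_path):
    """Build the argument list for the segmenter command shown above."""
    return ["python", "segmenter.py",
            "--predict_data", str(input_path),
            "--output_path", str(output_path)]


def parse_tagged_line(line):
    """Split a line such as '端明殿_NA 学士_NA' into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if not sep:  # no underscore: keep the token, leave the tag empty
            word, tag = token, ""
        pairs.append((word, tag))
    return pairs


print(build_command("./data/sample_data.txt", "./data/output.txt"))
print(parse_tagged_line("端明殿_NA 学士_NA 兼_VT"))
```

Once model.dt is in place, the list returned by `build_command` could be passed to `subprocess.run(..., check=True)` from the project folder, and each line of the output file could then be read back with `parse_tagged_line`.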
Contact
Please contact us at tangxuemei@polyu.edu.hk if you have any questions. Welcome to the Research Center for Digital Humanities of Peking University! https://pkudh.org
File details
Details for the file anchinsegmenter-0.6-py3-none-any.whl.
File metadata
- Download URL: anchinsegmenter-0.6-py3-none-any.whl
- Upload date:
- Size: 2.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01dc6ed62a5539dd75e1b2b9f9127abfb6d937b06f3d92e71040712aedb524bd |
| MD5 | 8831e4d1e14d5f5e0c501e547d60b690 |
| BLAKE2b-256 | d55a8e447e37aaa56fbbf39710ccfe5cdfe9b665ff873513f4b3b9377934f605 |