A text processing tool
Dependencies
- python=3.8.5
- spacy==2.3.1
- en-core-web-sm==2.3.0
- marshmallow==3.15.0
- transformers==4.17.0
- setuptools-scm==6.4.2
- seqeval==1.2.2
Serialization and deserialization
- https://marshmallow.readthedocs.io/en/stable/
- https://www.7forz.com/3694/
Data structure
{
"id": 1,
"document": "xxxx",
"": ""
}
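Information extraction
Tasks: entity extraction, relation extraction, event extraction, and attribute extraction. The examples use brat annotations, where the prefix at the start of each line in an annotation file identifies what kind of annotation it is.
Entity: T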
[entities]
Protein
Entity
T8 Negative_regulation 659 668 deficient
T9 Gene_expression 684 694 expression
{
"entities":[{"mention": "expression",
"type": "Gene_expression",
"start": 447,
"end": 457,
"id": "T1"}]
}
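Converting the `T` lines into the entity dicts above comes down to splitting each line into id, type, offsets, and mention text. Below is a minimal sketch, assuming contiguous spans only (brat's discontinuous spans such as `0 5;16 23` are not handled); `parse_entities` is a hypothetical helper, not this package's parser.

```python
from typing import Dict, List


def parse_entities(ann_text: str) -> List[Dict]:
    """Hypothetical helper: turn brat 'T' lines into entity dicts.

    Assumes contiguous spans only ("T1<TAB>Type START END<TAB>mention").
    """
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations (R), events (E), attributes (A), notes (#)
        # maxsplit=4 keeps a multi-word mention such as "NF-kappa B" intact.
        ent_id, ent_type, start, end, mention = line.split(maxsplit=4)
        entities.append({
            "mention": mention,
            "type": ent_type,
            "start": int(start),
            "end": int(end),
            "id": ent_id,
        })
    return entities


ann = "T8\tNegative_regulation 659 668\tdeficient\nT9\tGene_expression 684 694\texpression"
entities = parse_entities(ann)
```

Splitting on any whitespace accepts both the tab-separated layout brat writes and the space-separated lines shown above.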
Relation: R
[relations]
Protein-Component Arg1:Protein, Arg2:Entity
Subunit-Complex Arg1:Protein, Arg2:Entity
R1 Protein-Component Arg1:T11 Arg2:T19
R2 Protein-Component Arg1:T11 Arg2:T18
## Not yet supported
Equiv Arg1:Protein, Arg2:Protein, <REL-TYPE>:symmetric-transitive
* Equiv T3 T4
{"relations": [{"type": "Part-of",
"arg1": {"mention": "c-Rel","type": "Protein","start": 139,"end": 144,"id": "T1"},
"arg2": {"mention": "NF-kappa B","type": "Complex", "start": 163, "end": 173, "id": "T2"},
"id": "R1"}]}
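A relation line only stores entity ids (`Arg1:T1`), so producing the nested `arg1`/`arg2` objects above means looking those ids up among the parsed entities. Below is a minimal sketch, assuming entities have already been indexed by id; `parse_relations` is a hypothetical helper. Equiv lines start with `*` and are skipped, consistent with them being unsupported.

```python
from typing import Dict, List


def parse_relations(ann_text: str, entities: Dict[str, Dict]) -> List[Dict]:
    """Hypothetical helper: turn brat 'R' lines into relation dicts.

    `entities` maps entity ids (e.g. "T11") to entity dicts shaped like
    {"mention", "type", "start", "end", "id"}.
    """
    relations = []
    for line in ann_text.splitlines():
        if not line.startswith("R"):
            continue  # '*' Equiv lines and other annotation kinds are ignored
        rel_id, rel_type, arg1, arg2 = line.split()
        # "Arg1:T11" -> "T11", then embed the full entity dict.
        relations.append({
            "type": rel_type,
            "arg1": entities[arg1.split(":", 1)[1]],
            "arg2": entities[arg2.split(":", 1)[1]],
            "id": rel_id,
        })
    return relations


entities = {
    "T1": {"mention": "c-Rel", "type": "Protein", "start": 139, "end": 144, "id": "T1"},
    "T2": {"mention": "NF-kappa B", "type": "Complex", "start": 163, "end": 173, "id": "T2"},
}
relations = parse_relations("R1\tPart-of Arg1:T1 Arg2:T2\n*\tEquiv T3 T4", entities)
```

A missing entity id raises `KeyError` here; a real parser might prefer to report the offending line instead.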
Event: E (not yet supported)
[events]
Gene_expression Theme:Protein
Binding Theme+:Protein
E3 Binding:T9 Theme:T4 Theme2:T5 Theme3:T6
E4 Binding:T20 Theme:T16 Theme2:T17 Theme3:T19
## Not yet supported
E6 Negative_regulation:T10 Cause:E3 Theme:E5
Attribute: A (not yet supported)
[attributes]
Negation Arg:<EVENT>
Confidence Arg:<EVENT>, Value:Possible|Likely|Certain
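The bracketed blocks above (`[entities]`, `[relations]`, `[events]`, `[attributes]`) come from brat's annotation configuration. Below is a minimal sketch for grouping such a file by section, assuming a flat reading: brat's tab-indentation type hierarchy is discarded, and lines starting with `#` are treated as comments. `parse_annotation_conf` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def parse_annotation_conf(conf_text: str) -> Dict[str, List[str]]:
    """Hypothetical helper: group annotation config lines by [section]."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in conf_text.splitlines():
        line = raw.strip()  # note: this also discards indentation-based hierarchy
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no definitions
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


conf = (
    "[entities]\nProtein\nEntity\n\n"
    "[relations]\nProtein-Component Arg1:Protein, Arg2:Entity\n"
    "## Not yet supported\n"
)
sections = parse_annotation_conf(conf)
```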
Files in different formats are parsed into a unified serialized format.
Export formats: yaml, json, pickle, txt, ann
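Export is essentially the reverse of parsing. Below is a minimal sketch covering two of the listed targets, `ann` and `json`, assuming entity dicts shaped like the JSON example above; `entities_to_ann` and `entities_to_json` are hypothetical helpers, not this package's exporters.

```python
import json
from typing import Dict, List


def entities_to_ann(entities: List[Dict]) -> str:
    """Hypothetical helper: write entity dicts back out as brat 'T' lines."""
    # brat layout: id<TAB>type start end<TAB>mention, one annotation per line.
    return "".join(
        f"{e['id']}\t{e['type']} {e['start']} {e['end']}\t{e['mention']}\n"
        for e in entities
    )


def entities_to_json(entities: List[Dict]) -> str:
    """Hypothetical helper: wrap entities in the {"entities": [...]} layout."""
    # ensure_ascii=False keeps non-ASCII mentions (e.g. Chinese text) readable.
    return json.dumps({"entities": entities}, ensure_ascii=False, indent=2)


entities = [{"mention": "expression", "type": "Gene_expression", "start": 447, "end": 457, "id": "T1"}]
ann_text = entities_to_ann(entities)
json_text = entities_to_json(entities)
```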
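Features
- Parsing
- Data format conversion
- Export
- Visualization
- Statistics (word frequency, vocabulary size, total word count, label types, label count, label frequency)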
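Most of the listed statistics reduce to counting with `collections.Counter`. Below is a minimal sketch, assuming the text is already tokenized and labels are given as a flat list. "Label count" is read here as the total number of label occurrences, which is an interpretation. `corpus_stats` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def corpus_stats(tokens: List[str], labels: List[str]) -> Dict:
    """Hypothetical helper computing the statistics listed above."""
    word_freq = Counter(tokens)
    label_freq = Counter(labels)
    return {
        "word_freq": word_freq,             # occurrences per token
        "vocab_size": len(word_freq),       # number of distinct tokens
        "total_words": len(tokens),         # total number of tokens
        "label_types": sorted(label_freq),  # distinct label names
        "label_count": len(labels),         # total label occurrences (assumed meaning)
        "label_freq": label_freq,           # occurrences per label
    }


tokens = "the protein binds the complex".split()  # whitespace tokenization, for illustration only
stats = corpus_stats(tokens, ["Protein", "Complex", "Protein"])
```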
Download files
Source Distribution
liyi-cute-0.0.1.tar.gz (39.1 kB)
Built Distribution
liyi_cute-0.0.1-py3-none-any.whl (37.9 kB)
Hashes for liyi_cute-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6923282660fc64e2dbf0c2bba8311f88d4f8cea57b20ff3ba04229907baa6b94
MD5 | 3953fb0bb1e6901949a496448dc737ff
BLAKE2b-256 | c46d3cb1b2d2966c6bbd3843900321a0d933b62e7648c3f7335410baad39e84a