A lightweight natural language understanding library
lightNLU
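A small, simple natural language understanding framework based on template matching.
Introduction
A small, simple natural language understanding framework implemented in Python and based on template matching. Here, natural language understanding covers only intent recognition and slot extraction.
Installation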
pip install lightnlu
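Features
- Extremely lightweight
- Template files use the yml format
- Supports importing data from multiple sources
- Template syntax is concise and easy to understand
Usage example
Step 1: Define the vocabulary rules
For example, write a words.yml file as follows: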
-
  name: person
  aliases:
    - 人物
  type: json
  config:
    path: data/person.json
-
  name: place
  aliases:
    - 地点
    - 位置
    - 城市
    - 区域
  type: csv
  config:
    path: data/place.csv
-
  name: relation
  aliases: []
  type: yml
  config:
    path: data/relation.yml
-
  name: predicate
  aliases: []
  type: yml
  config:
    path: data/predicate.yml
The contents of the referenced yml, json, and csv files are as follows.
Contents of person.json:
{"name": "曹操", "id": "1"}
{"name": "刘备", "id": "2"}
{"name": "诸葛亮", "id": "3"}
{"name": "曹丕", "id": "4"}
{"name": "曹植", "id": "5"}
Contents of place.csv:
name,id
洛阳,1
长安,2
新野,3
赤壁,4
宛城,5
Contents of relation.yml:
son:
- 儿子
father:
- 父亲
- 爸爸
Contents of predicate.yml:
is:
- 是
- 为
isnot:
- 不是
- 不为
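To illustrate how the three source types could feed a single vocabulary, here is a minimal sketch. It is not lightNLU's own loader: the function names (`load_json_lines`, `load_csv`, `load_mapping`, `build_vocabulary`) are hypothetical, the sample data is inlined and shortened, and the yml content is written as the dict a YAML parser would return so that the sketch needs only the standard library. The `{'type': ..., 'id': ...}` entry shape and the `'@<type>'` alias convention are assumptions modelled on the `ner.entities` output shown later in this document.

```python
import csv
import io
import json
from collections import defaultdict

# Shortened inline copies of the sample data files shown above.
PERSON_JSON = '{"name": "曹操", "id": "1"}\n{"name": "刘备", "id": "2"}\n'
PLACE_CSV = "name,id\n洛阳,1\n长安,2\n"
# relation.yml, written here as the dict a YAML parser would return.
RELATION = {"son": ["儿子"], "father": ["父亲", "爸爸"]}


def load_json_lines(text):
    """Yield (name, id) pairs from text holding one JSON object per line."""
    for line in text.splitlines():
        if line.strip():
            record = json.loads(line)
            yield record["name"], record["id"]


def load_csv(text):
    """Yield (name, id) pairs from CSV text with a name,id header."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row["name"], row["id"]


def load_mapping(mapping):
    """Yield (word, id) pairs from an {id: [words]} mapping."""
    for entity_id, words in mapping.items():
        for word in words:
            yield word, entity_id


def build_vocabulary(sources):
    """Merge (type, aliases, pairs) sources into {word: [entry, ...]}."""
    vocab = defaultdict(list)
    for type_name, aliases, pairs in sources:
        for word, entity_id in pairs:
            vocab[word].append({"type": type_name, "id": entity_id})
        for alias in aliases:
            # Assumption: aliases are stored under '@<type>' with no id.
            vocab[alias].append({"type": "@" + type_name, "id": None})
    return vocab


vocab = build_vocabulary([
    ("person", ["人物"], load_json_lines(PERSON_JSON)),
    ("place", ["地点", "位置"], load_csv(PLACE_CSV)),
    ("relation", [], load_mapping(RELATION)),
])
print(vocab["曹操"])
print(vocab["人物"])
print(vocab["爸爸"])
```

Keeping each source behind a generator that yields plain (word, id) pairs means the merge step does not need to know which file format a word came from.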
Step 2: Define the template rules
For example, write a pattern.yml file as follows:
-
  name: father_son_relation
  patterns:
    -
      - [person, ~, son]  # each rule is [type, id value, slot name]
      - [relation, father, ~]
      - [predicate, is, null]
      - [person, ~, father]
    -
      - [person, ~, father]
      - [predicate, is, null]
      - [person, ~, son]
      - [relation, father, ~]
-
  name: test
  patterns:
    -
      - [person, ~, person]
      - [predicate, is, null]
      - ['@person', ~, ttt]
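In the template rules above, each rule must specify its name (name) and its templates (patterns).
Because several similar but distinct templates can correspond to the same intent and slots, patterns is a list.
The pattern.yml file above contains '@person', which maps to all aliases of the person category; here, that means every word in the list ["人物"].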
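The following sketch shows one way such templates could be evaluated. It is an illustration under stated assumptions, not lightNLU's matching algorithm: `match_pattern` and `match_any` are hypothetical names, the entity sequence is assumed to have to match a template element by element with the same length, and a `None` id is assumed to accept any id. In YAML, both `~` and `null` parse to a null value, which is why they appear as `None` below.

```python
def match_pattern(entities, pattern):
    """Return the filled slots if entities match pattern element by element.

    entities: list of (word, info) pairs in text order, where info is
              {'type': ..., 'id': ...}.
    pattern:  list of [type, id, slot] triples. None in the id position
              accepts any id (assumption); None in the slot position means
              the element is matched but not captured.
    """
    if len(entities) != len(pattern):
        return None
    slots = {}
    for (word, info), (type_name, entity_id, slot) in zip(entities, pattern):
        if info["type"] != type_name:
            return None
        if entity_id is not None and info["id"] != entity_id:
            return None
        if slot is not None:
            slots[slot] = {"word": word, **info}
    return slots


def match_any(entities, patterns):
    """Try each alternative template in order; return the first match."""
    for pattern in patterns:
        slots = match_pattern(entities, pattern)
        if slots is not None:
            return slots
    return None


# The two alternatives of father_son_relation from pattern.yml above.
FATHER_SON = [
    [["person", None, "son"], ["relation", "father", None],
     ["predicate", "is", None], ["person", None, "father"]],
    [["person", None, "father"], ["predicate", "is", None],
     ["person", None, "son"], ["relation", "father", None]],
]

# Entities for "曹丕的父亲是曹操", in text order.
entities = [
    ("曹丕", {"type": "person", "id": "4"}),
    ("父亲", {"type": "relation", "id": "father"}),
    ("是", {"type": "predicate", "id": "is"}),
    ("曹操", {"type": "person", "id": "1"}),
]
slots = match_any(entities, FATHER_SON)
print(slots)
```

Under these assumptions the first alternative matches, so 曹丕 fills the son slot and 曹操 fills the father slot. Reordering the sentence to "曹操是曹丕的父亲" would be caught by the second alternative instead.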
Step 3: Write the source code and handler functions
An example follows:
# -*- coding: utf-8 -*-
import os
import sys

# Add the parent of this script's directory (the project root) to the import path.
project_path = os.path.abspath(os.path.join(__file__, "../.."))
sys.path.insert(0, project_path)

from lightnlu.core import NER, Rule

if __name__ == '__main__':
    # Build the entity recognizer from the vocabulary rules.
    path = os.path.join(project_path, 'data/words.yml')
    ner = NER()
    ner.build_from_yml(path, base_dir=project_path)
    print(ner.entities)

    # Build the rule matcher from the template rules.
    path = os.path.join(project_path, 'data/pattern.yml')
    rule = Rule()
    rule.build_from_yml(path)

    # Bind handler functions to rules within a domain.
    @rule.bind(rule_name="father_son_relation", domain="relation")
    def test(slots: dict, text: str):
        return {
            "slots": slots,
            "text": text
        }

    @rule.bind(rule_name="test", domain="hello_world")
    def ppp(slots: dict, text: str):
        return "slots: {}, text: {}".format(slots, text)

    print(rule.actors)

    text = "刘备和诸葛亮在新野旅游,途中遇上了曹操"
    domain = "relation"
    slots = ner.extract(text)
    print(slots)
    print(rule.match(slots, domain=domain))

    text = "曹丕的父亲是曹操"
    domain = "relation"
    slots = ner.extract(text)
    print(rule.match(slots, domain=domain))
    print(rule.match_and_act(slots, domain=domain, text=text))

    text = "曹操是曹丕的父亲"
    domain = "relation"
    slots = ner.extract(text)
    print(rule.match(slots, domain=domain))
    print(rule.match_and_act(slots, domain=domain, text=text))

    text = "曹操是个人物"
    domain = "hello_world"
    slots = ner.extract(text)
    print(rule.match(slots, domain=domain))
    print(rule.match_and_act(slots, domain=domain, text=text))
The output is as follows:
defaultdict(<function default_type at 0x7fbba1df5670>, {'曹操': [{'type': 'person', 'id': '1'}], '刘备': [{'type': 'person', 'id': '2'}], '诸葛亮': [{'type': 'person', 'id': '3'}], '曹丕': [{'type': 'person', 'id': '4'}], '曹植': [{'type': 'person', 'id': '5'}], '人物': [{'type': '@person', 'id': None}], '洛阳': [{'type': 'place', 'id': '1'}], '长安': [{'type': 'place', 'id': '2'}], '新野': [{'type': 'place', 'id': '3'}], '赤壁': [{'type': 'place', 'id': '4'}], '宛城': [{'type': 'place', 'id': '5'}], '地点': [{'type': '@place', 'id': None}], '位置': [{'type': '@place', 'id': None}], '城市': [{'type': '@place', 'id': None}], '区域': [{'type': '@place', 'id': None}], '电站': [{'type': 'ban_words', 'id': ''}], '正在站': [{'type': 'ban_words', 'id': ''}], '引流线': [{'type': 'ban_words', 'id': ''}], '子导线': [{'type': 'ban_words', 'id': ''}], '甲母线': [{'type': 'ban_words', 'id': ''}], '规则': [{'type': '@ban_words', 'id': None}], '所属厂站': [{'type': 'attr', 'id': 'attr_ST_ID'}], '所属电厂': [{'type': 'attr', 'id': 'attr_ST_ID'}], '属于哪个厂站': [{'type': 'attr', 'id': 'attr_ST_ID'}], '属于哪个电厂': [{'type': 'attr', 'id': 'attr_ST_ID'}], '电压等级': [{'type': 'attr', 'id': 'attr_VOLTAGE_TYPE'}], '儿子': [{'type': 'relation', 'id': 'son'}], '父亲': [{'type': 'relation', 'id': 'father'}], '爸爸': [{'type': 'relation', 'id': 'father'}], '是': [{'type': 'predicate', 'id': 'is'}], '为': [{'type': 'predicate', 'id': 'is'}], '不是': [{'type': 'predicate', 'id': 'isnot'}], '不为': [{'type': 'predicate', 'id': 'isnot'}]})
defaultdict(<function _helper_func at 0x7fbba247d280>, {'relation': defaultdict(<class 'dict'>, {'father_son_relation': {'test': <function test at 0x7fbba1db5d30>}}), 'hello_world': defaultdict(<class 'dict'>, {'test': {'ppp': <function ppp at 0x7fbba1d7f040>}})})
[('刘备', {'type': 'person', 'id': '2'}, 0, 2), ('诸葛亮', {'type': 'person', 'id': '3'}, 3, 6), ('新野', {'type': 'place', 'id': '3'}, 7, 9), ('曹操', {'type': 'person', 'id': '1'}, 17, 19)]
[]
[{'name': 'father_son_relation', 'slots': {'son': {'word': '曹丕', 'type': 'person', 'id': '4', 'left': 0, 'right': 2}, 'father': {'word': '曹操', 'type': 'person', 'id': '1', 'left': 6, 'right': 8}}}]
{'father_son_relation': {'test': {'slots': {'son': {'word': '曹丕', 'type': 'person', 'id': '4', 'left': 0, 'right': 2}, 'father': {'word': '曹操', 'type': 'person', 'id': '1', 'left': 6, 'right': 8}}, 'text': '曹丕的父亲是曹操'}}}
[{'name': 'father_son_relation', 'slots': {'father': {'word': '曹操', 'type': 'person', 'id': '1', 'left': 0, 'right': 2}, 'son': {'word': '曹丕', 'type': 'person', 'id': '4', 'left': 3, 'right': 5}}}]
{'father_son_relation': {'test': {'slots': {'father': {'word': '曹操', 'type': 'person', 'id': '1', 'left': 0, 'right': 2}, 'son': {'word': '曹丕', 'type': 'person', 'id': '4', 'left': 3, 'right': 5}}, 'text': '曹操是曹丕的父亲'}}}
[{'name': 'test', 'slots': {'person': {'word': '曹操', 'type': 'person', 'id': '1', 'left': 0, 'right': 2}, 'ttt': {'word': '人物', 'type': '@person', 'id': None, 'left': 4, 'right': 6}}}]
{'test': {'ppp': "slots: {'person': {'word': '曹操', 'type': 'person', 'id': '1', 'left': 0, 'right': 2}, 'ttt': {'word': '人物', 'type': '@person', 'id': None, 'left': 4, 'right': 6}}, text: 曹操是个人物"}}
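The extraction results above are tuples of the form (word, info, left, right), where `text[left:right]` equals the matched word. As a rough illustration of how a dictionary-based extractor can produce that shape, here is a sketch that uses forward longest matching. The document does not state which algorithm `ner.extract` uses, so the technique, the standalone `extract` function, and the simplified one-entry-per-word vocabulary are all assumptions.

```python
def extract(text, vocab):
    """Scan text left to right, taking the longest vocabulary word each time.

    Returns (word, info, left, right) tuples with text[left:right] == word.
    """
    max_len = max((len(word) for word in vocab), default=0)
    results = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first so that a longer word wins over
        # a shorter word starting at the same position.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            word = text[pos:pos + length]
            if word in vocab:
                results.append((word, vocab[word], pos, pos + length))
                pos += length
                break
        else:
            pos += 1  # no vocabulary word starts here; move on one character
    return results


# Simplified vocabulary: one entry per word (hypothetical shape).
vocab = {
    "刘备": {"type": "person", "id": "2"},
    "诸葛亮": {"type": "person", "id": "3"},
    "曹操": {"type": "person", "id": "1"},
    "新野": {"type": "place", "id": "3"},
}
found = extract("刘备和诸葛亮在新野旅游,途中遇上了曹操", vocab)
for item in found:
    print(item)
```

With this vocabulary the sketch finds 刘备 at (0, 2), 诸葛亮 at (3, 6), 新野 at (7, 9), and 曹操 at (17, 19), the same spans as in the output above.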
Notes
- csv and json files must contain both name and id, as columns or attributes respectively.
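A data file can be checked against this requirement before it is handed to the library. The sketch below is a hypothetical helper, not part of lightNLU: `missing_csv_fields` and `missing_json_fields` are illustrative names, and the json check assumes one object per line, as in the person.json sample above.

```python
import csv
import io
import json

REQUIRED_FIELDS = ("name", "id")


def missing_csv_fields(text):
    """Return the required columns that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(text)), [])
    return [field for field in REQUIRED_FIELDS if field not in header]


def missing_json_fields(text):
    """Return (line number, missing fields) for each incomplete JSON line."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems.append((number, missing))
    return problems


print(missing_csv_fields("name,id\n洛阳,1\n"))
print(missing_csv_fields("city,id\n洛阳,1\n"))
print(missing_json_fields('{"name": "曹操", "id": "1"}\n{"name": "刘备"}\n'))
```

An empty list from either function means the file satisfies the requirement.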
Changelog
- v0.1.1 Initial release
- v0.2.0 Added the concept of a domain
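In the usage example, the domain appears as the `domain` argument of `rule.bind`, `rule.match`, and `rule.match_and_act`, and the printed `rule.actors` is nested as domain, then rule name, then function name. The sketch below imitates that nesting with a small decorator-based registry. `HandlerRegistry` and its `act` method are hypothetical and only approximate the behavior visible in the example output; they are not the library's implementation.

```python
from collections import defaultdict


class HandlerRegistry:
    """Handlers stored as {domain: {rule_name: {function_name: function}}}."""

    def __init__(self):
        self.actors = defaultdict(lambda: defaultdict(dict))

    def bind(self, rule_name, domain):
        """Return a decorator that registers a handler for rule_name in domain."""
        def decorator(func):
            self.actors[domain][rule_name][func.__name__] = func
            return func
        return decorator

    def act(self, matches, domain, text):
        """Call every handler in domain that is bound to a matched rule."""
        results = {}
        for match in matches:
            handlers = self.actors[domain].get(match["name"], {})
            if handlers:
                results[match["name"]] = {
                    name: func(match["slots"], text)
                    for name, func in handlers.items()
                }
        return results


registry = HandlerRegistry()


@registry.bind(rule_name="test", domain="hello_world")
def ppp(slots, text):
    return "slots: {}, text: {}".format(slots, text)


matches = [{"name": "test", "slots": {"person": "曹操"}}]
in_domain = registry.act(matches, domain="hello_world", text="曹操是个人物")
out_of_domain = registry.act(matches, domain="relation", text="曹操是个人物")
print(in_domain)
print(out_of_domain)
```

Because handlers are looked up under the requested domain first, the same match triggers `ppp` in `hello_world` and nothing in `relation`.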
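References
- keyue123/poemElasticDemo: KBQA based on Elasticsearch
- liuhuanyong/QAonMilitaryKG: a QA system based on a military knowledge graph stored in MongoDB, unlike the author's previous project. The knowledge base covers 8 major categories (aircraft, space equipment, and others), more than 100 subcategories, and 5,800 military weapon entries. The project does not use a graph database; it parses questions with jieba, recognizes the entities in each question, and answers several question types through query templates. It mainly serves as a demo of an industry-style QA approach.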