Skip to main content

A lightweight natural language understanding library

Project description

lightNLU

一个小巧简单的基于模板匹配的自然语言理解框架。

简介

一个基于Python实现的小巧简单的基于模板匹配的自然语言理解框架。 这里的自然语言理解仅指意图识别和词槽提取。

安装

pip install lightnlu

特性

  • 特别轻量
  • 模板文件使用yml格式
  • 支持多源数据导入
  • 模板语法简明易懂

使用示例

step1:定制词表规则

如编写words.yml文件如下:

-
  name: person
  aliases:
    - 人物
  type: json
  config:
    path: data/person.json
-
  name: place
  aliases:
    - 地点
    - 位置
    - 城市
    - 区域
  type: csv
  config:
    path: data/place.csv
-
  name: relation
  aliases: []
  type: yml
  config:
    path: data/relation.yml
-
  name: predicate
  aliases: []
  type: yml
  config:
    path: data/predicate.yml

其中对应的各yml、json、csv文件内容如下:

person.json中内容如下:

{"name": "曹操", "id": "1"}
{"name": "刘备", "id": "2"}
{"name": "诸葛亮", "id": "3"}
{"name": "曹丕", "id": "4"}
{"name": "曹植", "id": "5"}

place.csv中内容如下:

name,id
洛阳,1
长安,2
新野,3
赤壁,4
宛城,5

relation.yml中内容如下:

son:
  - 儿子
father:
  - 父亲
  - 爸爸

predicatel.yml中内容如下:

is:
  - 
  - 
isnot:
  - 不是
  - 不为

step2:定制模板规则

如编写pattern.yml文件如下:

-
  name: father_son_relation
  patterns:
    -
      - [person, ~, son] # 规则为 [类型, id值, 词槽名称]
      - [relation, father, ~]
      - [predicate, is, null]
      - [person, ~, father]
    -
      - [ person, ~, father ]
      - [ predicate, is, null ]
      - [ person, ~, son ]
      - [ relation, father, ~ ]
-
  name: test
  patterns:
    -
      - [person, ~, person]
      - [ predicate, is, null ]
      - ['@person', ~, ttt]

在以上的模板规则中,对于每一个模板规则,需要指定其名字(name)及相应的模板(patterns)。 由于存在多个相近但不相同的模板对应同一种意图及词槽,所以这里的patterns是一个列表。 在以上的pattern.yml文件中,包含一个'@person',这里可以映射到person这个类别所对应的所有别名,具体来说,可以对应到["人物"]列表中的所有词汇。

step3:编写源代码及触发函数

示例如下:

# -*- coding: utf-8 -*-
import os
import sys

project_path = os.path.abspath(os.path.join(__file__, "../.."))

sys.path.insert(0, project_path)

from lightnlu.core import NER, Rule

if __name__ == '__main__':
    path = os.path.join(project_path, 'data/words.yml')
    ner = NER()
    ner.build_from_yml(path, base_dir=project_path)
    print(ner.entities)
    path = os.path.join(project_path, 'data/pattern.yml')
    rule = Rule()
    rule.build_from_yml(path)

    @rule.bind(rule_name="father_son_relation", act_name="test")
    def test(father: str, son: str):
        return {
            "father": father,
            "son": son
        }


    @rule.bind(rule_name="test", act_name="ppp")
    def ppp(person: str, ttt: str):
        return {
            "person": person,
            "ttt": ttt
        }

    print(rule.actors)

    text = "刘备和诸葛亮在新野旅游,途中遇上了曹操"
    slots = ner.extract(text)
    print(slots)
    print(rule.match(slots))

    text = "曹丕的父亲是曹操"
    slots = ner.extract(text)
    print(rule.match(slots))
    print(rule.match_and_act(slots))

    text = "曹操是曹丕的父亲"
    slots = ner.extract(text)
    print(rule.match(slots))
    print(rule.match_and_act(slots))

    text = "曹操是个人物"
    slots = ner.extract(text)
    print(rule.match(slots))
    print(rule.match_and_act(slots))

执行结果如下:

defaultdict(<function default_type at 0x7f88a96b00d0>, {'曹操': [{'type': 'person', 'id': '1'}], '刘备': [{'type': 'person', 'id': '2'}], '诸葛亮': [{'type': 'person', 'id': '3'}], '曹丕': [{'type': 'person', 'id': '4'}], '曹植': [{'type': 'person', 'id': '5'}], '人物': [{'type': '@person', 'id': None}], '洛阳': [{'type': 'place', 'id': '1'}], '长安': [{'type': 'place', 'id': '2'}], '新野': [{'type': 'place', 'id': '3'}], '赤壁': [{'type': 'place', 'id': '4'}], '宛城': [{'type': 'place', 'id': '5'}], '地点': [{'type': '@place', 'id': None}], '位置': [{'type': '@place', 'id': None}], '城市': [{'type': '@place', 'id': None}], '区域': [{'type': '@place', 'id': None}], '电站': [{'type': 'ban_words', 'id': ''}], '正在站': [{'type': 'ban_words', 'id': ''}], '引流线': [{'type': 'ban_words', 'id': ''}], '子导线': [{'type': 'ban_words', 'id': ''}], '甲母线': [{'type': 'ban_words', 'id': ''}], '规则': [{'type': '@ban_words', 'id': None}], '所属厂站': [{'type': 'attr', 'id': 'attr_ST_ID'}], '所属电厂': [{'type': 'attr', 'id': 'attr_ST_ID'}], '属于哪个厂站': [{'type': 'attr', 'id': 'attr_ST_ID'}], '属于哪个电厂': [{'type': 'attr', 'id': 'attr_ST_ID'}], '电压等级': [{'type': 'attr', 'id': 'attr_VOLTAGE_TYPE'}], '儿子': [{'type': 'relation', 'id': 'son'}], '父亲': [{'type': 'relation', 'id': 'father'}], '爸爸': [{'type': 'relation', 'id': 'father'}], '是': [{'type': 'predicate', 'id': 'is'}], '为': [{'type': 'predicate', 'id': 'is'}], '不是': [{'type': 'predicate', 'id': 'isnot'}], '不为': [{'type': 'predicate', 'id': 'isnot'}]})
defaultdict(<class 'dict'>, {'father_son_relation': {'test': <function test at 0x7f88a967e940>}, 'test': {'ppp': <function ppp at 0x7f88a967e9d0>}})
[('刘备', {'type': 'person', 'id': '2'}, 0, 2), ('诸葛亮', {'type': 'person', 'id': '3'}, 3, 6), ('新野', {'type': 'place', 'id': '3'}, 7, 9), ('曹操', {'type': 'person', 'id': '1'}, 17, 19)]
[]
[{'name': 'father_son_relation', 'slots': {'son': '曹丕', 'father': '曹操'}}]
{'father_son_relation': {'test': {'father': '曹操', 'son': '曹丕'}}}
[{'name': 'father_son_relation', 'slots': {'father': '曹操', 'son': '曹丕'}}]
{'father_son_relation': {'test': {'father': '曹操', 'son': '曹丕'}}}
[{'name': 'test', 'slots': {'person': '曹操', 'ttt': '人物'}}]
{'test': {'ppp': {'person': '曹操', 'ttt': '人物'}}}

注意事项

  1. csv文件和json文件中必须包含name和id两个属性或列。

参考

  1. keyue123/poemElasticDemo: 基于Elasticsearch的KBQA
  2. liuhuanyong/QAonMilitaryKG: QAonMilitaryKG,QaSystem based on military knowledge graph that stores in mongodb which is different from the previous one, 基于mongodb存储的军事领域知识图谱问答项目,包括飞行器、太空装备等8大类,100余小类,共计5800项的军事武器知识库,该项目不使用图数据库进行存储,通过jieba进行问句解析,问句实体项识别,基于查询模板完成多类问题的查询,主要是提供一种工业界的问答思想demo。

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lightNLU-0.1.1.tar.gz (12.8 kB view details)

Uploaded Source

Built Distribution

lightNLU-0.1.1-py3-none-any.whl (19.2 kB view details)

Uploaded Python 3

File details

Details for the file lightNLU-0.1.1.tar.gz.

File metadata

  • Download URL: lightNLU-0.1.1.tar.gz
  • Upload date:
  • Size: 12.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/49.6.0.post20200814 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.7

File hashes

Hashes for lightNLU-0.1.1.tar.gz
Algorithm Hash digest
SHA256 dc381b3d727d41175a438c0f0a7d4cef85320108a92c11323ba7f3580b47a14e
MD5 1156d5e569b32a0bdfcae4d4049553b3
BLAKE2b-256 5f6136736c0c2ebee86121faeba2bc53a48c7ec2505549989dd326364127f641

See more details on using hashes here.

File details

Details for the file lightNLU-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: lightNLU-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 19.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/49.6.0.post20200814 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.7

File hashes

Hashes for lightNLU-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0febe2be4b4e5d85d31a2efcedb1875c966cb17aaaeb8660adbf6aa4220934f5
MD5 db72ba4bc9e8ab265521b58e8bfe24ae
BLAKE2b-256 b10c09a6f077bac54b5bf2fc5324c02311aeebdc88d556a7f916584dc3547c4a

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page