## Project description

普强's internal NLP data storage and sharing tool.
## Installation

```shell
pip install nlp-data
```
## Usage

### Using Store

```python
# A Store is a wrapper around one S3 bucket; each data type maps to its own bucket.
from nlp_data import NLUDocStore

# List the available documents
NLUDocStore.list()

# Pull documents
docs = NLUDocStore.pull('xxx')

# Push documents
NLUDocStore.push(docs=docs, name='xxx')
```
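The list/pull/push interface above can be pictured as a thin key-value layer over a bucket. The following is a minimal in-memory sketch of that idea only, not the real `nlp_data` implementation; `InMemoryStore` and its dict-backed "bucket" are illustrative stand-ins:

```python
class InMemoryStore:
    """Illustrative stand-in for a Store: one 'bucket' (here a dict) per data type."""
    _bucket = {}

    @classmethod
    def list(cls):
        # Return the names of all stored document sets
        return sorted(cls._bucket)

    @classmethod
    def push(cls, docs, name):
        # Store the document set under `name`
        cls._bucket[name] = list(docs)

    @classmethod
    def pull(cls, name):
        # Fetch the document set stored under `name`
        return cls._bucket[name]


InMemoryStore.push(docs=['doc1', 'doc2'], name='demo')
print(InMemoryStore.list())        # ['demo']
print(InMemoryStore.pull('demo'))  # ['doc1', 'doc2']
```

The real Store additionally handles serialization and S3 transport, but the calling convention is the same.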
### Using Doc

```python
# Doc is nlp-data's storage structure for one record of a given data format,
# with helper methods for operating on the data.
# DocList is a collection of Docs. Like a Python list it has the basic append,
# extend, etc.; in addition, each DocList type has methods specific to its data type.
# NLUDoc, for example, has domain, slots, and intention fields for storing NLU results.
from nlp_data import NLUDoc, NLUDocList

# Create an NLUDoc
doc = NLUDoc(text='添加明天上午跟张三开会的提醒')
doc.set_domain('schedule_cmn')
doc.set_intention('add_schedule')
doc.set_slot(text='明天上午', label='date')
doc.set_slot(text='跟张三开会', label='title')

# Create an NLUDocList and append the doc
docs = NLUDocList()
docs.append(doc)

# Batch-initialize from ABNF sentence-pattern output files
docs = NLUDocList.from_abnf_output(output_dir='your/dir', domain='schedule_cmn')

# Push to the bucket
from nlp_data import NLUDocStore
NLUDocStore.push(docs=docs, name='xxx')
```
### Using Augmentor

```python
# Augmentor is nlp-data's data augmentation tool.
from nlp_data import GPTAugmentor, NLUDocStore, DialogueDocList, DialogueDoc

# Create an Augmentor
augmentor = GPTAugmentor(api_key='xxx')

# Augment NLUDocs into Cantonese or Sichuanese
docs = NLUDocStore.pull('xxx')
aug_docs = augmentor.augment_nlu_by_localism(docs, '广东话')

# Generate multi-turn dialogues from a theme and a situation
dialogue_docs = augmentor.generate_dialogue_docs(
    theme='添加日程',
    situation='用户正在驾驶车辆与车机系统丰田进行语音交互',
)

# Augment multi-turn dialogue data
dialogue_docs = DialogueDocList()
dialogue_docs.quick_add(
    theme='添加日程',
    situation='用户正在驾驶车辆与车机系统丰田进行交互',
    conversation=['你好,丰田', '在呢,有什么可以帮助你的',
                  '我要添加一个明天上午跟张三开会的日程', '好的已为您添加成功'],
)
aug_dialogue_docs = augmentor.augment_dialogue(dialogue_docs)
```
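The `quick_add` call above builds a dialogue document from a theme, a situation, and a flat list of alternating user/system turns. A self-contained sketch of that shape (`SketchDialogueDoc` is an illustrative stand-in, and the user-speaks-first pairing is an assumption, not documented behavior):

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SketchDialogueDoc:
    # Mirrors quick_add's arguments: a theme, a situation, and alternating turns
    theme: str
    situation: str
    conversation: List[str] = field(default_factory=list)

    def turns(self) -> List[Tuple[str, str]]:
        # Pair up (user, system) turns; assumes the user speaks first
        return list(zip(self.conversation[0::2], self.conversation[1::2]))


doc = SketchDialogueDoc(
    theme='添加日程',
    situation='用户正在驾驶车辆与车机系统丰田进行交互',
    conversation=['你好,丰田', '在呢,有什么可以帮助你的',
                  '我要添加一个明天上午跟张三开会的日程', '好的已为您添加成功'],
)
print(len(doc.turns()))  # 2
```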
## Download files
### Source Distribution

- `nlp_data-0.1.5.tar.gz` (15.6 kB)
### Built Distribution

- `nlp_data-0.1.5-py3-none-any.whl` (19.3 kB)
## File details

Details for the file `nlp_data-0.1.5.tar.gz`.

### File metadata
- Download URL: nlp_data-0.1.5.tar.gz
- Upload date:
- Size: 15.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.9.1 Darwin/22.1.0
### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | eca7556d1a483f79c2340a9c164e1d2eb84875d025a1942092a9f44b0b0e9df6 |
| MD5 | b66b3e16bdc372070bed08c91da11a9e |
| BLAKE2b-256 | 5a4419c2bb7a597edbe81cfab05b6cc3a1c68408efe37007d224eed2e95a7a63 |
## File details

Details for the file `nlp_data-0.1.5-py3-none-any.whl`.

### File metadata
- Download URL: nlp_data-0.1.5-py3-none-any.whl
- Upload date:
- Size: 19.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.9.1 Darwin/22.1.0
### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 29bcc591e7f83b6d183b7bc2f41441deb1c762d7baf2645a933434a7eb98f712 |
| MD5 | f33bac69021f933fcc8e518aac3ed23f |
| BLAKE2b-256 | e70391b47fbf12517168799d532d9a2540d520d1c62bdeb280e9c3872dfc2092 |