nlp-data

An internal NLP data storage and sharing tool from 普强.

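Installation

Install from PyPI

# Install the basic features
pip install nlp-data
# Install all features (quoted so that shells such as zsh do not expand the brackets)
pip install "nlp-data[all]"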

Install from a package file

Download the latest release package from the dist folder.
pip install nlp_data-xxx.tar.gz
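
After either install route, the installed version can be checked from Python with the standard library. This is a minimal sketch; the helper name `installed_version` is made up for illustration and is not part of nlp-data.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "nlp-data") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "nlp-data is not installed")
```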

Usage

Using Store

    # A Store wraps one bucket in S3 object storage; each data type has its own bucket
    from nlp_data import NLUDocStore
    # List the stored documents
    NLUDocStore.list()
    # Pull documents
    docs = NLUDocStore.pull('xxx')
    # Push documents
    NLUDocStore.push(docs=docs, name='xxx')
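
The "one bucket per data type" idea can be illustrated without any S3 access. The sketch below is an in-memory stand-in, not nlp-data's implementation: `InMemoryStore`, `SketchNLUStore` and `SketchDialogueStore` are hypothetical names, and plain dicts stand in for Docs.

```python
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for a Store: a dict plays the role of the bucket."""

    _bucket: Dict[str, List[dict]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._bucket = {}  # every data type (subclass) gets its own "bucket"

    @classmethod
    def push(cls, docs: List[dict], name: str) -> None:
        # Store a copy so later changes to the caller's list do not leak in.
        cls._bucket[name] = [dict(doc) for doc in docs]

    @classmethod
    def pull(cls, name: str) -> List[dict]:
        if name not in cls._bucket:
            raise KeyError(f"no document set named {name!r}")
        return [dict(doc) for doc in cls._bucket[name]]

    @classmethod
    def list(cls) -> List[str]:
        return sorted(cls._bucket)


class SketchNLUStore(InMemoryStore):
    pass


class SketchDialogueStore(InMemoryStore):
    pass


SketchNLUStore.push(docs=[{"text": "hello"}], name="demo")
print(SketchNLUStore.list())       # ['demo']
print(SketchDialogueStore.list())  # []
```

Pushing to one store leaves the other empty, which is the isolation a per-type bucket gives.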

Using Doc

    # A Doc is one of nlp-data's storage structures; it holds data in that format and supports operations on it
    # A DocList is a collection of Docs. It behaves like a Python list, with the basic methods such as append and extend,
    # but each DocList type also has methods specific to its data type
    # Take NLUDoc as an example: it has fields such as domain, slots and intention for storing NLU results
    from nlp_data import NLUDoc, NLUDocList
    # Create an NLUDoc (the text means "add a reminder for tomorrow morning's meeting with 张三")
    doc = NLUDoc(text='添加明天上午跟张三开会的提醒')
    doc.set_domain('schedule_cmn')
    doc.set_intention('add_schedule')
    doc.set_slot(text='明天上午', label='date')
    doc.set_slot(text='跟张三开会', label='title')
    # Create an NLUDocList and add the doc
    docs = NLUDocList()
    docs.append(doc)
    # Initialize in bulk from ABNF sentence-pattern output files
    docs = NLUDocList.from_abnf_output(output_dir='your/dir', domain='schedule_cmn')
    # Upload to the bucket
    from nlp_data import NLUDocStore
    NLUDocStore.push(docs=docs, name='xxx')
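
To make the shape of such a record concrete, here is a rough sketch of an NLU document with a domain, an intention and slots. It is an illustration only: `SketchNLUDoc` and `Slot` are hypothetical, and locating a slot with `str.find` (first occurrence) is an assumption, not necessarily how NLUDoc resolves slot positions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    text: str
    label: str
    start: int  # inclusive character offset in the utterance
    end: int    # exclusive character offset in the utterance


@dataclass
class SketchNLUDoc:
    text: str
    domain: Optional[str] = None
    intention: Optional[str] = None
    slots: List[Slot] = field(default_factory=list)

    def set_slot(self, text: str, label: str) -> None:
        # Assumption: the slot is the first occurrence of `text` in the utterance.
        start = self.text.find(text)
        if start == -1:
            raise ValueError(f"{text!r} does not occur in {self.text!r}")
        self.slots.append(Slot(text, label, start, start + len(text)))


doc = SketchNLUDoc(text="添加明天上午跟张三开会的提醒")
doc.domain = "schedule_cmn"
doc.intention = "add_schedule"
doc.set_slot(text="明天上午", label="date")
doc.set_slot(text="跟张三开会", label="title")
print([(s.label, s.start, s.end) for s in doc.slots])
# [('date', 2, 6), ('title', 6, 11)]
```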

Using Augmentor

    # Augmentor is nlp-data's data augmentation tool
    from nlp_data import GPTAugmentor, NLUDocStore, DialogueDocList, DialogueDoc
    # Create an Augmentor
    augmentor = GPTAugmentor(api_key='xxx')
    # Augment NLUDocs into Cantonese (广东话) or Sichuanese (四川话)
    docs = NLUDocStore.pull('xxx')
    aug_docs = augmentor.augment_nlu_by_localism(docs, '广东话')
    # Generate multi-turn dialogues from a theme and a situation
    # (theme: "add a schedule"; situation: the user is driving and talking by voice to the in-car system 丰田)
    dialogue_docs = augmentor.generate_dialogue_docs(theme='添加日程', situation='用户正在驾驶车辆与车机系统丰田进行语音交互')
    # Augment existing multi-turn dialogue data
    dialogue_docs = DialogueDocList()
    dialogue_docs.quick_add(theme='添加日程', situation='用户正在驾驶车辆与车机系统丰田进行交互', conversation=['你好,丰田', '在呢,有什么可以帮助你的', '我要添加一个明天上午跟张三开会的日程', '好的已为您添加成功'])
    aug_dialogue_docs = augmentor.augment_dialogue(dialogue_docs)
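
The `conversation` list above is a flat sequence of utterances. One plausible way to read it is as alternating user and assistant turns; the sketch below shows that reading. Both the alternation rule and the names `SketchDialogue` and `from_conversation` are assumptions for illustration, not the behaviour of `quick_add`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SketchDialogue:
    theme: str
    situation: str
    turns: List[Tuple[str, str]]  # (role, utterance)


def from_conversation(theme: str, situation: str, conversation: List[str]) -> SketchDialogue:
    # Assumption: utterances alternate between roles, starting with the user.
    roles = ("user", "assistant")
    turns = [(roles[i % 2], utterance) for i, utterance in enumerate(conversation)]
    return SketchDialogue(theme, situation, turns)


dialogue = from_conversation(
    theme="添加日程",
    situation="用户正在驾驶车辆与车机系统丰田进行交互",
    conversation=["你好,丰田", "在呢,有什么可以帮助你的"],
)
for role, utterance in dialogue.turns:
    print(role, utterance)
```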

Using S3

S3Storage is a basic wrapper around S3 object storage; it can create buckets, upload and download files, and so on.

    # Initialize (the import path is assumed to mirror the other nlp_data examples)
    from nlp_data import S3Storage
    s3 = S3Storage()
    # List all buckets
    s3.list_buckets()
    # Create a bucket
    s3.create_bucket('test')
    # List all files in a bucket
    s3.list_files('test')
    # Upload a file
    s3.upload_file(file_path='./test.txt', bucket_name='test')
    # Download a file
    s3.download_file(object_name='./test.txt', bucket_name='test')
    # Delete a file
    s3.delete_file(bucket_name='test', file_name='test.txt')
    # Upload a folder
    s3.upload_dir(bucket_name='test', dir='./tests')
    # Download a folder
    s3.download_dir(bucket_name='test', object_name='./tests', save_dir='./')
    # Delete a folder
    s3.delete_dir(bucket_name='test', dir_name='tests')
    # Delete a bucket
    s3.delete_bucket('test')
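
Object storage has no real folders, so uploading a folder comes down to mapping each local file to an object key. The sketch below shows that mapping on the local filesystem only; `iter_object_keys` is a hypothetical helper and says nothing about how `upload_dir` is actually implemented.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


def iter_object_keys(local_dir: str, prefix: str = "") -> Iterator[Tuple[Path, str]]:
    """Yield (local file, object key) pairs for every file under local_dir."""
    root = Path(local_dir)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # directories themselves are not objects
        # Object keys use forward slashes regardless of the local OS.
        relative = path.relative_to(root).as_posix()
        key = f"{prefix.rstrip('/')}/{relative}" if prefix else relative
        yield path, key


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.txt").write_text("a")
    (Path(tmp) / "sub" / "b.txt").write_text("b")
    keys = [key for _, key in iter_object_keys(tmp, prefix="tests")]
    print(keys)
```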

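Command line

# Show help
nlp-data --help
# Download a file; when xxx is a folder in S3, every file under it is downloaded
nlp-data download xxx.xxx --bucket xxx --save_path xxx
# Upload a file; when xxx is a folder, every file under it is uploaded
nlp-data upload xxx --bucket xxx
# Delete a file; when xxx is a folder, every file under it is deleted
nlp-data delete xxx --bucket xxx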
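
The command shapes above can be mirrored with `argparse` to see how the positional name and the `--bucket` and `--save_path` options fit together. This is only a sketch: the real CLI may be built with a different library, and making `--bucket` required and defaulting `--save_path` to the current directory are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nlp-data-sketch")
    sub = parser.add_subparsers(dest="command", required=True)

    download = sub.add_parser("download", help="download a file or folder")
    download.add_argument("name")
    download.add_argument("--bucket", required=True)   # assumed to be required
    download.add_argument("--save_path", default=".")  # assumed default

    for command in ("upload", "delete"):
        p = sub.add_parser(command, help=f"{command} a file or folder")
        p.add_argument("name")
        p.add_argument("--bucket", required=True)

    return parser


args = build_parser().parse_args(
    ["download", "xxx", "--bucket", "xxx", "--save_path", "out"]
)
print(args.command, args.name, args.bucket, args.save_path)
```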

Changelog

0.1.7

Added the from_file method to NLUDocList, which initializes an NLUDocList in bulk from a file. The file must contain one text per line.
```
from nlp_data import NLUDocList

docs = NLUDocList.from_file('your/file/path', domain='domain_name')
```
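
For reference, a "one text per line" file can be read as sketched below. The helper `read_one_text_per_line` is hypothetical and returns plain dicts rather than NLUDocs; skipping blank lines and stripping whitespace are assumptions, not documented behaviour of `from_file`.

```python
from pathlib import Path
from typing import Dict, List


def read_one_text_per_line(path: str, domain: str) -> List[Dict[str, str]]:
    """Read a UTF-8 file with one utterance per line into simple records."""
    records = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        text = line.strip()
        if text:  # assumption: blank lines carry no utterance and are skipped
            records.append({"text": text, "domain": domain})
    return records
```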

0.1.8

Fixed a bug where the package could not be imported with docarray 0.39.
Added a command-line tool.
```
nlp-data --help
```

0.1.10

Reduced the base dependencies.
Improved the command-line tool.
```
nlp-data --help
```

0.1.12

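Fixed:
- Fixed a bug in S3Storage caused by caching buckets.
- pandas is no longer a default dependency.
Added:
- The command-line file listing now shows the last-modified time.
- Improved README.md.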

0.1.13

Fixed:
- A bug where the CLI download save path could not be changed.

0.1.16

Changed:
- Removed unused dependencies.
Added:
- Added NLUDocList.convert_to_fasttext_dataset to convert NLU data into fastText-format data. Parameter: save_path, the path to save to.
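
fastText's supervised format is one example per line, with each label prefixed by `__label__`. The sketch below builds such lines from (label, text) pairs. It is not the library's converter: `to_fasttext_lines` is a hypothetical name, and using the intention as the label is an assumption. fastText splits tokens on whitespace, so Chinese text usually needs word segmentation first, which this sketch does not do.

```python
from typing import Iterable, List, Tuple


def to_fasttext_lines(samples: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (label, text) pairs into fastText supervised-training lines."""
    lines = []
    for label, text in samples:
        # Each example must fit on one line, so collapse newlines and extra spaces.
        clean = " ".join(text.split())
        lines.append(f"__label__{label} {clean}")
    return lines


print(to_fasttext_lines([("add_schedule", "添加明天上午跟张三开会的提醒")]))
# ['__label__add_schedule 添加明天上午跟张三开会的提醒']
```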

0.1.18

Changed:
- Fixed the incorrect docarray dependency.

0.1.23

Added:
- Added NLUDocList.sample_by_intention to sample by intention. Parameter: n_sample, the number of samples.
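
Sampling by intention can be pictured as grouping documents by their intention and drawing from each group. The sketch below works on plain dicts and assumes that n_sample is a per-intention cap and that smaller groups are kept whole; the `seed` parameter is added here for reproducibility. The real method may differ on all of these points.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def sample_by_intention(
    docs: List[dict], n_sample: int, seed: Optional[int] = None
) -> List[dict]:
    rng = random.Random(seed)
    groups: Dict[str, List[dict]] = defaultdict(list)
    for doc in docs:
        groups[doc["intention"]].append(doc)

    sampled: List[dict] = []
    for intention in sorted(groups):
        group = groups[intention]
        # Assumption: take at most n_sample per intention; small groups are kept whole.
        sampled.extend(rng.sample(group, min(n_sample, len(group))))
    return sampled
```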

Release history

- 0.4.9: Feb 18, 2025
- 0.4.3: Dec 24, 2024
- 0.4.0: Aug 20, 2024
- 0.3.91: Jun 19, 2024
- 0.3.8: May 17, 2024
- 0.3.7: Apr 24, 2024
- 0.3.6: Mar 14, 2024
- 0.3.5: Mar 14, 2024
- 0.3.4: Mar 11, 2024
- 0.3.3: Mar 11, 2024
- 0.3.2: Mar 1, 2024
- 0.3.1: Feb 29, 2024
- 0.3.0: Feb 29, 2024
- 0.2.9: Feb 28, 2024
- 0.2.8: Feb 28, 2024
- 0.2.7: Feb 28, 2024
- 0.2.6: Feb 28, 2024
- 0.2.5: Feb 26, 2024
- 0.2.4: Feb 26, 2024
- 0.2.3: Feb 26, 2024
- 0.2.2: Feb 22, 2024
- 0.2.1: Feb 19, 2024
- 0.2.0: Feb 7, 2024
- 0.1.32: Feb 7, 2024
- 0.1.31: Feb 4, 2024
- 0.1.30: Feb 4, 2024
- 0.1.29: Feb 4, 2024
- 0.1.28: Feb 4, 2024
- 0.1.27: Feb 4, 2024
- 0.1.26: Feb 2, 2024
- 0.1.25: Feb 1, 2024
- 0.1.24: Jan 19, 2024 (this version)
- 0.1.23: Jan 16, 2024
- 0.1.22: Jan 10, 2024
- 0.1.21: Jan 4, 2024
- 0.1.20: Jan 4, 2024
- 0.1.19: Jan 2, 2024
- 0.1.18: Dec 12, 2023
- 0.1.17: Dec 7, 2023
- 0.1.16: Dec 7, 2023
- 0.1.15: Nov 23, 2023
- 0.1.14: Nov 16, 2023
- 0.1.13: Nov 6, 2023
- 0.1.12: Oct 27, 2023
- 0.1.11: Oct 26, 2023
- 0.1.10: Oct 20, 2023
- 0.1.9: Oct 18, 2023
- 0.1.8: Oct 17, 2023
- 0.1.7: Oct 12, 2023
- 0.1.6: Sep 21, 2023
- 0.1.5: Sep 20, 2023
- 0.1.4: Sep 20, 2023
- 0.1.3: Sep 13, 2023
- 0.1.2: Sep 12, 2023
- 0.1.1: Sep 11, 2023
- 0.1.0: Sep 7, 2023

Download files

Source Distribution

nlp_data-0.1.24.tar.gz (19.4 kB)

Uploaded Jan 19, 2024 Source

Built Distribution

nlp_data-0.1.24-py3-none-any.whl (23.2 kB)

Uploaded Jan 19, 2024 Python 3

File details

Details for the file nlp_data-0.1.24.tar.gz.

File metadata

Download URL: nlp_data-0.1.24.tar.gz
Upload date: Jan 19, 2024
Size: 19.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.5.1 CPython/3.9.1 Darwin/23.2.0

File hashes

Hashes for nlp_data-0.1.24.tar.gz
Algorithm	Hash digest
SHA256	`d0c044858ab274f468a8c8d6d8d956bfa94f151bc7b3c81017a0b2d1cad180ad`
MD5	`a6b580d29047c67fe0d02dd64da4ac0a`
BLAKE2b-256	`88dc5c11dd57c07358cfa50b155320d9811f5bed48d57acff0e69d1fe8bf4d09`

File details

Details for the file nlp_data-0.1.24-py3-none-any.whl.

File metadata

Download URL: nlp_data-0.1.24-py3-none-any.whl
Upload date: Jan 19, 2024
Size: 23.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.5.1 CPython/3.9.1 Darwin/23.2.0

File hashes

Hashes for nlp_data-0.1.24-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0b5d12745b34258ec122e82731283b5a61f8f0082c8ee27dbfc01f8a74f45115`
MD5	`20d6621484ecc79841018596e58dc7ab`
BLAKE2b-256	`8b29b2e1cd3922bf9c553763d1f395fa7e0e998d05b4d3715c6abc7e12c99ad8`
