
Project description

An internal NLP data storage and sharing tool for 普强 (Puqiang).

Installation

  • pip install nlp-data

Usage

  • Using Store

     # A Store is a wrapper around one S3 bucket; each data type maps to its own bucket
     from nlp_data import NLUDocStore
     # List available documents
     NLUDocStore.list()
     # Pull documents
     docs = NLUDocStore.pull('xxx')
     # Push documents
     NLUDocStore.push(docs=docs, name='xxx')
    
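As a rough mental model (not the library's actual implementation, which talks to S3), a minimal in-memory stand-in with the same `list`/`pull`/`push` surface might look like this; the class name and dict-backed storage are illustrative assumptions:

```python
import json


class InMemoryDocStore:
    """Toy stand-in for an S3-backed Store: one 'bucket' per data type.

    Hypothetical sketch -- the real NLUDocStore reads and writes S3 objects,
    not an in-process dict.
    """

    _bucket = {}  # name -> serialized docs

    @classmethod
    def list(cls):
        # Names of all stored doc collections in this "bucket"
        return sorted(cls._bucket)

    @classmethod
    def push(cls, docs, name):
        # Serialize and store the collection under `name`
        cls._bucket[name] = json.dumps(docs)

    @classmethod
    def pull(cls, name):
        # Fetch and deserialize the collection stored under `name`
        return json.loads(cls._bucket[name])


InMemoryDocStore.push(docs=[{'text': 'hi'}], name='demo')
print(InMemoryDocStore.list())        # ['demo']
print(InMemoryDocStore.pull('demo'))  # [{'text': 'hi'}]
```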
  • Using Doc

        # Doc is nlp-data's storage structure: it holds data in this format and supports operations on it
        # DocList is a collection of Docs, analogous to a Python list, with the basic append, extend, etc. methods; each DocList type also has methods specific to its data type
        # Take NLUDoc as an example: it has domain, slots, intention, etc. fields for storing NLU results
        from nlp_data import NLUDoc, NLUDocList
        # Create an NLUDoc
        doc = NLUDoc(text='添加明天上午跟张三开会的提醒')
        doc.set_domain('schedule_cmn')
        doc.set_intention('add_schedule')
        doc.set_slot(text='明天上午', label='date')
        doc.set_slot(text='跟张三开会', label='title')
        # Create an NLUDocList and append the doc
        docs = NLUDocList()
        docs.append(doc)
        # Batch-initialize from ABNF sentence-pattern output files
        docs = NLUDocList.from_abnf_output(output_dir='your/dir', domain='schedule_cmn')
        # Push to the bucket
        from nlp_data import NLUDocStore
        NLUDocStore.push(docs=docs, name='xxx')
    
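The NLUDoc shape above (text plus domain, intention, and slots) can be pictured as a small record type. The following is a hypothetical sketch of such a document, not the library's actual class; the `Slot`/`SimpleNLUDoc` names and the substring check are assumptions for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    text: str
    label: str


@dataclass
class SimpleNLUDoc:
    """Hypothetical stand-in mirroring NLUDoc's text/domain/intention/slots fields."""

    text: str
    domain: Optional[str] = None
    intention: Optional[str] = None
    slots: List[Slot] = field(default_factory=list)

    def set_domain(self, domain: str) -> None:
        self.domain = domain

    def set_intention(self, intention: str) -> None:
        self.intention = intention

    def set_slot(self, text: str, label: str) -> None:
        # A slot's text is expected to be a span of the doc text
        assert text in self.text, 'slot text not found in doc text'
        self.slots.append(Slot(text=text, label=label))


doc = SimpleNLUDoc(text='添加明天上午跟张三开会的提醒')
doc.set_domain('schedule_cmn')
doc.set_intention('add_schedule')
doc.set_slot(text='明天上午', label='date')
print([(s.text, s.label) for s in doc.slots])  # [('明天上午', 'date')]
```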
  • Using Augmentor

      # Augmentor is nlp-data's data-augmentation tool
      from nlp_data import GPTAugmentor, NLUDocStore, DialogueDocList, DialogueDoc
      # Create an Augmentor
      augmentor = GPTAugmentor(api_key='xxx')
      # Augment NLUDocs into Cantonese or Sichuanese dialect
      docs = NLUDocStore.pull('xxx')
      aug_docs = augmentor.augment_nlu_by_localism(docs, '广东话')
      # Generate multi-turn dialogues from a theme and situation
      dialogue_docs = augmentor.generate_dialogue_docs(theme='添加日程', situation='用户正在驾驶车辆与车机系统丰田进行语音交互')
      # Augment multi-turn dialogue data
      dialogue_docs = DialogueDocList()
      dialogue_docs.quick_add(theme='添加日程', situation='用户正在驾驶车辆与车机系统丰田进行交互', conversation=['你好,丰田', '在呢,有什么可以帮助你的', '我要添加一个明天上午跟张三开会的日程', '好的已为您添加成功'])
      aug_dialogue_docs = augmentor.augment_dialogue(dialogue_docs)
    
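GPTAugmentor delegates the rewriting to an LLM behind the `api_key`. As a sketch of how a dialect-rewrite request could be assembled (the real prompts are internal to nlp-data; the function name and prompt wording below are assumptions, and no API call is made):

```python
def build_localism_prompt(texts, localism):
    """Hypothetical sketch of the kind of prompt a GPT-based augmentor
    might send to rewrite utterances into a target dialect."""
    # Number the utterances so the model's rewrites can be mapped back
    numbered = '\n'.join(f'{i + 1}. {t}' for i, t in enumerate(texts))
    return (
        f'Rewrite each utterance below in {localism}, '
        f'keeping the meaning unchanged:\n{numbered}'
    )


prompt = build_localism_prompt(['添加明天的日程'], '广东话')
print(prompt)
```

The numbered-list framing is one common way to batch several utterances into a single LLM request and parse the rewrites back out by index.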

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nlp_data-0.1.5.tar.gz (15.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nlp_data-0.1.5-py3-none-any.whl (19.3 kB view details)

Uploaded Python 3

File details

Details for the file nlp_data-0.1.5.tar.gz.

File metadata

  • Download URL: nlp_data-0.1.5.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.1 Darwin/22.1.0

File hashes

Hashes for nlp_data-0.1.5.tar.gz
Algorithm Hash digest
SHA256 eca7556d1a483f79c2340a9c164e1d2eb84875d025a1942092a9f44b0b0e9df6
MD5 b66b3e16bdc372070bed08c91da11a9e
BLAKE2b-256 5a4419c2bb7a597edbe81cfab05b6cc3a1c68408efe37007d224eed2e95a7a63

See more details on using hashes here.

File details

Details for the file nlp_data-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: nlp_data-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 19.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.1 Darwin/22.1.0

File hashes

Hashes for nlp_data-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 29bcc591e7f83b6d183b7bc2f41441deb1c762d7baf2645a933434a7eb98f712
MD5 f33bac69021f933fcc8e518aac3ed23f
BLAKE2b-256 e70391b47fbf12517168799d532d9a2540d520d1c62bdeb280e9c3872dfc2092

