An open-source Chinese NLP Dataset Reader library, built on allennlp & pytorch.

chreader

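A toolkit for Chinese natural language processing datasets

Features

  • Easy to use
    • Automatic download and caching: a single command fetches the specified dataset
    • The command line can list the available datasets and show their detailed descriptions
    • Integrates seamlessly with common NLP frameworks such as allennlp, catalyst, pytorch_lightning, and FARM
  • Rich: supports classification, generation, tagging, and other dataset types, 2 in total
  • Flexible
    • Custom datasets can be added freely by subclassing ChDatasetReader
    • Through allennlp, components such as tokenizer, token_indexer, and vocab can be used and configured in depth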
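
The "subclass ChDatasetReader" point can be illustrated with a small sketch. The interface of ChDatasetReader is not shown here, so the sketch uses a stand-in base class with a `_read` method, loosely following the allennlp DatasetReader convention. The names `BaseReaderSketch`, `TsvClassificationReader`, and `Example` are hypothetical and are not part of chreader.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Example:
    """One classification example (hypothetical container)."""
    text: str
    label: str


class BaseReaderSketch:
    """Stand-in for ChDatasetReader; the real interface may differ."""

    def read(self, lines: Iterable[str]) -> List[Example]:
        # Materialize everything the subclass yields.
        return list(self._read(lines))

    def _read(self, lines: Iterable[str]) -> Iterator[Example]:
        raise NotImplementedError


class TsvClassificationReader(BaseReaderSketch):
    """Custom reader: each line is 'label<TAB>text'."""

    def _read(self, lines: Iterable[str]) -> Iterator[Example]:
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            label, text = line.split("\t", 1)
            yield Example(text=text, label=label)


examples = TsvClassificationReader().read(
    ["sports\t国足获胜\n", "\n", "tech\t新手机发布\n"]
)
```

In this sketch only the parsing logic lives in the subclass; downloading, caching, and tokenization would presumably be handled by the real base class.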

Installation

  • git
    git clone https://github.com/wangyuxinwhy/chreader.git
    cd chreader
    pip install -e .
    
  • pip
    pip install -U chreader
    

Usage

Building a Dataset & DataLoader

from chreader import load_dataset, DataLoader
# Load the train and dev splits of the tnews dataset
train_dataset = load_dataset("tnews", "train")
dev_dataset = load_dataset("tnews", "dev")
# Wrap each split in a DataLoader that yields batches of 32
train_dataloader = DataLoader(train_dataset, batch_size=32)
dev_dataloader = DataLoader(dev_dataset, batch_size=32)
for data in train_dataloader:
    ...
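
As a rough illustration of what `batch_size=32` means, the sketch below groups a toy dataset into consecutive batches using plain Python. It is not the chreader or allennlp DataLoader, which presumably also handles tensor conversion and padding. The helper `iter_batches` is a hypothetical name introduced only for this example.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(dataset: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(dataset), batch_size):
        yield list(dataset[start:start + batch_size])


# 70 toy items split into batches of 32: the last batch is smaller.
toy_dataset = list(range(70))
batch_sizes = [len(batch) for batch in iter_batches(toy_dataset, batch_size=32)]
# batch_sizes == [32, 32, 6]
```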

Command line

# List all available datasets
chreader list

# Show detailed information about a dataset
chreader show tnews

TODO

  • Add more datasets
  • Add a dataset_type field; currently only the classification type exists
    • classification
      • sentiment
    • generation
      • summarization
    • tagging
      • ner
      • dependency_parsing
  • Support external configuration
  • Polish the command-line output
  • Record a gif
  • Add docs
  • Add a tutorial
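
The planned dataset_type hierarchy could be represented as a simple mapping from top-level type to subtypes. This is only a sketch of one possible data structure, since the field does not exist yet. The names `DATASET_TYPES` and `parent_type` are hypothetical.

```python
from typing import Dict, List

# Top-level dataset types and their subtypes, as listed in the TODO.
DATASET_TYPES: Dict[str, List[str]] = {
    "classification": ["sentiment"],
    "generation": ["summarization"],
    "tagging": ["ner", "dependency_parsing"],
}


def parent_type(dataset_type: str) -> str:
    """Return the top-level type for a type or subtype name."""
    for top, subtypes in DATASET_TYPES.items():
        if dataset_type == top or dataset_type in subtypes:
            return top
    raise KeyError(f"unknown dataset type: {dataset_type}")
```

With this layout, `parent_type("ner")` resolves to `"tagging"`, and an unknown name raises `KeyError` rather than silently passing through.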

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chreader-0.2.1.tar.gz (10.9 kB)

Uploaded Source

Built Distribution

chreader-0.2.1-py3-none-any.whl (12.2 kB)

Uploaded Python 3

File details

Details for the file chreader-0.2.1.tar.gz.

File metadata

  • Download URL: chreader-0.2.1.tar.gz
  • Upload date:
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for chreader-0.2.1.tar.gz
Algorithm     Hash digest
SHA256        5f0854cb3f5b83dec0deefee2673ae8490535c3c985cbed099406066c2fe3015
MD5           8fad0976f64d26e395546c9b7f46bb7e
BLAKE2b-256   c91219409e11cf54c1883a1cb903c401d8870aa752c1b49b81b3845fa3adcd1b

File details

Details for the file chreader-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: chreader-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 12.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for chreader-0.2.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        29e9dc782d9727f83b07cca75eed86e7794fa25f49036f517d162a43f405e30b
MD5           d3ada5a22b61cab694cd8fbde7d82732
BLAKE2b-256   14924dff1e8b57a729947c8eed2cc179574e8e73d8aa386af080981aac36d486
