Deep learning toolbox for end-to-end text information extraction tasks.

Theta

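Theta is a practical toolbox for text information extraction in real-world engineering projects. It covers the full pipeline end to end, from raw text input to structured output. Users focus on converting their input data into the expected format, tuning key parameters, invoking theta to run model training and inference, and making use of the formatted output.

Application scenarios include mining unstructured data for key national-level enterprises, structured extraction from open-domain text, and the major online evaluation competitions for entity and relation extraction.

Theta aims for performance on par with the leading systems in the industry. It has recently taken part in top-tier competitions such as CCF2019, CHIP2019, CCKS2020, and CCL2020, and has so far won more than ten awards in finals, including seven top-three finishes and two first places.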

Updates

2022.09.06 0.50.0

nlp.entity_extraction, nlp.relation_extraction

Installation

Development version

pip install git+http://github.com/idleuncle/theta.git

Stable release

pip install -U theta

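CLUE-CLUENER: fine-grained named entity recognition

This dataset is built on THUCTC, the open-source text classification dataset from Tsinghua University. A subset of that data was selected and annotated with fine-grained named entities. The original data comes from Sina News RSS.

Training set: 10,748; validation set: 1,343

Label categories: the data is divided into 10 label categories: address (address), book title (book), company (company), game (game), government (government), movie (movie), person name (name), organization (organization), job title (position), and scenic spot (scene).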
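As a rough illustration of how these ten categories could be turned into a sequence-tagging label set, the sketch below builds BIO tags from the label names. The BIO scheme and the names `CLUENER_LABELS`, `build_bio_tags`, `BIO_TAGS`, and `TAG_TO_ID` are assumptions made for this example; they are not taken from theta's code, which may encode entities differently.

```python
# The ten CLUENER label identifiers (hypothetical constant name).
CLUENER_LABELS = [
    "address", "book", "company", "game", "government",
    "movie", "name", "organization", "position", "scene",
]


def build_bio_tags(labels):
    """Return the BIO tag list: 'O' plus a B-/I- pair for each label."""
    tags = ["O"]
    for label in labels:
        tags.append(f"B-{label}")  # first token of an entity
        tags.append(f"I-{label}")  # continuation token of an entity
    return tags


BIO_TAGS = build_bio_tags(CLUENER_LABELS)

# Map each tag to an integer id, as a token classifier would typically need.
TAG_TO_ID = {tag: idx for idx, tag in enumerate(BIO_TAGS)}
```

With ten labels this gives 21 tags in total: one `O` tag plus a `B-`/`I-` pair per category.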

Data download: https://github.com/CLUEbenchmark/CLUENER2020

Leaderboard: https://cluebenchmarks.com/ner.html

For the complete code, see theta/examples/CLUENER: cluener.ipynb

With the bert-base-chinese pretrained model, the CLUE evaluation F1 score is 77.160.

# Train
make -f Makefile.cluener train

# Run inference
make -f Makefile.cluener predict

# Generate the submission file
make -f Makefile.cluener submission

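CLUE-TNEWS: Toutiao Chinese news (short text) classification task

The following example is the Toutiao Chinese news (short text) classification task from CLUE (the Chinese Language Understanding Evaluation benchmark).

The dataset comes from the news section of Toutiao and covers 15 news categories, including travel, education, finance, and military.

Data size: training set (53,360), validation set (10,000), test set (10,000)

Example: {"label": "102", "label_desc": "news_entertainment", "sentence": "江疏影甜甜圈自拍，迷之角度竟这么好看，美吸引一切事物"}

Each record has three fields, in order: the category ID, the category name, and the news string (title only).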
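A minimal sketch of reading one such record with the standard `json` module follows. It assumes the data files store one JSON object per line, which the example record suggests but the text does not state; the helper name `parse_tnews_line` is hypothetical and not part of theta.

```python
import json

# Sample record copied from the example in the text.
SAMPLE_LINE = (
    '{"label": "102", "label_desc": "news_entertainment", '
    '"sentence": "江疏影甜甜圈自拍，迷之角度竟这么好看，美吸引一切事物"}'
)


def parse_tnews_line(raw):
    """Parse one JSON line into a (label_id, label_name, text) tuple."""
    record = json.loads(raw)
    return record["label"], record["label_desc"], record["sentence"]


label_id, label_name, text = parse_tnews_line(SAMPLE_LINE)
print(label_id, label_name, text)
```

Note that the category ID is a string in the record, not an integer, so any mapping to class indices has to be built explicitly.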

With the bert-base-chinese pretrained model, the CLUE evaluation F1 score is 56.100.

For the complete code, see theta/examples/TNEWS: tnews.ipynb

TNEWS dataset download

Import base libraries

import json
from tqdm import tqdm
from loguru import logger
import numpy as np

from theta.modeling import load_glue_examples
from theta.modeling.glue import GlueTrainer, load_model, get_args
from theta.utils import load_json_file

Release history

0.51.0 (this version): Sep 13, 2022
0.30.2: Jun 11, 2022
0.30.1: Jun 4, 2022
0.30.0: Jun 3, 2022
0.28.1: Aug 2, 2021
0.28.0: Aug 2, 2021
0.27.8: Aug 2, 2021
0.27.0: Aug 2, 2021
0.26.0: Jul 19, 2021
0.25.0: Jul 4, 2021
0.24.1: May 18, 2021
0.24.0: May 15, 2021
0.22.0: Jul 2, 2020
0.21.0: Jun 19, 2020
0.20.0: Jun 16, 2020
0.1.3: Dec 19, 2019
0.1.1: Oct 23, 2018
0.1.0: Oct 23, 2018

Download files

Source Distribution

theta-0.51.0.tar.gz (192.5 kB)

Uploaded Sep 13, 2022 Source

Built Distribution

theta-0.51.0-py3-none-any.whl (255.2 kB)

Uploaded Sep 13, 2022 Python 3

File details

Details for the file theta-0.51.0.tar.gz.

File metadata

Download URL: theta-0.51.0.tar.gz
Upload date: Sep 13, 2022
Size: 192.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.7.6

File hashes

Hashes for theta-0.51.0.tar.gz
Algorithm	Hash digest
SHA256	`f43e3bf8c5cc592e27d2f16045db143c3fa05b82983236ff6fb158fa01c56599`
MD5	`907bbc3a5acfc5a362b20da2bb6faf90`
BLAKE2b-256	`11fecc4b283d59c134f2f7144e0dd5ac41e1a2a6154936a99ea76e32ce6c42b0`
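One way to use these digests is to check a downloaded file before installing it. The sketch below computes a file's SHA256 with Python's standard `hashlib` and compares it with the value listed for theta-0.51.0.tar.gz. The helper names `sha256_of_file` and `matches_expected`, and the local file path in the usage note, are assumptions for illustration.

```python
import hashlib

# SHA256 digest listed for theta-0.51.0.tar.gz.
EXPECTED_SHA256 = "f43e3bf8c5cc592e27d2f16045db143c3fa05b82983236ff6fb158fa01c56599"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

Assuming the archive was saved in the current directory, `matches_expected("theta-0.51.0.tar.gz")` would return `True` only if the download is byte-for-byte identical to the published file.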

File details

Details for the file theta-0.51.0-py3-none-any.whl.

File metadata

Download URL: theta-0.51.0-py3-none-any.whl
Upload date: Sep 13, 2022
Size: 255.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.7.6

File hashes

Hashes for theta-0.51.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8350f856004ea47396114846ac8cf3bc59db3b16685969190c2ac828be697969`
MD5	`b85ec988eb626f924c36d7be84398a2a`
BLAKE2b-256	`2d2462a8d66543c57815712a0845fda6be2b333b9268952985843cdd48d98611`
