
A Chinese relation extraction data utility toolkit based on the CasRel model

Project description

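CasRel Data Processing Toolkit

This is a Chinese relation extraction data processing toolkit based on the CasRel model, with support for pretrained BERT models.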
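CasRel treats relation extraction as tagging. It first marks the start and end characters of every subject. Then, for each subject, it marks the start and end characters of that subject's objects separately for every relation. The sketch below shows what this kind of label construction can look like. It is an illustration only, not this package's implementation. The function name `casrel_tags`, the `(subject, relation, object)` triple format, and the `rel2id` mapping are all assumptions. Spans are located with `str.find`, so only the first occurrence of each entity is tagged.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]
Matrix = List[List[int]]


def zeros(rows: int, cols: int) -> Matrix:
    """Return a rows x cols matrix of zeros."""
    return [[0] * cols for _ in range(rows)]


def casrel_tags(
    text: str,
    triples: List[Tuple[str, str, str]],
    rel2id: Dict[str, int],
) -> Tuple[List[int], List[int], Dict[Span, Tuple[Matrix, Matrix]]]:
    """Hypothetical sketch of CasRel-style, character-level pointer labels."""
    n, num_rel = len(text), len(rel2id)
    sub_heads, sub_tails = [0] * n, [0] * n
    # For each subject span: object head/tail labels of shape [n][num_rel].
    obj_labels: Dict[Span, Tuple[Matrix, Matrix]] = {}
    for subject, relation, obj in triples:
        if not subject or not obj:
            continue  # empty strings cannot form a span
        s, o = text.find(subject), text.find(obj)  # first occurrence only
        if s == -1 or o == -1:
            continue  # skip triples whose spans are not found in the text
        span = (s, s + len(subject) - 1)
        sub_heads[span[0]] = 1
        sub_tails[span[1]] = 1
        if span not in obj_labels:
            obj_labels[span] = (zeros(n, num_rel), zeros(n, num_rel))
        obj_heads, obj_tails = obj_labels[span]
        r = rel2id[relation]
        obj_heads[o][r] = 1
        obj_tails[o + len(obj) - 1][r] = 1
    return sub_heads, sub_tails, obj_labels


# "Jay Chou was born in Taiwan": subject at chars 0-2, object at chars 6-7.
heads, tails, objs = casrel_tags(
    "周杰伦出生于台湾", [("周杰伦", "birthplace", "台湾")], {"birthplace": 0}
)
```

Keying the object labels by subject span mirrors CasRel's cascade: object tags are only meaningful once a specific subject has been chosen.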

Supported BERT models

  • bert-base-chinese: the default Chinese BERT base model. It suits general-purpose Chinese tasks and has a moderate parameter count (bert_dim=768).

Installation

  • pip install casrel_datautils_lj

Usage example

The following example shows how to use casrel_datautils to load data and process a single sample:

from casrel_datautils.Base_Conf import BaseConfig
from casrel_datautils.data_loader import get_dataloader
from casrel_datautils.process import single_sample_process

# Configure the base parameters
baseconf = BaseConfig(
    bert_path=r"path/to/bert-base-chinese",  # local BERT model directory
    train_data=r"path/to/train.json",        # local training data
    test_data=r"path/to/test.json",          # local test data
    rel_data=r"path/to/relation.json",       # local relation data
    batch_size=2
)

# Build the data loaders
dataloaders = get_dataloader(baseconf)

# Process a single sample
sample = {"text": "这是一个测试句子"}  # "This is a test sentence"
input_tensor, mask_tensor = single_sample_process(baseconf, sample)
print(input_tensor.shape)
print(mask_tensor.shape)
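In the example, batch_size=2 sets how many samples a loader groups into one batch. The sketch below shows one common way a loader can pad a batch of variable-length token-ID lists to the longest sample in that batch and build matching masks. The `batches` helper, the padding ID 0, and the pad-to-longest strategy are assumptions. They do not describe what get_dataloader actually does.

```python
from typing import Iterator, List, Tuple


def batches(
    samples: List[List[int]], batch_size: int, pad_id: int = 0
) -> Iterator[Tuple[List[List[int]], List[List[int]]]]:
    """Hypothetical sketch: group token-ID lists into padded batches with masks."""
    for start in range(0, len(samples), batch_size):
        chunk = samples[start:start + batch_size]
        width = max(len(s) for s in chunk)  # pad to the longest sample in this batch
        ids = [s + [pad_id] * (width - len(s)) for s in chunk]
        # 1 marks a real token, 0 marks padding
        masks = [[1] * len(s) + [0] * (width - len(s)) for s in chunk]
        yield ids, masks


out = list(batches([[101, 1, 102], [101, 1, 2, 102], [101, 102]], batch_size=2))
```

With three samples and batch_size=2, the last batch holds a single sample. Padding per batch rather than to a global maximum keeps short batches cheap.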

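Description

  • BaseConfig: sets parameters such as the BERT model path, the data paths, and the batch size.
  • get_dataloader: returns the data loaders for training, validation, and testing.
  • single_sample_process: processes a single text sample and returns the input tensor and the mask tensor.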
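The description only says that single_sample_process returns an input tensor and a mask tensor. For a BERT-style encoder, these are usually token IDs wrapped in [CLS]/[SEP] and an attention mask with 1 for real tokens and 0 for padding. The sketch below illustrates that idea with plain lists and a toy vocabulary. `encode_single_sample`, the toy IDs, and the fixed `max_len` padding are hypothetical and do not reflect the package's tokenizer or tensor shapes. Splitting text into single characters matches how BERT's tokenizer separates CJK characters, but the sketch is otherwise simplified.

```python
from typing import Dict, List, Tuple


def encode_single_sample(
    text: str, vocab: Dict[str, int], max_len: int = 16
) -> Tuple[List[int], List[int]]:
    """Hypothetical sketch: character-level BERT-style input IDs plus attention mask."""
    unk_id = vocab["[UNK]"]
    # Truncate so that [CLS] and [SEP] still fit within max_len.
    tokens = ["[CLS]"] + list(text)[: max_len - 2] + ["[SEP]"]
    input_ids = [vocab.get(tok, unk_id) for tok in tokens]
    mask = [1] * len(input_ids)  # 1 marks a real token
    pad = max_len - len(input_ids)
    input_ids += [vocab["[PAD]"]] * pad
    mask += [0] * pad  # 0 marks padding
    return input_ids, mask


# Toy vocabulary (hypothetical IDs); "句" is missing, so it maps to [UNK].
toy_vocab = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102, "测": 1, "试": 2}
ids, mask = encode_single_sample("测试句", toy_vocab, max_len=8)
```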

Notes

  • Make sure the paths to the data files (e.g. train.json, test.json, relation.json) are correct.
  • Choose a BERT model that fits your task.
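A quick existence check before building BaseConfig catches mistyped paths early. `missing_data_files` is a hypothetical helper that uses only pathlib. Its dictionary keys simply mirror the BaseConfig argument names from the usage example.

```python
from pathlib import Path
from typing import Dict, List


def missing_data_files(paths: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return the config keys whose path is not an existing file."""
    return [name for name, p in paths.items() if not Path(p).is_file()]
```

For example, raise an error before calling BaseConfig if `missing_data_files({"train_data": ..., "test_data": ..., "rel_data": ...})` returns a non-empty list.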



Download files


Source Distribution

casrel_datautils_lj-22.1.5.tar.gz (1.3 MB)

Uploaded Source

Built Distribution


casrel_datautils_lj-22.1.5-py3-none-any.whl (12.7 kB)

Uploaded Python 3

File details

Details for the file casrel_datautils_lj-22.1.5.tar.gz.

File metadata

  • Download URL: casrel_datautils_lj-22.1.5.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for casrel_datautils_lj-22.1.5.tar.gz
Algorithm Hash digest
SHA256 f82c7a02b1eaad4f123d520f0b758a78fa5cc1da27ff254c0b23cb96b6c2fae2
MD5 1b7c8751836d0ac90290e902afa7fd0b
BLAKE2b-256 5d17777c586a16dff944fb5a5668706b1825e88110fb2529b796442a5a8a5ca7


File details

Details for the file casrel_datautils_lj-22.1.5-py3-none-any.whl.

File metadata

File hashes

Hashes for casrel_datautils_lj-22.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 d5e7a89114fce3e54c431664190daa4ae0b816433ecd9602b526bed1691804f7
MD5 5a2a5b754018fc4bfaf0f0e716d8acde
BLAKE2b-256 29887a4bcb2dd4770730fa3cdcddc9a7b7a50b8313b1141ef9e35cbb8af331a8

