
Load fully-typed information extraction data in a single line.

Project description

Information Extraction Datasets

This package takes care of all of the tedium of loading various information extraction datasets, providing the data as fully validated and typed Pydantic objects.
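The "fully validated and typed" claim can be illustrated with a minimal sketch. The model and field names below are hypothetical and do not match the package's actual per-dataset schemas; they only show the kind of Pydantic objects the loaders return.

```python
from pydantic import BaseModel, ValidationError

# Hypothetical shapes, for illustration only; the real ie_datasets
# models differ from dataset to dataset.
class Entity(BaseModel):
    id: str
    type: str
    start: int  # character offset of the mention in `text`
    end: int

class Unit(BaseModel):
    text: str
    entities: list[Entity]

# Valid data parses into typed objects with attribute access...
unit = Unit.model_validate({
    "text": "Aspirin inhibits COX-1.",
    "entities": [
        {"id": "T1", "type": "Chemical", "start": 0, "end": 7},
    ],
})
assert unit.entities[0].type == "Chemical"

# ...while malformed data fails loudly at load time.
try:
    Unit.model_validate({"text": "missing the entities key"})
except ValidationError:
    print("rejected")
```

Because every loader returns models along these lines, malformed upstream records surface as a ValidationError when the data is loaded, rather than as silent KeyErrors later in a pipeline.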

Datasets

BioRED

Example
from ie_datasets import BioRED
BioRED.load_units(BioRED.Split.TRAIN)

ChemProt

Example
from ie_datasets import ChemProt
ChemProt.load_units(ChemProt.Split.TRAIN)

CrossRE

Example
from ie_datasets import CrossRE
CrossRE.load_units(CrossRE.Split.TRAIN, domain=CrossRE.Domain.AI)

CUAD

Example
from ie_datasets import CUAD
CUAD.load_units()

DEFT

Example
from ie_datasets import DEFT
DEFT.load_units(DEFT.Split.TRAIN, category=DEFT.Category.BIOLOGY)

NOTE: DEFT's data files contain a large number of errata. For now, we drop the erroneous records instead of fixing them, which means we load a subset of DEFT rather than the full dataset.

DocRED

Example
from ie_datasets import DocRED
DocRED.load_schema()
DocRED.load_units(DocRED.Split.TRAIN_ANNOTATED)

NOTE: DocRED has been superseded by Re-DocRED (listed below).

HyperRED

Example
from ie_datasets import HyperRED
HyperRED.load_units(HyperRED.Split.TRAIN)

KnowledgeNet

Example
from ie_datasets import KnowledgeNet
KnowledgeNet.load_units(KnowledgeNet.Split.TRAIN)

NOTE: The test split of KnowledgeNet is unlabelled.

Re-DocRED

Example
from ie_datasets import ReDocRED
ReDocRED.load_schema()
ReDocRED.load_units(ReDocRED.Split.TRAIN)

SciERC

Example
from ie_datasets import SciERC
SciERC.load_units(SciERC.Split.TRAIN)

SciREX

Example
from ie_datasets import SciREX
SciREX.load_units(SciREX.Split.TRAIN)

SoMeSci

Example
from ie_datasets import SoMeSci
SoMeSci.load_schema()
SoMeSci.load_units(SoMeSci.Split.TRAIN, group=SoMeSci.Group.CREATION_SENTENCES)

TPLinker/NYT

Example
from ie_datasets import TPLinkerNYT
TPLinkerNYT.load_schema()
TPLinkerNYT.load_units(TPLinkerNYT.Split.TRAIN)

TPLinker/WebNLG

Example
from ie_datasets import TPLinkerWebNLG
TPLinkerWebNLG.load_schema()
TPLinkerWebNLG.load_units(TPLinkerWebNLG.Split.TRAIN)

WikiEvents

Example
from ie_datasets import WikiEvents
WikiEvents.load_ontology()
WikiEvents.load_units(WikiEvents.Split.TRAIN)
