Load fully-typed information extraction data in a single line.

Information Extraction Datasets

This package takes care of the tedium of loading various information extraction datasets, providing the data as fully validated, typed Pydantic objects.
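As a sketch of what "fully validated and typed" means in practice, a loaded unit behaves like any Pydantic model: raw data is checked against a schema on load, and downstream code gets attribute access instead of raw dicts. The `Entity` and `Unit` classes below are hypothetical stand-ins for illustration, not the package's actual types:

```python
from pydantic import BaseModel

# Hypothetical stand-ins for the package's unit types; the real
# class and field names in ie_datasets may differ.
class Entity(BaseModel):
    start: int
    end: int
    label: str

class Unit(BaseModel):
    text: str
    entities: list[Entity]

# Pydantic validates the raw data up front, so field names and
# types are guaranteed from then on.
unit = Unit.model_validate({
    "text": "Aspirin inhibits COX-1.",
    "entities": [{"start": 0, "end": 7, "label": "Chemical"}],
})
```

The `load_units` calls below return objects in this style, so attribute access and IDE completion work out of the box.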

Datasets

BioRED

Example
from ie_datasets import BioRED
BioRED.load_units(BioRED.Split.TRAIN)

ChemProt

Example
from ie_datasets import ChemProt
ChemProt.load_units(ChemProt.Split.TRAIN)

CrossRE

Example
from ie_datasets import CrossRE
CrossRE.load_units(CrossRE.Split.TRAIN, domain=CrossRE.Domain.AI)

CUAD

Example
from ie_datasets import CUAD
CUAD.load_units()

DEFT

Example
from ie_datasets import DEFT
DEFT.load_units(DEFT.Split.TRAIN, category=DEFT.Category.BIOLOGY)

NOTE: DEFT's data files contain a large number of errors. For now, we drop the erroneous entries instead of fixing them, which means we load a subset of DEFT, not the full dataset.

DocRED

Example
from ie_datasets import DocRED
DocRED.load_schema()
DocRED.load_units(DocRED.Split.TRAIN_ANNOTATED)

NOTE: DocRED has been superseded by Re-DocRED.

HyperRED

Example
from ie_datasets import HyperRED
HyperRED.load_units(HyperRED.Split.TRAIN)

KnowledgeNet

Example
from ie_datasets import KnowledgeNet
KnowledgeNet.load_units(KnowledgeNet.Split.TRAIN)

NOTE: The test split of KnowledgeNet is unlabelled.

Re-DocRED

Example
from ie_datasets import ReDocRED
ReDocRED.load_schema()
ReDocRED.load_units(ReDocRED.Split.TRAIN)

SciERC

Example
from ie_datasets import SciERC
SciERC.load_units(SciERC.Split.TRAIN)

SciREX

Example
from ie_datasets import SciREX
SciREX.load_units(SciREX.Split.TRAIN)

SoMeSci

Example
from ie_datasets import SoMeSci
SoMeSci.load_schema()
SoMeSci.load_units(SoMeSci.Split.TRAIN, group=SoMeSci.Group.CREATION_SENTENCES)

TPLinker/NYT

Example
from ie_datasets import TPLinkerNYT
TPLinkerNYT.load_schema()
TPLinkerNYT.load_units(TPLinkerNYT.Split.TRAIN)

TPLinker/WebNLG

Example
from ie_datasets import TPLinkerWebNLG
TPLinkerWebNLG.load_schema()
TPLinkerWebNLG.load_units(TPLinkerWebNLG.Split.TRAIN)

WikiEvents

Example
from ie_datasets import WikiEvents
WikiEvents.load_ontology()
WikiEvents.load_units(WikiEvents.Split.TRAIN)
