Load fully-typed information extraction data in a single line.

Project description

Information Extraction Datasets

This package handles the tedium of loading various information extraction datasets, returning the data as fully validated, typed Pydantic objects.
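The payoff of "fully validated and typed" objects is that span offsets, labels, and structure are checked at load time rather than failing deep inside your pipeline. The sketch below illustrates that access pattern with a plain dataclass stand-in; the real package returns Pydantic models whose field names vary by dataset, so `Entity`, `Unit`, and their attributes here are illustrative assumptions, not the package's actual API.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Entity:
    # Character span and label; validation runs at construction time,
    # mimicking what Pydantic models give you automatically.
    start: int
    end: int
    label: str

    def __post_init__(self):
        if self.start < 0 or self.end < self.start:
            raise ValueError(f"invalid span ({self.start}, {self.end})")

@dataclass(frozen=True)
class Unit:
    text: str
    entities: List[Entity]

# A toy "unit" resembling a biomedical IE example.
unit = Unit(
    text="Aspirin inhibits COX-1.",
    entities=[Entity(0, 7, "Chemical"), Entity(17, 22, "Gene")],
)
for e in unit.entities:
    print(e.label, unit.text[e.start:e.end])
```

Because every object is typed, editors and type checkers can autocomplete and verify field access across all the datasets below.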

Datasets

BioRED

Example
from ie_datasets import BioRED
BioRED.load_units(BioRED.Split.TRAIN)

ChemProt

Example
from ie_datasets import ChemProt
ChemProt.load_units(ChemProt.Split.TRAIN)

CrossRE

Example
from ie_datasets import CrossRE
CrossRE.load_units(CrossRE.Split.TRAIN, domain=CrossRE.Domain.AI)

CUAD

Example
from ie_datasets import CUAD
CUAD.load_units()

DEFT

Example
from ie_datasets import DEFT
DEFT.load_units(DEFT.Split.TRAIN, category=DEFT.Category.BIOLOGY)

NOTE: DEFT's data files contain a large number of errors. For now, we drop the erroneous records instead of fixing them, so we load a subset of DEFT rather than the full dataset.
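The drop-on-error policy described above can be sketched as a simple filter: each raw record is run through validation, and records that fail are skipped rather than repaired. The function and record shape below are illustrative assumptions, not the package's actual internals.

```python
def load_valid(records, validate):
    """Keep only records that pass validation; drop the rest."""
    valid = []
    for record in records:
        try:
            valid.append(validate(record))
        except ValueError:
            continue  # drop the erroneous record instead of fixing it
    return valid

def validate(record):
    # Toy check standing in for full schema validation.
    if record["end"] <= record["start"]:
        raise ValueError("inverted span")
    return record

raw = [
    {"start": 0, "end": 4},
    {"start": 9, "end": 3},  # erroneous: will be dropped
    {"start": 5, "end": 8},
]
clean = load_valid(raw, validate)
print(len(clean))  # → 2
```

The trade-off is silent data loss: the loaded split is internally consistent, but smaller than the published dataset.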

DocRED

Example
from ie_datasets import DocRED
DocRED.load_schema()
DocRED.load_units(DocRED.Split.TRAIN_ANNOTATED)

NOTE: DocRED has been superseded by Re-DocRED.

HyperRED

Example
from ie_datasets import HyperRED
HyperRED.load_units(HyperRED.Split.TRAIN)

KnowledgeNet

Example
from ie_datasets import KnowledgeNet
KnowledgeNet.load_units(KnowledgeNet.Split.TRAIN)

NOTE: The test split of KnowledgeNet is unlabelled.

Re-DocRED

Example
from ie_datasets import ReDocRED
ReDocRED.load_schema()
ReDocRED.load_units(ReDocRED.Split.TRAIN)

SciERC

Example
from ie_datasets import SciERC
SciERC.load_units(SciERC.Split.TRAIN)

SciREX

Example
from ie_datasets import SciREX
SciREX.load_units(SciREX.Split.TRAIN)

SoMeSci

Example
from ie_datasets import SoMeSci
SoMeSci.load_schema()
SoMeSci.load_units(SoMeSci.Split.TRAIN, group=SoMeSci.Group.CREATION_SENTENCES)

TPLinker/NYT

Example
from ie_datasets import TPLinkerNYT
TPLinkerNYT.load_schema()
TPLinkerNYT.load_units(TPLinkerNYT.Split.TRAIN)

TPLinker/WebNLG

Example
from ie_datasets import TPLinkerWebNLG
TPLinkerWebNLG.load_schema()
TPLinkerWebNLG.load_units(TPLinkerWebNLG.Split.TRAIN)

WikiEvents

Example
from ie_datasets import WikiEvents
WikiEvents.load_ontology()
WikiEvents.load_units(WikiEvents.Split.TRAIN)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ie_datasets-0.0.8.tar.gz (553.0 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ie_datasets-0.0.8-py3-none-any.whl (77.1 kB)

Uploaded Python 3

File details

Details for the file ie_datasets-0.0.8.tar.gz.

File metadata

  • Download URL: ie_datasets-0.0.8.tar.gz
  • Upload date:
  • Size: 553.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for ie_datasets-0.0.8.tar.gz:

  • SHA256: d18d52eea2b7456a9f87c28beab7c2b0f28e64370f79baf1cffd265d8da68b8f
  • MD5: 7864899418b023d732d2bec8263fa327
  • BLAKE2b-256: 05c05598d2545914437414463997d2bb29b90bd42a95b7242ed77b2ba59735c1

See more details on using hashes here.
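The hashes above let you verify that a downloaded distribution was not corrupted or tampered with. A minimal check with Python's standard `hashlib` module looks like this; the `sha256_hex` helper is just for illustration, and for a real file you would read its bytes in binary mode and compare against the SHA256 listed on this page.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

# Demonstration with the standard "abc" test vector; for a downloaded
# wheel or sdist, use sha256_hex(open(path, "rb").read()) instead.
expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
assert sha256_hex(b"abc") == expected
print("hash matches")
```

Alternatively, `pip` can enforce hashes itself when installing from a requirements file with `--require-hashes`.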

File details

Details for the file ie_datasets-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: ie_datasets-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 77.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for ie_datasets-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 7963b7656e5aa8af2e23eca87815a91d6c1085a8db6d86cc6cfa01f9bb8d8868
MD5 4fdd476b846a4de93c5df9472509ff52
BLAKE2b-256 10d6fa8776b20c887d8bc8d97784b18ebadc7a614c2c9b32671638eebc7d2a5b

