Load fully-typed information extraction data in a single line.

Information Extraction Datasets

This package handles the tedium of loading various information extraction datasets, providing the data as fully validated and typed Pydantic objects.
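Every loader follows the same pattern: a `load_units(...)` call yields unit objects whose fields are validated and typed. As a rough sketch of what consuming such units looks like (the dataclasses and field names below are illustrative stand-ins, not the package's real Pydantic models, whose schemas differ per dataset):

```python
from dataclasses import dataclass
from typing import Sequence

# Illustrative stand-in for a dataset unit; the real package defines
# Pydantic models whose fields vary from dataset to dataset.
@dataclass(frozen=True)
class Entity:
    text: str
    label: str

@dataclass(frozen=True)
class Unit:
    doc_id: str
    text: str
    entities: Sequence[Entity]

def load_units(split: str) -> list[Unit]:
    # The real loaders download, parse, and validate the dataset files;
    # here we return a hard-coded unit purely to show the access pattern.
    assert split in ("train", "dev", "test")
    return [
        Unit(
            doc_id="example-1",
            text="Aspirin inhibits COX-1.",
            entities=[Entity(text="Aspirin", label="Chemical")],
        )
    ]

for unit in load_units("train"):
    print(unit.doc_id, [e.label for e in unit.entities])
```

Because the units are plain typed objects, downstream code gets attribute access and static type checking instead of dict lookups on raw JSON.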

Datasets

BioRED

Example
from ie_datasets import BioRED
BioRED.load_units("Train")
BioRED.load_units("Dev")
BioRED.load_units("Test")

ChemProt

Example
from ie_datasets import ChemProt
ChemProt.load_units("train")
ChemProt.load_units("validation")
ChemProt.load_units("test")

CrossRE

Example
from ie_datasets import CrossRE
for domain in ("ai", "literature", "music", "news", "politics", "science"):
    CrossRE.load_units(split="train", domain=domain)
    CrossRE.load_units(split="dev", domain=domain)
    CrossRE.load_units(split="test", domain=domain)

CUAD

Example
from ie_datasets import CUAD
CUAD.load_units()

DEFT

Example
from ie_datasets import DEFT
for category in ("biology", "history", "physics", "psychology", "economic", "sociology", "government"):
    DEFT.load_units(split="train", category=category)
    DEFT.load_units(split="dev", category=category)
    DEFT.load_units(split="test", category=category)

NOTE: DEFT's data files contain numerous errata. For now, we drop the erroneous entries rather than fixing them.

DocRED

Example
from ie_datasets import DocRED
DocRED.load_schema()
DocRED.load_units("train_annotated")
DocRED.load_units("train_distant")
DocRED.load_units("validation")
DocRED.load_units("test")

NOTE: DocRED has been superseded by Re-DocRED.

HyperRED

Example
from ie_datasets import HyperRED
HyperRED.load_units("train")
HyperRED.load_units("validation")
HyperRED.load_units("test")

KnowledgeNet

Example
from ie_datasets import KnowledgeNet
KnowledgeNet.load_units("train")
KnowledgeNet.load_units("test-no-facts") # unlabelled

SciERC

Example
from ie_datasets import SciERC
SciERC.load_units("train")
SciERC.load_units("dev")
SciERC.load_units("test")

SoMeSci

Example
from ie_datasets import SoMeSci
SoMeSci.load_schema()
for group in ("Creation_sentences", "PLoS_methods", "PLoS_sentences", "Pubmed_fulltext"):
    SoMeSci.load_units(group=group, split="train")
    SoMeSci.load_units(group=group, split="devel")
    SoMeSci.load_units(group=group, split="test")

Re-DocRED

Example
from ie_datasets import ReDocRED
ReDocRED.load_schema()
ReDocRED.load_units("train")
ReDocRED.load_units("validation")
ReDocRED.load_units("test")

TPLinker/NYT

Example
from ie_datasets import TPLinkerNYT
TPLinkerNYT.load_schema()
TPLinkerNYT.load_units("train")
TPLinkerNYT.load_units("valid")
TPLinkerNYT.load_units("test")

TPLinker/WebNLG

Example
from ie_datasets import TPLinkerWebNLG
TPLinkerWebNLG.load_schema()
TPLinkerWebNLG.load_units("train")
TPLinkerWebNLG.load_units("valid")
TPLinkerWebNLG.load_units("test")

WikiEvents

Example
from ie_datasets import WikiEvents
WikiEvents.load_ontology()
WikiEvents.load_units("train")
WikiEvents.load_units("dev")
WikiEvents.load_units("test")
