Package for working with the OAPapers dataset.

OAPapersLoader

This repository contains Python loaders for the OAPapers corpus and its derived datasets. It accompanies the repository https://github.com/KNOT-FIT-BUT/OAPapers and offers a more lightweight way to load the corpus and derived datasets, without that repository's extensive dependencies.

Install

pip install oapapersloader

Usage

An example of loading the OARelatedWork dataset together with its references:

from oapapersloader.datasets import OARelatedWork, OADataset

# Open the target papers and the reference corpus, each with its index file.
with OARelatedWork("train.jsonl", "train.jsonl.index") as dataset, \
        OADataset("references.jsonl", "references.jsonl.index") as references:
    d = dataset[0]  # first target paper
    print("Document:", d.title)
    # Look up the first cited paper by its id in the reference corpus.
    print("Cited paper:", references.get_by_id(d.citations[0]).title)

OARelatedWork loads the target papers together with their related work sections. OADataset loads the dataset of all references, which can be used to look up cited papers.
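
The example above pairs each .jsonl file with a .index file. The format of that index is not described here, so the following is only a stand-alone sketch of the general idea behind such a pairing: byte-offset indexing, which allows a single record to be read without scanning the whole file. It uses the standard library only and is not the package's implementation. The helper names build_offset_index and read_record, the file toy.jsonl, and the record fields are all hypothetical and chosen for illustration.

```python
import json
import os
import tempfile


def build_offset_index(jsonl_path):
    """Return a list with the byte offset at which each line (record) starts."""
    offsets = []
    with open(jsonl_path, "rb") as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if not line:
                break
            offsets.append(pos)
    return offsets


def read_record(jsonl_path, offsets, i):
    """Seek straight to record i and parse only that one line."""
    with open(jsonl_path, "rb") as f:
        f.seek(offsets[i])
        return json.loads(f.readline())


# Tiny made-up corpus, for illustration only (not real OAPapers data).
records = [
    {"id": 10, "title": "Paper A", "citations": [20]},
    {"id": 20, "title": "Paper B", "citations": []},
]
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "toy.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

offsets = build_offset_index(path)

# Positional access, analogous in spirit to dataset[0].
first = read_record(path, offsets, 0)
second = read_record(path, offsets, 1)

# Map record id -> position so a cited paper can be fetched by id,
# analogous in spirit to references.get_by_id(...).
id_to_pos = {read_record(path, offsets, i)["id"]: i for i in range(len(offsets))}
cited = read_record(path, offsets, id_to_pos[first["citations"][0]])

print("Document:", first["title"])
print("Cited paper:", cited["title"])
```

In this sketch the index is rebuilt in memory on every run. A loader that ships a separate index file presumably stores such lookup data on disk so that large corpora do not need to be rescanned, but that is an assumption about the design rather than something stated above.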

Download files

Source Distribution

oapapersloader-1.0.1.tar.gz (18.4 kB)

Built Distribution

oapapersloader-1.0.1-py3-none-any.whl (16.0 kB)
