A project that automates and transfers chemical data extraction using span categorization and relation extraction models.

Project description

`chemrel`

ChemREL Command Line Interface (CLI).

Automate and transfer chemical data extraction using span categorization and relation extraction models.

To initialize the assets required by the CLI, run the following command.

$ chemrel init

Usage:

$ chemrel [OPTIONS] COMMAND [ARGS]...

Options:

--install-completion: Install completion for the current shell.
--show-completion: Show completion for the current shell, to copy it or customize the installation.
--help: Show this message and exit.

Commands:

aux: Run one of a number of auxiliary data...
clean: Removes intermediate files to start data...
init: Initializes files required by package at...
predict: Predicts the spans and/or relations in a...
rel: Configure and/or train a relation...
span: Configure and/or train a span...

`chemrel aux`

Run one of a number of auxiliary data processing commands.

Usage:

$ chemrel aux [OPTIONS] COMMAND [ARGS]...

Options:

--help: Show this message and exit.

Commands:

extract-elsevier-paper: Converts Elsevier paper with specified DOI...
extract-paper: Converts paper PDF at specified path into...

`chemrel aux extract-elsevier-paper`

Converts the Elsevier paper with the specified DOI code into a sequence of JSONL files, one per text chunk, where each JSONL line is tokenized by sentence. For example, if the provided path is dir/file and the Paper text contains two chunks, the files dir/file_1.jsonl and dir/file_2.jsonl are generated; if the Paper text contains only one chunk, dir/file.jsonl is generated instead.

Usage:

$ chemrel aux extract-elsevier-paper [OPTIONS] DOI_CODE API_KEY JSONL_PATH

Arguments:

DOI_CODE: DOI code of paper, not in URL form [required]
API_KEY: Elsevier API key [required]
JSONL_PATH: Filepath to save JSONL files to, ignores filename extension [required]

Options:

--char-limit INTEGER: Character limit of each text chunk in generated Paper object
--help: Show this message and exit.
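
The output naming rule described for this command (and for `chemrel aux extract-paper`) can be sketched in a few lines of Python. This is only an illustration of the rule as documented, not chemrel's own code: the helper name `chunk_paths` is hypothetical, and it assumes that "ignores filename extension" means the final suffix of JSONL_PATH is stripped before the chunk names are built.

```python
from pathlib import Path


def chunk_paths(jsonl_path: str, n_chunks: int) -> list[Path]:
    """Return the output files implied by the documented naming rule.

    Hypothetical helper: it mirrors the documented behaviour only.
    """
    if n_chunks < 1:
        raise ValueError("a paper must contain at least one chunk")
    # Assumption: the filename extension of the given path is dropped.
    base = Path(jsonl_path).with_suffix("")
    if n_chunks == 1:
        # A single chunk gets no numeric suffix.
        return [base.with_name(f"{base.name}.jsonl")]
    # Several chunks are numbered from 1.
    return [base.with_name(f"{base.name}_{i}.jsonl") for i in range(1, n_chunks + 1)]


print([p.as_posix() for p in chunk_paths("dir/file", 2)])
# ['dir/file_1.jsonl', 'dir/file_2.jsonl']
print([p.as_posix() for p in chunk_paths("dir/file.txt", 1)])
# ['dir/file.jsonl']
```

A helper like this is handy when a script needs to know which files to pick up after the command has run.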

`chemrel aux extract-paper`

Converts the paper PDF at the specified path into a sequence of JSONL files, one per text chunk, where each JSONL line is tokenized by sentence. For example, if the provided path is dir/file and the Paper text contains two chunks, the files dir/file_1.jsonl and dir/file_2.jsonl are generated; if the Paper text contains only one chunk, dir/file.jsonl is generated instead.

Usage:

$ chemrel aux extract-paper [OPTIONS] PAPER_PATH JSONL_PATH

Arguments:

PAPER_PATH: File path of paper PDF [required]
JSONL_PATH: Filepath to save JSONL files to, ignores filename extension [required]

Options:

--char-limit INTEGER: Character limit of each text chunk in generated Paper object
--help: Show this message and exit.
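
How `--char-limit` splits a paper into chunks is not documented here. As a rough mental model only, the sketch below packs sentences greedily into chunks whose total length stays within a character limit. The function name `pack_sentences` and the greedy strategy are assumptions made for illustration; chemrel's actual chunking may differ.

```python
def pack_sentences(sentences: list[str], char_limit: int) -> list[list[str]]:
    """Greedily group sentences so each chunk stays within char_limit characters.

    Assumed strategy for illustration; not taken from chemrel's source.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for sentence in sentences:
        # Close the current chunk when this sentence would push it over the limit.
        if current and size + len(sentence) > char_limit:
            chunks.append(current)
            current, size = [], 0
        # A sentence longer than the limit still gets a chunk of its own.
        current.append(sentence)
        size += len(sentence)
    if current:
        chunks.append(current)
    return chunks


print(pack_sentences(["aaaa", "bbbb", "cc"], char_limit=8))
# [['aaaa', 'bbbb'], ['cc']]
```

Under this model, a smaller limit produces more output files, each holding fewer sentence lines.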

`chemrel clean`

Removes intermediate files to start data preparation and training from a clean slate.

Usage:

$ chemrel clean [OPTIONS]

Options:

--help: Show this message and exit.

`chemrel init`

Initializes files required by package at given path.

Usage:

$ chemrel init [OPTIONS] [PATH]

Arguments:

[PATH]: File path in which to initialize required files [default: ./]

Options:

--help: Show this message and exit.

`chemrel predict`

Predicts the spans and/or relations in a given text using the given models.

Usage:

$ chemrel predict [OPTIONS] COMMAND [ARGS]...

Options:

--help: Show this message and exit.

Commands:

rel: Predicts spans and the relations between...
span: Predicts spans contained in given text and...

`chemrel predict rel`

Predicts the spans contained in the given text and the relations between them, as determined by the given models, and prints them.

Usage:

$ chemrel predict rel [OPTIONS] SC_MODEL_PATH REL_MODEL_PATH TEXT

Arguments:

SC_MODEL_PATH: File path of span categorization model to be used [required]
REL_MODEL_PATH: File path of relation extraction model to be used [required]
TEXT: Text content to predict spans within [required]

Options:

--help: Show this message and exit.

`chemrel predict span`

Predicts spans contained in given text and prints them.

Usage:

$ chemrel predict span [OPTIONS] SC_MODEL_PATH TEXT

Arguments:

SC_MODEL_PATH: File path of span categorization model to be used [required]
TEXT: Text content to predict spans within [required]

Options:

--help: Show this message and exit.
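
Both `predict` commands take TEXT as a single positional argument, so text containing spaces must reach the CLI as one argument. When calling the CLI from Python, passing an argument list avoids shell quoting problems. The sketch below only builds and prints the commands. The helper names, the example sentence, and the model paths (borrowed from the defaults of the `test` commands) are illustrative assumptions.

```python
import shlex


def predict_span_argv(sc_model_path: str, text: str) -> list[str]:
    """Argument list for `chemrel predict span SC_MODEL_PATH TEXT`."""
    return ["chemrel", "predict", "span", sc_model_path, text]


def predict_rel_argv(sc_model_path: str, rel_model_path: str, text: str) -> list[str]:
    """Argument list for `chemrel predict rel SC_MODEL_PATH REL_MODEL_PATH TEXT`."""
    return ["chemrel", "predict", "rel", sc_model_path, rel_model_path, text]


# Hypothetical inputs: model paths and sentence are placeholders.
text = "The framework adsorbs CO2 at 298 K."
argv = predict_rel_argv("sctraining/model-best", "reltraining/model-best", text)

# shlex.join shows the equivalent, correctly quoted shell command.
print(shlex.join(argv))

# To actually run it (requires chemrel to be installed and the models to exist):
# import subprocess
# subprocess.run(argv, check=True)
```

Because the text is one list element, no manual escaping of quotes or spaces is needed.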

`chemrel rel`

Configure and/or train a relation extraction model.

Usage:

$ chemrel rel [OPTIONS] COMMAND [ARGS]...

Options:

--help: Show this message and exit.

Commands:

process-data: Parses the gold-standard annotations from...
test: Applies the best relation extraction model...
tl-cpu: Trains the relation extraction (rel) model...
tl-gpu: Trains the relation extraction (rel) model...
train-cpu: Trains the relation extraction (rel) model...
train-gpu: Trains the relation extraction (rel) model...

`chemrel rel process-data`

Parses the gold-standard annotations from the Prodigy annotations.

Usage:

$ chemrel rel process-data [OPTIONS]

Options:

--annotations-file TEXT: File path of Prodigy annotations [default: assets/goldrels.jsonl]
--train-file TEXT: File path of training data corpus [default: reldata/train.spacy]
--dev-file TEXT: File path of dev corpus [default: reldata/dev.spacy]
--test-file TEXT: File path of test data corpus [default: reldata/test.spacy]
--help: Show this message and exit.

`chemrel rel test`

Applies the best relation extraction model to unseen text and measures accuracy at different thresholds.

Usage:

$ chemrel rel test [OPTIONS]

Options:

--trained-model TEXT: File path of trained model to be used [default: reltraining/model-best]
--test-file TEXT: File path of test data corpus [default: reldata/test.spacy]
--help: Show this message and exit.
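
The description does not say which metrics are reported at each threshold. One common reading, assumed here, is that a relation counts as predicted when its score meets the threshold, and precision, recall, and F-score are then computed against the gold relations. The sketch below illustrates that idea with made-up scores. The function `prf_at_threshold` and the data are hypothetical and are not chemrel output.

```python
def prf_at_threshold(scores: dict, gold: set, threshold: float) -> tuple[float, float, float]:
    """Precision, recall and F-score when keeping predictions with score >= threshold."""
    predicted = {item for item, score in scores.items() if score >= threshold}
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Made-up relation scores keyed by (head, tail) pairs, for illustration only.
scores = {
    ("CO2", "uptake"): 0.9,
    ("N2", "uptake"): 0.6,
    ("CO2", "pressure"): 0.3,
}
gold = {("CO2", "uptake"), ("CO2", "pressure")}

for threshold in (0.2, 0.5, 0.95):
    p, r, f = prf_at_threshold(scores, gold, threshold)
    print(f"threshold={threshold:.2f}  P={p:.2f}  R={r:.2f}  F={f:.2f}")
```

Raising the threshold typically trades recall for precision, which is why results at several thresholds are more informative than a single number.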

`chemrel rel tl-cpu`

Trains the relation extraction (rel) model using transfer learning on the CPU and evaluates it on the dev corpus.

Usage:

$ chemrel rel tl-cpu [OPTIONS]

Options:

--tl-tok2vec-config TEXT: File path of config file for Tok2Vec relation extraction model [default: configs/rel_TL_tok2vec.cfg]
--train-file TEXT: File path of training data corpus [default: reldata/train.spacy]
--dev-file TEXT: File path of dev corpus [default: reldata/dev.spacy]
--help: Show this message and exit.

`chemrel rel tl-gpu`

Trains the relation extraction (rel) model with a Transformer using transfer learning on the GPU and evaluates it on the dev corpus.

Usage:

$ chemrel rel tl-gpu [OPTIONS]

Options:

--tl-trf-config TEXT: File path of config file for transformer relation extraction model [default: configs/rel_TL_trf.cfg]
--train-file TEXT: File path of training data corpus [default: reldata/train.spacy]
--dev-file TEXT: File path of dev corpus [default: reldata/dev.spacy]
--gpu-id TEXT: The GPU device identifier to be used [default: 0]
--help: Show this message and exit.

`chemrel rel train-cpu`

Trains the relation extraction (rel) model on the CPU and evaluates it on the dev corpus.

Usage:

$ chemrel rel train-cpu [OPTIONS]

Options:

--tok2vec-config TEXT: File path of config file for Tok2Vec relation extraction model [default: configs/rel_tok2vec.cfg]
--train-file TEXT: File path of training data corpus [default: reldata/train.spacy]
--dev-file TEXT: File path of dev corpus [default: reldata/dev.spacy]
--help: Show this message and exit.

`chemrel rel train-gpu`

Trains the relation extraction (rel) model with a Transformer on the GPU and evaluates it on the dev corpus.

Usage:

$ chemrel rel train-gpu [OPTIONS]

Options:

--trf-config TEXT: File path of config file for transformer relation extraction model [default: configs/rel_trf.cfg]
--train-file TEXT: File path of training data corpus [default: reldata/train.spacy]
--dev-file TEXT: File path of dev corpus [default: reldata/dev.spacy]
--gpu-id TEXT: The GPU device identifier to be used [default: 0]
--help: Show this message and exit.
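
The default paths of the `rel` commands line up: `process-data` writes the corpora under reldata/, the training commands read them, and `test` reads reldata/test.spacy together with reltraining/model-best. A plausible end-to-end run on the CPU can therefore be sketched as the sequence below. The ordering is an assumption drawn from those defaults, the flags are spelled out only for clarity, and where training saves its model is inferred from the `test` default rather than stated. By default the script only prints the commands.

```python
import shlex
import subprocess

# Assumed ordering, derived from the documented default paths.
REL_WORKFLOW = [
    ["chemrel", "init"],
    ["chemrel", "rel", "process-data", "--annotations-file", "assets/goldrels.jsonl"],
    ["chemrel", "rel", "train-cpu",
     "--train-file", "reldata/train.spacy", "--dev-file", "reldata/dev.spacy"],
    ["chemrel", "rel", "test",
     "--trained-model", "reltraining/model-best", "--test-file", "reldata/test.spacy"],
]


def run_workflow(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Return the commands as shell strings; execute them when dry_run is False."""
    lines = []
    for argv in commands:
        lines.append(shlex.join(argv))
        if not dry_run:
            # check=True stops the workflow at the first failing step.
            subprocess.run(argv, check=True)
    return lines


for line in run_workflow(REL_WORKFLOW):
    print(line)
```

Calling `run_workflow(REL_WORKFLOW, dry_run=False)` would execute the steps, which requires chemrel to be installed and the annotations file to be present.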

`chemrel span`

Configure and/or train a span categorization model.

Usage:

$ chemrel span [OPTIONS] COMMAND [ARGS]...

Options:

--help: Show this message and exit.

Commands:

process-data: Instructs to use the Prodigy function...
test: Applies the best span categorization model...
tl-cpu: Trains the span categorization (sc) model...
tl-gpu: Trains the span categorization (sc) model...
train-cpu: Trains the span categorization (sc) model...
train-gpu: Trains the span categorization (sc) model...

`chemrel span process-data`

Instructs the user to use the Prodigy function (data-to-spacy) for data processing.

Usage:

$ chemrel span process-data [OPTIONS]

Options:

--help: Show this message and exit.

`chemrel span test`

Applies the best span categorization model to unseen text and measures accuracy at different thresholds.

Usage:

$ chemrel span test [OPTIONS]

Options:

--trained-model TEXT: File path of trained model to be used [default: sctraining/model-best]
--test-file TEXT: File path of test data corpus [default: scdata/test.spacy]
--gpu-id TEXT: The GPU device identifier to be used
--help: Show this message and exit.

`chemrel span tl-cpu`

Trains the span categorization (sc) model using transfer learning on the CPU and evaluates it on the dev corpus.

Usage:

$ chemrel span tl-cpu [OPTIONS]

Options:

--tl-tok2vec-config TEXT: File path of config file for Tok2Vec span categorization model [default: configs/sc_TL_tok2vec.cfg]
--train-file TEXT: File path of training data corpus [default: scdata/train.spacy]
--dev-file TEXT: File path of dev corpus [default: scdata/dev.spacy]
--help: Show this message and exit.

`chemrel span tl-gpu`

Trains the span categorization (sc) model using transfer learning on the GPU and evaluates it on the dev corpus.

Usage:

$ chemrel span tl-gpu [OPTIONS]

Options:

--tl-trf-config TEXT: File path of config file for transformer span categorization model [default: configs/sc_TL_trf.cfg]
--train-file TEXT: File path of training data corpus [default: scdata/train.spacy]
--dev-file TEXT: File path of dev corpus [default: scdata/dev.spacy]
--gpu-id TEXT: The GPU device identifier to be used [default: 0]
--help: Show this message and exit.

`chemrel span train-cpu`

Trains the span categorization (sc) model on the CPU and evaluates it on the dev corpus.

Usage:

$ chemrel span train-cpu [OPTIONS]

Options:

--tok2vec-config TEXT: File path of config file for Tok2Vec span categorization model [default: configs/sc_tok2vec.cfg]
--train-file TEXT: File path of training data corpus [default: scdata/train.spacy]
--dev-file TEXT: File path of dev corpus [default: scdata/dev.spacy]
--help: Show this message and exit.

`chemrel span train-gpu`

Trains the span categorization (sc) model on the GPU and evaluates it on the dev corpus.

Usage:

$ chemrel span train-gpu [OPTIONS]

Options:

--trf-config TEXT: File path of config file for transformer span categorization model [default: configs/sc_trf.cfg]
--train-file TEXT: File path of training data corpus [default: scdata/train.spacy]
--dev-file TEXT: File path of dev corpus [default: scdata/dev.spacy]
--gpu-id TEXT: The GPU device identifier to be used [default: 0]
--help: Show this message and exit.

Project details

Release history

1.0.2 (this version): Dec 27, 2023
1.0.1: Dec 7, 2023

Download files

Source Distribution

chemrel-1.0.2.tar.gz (15.8 kB)

Uploaded Dec 27, 2023 Source

Built Distribution

chemrel-1.0.2-py3-none-any.whl (20.0 kB)

Uploaded Dec 27, 2023 Python 3

File details

Details for the file chemrel-1.0.2.tar.gz.

File metadata

Download URL: chemrel-1.0.2.tar.gz
Upload date: Dec 27, 2023
Size: 15.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for chemrel-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`116db82fdc6978a72aae3a855a9c36f0253c3c51046187e2efbb4c7b68e0b3ff`
MD5	`40eb6dbe3e4a816b3f78ba79f530a796`
BLAKE2b-256	`45770822f35b16444bf63bd981b9cb4a8be8cebc908f64d6892bb8f3286fac7f`
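
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard library. The sketch below is one way to do it. The local file location is an assumption (the archive is expected in the current directory), and the helper names are illustrative.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest listed above for chemrel-1.0.2.tar.gz.
EXPECTED_SHA256 = "116db82fdc6978a72aae3a855a9c36f0253c3c51046187e2efbb4c7b68e0b3ff"


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of data with an expected hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_hex(data), expected_hex.strip().lower())


# Assumed location of the downloaded file.
archive = Path("chemrel-1.0.2.tar.gz")
if archive.exists():
    ok = matches(archive.read_bytes(), EXPECTED_SHA256)
    print("hash matches" if ok else "HASH MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

Reading the whole file at once is fine for an archive of this size (15.8 kB); larger files are better hashed in blocks.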

File details

Details for the file chemrel-1.0.2-py3-none-any.whl.

File metadata

Download URL: chemrel-1.0.2-py3-none-any.whl
Upload date: Dec 27, 2023
Size: 20.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for chemrel-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`51cf16cbf8a9fd013ad0a8d688717b79bfba469a0ce0e5efd92cb7c96dd783d6`
MD5	`7cdd70136474582241273606d54db774`
BLAKE2b-256	`29817682817a318cc1e24ce8d1a7b69cf4d6f0934d06fa6ef5f1d17dcb1dfa41`
