A project which focuses on automating and transferring chemical data extraction using span categorization and relation extraction models.
chemrel
ChemREL Command Line Interface (CLI).
Automate and transfer chemical data extraction using span categorization and relation extraction models.
To initialize the assets required by the CLI, run the following command.
$ chemrel init
Usage:
$ chemrel [OPTIONS] COMMAND [ARGS]...
Options:
--install-completion
: Install completion for the current shell.
--show-completion
: Show completion for the current shell, to copy it or customize the installation.
--help
: Show this message and exit.
Commands:
aux
: Run one of a number of auxiliary data...
clean
: Removes intermediate files to start data...
init
: Initializes files required by package at...
predict
: Predicts the spans and/or relations in a...
rel
: Configure and/or train a relation...
span
: Configure and/or train a span...
chemrel aux
Run one of a number of auxiliary data processing commands.
Usage:
$ chemrel aux [OPTIONS] COMMAND [ARGS]...
Options:
--help
: Show this message and exit.
Commands:
extract-elsevier-paper
: Converts Elsevier paper with specified DOI...
extract-paper
: Converts paper PDF at specified path into...
chemrel aux extract-elsevier-paper
Converts Elsevier paper with specified DOI code into a sequence of JSONL files, each corresponding to a text chunk, where each JSONL line is tokenized by sentence. Example: if the provided path is dir/file and the Paper text contains two chunks, files dir/file_1.jsonl and dir/file_2.jsonl will be generated; otherwise, if the Paper text contains one chunk, dir/file.jsonl will be generated.
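The naming rule in the example above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the package's own code: the helper name `chunk_output_paths` is hypothetical, and it assumes that "ignores filename extension" means any suffix on the given path is dropped before `.jsonl` is appended.

```python
from pathlib import Path
from typing import List


def chunk_output_paths(jsonl_path: str, n_chunks: int) -> List[str]:
    """Return the JSONL file names expected for a paper split into n_chunks chunks."""
    if n_chunks < 1:
        raise ValueError("a paper must contain at least one chunk")
    # Assumption: the extension of the supplied path is discarded.
    base = Path(jsonl_path).with_suffix("").as_posix()
    if n_chunks == 1:
        # A single chunk gets no numeric suffix.
        return [f"{base}.jsonl"]
    # Several chunks are numbered starting from 1.
    return [f"{base}_{i}.jsonl" for i in range(1, n_chunks + 1)]


print(chunk_output_paths("dir/file", 2))
print(chunk_output_paths("dir/file", 1))
```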
Usage:
$ chemrel aux extract-elsevier-paper [OPTIONS] DOI_CODE API_KEY JSONL_PATH
Arguments:
DOI_CODE
: DOI code of paper, not in URL form [required]
API_KEY
: Elsevier API key [required]
JSONL_PATH
: Filepath to save JSONL files to, ignores filename extension [required]
Options:
--char-limit INTEGER
: Character limit of each text chunk in generated Paper object
--help
: Show this message and exit.
chemrel aux extract-paper
Converts paper PDF at specified path into a sequence of JSONL files, each corresponding to a text chunk, where each JSONL line is tokenized by sentence. Example: if the provided path is dir/file and the Paper text contains two chunks, files dir/file_1.jsonl and dir/file_2.jsonl will be generated; otherwise, if the Paper text contains one chunk, dir/file.jsonl will be generated.
Usage:
$ chemrel aux extract-paper [OPTIONS] PAPER_PATH JSONL_PATH
Arguments:
PAPER_PATH
: File path of paper PDF [required]
JSONL_PATH
: Filepath to save JSONL files to, ignores filename extension [required]
Options:
--char-limit INTEGER
: Character limit of each text chunk in generated Paper object
--help
: Show this message and exit.
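To give a feel for what a character limit per chunk means, here is a minimal sketch that greedily groups sentences into chunks. It is an assumption about one plausible strategy, not a description of how the package actually splits text: the function name `chunk_sentences` is hypothetical, and the real chunking may break at different points.

```python
from typing import List


def chunk_sentences(sentences: List[str], char_limit: int) -> List[List[str]]:
    """Greedily group sentences so each chunk's joined length stays within char_limit.

    A single sentence longer than char_limit is kept whole in its own chunk.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        # One extra character for the space joining it to the previous sentence.
        added = len(sentence) + (1 if current else 0)
        if current and current_len + added > char_limit:
            # The sentence does not fit: close the chunk and start a new one.
            chunks.append(current)
            current, current_len = [], 0
            added = len(sentence)
        current.append(sentence)
        current_len += added
    if current:
        chunks.append(current)
    return chunks


print(chunk_sentences(["Aaa.", "Bbb.", "Ccc."], char_limit=9))
```

Under this sketch, a smaller limit yields more chunks and therefore more numbered JSONL files.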
chemrel clean
Removes intermediate files to start data preparation and training from a clean slate.
Usage:
$ chemrel clean [OPTIONS]
Options:
--help
: Show this message and exit.
chemrel init
Initializes files required by package at given path.
Usage:
$ chemrel init [OPTIONS] [PATH]
Arguments:
[PATH]
: File path in which to initialize required files [default: ./]
Options:
--help
: Show this message and exit.
chemrel predict
Predicts the spans and/or relations in a given text using the given models.
Usage:
$ chemrel predict [OPTIONS] COMMAND [ARGS]...
Options:
--help
: Show this message and exit.
Commands:
rel
: Predicts spans and the relations between...
span
: Predicts spans contained in given text and...
chemrel predict rel
Predicts spans and the relations between them contained in given text determined by the given models and prints them.
Usage:
$ chemrel predict rel [OPTIONS] SC_MODEL_PATH REL_MODEL_PATH TEXT
Arguments:
SC_MODEL_PATH
: File path of span categorization model to be used [required]
REL_MODEL_PATH
: File path of relation extraction model to be used [required]
TEXT
: Text content to predict spans within [required]
Options:
--help
: Show this message and exit.
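When calling the command from a script, the three positional arguments can be passed through `subprocess`. The sketch below follows the usage line above; the wrapper names `predict_rel_argv` and `predict_rel` are hypothetical, and since the format of the printed predictions is not documented here, the output is returned as raw text rather than parsed.

```python
import subprocess
from typing import List


def predict_rel_argv(sc_model_path: str, rel_model_path: str, text: str) -> List[str]:
    """Build the argument vector for `chemrel predict rel`."""
    # Positional order follows the documented usage line:
    # chemrel predict rel [OPTIONS] SC_MODEL_PATH REL_MODEL_PATH TEXT
    return ["chemrel", "predict", "rel", sc_model_path, rel_model_path, text]


def predict_rel(sc_model_path: str, rel_model_path: str, text: str) -> str:
    """Run the command and return whatever it prints (requires chemrel on PATH)."""
    result = subprocess.run(
        predict_rel_argv(sc_model_path, rel_model_path, text),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `predict_rel("sctraining/model-best", "reltraining/model-best", "some sentence from a paper")` would use the default model locations that the test commands refer to, assuming both models have already been trained.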
chemrel predict span
Predicts spans contained in given text and prints them.
Usage:
$ chemrel predict span [OPTIONS] SC_MODEL_PATH TEXT
Arguments:
SC_MODEL_PATH
: File path of span categorization model to be used [required]
TEXT
: Text content to predict spans within [required]
Options:
--help
: Show this message and exit.
chemrel rel
Configure and/or train a relation extraction model.
Usage:
$ chemrel rel [OPTIONS] COMMAND [ARGS]...
Options:
--help
: Show this message and exit.
Commands:
process-data
: Parses the gold-standard annotations from...
test
: Applies the best relation extraction model...
tl-cpu
: Trains the relation extraction (rel) model...
tl-gpu
: Trains the relation extraction (rel) model...
train-cpu
: Trains the relation extraction (rel) model...
train-gpu
: Trains the relation extraction (rel) model...
chemrel rel process-data
Parses the gold-standard annotations from the Prodigy annotations.
Usage:
$ chemrel rel process-data [OPTIONS]
Options:
--annotations-file TEXT
: File path of Prodigy annotations [default: assets/goldrels.jsonl]
--train-file TEXT
: File path of training data corpus [default: reldata/train.spacy]
--dev-file TEXT
: File path of dev corpus [default: reldata/dev.spacy]
--test-file TEXT
: File path of test data corpus [default: reldata/test.spacy]
--help
: Show this message and exit.
chemrel rel test
Applies the best relation extraction model to unseen text and measures accuracy at different thresholds.
Usage:
$ chemrel rel test [OPTIONS]
Options:
--trained-model TEXT
: File path of trained model to be used [default: reltraining/model-best]
--test-file TEXT
: File path of test data corpus [default: reldata/test.spacy]
--help
: Show this message and exit.
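As a rough illustration of what measuring a model at different thresholds involves, the sketch below scores a set of candidate relations against gold annotations for several cut-offs. It is not the command's implementation, and which metrics the command actually reports is not stated here: the function `prf_at_threshold`, the choice of precision, recall and F-score, and the toy span identifiers and scores are all made up for the example.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def prf_at_threshold(
    scores: Dict[Pair, float], gold: Set[Pair], threshold: float
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) when keeping pairs scored >= threshold."""
    predicted = {pair for pair, score in scores.items() if score >= threshold}
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data: model scores for candidate (span, span) pairs and the gold relations.
toy_scores = {("s1", "s2"): 0.9, ("s1", "s3"): 0.6, ("s4", "s2"): 0.2}
toy_gold = {("s1", "s2"), ("s1", "s3")}

for threshold in (0.1, 0.5, 0.8):
    p, r, f = prf_at_threshold(toy_scores, toy_gold, threshold)
    print(f"threshold {threshold:.1f}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

Raising the threshold keeps fewer predicted relations, which tends to raise precision and lower recall.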
chemrel rel tl-cpu
Trains the relation extraction (rel) model using transfer learning on the CPU and evaluates it on the dev corpus.
Usage:
$ chemrel rel tl-cpu [OPTIONS]
Options:
--tl-tok2vec-config TEXT
: File path of config file for Tok2Vec span categorization model [default: configs/rel_TL_tok2vec.cfg]
--train-file TEXT
: File path of training data corpus [default: reldata/train.spacy]
--dev-file TEXT
: File path of dev corpus [default: reldata/dev.spacy]
--help
: Show this message and exit.
chemrel rel tl-gpu
Trains the relation extraction (rel) model with a Transformer using transfer learning on the GPU and evaluates it on the dev corpus.
Usage:
$ chemrel rel tl-gpu [OPTIONS]
Options:
--tl-trf-config TEXT
: File path of config file for transformer span categorization model [default: configs/rel_TL_trf.cfg]
--train-file TEXT
: File path of training data corpus [default: reldata/train.spacy]
--dev-file TEXT
: File path of dev corpus [default: reldata/dev.spacy]
--gpu-id TEXT
: The GPU device identifier to be used [default: 0]
--help
: Show this message and exit.
chemrel rel train-cpu
Trains the relation extraction (rel) model on the CPU and evaluates it on the dev corpus.
Usage:
$ chemrel rel train-cpu [OPTIONS]
Options:
--tok2vec-config TEXT
: File path of config file for Tok2Vec span categorization model [default: configs/rel_tok2vec.cfg]
--train-file TEXT
: File path of training data corpus [default: reldata/train.spacy]
--dev-file TEXT
: File path of dev corpus [default: reldata/dev.spacy]
--help
: Show this message and exit.
chemrel rel train-gpu
Trains the relation extraction (rel) model with a Transformer on the GPU and evaluates it on the dev corpus.
Usage:
$ chemrel rel train-gpu [OPTIONS]
Options:
--trf-config TEXT
: File path of config file for transformer span categorization model [default: configs/rel_trf.cfg]
--train-file TEXT
: File path of training data corpus [default: reldata/train.spacy]
--dev-file TEXT
: File path of dev corpus [default: reldata/dev.spacy]
--gpu-id TEXT
: The GPU device identifier to be used [default: 0]
--help
: Show this message and exit.
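The rel subcommands are typically run one after another. The sketch below builds the argument lists for one such sequence using only the flags and defaults documented above. The helper `chemrel_argv` is hypothetical, and the ordering (process the data, train, then test) is an assumption based on the default file paths lining up, including the guess that training writes its best model to reltraining/model-best.

```python
from typing import List


def chemrel_argv(*command: str, **options: str) -> List[str]:
    """Build an argv list such as ['chemrel', 'rel', 'train-gpu', '--gpu-id', '0']."""
    argv = ["chemrel", *command]
    for name, value in options.items():
        # A Python keyword like gpu_id maps to the documented flag --gpu-id.
        argv += [f"--{name.replace('_', '-')}", str(value)]
    return argv


# A CPU training run for the relation extraction model.
REL_PIPELINE = [
    chemrel_argv("rel", "process-data", annotations_file="assets/goldrels.jsonl"),
    chemrel_argv("rel", "train-cpu", train_file="reldata/train.spacy", dev_file="reldata/dev.spacy"),
    chemrel_argv("rel", "test", trained_model="reltraining/model-best", test_file="reldata/test.spacy"),
]

for argv in REL_PIPELINE:
    print(" ".join(argv))
```

Each list can be handed to `subprocess.run(argv, check=True)` so that a failing step stops the sequence. Swapping `train-cpu` for `train-gpu` and adding `gpu_id="0"` would select the GPU variant.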
chemrel span
Configure and/or train a span categorization model.
Usage:
$ chemrel span [OPTIONS] COMMAND [ARGS]...
Options:
--help
: Show this message and exit.
Commands:
process-data
: Instructs to use the Prodigy function...
test
: Applies the best span categorization model...
tl-cpu
: Trains the span categorization (sc) model...
tl-gpu
: Trains the span categorization (sc) model...
train-cpu
: Trains the span categorization (sc) model...
train-gpu
: Trains the span categorization (sc) model...
chemrel span process-data
Instructs to use the Prodigy function (data-to-spacy) for data processing.
Usage:
$ chemrel span process-data [OPTIONS]
Options:
--help
: Show this message and exit.
chemrel span test
Applies the best span categorization model to unseen text and measures accuracy at different thresholds.
Usage:
$ chemrel span test [OPTIONS]
Options:
--trained-model TEXT
: File path of trained model to be used [default: sctraining/model-best]
--test-file TEXT
: File path of test data corpus [default: scdata/test.spacy]
--gpu-id TEXT
: The GPU device identifier to be used
--help
: Show this message and exit.
chemrel span tl-cpu
Trains the span categorization (sc) model using transfer learning on the CPU and evaluates it on the dev corpus.
Usage:
$ chemrel span tl-cpu [OPTIONS]
Options:
--tl-tok2vec-config TEXT
: File path of config file for Tok2Vec span categorization model [default: configs/sc_TL_tok2vec.cfg]
--train-file TEXT
: File path of training data corpus [default: scdata/train.spacy]
--dev-file TEXT
: File path of dev corpus [default: scdata/dev.spacy]
--help
: Show this message and exit.
chemrel span tl-gpu
Trains the span categorization (sc) model using transfer learning on the GPU and evaluates it on the dev corpus.
Usage:
$ chemrel span tl-gpu [OPTIONS]
Options:
--tl-trf-config TEXT
: File path of config file for transformer span categorization model [default: configs/sc_TL_trf.cfg]
--train-file TEXT
: File path of training data corpus [default: scdata/train.spacy]
--dev-file TEXT
: File path of dev corpus [default: scdata/dev.spacy]
--gpu-id TEXT
: The GPU device identifier to be used [default: 0]
--help
: Show this message and exit.
chemrel span train-cpu
Trains the span categorization (sc) model on the CPU and evaluates it on the dev corpus.
Usage:
$ chemrel span train-cpu [OPTIONS]
Options:
--tok2vec-config TEXT
: File path of config file for Tok2Vec span categorization model [default: configs/sc_tok2vec.cfg]
--train-file TEXT
: File path of training data corpus [default: scdata/train.spacy]
--dev-file TEXT
: File path of dev corpus [default: scdata/dev.spacy]
--help
: Show this message and exit.
chemrel span train-gpu
Trains the span categorization (sc) model on the GPU and evaluates it on the dev corpus.
Usage:
$ chemrel span train-gpu [OPTIONS]
Options:
--trf-config TEXT
: File path of config file for transformer span categorization model [default: configs/sc_trf.cfg]
--train-file TEXT
: File path of training data corpus [default: scdata/train.spacy]
--dev-file TEXT
: File path of dev corpus [default: scdata/dev.spacy]
--gpu-id TEXT
: The GPU device identifier to be used [default: 0]
--help
: Show this message and exit.