
A project which focuses on automating and transferring chemical data extraction using span categorization and relation extraction models.


chemrel

ChemREL Command Line Interface (CLI).

Automate and transfer chemical data extraction using span categorization and relation extraction models.

To initialize the assets required by the CLI, run the following command.

$ chemrel init

Usage:

$ chemrel [OPTIONS] COMMAND [ARGS]...

Options:

  • --install-completion: Install completion for the current shell.
  • --show-completion: Show completion for the current shell, to copy it or customize the installation.
  • --help: Show this message and exit.

Commands:

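  • aux: Run one of a number of auxiliary data processing commands.
  • clean: Removes intermediate files to start data preparation and training from a clean slate.
  • init: Initializes the files required by the package at the given path.
  • predict: Predicts the spans and/or relations in a given text using the given models.
  • rel: Configure and/or train a relation extraction model.
  • span: Configure and/or train a span categorization model.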
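The commands above are usually run in sequence. The sketch below is a dry-run driver that prints, and optionally executes, one plausible end-to-end order using only commands documented on this page, with every option left at its documented default. The ordering is an assumption (this page does not prescribe one), run_pipeline is a hypothetical helper rather than part of chemrel, and note that span process-data, per its description, only points you to Prodigy's data-to-spacy, so the span corpora in scdata/ have to be produced that way before training.

```python
import shlex
import subprocess

# Assumed order (not prescribed by chemrel): set up assets, then prepare,
# train and test the span model, then do the same for the relation model.
PIPELINE = [
    ["chemrel", "init"],
    ["chemrel", "span", "process-data"],
    ["chemrel", "span", "train-cpu"],
    ["chemrel", "span", "test"],
    ["chemrel", "rel", "process-data"],
    ["chemrel", "rel", "train-cpu"],
    ["chemrel", "rel", "test"],
]


def run_pipeline(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False (hypothetical helper)."""
    executed = []
    for argv in steps:
        line = shlex.join(argv)
        print("$", line)
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)
        executed.append(line)
    return executed


if __name__ == "__main__":
    run_pipeline(PIPELINE)
```

Passing argument lists to subprocess.run (rather than a single shell string) avoids quoting problems; switch dry_run to False only once chemrel is installed and on PATH.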

chemrel aux

Run one of a number of auxiliary data processing commands.

Usage:

$ chemrel aux [OPTIONS] COMMAND [ARGS]...

Options:

  • --help: Show this message and exit.

Commands:

  • extract-elsevier-paper: Converts the Elsevier paper with the specified DOI code into a sequence of JSONL files.
  • extract-paper: Converts the paper PDF at the specified path into a sequence of JSONL files.

chemrel aux extract-elsevier-paper

Converts the Elsevier paper with the specified DOI code into a sequence of JSONL files, each corresponding to one text chunk, where each JSONL line is tokenized by sentence. For example, if the provided path is dir/file and the Paper text contains two chunks, the files dir/file_1.jsonl and dir/file_2.jsonl are generated; if the Paper text contains only one chunk, dir/file.jsonl is generated.

Usage:

$ chemrel aux extract-elsevier-paper [OPTIONS] DOI_CODE API_KEY JSONL_PATH

Arguments:

  • DOI_CODE: DOI code of paper, not in URL form [required]
  • API_KEY: Elsevier API key [required]
  • JSONL_PATH: File path to save the JSONL files to; any filename extension is ignored [required]

Options:

  • --char-limit INTEGER: Character limit of each text chunk in the generated Paper object
  • --help: Show this message and exit.
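The output-naming rule in the description can be sketched as a small function. This is a minimal sketch assuming POSIX-style paths; jsonl_output_paths is a hypothetical helper, not chemrel's code, and because the description does not say how multi-part extensions such as .tar.gz are handled, this version strips only the last extension.

```python
from pathlib import PurePosixPath


def jsonl_output_paths(jsonl_path: str, n_chunks: int) -> list:
    """Hypothetical helper reproducing the documented JSONL naming rule."""
    if n_chunks < 1:
        raise ValueError("a paper must produce at least one chunk")
    # The extension is ignored: "dir/file.jsonl" and "dir/file" give the same base.
    # Assumption: only the last suffix is removed.
    base = str(PurePosixPath(jsonl_path).with_suffix(""))
    if n_chunks == 1:
        # A single chunk keeps the plain name.
        return [f"{base}.jsonl"]
    # Multiple chunks are numbered starting from 1.
    return [f"{base}_{i}.jsonl" for i in range(1, n_chunks + 1)]


print(jsonl_output_paths("dir/file", 2))      # ['dir/file_1.jsonl', 'dir/file_2.jsonl']
print(jsonl_output_paths("dir/file.txt", 1))  # ['dir/file.jsonl']
```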

chemrel aux extract-paper

Converts the paper PDF at the specified path into a sequence of JSONL files, each corresponding to one text chunk, where each JSONL line is tokenized by sentence. For example, if the provided path is dir/file and the Paper text contains two chunks, the files dir/file_1.jsonl and dir/file_2.jsonl are generated; if the Paper text contains only one chunk, dir/file.jsonl is generated.

Usage:

$ chemrel aux extract-paper [OPTIONS] PAPER_PATH JSONL_PATH

Arguments:

  • PAPER_PATH: File path of paper PDF [required]
  • JSONL_PATH: File path to save the JSONL files to; any filename extension is ignored [required]

Options:

  • --char-limit INTEGER: Character limit of each text chunk in the generated Paper object
  • --help: Show this message and exit.
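The --char-limit option bounds the size of each text chunk. Below is a minimal sketch of sentence-aware chunking with one JSONL line per sentence. The regex sentence splitter, the greedy packing strategy, and the {"text": ...} record shape are all assumptions for illustration; chemrel's actual splitter and JSONL schema are not documented on this page.

```python
import json
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_sentences(sentences, char_limit):
    """Greedily pack whole sentences into chunks of at most char_limit characters.

    A single sentence longer than char_limit is kept intact in its own chunk.
    """
    chunks, current, length = [], [], 0
    for sentence in sentences:
        # +1 accounts for the space joining it to the previous sentence.
        added = len(sentence) + (1 if current else 0)
        if current and length + added > char_limit:
            chunks.append(current)
            current, length = [sentence], len(sentence)
        else:
            current.append(sentence)
            length += added
    if current:
        chunks.append(current)
    return chunks


def chunk_to_jsonl(chunk):
    """One JSON object per sentence, one object per line (hypothetical schema)."""
    return "".join(json.dumps({"text": s}) + "\n" for s in chunk)


text = "Benzene boils at 80 C. Water boils at 100 C. Ethanol boils at 78 C."
chunks = chunk_sentences(split_sentences(text), char_limit=50)
for i, chunk in enumerate(chunks, start=1):
    print(f"chunk {i}:")
    print(chunk_to_jsonl(chunk), end="")
```

Packing whole sentences keeps each JSONL line a complete sentence, at the cost of occasionally exceeding the limit when one sentence is longer than it.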

chemrel clean

Removes intermediate files to start data preparation and training from a clean slate.

Usage:

$ chemrel clean [OPTIONS]

Options:

  • --help: Show this message and exit.

chemrel init

Initializes the files required by the package at the given path.

Usage:

$ chemrel init [OPTIONS] [PATH]

Arguments:

  • [PATH]: File path in which to initialize required files [default: ./]

Options:

  • --help: Show this message and exit.

chemrel predict

Predicts the spans and/or relations in a given text using the given models.

Usage:

$ chemrel predict [OPTIONS] COMMAND [ARGS]...

Options:

  • --help: Show this message and exit.

Commands:

  • rel: Predicts spans and the relations between them in the given text and prints them.
  • span: Predicts spans contained in the given text and prints them.

chemrel predict rel

Predicts the spans contained in the given text, and the relations between them, using the given models, and prints them.

Usage:

$ chemrel predict rel [OPTIONS] SC_MODEL_PATH REL_MODEL_PATH TEXT

Arguments:

  • SC_MODEL_PATH: File path of span categorization model to be used [required]
  • REL_MODEL_PATH: File path of relation extraction model to be used [required]
  • TEXT: Text content to predict spans within [required]

Options:

  • --help: Show this message and exit.

chemrel predict span

Predicts the spans contained in the given text and prints them.

Usage:

$ chemrel predict span [OPTIONS] SC_MODEL_PATH TEXT

Arguments:

  • SC_MODEL_PATH: File path of span categorization model to be used [required]
  • TEXT: Text content to predict spans within [required]

Options:

  • --help: Show this message and exit.
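Because TEXT is a single positional argument, text containing spaces must be quoted in the shell. The sketch below builds the argument list for either predict command and, when run, captures what the CLI prints. predict_argv and predict are hypothetical helpers; the model paths reuse the documented test defaults and assume training has written models there.

```python
import shlex
import subprocess


def predict_argv(text, sc_model_path, rel_model_path=None):
    """Build argv for `chemrel predict span`, or `predict rel` when a rel model is given."""
    if rel_model_path is None:
        return ["chemrel", "predict", "span", sc_model_path, text]
    return ["chemrel", "predict", "rel", sc_model_path, rel_model_path, text]


def predict(text, sc_model_path, rel_model_path=None):
    """Run the CLI and return what it prints (requires chemrel on PATH)."""
    result = subprocess.run(
        predict_argv(text, sc_model_path, rel_model_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


argv = predict_argv(
    "The melting point of benzene is 5.5 C.",
    "sctraining/model-best",
    "reltraining/model-best",
)
# Equivalent shell command, with TEXT quoted as one argument.
print(shlex.join(argv))
```

Passing a list to subprocess.run sends the text through as one argument with no shell quoting; shlex.join is only used to show the equivalent command line.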

chemrel rel

Configure and/or train a relation extraction model.

Usage:

$ chemrel rel [OPTIONS] COMMAND [ARGS]...

Options:

  • --help: Show this message and exit.

Commands:

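  • process-data: Parses the gold-standard annotations from the Prodigy annotations.
  • test: Applies the best relation extraction model to unseen text and measures accuracy at different thresholds.
  • tl-cpu: Trains the relation extraction (rel) model using transfer learning on the CPU and evaluates it on the dev corpus.
  • tl-gpu: Trains the relation extraction (rel) model with a Transformer using transfer learning on the GPU and evaluates it on the dev corpus.
  • train-cpu: Trains the relation extraction (rel) model on the CPU and evaluates it on the dev corpus.
  • train-gpu: Trains the relation extraction (rel) model with a Transformer on the GPU and evaluates it on the dev corpus.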

chemrel rel process-data

Parses the gold-standard annotations from the Prodigy annotations file into the training, dev, and test corpora.

Usage:

$ chemrel rel process-data [OPTIONS]

Options:

  • --annotations-file TEXT: File path of Prodigy annotations [default: assets/goldrels.jsonl]
  • --train-file TEXT: File path of training data corpus [default: reldata/train.spacy]
  • --dev-file TEXT: File path of dev corpus [default: reldata/dev.spacy]
  • --test-file TEXT: File path of test data corpus [default: reldata/test.spacy]
  • --help: Show this message and exit.
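Conceptually, this step filters the Prodigy annotations and divides them among the three corpora. The sketch below shows only that filtering and splitting on in-memory JSONL lines. Keeping examples whose "answer" is "accept" follows Prodigy's annotation format, but the 15%/15% dev/test fractions, the fixed seed, and the helper names are hypothetical; the real command also converts the data to .spacy files, which is not shown.

```python
import json
import random


def load_accepted(lines):
    """Parse Prodigy JSONL lines, keeping only examples whose "answer" is "accept"."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        example = json.loads(line)
        if example.get("answer") == "accept":
            examples.append(example)
    return examples


def split_corpus(examples, dev_frac=0.15, test_frac=0.15, seed=0):
    """Deterministically shuffle, then split into (train, dev, test).

    The fractions and seed are hypothetical; chemrel's actual split is not documented here.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_frac)
    n_test = round(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


# Toy stand-in for assets/goldrels.jsonl: 20 accepted examples and 1 rejected one.
sample_lines = [json.dumps({"text": f"sentence {i}", "answer": "accept"}) for i in range(20)]
sample_lines.append(json.dumps({"text": "bad example", "answer": "reject"}))

train, dev, test = split_corpus(load_accepted(sample_lines))
print(len(train), len(dev), len(test))  # 14 3 3
```

A seeded random.Random instance keeps the split reproducible across runs without touching the global random state.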

chemrel rel test

Applies the best relation extraction model to unseen text and measures accuracy at different thresholds.

Usage:

$ chemrel rel test [OPTIONS]

Options:

  • --trained-model TEXT: File path of trained model to be used [default: reltraining/model-best]
  • --test-file TEXT: File path of test data corpus [default: reldata/test.spacy]
  • --help: Show this message and exit.
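"Measures accuracy at different thresholds" means a predicted relation counts only if its score clears a cutoff, so the results shift as the cutoff moves. The sketch below illustrates this with precision, recall, and F1 (a common choice; which metrics chemrel reports is not specified here) on toy scores. The (head, child, label) tuples and the HAS_VALUE label are hypothetical.

```python
def prf_at_threshold(scores, gold, threshold):
    """Precision, recall and F1 when every candidate scoring >= threshold is predicted."""
    predicted = {rel for rel, score in scores.items() if score >= threshold}
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Candidate relations as (head token, child token, label) with model scores (toy data).
scores = {(0, 1, "HAS_VALUE"): 0.9, (1, 2, "HAS_VALUE"): 0.4, (0, 2, "HAS_VALUE"): 0.6}
gold = {(0, 1, "HAS_VALUE"), (1, 2, "HAS_VALUE")}

for threshold in (0.3, 0.5, 0.95):
    p, r, f = prf_at_threshold(scores, gold, threshold)
    print(f"threshold={threshold:.2f}  P={p:.2f}  R={r:.2f}  F1={f:.2f}")
```

Lower thresholds trade precision for recall; at 0.95 nothing is predicted, so all three scores drop to zero.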

chemrel rel tl-cpu

Trains the relation extraction (rel) model using transfer learning on the CPU and evaluates it on the dev corpus.

Usage:

$ chemrel rel tl-cpu [OPTIONS]

Options:

  • --tl-tok2vec-config TEXT: File path of config file for Tok2Vec relation extraction model [default: configs/rel_TL_tok2vec.cfg]
  • --train-file TEXT: File path of training data corpus [default: reldata/train.spacy]
  • --dev-file TEXT: File path of dev corpus [default: reldata/dev.spacy]
  • --help: Show this message and exit.

chemrel rel tl-gpu

Trains the relation extraction (rel) model with a Transformer using transfer learning on the GPU and evaluates it on the dev corpus.

Usage:

$ chemrel rel tl-gpu [OPTIONS]

Options:

  • --tl-trf-config TEXT: File path of config file for transformer relation extraction model [default: configs/rel_TL_trf.cfg]
  • --train-file TEXT: File path of training data corpus [default: reldata/train.spacy]
  • --dev-file TEXT: File path of dev corpus [default: reldata/dev.spacy]
  • --gpu-id TEXT: The GPU device identifier to be used [default: 0]
  • --help: Show this message and exit.

chemrel rel train-cpu

Trains the relation extraction (rel) model on the CPU and evaluates it on the dev corpus.

Usage:

$ chemrel rel train-cpu [OPTIONS]

Options:

  • --tok2vec-config TEXT: File path of config file for Tok2Vec relation extraction model [default: configs/rel_tok2vec.cfg]
  • --train-file TEXT: File path of training data corpus [default: reldata/train.spacy]
  • --dev-file TEXT: File path of dev corpus [default: reldata/dev.spacy]
  • --help: Show this message and exit.

chemrel rel train-gpu

Trains the relation extraction (rel) model with a Transformer on the GPU and evaluates it on the dev corpus.

Usage:

$ chemrel rel train-gpu [OPTIONS]

Options:

  • --trf-config TEXT: File path of config file for transformer relation extraction model [default: configs/rel_trf.cfg]
  • --train-file TEXT: File path of training data corpus [default: reldata/train.spacy]
  • --dev-file TEXT: File path of dev corpus [default: reldata/dev.spacy]
  • --gpu-id TEXT: The GPU device identifier to be used [default: 0]
  • --help: Show this message and exit.

chemrel span

Configure and/or train a span categorization model.

Usage:

$ chemrel span [OPTIONS] COMMAND [ARGS]...

Options:

  • --help: Show this message and exit.

Commands:

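  • process-data: Instructs the user to use the Prodigy command data-to-spacy for data processing.
  • test: Applies the best span categorization model to unseen text and measures accuracy at different thresholds.
  • tl-cpu: Trains the span categorization (sc) model using transfer learning on the CPU and evaluates it on the dev corpus.
  • tl-gpu: Trains the span categorization (sc) model using transfer learning on the GPU and evaluates it on the dev corpus.
  • train-cpu: Trains the span categorization (sc) model on the CPU and evaluates it on the dev corpus.
  • train-gpu: Trains the span categorization (sc) model on the GPU and evaluates it on the dev corpus.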

chemrel span process-data

Instructs the user to use the Prodigy command data-to-spacy for data processing.

Usage:

$ chemrel span process-data [OPTIONS]

Options:

  • --help: Show this message and exit.

chemrel span test

Applies the best span categorization model to unseen text and measures accuracy at different thresholds.

Usage:

$ chemrel span test [OPTIONS]

Options:

  • --trained-model TEXT: File path of trained model to be used [default: sctraining/model-best]
  • --test-file TEXT: File path of test data corpus [default: scdata/test.spacy]
  • --gpu-id TEXT: The GPU device identifier to be used
  • --help: Show this message and exit.

chemrel span tl-cpu

Trains the span categorization (sc) model using transfer learning on the CPU and evaluates it on the dev corpus.

Usage:

$ chemrel span tl-cpu [OPTIONS]

Options:

  • --tl-tok2vec-config TEXT: File path of config file for Tok2Vec span categorization model [default: configs/sc_TL_tok2vec.cfg]
  • --train-file TEXT: File path of training data corpus [default: scdata/train.spacy]
  • --dev-file TEXT: File path of dev corpus [default: scdata/dev.spacy]
  • --help: Show this message and exit.

chemrel span tl-gpu

Trains the span categorization (sc) model using transfer learning on the GPU and evaluates it on the dev corpus.

Usage:

$ chemrel span tl-gpu [OPTIONS]

Options:

  • --tl-trf-config TEXT: File path of config file for transformer span categorization model [default: configs/sc_TL_trf.cfg]
  • --train-file TEXT: File path of training data corpus [default: scdata/train.spacy]
  • --dev-file TEXT: File path of dev corpus [default: scdata/dev.spacy]
  • --gpu-id TEXT: The GPU device identifier to be used [default: 0]
  • --help: Show this message and exit.

chemrel span train-cpu

Trains the span categorization (sc) model on the CPU and evaluates it on the dev corpus.

Usage:

$ chemrel span train-cpu [OPTIONS]

Options:

  • --tok2vec-config TEXT: File path of config file for Tok2Vec span categorization model [default: configs/sc_tok2vec.cfg]
  • --train-file TEXT: File path of training data corpus [default: scdata/train.spacy]
  • --dev-file TEXT: File path of dev corpus [default: scdata/dev.spacy]
  • --help: Show this message and exit.

chemrel span train-gpu

Trains the span categorization (sc) model on the GPU and evaluates it on the dev corpus.

Usage:

$ chemrel span train-gpu [OPTIONS]

Options:

  • --trf-config TEXT: File path of config file for transformer span categorization model [default: configs/sc_trf.cfg]
  • --train-file TEXT: File path of training data corpus [default: scdata/train.spacy]
  • --dev-file TEXT: File path of dev corpus [default: scdata/dev.spacy]
  • --gpu-id TEXT: The GPU device identifier to be used [default: 0]
  • --help: Show this message and exit.
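For both the span and rel groups, the four training commands differ only in device (CPU or GPU), whether transfer learning is used, and which config flag they take. The sketch below maps those choices to the documented command names, flags, and default paths. training_argv is a hypothetical helper; spelling out the defaults explicitly should be equivalent to omitting them, and only the *-gpu commands receive --gpu-id.

```python
import shlex

# Documented command, config flag and default config, keyed by
# (task, uses transfer learning, uses GPU).
CONFIG_DEFAULTS = {
    ("span", False, False): ("train-cpu", "--tok2vec-config", "configs/sc_tok2vec.cfg"),
    ("span", False, True): ("train-gpu", "--trf-config", "configs/sc_trf.cfg"),
    ("span", True, False): ("tl-cpu", "--tl-tok2vec-config", "configs/sc_TL_tok2vec.cfg"),
    ("span", True, True): ("tl-gpu", "--tl-trf-config", "configs/sc_TL_trf.cfg"),
    ("rel", False, False): ("train-cpu", "--tok2vec-config", "configs/rel_tok2vec.cfg"),
    ("rel", False, True): ("train-gpu", "--trf-config", "configs/rel_trf.cfg"),
    ("rel", True, False): ("tl-cpu", "--tl-tok2vec-config", "configs/rel_TL_tok2vec.cfg"),
    ("rel", True, True): ("tl-gpu", "--tl-trf-config", "configs/rel_TL_trf.cfg"),
}
# Default corpus directories used by each group.
DATA_DIRS = {"span": "scdata", "rel": "reldata"}


def training_argv(task, transfer=False, gpu=False, config=None, gpu_id=0):
    """Build argv for one of the eight documented training commands (hypothetical helper)."""
    command, config_flag, default_config = CONFIG_DEFAULTS[(task, transfer, gpu)]
    data = DATA_DIRS[task]
    argv = [
        "chemrel", task, command,
        config_flag, config or default_config,
        "--train-file", f"{data}/train.spacy",
        "--dev-file", f"{data}/dev.spacy",
    ]
    if gpu:
        # Only the *-gpu commands accept --gpu-id.
        argv += ["--gpu-id", str(gpu_id)]
    return argv


print(shlex.join(training_argv("rel", transfer=True, gpu=True, gpu_id=1)))
```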



Download files

Source Distribution

chemrel-1.0.2.tar.gz (15.8 kB)

Built Distribution

chemrel-1.0.2-py3-none-any.whl (20.0 kB)

File details

Details for the file chemrel-1.0.2.tar.gz.

File metadata

  • Download URL: chemrel-1.0.2.tar.gz
  • Size: 15.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for chemrel-1.0.2.tar.gz:

  • SHA256: 116db82fdc6978a72aae3a855a9c36f0253c3c51046187e2efbb4c7b68e0b3ff
  • MD5: 40eb6dbe3e4a816b3f78ba79f530a796
  • BLAKE2b-256: 45770822f35b16444bf63bd981b9cb4a8be8cebc908f64d6892bb8f3286fac7f


File details

Details for the file chemrel-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: chemrel-1.0.2-py3-none-any.whl
  • Size: 20.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for chemrel-1.0.2-py3-none-any.whl:

  • SHA256: 51cf16cbf8a9fd013ad0a8d688717b79bfba469a0ce0e5efd92cb7c96dd783d6
  • MD5: 7cdd70136474582241273606d54db774
  • BLAKE2b-256: 29817682817a318cc1e24ce8d1a7b69cf4d6f0934d06fa6ef5f1d17dcb1dfa41

