A CLI for extracting drugs from text records

Drug Extraction CLI

Demo

demo-gif

Description

This application takes a CSV file of search terms and parses text records from another CSV file to detect and extract mentions of those terms. It uses string similarity algorithms to account for common misspellings. It is named for the drug searching it most commonly does for us at IPOP, but it is flexible enough to accept any type of search term.
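To illustrate how an edit-distance check can tolerate misspellings, here is a minimal Python sketch of the optimal string alignment (OSA) distance, the edit metric named in the output data dictionary. It is not the tool's own implementation: `osa_distance` and `find_mentions` are hypothetical names, the whitespace tokenization is an assumption, and the Jaro-Winkler score is not covered here.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Counts insertions, deletions, substitutions, and transpositions
    of two adjacent characters, each with cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "yl" <-> "ly".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def find_mentions(text, terms, max_edits=2):
    """Hypothetical helper: compare each whitespace-separated word to each term.

    Assumption: single-word terms and a simple edit-count cutoff; the real
    tool may tokenize and filter differently.
    """
    hits = []
    for word in text.lower().split():
        for term in terms:
            edits = osa_distance(word, term.lower())
            if edits <= max_edits:
                hits.append((term, word, edits))
    return hits
```

For example, `osa_distance("fentanyl", "fentanly")` returns 1, because swapping the adjacent letters "y" and "l" counts as a single edit.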

NOTE: In our text preprocessing, we specifically allow hyphens ("-") due to their frequency in drug terminologies. If you want to see this functionality removed or put behind a feature flag, please file an Issue.
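The effect of keeping hyphens can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `clean_text` is hypothetical, and the exact rules (lowercasing, replacing every other non-alphanumeric ASCII character with a space) are guesses, not the tool's documented behavior.

```python
import re

# Assumption: keep lowercase ASCII letters, digits, whitespace, and hyphens;
# replace everything else with a space.
_NON_TERM_CHARS = re.compile(r"[^a-z0-9\s\-]")


def clean_text(text: str) -> str:
    """Hypothetical normalizer that preserves hyphenated terms."""
    lowered = text.lower()
    stripped = _NON_TERM_CHARS.sub(" ", lowered)
    # Collapse runs of whitespace left behind by the substitution.
    return " ".join(stripped.split())
```

With these rules, `clean_text("Pt took Co-Codamol, 30mg!")` returns `"pt took co-codamol 30mg"`: the comma and exclamation mark are dropped, while the hyphen in the drug name survives.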

If you are wondering about specific use cases, check out the Examples folder!

Requires

  • cargo package manager (rust toolchain)
  • just (optional dev-dependency if you clone this repo)
  • Valid UTF-8 encoded CSV data
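Since the input must be valid UTF-8, it can help to check a file before running the tool. The following Python sketch is one way to do that; `first_invalid_utf8_offset` is a hypothetical helper, not part of this project, and it reads the whole file into memory.

```python
from pathlib import Path


def first_invalid_utf8_offset(path):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical pre-flight check; reads the entire file into memory,
    so it suits small and medium CSV files.
    """
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None
```

A return value of `None` means the file decoded cleanly; any integer points at the first offending byte, which is often a sign the file was saved as Latin-1 or Windows-1252.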

Installation

Install the drug-extraction-cli application using whichever method below fits your toolchain.

Python Developers / Data Scientists

Please use pipx, which is designed specifically for installing Python CLI apps into isolated virtual environments.

pipx install extract-drugs

Rust Developers

cargo install drug-extraction-cli

IMPORTANT! Both of these will install an executable called extract-drugs. Whichever package index you install from, the binary is named extract-drugs so that commands stay intuitive.

INFO: The naming discrepancy comes from how maturin handles package names, combined with wanting to keep a single CLI command name while preserving the Rust namespace. Apologies, but you'll be fine 🙂.

Usage

This application has two commands: interactive and search. Both share the same underlying functionality. interactive prompts you for each configuration option and is better for new or first-time users. search accepts the same options as command-line arguments and is better suited to automated processing or advanced users.

API documentation for the library can be found on docs.rs.

Interactive

This will present you with a series of prompts to help you select the correct options. Highly recommended for new users or one-off runs.

Usage:

extract-drugs interactive

This command is demoed in the GIF above.

Search

search functions the same as interactive but allows you to declaratively provide the configuration options.

Output Data Dictionary

This tool will output an output.csv file with the following format:

| Column Name | Description | Data Type | Limits/Ranges |
| --- | --- | --- | --- |
| row_id | Identifier from --id-col if provided, else line number of row in --data-file | String | None |
| search_term | The search term, cleaned and normalized. This is the actual term that was compared. | String | None |
| matched_term | The matched term, cleaned and normalized. This is the actual term that was compared. | String | None |
| edits | The osa edit distance | Integer | 0-2 (top limit due to exclusion filter) |
| similarity_score | The jaro_winkler similarity score | Float | 0.95-1.0 (bottom limit due to exclusion filter) |
| search_field | The field that this match was found in, from --search-cols | String | None |
| metadata | The attached metadata to search_term in the search_terms file | String or None | None |
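As a rough illustration of working with this output, the Python sketch below loads output.csv with the standard library and tallies matches per search term. It assumes the file has a header row using the column names listed above; `load_matches` and `count_by_search_term` are hypothetical helpers, not part of this project.

```python
import csv
from collections import Counter


def load_matches(path="output.csv"):
    """Read the tool's output into a list of dicts with typed numeric fields.

    Assumption: the first row is a header with the documented column names.
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            row["edits"] = int(row["edits"])
            row["similarity_score"] = float(row["similarity_score"])
            rows.append(row)
    return rows


def count_by_search_term(matches):
    """Count how many matches were recorded for each search term."""
    return Counter(row["search_term"] for row in matches)
```

Calling `count_by_search_term(load_matches())` gives a quick frequency table, and filtering on `row["edits"] == 0` separates exact matches from likely misspellings.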

Examples

For a full showcase of example runs of this tool, check out the shell scripts inside the examples folder.

For a showcase of the analytical value that can be derived from running this tool, check out the Jupyter Notebooks in the same folder!

Support

If you encounter any issues or need support, please either contact @nanthony007 or open an issue.

Contributing

See CONTRIBUTING.md.

MIT License

LICENSE
