
A CLI for extracting drugs from text records


Drug Extraction CLI

Demo

demo-gif

Description

This application takes a CSV file of search terms and parses text records from another CSV file to detect and extract mentions of those terms, using string similarity algorithms to account for common misspellings. It is named for the drug searching we most commonly use it for at IPOP, but it is flexible enough to accept any type of search term.

NOTE: In our text preprocessing, we specifically allow hyphens ("-") due to their frequency in drug terminologies. If you want to see this functionality removed or put behind a feature flag, please file an Issue.
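
To make the hyphen rule concrete, here is a minimal sketch of a cleaning step that strips punctuation but keeps hyphens. It is not the tool's actual preprocessing: the function name clean_text, the lowercasing, and the exact set of characters kept are assumptions chosen for illustration.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical cleaning step: drop punctuation, keep hyphens.

    Assumption: the real tool may normalize case and whitespace differently.
    """
    # Replace anything that is not a letter, digit, whitespace, or hyphen
    # with a space, so that "co-codamol," becomes "co-codamol ".
    kept = re.sub(r"[^A-Za-z0-9\s\-]", " ", text)
    # Lowercase and collapse runs of whitespace into single spaces.
    return " ".join(kept.lower().split())


print(clean_text("Pt. took co-codamol, 2x daily!"))
```

Because the hyphen survives cleaning, a hyphenated drug name stays a single token and can be compared against a hyphenated search term as a whole.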

If you are wondering about specific use cases, check out the Examples folder!

Requires

  • cargo package manager (rust toolchain)
  • just (optional dev-dependency if you clone this repo)
  • Valid UTF-8 encoded CSV data

Installation

To install the drug-extraction-cli application, follow the instructions for your ecosystem:

Python Developers / Data Scientists

Please use pipx since it is designed specifically for this use case of installing Python CLI apps into isolated virtual environments.

pipx install extract-drugs

Rust Developers

cargo install drug-extraction-cli

IMPORTANT! Both of these install an executable called extract-drugs. Whichever packaging index you install from, the binary is named extract-drugs for more intuitive commands.

INFO: The naming discrepancy is due to how maturin handles package names and to wanting both to keep the same CLI command name and to maintain the Rust namespace. Apologies, but you'll be fine 🙂.

Usage

This application has two commands: interactive and search. Both have the same underlying functionality. search lets you pass the configuration as command-line arguments and is better suited to automated processing or advanced users, while interactive prompts you for the same configuration options and is better for new or first-time users.

API documentation for the library can be found on docs.rs.

Interactive

This command presents you with a series of prompts to help you select the correct options. It is highly recommended for new users or one-off runs.

Usage:

extract-drugs interactive

This command is demoed in the GIF above.

Search

search functions the same as interactive but allows you to declaratively provide the configuration options.

Output Data Dictionary

This tool will output an output.csv file with the following format:

| Column Name | Description | Data Type | Limits/Ranges |
|---|---|---|---|
| row_id | Identifier from --id-col if provided, else line number of row in --data-file | String | None |
| search_term | The search term, cleaned and normalized. This is the actual term that was compared. | String | None |
| matched_term | The matched term, cleaned and normalized. This is the actual term that was compared. | String | None |
| edits | The osa edit distance | Integer | 0-2 (top limit due to exclusion filter) |
| similarity_score | The jaro_winkler similarity score | Float | 0.95-1.0 (bottom limit due to exclusion filter) |
| search_field | The field that this match was found in, from --search-cols | String | None |
| metadata | The attached metadata to search_term in the search_terms file | String or None | None |
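
The edits and similarity_score columns can be illustrated with textbook formulations of the two metrics. The sketch below is a plain-Python approximation, not the tool's implementation: the function names, the 0.1 prefix weight, and the 4-character prefix cap are the conventional defaults and are assumptions here, and the tool's own results may differ in edge cases.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "yl" <-> "ly".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def jaro(a: str, b: str) -> float:
    """Jaro similarity in the range 0.0 to 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # half a transposition each.
    a_seq = [ch for ch, m in zip(a, a_matched) if m]
    b_seq = [ch for ch, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (
        matches / len(a)
        + matches / len(b)
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(a: str, b: str, prefix_weight: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)


# A transposed misspelling: one edit, and a similarity above 0.95.
print(osa_distance("fentanyl", "fentanly"))
print(round(jaro_winkler("fentanyl", "fentanly"), 3))
```

A single swapped pair of letters costs one edit under OSA, which is why common typos stay within the 0-2 range listed in the table.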

Examples

For a whole showcase of example runs of this tool check out the shell scripts inside the examples folder.

For a showcase of the potential analytical value that can be derived from running this tool, check out the Jupyter Notebooks in the same folder!
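
As a rough idea of what such an analysis might look like, the sketch below counts how many distinct records mention each search term. It is not taken from the notebooks: the function name count_records_per_term and the sample rows are hypothetical, and it assumes output.csv starts with a header row using the column names from the data dictionary above.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_records_per_term(csv_file: TextIO) -> Counter:
    """Count distinct row_id values per search_term.

    Assumption: the file has a header row with the documented column names.
    """
    ids_per_term = {}
    for row in csv.DictReader(csv_file):
        # A record may match the same term several times (for example in
        # two search fields), so collect row_ids in a set.
        ids_per_term.setdefault(row["search_term"], set()).add(row["row_id"])
    return Counter({term: len(ids) for term, ids in ids_per_term.items()})


# Hypothetical sample data standing in for a real output.csv.
sample = io.StringIO(
    "row_id,search_term,matched_term,edits,similarity_score,search_field,metadata\n"
    "1,fentanyl,fentanyl,0,1.0,notes,opioid\n"
    "1,fentanyl,fentanly,1,0.975,notes,opioid\n"
    "2,heroin,heroin,0,1.0,notes,opioid\n"
    "3,fentanyl,fentanyl,0,1.0,summary,opioid\n"
)

for term, n_records in count_records_per_term(sample).most_common():
    print(term, n_records)
```

To run this against real results, pass a file opened with open("output.csv", newline="", encoding="utf-8") instead of the in-memory sample.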

Support

If you encounter any issues or need support please either contact @nanthony007 or open an issue.

Contributing

See CONTRIBUTING.md.

MIT License

LICENSE
