
Open-source PHI-filtering software. A fork of philter-ucsf.


UCSF Philter Lite


philter_lite is a fork of the wonderful work done here: https://github.com/BCHSI/philter-ucsf

The fork aims to adapt the functionality for production use. This includes:

  • Stateless functions
  • Stronger type-checking, hints, and data contracts
  • Improved unit test coverage (hopefully)
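As a rough illustration of what "stateless functions" and "data contracts" can mean in practice, the sketch below pairs an immutable, typed filter definition with a function whose result depends only on its arguments. The names `PatternFilter` and `find_spans` are hypothetical and are not part of the philter_lite API.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PatternFilter:
    """Hypothetical data contract: an immutable, typed filter definition."""

    name: str
    pattern: str  # regular expression that matches candidate PHI


def find_spans(
    text: str, filters: Tuple[PatternFilter, ...]
) -> List[Tuple[int, int, str]]:
    """Return (start, end, filter_name) for every match, sorted by position.

    Stateless: the result depends only on the arguments, and nothing is
    cached or mutated between calls.
    """
    spans = []
    for f in filters:
        for match in re.finditer(f.pattern, text):
            spans.append((match.start(), match.end(), f.name))
    return sorted(spans)


# Illustrative filter only; real filter sets come from a config file.
filters = (PatternFilter(name="ssn", pattern=r"\b\d{3}-\d{2}-\d{4}\b"),)
print(find_spans("SSN 123-45-6789 on file", filters))
```

Because the filter definition is frozen and the function keeps no internal state, the same inputs always yield the same spans, which makes this style straightforward to unit test.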

It does this at the expense of breaking the model evaluation functionality provided in the original library. If you are developing a new set of filters, it is recommended that you evaluate them in the original UCSF Philter. You can then

There are some minor memory improvements here, and no known performance improvements. The main goal is to improve the stability and extensibility of the code, so do not expect it to run faster.

Citations

If you use this software for any publication, please cite: Norgeot, B., Muenzen, K., Peterson, T.A. et al. Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes. npj Digit. Med. 3, 57 (2020). https://doi.org/10.1038/s41746-020-0258-y

Installing Philter

To install Philter from PyPI, run the following command:

pip3 install philter-lite

The main philter CLI can be executed by running:

philter_lite
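To confirm the installation from Python, a small check along the following lines may help. It uses only the standard library and assumes the distribution name `philter-lite` and the console command `philter_lite` shown above; the helper `installed_version` is a name chosen for this example.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "philter-lite" is the name used in the pip command above;
# "philter_lite" is the console command named above.
print("philter-lite version:", installed_version("philter-lite"))
print("philter_lite on PATH:", shutil.which("philter_lite"))
```

If either value prints as `None`, the package or its command is not visible to the current Python environment.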

Running Philter: A Step-by-Step Guide

Philter is a command-line clinical text de-identification tool that removes protected health information (PHI) from any plain text file. Although the software has built-in evaluation capabilities and can compare Philter PHI-reduced notes with a corresponding set of ground truth annotations, annotations are not required to run Philter. The following steps may be used to 1) run Philter from the command line without ground truth annotations, or 2) generate Philter-compatible annotations and run Philter in evaluation mode using ground truth annotations. Although any set of notes and corresponding annotations may be used with Philter, the examples provided here correspond to the I2B2 dataset, which Philter uses in its default configuration.
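To make the idea of PHI reduction concrete, here is a toy sketch of masking text with asterisks. It is not Philter's implementation: it assumes the PHI spans are already known as (start, end) character offsets and that every character inside a span is replaced by `*`. The function name `mask_spans` and the sample note are made up for illustration.

```python
from typing import Iterable, Tuple


def mask_spans(text: str, spans: Iterable[Tuple[int, int]], mask: str = "*") -> str:
    """Replace every character inside the given [start, end) spans with `mask`."""
    chars = list(text)
    for start, end in spans:
        # Clamp to the text bounds so out-of-range spans are ignored safely.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask
    return "".join(chars)


note = "Patient John Smith seen on 01/02/2020."
# Spans are hand-written here; a real run would obtain them from the filters.
print(mask_spans(note, [(8, 18), (27, 37)]))
```

Masking character by character keeps the note the same length, so offsets computed on the original text still line up with the masked output.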

Before running Philter either with or without evaluation, make sure to familiarize yourself with the various options that may be used for any given Philter run:

Flags:

usage: philter [-h] [-i INPUT] [-a ANNO] [-o OUTPUT] [-f FILTERS] [-x XML]
               [-c COORDS] [--eval_output EVAL_OUTPUT] [-v VERBOSE]
               [-e RUN_EVAL] [-t FREQ_TABLE] [-n INITIALS]
               [--outputformat OUTPUTFORMAT] [--ucsfformat UCSFFORMAT]
               [--prod PROD] [--cachepos CACHEPOS]

Philter -- PHI filter for clinical notes

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to the directory or the file that contains the
                        PHI note, the default is ./data/i2b2_notes/
  -a ANNO, --anno ANNO  Path to the directory or the file that contains the
                        PHI annotation, the default is ./data/i2b2_anno/
  -o OUTPUT, --output OUTPUT
                        Path to the directory to save the PHI-reduced notes
                        in, the default is ./data/i2b2_results/
  -f FILTERS, --filters FILTERS
                        Path to our config file, the default is
                        ./configs/integration_1.json
  -x XML, --xml XML     Path to the json file that contains all xml data
  -c COORDS, --coords COORDS
                        Path to the json file that contains the coordinate map
                        data
  --eval_output EVAL_OUTPUT
                        Path to the directory that the detailed eval files
                        will be outputted to
  -v VERBOSE, --verbose VERBOSE
                        When verbose is true, will emit messages about script
                        progress
  -e RUN_EVAL, --run_eval RUN_EVAL
                        When run_eval is true, will run our eval script and
                        emit summarized results to terminal
  -t FREQ_TABLE, --freq_table FREQ_TABLE
                        When freqtable is true, will output a unigram/bigram
                        frequency table of all note words and their PHI/non-
                        PHI counts
  -n INITIALS, --initials INITIALS
                        When initials is true, will include initials PHI in
                        recall/precision calculations
  --outputformat OUTPUTFORMAT
                        Define format of annotation, allowed values are
                        "asterisk", "i2b2". Default is "asterisk"
  --ucsfformat UCSFFORMAT
                        When ucsfformat is true, will adjust eval script for
                        slightly different xml format
  --prod PROD           When prod is true, this will run the script with
                        output in i2b2 xml format without running the eval
                        script
  --cachepos CACHEPOS   Path to a directory to store/load the pos data for all
                        notes. If no path is specified then memory caching
                        will be used.
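When calling the CLI from a script, it can help to assemble the argument list in one place. The sketch below uses only flags listed in the help text above. Two assumptions apply: the help text shows the program name `philter`, while the installed command is `philter_lite`, and it is assumed here that the installed command accepts the same flags. The helper `build_philter_command` and the example paths are hypothetical.

```python
import shlex
from typing import List, Optional


def build_philter_command(
    input_path: str,
    output_dir: str,
    filters_path: Optional[str] = None,
    output_format: str = "asterisk",
    executable: str = "philter_lite",
) -> List[str]:
    """Assemble an argument list using only flags from the help text above."""
    if output_format not in ("asterisk", "i2b2"):
        raise ValueError('outputformat must be "asterisk" or "i2b2"')
    command = [executable, "-i", input_path, "-o", output_dir]
    if filters_path is not None:
        command += ["-f", filters_path]
    command += ["--outputformat", output_format]
    return command


# Example paths are placeholders, not defaults of the tool.
command = build_philter_command("./notes/", "./deidentified/")
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would execute it. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.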


