CRATE: clinical records anonymisation and text extraction

# CRATE
**Clinical Records Anonymisation and Text Extraction (CRATE)**

## Purpose
- Anonymises relational databases.
- Operates a GATE natural language processing (NLP) pipeline.
- Includes a tool to audit all MySQL queries (with user details) via a TCP
proxy.
- Web app for:
  - querying the anonymised database
  - managing a consent-to-contact process

## Directory structure with key files

- `anonymise/`
  - **`anonymise.py`** – core program
  - `launch_makedata.sh` – launcher for make_demo_database.py
  - `launch_multiprocess_anonymiser.sh` – parallel processing
    (multiprocess) launcher for anonymise.py
  - `make_demo_database.py` – creates a demonstration database
  - `test_anonymisation.py` – generates a comparison of records between
    source and destination databases, to check anonymisation.

- `bug_reports/` – relating to bugs in others' code

- `built_packages/` – workspace to store new Debian package files

- **`crateweb/`** – Django web application, as above

- `ditched/` – ignored

- **`docs/`** – documentation

- `mysql_auditor/` – auditing tool for MySQL
  - `mysql_auditor.conf` – sample configuration file; edit for your own
    needs.
  - `mysql_auditor.sh` – launcher for mysql-proxy with auditing script;
    it fires up mysql-proxy (which communicates with MySQL on port A and makes
    another MySQL instance appear on port B, inserting a script in between);
    it stores the stdout/stderr output from the script in a disk log if
    requested.
  - `query_auditor_mysqlproxy.lua` – Lua script that implements the
    auditor; this is used by the external mysql-proxy tool; its output is to
    stdout/stderr.

- `nlp_manager/` – NLP interface tool
  - `buildjava.sh` – script to compile the necessary Java source on your
    machine
  - `CamAnonGatePipeline.java` – Java code to interface between
    nlp_manager.py (via stdin/stdout) and the Java-based external GATE tools
    (via code); must be compiled before use
  - `launch_multiprocess_nlp.sh` – parallel processing (multiprocess)
    launcher for nlp_manager.py
  - `nlp_manager.py` – core program to pipe parts of a database to a GATE
    program and insert the output back into a database; uses
    CamAnonGatePipeline.java to communicate with the NLP app
  - `runjavademo.sh` – directly executes CamAnonGatePipeline using the
    ANNIE demo GATE app, for testing

- `pythonlib/` – common RNC Python libraries (a Git subtree)

- `tools/`
  - **`install_virtualenv.sh`** – creates a suitable virtualenv for CRATE
  - ...

- `working/` – ignored

- `changelog.Debian` – Debian package changelog and general version history
- `LICENCE` – Apache license applicable to CRATE
- `README.md` – this file
- `requirements.txt` – Python pip requirements
- `requirements-ubuntu.txt` – Ubuntu/Debian package requirements
- `VERSION.txt` – package version number, read by package build script
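
The stdin/stdout link between `nlp_manager.py` and the compiled Java pipeline, described above, follows a common pattern: a parent process writes text to a long-running child and reads results back until a terminator line appears. The sketch below illustrates that pattern only. It is not CRATE's actual code: the child here is a small Python stand-in rather than the Java GATE pipeline, and the `END_OF_OUTPUT` marker and the names `start_child`, `process_text` and `run_demo` are hypothetical.

```python
import subprocess
import sys

# Assumed terminator line; CRATE's real protocol may differ.
TERMINATOR = "END_OF_OUTPUT"

# Hypothetical stand-in for the external NLP program: for each input line it
# prints one "result" line (the text upper-cased) and then the terminator.
CHILD_SOURCE = r'''
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    print(line.rstrip("\n").upper())
    print("END_OF_OUTPUT")
    sys.stdout.flush()
'''


def start_child() -> subprocess.Popen:
    """Launch the stand-in child with pipes attached to its stdin/stdout."""
    return subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered on the parent side
    )


def process_text(child: subprocess.Popen, text: str) -> list:
    """Send one piece of text; collect output lines up to the terminator."""
    # Newlines are flattened so that one record is exactly one input line.
    child.stdin.write(text.replace("\n", " ") + "\n")
    child.stdin.flush()
    results = []
    while True:
        line = child.stdout.readline()
        if not line:
            raise RuntimeError("child closed its output unexpectedly")
        line = line.rstrip("\n")
        if line == TERMINATOR:
            return results
        results.append(line)


def run_demo() -> list:
    """Feed two sample records through a single child process."""
    child = start_child()
    try:
        return [process_text(child, t) for t in ["first note", "second note"]]
    finally:
        child.stdin.close()  # signals end-of-input to the child
        child.wait()


if __name__ == "__main__":
    print(run_demo())
```

Keeping one child alive across many records avoids paying the start-up cost of the external program for every database row, which is the usual reason for choosing a pipe over one process launch per record.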

## Copyright/licensing

- CRATE: copyright © 2015 Rudolf Cardinal (rudolf@pobox.com).
- Licensed under the Apache License, version 2.0: see LICENSE file.
- Third-party code/libraries included:
  - aspects of CamAnonGatePipeline.java are based on demonstration GATE code,
    copyright © University of Sheffield, and licensed under the GNU LGPL
    (which license is therefore used for nlp_manager/CamAnonGatePipeline.java;
    q.v.).

This version

0.14
