CRATE: clinical records anonymisation and text extraction
Purpose
Anonymises relational databases.
Performs some database-specific preprocessing tasks, e.g.:
  preprocesses some specific databases (e.g. Servelec RiO EMR);
  drafts a “data dictionary” for anonymisation, with special knowledge of some databases (e.g. TPP SystmOne);
  fetches word lists, e.g. forenames, surnames, and eponyms.
Provides a natural language processing (NLP) pipeline.
Provides a web app for:
  querying the anonymised database;
  managing a consent-to-contact process.
Documentation
Sources
Python package: https://pypi.org/project/crate-anon/
Source code: https://github.com/RudolfCardinal/crate
Licence
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
Licensed under the GNU GPL v3+: see the LICENSE file.
Some third-party libraries have slightly different licences:
aspects of CamAnonGatePipeline.java are based on demonstration GATE code, copyright (C) University of Sheffield, and licensed under the GNU LGPL; see https://gate.ac.uk/.