Estimate the similarity of medical concepts based on the Unified Medical Language System (UMLS)
UMLS-Similarity
Estimate the similarity of medical concepts based on the Unified Medical Language System (UMLS) and WordNet.
Installation
First, install a Perl environment (Strawberry Perl).
For UMLS use:
- Install MySQL and MySQL Workbench. The MySQL home folder must not have a space in its path.
- Download the UMLS and extract the subset.
- Go to the UMLS META and NET folders and load the UMLS data into the MySQL database with the scripts found there.
- Install the necessary Perl libraries with the cpanm command and the --force flag, as below:
    cpanm UMLS::Interface --force
    cpanm UMLS::Similarity --force
  Errors may occur during this step; they can be ignored.
- Check that DBI and DBD::mysql are installed; install them if they are not.
  - Issue: a "mysql.xs.dll not found" error may appear; more details are available in the linked page.
  - Solution: copy C:\strawberry\c\bin\libmysql.dll_ to C:\strawberry\perl\vendor\lib\auto\mysql
- Finished!
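Before moving on, it can help to confirm that the tools and Perl modules from the steps above are actually reachable. The snippet below is only a minimal sketch and is not part of umls-similarity: the helper names `missing_tools` and `perl_module_installed` and the two lists are assumptions made for illustration. It checks that executables are on PATH and that `perl -M<module> -e 1` exits cleanly; it does not verify that the UMLS tables were loaded into MySQL.

```python
import shutil
import subprocess

# Executables and Perl modules named in the steps above.
# Both lists are assumptions for illustration; adjust them to your setup.
REQUIRED_TOOLS = ("perl", "cpanm", "mysql")
REQUIRED_MODULES = ("DBI", "DBD::mysql", "UMLS::Interface", "UMLS::Similarity")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def perl_module_installed(module, perl="perl"):
    """Return True if Perl can load the module (`perl -M<module> -e 1`)."""
    try:
        result = subprocess.run([perl, f"-M{module}", "-e", "1"],
                                capture_output=True)
    except OSError:
        # The Perl binary itself could not be started.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
    if "perl" not in missing:
        for module in REQUIRED_MODULES:
            status = "ok" if perl_module_installed(module) else "NOT installed"
            print(f"{module}: {status}")
```

If a module is reported as not installed, rerunning the matching cpanm command from the list above is the first thing to try.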
For WordNet use (skip this part if you do not need WordNet similarity):
- Download WordNet 2.1.
- Set the WNHome environment variable.
- Install WordNet::QueryData via the cpanm command.
- Install WordNet::Similarity via the cpanm command.
- Finished!
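A quick way to check the WordNet steps is to read the environment variable back and look at the folder it points to. This is a rough sketch under two assumptions that do not come from the package itself: the helper name `wordnet_home` is made up for illustration, and the WordNet folder is assumed to contain a `dict` subfolder with the database files.

```python
import os
from pathlib import Path


def wordnet_home(env=os.environ, var_name="WNHome"):
    """Return the WordNet folder named by the variable, or None if unusable."""
    value = env.get(var_name)
    if not value:
        return None
    root = Path(value)
    # Assumption: the WordNet database files live in a "dict" subfolder.
    return root if (root / "dict").is_dir() else None


if __name__ == "__main__":
    home = wordnet_home()
    if home is None:
        print("WNHome is not set or does not point to a WordNet folder.")
    else:
        print("WordNet found at", home)
```

Environment variable names are case-insensitive on Windows but case-sensitive on most other systems, so pass a different `var_name` if your variable is spelled differently.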
Finally, install our Python package umls-similarity via pip:
pip install umls-similarity
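To confirm that the install worked in the interpreter you intend to use, you can ask Python whether it can locate the module. This is a small sketch; the helper name `is_installed` is hypothetical, while the module name `umls_similarity` is the one imported in the examples in this README.

```python
import importlib.util


def is_installed(module_name="umls_similarity"):
    """Return True if the top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("umls_similarity installed:", is_installed())
```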
Available similarity measures
- Leacock and Chodorow (1998) referred to as lch
- Wu and Palmer (1994) referred to as wup
- Zhong, et al. (2002) referred to as zhong
- The basic path measure referred to as path
- The undirected path measure referred to as upath
- Rada, et al. (1989) referred to as cdist
- Nguyen and Al-Mubaid (2006) referred to as nam
- Resnik (1995) referred to as res
- Lin (1998) referred to as lin
- Jiang and Conrath (1997) referred to as jcn
- The vector measure referred to as vector
- Pekar and Staab (2002) referred to as pks
- Pirro and Euzenat (2010) referred to as faith
- Maedche and Staab (2001) referred to as cmatch
- Batet, et al. (2011) referred to as batet
- Sanchez, et al. (2012) referred to as sanchez
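The abbreviations in this list are the values passed as the `measure` argument in the examples further down. As an illustration only, the sketch below collects them in a tuple and rejects unknown names before any call is made. The names `MEASURES` and `check_measure` are assumptions for this example and not part of the package; at runtime, `get_all_measures()` on a `UMLSSimilarity` instance remains the authoritative source.

```python
# Abbreviations copied from the list above, in the same order.
MEASURES = (
    "lch", "wup", "zhong", "path", "upath", "cdist", "nam", "res",
    "lin", "jcn", "vector", "pks", "faith", "cmatch", "batet", "sanchez",
)


def check_measure(name):
    """Return the normalized abbreviation, or raise ValueError if unknown."""
    key = name.strip().lower()
    if key not in MEASURES:
        raise ValueError(
            f"unknown measure {name!r}; expected one of: {', '.join(MEASURES)}"
        )
    return key


if __name__ == "__main__":
    print(check_measure("LCH"))
```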
Let the Code Speak
Example Code 1: Estimate the similarity between two medical concepts using UMLS
import os

from umls_similarity.umls import UMLSSimilarity

if __name__ == "__main__":
    # MySQL connection settings for the database that stores the UMLS data
    mysql_info = {
        "database": "umls",
        "username": "root",
        "password": "{I am not gonna tell you}",
        "hostname": "localhost",
    }

    # The Perl binary path is detected automatically by the library,
    # but you can also specify it manually in the constructor
    # perl_bin_path = r"C:\Strawberry\perl\bin\perl"

    # Create an instance
    umls_sim = UMLSSimilarity(mysql_info=mysql_info,
                              # perl_bin_path=''
                              )

    # Show the names of all available measures; any of them can be
    # passed as the `measure` parameter below
    measures = umls_sim.get_all_measures()
    print(measures)

    # Pass two CUIs directly
    sims = umls_sim.similarity(cui1="C0017601", cui2="C0232197", measure="lch")
    print(sims[0])  # only one pair with two concepts

    # Or batch process many CUI pairs from a text file in which each line
    # is formatted like 'C0006949<>C0031507'
    current_path = os.path.dirname(os.path.realpath(__file__))
    sims = umls_sim.similarity_from_file(
        os.path.join(current_path, "cuis_umls_sim.txt"), measure="lch")
    for sim in sims:
        print(sim)
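The batch call in Example Code 1 reads a text file with one `CUI<>CUI` pair per line. The following sketch shows one way such a file could be produced. The helper names `format_pair` and `write_pairs` are hypothetical, and the validation assumes the usual CUI shape of the letter C followed by seven digits.

```python
import re

# A CUI is written as the letter C followed by seven digits, e.g. C0006949.
CUI_PATTERN = re.compile(r"C\d{7}")


def format_pair(cui1, cui2):
    """Return one input line such as 'C0006949<>C0031507'."""
    for cui in (cui1, cui2):
        if not CUI_PATTERN.fullmatch(cui):
            raise ValueError(f"not a valid CUI: {cui!r}")
    return f"{cui1}<>{cui2}"


def write_pairs(pairs, path):
    """Write one formatted pair per line and return the number of lines written."""
    lines = [format_pair(a, b) for a, b in pairs]
    with open(path, "w", encoding="utf-8") as handle:
        handle.writelines(line + "\n" for line in lines)
    return len(lines)


if __name__ == "__main__":
    pairs = [("C0006949", "C0031507"), ("C0017601", "C0232197")]
    count = write_pairs(pairs, "cuis_umls_sim.txt")
    print(f"wrote {count} pairs")
```

Validating before writing keeps malformed identifiers out of the file, so a typo is reported in Python rather than surfacing later from the Perl side.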
Example Code 2: Estimate the similarity between two concepts using WordNet 2.1
from umls_similarity.wordnet import WNSimilarity

if __name__ == "__main__":
    # Folder where WordNet 2.1 is installed
    wn_root_path = r"C:\Program Files (x86)\WordNet\2.1"
    # perl_bin_path = r"C:\Strawberry\perl\bin\perl"

    # Word senses are written as word#part-of-speech#sense-number
    var1 = "dog#n#1"
    var2 = "orange#n#1"

    wn_sim = WNSimilarity(wn_root_path=wn_root_path)
    sims = wn_sim.similarity(var1, var2)
    print(sims)
    # k is a running index, v is a key of sims, and sims[v] is its value
    for k, v in enumerate(sims):
        print(k, '\t', v, '\t', sims[v])
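The inputs in Example Code 2 use the `word#pos#sense` notation, such as `dog#n#1`. The sketch below builds and parses such strings so that malformed input is caught early. The names `Sense`, `parse_sense`, and `format_sense` are hypothetical helpers for illustration, and the set of part-of-speech letters (n, v, a, r) is the one conventionally used with WordNet.

```python
from typing import NamedTuple

# Part-of-speech letters: noun, verb, adjective, adverb.
VALID_POS = {"n", "v", "a", "r"}


class Sense(NamedTuple):
    word: str
    pos: str
    number: int


def parse_sense(text):
    """Split a string such as 'dog#n#1' into its three parts."""
    parts = text.split("#")
    if len(parts) != 3:
        raise ValueError(f"expected word#pos#sense, got {text!r}")
    word, pos, number = parts
    if not word or pos not in VALID_POS or not number.isdigit() or int(number) < 1:
        raise ValueError(f"invalid sense string: {text!r}")
    return Sense(word, pos, int(number))


def format_sense(sense):
    """Turn a Sense back into the word#pos#sense string."""
    return f"{sense.word}#{sense.pos}#{sense.number}"


if __name__ == "__main__":
    sense = parse_sense("dog#n#1")
    print(sense, format_sense(sense))
```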
Credits
This project is a wrapper around the Perl libraries UMLS::Similarity and UMLS::Interface.
Note: Plenty of unexpected errors may occur while installing the Perl library UMLS::Similarity, possibly because I am not an expert in Perl and its libraries.
License
The umls-similarity Python package is provided by Donghua Chen.