iamsystem

A Python implementation of the IAMsystem algorithm, a fast dictionary-based approach to semantic annotation (a.k.a. entity linking).

Installation

pip install iamsystem

Usage

You provide a list of keywords to detect in a document. You can add and combine abbreviations, normalization methods (lemmatization, stemming) and approximate string matching algorithms; the IAMsystem algorithm then performs the semantic annotation.

See the documentation for the configuration details.

Quick example

from iamsystem import Matcher

matcher = Matcher.build(
    keywords=["North America", "South America"],
    stopwords=["and"],
    abbreviations=[("amer", "America")],
    spellwise=[dict(measure="Levenshtein", max_distance=1)],
    w=2,
)
annots = matcher.annot_text(text="Northh and south Amer.")
for annot in annots:
    print(annot)
# Northh Amer	0 6;17 21	North America
# south Amer	11 21	South America
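
The printed lines above look like three tab-separated fields: the matched text, its character offsets, and the keyword. A match made of non-adjacent tokens seems to be written as several "start end" pairs joined by ";". The sketch below only checks that reading against the example text. The layout is inferred from the output shown here, and `parse_annotation_line` is a hypothetical helper, not part of iamsystem.

```python
def parse_annotation_line(line):
    """Split 'matched text<TAB>offsets<TAB>keyword' into its three parts.

    Assumes the layout seen in the example output; not an iamsystem function.
    """
    matched, offsets, keyword = line.split("\t")
    # "0 6;17 21" -> [(0, 6), (17, 21)]
    spans = [tuple(int(n) for n in span.split()) for span in offsets.split(";")]
    return matched, spans, keyword


text = "Northh and south Amer."
line = "Northh Amer\t0 6;17 21\tNorth America"

matched, spans, keyword = parse_annotation_line(line)
# Slice the document with each (start, end) pair to recover the matched tokens.
fragments = [text[start:end] for start, end in spans]

print(spans)      # [(0, 6), (17, 21)]
print(fragments)  # ['Northh', 'Amer']
```

Under this reading, the end offset is exclusive, so each pair can be used directly as a Python slice.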

Algorithm

The algorithm was developed in the context of a PhD thesis. It proposes a solution to quickly annotate documents using a large dictionary (> 300K keywords) and fuzzy matching algorithms. This package does not implement any string distance algorithm itself; it imports and leverages external libraries such as spellwise, pysimstring and nltk. Its algorithmic complexity is O(n log(m)), where n is the number of tokens in a document and m is the size of the dictionary. The formalization of the algorithm is available in this paper.
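
To give an intuition for dictionary-based annotation with fuzzy matching, here is a minimal sketch of one way to do it: keywords are stored token by token in a trie, and the document is walked once while a set of partial matches is kept alive for up to `w` tokens. This is a simplified illustration written for this README, not the package's actual implementation. The names `Node`, `build_trie` and `annotate` are hypothetical, the hand-written `levenshtein` stands in for an external string distance library, and the linear scan over child nodes does not reach the O(n log(m)) complexity stated above.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


def levenshtein(a, b):
    """Edit distance by dynamic programming (stand-in for an external library)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            # deletion, insertion, substitution (free if the characters match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # normalized token -> Node
    keyword: Optional[str] = None  # set on the node that ends a keyword


def build_trie(keywords):
    """Store each keyword as a path of lowercased tokens."""
    root = Node()
    for kw in keywords:
        node = root
        for tok in kw.lower().split():
            node = node.children.setdefault(tok, Node())
        node.keyword = kw
    return root


def annotate(tokens, root, stopwords=(), abbreviations=None, max_distance=1, w=1):
    """Return (matched document tokens, keyword) pairs.

    Hypothetical sketch: w=1 requires adjacent tokens, w=2 tolerates one
    unmatched token in between, and stopwords are skipped before counting.
    """
    abbreviations = abbreviations or {}
    found = []
    # A state is (trie node, document tokens matched so far, position of last match).
    states = [(root, [], -1)]
    pos = 0  # position among non-stopword tokens
    for tok in tokens:
        norm = tok.lower()
        if norm in stopwords:
            continue
        norm = abbreviations.get(norm, norm)  # expand a known abbreviation
        reached = []
        for node, matched, _ in states:
            for label, child in node.children.items():
                if levenshtein(norm, label) <= max_distance:
                    reached.append((child, matched + [tok], pos))
                    if child.keyword is not None:
                        found.append((" ".join(matched + [tok]), child.keyword))
        # Keep the root forever; drop partial matches that fell out of the window.
        states = [s for s in states + reached if s[0] is root or pos - s[2] < w]
        pos += 1
    return found


root = build_trie(["North America", "South America"])
tokens = re.findall(r"\w+", "Northh and south Amer.")
result = annotate(tokens, root, stopwords={"and"},
                  abbreviations={"amer": "america"}, max_distance=1, w=2)
print(result)
# [('Northh Amer', 'North America'), ('south Amer', 'South America')]
```

With `w=1` the partial match on "Northh" expires before "Amer" is read, so only "South America" is found; the window is what lets the two keywords share the final token.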

The algorithm was initially developed in Java (https://github.com/scossin/IAMsystem). It has taken part in several semantic annotation competitions in the medical field, where it obtained satisfactory results, including the best results in the Codiesp shared task. A dictionary-based model can achieve performance close to that of a transformer-based model when the task is simple or when the training set is small. Its main advantage is its speed, which allows a baseline to be generated quickly.

Citation

@article{cossin_iam_2018,
	title = {{IAM} at {CLEF} {eHealth} 2018: {Concept} {Annotation} and {Coding} in {French} {Death} {Certificates}},
	shorttitle = {{IAM} at {CLEF} {eHealth} 2018},
	url = {http://arxiv.org/abs/1807.03674},
	urldate = {2018-07-11},
	journal = {arXiv:1807.03674 [cs]},
	author = {Cossin, Sébastien and Jouhet, Vianney and Mougin, Fleur and Diallo, Gayo and Thiessard, Frantz},
	month = jul,
	year = {2018},
	note = {arXiv: 1807.03674},
	keywords = {Computer Science - Computation and Language},
}


