A Python library for a learning health system

medkit

medkit is a toolkit for a learning health system, developed by the HeKA research team.

This Python library aims to:

  1. Facilitate the manipulation of healthcare data of various modalities (e.g., structured, text, audio data) for the extraction of relevant features.

  2. Develop supervised models from these various modalities for decision support in healthcare.

Installation

To install medkit with basic functionalities:

pip install medkit-lib

To install medkit with all its optional features:

pip install 'medkit-lib[all]'
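
To confirm which version ended up in the current environment, a quick check with the standard library can help. This is only a minimal sketch: the helper name `installed_version` is hypothetical, and the only thing taken from the commands above is the distribution name `medkit-lib`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="medkit-lib"):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment
        return None


found = installed_version()
print(f"medkit-lib {found}" if found else "medkit-lib is not installed")
```

The function returns `None` instead of raising, so it can be used in a setup script to decide whether to run one of the `pip install` commands above.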

Example

A basic named-entity recognition pipeline using medkit:

# 1. Define individual operations.
from medkit.text.preprocessing import CharReplacer, LIGATURE_RULES, SIGN_RULES
from medkit.text.segmentation import SentenceTokenizer, SyntagmaTokenizer
from medkit.text.context.negation_detector import NegationDetector
from medkit.text.ner.hf_entity_matcher import HFEntityMatcher

# Preprocessing
char_replacer = CharReplacer(rules=LIGATURE_RULES + SIGN_RULES)
# Segmentation
sent_tokenizer = SentenceTokenizer(output_label="sentence")
synt_tokenizer = SyntagmaTokenizer(output_label="syntagma")
# Negation detection
neg_detector = NegationDetector(output_label="is_negated")
# Entity recognition
entity_matcher = HFEntityMatcher(model="my-BERT-model", attrs_to_copy=["is_negated"])

# 2. Combine operations into a pipeline.
from medkit.core.pipeline import Pipeline, PipelineStep

ner_pipeline = Pipeline(
    input_keys=["full_text"],
    output_keys=["entities"],
    steps=[
        PipelineStep(char_replacer, input_keys=["full_text"], output_keys=["clean_text"]),
        PipelineStep(sent_tokenizer, input_keys=["clean_text"], output_keys=["sentences"]),
        PipelineStep(synt_tokenizer, input_keys=["sentences"], output_keys=["syntagmas"]),
        PipelineStep(neg_detector, input_keys=["syntagmas"], output_keys=[]),
        PipelineStep(entity_matcher, input_keys=["syntagmas"], output_keys=["entities"]),
    ],
)

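# 3. Run the NER pipeline on a directory of BRAT documents.
from medkit.io import BratInputConverter

# The directory is passed positionally so the call does not depend on the parameter name.
docs = BratInputConverter().load("/path/to/dataset/")
# Each document's raw segment holds its full text, which feeds the "full_text" input key.
entities = ner_pipeline.run([doc.raw_segment for doc in docs])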
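
The pipeline above connects its steps through named keys: each step reads the data stored under its `input_keys` and publishes its result under its `output_keys`. The toy code below illustrates that wiring idea in plain Python. It is not medkit's implementation, and `Step` and `run_steps` are hypothetical names. It also assumes that each step writes at most one key.

```python
from typing import Callable, Dict, List, NamedTuple


class Step(NamedTuple):
    """Hypothetical stand-in for a pipeline step: a function plus the keys it reads and writes."""
    func: Callable[..., object]
    input_keys: List[str]
    output_keys: List[str]


def run_steps(steps: List[Step], inputs: Dict[str, object], output_keys: List[str]) -> List[object]:
    """Run steps in order, passing data between them through named keys."""
    data = dict(inputs)
    for step in steps:
        # Look up the step's inputs by key
        args = [data[key] for key in step.input_keys]
        result = step.func(*args)
        # A step with no output keys adds nothing to `data`; it runs only for its side effects
        if len(step.output_keys) == 1:
            data[step.output_keys[0]] = result
    return [data[key] for key in output_keys]


steps = [
    Step(str.strip, ["full_text"], ["clean_text"]),
    Step(lambda text: text.split(". "), ["clean_text"], ["sentences"]),
    Step(lambda sentences: [s.upper() for s in sentences], ["sentences"], ["shouted"]),
]
(result,) = run_steps(steps, {"full_text": "  No fever. Mild cough  "}, ["shouted"])
print(result)  # ['NO FEVER', 'MILD COUGH']
```

Because steps refer to each other only through key names, one can be swapped or inserted without touching the rest, as long as the keys still line up.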

Getting started

To get started with medkit, please check out our documentation.

This documentation also contains tutorials and examples showcasing the use of medkit for different tasks.

Contributing

Thank you for your interest in medkit!

We would be happy to get your input!

If your problem has not been reported by another user, please open an issue, whether it's for:

  • reporting a bug,
  • discussing the current state of the code,
  • submitting a fix,
  • proposing new features,
  • or contributing to documentation, ...

If you want to propose a pull request, you can read CONTRIBUTING.md.

Contact

Feel free to contact us by sending an email to medkit-maintainers@inria.fr.
