Model storage and lifecycle management for MedCAT with support for local, remote [TODO], and cloud [TODO] backends.

MedCAT-den

This is a remote (or user / machine local) storage addon for MedCAT. The idea is that instead of having duplicate models on disk for every project and user, you can reuse the ones centrally saved somewhere else.

However, the model is still loaded into memory locally. This means the underlying text data never has to move between machines.

The idea / functionality

The current idea is that the den will provide the following functionality:

  • Allow listing of available models
  • Allow fetching a model
  • Allow pushing a model (after fine tuning)

The workflow for the user would be (roughly) as follows:

  • Instantiate a den instance
    • The default can be configured solely through environment variables
    • Though if nothing is set, it may default to a lower-functionality user-specific storage
  • Use the den to get a list of available models
    • Each model has a hash and some description
  • Fetch a model from the den to use in local memory
    • This downloads the model pack to a temporary folder
    • Then extracts it there
    • Subsequently loads it into memory
    • And removes the temporary files
  • Use model as needed
    • Either inference
    • Or fine-tuning
  • If the model was fine-tuned, push it back to the den
    • This will pack up the model to a temporary folder
    • Then push the .zip as an experiment to the remote

How to use

To use MedCAT-den, just get yourself a den:

from medcat_den.den import get_default_den
den = get_default_den()

And then list your available models:

models = den.list_available_models()
print("Models:", models)

Then fetch a specific model pack:

cat = den.fetch_model(models[0])

Once you're done with your model, you can push it back using:

den.push_model(cat, "Did some fine-tuning")

Injecting the den to CAT.load_model_pack

There is now the option to inject the den functionality directly into CAT.load_model_pack. That is to say, if this is used, CAT.load_model_pack will use MedCAT-Den to fetch the model from either the remote or local den.

There are a number of ways to do this.

  1. You can use the context manager approach:
    from medcat_den.injection import injected_den
    with injected_den():
      pass # Do the model load
    # now the injection is turned off
    
  2. You can directly call the injector before using anything else:
    from medcat_den.injection import inject_into_medcat, uninject_into_medcat
    inject_into_medcat()
    # do the model load(s)
    uninject_into_medcat()  # undo injection
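
The context manager in option 1 ensures the injection is turned off when the block exits. How such a temporary swap can work in general is sketched below. ModelLoader and injected are hypothetical stand-ins introduced for illustration; they are not MedCAT or MedCAT-den code.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class ModelLoader:
    """Hypothetical stand-in for a class with a disk-based loader."""

    @classmethod
    def load(cls, name: str) -> str:
        return f"disk:{name}"


@contextmanager
def injected(replacement: Callable[[str], str]) -> Iterator[None]:
    """Temporarily replace ModelLoader.load, restoring it on exit."""
    # Take the raw classmethod object so it can be put back unchanged.
    original = ModelLoader.__dict__["load"]
    ModelLoader.load = staticmethod(replacement)
    try:
        yield
    finally:
        # Runs even if the body raises, so the swap never leaks.
        ModelLoader.load = original
```

With the explicit inject_into_medcat / uninject_into_medcat pair, wrapping the model loads in try/finally gives a similar guarantee.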
    

Note that both inject_into_medcat and injected_den accept a few options:

  • den_getter: The method to get the relevant den for your use case. Defaults to get_default_den.
  • model_name_mapper: The model name mapper (if specified). Can either be a dict based mapping or a function based one.
  • prefix: The prefix for the Den-based models. If not specified, all models are expected to be den-based ones. Otherwise, only prefixed models will be loaded from the den and others will be loaded off disk.
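
To make the interplay of model_name_mapper and prefix more concrete, here is a minimal sketch of how a requested name could be resolved. It is not the library's code: the resolve function, the "den:" prefix, and the choices to strip the prefix before mapping and to pass unmapped names through unchanged are all assumptions made for illustration.

```python
from typing import Callable, Mapping, Optional, Tuple, Union

NameMapper = Union[Mapping[str, str], Callable[[str], str]]


def resolve(name: str,
            mapper: Optional[NameMapper] = None,
            prefix: Optional[str] = None) -> Tuple[str, bool]:
    """Return (resolved_name, use_den) for a requested model name."""
    if prefix is not None:
        if not name.startswith(prefix):
            # Not prefixed: treat as an ordinary on-disk model path.
            return name, False
        name = name[len(prefix):]  # assumption: prefix is stripped first
    if mapper is None:
        return name, True
    if callable(mapper):
        return mapper(name), True       # function-based mapping
    return mapper.get(name, name), True  # dict-based mapping


# Hypothetical usage: only "den:"-prefixed names go through the den.
print(resolve("den:my-model", mapper={"my-model": "abc123"}, prefix="den:"))
print(resolve("/models/pack.zip", prefix="den:"))
```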

Using the den as a runtime injection target

Given the above, you may want to enable the injector for an entire runtime. There are ways of doing that as well:

  1. Running a module
    # instead of
    python -m path.to.my_module arg1 arg2
    # you can do
    python -m medcat_den --with-injection -m path.to.my_module arg1 arg2
    
  2. Running a script
    # instead of
    python path/to/my_module.py arg1 arg2
    # you can do
    python -m medcat_den --with-injection python -m path/to/my_module.py arg1 arg2
    
  3. Running a code string
    # instead of
    python -c "from my_module import do_my_stuff;do_my_stuff()"
    # you can do
    python -m medcat_den --with-injection python -c "from my_module import do_my_stuff;do_my_stuff()"
    
  4. Running interactively
    # instead of
    python
    # you can do
    python -m medcat_den --with-injection python -i
    
  5. Running interactively after something else (e.g. a module or a script)
    # instead of
    python <whatever>
    # you can do
    python -m medcat_den --with-injection python -i <whatever>
    

Settings

The above creates a default den. If no prior configuration is done, this will be a user-local model cache.

However, a set of environment variables can be used to configure the default den:

Environment variable name | Values | Description | Comments
--- | --- | --- | ---
MEDCAT_DEN_TYPE | LOCAL_USER, LOCAL_MACHINE | The type of den to use | Currently, only local dens have been implemented, but remote (e.g. MedCATtery or even cloud) options can be implemented.
MEDCAT_DEN_PATH | str | The save path (for local backends) | This is normally determined automatically based on the OS and whether the den is user or machine local, but it can be overridden here.
MEDCAT_DEN_REMOTE_HOST | str | The host path to the remote (e.g. MedCATtery) | Not yet implemented
MEDCAT_DEN_LOCAL_CACHE_PATH | str | The local cache path (if required) | This allows caching of models from remote dens
MEDCAT_DEN_LOCAL_CACHE_EXPIRATION_TIME | int | The expiration time for the local cache (in seconds) | The default is 10 days
MEDCAT_DEN_LOCAL_CACHE_MAX_SIZE | int | The maximum size of the cache in bytes | The default is 100 GB
MEDCAT_DEN_LOCAL_CACHE_EVICTION_POLICY | str | The eviction policy for the local cache | The default is LRU

When creating a den, the resolver uses explicitly passed values first. For any value that is not provided, it falls back to the corresponding environment variable.
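
The precedence described here (explicit value first, then environment variable) can be sketched as follows. The resolve_setting helper and its default parameter are hypothetical and only illustrate the lookup order; they are not part of the MedCAT-den API.

```python
import os
from typing import Optional


def resolve_setting(explicit: Optional[str],
                    env_name: str,
                    default: Optional[str] = None) -> Optional[str]:
    """Pick a setting: explicit argument, else environment, else default."""
    if explicit is not None:
        return explicit  # an explicitly passed value always wins
    # Fall back to the environment; `default` is an assumed last resort.
    return os.environ.get(env_name, default)


# Hypothetical usage with one of the variable names from the table above.
os.environ["MEDCAT_DEN_TYPE"] = "LOCAL_MACHINE"
print(resolve_setting(None, "MEDCAT_DEN_TYPE"))
print(resolve_setting("LOCAL_USER", "MEDCAT_DEN_TYPE"))
```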
