
modular models for efficient ML development


modmod

modmod is a library for making Mod-ular Mod-els. The primary problem that modmod solves is how to load models at runtime without instantiating them multiple times; in that respect, it is essentially a dependency injection system for models.

Installation

To use modmod, install it with your package manager in the usual way (for example, pip install modmod). If you use Pipenv, you can copy/paste this:

pipenv install modmod

Usage

There are two main pieces of modmod: Models and Pools.

A Pool is a container for models. A Model can be treated like an augmented function: you call it like a function, and the Model class acts as a factory for its own instances.

Here's an example of defining the simplest possible model:

from modmod.model import Model

class AddThings(Model):
    def call(self, x: int, y: int) -> int:
        return x + y

And here is how you would use it:

import modmod.pool

pool = modmod.pool.get()

adder = pool.get(AddThings)

z = adder(1, 2)
print(z) # prints 3
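Conceptually, the behavior above amounts to a cache keyed by model class: the pool builds a model the first time it is requested and hands back the same instance afterwards. The following is a minimal, self-contained sketch of that idea, not modmod's actual implementation. SketchPool and SketchModel are hypothetical stand-ins, and the assumption that calling an instance forwards to its call method is inferred from the example above.

```python
from typing import Any, Dict, Type, TypeVar

M = TypeVar("M", bound="SketchModel")


class SketchPool:
    """Hypothetical stand-in for a modmod Pool: at most one instance per model class."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config
        self._models: Dict[type, Any] = {}

    def get(self, model_cls: Type[M]) -> M:
        # Build the model on the first request only; later requests reuse it.
        if model_cls not in self._models:
            self._models[model_cls] = model_cls.create(self, self.config)
        return self._models[model_cls]


class SketchModel:
    """Hypothetical stand-in for modmod's Model base class."""

    def __init__(self, pool: SketchPool, config: Dict[str, Any]) -> None:
        self.pool = pool
        self.config = config

    @classmethod
    def create(cls: Type[M], pool: SketchPool, config: Dict[str, Any]) -> M:
        # Default factory: simply call the constructor.
        return cls(pool, config)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Calling an instance forwards to the subclass's `call` method.
        return self.call(*args, **kwargs)

    def call(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError


class AddThings(SketchModel):
    def call(self, x: int, y: int) -> int:
        return x + y


pool = SketchPool({})
adder = pool.get(AddThings)
print(adder(1, 2))                   # prints 3
print(pool.get(AddThings) is adder)  # prints True
```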

You can also take a shortcut to get the model:

adder = AddThings.get()

However, you should never do this inside a model, because it uses the default pool and will have strange side effects if anyone tries to use your model in a non-default pool.
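To illustrate the kind of side effect meant here, the sketch below assumes a model looks up its dependencies when it is constructed. All names (SketchPool, DEFAULT_POOL, Tokenize, CountTokensGood, CountTokensBad, FakeTokenize) are hypothetical, and the pool is a simplified stand-in rather than modmod's Pool. The model that resolves its dependency through its own pool picks up a test double registered there; the one that reaches for the default pool silently ignores it.

```python
from typing import Any, Dict, List


class SketchPool:
    """Hypothetical pool: caches one instance of each model class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._models: Dict[type, Any] = {}

    def add(self, instance: Any, model_cls: type) -> None:
        # Register a hand-built instance (e.g. a test double) for a class.
        self._models[model_cls] = instance

    def get(self, model_cls: type) -> Any:
        if model_cls not in self._models:
            self._models[model_cls] = model_cls(self)
        return self._models[model_cls]


DEFAULT_POOL = SketchPool("default")


class Tokenize:
    def __init__(self, pool: SketchPool) -> None:
        self.pool = pool

    def __call__(self, text: str) -> List[str]:
        return text.split()


class CountTokensGood:
    def __init__(self, pool: SketchPool) -> None:
        # Resolve the dependency through the pool this model was built in.
        self.tokenize = pool.get(Tokenize)

    def __call__(self, text: str) -> int:
        return len(self.tokenize(text))


class CountTokensBad:
    def __init__(self, pool: SketchPool) -> None:
        # Anti-pattern: ignores its own pool and reaches into the default one.
        self.tokenize = DEFAULT_POOL.get(Tokenize)

    def __call__(self, text: str) -> int:
        return len(self.tokenize(text))


class FakeTokenize:
    """Test double that treats any input as a single token."""

    def __call__(self, text: str) -> List[str]:
        return [text]


experiment = SketchPool("experiment")
experiment.add(FakeTokenize(), Tokenize)
print(experiment.get(CountTokensGood)("a b c"))  # prints 1: uses the fake
print(experiment.get(CountTokensBad)("a b c"))   # prints 3: ignores the fake
```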

Models with initialization

Sometimes a model needs to be initialized to load in data or do other one-time startup tasks. To do this, override the constructor and the create method. Here's an example that strips stopwords:

from typing import Any, Dict, List

import nltk
from modmod.model import Model
from modmod.pool import Pool  # assumed location of the Pool class; adjust if needed

class RemoveStopwords(Model):
    def __init__(self, pool: Pool, config: Dict[str, Any], stopwords: List[str]) -> None:
        super().__init__(pool, config)
        self.stopwords = stopwords

    @classmethod
    def create(cls, pool: Pool, config: Dict[str, Any]) -> 'RemoveStopwords':
        # One-time setup: fetch the NLTK stopword corpus and build the list.
        nltk.download('stopwords')
        stopwords = nltk.corpus.stopwords.words('english')
        # Also drop empty strings.
        stopwords.append('')
        # Keep 'not' and 'no' in the text, since they carry negation.
        stopwords.remove('not')
        stopwords.remove('no')
        return cls(pool, config, stopwords)

    def call(self, words: List[str]) -> List[str]:
        return [w for w in words if w not in self.stopwords]

The create method is invoked when you call RemoveStopwords.get(). It is only called the first time you get a model; after that, the created model lives in the pool, and it will not be re-initialized.

Why are both __init__ and create required? The reason comes down to configurability and use in testing environments. In the example above, if you wanted to experiment with a new list of stopwords, you could use the constructor to create a model with that list directly and then add it to the pool:

pool = modmod.pool.get('stopwords-experiment')
config = {}

remove_new_stopwords = RemoveStopwords(pool, config, ['stop', 'word', 'list'])
pool.add_model(remove_new_stopwords, RemoveStopwords)

Once it's added to the pool, any calls to RemoveStopwords.get('stopwords-experiment') will find and retrieve the manually created model.

Note: you generally override create only when you need a heavy operation, like downloading a file or reading in some data. If your model only needs the pool and the config object, it's perfectly acceptable to override __init__ alone and keep the default behavior of create.
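The two behaviors described in this section (create running only once, and a hand-built instance taking precedence over create) can be sketched with a simplified stand-in pool. SketchPool and the create_calls counter are hypothetical, the RemoveStopwords class here is a stripped-down version that does not subclass modmod's Model, and its hard-coded stopword list is a placeholder for the NLTK download. This illustrates the idea rather than modmod's implementation.

```python
from typing import Any, Dict, List


class SketchPool:
    """Hypothetical pool that builds each model class at most once."""

    def __init__(self) -> None:
        self.config: Dict[str, Any] = {}
        self._models: Dict[type, Any] = {}

    def add_model(self, instance: Any, model_cls: type) -> None:
        # Register a hand-built instance under the class it stands in for.
        self._models[model_cls] = instance

    def get(self, model_cls: type) -> Any:
        if model_cls not in self._models:
            self._models[model_cls] = model_cls.create(self, self.config)
        return self._models[model_cls]


class RemoveStopwords:
    create_calls = 0  # counts how often the expensive factory runs

    def __init__(self, pool: SketchPool, config: Dict[str, Any], stopwords: List[str]) -> None:
        self.stopwords = set(stopwords)

    @classmethod
    def create(cls, pool: SketchPool, config: Dict[str, Any]) -> "RemoveStopwords":
        cls.create_calls += 1
        # Placeholder for an expensive load such as nltk.download(...).
        return cls(pool, config, ["the", "a", "of"])

    def __call__(self, words: List[str]) -> List[str]:
        return [w for w in words if w not in self.stopwords]


default = SketchPool()
first = default.get(RemoveStopwords)
second = default.get(RemoveStopwords)
print(first is second, RemoveStopwords.create_calls)  # prints True 1

experiment = SketchPool()
experiment.add_model(RemoveStopwords(experiment, {}, ["stop", "word", "list"]), RemoveStopwords)
print(experiment.get(RemoveStopwords)(["stop", "the", "word"]))  # prints ['the']
print(RemoveStopwords.create_calls)  # prints 1: create never ran for the experiment pool
```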

Configuring the pool

Every model receives its configuration from the pool. So, if your models need configuration, you configure the pool.

Note: the pool must be configured before you get any models, since configuring it overwrites the existing pool.

To configure the default pool:

import modmod.pool

config = {'opt1': 2}

modmod.pool.configure(config)
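Why must the pool be configured before you get any models? A minimal sketch, assuming (as the note above says) that configuring overwrites the existing pool, and further assuming that get lazily creates an unconfigured pool on first access. The module-level configure and get functions and the SketchPool class below are hypothetical, not modmod's own. Any pool reference taken before configure keeps pointing at the old, unconfigured pool and its cached models.

```python
from typing import Any, Dict


class SketchPool:
    """Hypothetical pool: a config plus a cache of model instances."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config
        self.models: Dict[type, Any] = {}


_pools: Dict[str, SketchPool] = {}
DEFAULT_NAME = "default"


def configure(config: Dict[str, Any], name: str = DEFAULT_NAME) -> None:
    # Overwrite any existing pool with this name, dropping its cached models.
    _pools[name] = SketchPool(config)


def get(name: str = DEFAULT_NAME) -> SketchPool:
    # Assumption: an unconfigured pool is created lazily on first access.
    if name not in _pools:
        _pools[name] = SketchPool({})
    return _pools[name]


early = get()                         # taken too early
early.models[object] = "cached model"
configure({"opt1": 2})
late = get()

print(early is late)                  # prints False
print(early.config, late.config)      # prints {} {'opt1': 2}
print(late.models)                    # prints {}
```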

Non-default Pools

Sometimes you will want separate pools for separate tasks. Unit testing is one example: you may want to test a model under several configurations, with each configuration in its own pool.

The first step is to configure the pool:

import modmod.pool

poolname = 'my-pool'
config = {'opt1': 2}

modmod.pool.configure(config, poolname)

The second step is just to use the pool!

import modmod.pool

pool = modmod.pool.get('my-pool')

adder = pool.get(AddThings)
# Equivalent:
adder = AddThings.get('my-pool')
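For unit tests, the useful property is that each named pool keeps its own config and its own model cache, so the same model class can be exercised under different settings side by side. A simplified sketch with hypothetical names (SketchPool, Scale), assuming a model reads its settings from the config it is constructed with; it is not modmod's implementation.

```python
from typing import Any, Dict


class SketchPool:
    """Hypothetical pool with its own config and its own model cache."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config
        self._models: Dict[type, Any] = {}

    def get(self, model_cls: type) -> Any:
        # Build each model once per pool, using that pool's config.
        if model_cls not in self._models:
            self._models[model_cls] = model_cls(self.config)
        return self._models[model_cls]


class Scale:
    """Toy model whose behavior depends on the 'opt1' setting."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.factor = config.get("opt1", 1)

    def __call__(self, x: int) -> int:
        return x * self.factor


pools = {"default": SketchPool({}), "my-pool": SketchPool({"opt1": 2})}

print(pools["default"].get(Scale)(10))  # prints 10
print(pools["my-pool"].get(Scale)(10))  # prints 20
print(pools["default"].get(Scale) is pools["my-pool"].get(Scale))  # prints False
```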

Roadmap

We have a few initiatives on the roadmap. Each of these will be a version bump:

  • Add support for data and model versioning, and for model training
  • Add hooks for profiling, debugging, caching


Download files


Source Distribution

modmod-0.2.5.tar.gz (9.8 kB)

Built Distribution

modmod-0.2.5-py3-none-any.whl (5.5 kB)

File details

Details for the file modmod-0.2.5.tar.gz.

File metadata

  • Download URL: modmod-0.2.5.tar.gz
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for modmod-0.2.5.tar.gz
Algorithm Hash digest
SHA256 01f8f913334924daefd831aef6e8e0322177f05c5e176116e6d348ec1f73cd01
MD5 3df88ca23f60d65f585f7a806ea8cda9
BLAKE2b-256 e2a6fc1aa6d794efd0121c41ef88013b51128d8a93141a4d09441a8708a56150


File details

Details for the file modmod-0.2.5-py3-none-any.whl.

File hashes

Hashes for modmod-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 20b0e23032fd4b7d73ca6f67085d08f094d7ba18dbaa96c3312caf55f65ba505
MD5 8aca5d7353726fe6c2fbcf1a57af0c33
BLAKE2b-256 67db600b0e3ad735b628fb3c752a738bc1ffa7b039198cf9b5c65fe76810e63c

