modular models for efficient ML development
modmod
modmod is a library for making Mod-ular Mod-els. The primary problem that modmod solves is how to load models at runtime without instantiating them multiple times; in that respect, it is essentially a dependency injection system for models.
Installation
To use modmod, just install it with your package manager in the usual way. If you use Pipenv, you can copy/paste this:
pipenv install modmod
Usage
There are two main pieces of modmod: Models and Pools.
A Pool is a container for models. A Model can be treated like an augmented function which is a Model factory.
Here's an example of defining the simplest possible model:
from modmod.model import Model

class AddThings(Model):
    def call(self, x: int, y: int) -> int:
        return x + y
And here is how you would use it:
import modmod.pool
pool = modmod.pool.get()
adder = pool.get(AddThings)
z = adder(1, 2)
print(z) # prints 3
You can also take a shortcut to get the model:
adder = AddThings.get()
However, this should never be done inside a model, because it always uses the default pool and will have strange side effects if anyone tries to use your model in a non-default pool.
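One way to follow this advice is to resolve dependencies through the pool the model was constructed with. The sketch below uses a hypothetical `ToyPool` and plain classes as stand-ins so that it runs without modmod; the `self.pool` attribute and all class names are assumptions for illustration, not modmod's documented API.

```python
from typing import Any, Dict, List, Type


class ToyPool:
    """Hypothetical stand-in for a pool: one instance per model class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._models: Dict[Type[Any], Any] = {}

    def get(self, model_cls: Type[Any]) -> Any:
        # Build the model on first request, then reuse it.
        if model_cls not in self._models:
            self._models[model_cls] = model_cls(self)
        return self._models[model_cls]


class Tokenize:
    def __init__(self, pool: ToyPool) -> None:
        self.pool = pool

    def __call__(self, text: str) -> List[str]:
        return text.split()


class CountWords:
    def __init__(self, pool: ToyPool) -> None:
        self.pool = pool

    def __call__(self, text: str) -> int:
        # Resolve the dependency through the pool this model lives in,
        # not through a global default pool.
        tokenize = self.pool.get(Tokenize)
        return len(tokenize(text))


experiment = ToyPool("experiment")
count_words = experiment.get(CountWords)
n_words = count_words("modular models for ML")

# The dependency was created in the same non-default pool.
same_pool = experiment.get(Tokenize).pool is experiment
```

Because `CountWords` asks its own pool for `Tokenize`, both models end up in the `experiment` pool and nothing leaks into a default one.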
Models with initialization
Sometimes a model needs to be initialized to load in data or do other one-time
startup tasks. To do this, you just override the constructor and the create
method. Here's an example for stripping stopwords:
from typing import Any, Dict, List

import nltk
from modmod.model import Model
from modmod.pool import Pool  # assumption: the original omits this import, so the location of Pool is a guess

class RemoveStopwords(Model):
    def __init__(self, pool: Pool, config: Dict[str, Any], stopwords: List[str]) -> None:
        super().__init__(pool, config)
        self.stopwords = stopwords

    @classmethod
    def create(cls, pool: Pool, config: Dict[str, Any]) -> 'RemoveStopwords':
        # One-time, heavy setup: download the corpus and build the list.
        nltk.download('stopwords')
        stopwords = nltk.corpus.stopwords.words('english')
        stopwords.append('')      # also drop empty tokens
        stopwords.remove('not')   # keep negations in the text
        stopwords.remove('no')
        return RemoveStopwords(pool, config, stopwords)

    def call(self, words: List[str]) -> List[str]:
        return list(filter(lambda w: w not in self.stopwords, words))
The create method is invoked when you call RemoveStopwords.get(). It is
only called the first time you get a model; after that, the created model
lives in the pool, and it will not be re-initialized.
Why are __init__ and create both required? This is a good question.
The reason comes down to configurability and use in testing environments.
In the example above, if you wanted to experiment with a new list of
stopwords, you could use the constructor to create a model with that list and
then add it into the pool:
pool = modmod.pool.get('stopwords-experiment')
config = {}
remove_new_stopwords = RemoveStopwords(pool, config, ['stop', 'word', 'list'])
pool.add_model(remove_new_stopwords, RemoveStopwords)
Once it's added to the pool, any calls to
RemoveStopwords.get('stopwords-experiment') will find and retrieve the
manually created model.
Note: create is generally overridden if you have to do a heavy operation,
like downloading a file or reading in some data. If you are just using the pool
and the config object, it's perfectly acceptable to override __init__ and
leave the default behavior for create.
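The split between create and __init__ can be illustrated with a small, self-contained sketch. `ToyPool` and `ToyRemoveStopwords` below are hypothetical stand-ins, not modmod's implementation; only the `add_model(model, model_cls)` argument order is taken from the example above. The sketch shows the factory running once on the default path and being skipped when a hand-built model is registered first.

```python
from typing import Any, Dict, List, Type


class ToyPool:
    """Hypothetical stand-in for a pool; not modmod's implementation."""

    def __init__(self) -> None:
        self._models: Dict[Type[Any], Any] = {}

    def add_model(self, model: Any, model_cls: Type[Any]) -> None:
        # Register a hand-built instance under its class.
        self._models[model_cls] = model

    def get(self, model_cls: Type[Any]) -> Any:
        # Fall back to the factory only when nothing is registered yet.
        if model_cls not in self._models:
            self._models[model_cls] = model_cls.create(self)
        return self._models[model_cls]


class ToyRemoveStopwords:
    create_calls = 0  # counts how often the expensive factory runs

    def __init__(self, pool: ToyPool, stopwords: List[str]) -> None:
        self.pool = pool
        self.stopwords = set(stopwords)

    @classmethod
    def create(cls, pool: ToyPool) -> "ToyRemoveStopwords":
        cls.create_calls += 1  # stands in for a download or file read
        return cls(pool, ["the", "a", "of"])

    def __call__(self, words: List[str]) -> List[str]:
        return [w for w in words if w not in self.stopwords]


words = ["the", "end", "of", "time"]

# Default path: the factory runs once, later lookups reuse the instance.
default_pool = ToyPool()
first = default_pool.get(ToyRemoveStopwords)
second = default_pool.get(ToyRemoveStopwords)
default_result = first(words)

# Experiment path: a hand-built model is registered, so create() is skipped.
experiment_pool = ToyPool()
custom = ToyRemoveStopwords(experiment_pool, ["end"])
experiment_pool.add_model(custom, ToyRemoveStopwords)
experiment_result = experiment_pool.get(ToyRemoveStopwords)(words)
```

In a test, the experiment path lets you inject a small fixed stopword list without triggering whatever heavy work create would otherwise do.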
Configuring the pool
Every model has configuration passed into it, and that configuration comes from the pool. So, if your models need configuration, you need to configure the pool.
Note: the pool must be configured before you get any models, since configuring it overwrites the existing pool.
To configure the default pool:
import modmod.pool
config = {'opt1': 2}
modmod.pool.configure(config)
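To see why ordering matters, here is a minimal sketch of a registry in which configuring replaces the pool. Everything in it (`ToyPool`, `configure`, `get_pool`, `Scale`, and the lazy creation of an unconfigured pool) is a hypothetical stand-in written for this illustration; it only mimics the overwrite behavior described in the note above.

```python
from typing import Any, Dict, Type


class ToyPool:
    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config
        self._models: Dict[Type[Any], Any] = {}

    def get(self, model_cls: Type[Any]) -> Any:
        if model_cls not in self._models:
            self._models[model_cls] = model_cls(self, self.config)
        return self._models[model_cls]


_POOLS: Dict[str, ToyPool] = {}


def configure(config: Dict[str, Any], name: str = "default") -> None:
    # Replaces any existing pool of that name, dropping its cached models.
    _POOLS[name] = ToyPool(config)


def get_pool(name: str = "default") -> ToyPool:
    # Assumption: an unconfigured pool is created lazily with an empty config.
    if name not in _POOLS:
        _POOLS[name] = ToyPool({})
    return _POOLS[name]


class Scale:
    def __init__(self, pool: ToyPool, config: Dict[str, Any]) -> None:
        self.factor = config.get("opt1", 1)

    def __call__(self, x: int) -> int:
        return x * self.factor


early = get_pool().get(Scale)   # fetched before configuration
configure({"opt1": 2})          # overwrites the default pool
late = get_pool().get(Scale)    # built from the configured pool

early_result = early(10)
late_result = late(10)
```

The model fetched early keeps the old, empty configuration and is no longer the one the pool hands out, which is the kind of surprise that configuring first avoids.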
Non-default Pools
Sometimes you will want separate pools for separate tasks. One example of this is for unit testing: you may want to test with multiple configurations of the model. To do this, you can use separate pools.
The first step is to configure the pool:
import modmod.pool
poolname = 'my-pool'
config = {'opt1': 2}
modmod.pool.configure(config, poolname)
The second step is just to use the pool!
import modmod.pool
pool = modmod.pool.get('my-pool')
adder = pool.get(AddThings)
# Equivalent:
adder = AddThings.get('my-pool')
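For the unit-testing case, the idea is that each configuration gets its own pool, so one test can exercise the same model class under several settings. The sketch below uses Python's standard unittest module with a hypothetical `ToyPool` and `Threshold` model as stand-ins; the `limit` option is invented for the example and is not a modmod setting.

```python
import unittest
from typing import Any, Dict, Type


class ToyPool:
    """Hypothetical stand-in for a configured pool."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config
        self._models: Dict[Type[Any], Any] = {}

    def get(self, model_cls: Type[Any]) -> Any:
        if model_cls not in self._models:
            self._models[model_cls] = model_cls(self, self.config)
        return self._models[model_cls]


class Threshold:
    def __init__(self, pool: ToyPool, config: Dict[str, Any]) -> None:
        self.limit = config["limit"]

    def __call__(self, score: float) -> bool:
        return score >= self.limit


class ThresholdTest(unittest.TestCase):
    def test_each_pool_uses_its_own_config(self) -> None:
        # Two pools, two configurations, one model class.
        strict = ToyPool({"limit": 0.9}).get(Threshold)
        lenient = ToyPool({"limit": 0.5}).get(Threshold)
        self.assertFalse(strict(0.7))
        self.assertTrue(lenient(0.7))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ThresholdTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Since neither pool is the default one, the test leaves no state behind for other tests to trip over.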
Roadmap
We have a few initiatives on the roadmap. Each of these will be a version bump:
- Add support for data and model versioning, add support for model training
- Add hooks for profiling, debugging, caching
File details
Details for the file modmod-0.2.5.tar.gz.
File metadata
- Download URL: modmod-0.2.5.tar.gz
- Upload date:
- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 01f8f913334924daefd831aef6e8e0322177f05c5e176116e6d348ec1f73cd01
MD5 | 3df88ca23f60d65f585f7a806ea8cda9
BLAKE2b-256 | e2a6fc1aa6d794efd0121c41ef88013b51128d8a93141a4d09441a8708a56150
File details
Details for the file modmod-0.2.5-py3-none-any.whl.
File metadata
- Download URL: modmod-0.2.5-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 20b0e23032fd4b7d73ca6f67085d08f094d7ba18dbaa96c3312caf55f65ba505
MD5 | 8aca5d7353726fe6c2fbcf1a57af0c33
BLAKE2b-256 | 67db600b0e3ad735b628fb3c752a738bc1ffa7b039198cf9b5c65fe76810e63c