
Open-source Python library providing a collection of Support Vector Machine (SVM) classifiers for multiple-instance learning (MIL): SVM, NSK, sMIL and sAwMIL.

Project description

Sparse Multiple-Instance Learning in Python


sAwMIL (Sparse Aware Multiple-Instance Learning) is an open-source Python library providing a collection of Support Vector Machine (SVM) classifiers for multiple-instance learning (MIL). It builds on ideas from the earlier misvm package, adapting them to current Python versions and introducing new models.

In Single-Instance Learning (SIL), the dataset consists of pairs of an instance and a label:

$$ \langle \mathbf{x}_i, y_i \rangle \text{ , where } \mathbf{x}_i \in \mathbb{R}^{d} \text{ and } y_i \in \mathcal{Y}. $$

In binary settings, the label is $y_i \in \{0, 1\}$. To solve this problem, we can use a standard SVM model.
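For reference, the standard soft-margin SVM (with the binary labels mapped to $\pm 1$) solves

$$ \min_{\mathbf{w}, b, \boldsymbol{\xi}} \ \frac{1}{2} \lVert \mathbf{w} \rVert^2 + C \sum_{i} \xi_i \quad \text{s.t.} \quad y_i \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \ge 1 - \xi_i, \ \xi_i \ge 0. $$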

In Multiple-Instance Learning (MIL), the dataset consists of bags of instances paired with a single bag-level label:

$$ \langle \mathbf{X}_i, y_i \rangle \text{ , where } \mathbf{X}_i = \{ \mathbf{x}_{1}, \mathbf{x}_{2}, \ldots, \mathbf{x}_{n_i} \}, \ \mathbf{x}_j \in \mathbb{R}^{d} \text{ and } y_i \in \mathcal{Y}. $$

To solve this problem, we can use NSK or sMIL models.
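NSK here refers to the normalized set kernel: bags are compared through an instance-level kernel, and with the average normalizer used in the quick start below the bag-level kernel is (roughly) the mean of the pairwise instance kernel values:

$$ k_{\text{NSK}}(\mathbf{X}_i, \mathbf{X}_j) = \frac{1}{n_i \, n_j} \sum_{\mathbf{x} \in \mathbf{X}_i} \sum_{\mathbf{x}' \in \mathbf{X}_j} k(\mathbf{x}, \mathbf{x}'). $$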

In some cases, each bag, along with its instances and label, can include an intra-bag mask that specifies which instances are likely to carry the signal related to $y_i$. In that case, we have a triplet $\langle \mathbf{X}_i, \mathbf{M}_i, y_i \rangle$, where

$$ \mathbf{M}_i = \{ m_1, m_2, \ldots, m_{n_i} \} \text{ , where } m_j \in \{0,1\}. $$

To solve this problem, one can use the sAwMIL model.
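As a plain-NumPy illustration of this triplet structure (a hypothetical layout, not sawmil's own bag container), a MIL dataset with intra-bag masks can be held as parallel lists:

import numpy as np

# Hypothetical illustration of the MIL triplet (not sawmil's internal format):
# each bag is an (n_i, d) array, paired with a bag-level label and an intra-bag mask.
rng = np.random.default_rng(0)
bags   = [rng.normal(size=(5, 2)), rng.normal(size=(8, 2))]     # X_i
labels = np.array([1, 0])                                       # y_i
masks  = [np.array([1, 0, 0, 1, 0]), np.zeros(8, dtype=int)]    # M_i
for X_i, m_i, y_i in zip(bags, masks, labels):
    print(X_i.shape, "marked instances:", int(m_i.sum()), "label:", y_i)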

Installation

sawmil supports three QP backends: Gurobi, OSQP, and DAQP.

By default, the base package installs without any solver; pick one (or more) via the extras below.

Base package (no solver)

pip install sawmil
# it installs numpy>=1.22 and scikit-learn>=1.7.0

Option 1: Gurobi backend

Gurobi is commercial software. You’ll need a valid license (academic or commercial); refer to the official website for details.

pip install "sawmil[gurobi]"
# in addition to the base packages, it installs gurobi>12.0.3

Option 2: OSQP backend

pip install "sawmil[osqp]"
# in addition to the base packages, it installs osqp>=1.0.4 and scipy>=1.16.1

Option 3: DAQP backend

pip install "sawmil[daqp]"
# in addition to the base packages, it installs daqp>=0.5 and scipy>=1.16.1

Option 4: All supported solvers

pip install "sawmil[full]"

Picking the solver in code

from sawmil import SVM, RBF

k = RBF(gamma=0.1)
# SVM is the single-instance classifier
# solver="osqp" selects the OSQP backend (the default is "gurobi")
clf = SVM(C=1.0,
          kernel=k,
          solver="osqp").fit(X, y)

Quick start

1. Generate Dummy Data

from sawmil.data import generate_dummy_bags
import numpy as np
rng = np.random.default_rng(0)

ds = generate_dummy_bags(
    n_pos=300, n_neg=100, inst_per_bag=(5, 15), d=2,
    pos_centers=((+2,+1), (+4,+3)),
    neg_centers=((-1.5,-1.0), (-3.0,+0.5)),
    pos_scales=((2.0, 0.6), (1.2, 0.8)),
    neg_scales=((1.5, 0.5), (2.5, 0.9)),
    pos_intra_rate=(0.25, 0.85),
    ensure_pos_in_every_pos_bag=True,
    neg_pos_noise_rate=(0.00, 0.05),
    pos_neg_noise_rate=(0.00, 0.20),
    outlier_rate=0.1,
    outlier_scale=8.0,
    random_state=42,
)
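The returned dataset exposes bag-level labels via ds.y (used for scoring in the steps below); a quick look at its composition:

import numpy as np

# Bag-level labels of the generated dataset
y = np.asarray(ds.y)
print("bags:", y.shape[0], "positive bags:", int(y.sum()))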

2. Fit NSK with RBF Kernel

Load a kernel:

from sawmil.kernels import get_kernel, RBF
k1 = get_kernel("rbf", gamma=0.1)
k2 = RBF(gamma=0.1)
# k1 == k2
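Both handles describe the same instance-level RBF kernel. As a plain-NumPy illustration of the underlying formula (assuming the common $k(\mathbf{x}, \mathbf{z}) = \exp(-\gamma \lVert \mathbf{x} - \mathbf{z} \rVert^2)$ convention; this is not sawmil's internal kernel-evaluation API):

import numpy as np

def rbf_gram(A, B, gamma=0.1):
    # Illustrative Gram matrix: k(x, z) = exp(-gamma * ||x - z||^2)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)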

Fit NSK Model:

from sawmil.nsk import NSK

clf = NSK(C=1, kernel=k1,
          # bag kernel settings
          normalizer='average',
          # solver params
          scale_C=True, 
          tol=1e-8, 
          verbose=False).fit(ds, None)
y = ds.y
print("Train acc:", clf.score(ds, y))

3. Fit sMIL Model with Linear Kernel

from sawmil.smil import sMIL

k = get_kernel("linear") # base (single-instance kernel)
clf = sMIL(C=0.1, 
           kernel=k, 
           scale_C=True, 
           tol=1e-8, 
           verbose=False).fit(ds, None)
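Assuming sMIL shares the bag-level score method used by NSK and sAwMIL on this page, training accuracy can be checked the same way:

print("Train acc:", clf.score(ds, ds.y))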

See more examples in the example.ipynb notebook.

4. Fit sAwMIL with Combined Kernels

from sawmil.kernels import Product, Polynomial, Linear, RBF, Sum, Scale
from sawmil.sawmil import sAwMIL

k = Sum(Linear(), 
        Scale(0.5, 
              Product(Polynomial(degree=2), RBF(gamma=1.0))))

clf = sAwMIL(C=0.1, 
             kernel=k,
             solver="gurobi", 
             eta=0.95) # here eta is high, since all items in the bag are relevant
clf.fit(ds)
print("Train acc:", clf.score(ds, ds.y))

Citation

If you use the sawmil package in academic work, please cite:

Savcisens, G. & Eliassi-Rad, T. sAwMIL: Python package for Sparse Multiple-Instance Learning (2025).

@software{savcisens2025sawmil,
  author = {Savcisens, Germans and Eliassi-Rad, Tina},
  title = {sAwMIL: Python package for Sparse Multiple-Instance Learning},
  year = {2025},
  doi = {10.5281/zenodo.16990499},
  url = {https://github.com/carlomarxdk/sawmil}
}

If you want to reference a specific version of the package, use the version-specific DOI listed on Zenodo.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sawmil-0.1.11.tar.gz (26.7 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sawmil-0.1.11-py3-none-any.whl (34.7 kB)

Uploaded Python 3

File details

Details for the file sawmil-0.1.11.tar.gz.

File metadata

  • Download URL: sawmil-0.1.11.tar.gz
  • Upload date:
  • Size: 26.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sawmil-0.1.11.tar.gz
  • SHA256: 730d85e2f15d17884346801d757d14902c568f0cdafadf41b4b8fc8ca3195e25
  • MD5: 54c681c9c608b32a628cd4e7a399ae74
  • BLAKE2b-256: f4a324798ccfc555b5879aaddb165834ffb004d95d6ffebd9320367c91d849ff

See more details on using hashes here.

Provenance

The following attestation bundles were made for sawmil-0.1.11.tar.gz:

Publisher: publish.yml on carlomarxdk/sawmil

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sawmil-0.1.11-py3-none-any.whl.

File metadata

  • Download URL: sawmil-0.1.11-py3-none-any.whl
  • Upload date:
  • Size: 34.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sawmil-0.1.11-py3-none-any.whl
  • SHA256: e156a404e2e7a7e012e14312af4d536bf4498f16a12fb5999fa082d12a721143
  • MD5: 95147177b2bd81ab994b77e22af5f044
  • BLAKE2b-256: f9adc6d54548aec1b07dad11d7293c904a84d3110535fd46772499b6367e095c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sawmil-0.1.11-py3-none-any.whl:

Publisher: publish.yml on carlomarxdk/sawmil

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
