Calculating exact and approximate confidence and information metrics for deep learning on general purpose and chemistry tasks.

Project description

🧐 duvida


duvida (Portuguese for doubt) is a suite of Python tools for calculating confidence and information metrics for deep learning. It provides lower-level function transforms for exact and approximate Hessian diagonals in JAX and pytorch, as well as a higher-level framework for calculating confidence and information metrics of general-purpose and chemistry-specific neural networks.

As a bonus, duvida also provides an easy command-line interface for training and testing models.

Installation

The easy way

You can install the precompiled version directly using pip. You need to specify the machine learning framework that you want to use:

$ pip install duvida[jax]
# or
$ pip install duvida[jax_cuda12]  # for JAX installing CUDA 12 for GPU support
# or
$ pip install duvida[jax_cuda12_local]  # for JAX using a locally-installed CUDA 12
# or
$ pip install duvida[torch]

If you want to use duvida for chemistry machine learning and AI (using the pytorch backend), use:

$ pip install duvida[chem]

We have implemented JAX and pytorch functional transformations for approximate and exact Hessian diagonals, as well as for doubtscore and information sensitivity. These can be used with JAX- and pytorch-based frameworks.

At the moment, training and inference of full models in ModelBox objects is implemented only in pytorch.

From source

Clone the repository, then cd into it. Then run:

$ pip install -e .[torch]

Python API

Neural networks

The core of duvida is the ModelBox, which is a container for a trainable model and its training data. These are connected because measures of confidence and information gain depend directly on the information or evidence already seen by the model.

There are several ModelBox classes for specific deep learning architectures in pytorch.

>>> from duvida.torch.models import _MODEL_CLASSES
>>> from pprint import pprint
>>> pprint(_MODEL_CLASSES)
{'chemprop': <class 'duvida.torch.chem.ChempropModelBox'>,
 'fingerprint': <class 'duvida.torch.chem.FPMLPModelBox'>,
 'fp': <class 'duvida.torch.chem.FPMLPModelBox'>,
 'mlp': <class 'duvida.torch.models.mlp.MLPModelBox'>}

The modelboxes chemprop and fingerprint (alias fp) featurize SMILES representations of chemical structures. The modelbox mlp is a general-purpose multilayer perceptron.

You can set up your model with various training parameters.

from duvida.autoclass import AutoClass
modelbox = AutoClass(
    "fingerprint",
    n_units=16,
    n_hidden=2,
    ensemble_size=10,
)

The internal neural network is instantiated on loading training data.

modelbox.load_training_data(
    filename="hf://scbirlab/fang-2023-biogen-adme@scaffold-split:train",
    inputs="smiles",
    labels="clogp",
)

The filename can be a Huggingface dataset, in which case it is automatically downloaded. The "@" indicates the dataset configuration, and the ":" indicates the specific data split.

Alternatively, the training data can be a local CSV or TSV file. In-memory Pandas dataframes or dictionaries can be supplied through the data argument.
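The `hf://repo@config:split` address convention above can be illustrated with a short parsing sketch. The `parse_hf_address` helper below is hypothetical, written only to show how the pieces fit together; it is not part of duvida:

```python
def parse_hf_address(address: str):
    """Split an 'hf://repo@config:split' address into its parts.

    The configuration ('@...') and split (':...') are optional.
    """
    path = address.removeprefix("hf://")
    path, _, split = path.partition(":")   # split name comes after ':'
    repo, _, config = path.partition("@")  # configuration comes after '@'
    return repo, config or None, split or None

print(parse_hf_address("hf://scbirlab/fang-2023-biogen-adme@scaffold-split:train"))
# → ('scbirlab/fang-2023-biogen-adme', 'scaffold-split', 'train')
```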

With training data loaded, the model can be trained!

modelbox.train(
    val_filename="hf://scbirlab/fang-2023-biogen-adme@scaffold-split:test",
    epochs=10,
    batch_size=128,
)

The ModelBox.train() method uses PyTorch Lightning under the hood, so other options for this framework, such as callbacks, should also be accepted.

Saving and sharing a trained model

duvida provides a basic checkpointing mechanism to save model weights and training data to later reload.

modelbox.save_checkpoint("checkpoint.dv")
modelbox.load_checkpoint("checkpoint.dv")

Evaluating and predicting on new data

duvida ModelBoxes provide methods for evaluating predictions on new data.

from duvida.evaluation import rmse, pearson_r, spearman_r
predictions, metrics = modelbox.evaluate(
    filename="hf://scbirlab/fang-2023-biogen-adme@scaffold-split:test",
    metrics={
        "RMSE": rmse, 
        "Pearson r": pearson_r, 
        "Spearman rho": spearman_r
    },
)
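These metrics are standard quantities. As a reference, here is a minimal numpy sketch of what RMSE, Pearson r, and Spearman rho compute; this is an illustration of the definitions, not duvida's implementation:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def pearson_r(y_true, y_pred):
    # Linear correlation coefficient
    return float(np.corrcoef(np.asarray(y_true), np.asarray(y_pred))[0, 1])

def spearman_r(y_true, y_pred):
    # Pearson correlation of the ranks
    rank = lambda y: np.argsort(np.argsort(np.asarray(y)))
    return pearson_r(rank(y_true), rank(y_pred))

y_true, y_pred = [1., 2., 3., 4.], [1.1, 1.9, 3.2, 3.9]
print(rmse(y_true, y_pred), pearson_r(y_true, y_pred), spearman_r(y_true, y_pred))
```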

Calculating uncertainty and information metrics

duvida ModelBoxes provide methods for calculating prediction variance of ensembles, doubtscore, and information sensitivity.

doubtscore = modelbox.doubtscore(
    filename="hf://scbirlab/fang-2023-biogen-adme@scaffold-split:test"
)
info_sens = modelbox.information_sensitivity(
    filename="hf://scbirlab/fang-2023-biogen-adme@scaffold-split:test",
    approx="bekas",  # approximate Hessian diagonals
    n=10,
)
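Ensemble prediction variance is the simplest of these metrics: when ensemble members disagree on a new input, confidence is low. A self-contained numpy sketch of the idea, using a toy ensemble rather than duvida's models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "ensemble": 10 models with slightly different parameters. They agree
# near x ~ 0 and diverge for large |x|, mimicking uncertainty out of domain.
def ensemble_predict(x, n_models=10):
    slopes = 1.0 + 0.1 * rng.standard_normal(n_models)  # per-model parameters
    return np.stack([m * x ** 2 for m in slopes])       # (n_models, n_points)

x = np.array([0.1, 1.0, 10.0])
preds = ensemble_predict(x)
mean = preds.mean(axis=0)      # ensemble prediction
variance = preds.var(axis=0)   # uncertainty: grows away from x ~ 0
print(variance)
```

Doubtscore and information sensitivity refine this picture using gradient and Hessian information, which is where the approximation options such as `approx="bekas"` come in.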

To avoid storing large datasets in memory, duvida uses Huggingface datasets under the hood to cache data. Results can be instantiated in memory with a little effort. For example:

doubtscore = doubtscore.to_pandas()

See the Huggingface datasets documentation for more.

Exact and approximate Hessian diagonals

duvida provides functional transforms for JAX and pytorch that calculate either exact or approximate Hessian diagonals.

You can check which backend you're using:

>>> from duvida.stateless.config import config
>>> config
Config(backend='jax', precision='double', fallback=True)

It can be changed:

>>> config.set_backend("torch")
'torch'
>>> config
Config(backend='torch', precision='double', fallback=True)

Now you can calculate exact Hessian diagonals without calculating the full matrix:

>>> from duvida.stateless.utils import hessian
>>> from duvida.stateless.hessians import get_approximators
>>> import duvida.stateless.numpy as dnp
>>> exact_diagonal = get_approximators("exact_diagonal")
>>> f = lambda x: dnp.sum(x ** 3. + x ** 2. + 4.)
>>> a = dnp.array([1., 2.])
>>> exact_diagonal(f)(a) == dnp.diag(hessian(f)(a))
Array([ True,  True], dtype=bool)
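For this separable f, the Hessian is diagonal with entries 6x + 2, so at a = [1., 2.] the diagonal is [8., 14.]. This can be checked with plain numpy central second differences, independently of duvida:

```python
import numpy as np

def f(x):
    return np.sum(x ** 3. + x ** 2. + 4.)

def hessian_diag_fd(f, x, h=1e-3):
    # Central second difference along each coordinate:
    # d2f/dx_i^2 ≈ (f(x + h e_i) - 2 f(x) + f(x - h e_i)) / h^2
    diag = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        diag[i] = (f(x + e) - 2. * f(x) + f(x - e)) / h ** 2
    return diag

a = np.array([1., 2.])
print(hessian_diag_fd(f, a))  # close to [6 * 1 + 2, 6 * 2 + 2] = [8, 14]
```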

Various approximations are also allowed.

>>> from duvida.stateless.hessians import get_approximators
>>> get_approximators()  # No arguments to list available
('squared_jacobian', 'exact_diagonal', 'bekas', 'rough_finite_difference')

Now apply:

>>> approx_hessian_diag = get_approximators("bekas")
>>> g = lambda x: dnp.sum(dnp.sum(x) ** 3. + x ** 2. + 4.)
>>> a = dnp.array([1., 2.])
>>> dnp.diag(hessian(g)(a))  # Exact for reference
Array([38., 38.], dtype=float64)
>>> approx_hessian_diag(g, n=1000)(a)  # Less accurate when parameters interact
Array([38.52438307, 38.49679655], dtype=float64)
>>> approx_hessian_diag(g, n=1000, seed=1)(a)  # Change the seed to alter the outcome
Array([39.07878869, 38.97796601], dtype=float64)
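The bekas approximation is a stochastic diagonal estimator: for random probe vectors v with E[v vᵀ] = I (here Rademacher, entries ±1), E[v ⊙ Hv] equals diag(H), so averaging v ⊙ Hv over many probes approximates the diagonal without forming H. A numpy sketch of the estimator for the example above, where the Hessian at a = [1., 2.] is known analytically (H = 36 · ones + 2 · I, so the diagonal is [38., 38.]):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hessian of g(x) = sum(sum(x)**3 + x**2 + 4) at a = [1., 2.]
H = 36. * np.ones((2, 2)) + 2. * np.eye(2)  # diag(H) = [38., 38.]

def bekas_diag(hvp, dim, n=1000):
    # E[v * (H @ v)] = diag(H) for Rademacher probes v
    v = rng.choice([-1., 1.], size=(n, dim))
    return np.mean(v * hvp(v), axis=0)

estimate = bekas_diag(lambda v: v @ H.T, dim=2, n=1000)
print(estimate)  # noisy estimate of [38., 38.]
```

Increasing n tightens the estimate; the variance is large here because of the strong off-diagonal terms, consistent with the note above that the approximation is less accurate when parameters interact.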

More advanced Python API: Implementing a new ModelBox

Bringing a new pytorch model to duvida is relatively straightforward. First, write your model, adding the Lightning logic:

from typing import Callable

from torch.nn import Linear, Module, Sequential, SiLU
from torch.optim import Adam, Optimizer

from duvida.torch.models.lt import LightningMixin

class SimpleMLP(Module, LightningMixin):

    def __init__(
        self, 
        n_input: int, 
        n_units: int = 16, 
        n_out: int = 1,
        activation: Callable = SiLU,  # Smooth activation to prevent vanishing gradients
        learning_rate: float = .01,
        optimizer: Optimizer = Adam,
        *args, **kwargs
    ):
        super().__init__(*args, **kwargs)
        self.n_input = n_input
        self.n_units = n_units
        self.activation = activation
        self.n_out = n_out
        self.model_layers = Sequential(  # Sequential takes modules as arguments, not a list
            Linear(self.n_input, self.n_units),
            self.activation(),
            Linear(self.n_units, self.n_out),
        )
        # Lightning logic
        self._init_lightning(
            optimizer=optimizer, 
            learning_rate=learning_rate, 
            model_attr='model_layers',  # the attribute containing the model
        )

    def forward(self, x):
        return self.model_layers(x)

Then subclass duvida.torch.nn.ModelBox and implement the create_model() method, which should simply return your instantiated model. If you want to preprocess input data on the fly, then add a preprocess_data() method which takes a data dictionary and returns a data dictionary.

from typing import Dict

from duvida.torch.nn import ModelBox
import numpy as np

class MLPModelBox(ModelBox):
    
    def __init__(self, *args, **kwargs):
        super().__init__()
        self._mlp_kwargs = kwargs

    def create_model(self, *args, **kwargs):
        return SimpleMLP(
            n_input=self.input_shape[-1],
            n_out=self.output_shape[-1], 
            *args, **kwargs,
            **self._mlp_kwargs,
        )

    # Define this method if your data needs preprocessing
    @staticmethod
    def preprocess_data(data: Dict[str, np.ndarray], _in_key, _out_key, **kwargs) -> Dict[str, np.ndarray]:
        # `your_featurizer` is a placeholder for your own featurization function
        return {
            _in_key: your_featurizer(data[_in_key]), 
            _out_key: np.asarray(data[_out_key])
        }

If the built-in ModelBoxes don't suit your needs, you can subclass the base_classes.ModelBoxBase abstract class, making sure to implement its abstract methods.

Command-line interface

duvida has a command-line interface for training and checkpointing the built-in models.

$ duvida --help

To train:

$ duvida train hf://scbirlab/fang-2023-biogen-adme@scaffold-split:train -2 hf://scbirlab/fang-2023-biogen-adme@scaffold-split:test --ensemble-size 10 --epochs 10 --learning-rate 0.001

You can read about all the options here:

$ duvida train --help

There is also a simple hyperparameter utility.

$ printf '{"model_class": "fingerprint", "use_2d": [true, false], "n_units": 16, "n_hidden": 3}' | duvida hyperprep -o hyperopt.json

This generates a file containing the Cartesian product of the JSON items. It can be indexed (0-based) with the -i <int> option to supply a specific training configuration like so:

$ duvida train hf://scbirlab/fang-2023-biogen-adme@scaffold-split:train -2 hf://scbirlab/fang-2023-biogen-adme@scaffold-split:test -c hyperopt.json -i 0
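The expansion that hyperprep performs can be sketched in a few lines of Python: list-valued entries are swept, scalar entries are held fixed, and the Cartesian product enumerates every configuration. This is a sketch of the behaviour, not duvida's implementation:

```python
from itertools import product

def expand_grid(spec):
    # Treat list values as sweeps and scalar values as constants
    keys = list(spec)
    values = [v if isinstance(v, list) else [v] for v in spec.values()]
    return [dict(zip(keys, combo)) for combo in product(*values)]

spec = {"model_class": "fingerprint", "use_2d": [True, False], "n_units": 16, "n_hidden": 3}
configs = expand_grid(spec)
print(len(configs))   # 2 configurations: use_2d True and False
print(configs[0])     # the configuration selected by -i 0
```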

Issues, problems, suggestions

Add to the issue tracker.

Documentation

(To come at ReadTheDocs.)
