
torchosr

The torchosr module is a set of tools for Open Set Recognition in Python, compatible with the PyTorch library.

The package documentation can be accessed through readthedocs.

Citation policy

If you use torchosr in a scientific publication, we would appreciate a citation of the following article:

@misc{komorniczak2023torchosr,
      title={torchosr -- a PyTorch extension package for Open Set Recognition models evaluation in Python}, 
      author={Joanna Komorniczak and Pawel Ksieniewicz},
      year={2023},
      eprint={2305.09646},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Quick start guide

Installation

To use the torchosr package, you first need to install it. Fortunately, it is available in the PyPI repository, so you can install it using pip:

pip install torchosr

If you need to extend the package with functions it does not yet include, you can also install the module directly from its source code. Any modifications introduced there propagate to the module available in the environment.

git clone https://github.com/w4k2/torchosr.git
cd torchosr
make install

Minimal processing example

The torchosr package can be imported in the standard Python manner.

# Importing torchosr
import torchosr

The code below loads the MNIST_base dataset.

# Import transforms for pre-processing
from torchvision import transforms

# Load MNIST dataset
data = torchosr.data.base_datasets.MNIST_base(
    root = 'data', download = True,
    transform = transforms.Compose([transforms.Resize(28), transforms.ToTensor()]))
> Dataset MNIST_base
> Number of datapoints: 70000
> Root location: data

Then, for the loaded dataset, the configure_division function will generate configurations of derived OSR problems. The sample code generates nine configurations: three class assignments for each of three openness values.

# Generate OSR problem configurations
config, openness = torchosr.data.configure_division(data, n_openness = 3, repeats = 3, seed = 1234)
# Print configurations
for i, (kkc, uuc) in enumerate(config):
    print('C%i - Op: %.3f KKC:%s \t UUC:%s' % (
        i, 
        openness[int(i/3)].detach().numpy(), 
        kkc.detach().numpy(), 
        uuc.detach().numpy()))
> C0 - Op: 0.047 KKC:[0 1 7 9 3] 	 UUC:[6]
> C1 - Op: 0.047 KKC:[6 4 9 2 7] 	 UUC:[1]
> C2 - Op: 0.047 KKC:[1 6 7 0 5] 	 UUC:[9]
> C3 - Op: 0.225 KKC:[8 4 5] 	     UUC:[3 1 9 6]
> C4 - Op: 0.225 KKC:[9 7 4] 	     UUC:[2 5 0 8]
> C5 - Op: 0.225 KKC:[0 4 2] 	     UUC:[3 9 6 8]
> C6 - Op: 0.397 KKC:[3 6] 	         UUC:[9 1 4 0 8 7 5]
> C7 - Op: 0.397 KKC:[2 5] 	         UUC:[9 8 6 3 1 4 7]
> C8 - Op: 0.397 KKC:[4 1] 	         UUC:[3 0 5 9 2 7 8]
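
For reference, the printed values match the openness measure commonly used in the Open Set Recognition literature, computed from the numbers of known (KKC) and unknown (UUC) classes. The sketch below is an illustration of that standard definition, not necessarily the exact internal implementation of configure_division.

import numpy as np

# Openness from the number of training classes (KKC) and testing classes (KKC + UUC)
def openness(n_kkc, n_uuc):
    return 1 - np.sqrt(2 * n_kkc / (2 * n_kkc + n_uuc))

print('%.3f' % openness(5, 1))  # ~0.047, as in configuration C0
print('%.3f' % openness(3, 4))  # ~0.225, as in configuration C3
print('%.3f' % openness(2, 7))  # ~0.397, as in configuration C6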

The next step is determining the actual training and test sets for the evaluation. The get_train_test function from the data module will be used for this. In the example code, the division was made for the first of the nine generated configurations and the first of five folds.

# Import DataLoader
from torch.utils.data import DataLoader

# Select KKC and UUC from configuration
kkc, uuc = config[0]

# Get training and testing data for first out of 5 folds
train_data, test_data = torchosr.data.get_train_test(
    data, kkc, uuc, root = 'data', tunning = False,
    fold = 0, n_folds = 5, seed = 1234)

# Create DataLoaders
train_data_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_data_loader = DataLoader(test_data, batch_size=64, shuffle=True)
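
Assuming the returned objects are standard map-style PyTorch datasets (they are passed directly to DataLoader above), their sizes can be quickly inspected before training:

# Sanity check of the split sizes (assumes the datasets implement __len__)
print('Training samples:', len(train_data))
print('Testing samples:', len(test_data))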

For presentation purposes, the labels of objects in the training and test data loaders were displayed. By default, labels are transformed using one-hot encoding. In the test subset, the last label represents objects of unknown classes. The classes have been re-indexed in both subsets so that their labels are consecutive integers.

import numpy as np

# Load first batch of Train data and print unique labels
X, y = next(iter(train_data_loader))
print('Train labels:', np.unique(np.argmax(y, axis=1)))

# Load first batch of Test data and print unique labels
X, y = next(iter(test_data_loader))
print('Test labels:', np.unique(np.argmax(y, axis=1)))
> Train labels: [0 1 2 3 4]
> Test labels: [0 1 2 3 4 5]
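
Since the last label index in the test subset represents the unknown class, the unknown-class objects in a batch can be counted for illustration as follows (a sketch using only the variables defined above):

# The unknown class is mapped to the last index, i.e. len(kkc)
unknown_idx = len(kkc)
labels = np.argmax(y.numpy(), axis=1)
print('Unknown-class samples in this test batch:', int(np.sum(labels == unknown_idx)))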

The initialization of the TSoftmax method is presented below. The simplest architecture available in the package (consisting only of fully connected layers) was used. The depth and img_size_x parameters describe the dimensions of the images in the MNIST set. The epsilon parameter was determined using a function available in the utils module, which returns a suboptimal parameter value for a given KKC cardinality.

# Initialize lower stack
ls = torchosr.architectures.fc_lower_stack(depth=1, img_size_x=28, n_out_channels=64)

# Get epsilon parameter for given number of KKC
epsilon = torchosr.utils.base.get_softmax_epsilon(len(kkc))

# Initialize method
method = torchosr.models.TSoftmax(lower_stack=ls, n_known=len(kkc), epsilon=epsilon)

It is possible to proceed further with the evaluation of the model on the given data. In the example, the number of epochs and the learning rate were defined, a table for the results of subsequent epochs was created, and the loss function and optimizer were initialized. In a loop, for each epoch, the training and testing procedures were carried out. The values returned by the test method (the Inner, Outer, Halfpoint and Overall scores, respectively) were saved to the table.

import torch

# Specify processing parameters
epochs = 128
learning_rate = 1e-3

# Prepare array for results
results = torch.zeros((4,epochs))

# Initialize loss function
loss_fn = torch.nn.CrossEntropyLoss()

# Initialize optimizer
optimizer = torch.optim.SGD(method.parameters(), lr=learning_rate)

for t in range(epochs):
    # Train
    method.train(train_data_loader, loss_fn, optimizer)
    
    # Test
    inner_score, outer_score, hp_score, overall_score = method.test(test_data_loader, loss_fn)
    results[:, t] = torch.tensor([inner_score, outer_score, hp_score, overall_score])     

The results of this single run can be visualized using the matplotlib library. The output of the code presented below is shown in the figure.

import matplotlib.pyplot as plt

# Present results
fig, ax = plt.subplots(1,1,figsize=(10,4))
ax.plot(results.T, label=['Inner', 'Outer', 'Halfpoint', 'Overall'])
ax.legend()
ax.grid(ls=':')
ax.set_xlabel('epochs')
ax.set_ylabel('Balanced accuracy')
ax.set_xlim(0,epochs)

Figure: Example results
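
When the code is run as a plain script rather than in an interactive session, the figure additionally has to be saved or displayed; the calls below are standard matplotlib usage (with an arbitrary file name), not specific to torchosr:

# Save the figure to a file and/or display it
fig.savefig('results.png')
plt.show()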

During the test procedure, one can also request a confusion matrix by using the conf flag in the test routine.

# Call of test method with conf flag
inner_score, outer_score, hp_score, overall_score, \
    inner_c, outer_c, hp_c, overall_c = method.test(test_data_loader, loss_fn, conf=True)

# Print overall confusion matrix
print(overall_c.detach().numpy())
> [[1244,    2,    1,    1,    3,   12],
>  [   1, 1406,    6,    2,    1,   12],
>  [   1,    2, 1240,    6,    3,   25],
>  [   0,    5,    5, 1303,    7,   25],
>  [   5,    4,   18,    7, 1206,   22],
>  [ 367,  111,   76,   14,  250,  411]]
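
Assuming the usual convention of rows as true classes and columns as predictions, a balanced accuracy can be derived from such a matrix as the mean of per-class recalls. This is only an illustration and may differ slightly from the metrics computed internally by the test method.

# Mean of per-class recalls computed from the overall confusion matrix
conf = overall_c.detach().float()
recalls = torch.diag(conf) / conf.sum(dim=1)
print('Balanced accuracy (from confusion matrix): %.3f' % recalls.mean().item())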
