
machine learning package



XCurveLearn: Machine Learning with X-Curve Metrics

Please visit the website for more details on XCurveLearn!


Latest News

  • (New!) 2022.6: XCurveLearn v1.0.0 has been released! Please try it now!

Introduction

In recent years, machine learning (ML) has achieved significant advances in many domains, such as image recognition, machine translation, and biological information processing. Despite this success, real-world data often exhibit a long-tailed/imbalanced distribution, which poses a critical challenge for the practical performance of deployed ML algorithms. The reason is that most current methods are trained by optimizing accuracy (or minimizing cross-entropy), after which one must pick a decision threshold on top of the prediction scores to determine the category of each sample. In practice, a single fixed decision threshold cannot adapt to changes in the data distribution or to growing business requirements, leading to unsatisfactory performance in real-world applications.
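To make the threshold issue concrete, here is a minimal sketch on made-up toy data (the scores, labels, and helper names `accuracy_at` and `recall_at` are illustrative assumptions, not part of XCurveLearn). The model ranks every positive above every negative, yet the usual 0.5 threshold misses all positives while accuracy still looks high.

```python
def accuracy_at(scores, labels, threshold):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def recall_at(scores, labels, threshold):
    """Fraction of positive samples scored at or above the threshold."""
    pos_scores = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in pos_scores) / len(pos_scores)


# Toy imbalanced data (illustrative numbers only): 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
# Every positive outranks every negative, but all scores are below 0.5.
scores = [0.1] * 95 + [0.4] * 5

for t in (0.5, 0.3):
    print(t, accuracy_at(scores, labels, t), recall_at(scores, labels, t))
# prints:
# 0.5 0.95 0.0
# 0.3 1.0 1.0
```

The ranking is perfect in both cases; only the choice of threshold changes the outcome.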

To overcome this, XCurveLearn focuses on the design of the objective function for ML tasks. Training is formulated as a series of X-metric optimization problems (e.g., AUROC, AUPRC, AUTKC), which consider the average performance over all decision thresholds during the training phase.
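As a rough illustration of what a threshold-free objective looks like, the sketch below computes AUROC as the fraction of correctly ranked (positive, negative) pairs, together with a generic pairwise square surrogate that could be minimized in its place. The function names and the margin parameter `gamma` are assumptions made for this example; this is a textbook-style formulation, not XCurveLearn's actual implementation.

```python
def split_scores(scores, labels):
    """Separate scores of positive (label 1) and negative (label 0) samples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def auroc(scores, labels):
    """Exact AUROC: share of (pos, neg) pairs with pos ranked higher; ties count 1/2."""
    pos, neg = split_scores(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def square_surrogate(scores, labels, gamma=1.0):
    """Mean of (gamma - (s_pos - s_neg))^2 over all (pos, neg) pairs.

    A smooth stand-in for the 0-1 ranking error; `gamma` is a hypothetical
    margin by which positives should outscore negatives.
    """
    pos, neg = split_scores(scores, labels)
    total = sum((gamma - (p - n)) ** 2 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
print(auroc(scores, labels))             # 5 of the 6 pairs are ranked correctly
print(square_surrogate(scores, labels))  # smaller when positives lead by the margin
```

No decision threshold appears anywhere: both quantities depend only on how positives are scored relative to negatives.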

To better understand how XCurveLearn achieves this goal, let us take AUROC as a high-level example, as shown in the following figure:

Advantages of XCurveLearn

......

Wide Real-World Applications

XCurveLearn has a wide range of real-world applications, especially where the data follow a long-tailed/imbalanced distribution. Several cases are listed below:

Supported Curves in XCurveLearn

X-Curve | Description
XCurveLearn.AUROC | An efficient optimization library for the Area Under the ROC Curve (AUROC), including multi-class AUROC and partial AUROC optimization.
... ...

***More X-Curves are under development. Please stay tuned!***

Installation

You can install XCurveLearn with pip:

pip install XCurveLearn

Quickstart

Let us take multi-class AUROC optimization as an example here. A detailed tutorial can be found on the website (https://XCurveLearn.org.cn).

'''
Readers interested in the technical details of this example can refer to
our paper <Learning with Multiclass AUC: Theory and Algorithms>.
'''
import torch
from easydict import EasyDict as edict

# import loss of AUROC
from XCurveLearn.AUROC.losses import SquareAUCLoss

# import optimizer (or one can use any optimizer supported by PyTorch)
from XCurveLearn.AUROC.optimizer import SGD

# create model (or you can adopt any DNN model built with PyTorch)
from XCurveLearn.AUROC.models import generate_net

# set params to create model
args = edict({
    "model_type": "resnet18", # supports resnet18, resnet20, densenet121 and mlp
    "num_classes": 2,
    "pretrained": None
})
model = generate_net(args).cuda()

num_classes = 2
# create optimizer (pass your model's parameters and choose a learning rate)
optimizer = SGD(model.parameters(), lr=...)

# create loss criterion
criterion = SquareAUCLoss(
    num_classes=num_classes, # number of classes
    gamma=1.0, # safe margin
    transform="ovo" # how the multi-class AUROC metric is computed ('ovo' or 'ova')
)

# create Dataset (train_set, val_set, test_set) and dataloader (trainloader)
# You can construct your own dataset/dataloader,
# but you must ensure that there is at least one sample of every class in each mini-batch
# to calculate the AUROC loss. Alternatively, you can do this:
from XCurveLearn.AUROC.dataloaders import get_datasets
from XCurveLearn.AUROC.dataloaders import get_data_loaders

# set dataset params, see our doc. for more details.
dataset_args = edict({
    "data_dir": "...",
    "input_size": [32, 32],
    "norm_params": {
        "mean": [123.675, 116.280, 103.530],
        "std": [58.395, 57.120, 57.375]
        },
    "use_lmdb": True,
    "resampler_type": "None",
    "sampler": { # only used for binary classification
        "rpos": 1,
        "rneg": 10
        },
    "npy_style": True,
    "aug": True, 
    "class2id": { # positive (minority) class idx
        "1": 1
    }
})

train_set, val_set, test_set = get_datasets(dataset_args)
trainloader, valloader, testloader = get_data_loaders(
    train_set,
    val_set,
    test_set,
    train_batch_size=32,
    test_batch_size=64
)
# Note that get_datasets() conducts stratified sampling for train_set
# using StratifiedSampler (from XCurveLearn.AUROC.dataloaders import StratifiedSampler).

# forward of model
for x, target in trainloader:

    x, target = x.cuda(), target.cuda()
    # target.shape => [batch_size, ]
    # Note that the model predictions must lie in [0, 1]
    # for both binary (i.e., sigmoid) and multi-class (i.e., softmax) AUROC optimization.

    pred = model(x) # [batch_size, num_classes] when num_classes > 2, otherwise [batch_size, ]

    loss = criterion(pred, target)
    
    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
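
The quickstart requires every mini-batch to contain at least one sample of each class. If you build your own data pipeline, one way to satisfy this is sketched below. It is a simplified, standard-library-only illustration, and the function `stratified_batches` is a hypothetical helper, not the library's StratifiedSampler.

```python
import random
from collections import defaultdict


def stratified_batches(labels, batch_size, seed=0):
    """Yield index lists in which every class appears at least once.

    Each batch starts with one randomly drawn index per class and is then
    filled from a shuffled pool of all indices, so an index may occasionally
    appear twice in the same batch.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = sorted(by_class)
    if batch_size < len(classes):
        raise ValueError("batch_size must be at least the number of classes")

    pool = list(range(len(labels)))
    rng.shuffle(pool)
    n_batches = len(labels) // batch_size
    fill = batch_size - len(classes)  # slots left after one sample per class
    for b in range(n_batches):
        batch = [rng.choice(by_class[c]) for c in classes]
        batch.extend(pool[b * fill:(b + 1) * fill])
        yield batch


# Toy labels (illustrative only): 50 negatives and 5 positives.
toy_labels = [0] * 50 + [1] * 5
batches = list(stratified_batches(toy_labels, batch_size=8))
print(len(batches), all({toy_labels[i] for i in b} == {0, 1} for b in batches))
```

A materialized list of such index batches could, for example, be passed as the `batch_sampler` argument of `torch.utils.data.DataLoader`; whether that suits your setup depends on your dataset class.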

Contact & Contribution

If you find any issues or plan to contribute bug fixes, please contact Shilong Bao (Email: baoshilong@iie.ac.cn) or Zhiyong Yang (Email: yangzhiyong21@ucas.ac.cn).

***The authors appreciate all contributions!***

Citation

Please cite our paper if you use this library in your own work:

@inproceedings{DBLP:conf/icml/YQBYXQ,
  author    = {Zhiyong Yang and Qianqian Xu and Shilong Bao and Yuan He and Xiaochun Cao and Qingming Huang},
  title     = {When All We Need is a Piece of the Pie: A Generic Framework for Optimizing Two-way Partial AUC},
  booktitle = {ICML},
  pages     = {11820--11829},
  year      = {2021}
}
