# carefree-ml

`carefree-ml` implements machine learning algorithms with numpy, mainly for educational use

## Installation

`carefree-ml` requires Python 3.6 or higher.

```bash
git clone https://github.com/carefree0910/carefree-ml.git
cd carefree-ml
pip install -e .
```

## Basic Usage

See `tests/usages/basic.py` for more examples

```python
from cfml import *
from cfdata.tabular import TabularDataset

# fetch dataset
boston = TabularDataset.boston()
# make a model
lr = Base.make("linear_regression")
# fit the model
lr.fit(*boston.xy)
# plot loss curve
lr.plot_loss_curve()
# make predictions
predictions = lr.predict(boston.x)
```

...or use method chaining:

```python
import os
from cfml import *
from cfdata.tabular import *

# fetch dataset
prices_file = os.path.join("tests", "datasets", "prices.txt")
# load the file into a dataset; the exact cfdata call below is an assumption,
# adjust it to your cfdata version if necessary
prices = TabularData().read(prices_file).to_dataset()
# one liner
Base.make("linear_regression").fit(*prices.xy).visualize1d(*prices.xy).plot_loss_curve()
```

## Supported Algorithms

- 1-dimensional polynomial fit (`np.polyfit`)
- Linear Models (Linear Regression, Logistic Regression, Linear SVC, Linear SVR)
- Naive Bayes (Multinomial NB, Gaussian NB)
- Support Vector Machine (SVC, SVR)
- Fully Connected Neural Network (FCNN-clf, FCNN-reg)

What's next is up to you! Issues are welcome :)

## Q & A

- I used Google Translate to help me translate Chinese to English

### Why carefree-ml?

Why should we choose to use (or learn from) `carefree-ml`?

`carefree-ml` actually stems from two long-standing wishes of mine:

- Explore how far machine learning algorithms can be simplified
- Explore how much commonality there is among various machine learning algorithms

If you share these curiosities, or are willing to teach others some of the intuitions, then `carefree-ml` may suit you. However, if you have a deeper pursuit of machine learning and desire to explore its more wonderful properties, then `carefree-ml` may irritate you, because it omits many of them

First of all, we know that machine learning (and deep learning) algorithms can often be transformed into unconstrained optimization problems. If special properties (sparseness, convergence speed, etc.) are not a concern, gradient descent based methods are among the most widely used

Therefore, the first major module implemented by `carefree-ml` is a simple gradient descent optimization framework, designed to handle most cases with little code. From then on, whenever a machine learning algorithm is implemented in `carefree-ml`, gradient descent based methods are considered first (this is actually the biggest simplification made by `carefree-ml`; the `LinearRegression` example below illustrates it)
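
To make this concrete, here is a minimal sketch of what such a framework boils down to, in plain numpy. All names here are hypothetical illustrations of the idea, not cfml's actual API:

```python
import numpy as np

def gradient_descent(x, y, grad_fn, lr=0.1, n_epochs=200):
    # a minimal shared training loop (hypothetical sketch, not cfml's actual framework)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    w = np.zeros([x.shape[1], 1])
    b = np.zeros([1, 1])
    for _ in range(n_epochs):
        predictions = x @ w + b              # the affine core shared by all linear models
        dw, db = grad_fn(x, y, predictions)  # only this callback differs per algorithm
        w -= lr * dw
        b -= lr * db
    return w, b

def mse_grad(x, y, predictions):
    # gradient of 0.5 * mean((prediction - y) ** 2) w.r.t. w and b
    diff = predictions - y
    return x.T @ diff / len(x), diff.mean(axis=0, keepdims=True)
```

With such a loop in place, `LinearRegression` is essentially `gradient_descent(x, y, mse_grad)`; `LogisticRegression` would merely swap in a callback that applies `sigmoid` before computing `diff`.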

So, under this idea, how do we implement `LinearRegression` and `LogisticRegression`? First of all, we probably all know that:

- Both of them are `Linear Models`
- The former deals with `regression` problems, while the latter deals with `classification` problems
- The latter uses the `sigmoid` function to output probability predictions

But there's one thing that we might not have noticed before:

- If we use `mse` loss in `LinearRegression` and `cross_entropy` loss in `LogisticRegression`, then their parameters' gradients will be almost identical (up to a constant factor)
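
A quick sketch of why, writing the affine output as `w^T x + b`:

```latex
% LinearRegression with mse (the 1/2 is for convenience), \hat{y} = w^\top x + b:
L_{\mathrm{mse}} = \tfrac{1}{2}(\hat{y} - y)^2
\quad\Longrightarrow\quad
\frac{\partial L_{\mathrm{mse}}}{\partial w} = (\hat{y} - y)\,x

% LogisticRegression with cross entropy, p = \sigma(w^\top x + b):
L_{\mathrm{ce}} = -\bigl[y \log p + (1 - y)\log(1 - p)\bigr]
\quad\Longrightarrow\quad
\frac{\partial L_{\mathrm{ce}}}{\partial w} = (p - y)\,x
```

Both gradients share the `(prediction - target) * x` form; defining `mse` without the 1/2 factor merely scales the first one by 2, which is exactly the constant factor mentioned above.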

Since they are so similar, and the differences are only in a few small parts, their implementations should be very similar as well. Therefore, in `carefree-ml`, the main parts of their implementations look as follows:

```python
class LinearRegression(LinearRegressorMixin, RegressorBase):
    def __init__(self):
        self._w = self._b = None
```
```python
class LogisticRegression(LinearBinaryClassifierMixin, ClassifierBase):
    def __init__(self):
        self._w = self._b = None
        self._sigmoid = Activations("sigmoid")

    def _predict_normalized(self, x_normalized):
        affine = super()._predict_normalized(x_normalized)
        return self._sigmoid(affine)
```

The design here embodies the idea that `carefree-ml` wants to simplify machine learning algorithms. We know that `LinearRegression` under `mse` loss has a closed-form solution (it is simply a least squares problem), but we still use gradient descent to solve it, because this way it shares most of its code with `LogisticRegression`
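
For reference, the closed-form solution being skipped is the familiar normal equation (with the design matrix `X` augmented by a bias column):

```latex
w^{*} = (X^{\top} X)^{-1} X^{\top} y
```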

Of course, this simplification (reducing many algorithms to unconstrained optimization problems and solving them by gradient descent) has its advantages too. For example, we can train `LinearRegression` with `l1` or other losses while the corresponding training code remains almost unchanged
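
For instance, under the hypothetical loop sketched earlier, switching `LinearRegression` from `mse` to `l1` only means supplying a different gradient callback; the training loop itself stays untouched:

```python
def l1_grad(x, y, predictions):
    # gradient of mean(|prediction - y|): the sign of the residual replaces the residual itself
    sign = np.sign(predictions - y)
    return x.T @ sign / len(x), sign.mean(axis=0, keepdims=True)
```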

Another example is `svm`. Although support vector classification and support vector regression seem to be very different algorithms, once you peel back the layers and solve them with gradient descent based methods, you will find that most of their code is still shared. This also justifies why they belong to the same family, `svm`:

```python
class CoreSVCMixin:
    @staticmethod
    def _preprocess_data(x, y):
        # svm expects labels in {-1, 1} instead of {0, 1}
        y_svm = y.copy()
        y_svm[y_svm == 0] = -1
        return x, y_svm

    @staticmethod
    def get_diffs(y_batch, predictions):
        # hinge loss: max(1 - y * prediction, 0)
        return {"diff": 1. - y_batch * predictions, "delta_coeff": -y_batch}


class SVCMixin(BinaryClassifierMixin, SVMMixin, metaclass=ABCMeta):
    def predict_prob(self, x):
        affine = self.predict_raw(x)
        sigmoid = Activations.sigmoid(np.clip(affine, -2., 2.) * 5.)
        return np.hstack([1. - sigmoid, sigmoid])
```
```python
class CoreSVRMixin:
    def get_diffs(self, y_batch, predictions):
        # eps-insensitive loss: max(|prediction - y| - eps, 0)
        raw_diff = predictions - y_batch
        l1_diff = np.abs(raw_diff)
        if self.eps <= 0.:
            tube_diff = l1_diff
        else:
            tube_diff = l1_diff - self.eps
        return {"diff": tube_diff, "delta_coeff": np.sign(raw_diff)}


class SVRMixin(SVMMixin, metaclass=ABCMeta):
    def predict(self, x):
        return self.predict_raw(x)
```

After these, when you actually implement the `svm` algorithms, you only need to inherit from different classes:

```python
class SVC(CoreSVCMixin, SVCMixin, ClassifierBase):
    def __init__(self,
                 kernel: str = "rbf"):
        self._kernel = Kernel(kernel)
```
```python
class SVR(CoreSVRMixin, SVRMixin, RegressorBase):
    def __init__(self,
                 eps: float = 0.,
                 kernel: str = "rbf"):
        self._eps = eps
        self._kernel = Kernel(kernel)
```

Of course, the real core code (`SVMMixin`) still has to be written
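
That said, the reason a single core can serve both is already visible in the two `get_diffs` implementations above: in both cases the loss is `max(diff, 0)` (hinge for SVC, eps-insensitive for SVR), so its gradient with respect to the raw predictions is simply `delta_coeff` wherever `diff` is positive. A hypothetical sketch of that shared step (not cfml's actual `SVMMixin`):

```python
def diffs_to_prediction_grad(diffs):
    # d max(diff, 0) / d prediction: zero outside the margin / eps-tube,
    # `delta_coeff` (-y for SVC, sign(prediction - y) for SVR) inside
    mask = (diffs["diff"] > 0.).astype(float)
    return mask * diffs["delta_coeff"]
```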

In addition to code sharing between similar algorithms, `carefree-ml` is also dedicated to sharing code for common engineering functions. For example, we generally need to:

- Normalize the input features
- Normalize the labels in regression problems
- Utilize the roc curve to find the best threshold for a specific metric in binary classification problems

These engineering functions should share code as well. Therefore, `carefree-ml` implements `NormalizeMixin` and `BinaryClassifierMixin` in `cfml.models.mixins` for such widely used functionality
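
As an illustration only (the actual mixins live in `cfml.models.mixins` and differ in detail), feature normalization as a mixin could look like this:

```python
import numpy as np

class NormalizeMixin:
    # hypothetical sketch: remember train-time statistics, reuse them at predict time
    def _fit_normalization(self, x):
        self._mean = x.mean(axis=0, keepdims=True)
        self._std = x.std(axis=0, keepdims=True) + 1e-8  # avoid division by zero

    def _normalize(self, x):
        return (x - self._mean) / self._std
```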

### What can carefree-ml do?

First of all, there are actually a lot of repos that use `numpy` to implement many algorithms, so `numpy` itself is not an appropriate selling point. In my personal opinion, the reasons why `carefree-ml` is still special are as follows:

- It implements a lightweight gradient descent framework that can be used on a wide range of problems
- It focuses more on sharing logic and code between algorithms than on model performance; as a result, the total amount of code is relatively small
- Even under the 'shackles' of the second point, it can beat `scikit-learn` in either speed or performance on small and relatively simple datasets, to some extent

Here's how you can test it (installation procedures included):

```bash
git clone https://github.com/carefree0910/carefree-ml.git
cd carefree-ml
pip install -e .
cd tests/unittests
python test_all.py
```

Here are some fragments from the outputs:

```text
~~~  [ info ] timing for    cfml_fcnn     : 0.310764
~~~  [ info ] timing for   sklearn_fcnn   : 0.549960
==========================================================
|             cfml_fcnn  |    mae     |  2.682794  |  <-
|          sklearn_fcnn  |    mae     |  3.969561  |
----------------------------------------------------------
===========================================================
|             cfml_fcnn  |    mse     |  15.635315  |  <-
|          sklearn_fcnn  |    mse     |  30.890426  |
-----------------------------------------------------------
```
```text
~~~  [ info ] timing for     cfml_lr      : 0.039881
~~~  [ info ] timing for    sklearn_lr    : 0.654799
==========================================================
|               cfml_lr  |    auc     |  0.996287  |  <-
|            sklearn_lr  |    auc     |  0.994675  |
----------------------------------------------------------
==========================================================
|               cfml_lr  |    acc     |  0.980668  |  <-
|            sklearn_lr  |    acc     |  0.957821  |
----------------------------------------------------------
```
```text
# gaussian naive bayes
~~~  [ info ] timing for     cfml_gnb     : 0.000000
~~~  [ info ] timing for   sklearn_gnb    : 0.001028
# multinomial naive bayes
~~~  [ info ] timing for     cfml_mnb     : 0.003990
~~~  [ info ] timing for   sklearn_mnb    : 0.007011
```
```text
~~~  [ info ] timing for     cfml_svc     : 0.207024
~~~  [ info ] timing for    cfml_l_svc    : 0.023937
~~~  [ info ] timing for    sklearn_lr    : 0.571722
~~~  [ info ] timing for   sklearn_svc    : 0.007978
~~~  [ info ] timing for  sklearn_l_svc   : 0.148603
==========================================================
|            cfml_l_svc  |    auc     |  0.996300  |
|              cfml_svc  |    auc     |  1.000000  |  <-
|            sklearn_lr  |    auc     |  0.994675  |
----------------------------------------------------------
==========================================================
|            cfml_l_svc  |    acc     |  0.985940  |
|              cfml_svc  |    acc     |  1.000000  |  <-
|         sklearn_l_svc  |    acc     |  0.848858  |
|            sklearn_lr  |    acc     |  0.957821  |
|           sklearn_svc  |    acc     |  0.922671  |
----------------------------------------------------------
```
```text
~~~  [ info ] timing for     cfml_svr     : 0.090758
~~~  [ info ] timing for    cfml_l_svr    : 0.027925
~~~  [ info ] timing for   sklearn_svr    : 0.008012
~~~  [ info ] timing for  sklearn_l_svr   : 0.165730
==========================================================
|            cfml_l_svr  |    mae     |  3.107422  |  <-
|              cfml_svr  |    mae     |  5.106989  |
|         sklearn_l_svr  |    mae     |  4.654314  |
|           sklearn_svr  |    mae     |  5.259882  |
----------------------------------------------------------
===========================================================
|            cfml_l_svr  |    mse     |  24.503884  |  <-
|              cfml_svr  |    mse     |  66.583145  |
|         sklearn_l_svr  |    mse     |  39.598211  |
|           sklearn_svr  |    mse     |  66.818898  |
-----------------------------------------------------------
```

Of course, in the end we still have to say something responsible: from the perspective of practical use and generalization, `scikit-learn` beats `carefree-ml` in every way (for one thing, `carefree-ml` does not support sparse data). However, as I said at the beginning, `carefree-ml` focuses on exploring how machine learning algorithms can be simplified, so it is not surprising that it can exceed `scikit-learn` on small & simple datasets in fitting capacity & fitting speed.

Notice that the above experimental results are measured on the training set, so they only reflect fitting capacity, not generalization capacity

### How can I utilize carefree-ml?

From a practical point of view, the lightweight gradient descent framework implemented by `carefree-ml` is perhaps the most useful part. But even that can easily be replaced by `pytorch`

So, as I said at the beginning, `carefree-ml` is mainly for educational use, and its educational value may be greater than its practical value. Although my academic ability is limited and the original intention of this repo is probably not worthy of academic research, from this perspective `carefree-ml` may still give you some new insights

## License

`carefree-ml` is MIT licensed, as found in the `LICENSE` file.