Code for comparing different machine learning algorithms for binary classification.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Overview

MLAC is available on PyPI and can be installed via:

pip install MLAC

This package makes it quick to compare seven feature extraction (FE) algorithms and seven classifiers. Pairing each FE algorithm with each classifier gives 49 unique pipelines, which are built and tuned using the scikit-learn pipeline and grid search functions. Two neural networks are included: an auto-encoder and a vanilla fully connected network. Their hyper-parameters are found using the Hyperband algorithm provided in the KerasTuner package. When a neural network is part of the pipeline, an initial search over the hyper-parameter space is performed with a low number of epochs, patience and factor; these values are increased at the end of the search to give a finer search. This makes it quick to find a good set of hyper-parameters. The selected hyper-parameter values can be inspected, along with whether any of them lie at the bounds of the defined search space.
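To make the pipeline and grid search idea concrete, here is a minimal sketch using scikit-learn directly. It is not MLAC's internal code: the choice of PCA and logistic regression, the toy data, and the `param_grid` values are all assumptions made for illustration. It tunes one FE step and one classifier together, then reports whether each selected value sits at the edge of its grid.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy binary problem (illustrative only): the label depends on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

# One feature-extraction step followed by one classifier.
pipe = Pipeline([("fe", PCA()), ("clf", LogisticRegression(max_iter=1000))])

# Hypothetical search grid; keys follow scikit-learn's "<step>__<param>" convention.
param_grid = {
    "fe__n_components": [1, 2, 3, 4],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

# Flag selected values that sit on the edge of the grid, which suggests
# the grid may need widening in that direction.
at_bounds = {
    name: search.best_params_[name] in (values[0], values[-1])
    for name, values in param_grid.items()
}
print(search.best_params_, at_bounds)
```

Repeating this for every FE and classifier pairing is what produces the 49 pipelines described above.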

Example

Generate some example data:

import numpy as np

# Training set: the label is 1 wherever the clean sample is negative
data_input = np.random.randn(3000, 1)
ind = np.where(data_input < 0)
data_output = np.zeros(np.shape(data_input))
data_input = data_input + np.random.randn(3000, 1) * 0.2  # add noise after labelling
data_output[ind] = 1

# Test set: generated the same way, with fewer samples
test_input = np.random.randn(300, 1)
ind = np.where(test_input < 0)
test_output = np.zeros(np.shape(test_input))
test_input = test_input + np.random.randn(300, 1) * 0.2  # add noise after labelling
test_output[ind] = 1
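Because the noise is added after the labels are assigned, some points near zero end up on the wrong side of the class boundary, so no classifier that sees only the noisy input can reach 100% accuracy. The sketch below wraps the same recipe in a hypothetical helper, `make_data` (not part of MLAC), with a fixed seed, and measures how many points crossed zero.

```python
import numpy as np


def make_data(n_samples, noise_scale=0.2, seed=None):
    """Hypothetical helper reproducing the example's recipe for one dataset."""
    rng = np.random.default_rng(seed)
    clean = rng.standard_normal((n_samples, 1))
    labels = (clean < 0).astype(float)  # class 1 where the clean value is negative
    noisy = clean + rng.standard_normal((n_samples, 1)) * noise_scale
    return noisy, labels, clean


data_input, data_output, clean = make_data(3000, seed=0)

# Fraction of points whose noisy input has a different sign from the clean value.
flipped = np.mean((data_input < 0) != (clean < 0))
print(f"class 1 fraction: {data_output.mean():.3f}, crossed zero: {flipped:.3f}")
```

The crossed-zero fraction gives a rough ceiling on the accuracy to expect from a simple threshold at zero on the noisy input.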

This looks like: [figure: data]

We want to find the best ML algorithm to separate this data and classify it:

CD_class = CDA.Parameter_Search(
    data_input,
    data_output,
)

To see the available algorithms:

keys_FE = CD_class.keys_FE
keys_CA = CD_class.keys_CA

Finally:

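score_arr = []
for i in keys_FE:
    for j in keys_CA:
        # Select one feature extraction / classifier pair
        CD_class.FE = i
        CD_class.CA = j
        # Train the selected pipeline, then score it on the held-out test set
        CD_class.trained_model()
        score, predictions = CD_class.predict(test_input, test_output)
        score_arr.append(score)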
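Once the loop has finished, `score_arr` holds one score per (FE, classifier) pair, in loop order. The following sketch shows one way to recover the best pair. The key names and scores here are placeholders invented for illustration; in practice they would come from `CD_class.keys_FE`, `CD_class.keys_CA` and the loop above.

```python
from itertools import product

# Placeholder names and scores (hypothetical); in practice use
# CD_class.keys_FE, CD_class.keys_CA and the score_arr filled by the loop.
keys_FE = ["fe_a", "fe_b"]
keys_CA = ["ca_x", "ca_y", "ca_z"]
score_arr = [0.81, 0.92, 0.88, 0.79, 0.95, 0.90]

# product() iterates FE in the outer loop and CA in the inner loop,
# matching the order in which the nested loops append scores.
pairs = list(product(keys_FE, keys_CA))
assert len(pairs) == len(score_arr)

best_pair, best_score = max(zip(pairs, score_arr), key=lambda item: item[1])
print(best_pair, best_score)  # ('fe_b', 'ca_y') 0.95
```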

The results are saved automatically for the training data:

We can see the results of a specific hyper-parameter search:

If you want to save the data, you can pull the default filepath:

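import os
from itertools import product
import pandas as pd
filepath = CD_class.filepath
# One row per (FE, CA) pair, in the same order the loop appended scores
df = pd.DataFrame(list(product(keys_FE, keys_CA)), columns=["FE", "CA"])
df["Accuracy"] = score_arr
df.to_excel(os.path.join(filepath, "algorithm_performance_test.xlsx"), index=False)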

see:

Project details

Release history

0.2 (this version): Sep 3, 2021

0.1: Sep 3, 2021

Download files

Source Distribution

MLAC-0.2.tar.gz (9.4 kB)

Uploaded Sep 3, 2021 Source

Built Distribution

MLAC-0.2-py3-none-any.whl (10.6 kB)

Uploaded Sep 3, 2021 Python 3

Hashes for MLAC-0.2.tar.gz
Algorithm	Hash digest
SHA256	`b020724007adc1407d87b1f5cd9c96125692d2cb72b3498795afcd8fc52d604c`
MD5	`37fd4125c18890ef4deb46588fe59840`
BLAKE2b-256	`ad83068c4fdcd3e0a34930c96bd276867c7d509a01417bb7d800d8802bc3f5cf`

Hashes for MLAC-0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c976920afb979a37534a5e875847a7126224fb775dda24cb0acc9469a59bf20a`
MD5	`1a39d114abd3a06986f6628707e52220`
BLAKE2b-256	`bae5d033b4a23147ea384db00fdf3f3d37d57031fefaa4583000389312509b88`