Project description

end2endML package

The end2endML Python package implements all the components (data preprocessing, data splitting, model selection, model fitting, and model evaluation) required to define pipelines that automate data analysis using some of the most commonly used machine learning algorithms.

Installation

Install the end2endML package by running:

pip install end2endML

on the command line of a Linux system or in the Anaconda Prompt on a Windows system. If you don't have root privileges, you may sometimes need to add --user to the above command. pip will then install the package in your home directory, which doesn't require root privileges.
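
As a quick check after installation, the following minimal sketch prints the directory that a --user install targets for the current interpreter and reports the installed version, if any. The helper name `installed_version` is hypothetical and not part of end2endML; only the Python standard library is used.

```python
import site
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Directory that `pip install --user` targets for the current interpreter.
user_site = site.getusersitepackages()
print("User site-packages:", user_site)

# "end2endML" is the distribution name used in the pip command above.
print("end2endML version:", installed_version("end2endML"))
```

If the second line prints `None`, the package is not visible to the interpreter that ran the script, which usually means pip installed it for a different Python environment.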

User guide

User guide is available at https://end2endml.readthedocs.io/en/latest/.

TODO

~~Add feature extraction to the models.~~
- The feature extraction methods are only implemented for linear models, SVM, and neural networks. They are not implemented for tree-based methods.
- The number of components is treated as a hyperparameter for model selection.
~~Implement the unit test suite to automate testing for every update.~~
Currently, if we specify a gradient boosting model for imbalanced classification, both RUSBoost and EasyEnsemble, which differ in how the undersampling is implemented, are selected and trained. We need to find a way to let the user choose between them.
If the trained model already uses 10 cores, letting the CV procedure use another 10 cores is generally OK. However, it can be a problem for EasyEnsemble models when the data set is large. Fix it by setting the CV procedure's n_jobs to None for the EasyEnsemble model.
Add a function to check whether the preprocessed data is available. If it is, there is no need to preprocess the data again. Maybe this is not a good idea, as we may sometimes use different parameters to control the preprocessing behavior, and re-preprocessing does not take much time.
~~Bug. The data analysis pipeline should be able to remove the infinite values present in X and y.~~
When cat_threshold is set to 2, which means that variables with a numerical data type but a limited number of unique values are not treated as categorical, y is not transformed to the object data type, so the automated data analysis procedure treats the problem as a regression task.
We should re-save the preprocessed data sets every time. Currently, if the function detects that the preprocessed data has already been saved, it does not save it again. This can lead to serious issues when the data preprocessing parameters change. In addition, saving does not take much time, so we should always save the preprocessed data.
~~For binary classification and regression problems, the saved feature importances should be one-dimensional rather than two-dimensional.~~
~~--user, why~~
~~Keep track of all the preprocessing steps, so we can apply the exact same preprocessing steps to new data.~~
Add Dan and Mengzhe to the author list. We haven't got agreement from Mengzhe and Dan yet, so they are only included in the credits.
~~Print out time~~
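
The cat_threshold item above can be illustrated with a small sketch. The function `infer_task` and its rule (a numerical target counts as categorical only when it has fewer than `cat_threshold` unique values) are assumptions made for illustration, not the package's actual implementation. Under that assumed rule, a binary 0/1 target is read as a regression target when cat_threshold is 2.

```python
from numbers import Number


def infer_task(y, cat_threshold):
    """Guess the task type from the target values (hypothetical rule)."""
    values = list(y)
    # Booleans are excluded so that True/False targets are not seen as numbers.
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    n_unique = len(set(values))
    # Assumed rule: a numeric target is categorical only when it has
    # fewer than `cat_threshold` unique values.
    if is_numeric and n_unique >= cat_threshold:
        return "regression"
    return "classification"


binary_y = [0, 1, 1, 0, 1]
print(infer_task(binary_y, cat_threshold=2))         # regression
print(infer_task(binary_y, cat_threshold=10))        # classification
print(infer_task(["a", "b", "a"], cat_threshold=2))  # classification
```

One way to avoid the misclassification, under the same assumptions, would be to let the user state the task type explicitly instead of inferring it from y.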

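The two TODO items about saved preprocessed data describe a stale-cache problem: a saved result is reused even though the preprocessing parameters have changed. One possible alternative to always re-saving, shown here only as a sketch and not as what end2endML does, is to derive the cache file name from the parameters. All names below (`params_key`, `load_or_preprocess`, `scale`) are hypothetical.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def params_key(params):
    """Build a short, stable key from JSON-serializable parameters."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def load_or_preprocess(raw, params, cache_dir, preprocess):
    """Return (data, cache_hit), reusing a saved result only for equal params."""
    cache_file = Path(cache_dir) / f"preprocessed_{params_key(params)}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh), True
    data = preprocess(raw, **params)
    with cache_file.open("wb") as fh:
        pickle.dump(data, fh)
    return data, False


def scale(raw, factor):
    """Toy stand-in for a real preprocessing step."""
    return [x * factor for x in raw]


with tempfile.TemporaryDirectory() as tmp:
    first, hit1 = load_or_preprocess([1, 2, 3], {"factor": 2}, tmp, scale)
    again, hit2 = load_or_preprocess([1, 2, 3], {"factor": 2}, tmp, scale)
    changed, hit3 = load_or_preprocess([1, 2, 3], {"factor": 3}, tmp, scale)
    print(hit1, hit2, hit3)  # False True False
```

The sketch keys only on the parameters. A change in the raw data would also need to enter the key, for example through a hash of the input file.
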
Project details

Development Status
- 4 - Beta
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Release history

- 0.9.3 (Oct 3, 2021)
- 0.9.3.dev0, pre-release (Feb 1, 2022)
- 0.9.2 (Sep 6, 2021)
- 0.9.1 (Aug 31, 2021)
- 0.9.0 (Aug 17, 2021)
- 0.8.0 (Aug 16, 2021), this version
- 0.7.0 (Aug 15, 2021)
- 0.6.0 (Aug 14, 2021)
- 0.5.0 (Aug 14, 2021)
- 0.4.0 (Aug 14, 2021)

Download files

Source Distribution

end2endML-0.8.0.tar.gz (28.0 kB)

Uploaded Aug 16, 2021 Source

Built Distribution

end2endML-0.8.0-py3-none-any.whl (31.9 kB)

Uploaded Aug 16, 2021 Python 3

Hashes for end2endML-0.8.0.tar.gz
Algorithm	Hash digest
SHA256	`f807c57926bb0d70f074dcbc331b4250cc81c86f28c718523f22d0b67387ec33`
MD5	`71ab19784f768369024fd8349b9d025b`
BLAKE2b-256	`a4377394a858057fbc2a96d028ce966752d826718de9d42a4590a7b04a938026`

Hashes for end2endML-0.8.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1f7c1ca7e8afb05a7faf54388bcf98a34542ca11986e20210bb11f2a48abcf18`
MD5	`32847fc1cb827f8eb853c4c4bbb4c028`
BLAKE2b-256	`d540234159da9061d2b1c74e46bd3b1d84e13ad317d44b9cf3faf451c6b01313`