
A scikit-learn compatible discrete first-order method for subset selection.


A Discrete First Order Method for Subset Selection

sklearn-discretefirstorder is a lightweight package that implements a simple discrete first-order method for best feature subset selection in linear regression.

The discrete first-order method is based on the technique described by Bertsimas et al. [1].
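As a rough illustration of the general technique (not this package's actual code), the core iteration takes a gradient step on the least-squares loss and then keeps only the k largest-magnitude coefficients. Below is a minimal NumPy sketch; the function names and the parameters k, max_iter and tol are assumptions made for this example.

```python
import numpy as np


def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and zero out the rest."""
    out = np.zeros_like(beta)
    if k > 0:
        keep = np.argsort(np.abs(beta))[-k:]  # indices of the k largest |beta_j|
        out[keep] = beta[keep]
    return out


def discrete_first_order(X, y, k, max_iter=500, tol=1e-8):
    """Approximately minimize 0.5 * ||y - X @ beta||^2 with at most k nonzeros.

    Illustrative sketch only; the names and defaults here are assumptions.
    """
    # Lipschitz constant of the gradient: the largest eigenvalue of X^T X,
    # i.e. the squared spectral norm of X.
    lipschitz = np.linalg.norm(X, ord=2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        gradient = X.T @ (X @ beta - y)
        # Gradient step followed by projection onto k-sparse vectors.
        beta_next = hard_threshold(beta - gradient / lipschitz, k)
        converged = np.linalg.norm(beta_next - beta) <= tol
        beta = beta_next
        if converged:
            break
    return beta


# Synthetic data in which only 3 of the 10 features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_beta = np.zeros(10)
true_beta[[1, 4, 7]] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.1 * rng.normal(size=200)

beta_hat = discrete_first_order(X, y, k=3)
print("selected features:", np.flatnonzero(beta_hat))
```

The step size 1 / lipschitz is the conservative choice that guarantees the objective never increases from one iteration to the next. The result is a local solution, so it may depend on the starting point.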

The package is built on top of the scikit-learn framework and is compatible with scikit-learn methods such as cross-validation and pipelines. I followed the guidelines for developing scikit-learn estimators as outlined in the scikit-learn documentation.
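To show what that compatibility looks like in practice, here is a small sketch of an estimator that follows the same conventions and runs inside a pipeline with cross-validation. TopKRegressor is a hypothetical stand-in written for this example (plain least squares truncated to its k largest coefficients), not the method implemented in this package. Assuming DFORegressor follows the same conventions, it could take the stand-in's place in the pipeline.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class TopKRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical stand-in: least squares truncated to k coefficients."""

    def __init__(self, k=3):
        # The constructor only stores hyperparameters, with no validation or logic.
        self.k = k

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        x_mean, y_mean = X.mean(axis=0), y.mean()
        coef, *_ = np.linalg.lstsq(X - x_mean, y - y_mean, rcond=None)
        # Zero out everything except the k largest-magnitude coefficients.
        keep = np.argsort(np.abs(coef))[max(coef.size - self.k, 0):]
        sparse_coef = np.zeros_like(coef)
        sparse_coef[keep] = coef[keep]
        # Learned attributes end with an underscore.
        self.coef_ = sparse_coef
        self.intercept_ = y_mean - x_mean @ sparse_coef
        return self  # fit returns self so calls can be chained

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        return X @ self.coef_ + self.intercept_


rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = 2.0 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=120)

# Scaling and the estimator are chained, then scored with 5-fold CV (R^2).
model = make_pipeline(StandardScaler(), TopKRegressor(k=2))
scores = cross_val_score(model, X, y, cv=5)
print("mean R^2:", scores.mean())

model.fit(X, y)
print("coefficients:", model[-1].coef_)
```

Because hyperparameters are plain constructor arguments, tools such as GridSearchCV can clone the estimator and tune k without any extra code.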

About the project

I created this project first and foremost to learn more about how to build and maintain a Python project. My goal was never to build a state-of-the-art machine learning package.

I picked this topic because I had experimented with different feature selection approaches (including the discrete first-order method implemented here) as part of a grad school class project. However, I never developed a robust, well-documented codebase. I decided to re-implement the simplest technique from my grad school project so that I could focus on key aspects of project development such as proper API design, documentation and testing.

I chose the scikit-learn framework because it fits this ML use case and, more generally, because its clear standards and good documentation make it a good set of guiding principles for my first proper Python package.

At the moment, the project is in a very early stage of development, but basic usage is already possible. If time permits, I plan to add more features and improve the documentation.

Installation

To install the package from source, clone this repo and run pip install:

git clone https://github.com/miguelfmc/sklearn-discretefirstorder
cd sklearn-discretefirstorder
pip install .

The package is also available on PyPI:

pip install sklearn-discretefirstorder

Quick Start

Once you have installed the package, you can start using it right away.

The key estimator in this package is discretefirstorder.DFORegressor. You can import it as:

from discretefirstorder import DFORegressor

Fit the estimator as follows:

import numpy as np
from discretefirstorder import DFORegressor

# Toy data: 100 samples, a single feature, Gaussian noise as the target
X = np.arange(100).reshape(100, 1)
y = np.random.normal(size=(100,))

# Fit with the default settings
estimator = DFORegressor()
estimator.fit(X, y)

For more examples, see the documentation.

Known Issues

This package is still at a very early stage of development. The following issues are known:

  • Optimization routines are implemented in Python, which makes them slow.

  • At the moment, the package only supports squared error loss minimization, but there are plans to include support for absolute error loss minimization.

  • At the moment, there is no support for classification problems, e.g. logistic regression.

  • I’m working on making the package available on conda-forge. Stay tuned for updates!

Contributing

While the project is still in its early stages, contributions are welcome!

To contribute, please fork the repo and clone it to your local machine. Then, create a new branch and make your changes.

I suggest you set up your local environment with conda and pip:

conda create -n sklearn-discretefirstorder python=3.8
conda activate sklearn-discretefirstorder
pip install -r requirements.txt -r requirements-dev.txt -r requirements-docs.txt -r requirements-test.txt

You can also use conda to install all the dependencies from the environment.yml file:

conda env create -f environment.yml
conda activate sklearn-discretefirstorder

Then, install the package in editable mode:

pip install -e .

License

Distributed under the BSD 3-Clause License. See LICENSE for more information.

References

[1] Bertsimas, D., King, A., and Mazumder, R. (2016). Best subset selection via a modern optimization lens. The Annals of Statistics, 44(2), 813-852.
