A scikit-learn compatible discrete first-order method for subset selection.

A Discrete First Order Method for Subset Selection

sklearn-discretefirstorder is a lightweight package that implements a simple discrete first-order method for best feature subset selection in linear regression.

The discrete first-order method is based on the technique described by Bertsimas et al. [1].
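To give a rough idea of the technique, here is a minimal NumPy sketch of a discrete first-order update for the cardinality-constrained least-squares problem: a gradient step followed by hard thresholding to the k largest-magnitude coefficients. This is an illustration written for this README, not the code used inside the package; the function names `hard_threshold` and `discrete_first_order` and the parameters `max_iter` and `tol` are hypothetical.

```python
import numpy as np


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    out = np.zeros_like(v)
    if k > 0:
        keep = np.argsort(np.abs(v))[-k:]
        out[keep] = v[keep]
    return out


def discrete_first_order(X, y, k, max_iter=500, tol=1e-8):
    """Approximately minimize 0.5 * ||y - X b||^2 subject to at most k nonzeros in b."""
    # Lipschitz constant of the gradient: squared spectral norm of X.
    lipschitz = np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        grad = X.T @ (X @ beta - y)
        # Gradient step with step size 1/L, then projection onto k-sparse vectors.
        beta_new = hard_threshold(beta - grad / lipschitz, k)
        converged = np.linalg.norm(beta_new - beta) <= tol
        beta = beta_new
        if converged:
            break
    return beta


# Synthetic data (an assumption for illustration): 3 of 10 features are relevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_beta = np.zeros(10)
true_beta[[1, 4, 7]] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.1 * rng.normal(size=200)

beta_hat = discrete_first_order(X, y, k=3)
support = np.flatnonzero(beta_hat)
print("selected features:", support)
```

The method only guarantees a stationary point of the constrained problem, so the selected subset can depend on the starting point and on how correlated the features are.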

The package is built on top of the scikit-learn framework and is compatible with scikit-learn methods such as cross-validation and pipelines. I followed the guidelines for developing scikit-learn estimators as outlined in the scikit-learn documentation.

About the project

I created this project first and foremost to learn more about how to build and maintain a Python project. My goal was never to build a state-of-the-art machine learning package.

I picked this topic because I had experimented with different feature selection approaches (including the discrete first-order method implemented here) as part of a grad school class project. However, I never developed a robust, well-documented codebase. I decided to re-implement the simplest technique from my grad school project so that I could focus on key aspects of project development such as proper API design, documentation and testing.

I felt like the scikit-learn framework was appropriate for this ML use-case and, more generally, a good set of guiding principles for my first proper Python package thanks to its clear standards and good documentation.

At the moment, the project is in a very early stage of development, but basic usage is already possible. If time permits, I plan to add more features and improve the documentation.

Installation

To install the package, clone this repo and run pip install:

git clone https://github.com/miguelfmc/sklearn-discretefirstorder
cd sklearn-discretefirstorder
pip install .

Quick Start

Once you have installed the package, you can start using it right away.

The key estimator in this package is discretefirstorder.DFORegressor. You can import it as follows:

from discretefirstorder import DFORegressor

Then fit the estimator like any other scikit-learn regressor:

import numpy as np
from discretefirstorder import DFORegressor
X = np.arange(100).reshape(100, 1)
y = np.random.normal(size=(100, ))
estimator = DFORegressor()
estimator.fit(X, y)

For more examples, see the documentation.

Known Issues

This package is still at a very early stage of development. The following issues are known:

  • Optimization routines are implemented in Python, which makes them slow.

  • At the moment, the package only supports squared error loss minimization but there are plans to include support for absolute error loss minimization.

  • At the moment, there is no support for classification problems, e.g. logistic regression.

  • I’m working on making the package available on PyPI and conda-forge. Stay tuned for updates!

Contributing

While the project is still in its early stages, contributions are welcome!

To contribute, please fork the repo and clone it to your local machine. Then, create a new branch and make your changes.

I suggest you set up your local environment with conda and pip:

conda create -n sklearn-discretefirstorder python=3.8
conda activate sklearn-discretefirstorder
pip install -r requirements.txt -r requirements-dev.txt -r requirements-docs.txt -r requirements-test.txt

You can also use conda to install all the dependencies from the environment.yml file:

conda env create -f environment.yml
conda activate sklearn-discretefirstorder

Then, install the package in editable mode:

pip install -e .

License

Distributed under the BSD 3-Clause License. See LICENSE for more information.

References
