fit estimate utility

Project description

skestimate package

This package fits a multiclass classifier and reports various metrics for it. It is built on the scikit-learn library and uses that library's methods.

The fit_est module in the package defines a class, Xest, whose constructor takes four arguments: Xest(estimator, data, target_label, ts).

  • estimator: A pipelined classifier that encodes all the categorical features.
  • data: The raw dataset, including the target label.
  • target_label: A string giving the name of the target label.
  • ts: A number between 0 and 1 that specifies the fraction of the data held out for testing.
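
To make the four arguments concrete, here is a minimal sketch that uses scikit-learn directly, without skestimate. The toy DataFrame, its column names, and the use of train_test_split and OneHotEncoder are assumptions made for illustration; how Xest splits and fits the data internally is not confirmed by this page.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical feature, one numeric feature, one target.
data = pd.DataFrame({
    "soil": ["clay", "sand", "loam", "clay", "sand", "loam"] * 5,
    "elevation": list(range(30)),
    "cover": ["pine", "fir", "oak", "pine", "fir", "oak"] * 5,
})
target_label = "cover"
ts = 0.2  # hold out 20% of the rows for testing

# estimator: a pipeline that encodes the categorical column, then classifies.
encoder = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["soil"]),
    remainder="passthrough",  # numeric columns are passed through unchanged
)
estimator = make_pipeline(
    encoder,
    RandomForestClassifier(n_estimators=10, random_state=42),
)

# Assumed handling of data, target_label and ts: separate the target column,
# then split the rows into a training part and a test part.
X = data.drop(columns=[target_label])
y = data[target_label]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=ts, random_state=42
)

# Fit on the training part and report accuracy on both parts.
estimator.fit(X_train, y_train)
train_score = estimator.score(X_train, y_train)
test_score = estimator.score(X_test, y_test)
print(len(X_train), len(X_test), train_score, test_score)
```

With 30 rows and ts = 0.2, the split leaves 24 rows for training and 6 for testing.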

Below is an example of how to use the class on the CoverType data set from the UCI repository.

import pandas as pd
import skestimate
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

data = pd.read_csv("https://github.com/skhabiri/PredictiveModeling-CoverType-u2build/blob/master/data/train.csv?raw=true")
# min_impurity_split=None is omitted: scikit-learn 1.0 removed that parameter
rfc = make_pipeline(
    RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                           criterion='entropy', max_depth=14, max_features=20,
                           max_leaf_nodes=None, max_samples=None,
                           min_impurity_decrease=0.0,
                           min_samples_leaf=2, min_samples_split=10,
                           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=-1,
                           oob_score=False, random_state=42, verbose=0,
                           warm_start=False)
                    )
# Arguments: estimator, data, target_label, ts (20% of the data held out for testing)
xest = skestimate.Xest(rfc, data, "Cover_Type", 0.2)

For local testing we can use the example() function in fit_est.py.

>>> import skestimate
>>> myest = skestimate.example()
>>> myest.xskew(0.9)
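
A call such as xskew(0.9) flags heavily skewed columns. One plausible reading, which is an assumption here and not confirmed by the package source, is that a column's "skewness" is the share of rows taken by its most frequent value. The sketch below implements that reading with pandas; the function name dominant_share and the toy data are hypothetical.

```python
import pandas as pd


def dominant_share(df: pd.DataFrame, imb: float = 0.99) -> pd.Series:
    """Share of the most frequent value per column, kept where it exceeds imb.

    Assumed interpretation of "skewness"; not taken from skestimate itself.
    """
    # value_counts sorts by frequency, so the first entry is the dominant value.
    shares = df.apply(
        lambda col: col.value_counts(normalize=True, dropna=False).iloc[0]
    )
    return shares[shares > imb].sort_values(ascending=False)


# Hypothetical data with three differently distributed columns.
df = pd.DataFrame({
    "mostly_zero": [0] * 19 + [1],  # 95% of the rows are 0
    "balanced": [0, 1] * 10,        # two values, 50% each
    "constant": [7] * 20,           # a single value
})

flagged = dominant_share(df, imb=0.9)
print(flagged)
```

Under this reading, a threshold of 0.9 flags "constant" and "mostly_zero" and leaves "balanced" out. Columns flagged this way carry little information and are common candidates for removal before fitting.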

Methods available on the Xest class:

  • xunique(): Reports the number of unique values in each column of the data.

  • xskew(imb=0.99):
    Returns a sorted pandas Series of the columns whose skewness exceeds imb, a threshold between 0 and 1.

  • xfit():
    Fits the pipeline estimator and returns the fitted estimator, the training score, and the test score.

  • xscore(fit=True):
    Calculates the accuracy, recall, and precision of the classifier and plots the confusion matrix.
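
For readers unfamiliar with the metrics that xscore reports, the sketch below computes them by hand for a small multiclass example. It is written in plain Python to show the definitions and is not the package's implementation. Macro averaging (an unweighted mean over classes) is an assumption, since the averaging mode used by xscore is not stated here, and the plotting step is left out.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def macro_scores(y_true, y_pred):
    """Return accuracy, macro recall, macro precision and the confusion matrix."""
    labels = sorted(set(y_true) | set(y_pred))
    cm = confusion_matrix(y_true, y_pred, labels)
    n = len(labels)
    # Accuracy: correctly classified samples (the diagonal) over all samples.
    accuracy = sum(cm[i][i] for i in range(n)) / len(y_true)
    recalls, precisions = [], []
    for i in range(n):
        true_total = sum(cm[i])                       # samples whose true class is i
        pred_total = sum(cm[r][i] for r in range(n))  # samples predicted as class i
        recalls.append(cm[i][i] / true_total if true_total else 0.0)
        precisions.append(cm[i][i] / pred_total if pred_total else 0.0)
    return accuracy, sum(recalls) / n, sum(precisions) / n, cm


# Hypothetical labels for three classes.
y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 2, 3]

accuracy, recall, precision, cm = macro_scores(y_true, y_pred)
print(cm)  # [[2, 1, 0], [0, 2, 0], [0, 0, 1]]
print(accuracy, recall, precision)
```

Here one sample of class 1 is misclassified as class 2, so accuracy is 5/6, while macro recall and macro precision both come out to 8/9.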

Docker Container

An example of how to test the skestimate package on a Debian OS through a Docker image. Here are the steps:

  • Create the Dockerfile in the project (repository) directory. The Dockerfile specifies:
    • Based on the "debian" image from Docker Hub
    • bash shell
    • Python3 and pip3 installed
    • numpy, pandas, scikit-learn, and matplotlib all installed
    • skestimate package installed
  • Build the image on the local machine with docker build . -t skestimate_di
  • Verify that the new image exists with docker image ls
  • Create and enter a fresh container with docker run -it skestimate_di
  • Test the package with:
    • python3
    • import skestimate
    • est = skestimate.example()
    • est.xscore(fit=True)
    • exit() #exit the python repl
  • Exit the container with exit

Dockerfile

## Image to base ours on
## docker run debian
FROM debian

## Environment variables to configure things
## Setting shell so pipenv shell works
ENV SHELL=/bin/bash

## Update and install dependencies
## The commands are chained with &&; a trailing \ continues onto the next line
## apt update refreshes the list of packages available for install
RUN apt update && \
  ## upgrade installs newer versions of outdated packages, including
  ## security fixes, to keep the OS up to date
  ## -y answers yes to any prompts, since the build is not interactive
  apt upgrade -y && \
  ## installing python3-pip provides python3 and pip3 at the same time
  apt install python3-pip -y && \
  pip3 install pandas numpy scikit-learn matplotlib && \
  pip3 install skestimate

## In command prompt:
## docker build . -t skestimate_di
## docker run -it skestimate_di

Unit Testing

You can test the individual methods from the command line with python test_skestimate.py --verbose
