A package to train machine learning models on a housing dataset

Median housing value prediction

The housing data can be downloaded from https://raw.githubusercontent.com/ageron/handson-ml/master/. The script includes code to download the data. The package models the median house value from the given housing data.

The following techniques have been used:

  • Linear regression
  • Decision Tree
  • Random Forest

Steps performed

  • The data is prepared and cleaned; missing values are checked and imputed.
  • Features are generated and the variables are checked for correlation.
  • Multiple sampling techniques are evaluated. The data set is split into train and test sets.
  • All of the modelling techniques listed above are tried and evaluated. The final evaluation metric is mean squared error.
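
Two of the steps above, imputing missing values and scoring with mean squared error, can be sketched in plain Python. This is only an illustration: the median fill strategy, the function names, and the toy numbers are assumptions and are not taken from the package.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values, invented for illustration only.
column_with_gap = [3.0, None, 5.0, 4.0]
cleaned = impute_median(column_with_gap)

# Squared errors are 1.0 and 4.0, so the mean is 2.5.
mse = mean_squared_error([2.0, 4.0], [1.0, 6.0])
```

A lower mean squared error means the predictions are closer to the true values, so the score can be used to compare the three models on the test set.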

Setup for development

Create conda environment

foo@bar:~$ conda env create -f deploy/conda/linux_cpu_py39.yml 
foo@bar:~$ conda activate mle-dev 

Perform test

Tox has been configured with pytest to automate testing in a virtualenv.

foo@bar:~$ tox 

Run a specific test file:

foo@bar:~$ tox -- -k <file_name>

Usage

Install package

Option 1. From GitHub:

foo@bar:~$ git clone https://github.com/rishitoshsingh-ta/mle-training.git
foo@bar:~$ cd mle-training
foo@bar:~$ pip install .

Option 2. From PyPI:

foo@bar:~$ pip install housing-prediction

Test installation:

To check that the package was installed successfully, start a Python session and try to import housing. If the import succeeds, the installation is complete.

foo@bar:~$ python
>>> import housing
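
The same check can be scripted with the standard library. The sketch below is an assumption about how one might automate it, not part of the package: the helper names are hypothetical, and it relies only on the import name housing and the distribution name housing-prediction given above.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Example usage (results depend on your environment):
# is_importable("housing")
# installed_version("housing-prediction")
```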

Either install option installs the housing package along with all of its dependencies.

Run mlflow server

MLflow tracking is used in this project, so the MLflow server must be started first. In the command below, <directory> is the location where you want to store the mlruns data, for example file:///home/user/artifacts.

foo@bar:~$ mlflow server \
      --backend-store-uri sqlite:///mlflow.db \
      --default-artifact-root <directory> \
      --host 0.0.0.0 \
      --port 8889
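
If you are unsure how to write the file:// value for <directory>, a local path can be converted with pathlib. This is a small sketch; the helper name artifact_root_uri is hypothetical and the path is just the example used above.

```python
from pathlib import Path


def artifact_root_uri(directory):
    """Turn a local directory path into a file:// URI."""
    # as_uri() requires an absolute path, so make the path absolute first.
    return Path(directory).expanduser().absolute().as_uri()


uri = artifact_root_uri("/home/user/artifacts")
```

On a POSIX system this yields file:///home/user/artifacts, which can be pasted into the --default-artifact-root option.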

Run scripts

There are two ways to run the scripts: as a single command-line tool or as Python scripts.

  • As a command-line tool

    foo@bar:~$ housing
    
  • As Python scripts

foo@bar:~$ python -m housing.ingest_data
foo@bar:~$ python -m housing.train
foo@bar:~$ python -m housing.score
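
The three module commands can also be chained from Python so that a failure in one stage stops the run. This is a sketch under the assumption that the stages are meant to run in the order listed; the helper names are hypothetical and only the module names come from the commands above.

```python
import subprocess
import sys

# Module names as they appear in the commands above, in the same order.
STAGES = ("housing.ingest_data", "housing.train", "housing.score")


def stage_commands():
    """Build one `python -m <module>` command per stage."""
    return [[sys.executable, "-m", module] for module in STAGES]


def run_pipeline():
    """Run each stage in order; check=True raises if a stage fails."""
    for cmd in stage_commands():
        subprocess.run(cmd, check=True)


# Call run_pipeline() once the package is installed and the MLflow
# server is running.
```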

You can also pass arguments. To list all available arguments:

foo@bar:~$ housing --help
foo@bar:~$ python -m housing.ingest_data --help
foo@bar:~$ python -m housing.train --help
foo@bar:~$ python -m housing.score --help

Default arguments

The default argument values are stored in a .cfg file located at /path/to/env/lib/python3.9/site-packages/housing-prediction/housing.cfg. The defaults can be changed to suit user preferences.
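
To inspect the current defaults before editing them, the file can be read with configparser. This sketch assumes housing.cfg uses INI syntax, which the description above does not confirm. The helper name load_defaults is hypothetical, and the actual section and option names are not documented here.

```python
import configparser


def load_defaults(cfg_path):
    """Read an INI-style .cfg file into a {section: {option: value}} dict."""
    parser = configparser.ConfigParser()
    # parser.read() returns the list of files it managed to parse.
    if not parser.read(cfg_path):
        raise FileNotFoundError(cfg_path)
    return {section: dict(parser[section]) for section in parser.sections()}


# Example usage, with the path adapted to your environment:
# defaults = load_defaults("/path/to/env/lib/python3.9/site-packages/housing-prediction/housing.cfg")
```

All values are returned as strings, so numeric options need to be converted before use.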


This version

0.4
