A package to train machine learning models on the housing dataset
Project description
Median housing value prediction
The housing data can be downloaded from https://raw.githubusercontent.com/ageron/handson-ml/master/, and the package includes code that downloads it. We model the median house value on this housing data.
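The download step might look roughly like the following. This is a standard-library sketch, not the package's own script. The archive path datasets/housing/housing.tgz comes from the layout of the ageron/handson-ml repository and is an assumption here, as are the function name and the destination directory.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumption: the archive location follows the ageron/handson-ml repository
# layout (datasets/housing/housing.tgz); the package's own script may differ.
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data(url: str = HOUSING_URL, dest_dir: str = "datasets/housing") -> Path:
    """Download a .tgz archive from url and extract it into dest_dir (hypothetical helper)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive_path = dest / "housing.tgz"
    urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe members.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return dest
```

Calling fetch_housing_data() downloads the archive and unpacks it next to itself; in the handson-ml repository the archive holds a single housing.csv file.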
The following techniques have been used:
- Linear regression
- Decision Tree
- Random Forest
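The package's model code is not shown on this page. As a rough illustration of the first technique, here is ordinary least-squares linear regression for a single feature in plain Python. Using only one feature (for example, median income) is a simplification for readability, and the helper names are hypothetical; the package's real models presumably use several features.

```python
from statistics import fmean


def fit_linear_regression(x, y):
    """Ordinary least squares for y ~ intercept + slope * x with one feature."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_mean, y_mean = fmean(x), fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(intercept, slope, x):
    """Apply the fitted line to each value in x."""
    return [intercept + slope * xi for xi in x]
```

The slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. Decision trees and random forests replace this single line with piecewise-constant splits and averages of many such trees.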
Steps performed
- We prepare and clean the data, checking for missing values and imputing them.
- Features are generated and the variables are checked for correlation.
- Multiple sampling techniques are evaluated. The dataset is split into train and test sets.
- All of the modelling techniques listed above are trained and evaluated, with mean squared error as the final evaluation metric.
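The page does not show the package's preprocessing or evaluation code, so the following only illustrates three of the steps listed above, in plain Python with hypothetical helper names: median imputation of missing values (represented here as None), a seeded random train/test split, and the mean squared error metric. The package itself may use different imputation or sampling strategies, such as stratified sampling.

```python
import random
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)  # raises StatisticsError if every value is missing
    return [fill if v is None else v for v in values]


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

Fixing the seed makes the split reproducible, so every model is compared on the same held-out rows.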
Setup for development
Create conda environment
foo@bar:~$ conda env create -f deploy/conda/linux_cpu_py39.yml
foo@bar:~$ conda activate mle-dev
Run tests
Tox has been configured with pytest to automate testing in a virtualenv.
foo@bar:~$ tox
Test a specific test file:
foo@bar:~$ tox -- -k <file_name>
Usage
Install package
Option 1. From GitHub:
foo@bar:~$ git clone https://github.com/rishitoshsingh-ta/mle-training.git
foo@bar:~$ cd mle-training
foo@bar:~$ pip install .
Option 2. From PyPI:
foo@bar:~$ pip install housing-prediction
Either option installs the housing package together with all of its dependencies.
Test installation:
To check whether the package was installed successfully, start a Python session and try to import housing. If the import succeeds, the installation is complete.
foo@bar:~$ python
>>> import housing
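The same check can be scripted with the standard library alone. In this sketch (hypothetical helper names), importlib.util.find_spec tells whether housing can be imported without actually importing it, and importlib.metadata.version reports the installed version of the housing-prediction distribution, the PyPI name used above.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name: str) -> bool:
    """Return True if module_name can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("housing importable:", is_importable("housing"))
    print("housing-prediction version:", installed_version("housing-prediction"))
```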
Run the MLflow server
Because this project uses MLflow tracking, the MLflow server must be started first. In the command below, replace <directory> with the location where you want MLflow to store run artifacts, for example file:///home/user/artifacts.
foo@bar:~$ mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root <directory> \
--host 0.0.0.0 \
--port 8889
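Once the server is running, MLflow clients need to know where to find it. The standard-library sketch below uses hypothetical helper names. It builds a file:// URI suitable for --default-artifact-root, builds the tracking URI for the server started on port 8889, checks that the port accepts connections, and sets the MLFLOW_TRACKING_URI environment variable, which MLflow clients read when no tracking URI is set in code. Whether the housing scripts rely on this variable or set their own URI is not stated on this page, so treat that part as an assumption.

```python
import os
import socket
from pathlib import Path


def artifact_root_uri(path: str) -> str:
    """Turn a local directory into a file:// URI for --default-artifact-root."""
    return Path(path).expanduser().absolute().as_uri()


def tracking_uri(host: str = "localhost", port: int = 8889) -> str:
    """HTTP URI of a tracking server listening on host:port."""
    return f"http://{host}:{port}"


def server_is_up(host: str = "localhost", port: int = 8889, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Only affects this process and the child processes it starts.
os.environ.setdefault("MLFLOW_TRACKING_URI", tracking_uri())
```

The server binds to 0.0.0.0, so clients on the same machine can reach it through localhost; clients on other machines need the host's address instead.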
Run scripts
There are two ways to run the scripts: as a single command-line tool, or as individual Python scripts.
- As a command-line tool:
foo@bar:~$ housing
- As Python scripts:
foo@bar:~$ python -m housing.ingest_data
foo@bar:~$ python -m housing.train
foo@bar:~$ python -m housing.score
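The three modules can also be chained from Python, in the order listed above. This is a minimal sketch using subprocess; the helper names are hypothetical, and it assumes the steps are meant to run in exactly this order (ingest, then train, then score).

```python
import subprocess
import sys

# Pipeline steps in the order given above.
PIPELINE_MODULES = ["housing.ingest_data", "housing.train", "housing.score"]


def build_commands(python: str = sys.executable):
    """Return one `python -m <module>` command per pipeline step."""
    return [[python, "-m", module] for module in PIPELINE_MODULES]


def run_pipeline(python: str = sys.executable) -> None:
    """Run each step in turn; stop at the first step that exits non-zero."""
    for command in build_commands(python):
        subprocess.run(command, check=True)
```

Calling run_pipeline() uses the current interpreter and raises subprocess.CalledProcessError as soon as a step fails, so training never starts on data that failed to ingest.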
Each command also accepts arguments. To list all available arguments, pass --help:
foo@bar:~$ housing --help
foo@bar:~$ python -m housing.ingest_data --help
foo@bar:~$ python -m housing.train --help
foo@bar:~$ python -m housing.score --help
Default arguments
The default argument values are stored in a .cfg file located at:
/path/to/env/lib/python3.9/site-packages/housing-prediction/housing.cfg
These defaults can be changed to suit your preferences.
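To inspect those defaults without locating the file by hand, the standard library's configparser can read it. This sketch assumes the file uses standard INI syntax; its section and key names are not documented on this page, so the ones used in any example are hypothetical. The location is built from sysconfig's purelib (site-packages) path plus the directory name given above.

```python
import configparser
import sysconfig
from pathlib import Path


def default_cfg_path() -> Path:
    """The path given above: <site-packages>/housing-prediction/housing.cfg."""
    site_packages = Path(sysconfig.get_paths()["purelib"])
    return site_packages / "housing-prediction" / "housing.cfg"


def load_defaults(path=None) -> configparser.ConfigParser:
    """Parse the .cfg file; a missing file is silently skipped (empty parser)."""
    parser = configparser.ConfigParser()
    parser.read(path or default_cfg_path())
    return parser


def as_dict(parser: configparser.ConfigParser) -> dict:
    """Flatten the parsed sections into {section: {key: value}}."""
    return {section: dict(parser.items(section)) for section in parser.sections()}
```

Printing as_dict(load_defaults()) inside the activated environment shows every default at once; an empty result usually means the file is not at the expected path.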
Download files
Hashes for housing_prediction-0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 143e1b934b691cdfd7a41df71cea0c8e20eb8ea6822c88e9df804ce8190dfc07
MD5 | 50d90fff1caffb10555980a9ea510580
BLAKE2b-256 | b2da012459f08a210b36c2bd828b8f3a4e42ce32b5053e562d55242d05950cf4
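To confirm that a downloaded wheel matches the SHA256 digest listed for housing_prediction-0.4-py3-none-any.whl, hash the file and compare. This is a minimal standard-library sketch; the helper names are hypothetical, and the wheel path is wherever you saved the file.

```python
import hashlib

# SHA256 digest published for housing_prediction-0.4-py3-none-any.whl.
EXPECTED_SHA256 = "143e1b934b691cdfd7a41df71cea0c8e20eb8ea6822c88e9df804ce8190dfc07"


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, after `pip download housing-prediction==0.4 --no-deps -d .`, call verify_download("housing_prediction-0.4-py3-none-any.whl"). pip can also enforce this itself in hash-checking mode, using `--hash=sha256:...` entries in a requirements file.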