Project description

Welcome to housing_price_pred's documentation!

The housing data can be downloaded from https://raw.githubusercontent.com/ageron/handson-ml/master/. The package includes code to download the data. The models predict the median house value from the given housing data.
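As a rough illustration of the download step, the sketch below fetches a .tgz archive and extracts it. The function name fetch_housing_data, the archive location datasets/housing/housing.tgz under the URL above, and the default target folder are assumptions for illustration and may differ from what the package actually does.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumption: the archive location below follows the handson-ml repository
# layout; it is not stated in this documentation.
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data(housing_url=HOUSING_URL, housing_path="datasets/housing"):
    """Download a .tgz archive from housing_url and extract it into housing_path."""
    target_dir = Path(housing_path)
    target_dir.mkdir(parents=True, exist_ok=True)
    archive_path = target_dir / "housing.tgz"
    # Save the archive locally, then unpack it next to the download.
    urllib.request.urlretrieve(housing_url, str(archive_path))
    with tarfile.open(archive_path) as archive:
        archive.extractall(path=target_dir)
    return target_dir
```

Calling fetch_housing_data() with no arguments would use the assumed defaults; both arguments can be overridden.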

The following techniques have been used:

Linear regression
Decision Tree
Random Forest - Both Randomized Search and Grid Search were used for hyperparameter tuning.
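The three techniques above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the package's own training code; the search spaces, the number of folds, and the dictionary keys are hypothetical choices kept small so the example runs quickly.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the prepared housing features and target.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

models = {
    "linear_regression": LinearRegression().fit(X, y),
    "decision_tree": DecisionTreeRegressor(random_state=42).fit(X, y),
}

# Hypothetical search spaces; the package may use different values.
param_distributions = {"n_estimators": [5, 10, 20], "max_features": [2, 3, 4]}
param_grid = {"n_estimators": [5, 10], "max_features": [2, 4]}

# Randomized search samples n_iter combinations from the distributions.
random_search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=4,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=42,
)
# Grid search tries every combination in the grid.
grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid=param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)

models["random_forest_randomized"] = random_search.fit(X, y).best_estimator_
models["random_forest_grid"] = grid_search.fit(X, y).best_estimator_
```

Both searches score candidates by negative mean squared error, which matches the evaluation metric named in this documentation.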

The trained models are saved in a specified directory from where they can be used to check the performance on the test set.
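Saving and reloading a trained model could look roughly like the sketch below, which uses pickle from the standard library. The helper names save_model and load_model and the .pkl file naming are hypothetical; the package's own file layout may differ.

```python
import pickle
from pathlib import Path


def save_model(model, pickle_dir, name):
    """Pickle a model into pickle_dir as <name>.pkl and return the file path."""
    pickle_dir = Path(pickle_dir)
    pickle_dir.mkdir(parents=True, exist_ok=True)
    model_path = pickle_dir / f"{name}.pkl"
    with model_path.open("wb") as handle:
        pickle.dump(model, handle)
    return model_path


def load_model(pickle_dir, name):
    """Load the model stored as <name>.pkl inside pickle_dir."""
    with (Path(pickle_dir) / f"{name}.pkl").open("rb") as handle:
        return pickle.load(handle)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.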

Steps performed

We prepare and clean the data, checking for missing values and imputing them.
Features are generated and the variables are checked for correlation.
Multiple sampling techniques are evaluated. The data set is split into train and test sets.
All of the modelling techniques listed above are tried and evaluated. The final evaluation metric is mean squared error.
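The steps above can be sketched end to end on a small synthetic table. Everything here is an assumption for illustration: the column names, the median imputation strategy, the rooms_per_household feature, and the stratified split on binned income are typical choices for this data set, not confirmed details of the package.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-in for the housing table, with some missing values.
rng = np.random.default_rng(42)
n = 500
housing = pd.DataFrame({
    "median_income": rng.uniform(0.5, 10.0, n),
    "total_rooms": rng.integers(100, 5000, n).astype(float),
    "households": rng.integers(50, 1500, n).astype(float),
})
housing["median_house_value"] = (
    40_000 * housing["median_income"] + rng.normal(0, 20_000, n)
)
missing_rows = housing.sample(frac=0.05, random_state=42).index
housing.loc[missing_rows, "total_rooms"] = np.nan

# 1. Impute missing values with the column median.
imputer = SimpleImputer(strategy="median")
housing = pd.DataFrame(imputer.fit_transform(housing), columns=housing.columns)

# 2. Generate a feature and check correlations with the target.
housing["rooms_per_household"] = housing["total_rooms"] / housing["households"]
correlations = housing.corr()["median_house_value"].sort_values(ascending=False)

# 3. Stratified train/test split on binned income.
housing["income_cat"] = pd.cut(
    housing["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=[1, 2, 3, 4, 5],
)
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(housing, housing["income_cat"]))
train_set = housing.iloc[train_idx].drop(columns="income_cat")
test_set = housing.iloc[test_idx].drop(columns="income_cat")

# 4. Fit a model and evaluate it with mean squared error on the test set.
features = ["median_income", "total_rooms", "households", "rooms_per_household"]
model = LinearRegression().fit(train_set[features], train_set["median_house_value"])
mse = mean_squared_error(
    test_set["median_house_value"], model.predict(test_set[features])
)
```

Stratifying on an income category keeps the income distribution similar in the train and test sets, which a purely random split does not guarantee on small samples.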

Install the package

Install the package using python3 -m pip install housing_price_pred

Usage

From the command prompt, type python3 to start Python.
To check availability inside Python, type help("modules"). The package should show up in the list.
The package contains three main modules: ingest_data, train and score. Note that the train_data function from the train module does not return any object; it trains the models on pre-processed data and stores them as pickles inside the specified directory. If return values are needed, please raise an issue and this can be fixed.
Import the modules from the package: from housing_price_pred import ingest_data, train, score.
Functions inside these modules can be used with appropriate arguments.
For more information on the functions, run help(function_name).
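As an alternative to scanning the help("modules") listing by eye, availability can be checked programmatically. The helper is_installed below is a hypothetical name; it relies only on importlib from the standard library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether the package is importable in the current environment.
print("housing_price_pred installed:", is_installed("housing_price_pred"))
```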

Here is a usage example: ::

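from housing_price_pred import ingest_data, train, score

# housing_url, housing_path, input_folder, processed_folder, pickle_path and
# output_path are placeholders that must be defined before running this.
# The module prefixes and argument order are assumed; confirm with help().
housing, strat_train_set, strat_test_set = ingest_data.download_data(housing_url, housing_path)
train.train_data(input_folder, processed_folder, pickle_path)
lr_predictions, tr_predictions, rnd_forest_predictions, grd_forest_predictions = score.score_models(
    processed_folder, pickle_path, output_path
)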

It is recommended to run train_data and score_models with default parameters if ingest_data is run with default parameters.

For Contributors and Development

Fork the repo here https://github.com/sibashisc/mle-training/tree/fix/9%2Fml-workflow
Create a dev environment using the .yml file
- conda env create -f env.yml
Activate environment
- conda activate mle-dev

To execute the script

python <scriptname.py>
Each script ingest_data.py, train.py and score.py can take user arguments. For more information run python3 <scriptname.py> --help
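For orientation, a script that accepts user arguments is typically built on argparse, roughly as sketched below. The flag names --input_path and --pickle_path and their defaults are hypothetical; run the real scripts with --help to see the options they actually accept.

```python
import argparse


def build_parser():
    """Build a command-line parser with hypothetical training options."""
    parser = argparse.ArgumentParser(
        description="Train models on processed housing data."
    )
    parser.add_argument(
        "--input_path",
        default="data/processed",
        help="Folder containing the processed training data.",
    )
    parser.add_argument(
        "--pickle_path",
        default="artifacts",
        help="Folder where the trained models are pickled.",
    )
    return parser


# Parse an explicit argument list; a real script would call parse_args()
# with no arguments so that it reads the command line instead.
args = build_parser().parse_args(["--input_path", "data/raw"])
print(args.input_path, args.pickle_path)
```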

License

MIT License

Release history

0.0.3 (this version): Jul 5, 2021
0.0.2: Jul 5, 2021
0.0.1: Jun 28, 2021

Download files

Source Distribution

housing_price_pred-0.0.3.tar.gz (13.2 kB)

Uploaded Jul 5, 2021 Source

Built Distribution

housing_price_pred-0.0.3-py3-none-any.whl (18.6 kB)

Uploaded Jul 5, 2021 Python 3

Hashes for housing_price_pred-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`22fd20c2ae411b8b508f7a0608bc8e0fac9db586c181098655ce36c91a7150b0`
MD5	`ed186a6101f0e652106efb753a84709e`
BLAKE2b-256	`abbcdeeec98e384fc6d98c1910a3430fb66688c9c0cd1ebc1788e4a7bdeff678`

Hashes for housing_price_pred-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bcf5cb8f5ff3996e084c19978acbbbdb0308bead336f2e6fcfc61f5fb4225b3e`
MD5	`f6d7c4f1547dbc93143bc8889e46456c`
BLAKE2b-256	`bc52eb72ed6f8465ad6e9b5821aeefb60fec478b8ffba1754e3639fe59da1702`