A custom package for house price prediction
Project description
Median housing value prediction
The housing data can be downloaded from https://raw.githubusercontent.com/ageron/handson-ml/master/. The script includes code to download the data. We model the median house value on the given housing data.
The following techniques have been used:
- Linear regression
- Decision Tree
- Random Forest
Steps performed
In the Python code we execute the following steps:
- Prepare and clean the data, checking for and imputing missing values.
- Generate features and check the variables for correlation.
- Evaluate multiple sampling techniques and split the dataset into train and test sets.
- Train Linear regression, Decision tree, and Random forest models.
- Use RandomizedSearchCV and GridSearchCV to find the best hyperparameters.
- Evaluate all of the above modelling techniques; the final evaluation metric is mean squared error.
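The steps above can be sketched end to end. This is a minimal illustration, assuming scikit-learn and using a small synthetic dataset in place of the real housing data; the actual script's features, parameter grids, and settings differ.

```python
# Illustrative pipeline: impute missing values, split, grid-search a
# random forest, and evaluate with mean squared error (synthetic data).
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)
X[rng.integers(0, 200, 10), 0] = np.nan  # simulate missing values

# Impute missing values with the median, then split into train/test sets.
X = SimpleImputer(strategy="median").fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Small illustrative parameter grid; the real search space is larger.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [10, 30], "max_depth": [4, None]},
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)

mse = mean_squared_error(y_test, search.predict(X_test))
print(f"Best params: {search.best_params_}, test MSE: {mse:.4f}")
```

The same shape applies to the linear regression and decision tree models, and RandomizedSearchCV can replace GridSearchCV when the grid is large.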
Execution of the script
To set up an isolated conda environment, a dedicated environment 'mle-dev' was created from the generated env.yml file. The following command creates the environment from env.yml:

conda env create -f env.yml

Alternatively, to create and activate the 'mle-dev' environment manually:

conda create -n mle-dev
conda activate mle-dev

After creating the environment, the bug-free file nonstandardcode.py was executed and linted with the following commands:

python3 nonstandardcode.py
flake8 nonstandardcode.py

A screenshot of the output from running nonstandardcode.py successfully in the mle-dev environment is attached as part of the PR description.
Packaging and Testing
Breaking the nonstandardcode.py file into three files
- The nonstandardcode.py file was split into three separate files: ingest_data.py, train.py, and score.py.
- ingest_data.py preprocesses the data and creates valid train and test datasets that are ready for model training.
- ingest_data.py is run from the command line, passing the path to the output folder where the processed files are to be stored.
- train.py is solely responsible for training the models. It trains the different algorithms on the datasets and, as output, generates pickle files that are stored in the artifacts folder of the project.
- score.py evaluates the performance of the models by loading the pickle files and prints the scores to the terminal.
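A minimal sketch of what score.py might look like, assuming the models are pickled scikit-learn estimators (the function name and folder layout here are illustrative, not the project's exact code):

```python
# Load each pickled model from an artifacts folder and score it with MSE.
import pickle
from pathlib import Path

from sklearn.metrics import mean_squared_error


def score_models(artifacts_dir, X_test, y_test):
    """Load every *.pkl model in artifacts_dir and return {name: MSE}."""
    scores = {}
    for pkl in sorted(Path(artifacts_dir).glob("*.pkl")):
        with open(pkl, "rb") as fh:
            model = pickle.load(fh)
        scores[pkl.stem] = mean_squared_error(y_test, model.predict(X_test))
    return scores
```

A driver script would then call `score_models("artifacts", X_test, y_test)` and print each name/score pair to the terminal.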
Running Tests
- Created a tests folder in the project root containing two separate subfolders, one for unit tests and one for functional tests.
- Ran a unit test for ingest_data.py that checks whether the generated outputs are valid.
- Ran a functional test that verifies the package installs correctly into a virtual environment, driven via Python's built-in subprocess package.
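The functional test could be structured along these lines. This is a hedged sketch, not the project's actual test: the helper names and the idea of pip-installing a built wheel into a throwaway venv are assumptions based on the description above.

```python
# Sketch: create a fresh virtual environment and pip-install a wheel into
# it via subprocess, returning whether the install succeeded.
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir):
    """Path to the Python interpreter inside a virtual environment."""
    sub = "Scripts" if sys.platform == "win32" else "bin"
    return Path(env_dir) / sub / "python"


def check_install(wheel_path):
    """Create a throwaway venv and pip-install the wheel; True on success."""
    with tempfile.TemporaryDirectory() as d:
        venv.create(d, with_pip=True)
        result = subprocess.run(
            [str(venv_python(d)), "-m", "pip", "install", str(wheel_path)],
            capture_output=True,
            text=True,
        )
        return result.returncode == 0
```

A pytest function would then assert `check_install("dist/<wheel name>.whl")` after running the build.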
Packaging
Created the pyproject.toml file to define the package. The following command builds the package:

python -m build
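A minimal pyproject.toml along these lines would let `python -m build` produce the sdist and wheel. The package name and version match the file details below; the build backend and other metadata here are illustrative assumptions:

```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "packaged_non_std_nisha"
version = "0.0.1"
requires-python = ">=3.10"
```

The real file would also declare dependencies (e.g. scikit-learn) and authorship metadata.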
After converting it to a package, Sphinx, a Python documentation generator, was used to create proper documentation and generate HTML for the package.

The following commands were used to set up Sphinx:

conda install sphinx
sphinx-quickstart

Documentation was written in the index.rst file. The following command generates the HTML:

make html
File details
Details for the file packaged_non_std_nisha-0.0.1.tar.gz.
File metadata
- Download URL: packaged_non_std_nisha-0.0.1.tar.gz
- Upload date:
- Size: 7.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.13
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | d09e81c2ca2bf1dd1b2acc405cee40d644dd38d88667f2a8a67c32fd8c0e65f9 |
| MD5 | 2b246cda22882d314cddc27c3ff197c0 |
| BLAKE2b-256 | 9575850e248b335f74e992a667bd1074b30036dd8450192f112902021b2a8d5b |
File details
Details for the file packaged_non_std_nisha-0.0.1-py3-none-any.whl.
File metadata
- Download URL: packaged_non_std_nisha-0.0.1-py3-none-any.whl
- Upload date:
- Size: 7.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.13
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | fe7790a10301354faad887597c872be6858b25286d8b673c7aa9ae6f1dc7ead8 |
| MD5 | b4e50896e533e7a1cad99eb55d3ef447 |
| BLAKE2b-256 | 674691925d63e3f7ace1e9a73e43f0af33bcfb9f8d47c73cb5450de281a7bc7b |