A custom package for house price prediction

Median housing value prediction

The housing data can be downloaded from https://raw.githubusercontent.com/ageron/handson-ml/master/. The script includes code to download the data. We have modelled the median house value on the given housing data.

The following techniques have been used:

  • Linear regression
  • Decision Tree
  • Random Forest

Steps performed

The following steps were executed in the Python code:

  • Prepared and cleaned the data; missing values were checked and imputed.
  • Features were generated and the variables were checked for correlation.
  • Multiple sampling techniques were evaluated, and the data set was split into train and test sets.
  • Linear regression, decision tree and random forest models were trained.
  • RandomizedSearchCV and GridSearchCV were used to find the best hyperparameters.
  • All of the above modelling techniques were tried and evaluated. The final evaluation metric is mean squared error.
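The steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the dataset, feature values, and hyperparameter ranges are assumptions made for the example.

```python
# Illustrative sketch of the steps described: impute, split, train three
# model families (with a randomized hyperparameter search for the forest),
# and compare them by mean squared error. Data here is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
X[rng.integers(0, 200, 10), 0] = np.nan      # simulate missing values

# Impute missing values, then split into train and test sets
X = SimpleImputer(strategy="median").fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=42),
    "forest": RandomizedSearchCV(
        RandomForestRegressor(random_state=42),
        {"n_estimators": [10, 50], "max_depth": [3, None]},
        n_iter=4, cv=3, random_state=42,
    ),
}

# Evaluate every model with mean squared error on the held-out test set
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_squared_error(y_test, model.predict(X_test))
print(scores)
```

In the real project the same search would run over the housing features rather than random data; the structure (impute, split, fit, score by MSE) is what this sketch demonstrates.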

Execution of the script

  • In order to set up the conda environment and keep the execution process isolated, a dedicated environment 'mle-dev' was created from the generated env.yml file. The following command creates the environment from env.yml:

    conda env create -f env.yml
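The exact contents of the project's env.yml are not shown on this page; a minimal file of the kind described might look like the following (the listed dependencies and versions are assumptions for illustration):

```yaml
name: mle-dev
channels:
  - defaults
dependencies:
  - python=3.10
  - scikit-learn
  - pandas
  - flake8
```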

  • Alternatively, an empty env 'mle-dev' can be created manually and then activated:

    conda create -n mle-dev

    conda activate mle-dev

  • After creating and activating the environment, the debugged nonstandardcode.py file was executed, and lint-checked with flake8, using the following commands:

    python3 nonstandardcode.py

    flake8 nonstandardcode.py

  • A screenshot of the output from successfully running nonstandardcode.py in the mle-dev environment was attached to the PR description.

Packaging and Testing

Breaking the nonstandardcode.py file into three files

  • The nonstandardcode.py file was split into three separate files: ingest_data.py, train.py and score.py.
  • ingest_data.py preprocesses the data and creates valid train and test datasets ready for model training.
  • ingest_data.py is run from the command line, passing the path to the output folder where the processed files are to be stored.
  • train.py is solely responsible for training the models. It trains different algorithms on the datasets and, as output, generates pickle files which are stored in the artifacts folder of the project.
  • score.py evaluates the performance of the models by loading the pickle files and prints the scores to the terminal.
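The train.py → score.py handoff described above can be sketched in miniature. The artifact file name and the toy "model" below are illustrative assumptions, not the project's actual code:

```python
# Sketch of the pickle-based handoff: train.py writes a fitted model to the
# artifacts folder; score.py loads it back and prints an evaluation score.
import os
import pickle
import tempfile

# train.py side: fit a trivial "model" (y = a*x + b) and pickle it
model = {"a": 2.0, "b": 1.0}
artifacts = tempfile.mkdtemp()                      # stand-in for artifacts/
path = os.path.join(artifacts, "linear_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# score.py side: load the pickle and report mean squared error on test data
with open(path, "rb") as f:
    loaded = pickle.load(f)

X_test, y_test = [1.0, 2.0, 3.0], [3.0, 5.0, 7.0]
preds = [loaded["a"] * x + loaded["b"] for x in X_test]
mse = sum((p - t) ** 2 for p, t in zip(preds, y_test)) / len(y_test)
print(f"MSE: {mse}")
```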

Running Tests

  • Created a tests folder in the project root containing two separate subfolders, one for unit tests and one for functional tests.
  • Ran a unit test for ingest_data.py evaluating whether the generated outputs are valid.
  • Ran a functional test verifying proper installation of the package in a virtual environment, created using Python's built-in subprocess module.

Packaging

  • Created the pyproject.toml file to build the package. The following command is used to build the package:

    python -m build
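A minimal pyproject.toml of the kind described might look like the following. The name and version match the published distribution; the build backend and Python requirement are assumptions for illustration:

```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "packaged_non_std_nisha"
version = "0.0.1"
requires-python = ">=3.8"
```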

  • After packaging, Sphinx, a Python documentation generator, was used to create proper documentation and generate HTML pages for the package.

  • The following commands install Sphinx and initialise the documentation scaffold:

    conda install sphinx

    sphinx-quickstart

  • Wrote the documentation in the index.rst file.
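An index.rst of the kind sphinx-quickstart generates might be structured as follows; the title and the referenced page name are assumptions for illustration:

```rst
Median housing value prediction
===============================

A custom package for house price prediction.

.. toctree::
   :maxdepth: 2

   modules
```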

  • The following command builds the HTML documentation:

    make html
