Package for practical and efficient data science in Python. Initially written for the data-science-keras repo.

Project description

Data science projects with Keras (Poetry Version)

Code style: black

Author: Angel Martinez-Tenor

Repository: Github link

This repo contains a set of data science projects solved with artificial neural networks implemented in Keras. It is based on a set of use cases from Udacity, Coursera & Kaggle.

The repo also introduces a minimal package, ds_boost, initially implemented as a helper for this repo.

Disclaimer: This notebook-based repo was developed in early 2018. Since July 2022, I have been updating it with the best practices I've learned from implementing solutions in production environments as a lead data scientist.

A non-poetry version of this repo is available in the branch no-poetry

Scenarios

Classification models

  • Enron Scandal: Identifies Enron employees who may have committed fraud

  • Property Maintenance Fines: Predicts the probability that a set of blight tickets will be paid on time

  • Sentiment IMDB: Predicts positive or negative sentiments from movie reviews (NLP)

  • Spam detector: Predicts the probability that a given email is spam (NLP)

  • Student Admissions: Predicts student admissions to graduate school at UCLA

  • Titanic: Predicts survival probabilities from the sinking of the RMS Titanic

Regression models

  • Bike Rental: Predicts daily bike rental ridership

  • House Prices: Predicts house sale prices from the Ames Housing database

  • Simple tickets: Predicts the number of tickets requested by different clients

Recurrent models

Social network models

  • Network: Predicts missing salaries and new email connections from a company's email network

Setup & Usage

Python 3.10+ required

  1. Clone the repository using git:

    git clone https://github.com/angelmtenor/data-science-keras.git

  2. Enter the root path of the repo and use or create a new conda environment for development:

    conda create -n dev python=3.10 -y && conda activate dev

  3. Install the minimal package developed as a helper for this repo:

    pip install dist/ds_boost-0.1.0-py3-none-any.whl

  4. Open the desired project/s with Jupyter Notebook:

    cd data-science-keras
    jupyter notebook

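Before opening the notebooks, it can help to confirm that the active environment meets the requirements listed above. The following is a minimal sketch, not part of the repo: `check_environment` is a hypothetical helper, and it assumes the helper package is importable under the name `ds_boost` (the same as its distribution name).

```python
import importlib.util
import sys

# Minimum interpreter version, taken from "Python 3.10+ required".
MIN_PYTHON = (3, 10)


def check_environment(package: str = "ds_boost") -> dict:
    """Report whether the interpreter and the helper package look ready.

    Hypothetical helper for illustration; the import name "ds_boost"
    is an assumption based on the package name used in this README.
    """
    return {
        # True when the running interpreter is at least Python 3.10.
        "python_ok": sys.version_info >= MIN_PYTHON,
        # True when the package can be located without importing it.
        "package_installed": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    status = check_environment()
    print(f"Python version OK: {status['python_ok']}")
    print(f"ds_boost installed: {status['package_installed']}")
```

Run it inside the activated conda environment; if `package_installed` is reported as false, repeat the pip install step.
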
Development Mode

In the root folder of the cloned repository, install all the required dev packages and the ds-boost mini package (Make required):

make setup

To install tensorflow with GPU support, follow the instructions of this guide: Install TensorFlow GPU.
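
After following the GPU guide, a quick check along these lines may confirm whether TensorFlow can see a GPU. This is only a sketch: `list_gpus` is a hypothetical helper that is not part of ds_boost, and it returns `None` when TensorFlow is not installed rather than raising.

```python
def list_gpus():
    """Return the names of GPUs visible to TensorFlow.

    Hypothetical helper for illustration. Returns None when TensorFlow
    is not installed, and an empty list when no GPU is detected.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # list_physical_devices("GPU") returns one entry per visible GPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("GPUs visible to TensorFlow:", gpus)
    else:
        print("TensorFlow is installed but no GPU was detected")
```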

QA (manual pre-commit):

make qa

Development Tools Required:

A container or machine with Conda, Git and Poetry, set up as closely as possible to the definition in .devcontainer/Dockerfile:

  • This Dockerfile contains a non-root user, so the same configuration can be applied to a WSL Ubuntu machine and any Debian/Ubuntu cloud machine (Vertex AI workbench, Azure VM ...)
  • If you have an Ubuntu/Debian machine with a non-root user (e.g. Ubuntu in WSL, Vertex AI VM ...), just install the tools from the "non-root user (no sudo)" section of .devcontainer/Dockerfile (sudo apt-get install <software> may be required)
  • A pre-configured cloud VM usually has Git and Conda pre-installed, so those steps can be skipped
  • The development container defined in .devcontainer/Dockerfile can be directly used for a fast setup (Docker required). With Visual Studio Code, just open the root folder of this repo, press F1 and select the option Dev Containers: Open Workspace in Container. The container will open the same workspace after the Docker Image is built.

Contributing

Check out the contributing guidelines

License

ds_boost was created by Angel Martinez-Tenor. It is licensed under the terms of the MIT license.

Credits

ds_boost was created from a Data Science Template developed by Angel Martinez-Tenor. The template was built upon the py-pkgs-cookiecutter template (https://github.com/py-pkgs/py-pkgs-cookiecutter).

