Package for practical and efficient data science in Python, initially written for the data-science-keras repo

Project description

Data science projects with Keras (Poetry Version)

Code style: black

Author: Angel Martinez-Tenor

Repository: https://github.com/angelmtenor/data-science-keras

This repo contains a set of data science projects solved with artificial neural networks implemented in Keras. It is based on a set of use cases from Udacity, Coursera & Kaggle

The repo also introduces a minimal package, ds_boost, initially implemented as a helper for this repo

Disclaimer: This notebook-based repo was developed in early 2018. Since July 2022, I have been updating it with the best practices I've learned from implementing solutions in production environments as a lead data scientist

A non-poetry version of this repo is available in the branch no-poetry

Scenarios

Classification models

  • Enron Scandal: Identifies Enron employees who may have committed fraud

  • Property Maintenance Fines: Predicts the probability that a set of blight tickets will be paid on time

  • Sentiment IMDB: Predicts positive or negative sentiment from movie reviews (NLP)

  • Spam detector: Predicts the probability that a given email is spam (NLP)

  • Student Admissions: Predicts student admissions to graduate school at UCLA

  • Titanic: Predicts survival probabilities from the sinking of the RMS Titanic

Regression models

  • Bike Rental: Predicts daily bike rental ridership

  • House Prices: Predicts house sale prices from the Ames Housing database

  • Simple tickets: Predicts the number of tickets requested by different clients

Recurrent models

Social network models

  • Network: Predicts missing salaries and new email connections from a company's email network

Setup & Usage

Python 3.8+ required. Conda environment with Python 3.10 suggested

  1. Clone the repository using git:

    git clone https://github.com/angelmtenor/data-science-keras.git
    
  2. Go to the root folder of the repo and create (or reuse) a conda environment for development:

    conda create -n dev python=3.10 -y && conda activate dev
    
  3. Install the minimal package developed as a helper for this repo:

    pip install dist/ds_boost-0.1.0-py3-none-any.whl
    
  4. Open the desired project/s with Jupyter Notebook

    cd data-science-keras
    jupyter notebook
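
Before opening the notebooks, it can help to confirm that the active environment matches the requirements listed in this section. The snippet below is a minimal sketch, not part of the repo: the function name `check_environment` is hypothetical, and it assumes the helper package is importable under the name `ds_boost` (taken from the wheel file name).

```python
import sys
from importlib import util

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 8)


def check_environment(package="ds_boost"):
    """Report whether the interpreter is recent enough and the package is importable.

    `package` defaults to "ds_boost", which is an assumption about the
    import name based on the wheel file name.
    """
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        # find_spec returns None when no top-level module with this name exists.
        "package_installed": util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

Run it inside the activated conda environment. If `package_installed` is false, the `pip install` step was probably executed in a different environment.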
    

Development Mode

In the root folder of the cloned repository, install all the required dev packages and the ds-boost mini package (Make required):

make setup

To install tensorflow with GPU support, follow the instructions of this guide: Install TensorFlow GPU.

QA (manual pre-commit):

make qa

Development Tools Required:

A container or machine with Conda, Git and Poetry, set up as closely as possible to the configuration defined in .devcontainer/Dockerfile:

  • This Dockerfile contains a non-root user, so the same configuration can be applied to a WSL Ubuntu machine and to any Debian/Ubuntu cloud machine (Vertex AI Workbench, Azure VM ...)
  • If you have an Ubuntu/Debian machine with a non-root user (e.g. Ubuntu in WSL, Vertex AI VM ...), just install the tools listed in the "non-root user (no sudo)" section of .devcontainer/Dockerfile (sudo apt-get install <software> may be required)
  • A pre-configured cloud VM usually has Git and Conda pre-installed; in that case, those steps can be skipped
  • The development container defined in .devcontainer/Dockerfile can be directly used for a fast setup (Docker required). With Visual Studio Code, just open the root folder of this repo, press F1 and select the option Dev Containers: Open Workspace in Container. The container will open the same workspace after the Docker Image is built.
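
The tool requirements above can be checked with a short script before running `make setup`. This is a minimal sketch under stated assumptions: the helper `find_missing` is hypothetical, and the executable names (`conda`, `git`, `poetry`, `make`, `docker`) are the usual command names for these tools and may differ on some systems.

```python
import shutil

# Tools named in this section. Executable names are assumed defaults.
REQUIRED_TOOLS = ("conda", "git", "poetry", "make")
# Docker is only needed when using the development container.
OPTIONAL_TOOLS = ("docker",)


def find_missing(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools found.")
    for tool in find_missing(OPTIONAL_TOOLS):
        print("Optional tool not found:", tool)
```

Note that `shutil.which` only inspects PATH, so a tool that is available purely as a shell function or alias would be reported as missing.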

Contributing

Check out the contributing guidelines

License

ds_boost was created by Angel Martinez-Tenor. It is licensed under the terms of the MIT license.

Credits

ds_boost was created from a Data Science Template developed by Angel Martinez-Tenor. The template was built upon the py-pkgs-cookiecutter template (https://github.com/py-pkgs/py-pkgs-cookiecutter)

