Package for practical and efficient data science in Python. Initially written for the data-science-keras repo.
Project description
Data science projects with Keras (Poetry Version)
Author: Angel Martinez-Tenor
Repository: GitHub link
This repo contains a set of data science projects solved with artificial neural networks implemented in Keras. It is based on a set of use cases from Udacity, Coursera and Kaggle.
The repo also introduces a minimal package, ds_boost, initially implemented as a helper for this repo.
Disclaimer: This notebook-based repo was developed in early 2018. Since July 2022, I have been updating it with the best practices I have learned implementing solutions in production environments as a lead data scientist.
A non-Poetry version of this repo is available in the no-poetry branch.
Scenarios
Classification models
- Enron Scandal: Identifies Enron employees who may have committed fraud
- Property Maintenance Fines: Predicts the probability that a set of blight tickets will be paid on time
- Sentiment IMDB: Predicts positive or negative sentiment from movie reviews (NLP)
- Spam detector: Predicts the probability that a given email is spam (NLP)
- Student Admissions: Predicts student admissions to graduate school at UCLA
- Titanic: Predicts survival probabilities from the sinking of the RMS Titanic
Regression models
- Bike Rental: Predicts daily bike rental ridership
- House Prices: Predicts house sale prices from the Ames Housing database
- Simple tickets: Predicts the number of tickets requested by different clients
Recurrent models
- Machine Translation: Translates sentences from English to French (NLP)
- Simple Stock Prediction: Predicts Alphabet Inc. stock price
- Text generator: Creates an English language sequence generator (NLP)
Social network models
- Network: Predicts missing salaries and new email connections from a company's email network
Setup & Usage
Python 3.10+ required
- Clone the repository using git:
  git clone https://github.com/angelmtenor/data-science-keras.git
- Go to the root folder of the repo and use an existing conda environment or create a new one for development:
  $ conda create -n dev python=3.10 -y && conda activate dev
- Install the minimal package developed as a helper for this repo:
  pip install dist/ds_boost-0.1.0-py3-none-any.whl
- Open the desired project(s) with Jupyter Notebook:
  cd data-science-keras
  jupyter notebook
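After these steps, a quick sanity check can confirm that the active interpreter meets the Python 3.10+ requirement and that the helper wheel is installed. The sketch below is a minimal illustration and is not part of ds_boost; it assumes the installed distribution is registered under the name "ds_boost" (taken from the wheel filename), and the helper names installed_version and check_environment are hypothetical.

```python
"""Environment sanity check for the setup steps (hypothetical helper, not part of ds_boost)."""
import sys
from importlib import metadata
from typing import Optional

# Minimum interpreter version stated in the setup instructions
MIN_PYTHON = (3, 10)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(dist_name: str = "ds_boost") -> dict:
    """Report the interpreter version and whether the helper package is installed."""
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info >= MIN_PYTHON,
        "package_version": installed_version(dist_name),
    }
```

Running print(check_environment()) inside the activated conda environment should show python_ok as True and a version string for package_version; None there means the pip install step did not complete in that environment.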
Development Mode
In the root folder of the cloned repository, install all the required dev packages and the ds_boost mini package (make required):
make setup
To install TensorFlow with GPU support, follow the instructions in this guide: Install TensorFlow GPU.
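Once TensorFlow is installed following that guide, it is worth checking whether it actually detects a GPU. A minimal sketch, assuming TensorFlow 2.x: tf.config.list_physical_devices is TensorFlow's device-listing API, while the wrapper name visible_gpus is hypothetical and simply returns an empty list when TensorFlow is not installed.

```python
from typing import List


def visible_gpus() -> List[str]:
    """Return the names of GPUs visible to TensorFlow (empty if none or not installed)."""
    try:
        import tensorflow as tf  # imported lazily so the check also runs without TensorFlow
    except ImportError:
        return []
    # Each entry is a PhysicalDevice with .name and .device_type attributes
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print(f"GPUs visible to TensorFlow: {gpus or 'none (CPU only)'}")
```

An empty result on a machine with a GPU usually points to a driver or CUDA/cuDNN mismatch rather than a problem in the notebooks.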
QA (manual pre-commit):
make qa
Development Tools Required:
A container or machine with Conda, Git and Poetry, configured as closely as possible to .devcontainer/Dockerfile:
- This Dockerfile contains a non-root user, so the same configuration can be applied to a WSL Ubuntu machine and any Debian/Ubuntu cloud machine (Vertex AI Workbench, Azure VM ...).
- On an Ubuntu/Debian machine with a non-root user (e.g. Ubuntu in WSL, Vertex AI VM ...), just install the tools listed in the "non-root user (no sudo)" section of .devcontainer/Dockerfile (sudo apt-get install <software> may be required).
- A pre-configured cloud VM usually has Git and Conda pre-installed, so those steps can be skipped.
- The development container defined in .devcontainer/Dockerfile can be used directly for a fast setup (Docker required). With Visual Studio Code, just open the root folder of this repo, press F1 and select the option Dev Containers: Open Workspace in Container. The container will open the same workspace after the Docker image is built.
Contributing
Check out the contributing guidelines
License
ds_boost was created by Angel Martinez-Tenor. It is licensed under the terms of the MIT license.
Credits
ds_boost was created from a Data Science Template developed by Angel Martinez-Tenor. The template was built upon the py-pkgs-cookiecutter template (https://github.com/py-pkgs/py-pkgs-cookiecutter).
File details
Details for the file ds_boost-0.1.1.tar.gz
File metadata
- Download URL: ds_boost-0.1.1.tar.gz
- Size: 22.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.10.4 Linux/5.15.0-57-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 36c8c9edc7121416871ce38e4a62b84cd28eb08c9a08c41704041b0193cc99e4
MD5 | 7c61d69d0fdba9d5f9b1d3f7c8c4175a
BLAKE2b-256 | 5bf58daa47f3abbc89f6fb124b8a52504d69ae0fcf122092a50e8d6737a13962
File details
Details for the file ds_boost-0.1.1-py3-none-any.whl
File metadata
- Download URL: ds_boost-0.1.1-py3-none-any.whl
- Size: 19.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.10.4 Linux/5.15.0-57-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | cf65e99a8112a6ccbb0824dd6b049927c524d47160f583df3e52aa77bf7efefb
MD5 | c38b9f9e2d2f7136d3a0b7d037cd46b6
BLAKE2b-256 | 2b53e1a287075e2cdc9ab5dd83a7d19212c5eb599e9b2747484b9a9f2035efc4
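The published digests can be used to check a downloaded file before installing it. A minimal sketch using Python's standard hashlib module, assuming the wheel has been downloaded locally; the helper names file_sha256 and verify_download are hypothetical, and the expected value is the SHA256 digest published for ds_boost-0.1.1-py3-none-any.whl.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for ds_boost-0.1.1-py3-none-any.whl
EXPECTED_SHA256 = "cf65e99a8112a6ccbb0824dd6b049927c524d47160f583df3e52aa77bf7efefb"


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected one (case-insensitive)."""
    return file_sha256(path) == expected.lower()
```

For example, verify_download(Path("ds_boost-0.1.1-py3-none-any.whl")) should return True for an intact download. pip can enforce the same check itself in hash-checking mode, using a requirements line such as ds_boost==0.1.1 --hash=sha256:<digest>.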