Project description
:rocket: DataScience :facepunch:
 [mlcrate](https://github.com/mxbi/mlcrate)
data-science-ipython-notebooks
Index
 deep-learning
 scikit-learn
 statistical-inference-scipy
 pandas
 matplotlib
 numpy
 python-data
 kaggle-and-business-analyses
 spark
 mapreduce-python
 amazon web services
 command lines
 misc
 notebook-installation
 credits
 contributing
 contact-info
 license
deep-learning
IPython Notebook(s) demonstrating deep learning functionality.
tensorflow-tutorials
Additional TensorFlow tutorials:
 pkmital/tensorflow_tutorials
 nlintz/TensorFlowTutorials
 alrojo/tensorflowtutorial
 BinRoot/TensorFlowBook
Notebook  Description 

tsfbasics  Learn basic operations in TensorFlow, a library for various kinds of perceptual and language understanding tasks from Google. 
tsflinear  Implement linear regression in TensorFlow. 
tsflogistic  Implement logistic regression in TensorFlow. 
tsfnn  Implement nearest neighbors in TensorFlow. 
tsfalex  Implement AlexNet in TensorFlow. 
tsfcnn  Implement convolutional neural networks in TensorFlow. 
tsfmlp  Implement multilayer perceptrons in TensorFlow. 
tsfrnn  Implement recurrent neural networks in TensorFlow. 
tsfgpu  Learn about basic multi-GPU computation in TensorFlow. 
tsfgviz  Learn about graph visualization in TensorFlow. 
tsflviz  Learn about loss visualization in TensorFlow. 
tensorflow-exercises
Notebook  Description 

tsfnotmnist  Learn simple data curation by creating a pickle with formatted datasets for training, development and testing in TensorFlow. 
tsffullyconnected  Progressively train deeper and more accurate models using logistic regression and neural networks in TensorFlow. 
tsfregularization  Explore regularization techniques by training fully connected networks to classify notMNIST characters in TensorFlow. 
tsfconvolutions  Create convolutional neural networks in TensorFlow. 
tsfword2vec  Train a skip-gram model over Text8 data in TensorFlow. 
tsflstm  Train an LSTM character model over Text8 data in TensorFlow. 
theano-tutorials
Notebook  Description 

theanointro  Intro to Theano, which allows you to define, optimize, and evaluate mathematical expressions involving multidimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation. 
theanoscan  Learn scans, a mechanism to perform loops in a Theano graph. 
theanologistic  Implement logistic regression in Theano. 
theanornn  Implement recurrent neural networks in Theano. 
theanomlp  Implement multilayer perceptrons in Theano. 
keras-tutorials
Notebook  Description 

keras  Keras is an open source neural network library written in Python. It is capable of running on top of either TensorFlow or Theano. 
setup  Learn about the tutorial goals and how to set up your Keras environment. 
introdeeplearningann  Get an intro to deep learning with Keras and Artificial Neural Networks (ANN). 
theano  Learn about Theano by working with weights matrices and gradients. 
kerasotto  Learn about Keras by looking at the Kaggle Otto challenge. 
annmnist  Review a simple implementation of ANN for MNIST using Keras. 
convnets  Learn about Convolutional Neural Networks (CNNs) with Keras. 
convnet1  Recognize handwritten digits from MNIST using Keras, Part 1. 
convnet2  Recognize handwritten digits from MNIST using Keras, Part 2. 
kerasmodels  Use pretrained models such as VGG16, VGG19, ResNet50, and Inception v3 with Keras. 
autoencoders  Learn about Autoencoders with Keras. 
rnnlstm  Learn about Recurrent Neural Networks (RNNs) with Keras. 
lstmsentencegen  Learn about RNNs using Long Short-Term Memory (LSTM) networks with Keras. 
deep-learning-misc
Notebook  Description 

deepdream  Caffe-based computer vision program which uses a convolutional neural network to find and enhance patterns in images. 
scikit-learn
IPython Notebook(s) demonstrating scikit-learn functionality.
Notebook  Description 

intro  Intro notebook to scikit-learn. Scikit-learn provides machine learning tools for classification, regression, clustering, dimensionality reduction, and model selection, built on NumPy and SciPy. 
knn  Implement k-nearest neighbors in scikit-learn. 
linearreg  Implement linear regression in scikit-learn. 
svm  Implement support vector machine classifiers with and without kernels in scikit-learn. 
randomforest  Implement random forest classifiers and regressors in scikit-learn. 
kmeans  Implement k-means clustering in scikit-learn. 
pca  Implement principal component analysis in scikit-learn. 
gmm  Implement Gaussian mixture models in scikit-learn. 
validation  Implement validation and model selection in scikit-learn. 
statistical-inference-scipy
IPython Notebook(s) demonstrating statistical inference with SciPy functionality.
Notebook  Description 

scipy  SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data. 
effectsize  Explore statistics that quantify effect size by analyzing the difference in height between men and women. Uses data from the Behavioral Risk Factor Surveillance System (BRFSS) to estimate the mean and standard deviation of height for adult women and men in the United States. 
sampling  Explore random sampling by analyzing the average weight of men and women in the United States using BRFSS data. 
hypothesis  Explore hypothesis testing by analyzing the differences between first-born babies and others. 
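As a rough illustration of what "effect size" means in the effectsize entry above, the sketch below computes Cohen's d, one common effect-size statistic, using only the standard library. The function name `cohen_d` and the sample values are assumptions made up for this example; they are not taken from the notebook or from BRFSS data, and the notebook may define the pooled variance differently.

```python
import math
import statistics


def cohen_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)


# Made-up heights in centimetres, for illustration only (not BRFSS data).
heights_a = [178.0, 182.0, 175.0, 180.0, 185.0]
heights_b = [163.0, 168.0, 160.0, 165.0, 169.0]

d = cohen_d(heights_a, heights_b)
print(f"Cohen's d = {d:.2f}")
```

Because d is expressed in units of standard deviation, it can be compared across measurements with different scales, which a raw difference in means cannot.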
pandas
IPython Notebook(s) demonstrating pandas functionality.
Notebook  Description 

pandas  Software library written for data manipulation and analysis in Python. Offers data structures and operations for manipulating numerical tables and time series. 
githubdatawrangling  Learn how to load, clean, merge, and feature engineer by analyzing GitHub data from the Viz repo. 
IntroductiontoPandas  Introduction to Pandas. 
IntroducingPandasObjects  Learn about Pandas objects. 
Data Indexing and Selection  Learn about data indexing and selection in Pandas. 
OperationsinPandas  Learn about operating on data in Pandas. 
MissingValues  Learn about handling missing data in Pandas. 
HierarchicalIndexing  Learn about hierarchical indexing in Pandas. 
ConcatAndAppend  Learn about combining datasets: concat and append in Pandas. 
MergeandJoin  Learn about combining datasets: merge and join in Pandas. 
AggregationandGrouping  Learn about aggregation and grouping in Pandas. 
PivotTables  Learn about pivot tables in Pandas. 
WorkingWithStrings  Learn about vectorized string operations in Pandas. 
WorkingwithTimeSeries  Learn about working with time series in Pandas. 
PerformanceEvalandQuery  Learn about high-performance Pandas: eval() and query(). 
matplotlib
IPython Notebook(s) demonstrating matplotlib functionality.
Notebook  Description 

matplotlib  Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. 
matplotlibapplied  Apply matplotlib visualizations to Kaggle competitions for exploratory data analysis. Learn how to create bar plots, histograms, subplot2grid, normalized plots, scatter plots, subplots, and kernel density estimation plots. 
IntroductionToMatplotlib  Introduction to Matplotlib. 
SimpleLinePlots  Learn about simple line plots in Matplotlib. 
SimpleScatterPlots  Learn about simple scatter plots in Matplotlib. 
Errorbars.ipynb  Learn about visualizing errors in Matplotlib. 
DensityandContourPlots  Learn about density and contour plots in Matplotlib. 
HistogramsandBinnings  Learn about histograms, binnings, and density in Matplotlib. 
CustomizingLegends  Learn about customizing plot legends in Matplotlib. 
CustomizingColorbars  Learn about customizing colorbars in Matplotlib. 
MultipleSubplots  Learn about multiple subplots in Matplotlib. 
TextandAnnotation  Learn about text and annotation in Matplotlib. 
CustomizingTicks  Learn about customizing ticks in Matplotlib. 
SettingsandStylesheets  Learn about customizing Matplotlib: configurations and stylesheets. 
ThreeDimensionalPlotting  Learn about three-dimensional plotting in Matplotlib. 
GeographicDataWithBasemap  Learn about geographic data with basemap in Matplotlib. 
VisualizationWithSeaborn  Learn about visualization with Seaborn. 
numpy
IPython Notebook(s) demonstrating NumPy functionality.
Notebook  Description 

numpy  Adds Python support for large, multidimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays. 
IntroductiontoNumPy  Introduction to NumPy. 
UnderstandingDataTypes  Learn about data types in Python. 
TheBasicsOfNumPyArrays  Learn about the basics of NumPy arrays. 
Computationonarraysufuncs  Learn about computations on NumPy arrays: universal functions. 
Computationonarraysaggregates  Learn about aggregations: min, max, and everything in between in NumPy. 
Computationonarraysbroadcasting  Learn about computation on arrays: broadcasting in NumPy. 
BooleanArraysandMasks  Learn about comparisons, masks, and boolean logic in NumPy. 
FancyIndexing  Learn about fancy indexing in NumPy. 
Sorting  Learn about sorting arrays in NumPy. 
StructuredDataNumPy  Learn about structured data: NumPy's structured arrays. 
python-data
IPython Notebook(s) demonstrating Python functionality geared towards data analysis.
Notebook  Description 

data structures  Learn Python basics with tuples, lists, dicts, sets. 
data structure utilities  Learn Python operations such as slice, range, xrange, bisect, sort, sorted, reversed, enumerate, zip, list comprehensions. 
functions  Learn about more advanced Python features: functions as objects, lambda functions, closures, *args, **kwargs, currying, generators, generator expressions, itertools. 
datetime  Learn how to work with Python dates and times: datetime, strftime, strptime, timedelta. 
logging  Learn about Python logging with RotatingFileHandler and TimedRotatingFileHandler. 
pdb  Learn how to debug in Python with the interactive source code debugger. 
unit tests  Learn how to test in Python with Nose unit tests. 
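For a quick feel of the utilities listed above, here is a minimal standard-library sketch touching bisect, sorted, enumerate, zip, a list comprehension, and datetime parsing and formatting. It is written for Python 3 (so xrange is omitted), and all variable names and sample values are assumptions made up for illustration rather than excerpts from the notebooks.

```python
import bisect
from datetime import datetime, timedelta

# bisect works on a list that is already sorted.
scores = [70, 85, 92]
bisect.insort(scores, 88)                  # insert while keeping sort order
position = bisect.bisect_left(scores, 90)  # index where 90 would be inserted

# zip pairs items up, sorted orders the pairs by age, enumerate numbers them.
names = ["ada", "grace", "alan"]
ages = [36, 45, 41]
by_age = sorted(zip(names, ages), key=lambda pair: pair[1])
ranked = [(rank, name) for rank, (name, _age) in enumerate(by_age, start=1)]

# strptime parses text into a datetime; timedelta shifts it; strftime formats it.
parsed = datetime.strptime("2015-03-01 09:30", "%Y-%m-%d %H:%M")
next_week = (parsed + timedelta(days=7)).strftime("%Y-%m-%d")

print(scores, position, ranked, next_week)
```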
kaggle-and-business-analyses
IPython Notebook(s) used in Kaggle competitions and business analyses.
Notebook  Description 

titanic  Predict survival on the Titanic. Learn data cleaning, exploratory data analysis, and machine learning. 
churnanalysis  Predict customer churn. Exercise logistic regression, gradient boosting classifiers, support vector machines, random forests, and k-nearest neighbors. Includes discussions of confusion matrices, ROC plots, feature importances, prediction probabilities, and calibration/discrimination. 
spark
IPython Notebook(s) demonstrating Spark and HDFS functionality.
Notebook  Description 

spark  In-memory cluster computing framework that can be up to 100 times faster for certain applications and is well suited to machine learning algorithms. 
hdfs  Reliably stores very large files across machines in a large cluster. 
mapreduce-python
IPython Notebook(s) demonstrating Hadoop MapReduce with mrjob functionality.
Notebook  Description 

mapreducepython  Runs MapReduce jobs in Python, executing jobs locally or on Hadoop clusters. Demonstrates Hadoop Streaming in Python code with a unit test and an mrjob config file to analyze Amazon S3 bucket logs on Elastic MapReduce. Disco is another Python-based alternative. 
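To make the map, shuffle, and reduce phases concrete, the sketch below runs a tiny word count in plain Python. It deliberately does not use mrjob or Hadoop Streaming; the function names `mapper`, `reducer`, and `run_job` and the sample lines are assumptions for illustration only, and a real job would let the framework handle the grouping step.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: combine all counts emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Run map, shuffle, and reduce in-process on an iterable of lines."""
    # Shuffle phase: group mapper output by key, as a framework would.
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


result = run_job(["GET index GET", "PUT index"])
print(result)
```

Keeping the mapper and reducer as small pure functions is what makes them easy to unit test before submitting a job to a cluster.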
aws
IPython Notebook(s) demonstrating Amazon Web Services (AWS) and AWS tools functionality.
Also check out:
 SAWS: A Supercharged AWS command line interface (CLI).
 Awesome AWS: A curated list of libraries, open source repos, guides, blogs, and other resources.
Notebook  Description 

boto  Official AWS SDK for Python. 
s3cmd  Interacts with S3 through the command line. 
s3distcp  Combines smaller files and aggregates them together by taking in a pattern and target file. S3DistCp can also be used to transfer large volumes of data from S3 to your Hadoop cluster. 
s3parallelput  Uploads multiple files to S3 in parallel. 
redshift  Acts as a fast data warehouse built on massively parallel processing (MPP) technology. 
kinesis  Streams data in real time with the ability to process thousands of data streams per second. 
lambda  Runs code in response to events, automatically managing compute resources. 
commands
IPython Notebook(s) demonstrating various command lines for Linux, Git, etc.
Notebook  Description 

linux  Unix-like and mostly POSIX-compliant computer operating system. Covers disk usage, splitting files, grep, sed, curl, viewing running processes, terminal syntax highlighting, and Vim. 
anaconda  Distribution of the Python programming language for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment. 
ipython notebook  Web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document. 
git  Distributed revision control system with an emphasis on speed, data integrity, and support for distributed, nonlinear workflows. 
ruby  Used to interact with the AWS command line and for Jekyll, a blog framework that can be hosted on GitHub Pages. 
jekyll  Simple, blog-aware, static site generator for personal, project, or organization sites. Renders Markdown or Textile and Liquid templates, and produces a complete, static website ready to be served by Apache HTTP Server, Nginx or another web server. 
pelican  Python-based alternative to Jekyll. 
django  High-level Python Web framework that encourages rapid development and clean, pragmatic design. It can be useful to share reports/analyses and for blogging. Lighter-weight alternatives include Pyramid, Flask, Tornado, and Bottle. 
misc
IPython Notebook(s) demonstrating miscellaneous functionality.
Notebook  Description 

regex  Regular expression cheat sheet useful in data wrangling. 
algorithmia  Algorithmia is a marketplace for algorithms. This notebook showcases 4 different algorithms: Face Detection, Content Summarizer, Latent Dirichlet Allocation and Optical Character Recognition. 
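As a small taste of the kind of patterns a regex cheat sheet supports during data wrangling, the sketch below uses Python's standard `re` module to extract email addresses, normalize whitespace, and pull a date apart with named groups. The sample strings and the deliberately simple email pattern are assumptions for illustration; they are not taken from the regex notebook, and the pattern is not a full address validator.

```python
import re

raw = "Contact: Ada <ada@example.com>, phone 555-0100; Alan <alan@example.org>"

# Simplified email pattern: local part, "@", then dot-separated domain labels.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
emails = EMAIL.findall(raw)

# Collapse any run of whitespace into one space, then trim the ends.
cleaned = re.sub(r"\s+", " ", "  too   many\tspaces \n").strip()

# Named groups make the extracted fields self-describing.
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
                  "exported 2015-03-01.csv")
parts = match.groupdict()

print(emails, cleaned, parts)
```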
notebook-installation
anaconda
Anaconda is a free distribution of the Python programming language for large-scale data processing, predictive analytics, and scientific computing that aims to simplify package management and deployment.
Follow the instructions to install Anaconda or the more lightweight Miniconda.
dev-setup
For detailed instructions, scripts, and tools to set up your development environment for data analysis, check out the dev-setup repo.
running-notebooks
To view interactive content or to modify elements within the IPython notebooks, you must first clone or download the repository, then run the notebook. More information is available in the IPython Notebook documentation.
$ git clone https://github.com/donnemartin/data-science-ipython-notebooks.git
$ cd data-science-ipython-notebooks
$ jupyter notebook
Notebooks tested with Python 2.7.x.
credits
 Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney
 PyCon 2015 Scikit-learn Tutorial by Jake VanderPlas
 Python Data Science Handbook by Jake VanderPlas
 Parallel Machine Learning with scikit-learn and IPython by Olivier Grisel
 Statistical Inference Using Computational Methods in Python by Allen Downey
 TensorFlow Examples by Aymeric Damien
 TensorFlow Tutorials by Parag K Mital
 TensorFlow Tutorials by Nathan Lintz
 TensorFlow Tutorials by Alexander R Johansen
 TensorFlow Book by Nishant Shukla
 Summer School 2015 by mila-udem
 Keras tutorials by Valerio Maggio
 Kaggle
 Yhat Blog
contributing
Contributions are welcome! For bug reports or requests, please submit an issue.
contact-info
Feel free to contact me to discuss any issues, questions, or comments.
 Email: donne.martin@gmail.com
 Twitter: @donne_martin
 GitHub: donnemartin
 LinkedIn: donnemartin
 Website: donnemartin.com
license
This repository contains a variety of content, some developed by Donne Martin and some from third parties. The third-party content is distributed under the license provided by those parties.
The content developed by Donne Martin is distributed under the following license:
I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer (Facebook).
Copyright 2015 Donne Martin
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Download files
Filename, size  File type  Python version 

DeepTricks-2019.9.10.23.5.27-py3-none-any.whl (16.7 kB)  Wheel  py3 
DeepTricks-2019.9.10.23.5.27.tar.gz (30.4 kB)  Source  None 
Hashes for DeepTricks-2019.9.10.23.5.27-py3-none-any.whl
Algorithm  Hash digest 

SHA256  6e8b49cafe37454d219500f1b0fca6af1ce7d08825f1b403099900586e18c6cb 
MD5  b44a344e33b2e956d464e74221213d90 
BLAKE2-256  7affeb40dd2bd57fc908828dfce97aa499ba5c25e1696e8ab9539e2ca4d564b4 
Hashes for DeepTricks-2019.9.10.23.5.27.tar.gz
Algorithm  Hash digest 

SHA256  9d007545d236771e3533c2ab96225e7f60720e044c416a80d956b550a08fdeec 
MD5  e6be6fd3fdf3388c38ee4b3b322b9f4a 
BLAKE2-256  c701c8954f2444c8ead7dc0e3f6ac7fdf29150df07f5dfb30457c459077e8931 