======================
Data Science Utilities
======================
.. image:: https://img.shields.io/pypi/v/data_science_utilities.svg
        :target: https://pypi.python.org/pypi/data_science_utilities

.. image:: https://img.shields.io/travis/truocphamkhac/data-science-utilities.svg
        :target: https://travis-ci.org/truocphamkhac/data-science-utilities

.. image:: https://readthedocs.org/projects/data-science-utilities/badge/?version=latest
        :target: http://data-science-utilities-python.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

Data science utilities in Python.
* Free software: MIT license
* Documentation: http://data-science-utilities-python.readthedocs.io.
Features
========
Missing Data Statistics
-----------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    # compute the missing-data statistics for DataFrame `df`
    missing_data = data_science_utilities.missing_data_stats(df)

    # display the statistics
    missing_data
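
The internals of ``missing_data_stats`` are not shown in this README. As a rough illustration of what a missing-data statistic can look like, the sketch below counts missing values per column with plain pandas. The helper name ``missing_data_summary`` and the ``total``/``percent`` column names are assumptions made for this example, not the package's API.

```python
import numpy as np
import pandas as pd


def missing_data_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of missing values per column."""
    total = df.isnull().sum()
    percent = total / len(df) * 100
    summary = pd.DataFrame({"total": total, "percent": percent})
    # Columns with the most missing values come first.
    return summary.sort_values("total", ascending=False)


# Small made-up frame: two gaps in "age", one in "city", none in "id".
df = pd.DataFrame({
    "age": [22, np.nan, 35, np.nan],
    "city": ["Hue", "Hanoi", None, "Da Nang"],
    "id": [1, 2, 3, 4],
})
summary = missing_data_summary(df)
print(summary)
```

Sorting by the missing count puts the columns that most need attention at the top of the table.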
Read CSV files from path
------------------------
.. code:: python

    from data_science_utilities import data_science_utilities

    train_path = '../data/raw/train.csv'
    test_path = '../data/raw/test.csv'

    X_train, X_test = data_science_utilities.read_csv_files(train_path, test_path)
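
For readers without the package installed, loading a train/test pair can be approximated with ``pandas.read_csv``. This is only a sketch: ``read_csv_pair`` is a hypothetical stand-in, and the temporary files and their columns are invented so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd


def read_csv_pair(train_path, test_path):
    """Hypothetical stand-in: load two CSV files into DataFrames."""
    return pd.read_csv(train_path), pd.read_csv(test_path)


# Write two tiny CSV files to a temporary directory, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "train.csv")
    test_path = os.path.join(tmp, "test.csv")
    pd.DataFrame({"x": [1, 2, 3], "y": [0, 1, 0]}).to_csv(train_path, index=False)
    pd.DataFrame({"x": [4, 5]}).to_csv(test_path, index=False)

    X_train, X_test = read_csv_pair(train_path, test_path)

print(X_train.shape, X_test.shape)
```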
Plotting a normal distribution
------------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_dist_norm(dist, 'distribution normal')
Plotting correlation matrix
---------------------------
.. code:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_corelation_matrix(data)
Plotting the correlation matrix of the top attributes
-----------------------------------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_top_corelation_matrix(data, target, k=10, cmap='YlGnBu')
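
How the package selects the "top" attributes is not documented here. One plausible reading, sketched below, ranks the numeric columns by their absolute correlation with the target and keeps the ``k`` largest. The helper ``top_correlated_columns`` and the sample data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def top_correlated_columns(data: pd.DataFrame, target: str, k: int = 10) -> list:
    """Hypothetical helper: the k numeric columns most correlated with `target`."""
    # Restrict to numeric columns so corr() does not trip over strings.
    corr = data.select_dtypes("number").corr()
    # The target correlates perfectly with itself, so it is always ranked first.
    return corr[target].abs().nlargest(k).index.tolist()


x = np.arange(10, dtype=float)
wiggle = np.array([1.0, -1.0] * 5)
data = pd.DataFrame({
    "target": x,
    "strong": x + 0.5 * wiggle,   # tracks the target closely
    "weak": wiggle,               # almost unrelated to the target
    "label": list("abcdefghij"),  # non-numeric, ignored
})

top = top_correlated_columns(data, "target", k=2)
top_corr = data[top].corr()
print(top)
print(top_corr)
```

The reduced matrix ``top_corr`` is what a heatmap would then display, for example through ``seaborn.heatmap`` with a ``cmap`` such as ``'YlGnBu'``.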
Plotting attributes with a scatter chart
----------------------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_scatter(data, column_name, target)
Plotting attributes with a box plot
-----------------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_box(data, column_name, target)
Plotting category columns with bar charts
-----------------------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_category_columns(data, limit_bars=10)
Plotting the training and test learning curves
----------------------------------------------

.. code:: python

    import numpy as np

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_learning_curve(estimator, title, X, y, ylim=None,
                                               cv=None, train_sizes=np.linspace(.1, 1.0, 5))
Credits
=======
This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
=======
History
=======
0.2.4 (2018-05-21)
------------------
* Fixed render docs on README.
0.2.3 (2018-05-21)
------------------
* Fixed render docs on https://pypi.org/.
0.2.2 (2018-05-21)
------------------
* Fix render docs (cont'd).
0.2.1 (2018-05-21)
------------------
* Fix render docs.
0.2.0 (2018-05-14)
------------------
* Added visualization utilities.
0.1.0 (2018-05-11)
------------------
* First release on PyPI.