======================
Data Science Utilities
======================
.. image:: https://img.shields.io/pypi/v/data_science_utilities.svg
    :target: https://pypi.python.org/pypi/data_science_utilities

.. image:: https://img.shields.io/travis/truocphamkhac/data_science_utilities.svg
    :target: https://travis-ci.org/truocphamkhac/data_science_utilities

.. image:: https://readthedocs.org/projects/data-science-utilities/badge/?version=latest
    :target: http://data-science-utilities-python.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

Data science utilities in Python.

* Free software: MIT license
* Documentation: http://data-science-utilities-python.readthedocs.io.

Features
--------
* Missing data statistics

.. code-block:: python

    from data_science_utilities import data_science_utilities

    # Compute missing-data statistics for the DataFrame ``df``
    missing_data = data_science_utilities.missing_data_stats(df)
    # Display the statistics
    missing_data

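To make the idea of a missing-data statistic concrete, here is a minimal sketch built directly on ``pandas``. The helper name ``missing_data_summary``, its column names, and the sample data are assumptions for illustration only; the output of ``missing_data_stats`` in this package may be shaped differently.

```python
import numpy as np
import pandas as pd


def missing_data_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of missing values per column."""
    total = df.isnull().sum()
    percent = total / len(df) * 100
    summary = pd.DataFrame({"total": total, "percent": percent})
    # Show the columns with the most missing values first.
    return summary.sort_values("total", ascending=False)


# Made-up sample data with a few gaps.
df = pd.DataFrame({
    "age": [22, np.nan, 35, np.nan],
    "city": ["Hanoi", "Hue", None, "Da Nang"],
    "fare": [7.25, 8.05, 13.0, 9.5],
})
summary = missing_data_summary(df)
print(summary)
```

Sorting by the missing count puts the columns that most need attention at the top of the table.
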
* Read CSV files from path

.. code-block:: python

    from data_science_utilities import data_science_utilities

    train_path = '../data/raw/train.csv'
    test_path = '../data/raw/test.csv'

    X_train, X_test = data_science_utilities.read_csv_files(train_path, test_path)

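As a rough illustration of loading a train/test pair, the sketch below uses ``pandas.read_csv`` on in-memory buffers so that it runs without any files on disk. The function ``read_train_test`` is a hypothetical stand-in, and it assumes both results are DataFrames; ``read_csv_files`` itself may behave differently.

```python
import io

import pandas as pd


def read_train_test(train_source, test_source):
    """Hypothetical stand-in: load the train and test CSV data as DataFrames."""
    return pd.read_csv(train_source), pd.read_csv(test_source)


# In-memory buffers stand in for the train and test CSV files (made-up content).
train_csv = io.StringIO("id,feature,label\n1,0.5,0\n2,1.5,1\n")
test_csv = io.StringIO("id,feature\n3,0.7\n")

X_train, X_test = read_train_test(train_csv, test_csv)
print(X_train.shape, X_test.shape)
```
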
* Plotting a normal distribution

.. code-block:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_dist_norm(dist, 'distribution normal')

* Plotting a correlation matrix

.. code-block:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_corelation_matrix(data)

* Plotting the correlation matrix of the top attributes

.. code-block:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_top_corelation_matrix(data, target, k=10, cmap='YlGnBu')

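One plausible reading of a "top attributes" correlation matrix is the sub-matrix of the ``k`` columns most strongly correlated with the target. The sketch below computes that selection with ``pandas`` and stops before any plotting. The helper ``top_correlated_columns``, the use of absolute correlation, and the synthetic data are assumptions; the selection rule inside ``plot_top_corelation_matrix`` may differ.

```python
import numpy as np
import pandas as pd


def top_correlated_columns(data: pd.DataFrame, target: str, k: int = 10) -> pd.DataFrame:
    """Hypothetical helper: correlation matrix of the k columns most correlated with target."""
    corr = data.select_dtypes(include="number").corr()
    # Rank columns by absolute correlation with the target; the target itself ranks first.
    columns = corr[target].abs().nlargest(k).index
    return corr.loc[columns, columns]


# Synthetic data: one strongly related column, one weakly related, one unrelated.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = pd.DataFrame({
    "target": x,
    "strong": x * 2.0 + rng.normal(scale=0.1, size=200),
    "weak": x + rng.normal(scale=5.0, size=200),
    "noise": rng.normal(size=200),
})

top = top_correlated_columns(data, "target", k=2)
print(top)
```

The resulting square matrix can be handed to any heatmap routine, with a colormap such as ``'YlGnBu'`` as in the call above.
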
* Plotting attributes as a scatter chart

.. code-block:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_scatter(data, column_name, target)

* Plotting attributes as a box plot

.. code-block:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_box(data, column_name, target)

* Plotting categorical columns as bar charts

.. code-block:: python

    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_category_columns(data, limit_bars=10)

* Generate a simple plot of the test and training learning curves

.. code-block:: python

    import numpy as np
    from data_science_utilities import data_science_utilities

    data_science_utilities.plot_learning_curve(estimator, title, X, y, ylim=None,
                                               cv=None, train_sizes=np.linspace(.1, 1.0, 5))

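The numbers behind a learning-curve plot can be computed with scikit-learn's ``learning_curve`` function. The sketch below is not this package's implementation: the synthetic dataset, the ``LogisticRegression`` estimator, and ``cv=5`` are all assumptions chosen so that the example runs on its own. It reuses the ``train_sizes`` grid from the call above and prints the mean scores instead of plotting them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Fit the estimator on growing subsets of the training folds.
train_sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5),
)

# Average over the cross-validation folds to get one curve per score type.
train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)

for n, tr, te in zip(train_sizes, train_mean, test_mean):
    print(f"{n:4d} samples: train={tr:.3f}  cv={te:.3f}")
```

Plotting ``train_mean`` and ``test_mean`` against ``train_sizes`` gives the two curves; a persistent gap between them usually points to overfitting.
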
Credits
-------

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage

=======
History
=======

0.2.2 (2018-05-14)
------------------

* Adds visualization utilities.

0.1.0 (2018-05-11)
------------------

* First release on PyPI.