
A data analysis and visualization helper module.



Data Utilities

Data utilities library focused on machine learning and data analysis.

The library builds on Python's scientific/numeric stack and extends its capabilities. The dependencies are:

  • numpy
  • scipy
  • pandas
  • matplotlib
  • seaborn
  • scikit-learn

Optional dependencies are:

  • XGBoost
  • deap

Highlights are:

  • matplotlib_utilities: off-the-shelf data description with histogram_of_dataframe.
  • pandas_utilities: easier dataframe preparation with rename_columns_to_lower, categorical_serie_to_binary_dataframe, balance_ndframe and get_numeric_columns.
  • sklearn_utilities: multiprocessing and persistence support for hyperparameter grid search, both exhaustive and based on a genetic algorithm; convenience functions for the XGBoost module.

And much more.
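
To make the pandas_utilities highlights concrete, here is a minimal sketch in plain pandas of the kind of preparation those helper names suggest. It does not call the library itself: the signatures of rename_columns_to_lower, get_numeric_columns and categorical_serie_to_binary_dataframe are not shown here, so the equivalents below are assumptions based on the names alone, and the sample data is made up. balance_ndframe is left out because its behavior is not described.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "Age": [23, 35, 41, 29],
    "Income": [40.0, 52.5, 61.0, 48.0],
    "City": ["Lima", "Oslo", "Lima", "Rome"],
})

# Assumed equivalent of rename_columns_to_lower: lower-case every column name.
df.columns = [str(name).lower() for name in df.columns]

# Assumed equivalent of get_numeric_columns: names of the numeric columns.
numeric_columns = df.select_dtypes(include="number").columns.tolist()

# Assumed equivalent of categorical_serie_to_binary_dataframe:
# one indicator column per category of a categorical series.
city_binary = pd.get_dummies(df["city"], prefix="city")
```

The library's own helpers may differ in arguments and return types; check their docstrings before relying on this mapping.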

Organization and files

./data_utilities
├── __init__.py
├── matplotlib_utilities.py
├── pandas_utilities.py
├── python_utilities.py
├── sklearn_utilities
│   ├── evolutionary_grid_search.py
│   ├── grid_search.py
│   └── __init__.py
└── tests
    ├── __init__.py
    ├── test_matplotlib_utilities.py
    ├── test_pandas_utilities.py
    ├── test_python_utilities.py
    ├── test_sklearn_utilities.py
    └── test_support.py

Each of Python's major data modules has its own set of utility functions. Functions that rely on optional dependencies are interspersed throughout the code.

This module does not intend to create its own API or standards. Instead, each utility module should follow the guidelines and APIs of its parent module.

Note: This is an early-stage project. Expect backwards-incompatible changes as I figure out the best way to develop the utilities.

What's new

Check our changelog.

Development guidelines

  • Coding style: PEP 8 compliant.
  • Docstrings: Google style.
  • Before committing a new version, run the tests against several Python 3 versions:
    • python3.4
    • python3.5
    • python3.6
    • (newer versions)
    • Rationale: although behavior is expected to be stable across Python versions, some changes do occur. For instance, commit v1.2.8 (60573d7) raised an unexpected import error on python3.4 but not on python3.6.
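
The multi-version check above could be scripted roughly as follows. This is only a sketch under stated assumptions: the project does not say which test runner it uses, so the unittest discovery command, the build_test_commands helper and the interpreter names on the PATH are all hypothetical. The docstring also illustrates the Google docstring style mentioned in the guidelines.

```python
import shutil


def build_test_commands(interpreters, test_dir="data_utilities/tests",
                        which=shutil.which):
    """Build one unittest command per interpreter found on this machine.

    Args:
        interpreters: Executable names to look up, e.g. ["python3.5"].
        test_dir: Directory holding the tests (assumed layout).
        which: Lookup function; replaceable for testing.

    Returns:
        A list of argument lists, one per interpreter that was found.
    """
    commands = []
    for name in interpreters:
        path = which(name)
        if path is None:
            # Interpreter not installed here; skip it rather than fail.
            continue
        commands.append(
            [path, "-m", "unittest", "discover", "-s", test_dir, "-t", "."]
        )
    return commands


if __name__ == "__main__":
    import subprocess

    versions = ["python3.4", "python3.5", "python3.6"]
    for command in build_test_commands(versions):
        # check=True stops at the first interpreter whose tests fail.
        subprocess.run(command, check=True)
```

Run it from the repository root so that the data_utilities package is importable. A tool such as tox serves the same purpose if the project prefers a configuration file over a script.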

