
Transform Dataframe for Machine Learning




A lightweight, easy-to-use Python package that transforms dataframes into a machine-learning-friendly data format.

Dataframe libraries such as Pandas and PySpark are widely used to manipulate tabular data, and they provide rich functionality and optimizations for data processing. After processing, however, the data usually has to be fed into machine learning or deep learning models built with other ML packages. At this step, users spend time transforming dataframes into arrays or tensors, splitting the data into several sets, mapping categorical data to integers, and even representing text data as vectors. To make the whole process more efficient, TDML bridges dataframes and ML frameworks by addressing these pain points. Currently, TDML provides the following functionality:

  • Automatically transform dataframes (Pandas or PySpark) into arrays or tensors for ML frameworks (NumPy, PyTorch or TensorFlow).
  • Map categorical data to integers, represent text data as bag-of-words vectors, and support user-defined functions (UDFs) for text transformation.
  • Split the transformed data into several sets (train-test or train-validation-test) with one line of code.
  • Reshuffle the train set after splitting.
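To make the features above concrete, here is a minimal sketch of the manual pandas/NumPy work that TDML is meant to replace. It is not TDML's API: the helper names `encode_categorical`, `bag_of_words`, `split` and `reshuffle`, and the toy columns, are hypothetical and chosen only for illustration. The sketch assumes pandas and NumPy are installed.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


# Hypothetical helper (not part of TDML): map each distinct category to an
# integer, numbered in order of first appearance.
def encode_categorical(series: pd.Series) -> tuple[np.ndarray, dict]:
    codes, uniques = pd.factorize(series)
    return codes.astype(np.int64), {cat: i for i, cat in enumerate(uniques)}


# Hypothetical helper: bag-of-words counts. `tokenize` stands in for a
# user-defined text transformation (UDF).
def bag_of_words(texts: pd.Series, tokenize=str.split) -> tuple[np.ndarray, dict]:
    tokenized = [tokenize(t) for t in texts]
    vocab: dict[str, int] = {}
    for tokens in tokenized:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    counts = np.zeros((len(tokenized), len(vocab)), dtype=np.float32)
    for row, tokens in enumerate(tokenized):
        for tok in tokens:
            counts[row, vocab[tok]] += 1
    return counts, vocab


# Hypothetical helper: shuffle once, then cut the rows into consecutive
# chunks, e.g. (0.8, 0.2) for train-test or (0.6, 0.2, 0.2) for
# train-validation-test.
def split(x: np.ndarray, y: np.ndarray, fractions=(0.8, 0.2), seed=0):
    idx = np.random.default_rng(seed).permutation(len(x))
    bounds = np.cumsum([int(round(f * len(x))) for f in fractions[:-1]])
    return [(x[part], y[part]) for part in np.split(idx, bounds)]


# Hypothetical helper: reshuffle features and labels with the same permutation.
def reshuffle(x: np.ndarray, y: np.ndarray, seed=None):
    perm = np.random.default_rng(seed).permutation(len(x))
    return x[perm], y[perm]


df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue"],
    "review": ["good value", "too slow", "good good", "ok", "slow value"],
    "price": [1.0, 2.5, 1.5, 3.0, 2.0],
    "label": [1, 0, 1, 0, 0],
})

color_ids, color_map = encode_categorical(df["color"])
text_bow, vocab = bag_of_words(df["review"], tokenize=lambda s: s.lower().split())

# Stack numeric, categorical and text features into one NumPy matrix.
x = np.column_stack([df["price"].to_numpy(dtype=np.float32), color_ids, text_bow])
y = df["label"].to_numpy()

(train_x, train_y), (test_x, test_y) = split(x, y, fractions=(0.6, 0.4))
train_x, train_y = reshuffle(train_x, train_y, seed=1)
```

The resulting NumPy arrays could then be passed to PyTorch with `torch.from_numpy` or to TensorFlow with `tf.convert_to_tensor`. According to the feature list, TDML handles these steps for you.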

Examples

cd examples/numpy
python simple_sklearn_regression.py
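The contents of `simple_sklearn_regression.py` are not reproduced here. As a rough, hypothetical illustration of the kind of workflow such an example covers, the sketch below converts a pandas DataFrame into NumPy arrays, holds out a shuffled test set, and fits an ordinary least-squares regression. It uses NumPy's `np.linalg.lstsq` instead of scikit-learn so that it needs no extra dependency. The synthetic data and column names are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): y = 3*x1 - 2*x2 + 1 plus small noise.
rng = np.random.default_rng(42)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 3 * df["x1"] - 2 * df["x2"] + 1 + rng.normal(scale=0.01, size=200)

# Dataframe -> arrays: the conversion step TDML aims to automate.
features = df[["x1", "x2"]].to_numpy(dtype=np.float64)
target = df["y"].to_numpy(dtype=np.float64)

# Shuffled 80/20 train-test split.
idx = rng.permutation(len(df))
cut = int(0.8 * len(df))
train_idx, test_idx = idx[:cut], idx[cut:]


def add_bias(a: np.ndarray) -> np.ndarray:
    # Append a column of ones so the last coefficient is the intercept.
    return np.column_stack([a, np.ones(len(a))])


# Ordinary least squares on the train set, evaluated on the test set.
coef, *_ = np.linalg.lstsq(add_bias(features[train_idx]), target[train_idx], rcond=None)
pred = add_bias(features[test_idx]) @ coef
mse = float(np.mean((pred - target[test_idx]) ** 2))
print("coefficients:", coef.round(2), "test MSE:", mse)
```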

For more examples, please go to the examples folder.

Tests

Please refer to the tests.

Contact

zecheng@cs.stanford.edu



Download files


Source Distribution

tdml-0.1.1.tar.gz (8.9 kB)


Built Distribution

tdml-0.1.1-py3-none-any.whl (12.2 kB)

