Transform Dataframe for Machine Learning

A lightweight and easy-to-use Python package that transforms dataframes into machine-learning-friendly data formats.

Dataframe libraries such as Pandas and PySpark are widely used to manipulate tabular data, and they provide rich functionality and optimizations for data processing. After processing, however, the data is usually fed to machine learning or deep learning models built with other ML packages. At that step, users spend time transforming dataframes into arrays or tensors, splitting data into several sets, mapping categorical data to integers, and even representing text data as vectors. TDML bridges dataframes and ML frameworks by handling these steps. It currently provides the following functions:

  • Automatically transform a dataframe (Pandas or PySpark) into arrays or tensors for an ML framework (NumPy, PyTorch or TensorFlow).
  • Map categorical data to integers, represent text data as bag-of-words vectors, and support user-defined functions (UDFs) for text transformation.
  • Split transformed data into several sets (train-test or train-validation-test) with one line of code.
  • Support reshuffling the training set after splitting.
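
For orientation, the sketch below shows roughly what the steps listed above look like when done by hand with Pandas and NumPy. It does not use TDML's API, which is not documented on this page. The toy dataframe, the column names, and the helper `train_val_test_split` are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Toy dataframe (made up for this sketch): numeric, categorical, and text columns.
df = pd.DataFrame({
    "age": [23, 35, 41, 29, 52, 38],
    "city": ["NYC", "SF", "NYC", "LA", "SF", "LA"],
    "review": ["good food", "bad service", "good service",
               "bad food", "good good food", "service"],
    "label": [1, 0, 1, 0, 1, 0],
})

# 1. Map categorical values to integer codes (in order of first appearance).
city_codes, city_vocab = pd.factorize(df["city"])

# 2. Represent text as bag-of-words counts over a sorted vocabulary.
tokens = df["review"].str.split()
vocab = sorted({word for doc in tokens for word in doc})
word_index = {word: i for i, word in enumerate(vocab)}
bow = np.zeros((len(df), len(vocab)), dtype=np.int64)
for row, doc in enumerate(tokens):
    for word in doc:
        bow[row, word_index[word]] += 1

# 3. Assemble the dataframe columns into NumPy feature and label arrays.
X = np.column_stack([df["age"].to_numpy(), city_codes, bow])
y = df["label"].to_numpy()


def train_val_test_split(n_rows, n_train, n_val, rng):
    """Hypothetical helper: shuffle row indices and cut them into three sets."""
    perm = rng.permutation(n_rows)
    return np.split(perm, [n_train, n_train + n_val])


# 4. Split into train, validation, and test sets.
rng = np.random.default_rng(0)
train_idx, val_idx, test_idx = train_val_test_split(len(X), 4, 1, rng)
X_train, y_train = X[train_idx], y[train_idx]

# 5. Reshuffle the training set, e.g. at the start of a new epoch.
reshuffled = rng.permutation(train_idx)
X_train, y_train = X[reshuffled], y[reshuffled]
```

The same arrays could be handed to PyTorch or TensorFlow with `torch.from_numpy` or `tf.convert_to_tensor`. A PySpark dataframe would first need to be collected, for example with `toPandas()`.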

Examples

cd examples/numpy
python simple_sklearn_regression.py

For more examples, please go to the examples folder.

Tests

Please refer to the tests.

Contact

zecheng@cs.stanford.edu

Download files

Source Distribution

tdml-0.1.1.tar.gz (8.9 kB view details)

Uploaded Source

Built Distribution

tdml-0.1.1-py3-none-any.whl (12.2 kB view details)

Uploaded Python 3

File details

Details for the file tdml-0.1.1.tar.gz.

File metadata

  • Download URL: tdml-0.1.1.tar.gz
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.6

File hashes

Hashes for tdml-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        731936abf0f77a3d44c5ded5a135f147ef00f772eddfaa14cc2544e2ccc6873b
MD5           38fe0b3b85ac83b2ff262ff7d3a3da97
BLAKE2b-256   28f1d01ad7f981cdf04dd05c3de11672f24b61a0614669cb57a6cf1cefe79c51

File details

Details for the file tdml-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: tdml-0.1.1-py3-none-any.whl
  • Size: 12.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.6

File hashes

Hashes for tdml-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        0914d00ac7b15c6ea6a5afbff757de66ca3fb93e5bd69bbe4556564214ea1bff
MD5           0b465daa90533b072fb6d6536215309b
BLAKE2b-256   ae345de8744f6971c31278d7dbf7c673721ef562beb71df56dffa0eb8577d652
