Transform Dataframe for Machine Learning
A lightweight, easy-to-use Python package that transforms dataframes into a machine-learning-friendly data format.
Dataframe libraries such as Pandas and PySpark are widely used to manipulate tabular data, and they provide rich functionality and optimizations for data processing. After processing, however, the data is usually fed into machine learning or deep learning models built with other packages. At this step, users must spend time transforming dataframes into arrays or tensors, splitting the data into several sets, mapping categorical data to integers, and even representing text data as vectors. To make the whole process more efficient, TDML bridges dataframes and ML frameworks by addressing these pain points. Currently, TDML provides the following functions:
- Automatically transform a dataframe (Pandas or PySpark) into arrays or tensors for an ML framework (NumPy, PyTorch, or TensorFlow).
- Map categorical data to integers, represent text data as bag-of-words vectors, and support user-defined functions (UDFs) for text transformation.
- Split the transformed data into several sets (train-test or train-validation-test) with one line of code.
- Support reshuffling the train set after splitting.
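TDML's own API is not documented on this page, so the sketch below does not use it. It is a minimal, standard-library-only illustration of the kind of steps listed above, assuming a toy dataframe represented as a list of row dicts; `ROWS`, `encode_categorical`, `bag_of_words`, and `split` are hypothetical names chosen for illustration. It maps a categorical column to integers, turns a text column into bag-of-words counts through a pluggable tokenizer (standing in for a text UDF), assembles a feature matrix, splits it into train, validation, and test sets, and reshuffles the train set.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy stand-in for a dataframe: one dict per row.
ROWS = [
    {"city": "Paris", "review": "Great food great view", "price": 3.0},
    {"city": "Rome", "review": "great pasta", "price": 2.0},
    {"city": "Paris", "review": "noisy street", "price": 1.0},
    {"city": "Oslo", "review": "quiet and clean", "price": 4.0},
]


def encode_categorical(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct category to an integer, in order of first appearance."""
    mapping: Dict[str, int] = {}
    codes = [mapping.setdefault(v, len(mapping)) for v in values]
    return codes, mapping


def bag_of_words(
    texts: Sequence[str], tokenize: Callable[[str], List[str]] = str.split
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Count tokens per text over a vocabulary built from all texts.

    `tokenize` is the hook where a user-defined text transformation plugs in.
    """
    docs = [tokenize(t) for t in texts]
    vocab: Dict[str, int] = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    vectors = []
    for tokens in docs:
        vec = [0] * len(vocab)
        for tok in tokens:
            vec[vocab[tok]] += 1
        vectors.append(vec)
    return vectors, vocab


def split(samples: Sequence, fractions: Sequence[float], seed: int = 0) -> List[list]:
    """Shuffle once, then cut into consecutive sets (e.g. train/validation/test)."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    sets, start = [], 0
    for i, frac in enumerate(fractions):
        last = i == len(fractions) - 1
        end = len(order) if last else min(start + round(frac * len(order)), len(order))
        sets.append([samples[j] for j in order[start:end]])
        start = end
    return sets


# Assemble a numeric feature matrix X and a target vector y.
city_codes, city_map = encode_categorical([r["city"] for r in ROWS])
bow, vocab = bag_of_words([r["review"] for r in ROWS], tokenize=lambda t: t.lower().split())
X = [[float(code)] + [float(n) for n in vec] for code, vec in zip(city_codes, bow)]
y = [r["price"] for r in ROWS]

# Train-validation-test split, then reshuffle the train set (e.g. once per epoch).
train, val, test = split(list(zip(X, y)), fractions=(0.5, 0.25, 0.25), seed=42)
random.Random(7).shuffle(train)
```

Integer codes here are assigned in order of first appearance, so the same mapping would have to be reused when encoding new data. With NumPy installed, `numpy.asarray(X)` would convert the nested lists into an array for models that expect one.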
Examples
cd examples/numpy
python simple_sklearn_regression.py
For more examples, please go to the examples folder.
Tests
Please refer to the tests.
Contact
File details
Details for the file tdml-0.1.1.tar.gz
File metadata
- Download URL: tdml-0.1.1.tar.gz
- Upload date:
- Size: 8.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 731936abf0f77a3d44c5ded5a135f147ef00f772eddfaa14cc2544e2ccc6873b
MD5 | 38fe0b3b85ac83b2ff262ff7d3a3da97
BLAKE2b-256 | 28f1d01ad7f981cdf04dd05c3de11672f24b61a0614669cb57a6cf1cefe79c51
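The digests listed for each file let you check a download before installing it. A minimal sketch using Python's standard `hashlib`, assuming the source archive has been saved locally as `tdml-0.1.1.tar.gz`; `sha256_of` and `verify` are illustrative helper names, not part of TDML:

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for tdml-0.1.1.tar.gz.
EXPECTED_SHA256 = "731936abf0f77a3d44c5ded5a135f147ef00f772eddfaa14cc2544e2ccc6873b"


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For an intact download, `verify("tdml-0.1.1.tar.gz")` should return `True`. The `pip hash` command prints the same SHA256 digest without any custom code.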
File details
Details for the file tdml-0.1.1-py3-none-any.whl
File metadata
- Download URL: tdml-0.1.1-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0914d00ac7b15c6ea6a5afbff757de66ca3fb93e5bd69bbe4556564214ea1bff
MD5 | 0b465daa90533b072fb6d6536215309b
BLAKE2b-256 | ae345de8744f6971c31278d7dbf7c673721ef562beb71df56dffa0eb8577d652