A library that executes SortingHat feature type inference on Pandas dataframes
SortingHatInf
SortingHatInf is a library that implements the ML-based feature type inference described in the SortingHat paper. Feature type inference is the task of predicting the feature type (for example, Numeric, Categorical, or Datetime) of each column in a given dataset.
Library for ML feature type inference: https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference.
Feature Types
SortingHat
- Numeric
- Categorical
- Datetime
- Sentence
- URL
- Embedded Number
- List
- Not-Generalizable
- Context-Specific
Extended
Same as SortingHat except:
- Numeric is mapped to Integer or Floating
- Categorical is mapped to Boolean if the column holds Boolean values
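The Extended refinement can be pictured with a small sketch. The function `refine_to_extended` and its Boolean rule (distinct values forming one of a few fixed true/false pairs) are hypothetical illustrations, not the library's API or its actual decision rule; the list above only states which SortingHat labels are refined.

```python
from typing import Iterable

# Hypothetical Boolean rule for this sketch: the distinct values (as lowercase
# strings) must fall within one of these true/false pairs.
BOOLEAN_PAIRS = [{"true", "false"}, {"yes", "no"}, {"0", "1"}]


def refine_to_extended(values: Iterable[object], sortinghat_type: str) -> str:
    """Refine a SortingHat label into an Extended label (illustrative only)."""
    # Drop missing values: None, and NaN (which is not equal to itself).
    present = [v for v in values if v is not None and v == v]
    if sortinghat_type == "Numeric":
        # Integer if every value is integral, otherwise Floating.
        if all(float(v).is_integer() for v in present):
            return "Integer"
        return "Floating"
    if sortinghat_type == "Categorical":
        distinct = {str(v).strip().lower() for v in present}
        if distinct and any(distinct <= pair for pair in BOOLEAN_PAIRS):
            return "Boolean"
        return "Categorical"
    # All other SortingHat types carry over to the Extended set unchanged.
    return sortinghat_type


print(refine_to_extended([1, 2, 3], "Numeric"))
print(refine_to_extended(["yes", "no", "yes"], "Categorical"))
```

Every label other than Numeric and Categorical passes through untouched, which mirrors the "same as SortingHat except" wording.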
ARFF
- Integer
- Real (Float)
- Nominal-specification (Categorical)
- String
- Ignore (Not-Generalizable)
Example Usage with OpenML
Here, we run feature type inference on a dataset obtained from OpenML. Note: this can be done with any dataset loaded as a Pandas dataframe, but we use OpenML here as an example.
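Any Pandas dataframe works as input. As a minimal sketch with made-up data, a dataframe can be built with `pandas.read_csv` instead of OpenML; the column names and values here are purely illustrative.

```python
import io

import pandas as pd

# A small in-memory CSV stands in for a real file; pd.read_csv also accepts a path.
csv_text = """age,city,signup_date,homepage
34,Paris,2021-03-01,https://example.com/a
28,Berlin,2021-04-15,https://example.com/b
45,Paris,2021-05-20,https://example.com/c
"""

X = pd.DataFrame(pd.read_csv(io.StringIO(csv_text)))
print(X.dtypes)
```

The resulting `X` can be passed to the inference functions in the same way as the OpenML dataframe in the steps that follow.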
- First, ensure that `pip`, `wheel`, and `setuptools` are up to date.
```shell
python -m pip install --upgrade pip setuptools wheel
```
- Install the package with pip.
```shell
pip install sortinghatinf
```
- Import the library.
```python
import sortinghatinf
```
- Install the OpenML Python API.
```shell
pip install openml
```
- Import the OpenML Python library.
```python
import openml
```
- Load the 'Blood Transfusion Service Center' dataset from OpenML (dataset_id=31). Note: this requires an OpenML account, which you can set up on the OpenML website.
```python
data = openml.datasets.get_dataset(dataset_id=31)
X, _, _, _ = data.get_data()  # Loaded as a Pandas dataframe
```
- Infer the feature types for the data columns.
```python
# Infer the SortingHat feature types.
infer_sh = sortinghatinf.get_sortinghat_types(X)
# Infer the extended feature types.
infer_ext = sortinghatinf.get_expanded_feature_types(X)
# Infer the ARFF feature types.
# The function `get_feature_types_as_arff()` also returns the SortingHat feature types.
infer_arff, infer_sh = sortinghatinf.get_feature_types_as_arff(X)
```
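The return format of these functions is not spelled out here. Assuming each one returns a sequence with one label per column, in column order, a small hypothetical helper such as `types_by_column` makes the results easier to read by pairing each label with its column name.

```python
from typing import Dict, Sequence

import pandas as pd


def types_by_column(df: pd.DataFrame, labels: Sequence[str]) -> Dict[str, str]:
    """Pair each column name with its inferred label (hypothetical helper)."""
    # Assumption: the inference functions return one label per column, in column order.
    if len(labels) != len(df.columns):
        raise ValueError(f"expected {len(df.columns)} labels, got {len(labels)}")
    return dict(zip(df.columns, labels))


# Made-up data and labels for illustration; with the real library, pass e.g. infer_sh.
X = pd.DataFrame({"recency": [2, 0, 1], "donor_id": ["a17", "b02", "c93"]})
print(types_by_column(X, ["Numeric", "Not-Generalizable"]))
```

With the real library, `types_by_column(X, infer_sh)` would give a column-by-column view, provided the assumption about the return format holds; the length check fails loudly if it does not.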
Download files
File details
Details for the file sortinghatinf-0.0.2.tar.gz.
File metadata
- Download URL: sortinghatinf-0.0.2.tar.gz
- Upload date:
- Size: 7.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.44.1 CPython/3.6.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 116114334855bf4b4765878db1fae6b3c8040b4a7a54c6c4d0c3f201638cefff |
| MD5 | 866510c1d5025ca1ab9ddbd28de0960c |
| BLAKE2b-256 | 06a59d4426127fb517a4bb6bdc7712568bc38ea197e6a167c2b2225ff0646e88 |
File details
Details for the file sortinghatinf-0.0.2-py3-none-any.whl.
File metadata
- Download URL: sortinghatinf-0.0.2-py3-none-any.whl
- Upload date:
- Size: 7.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.44.1 CPython/3.6.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15154ecb3b0d6c4240dfc39fa98762ed672df51af7febb509939ad306eea897c |
| MD5 | b3d6be2b7b028c66e02dd7e49c4dcd3f |
| BLAKE2b-256 | e427d4b4701e5152581a934156055a32fc3133d2768150aa706d1b7fb0a35893 |