
A library that executes SortingHat feature type inference on Pandas dataframes


SortingHatInf

SortingHatInf is a library that implements ML-based feature type inference as described in the SortingHat paper. Feature type inference is the task of predicting the feature type of each column in a given dataset.

Library for ML feature type inference: https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference.

Feature Types

SortingHat

  • Numeric
  • Categorical
  • Datetime
  • Sentence
  • URL
  • Embedded Number
  • List
  • Not-Generalizable
  • Context-Specific

Extended

Same as SortingHat except:

  • Numeric is mapped to Integer or Floating
  • Categorical is mapped to Boolean when the column is Boolean
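
The two extra rules can be pictured as a small post-processing step on top of a SortingHat label. The sketch below is not the library's implementation: `to_extended_label` is a hypothetical helper, the label strings are copied from the lists above rather than from the library's return values, and the test used to recognise a Boolean column (at most two distinct values, all drawn from a fixed set of true/false spellings) is an assumption made for illustration.

```python
# Hypothetical helper for illustration; sortinghatinf's own logic may differ.
# Assumed spellings that count as Boolean values.
BOOLEAN_SPELLINGS = {"true", "false", "yes", "no", "0", "1", "t", "f", "y", "n"}


def to_extended_label(sortinghat_label, values):
    """Refine a SortingHat label using the column's non-missing values."""
    present = [v for v in values if v is not None]
    if sortinghat_label == "Numeric":
        # Numeric splits into Integer or Floating.
        all_whole = all(float(v).is_integer() for v in present)
        return "Integer" if all_whole else "Floating"
    if sortinghat_label == "Categorical":
        # Categorical becomes Boolean only when the values look Boolean.
        distinct = {str(v).strip().lower() for v in present}
        if distinct and len(distinct) <= 2 and distinct <= BOOLEAN_SPELLINGS:
            return "Boolean"
    # Every other type is the same as in SortingHat.
    return sortinghat_label


print(to_extended_label("Numeric", [1, 2, 3]))
print(to_extended_label("Numeric", [1.5, 2.0]))
print(to_extended_label("Categorical", ["yes", "no", "yes"]))
print(to_extended_label("Categorical", ["red", "green"]))
```

Types other than Numeric and Categorical pass through unchanged, which matches the "Same as SortingHat except" wording above.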

ARFF

  • Integer
  • Real (Float)
  • Nominal-specification (Categorical)
  • String
  • Ignore (Not-Generalizable)
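
One way to read the ARFF list is as a lookup from SortingHat types to ARFF types. The sketch below encodes only the two correspondences stated above (Categorical to Nominal-specification, Not-Generalizable to Ignore). Two things in it are assumptions and not confirmed by this page: that Numeric columns split into Integer or Real depending on whether every value is whole, and that all remaining SortingHat types fall back to String. `to_arff_label` is a hypothetical name, and the label strings are copied from the lists above.

```python
# Illustrative mapping only; not taken from sortinghatinf's source code.
ARFF_FOR_SORTINGHAT = {
    "Categorical": "Nominal-specification",  # stated in the list above
    "Not-Generalizable": "Ignore",           # stated in the list above
}


def to_arff_label(sortinghat_label, values=()):
    """Map a SortingHat label (plus column values) to an ARFF type name."""
    if sortinghat_label == "Numeric":
        # Assumption: whole-valued columns are Integer, all others Real.
        present = [v for v in values if v is not None]
        all_whole = all(float(v).is_integer() for v in present)
        return "Integer" if all_whole else "Real"
    # Assumption: types without a listed ARFF counterpart become String.
    return ARFF_FOR_SORTINGHAT.get(sortinghat_label, "String")


print(to_arff_label("Numeric", [2, 5, 7]))
print(to_arff_label("Numeric", [0.5, 1.25]))
print(to_arff_label("Categorical"))
print(to_arff_label("Not-Generalizable"))
print(to_arff_label("Sentence"))
```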

Example Usage with OpenML

Here, we run feature type inference on a dataset obtained from OpenML. Note: this can be done with any dataset loaded as a Pandas dataframe, but we use OpenML here as an example.

  1. First ensure pip, wheel, and setuptools are up to date.
python -m pip install --upgrade pip setuptools wheel
  2. Install the package with pip.
pip install sortinghatinf
  3. Import the library.
import sortinghatinf
  4. Install the OpenML Python API.
pip install openml
  5. Import the OpenML Python library.
import openml
  6. Load the 'Blood Transfusion Service Center' dataset from OpenML (dataset_id=31). Note: This requires an OpenML account, which you can set up on the OpenML website.
data = openml.datasets.get_dataset(dataset_id=31)
X, _, _, _ = data.get_data() # Loaded as Pandas dataframe
  7. Infer the feature types for the data columns.
# Infer the SortingHat feature types.
infer_sh = sortinghatinf.get_sortinghat_types(X)

# Infer the extended feature types.
infer_ext = sortinghatinf.get_expanded_feature_types(X)

# Infer the ARFF feature types.
# The function `get_feature_types_as_arff()` also returns the SortingHat feature types.
infer_arff, infer_sh = sortinghatinf.get_feature_types_as_arff(X)
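
Once the types are inferred, it is often handy to see which columns share a type. The sketch below assumes that each inference function returns one label per column, in the same order as the dataframe's columns; this page does not state the return format, so check it before relying on this. `group_columns_by_type` is a hypothetical helper, and the column names and labels used here are stand-in values, not real inference output.

```python
from collections import defaultdict


def group_columns_by_type(column_names, inferred_types):
    """Group column names under the feature type inferred for each of them."""
    if len(column_names) != len(inferred_types):
        raise ValueError("expected exactly one inferred type per column")
    groups = defaultdict(list)
    for name, feature_type in zip(column_names, inferred_types):
        groups[feature_type].append(name)
    return dict(groups)


# Stand-in values for illustration only.
columns = ["recency", "frequency", "donated"]
labels = ["numeric", "numeric", "categorical"]

grouped = group_columns_by_type(columns, labels)
print(grouped)
```

With the variables from the steps above, and under the same assumption about the return format, the call would be `group_columns_by_type(list(X.columns), infer_sh)`.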
