Probabilistic type inference
1 Introduction
Type inference refers to the task of inferring the data type (e.g., Boolean, date, integer and string) of a given column of data, which becomes challenging in the presence of missing data and anomalies.
ptype is a probabilistic type inference model for tabular data that aims to robustly infer the data type of each column in a table. By explicitly taking missing data and anomalies into account, ptype improves over existing type inference methods. This repository provides a Python implementation of ptype.
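To illustrate why missing data and anomalies make type inference hard, here is a minimal sketch in plain Python. It is not ptype's model or API: the function names, the set of missing-data markers, and the 0.8 threshold are all assumptions made for this example, and the majority-vote heuristic only stands in for the probabilistic treatment that ptype provides.

```python
from datetime import datetime

# Hypothetical missing-data markers (an assumption for this sketch).
MISSING = {"", "NA", "NaN", "null", "-"}


def _is_boolean(value):
    return value.lower() in {"true", "false"}


def _is_integer(value):
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Candidate types are checked in this order; "string" is the fallback.
CHECKS = [("boolean", _is_boolean), ("integer", _is_integer), ("date", _is_date)]


def strict_infer(column):
    """Return a type only if every entry matches it, otherwise 'string'."""
    for name, check in CHECKS:
        if all(check(v) for v in column):
            return name
    return "string"


def tolerant_infer(column, threshold=0.8):
    """Drop missing markers, then accept the first type that matches
    at least `threshold` of the remaining entries."""
    observed = [v for v in column if v not in MISSING]
    if not observed:
        return "string"
    for name, check in CHECKS:
        matches = sum(check(v) for v in observed)
        if matches / len(observed) >= threshold:
            return name
    return "string"


# One missing marker ("NA") and one anomaly ("unknown") in an integer column.
ages = ["34", "27", "NA", "45", "51", "unknown", "38", "29"]
print(strict_infer(ages))    # string
print(tolerant_infer(ages))  # integer
```

The strict rule falls back to string as soon as a single entry fails to parse, whereas the tolerant rule still recovers integer. A fixed list of markers and a hard threshold are brittle, which is the kind of limitation a probabilistic model is meant to address.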
If you use this package, please cite ptype with the following BibTeX entry:
@article{ceritli2020ptype,
  title={ptype: probabilistic type inference},
  author={Ceritli, Taha and Williams, Christopher KI and Geddes, James},
  journal={Data Mining and Knowledge Discovery},
  year={2020},
  volume={34},
  number={3},
  pages={870--904},
  doi={10.1007/s10618-020-00680-1},
}
2 Install requirements
pip install -r requirements.txt
3 Usage
See the demo notebooks in the notebooks folder. They can also be viewed online via Binder.