An open-source package from LEGOAI for identifying data types

Project description

Empowering Business Users With Self Serve Analytics

What is it?

This project uses machine learning to identify and classify the data types of column values. It is designed to slot into data preprocessing and analysis pipelines, automating the often tedious and error-prone task of identifying data types by hand.

Getting Started

To get started quickly, install the package and follow the notebook below.

Datatype Identification (Inference)

Inference Notebook

[!IMPORTANT]
openai_api_key is required for running L2 model inference.

Main Features

L1 and L2 Datatype Categorization

[Figure: L1 and L2 models]
  • The package has two models. The L1 model (a classifier) identifies basic data types: integer, float, alphanumeric, range_type, date & time, open_ended_text and close_ended_text.
  • The L2 model refines the L1 results. Integer and float columns are classified as measure, dimension or unknown (when they cannot be classified) using an LLM, and date & time columns are mapped to one of 41 date-time formats, such as YYYY-MM-DDTHH:MM:SS, YYYY/MM/DD and MM-DD-YYYY HH:MM AM/PM, using regular expressions.
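The regex-based date & time step of the L2 model can be pictured with a small sketch. It assumes a hypothetical subset of three of the 41 formats, with patterns written for this example only (they are not the rules shipped with legoai), and an assumed majority-vote rule to turn per-value matches into a single format for the whole column.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical subset of the 41 supported formats. Patterns and labels are
# illustrative assumptions, not the rules used inside legoai.
DATE_FORMAT_PATTERNS: list[tuple[str, re.Pattern]] = [
    ("YYYY-MM-DDTHH:MM:SS", re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")),
    ("YYYY/MM/DD", re.compile(r"\d{4}/\d{2}/\d{2}")),
    ("MM-DD-YYYY HH:MM AM/PM", re.compile(r"\d{2}-\d{2}-\d{4} \d{2}:\d{2} [AaPp][Mm]")),
]


def match_date_format(value: str) -> Optional[str]:
    """Return the label of the first pattern that matches the whole value, or None."""
    text = value.strip()
    for label, pattern in DATE_FORMAT_PATTERNS:
        if pattern.fullmatch(text):
            return label
    return None


def column_date_format(values: list[str]) -> Optional[str]:
    """Pick the most common matching format across a column (assumed aggregation rule)."""
    counts = Counter(fmt for fmt in map(match_date_format, values) if fmt is not None)
    if not counts:
        return None
    # Ties go to the format encountered first.
    return counts.most_common(1)[0][0]


print(match_date_format("03-15-2024 08:30 PM"))
print(column_date_format(["2024/03/15", "2024/01/02", "n/a"]))
```

Purely positional patterns like these check shape, not validity (a month of 13 still matches), and they cannot tell MM-DD-YYYY from DD-MM-YYYY for a value such as 03-04-2024. A fuller implementation would need extra disambiguation, for example looking across the column for a first field greater than 12.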

Datatype Identification Inference Workflow

[Figure: DI inference workflow]
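The two-stage routing from L1 to L2 can be sketched as a single function. The model interfaces, the exact label strings (taken from the feature list) and the toy stand-in models are all hypothetical and exist only so the example runs; they are not legoai's API.

```python
from typing import Callable, Optional

# Hypothetical interfaces; legoai's real model objects and signatures may differ.
L1Classifier = Callable[[list[str]], str]
NumericL2 = Callable[[str, list[str]], str]
DateL2 = Callable[[list[str]], Optional[str]]


def identify_datatype(
    column_name: str,
    values: list[str],
    l1_classify: L1Classifier,
    l2_numeric: NumericL2,
    l2_date: DateL2,
) -> dict:
    """Run L1 on a column, then route the result to the matching L2 step."""
    l1_label = l1_classify(values)
    l2_label: Optional[str] = None
    if l1_label in ("integer", "float"):
        # Numeric L2 step: measure / dimension / unknown (LLM-based per the README).
        l2_label = l2_numeric(column_name, values)
    elif l1_label == "date & time":
        # Date L2 step: one of the supported date-time formats (regex-based per the README).
        l2_label = l2_date(values)
    # Other L1 types (alphanumeric, range_type, text) get no L2 refinement here.
    return {"column": column_name, "l1_datatype": l1_label, "l2_datatype": l2_label}


# Toy stand-ins so the sketch runs; they are NOT the package's models.
def toy_l1(values: list[str]) -> str:
    return "integer" if all(v.lstrip("-").isdigit() for v in values) else "alphanumeric"


def toy_numeric(column_name: str, values: list[str]) -> str:
    return "dimension" if column_name.lower().endswith("_id") else "measure"


def toy_date(values: list[str]) -> Optional[str]:
    return None


print(identify_datatype("order_id", ["101", "102"], toy_l1, toy_numeric, toy_date))
```

Routing on the L1 result means that, in this sketch, the LLM-backed step is only called for integer and float columns, so text-heavy columns never incur an API call.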

Where to get it?

Binary installers for the latest released version are available on the Python Package Index (PyPI):

# PyPI
> pip install legoai

Performance

[!NOTE]
Data sources:
  • Ecommerce: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce (9 tables, 52 columns)
  • Healthcare: https://mitre.box.com/shared/static/aw9po06ypfb9hrau4jamtvtz0e5ziucz.zip (18 tables, 249 columns)

Classification Report (L1 Model)

[Figure: L1 model classification metrics]

Classification Report (L2 Model)

[Figure: L2 model classification metrics]
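The classification reports are shown as images. For reference, the per-class precision, recall, F1 and support in a report of this kind are conventionally computed as in the pure-Python sketch below. The example labels reuse the L2 classes but the data is made up, and this is not a claim about how the legoai figures were produced.

```python
from collections import Counter


def per_class_metrics(y_true: list[str], y_pred: list[str]) -> dict[str, dict[str, float]]:
    """Precision, recall, F1 and support for each label, as in a classification report."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == predicted:
            tp[actual] += 1
        else:
            fp[predicted] += 1  # predicted label was assigned wrongly
            fn[actual] += 1     # actual label was missed
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted_total = tp[label] + fp[label]
        actual_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / actual_total if actual_total else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "support": actual_total}
    return report


# Made-up example using the L2 numeric labels.
print(per_class_metrics(
    ["measure", "measure", "dimension", "unknown"],
    ["measure", "dimension", "dimension", "unknown"],
))
```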

Execution Chart (Google Colab Environment)

[Figure: DI execution chart]

License

The project is released under the MIT License.

Contributing

Contributions to this project are welcome. To contribute:

  1. Fork the repository.
  2. Create a new branch named feature/* (git checkout -b feature/<name>)
  3. Make your changes.
  4. Commit your changes (git commit -am 'Add new feature')
  5. Push the branch (git push origin feature/<name>)
  6. Create a new Pull Request.

Version: 0.3

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

legoai-0.3-py3-none-any.whl (7.9 MB)

Python 3
