An open-source package from LEGOAI for identifying data types

Empowering Business Users With Self Serve Analytics

What is it?

This project uses machine learning to identify and classify data types from raw values (for example, the values of a table column). It is designed to fit into data preprocessing and analysis pipelines, where it automates the tedious and error-prone task of identifying data types by hand.

Getting Started

To get started quickly, install the package and follow the notebook below.

Datatype Identification (Inference)

Inference Notebook

[!IMPORTANT]
An openai_api_key is required to run L2 model inference.
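Because the L2 step calls an LLM, it can help to fail fast when the key is missing rather than partway through a run. The sketch below is a hypothetical helper, not part of legoai: it assumes the key is kept in the `OPENAI_API_KEY` environment variable (the convention used by OpenAI's own client libraries). How the key is then handed to legoai is shown in the inference notebook, not here.

```python
import os
from typing import Mapping, Optional


def require_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key, or raise a clear error before L2 inference starts.

    Hypothetical helper (not a legoai function): assumes the key is stored in
    the OPENAI_API_KEY environment variable.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "openai_api_key is required for L2 model inference; "
            "set the OPENAI_API_KEY environment variable first."
        )
    return key


if __name__ == "__main__":
    try:
        openai_api_key = require_openai_api_key()
        print("OpenAI API key found; L2 inference can run.")
    except RuntimeError as err:
        print(err)
```

Passing `env` explicitly makes the check easy to test without touching the real environment.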

Main Features

L1 and L2 Datatype Categorization

(Figure: L1 and L2 Model)
  • The pipeline has two models. The L1 model (a classifier) identifies the basic datatype of the values: integer, float, alphanumeric, range_type, date & time, open_ended_text, or close_ended_text.
  • The L2 model refines two groups of L1 results. Integer and float results are classified as measure, dimension, or unknown (when neither applies) using an LLM. Date & time results are matched to one of 41 date-time formats, such as YYYY-MM-DDTHH:MM:SS, YYYY/MM/DD, and MM-DD-YYYY HH:MM AM/PM, using regular expressions.
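The date-time branch of the L2 model is described as regex-based. The sketch below illustrates the general idea using only three of the formats named above; the patterns, the `DATETIME_FORMATS` table, and the majority-vote rule are illustrative assumptions, not the package's actual 41 patterns or its selection logic.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Illustrative subset: three of the 41 formats mentioned above, each mapped to a
# regex that must match the whole value. The real package's patterns are not shown here.
DATETIME_FORMATS = {
    "YYYY-MM-DDTHH:MM:SS": re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"),
    "YYYY/MM/DD": re.compile(r"\d{4}/\d{2}/\d{2}"),
    "MM-DD-YYYY HH:MM AM/PM": re.compile(r"\d{2}-\d{2}-\d{4} \d{2}:\d{2} (?:AM|PM)"),
}


def match_format(value: str) -> Optional[str]:
    """Return the first format whose regex matches the entire value, else None."""
    for name, pattern in DATETIME_FORMATS.items():
        if pattern.fullmatch(value.strip()):
            return name
    return None


def infer_column_format(values: Iterable[str]) -> Optional[str]:
    """Pick the format matched by the most values in a column (simple majority vote)."""
    counts = Counter(fmt for fmt in map(match_format, values) if fmt is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


column = ["2024-01-31T08:15:00", "2024-02-01T17:45:30", "not a date"]
print(infer_column_format(column))
```

Voting over a whole column, rather than trusting a single value, tolerates a few malformed entries; this is a design choice of the sketch, not a documented legoai behavior.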

Datatype Identification Inference Workflow

(Figure: DI Inference Workflow)
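Read together, the two Main Features bullets imply a routing step between the models: only numeric and date & time L1 results are passed on to L2. The sketch below illustrates only that routing. `l1_classify`, `fake_numeric_l2`, and `fake_datetime_l2` are hypothetical stand-ins (crude heuristics and stubs), not legoai functions; the real L1 model is a trained classifier and the real numeric L2 step calls an LLM.

```python
from typing import Callable, List, Optional, Tuple

NUMERIC_L1 = {"integer", "float"}


def l1_classify(values: List[str]) -> str:
    """Crude heuristic stand-in for the L1 classifier (covers only a few labels)."""

    def is_int(v: str) -> bool:
        return v.lstrip("-").isdigit()

    def is_float(v: str) -> bool:
        try:
            float(v)
            return True
        except ValueError:
            return False

    if all(is_int(v) for v in values):
        return "integer"
    if all(is_float(v) for v in values):
        return "float"
    if all(v.count("-") == 2 or v.count("/") == 2 for v in values):
        return "date & time"
    return "open_ended_text"


def route_to_l2(
    values: List[str],
    numeric_l2: Callable[[List[str]], str],
    datetime_l2: Callable[[List[str]], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Return (L1 label, L2 label); L2 runs only for numeric and date & time columns."""
    l1 = l1_classify(values)
    if l1 in NUMERIC_L1:
        return l1, numeric_l2(values)  # measure / dimension / unknown (an LLM in legoai)
    if l1 == "date & time":
        return l1, datetime_l2(values)  # a date-time format (regex in legoai)
    return l1, None  # other L1 types get no L2 refinement


def fake_numeric_l2(values: List[str]) -> str:
    """Stub standing in for the LLM call; always answers 'unknown'."""
    return "unknown"


def fake_datetime_l2(values: List[str]) -> Optional[str]:
    """Stub standing in for the regex step; only recognises YYYY/MM/DD-shaped values."""
    return "YYYY/MM/DD" if all(len(v) == 10 and v[4] == "/" for v in values) else None


print(route_to_l2(["3", "7", "12"], fake_numeric_l2, fake_datetime_l2))
print(route_to_l2(["2024/01/31", "2024/02/29"], fake_numeric_l2, fake_datetime_l2))
print(route_to_l2(["red", "blue"], fake_numeric_l2, fake_datetime_l2))
```

Injecting the L2 steps as callables keeps the routing testable without network access to an LLM.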

Where to get it?

Binary installers for the latest released version are available from the Python Package Index (PyPI).

# PyPI
> pip install legoai
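To confirm the install worked without relying on the package's import path, you can query the installed distribution's metadata with the standard library. This is a generic check using `importlib.metadata` (Python 3.8+), not a legoai feature.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "legoai") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"legoai {found}" if found else "legoai is not installed; run: pip install legoai")
```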

Performance

[!NOTE]
Source (Ecommerce): https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
Total tables: 9, total columns: 52
Source (Healthcare): https://mitre.box.com/shared/static/aw9po06ypfb9hrau4jamtvtz0e5ziucz.zip
Total tables: 18, total columns: 249

Classification Report (L1 Model)

(Figure: L1 Model Classification Metrics)

Classification Report (L2 Model)

(Figure: L2 Model Classification Metrics)
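A classification report lists per-class precision, recall, F1 score, and support. The sketch below computes them with the standard formulas (precision = TP / predicted, recall = TP / actual, F1 = their harmonic mean). The labels are the L2 numeric classes, but the four-row sample is made up purely for illustration and is not drawn from the reported results. scikit-learn's `sklearn.metrics.classification_report` produces the same per-class figures; whether the reports above were generated with it is not stated.

```python
from collections import Counter
from typing import Dict, List


def classification_report(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall, F1 and support, using the standard definitions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_counts = Counter(y_pred)
    true_counts = Counter(y_true)
    report: Dict[str, Dict[str, float]] = {}
    for label in labels:
        precision = true_pos[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = true_pos[label] / true_counts[label] if true_counts[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "support": true_counts[label]}
    return report


# Made-up example labels, for illustration only.
y_true = ["measure", "measure", "dimension", "unknown"]
y_pred = ["measure", "dimension", "dimension", "unknown"]
for label, m in classification_report(y_true, y_pred).items():
    print(f"{label:<10} p={m['precision']:.2f} r={m['recall']:.2f} f1={m['f1']:.2f} n={m['support']}")
```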

Execution Chart (Google Colab Environment)

(Figure: DI Execution Chart)

License

The project is released under the MIT License.

Contributing

Contributions to this project are welcome. To contribute, follow these steps:

  1. Fork the repository.
  2. Create a new branch named feature/* (for example, git checkout -b feature/<name>).
  3. Make your changes.
  4. Commit your changes (git commit -am 'Add new feature').
  5. Push the branch (git push origin feature/<name>).
  6. Create a new Pull Request.

This version: 0.3

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

legoai-0.3-py3-none-any.whl (7.9 MB, Python 3)

File details

Details for the file legoai-0.3-py3-none-any.whl.

File metadata

  • Download URL: legoai-0.3-py3-none-any.whl
  • Upload date:
  • Size: 7.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for legoai-0.3-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0d08cc5493838c89a65f65d4b43bcfa5f86103c7a33aa3715b71bdcc0d386d7 |
| MD5 | 0b580f14d534fd1b23b537d6aca33a52 |
| BLAKE2b-256 | 23e26a38cc4bc2de21a512eb10e200b49b9b32a189466a15da6934d377aa77b2 |
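To check a downloaded wheel against the published SHA256 digest, a chunked `hashlib` read is enough. This is a generic sketch; the file location in the `__main__` block is an assumption (the current directory).

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for legoai-0.3-py3-none-any.whl.
EXPECTED_SHA256 = "f0d08cc5493838c89a65f65d4b43bcfa5f86103c7a33aa3715b71bdcc0d386d7"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large wheels need not fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    wheel = Path("legoai-0.3-py3-none-any.whl")  # assumed download location
    if wheel.exists():
        print("hash OK" if verify_wheel(wheel) else "hash MISMATCH: do not install")
    else:
        print(f"{wheel} not found in the current directory")
```

pip can enforce the same check at install time in hash-checking mode: put `legoai==0.3 --hash=sha256:<digest>` in a requirements file and run `pip install --require-hashes -r requirements.txt`.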
