An open-source package from LEGOAI for identifying data types
Empowering Business Users With Self Serve Analytics
What is it?
This package uses machine learning to infer and classify data types from the values it is given. It is designed for data preprocessing and analysis pipelines, where it automates the tedious and error-prone task of identifying data types by hand.
Getting Started
To get started quickly, install the package and follow the notebook below.
Datatype Identification (Inference)
[!IMPORTANT]
openai_api_key is required for running L2 model inference.
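Because L2 inference cannot run without a key, it can help to fail early when the key is missing. The sketch below is only an illustration: it assumes the key is kept in an environment variable named OPENAI_API_KEY, and `require_openai_key` is a hypothetical helper, not part of legoai. How the package itself accepts openai_api_key is not shown in this README.

```python
import os


def require_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error.

    Hypothetical helper: the environment variable name is an assumption,
    not something the legoai README specifies.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; L2 model inference needs an OpenAI API key."
        )
    return key
```

The returned string can then be passed to whatever argument or configuration the notebook uses for openai_api_key.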
Main Features
L1 and L2 Datatype Categorization
- The package has two models. The L1 model (a classifier) identifies the base data types: integer, float, alphanumeric, range_type, date & time, open_ended_text, and close_ended_text.
- The L2 model refines the L1 result. Integer and float results are classified as measure, dimension, or unknown (if they cannot be classified) using an LLM. Date & time results are matched to one of 41 date-time formats, such as YYYY-MM-DDTHH:MM:SS, YYYY/MM/DD, or MM-DD-YYYY HH:MM AM/PM, using regular expressions.
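The regex step of the L2 model can be pictured with a minimal sketch. The patterns below cover only the three formats named above and are assumptions written for illustration; the package's own 41 patterns are not reproduced here, and `match_datetime_format` and `column_datetime_format` are hypothetical helpers rather than legoai functions.

```python
import re
from collections import Counter
from typing import Optional

# Illustrative subset of formats. These patterns check the shape of a value
# only; they do not validate ranges (a month of 13 would still match).
DATETIME_PATTERNS = {
    "YYYY-MM-DDTHH:MM:SS": re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"),
    "YYYY/MM/DD": re.compile(r"\d{4}/\d{2}/\d{2}"),
    "MM-DD-YYYY HH:MM AM/PM": re.compile(r"\d{2}-\d{2}-\d{4} \d{2}:\d{2} (?:AM|PM)"),
}


def match_datetime_format(value: str) -> Optional[str]:
    """Return the name of the first format whose pattern matches the whole value."""
    candidate = value.strip()
    for name, pattern in DATETIME_PATTERNS.items():
        if pattern.fullmatch(candidate):
            return name
    return None


def column_datetime_format(values: list[str]) -> Optional[str]:
    """Return the format matched by the most values, or None if nothing matches."""
    counts = Counter(
        fmt for fmt in map(match_datetime_format, values) if fmt is not None
    )
    return counts.most_common(1)[0][0] if counts else None
```

Voting across all values in a column, rather than trusting a single value, keeps one malformed entry from deciding the result.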
Datatype Identification Inference Workflow
Where to get it?
Binary installers for the latest released version are available at the Python Package Index (PyPI).
# PyPI
> pip install legoai
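After installing, one way to confirm that the package is visible to the current interpreter is to query its installed version with the standard library. This sketch uses only `importlib.metadata`; `installed_version` is a hypothetical helper name, and the sketch makes no assumption about legoai's own modules.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "legoai") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```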
Performance
[!NOTE]
Source Ecommerce: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
Total Tables: 9, Total Columns: 52
Source Healthcare: https://mitre.box.com/shared/static/aw9po06ypfb9hrau4jamtvtz0e5ziucz.zip
Total Tables: 18, Total Columns: 249
Classification Report (L1 Model)
Classification Report (L2 Model)
Execution Chart (Google Colab Environment)
License
The project is released under the MIT License.
Contributing
Contributions to this project are welcome. To contribute, follow the steps below:
- Fork the repository.
- Create a new branch named feature/* (git checkout -b feature/<name>)
- Make your changes.
- Commit your changes (git commit -am 'Add new feature')
- Push to the branch (git push origin feature/<name>)
- Create a new Pull Request.
File details
Details for the file legoai-0.3-py3-none-any.whl.
File metadata
- Download URL: legoai-0.3-py3-none-any.whl
- Upload date:
- Size: 7.9 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | f0d08cc5493838c89a65f65d4b43bcfa5f86103c7a33aa3715b71bdcc0d386d7
MD5 | 0b580f14d534fd1b23b537d6aca33a52
BLAKE2b-256 | 23e26a38cc4bc2de21a512eb10e200b49b9b32a189466a15da6934d377aa77b2