A GenAI-powered Python library for building semantic layers.
Project description
The GenAI-powered toolkit for automated data intelligence.
Automated Data Profiling, Link Prediction, and Semantic Layer Generation
Overview
Intugle provides a set of GenAI-powered Python tools to simplify and accelerate the journey from raw data to insights. This library empowers data and business teams to build an intelligent semantic layer over their data, enabling self-serve analytics and natural language queries. By automating data profiling, link prediction, and SQL generation, Intugle helps you build data products faster and more efficiently than traditional methods.
Who is this for?
This tool is designed for both data teams and business teams.
- Data teams can use it to automate data profiling, schema discovery, and documentation, significantly accelerating their workflow.
- Business teams can use it to gain a better understanding of their data and to perform self-service analytics without needing to write complex SQL queries.
Features
- Automated Data Profiling: Generate detailed statistics for each column in your dataset, including distinct count, uniqueness, completeness, and more.
- Datatype Identification: Automatically identify the data type of each column (e.g., integer, string, datetime).
- Key Identification: Identify potential primary keys in your tables.
- LLM-Powered Link Prediction: Use GenAI to automatically discover relationships (foreign keys) between tables.
- Business Glossary Generation: Generate a business glossary for each column, with support for industry-specific domains.
- Semantic Layer Generation: Create YAML files that define your semantic layer, including models (tables) and their relationships.
- SQL Generation: Generate SQL queries from the semantic layer, allowing you to query your data using business-friendly terms.
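To make the profiling and key-identification features above more concrete, here is a minimal sketch in plain Python. It is not Intugle's implementation: the function names `profile_column` and `candidate_keys` are hypothetical, and the metric definitions (uniqueness as distinct non-null values over non-null values, completeness as non-null values over total rows) are assumptions chosen for illustration.

```python
from typing import Any


def profile_column(values: list[Any]) -> dict[str, float]:
    """Compute simple statistics for one column; None marks a missing value."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = len(set(non_null))
    return {
        "count": total,
        "distinct_count": distinct,
        # Assumed definition: share of non-null values that are distinct.
        "uniqueness": distinct / len(non_null) if non_null else 0.0,
        # Assumed definition: share of rows that hold a value.
        "completeness": len(non_null) / total if total else 0.0,
    }


def candidate_keys(table: dict[str, list[Any]]) -> list[str]:
    """Return columns that are fully populated and fully unique."""
    keys = []
    for name, values in table.items():
        stats = profile_column(values)
        if values and stats["uniqueness"] == 1.0 and stats["completeness"] == 1.0:
            keys.append(name)
    return keys


# Toy table represented as column name -> list of values.
patients = {
    "id": [1, 2, 3, 4],
    "first": ["Ana", "Ben", "Ana", None],
}
print(profile_column(patients["first"]))
print(candidate_keys(patients))
```

A column qualifies as a potential primary key here only when every row has a value and no value repeats; a real profiler would likely apply further checks.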
Getting Started
Installation
pip install intugle
Configuration
Before running the project, you need to configure an LLM. It is used for tasks such as generating business glossaries and predicting links between tables.
You can configure the LLM by setting the following environment variables:
- LLM_PROVIDER: The LLM provider and model to use (e.g., openai:gpt-3.5-turbo).
- OPENAI_API_KEY: Your API key for the LLM provider.
Here's an example of how to set these variables in your environment:
export LLM_PROVIDER="openai:gpt-3.5-turbo"
export OPENAI_API_KEY="your-openai-api-key"
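Before starting a long run, it can help to confirm that both variables are visible to Python. The following is a small sketch using only the standard library; the helper name `read_llm_config` is hypothetical, and splitting the value on the first colon is an assumption based on the `provider:model` shape of the example above, not documented Intugle behavior.

```python
import os


def read_llm_config(env=None):
    """Read the two variables described above and split provider from model."""
    env = os.environ if env is None else env
    required = ("LLM_PROVIDER", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Assumption: the value follows the "provider:model" shape of the example.
    provider, _, model = env["LLM_PROVIDER"].partition(":")
    return {"provider": provider, "model": model}


# Demonstrated with a plain dict so the sketch runs without real credentials.
example_env = {
    "LLM_PROVIDER": "openai:gpt-3.5-turbo",
    "OPENAI_API_KEY": "your-openai-api-key",
}
print(read_llm_config(example_env))
```

Calling `read_llm_config()` with no argument reads from the real process environment instead.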
Quickstart
For a detailed, hands-on introduction to the project, please see the quickstart.ipynb notebook. It will walk you through the entire process of profiling your data, predicting links, generating a semantic layer, and querying your data.
Usage
The core workflow of the project involves the following steps:
- Load your data: Load your data into a DataSet object.
- Run the analysis pipeline: Use the run() method to profile your data and generate a business glossary.
- Predict links: Use the LinkPredictor to discover relationships between your tables.

      from intugle import LinkPredictor

      # Initialize the predictor.
      # `datasets` holds the DataSet objects prepared in the previous steps.
      predictor = LinkPredictor(datasets)

      # Run the prediction
      results = predictor.predict()
      results.show_graph()

- Generate SQL: Use the SqlGenerator to generate SQL queries from the semantic layer.

      from intugle import SqlGenerator

      # Create a SqlGenerator
      sql_generator = SqlGenerator()

      # Create an ETL model
      etl_model = {
          "name": "test_etl",
          "fields": [
              {"id": "patients.first", "name": "first_name"},
              {"id": "patients.last", "name": "last_name"},
              {"id": "allergies.start", "name": "start_date"},
          ],
          "filter": {
              "selections": [{"id": "claims.departmentid", "values": ["3", "20"]}],
          },
      }

      # Generate the query
      sql_query = sql_generator.generate_query(etl_model)
      print(sql_query)
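The ETL model passed to the SQL generator is a plain dictionary, and its ids appear to follow a `table.column` pattern. As an illustration of that structure, the sketch below lists which tables a model touches, using only standard Python. The helper `tables_in_model` is hypothetical and not part of Intugle, and the `table.column` id format is an assumption inferred from the example values.

```python
def tables_in_model(etl_model: dict) -> list[str]:
    """List the tables referenced by an ETL model, in first-seen order."""
    ids = [field["id"] for field in etl_model.get("fields", [])]
    ids += [sel["id"] for sel in etl_model.get("filter", {}).get("selections", [])]
    tables: list[str] = []
    for column_id in ids:
        # Assumption: ids are written as "<table>.<column>".
        table, sep, column = column_id.partition(".")
        if not sep or not table or not column:
            raise ValueError(f"Expected '<table>.<column>', got {column_id!r}")
        if table not in tables:
            tables.append(table)
    return tables


etl_model = {
    "name": "test_etl",
    "fields": [
        {"id": "patients.first", "name": "first_name"},
        {"id": "patients.last", "name": "last_name"},
        {"id": "allergies.start", "name": "start_date"},
    ],
    "filter": {
        "selections": [{"id": "claims.departmentid", "values": ["3", "20"]}],
    },
}
print(tables_in_model(etl_model))
```

In this example the model selects from two tables and filters on a third, which is the kind of query that relies on the relationships discovered during link prediction.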
For detailed code examples and a complete walkthrough, please refer to the quickstart.ipynb notebook.
Contributing
Contributions are welcome! Please see the CONTRIBUTING.md file for guidelines.
License
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.
File details
Details for the file intugle-0.1.0.tar.gz.
File metadata
- Download URL: intugle-0.1.0.tar.gz
- Upload date:
- Size: 87.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1605b1b99fbfadaa4dc151b81bd89c03b16ad6ebf758c693f2042b2d410a3134 |
| MD5 | b5bc6ac32c0ef9e58ac8c48d15feb1e4 |
| BLAKE2b-256 | aba5c56d97e30bb601628c5ac884675ceae6a8bd37a393e61d5a4e3215e2b25d |
File details
Details for the file intugle-0.1.0-py3-none-any.whl.
File metadata
- Download URL: intugle-0.1.0-py3-none-any.whl
- Upload date:
- Size: 111.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b20077ef5252250efb98c7e665f3d68074bcd60d5eb57eb72a255268da702478 |
| MD5 | c678ff38d118463ba008db2d0df5120a |
| BLAKE2b-256 | 52e154db190da5ffad6cddc483f2decc000dca1ebd169d7ddc93ab5bb85a964f |