A GenAI-powered Python library for building semantic layers.

The GenAI-powered toolkit for automated data intelligence.

Automated Data Profiling, Link Prediction, and Semantic Layer Generation

Overview

Intugle provides a set of GenAI-powered Python tools to simplify and accelerate the journey from raw data to insights. This library empowers data and business teams to build an intelligent semantic layer over their data, enabling self-serve analytics and natural language queries. By automating data profiling, link prediction, and SQL generation, Intugle helps you build data products faster and more efficiently than traditional methods.

Who is this for?

This tool is designed for both data teams and business teams.

  • Data teams can use it to automate data profiling, schema discovery, and documentation, significantly accelerating their workflow.
  • Business teams can use it to gain a better understanding of their data and to perform self-service analytics without needing to write complex SQL queries.

Features

  • Automated Data Profiling: Generate detailed statistics for each column in your dataset, including distinct count, uniqueness, completeness, and more.
  • Datatype Identification: Automatically identify the data type of each column (e.g., integer, string, datetime).
  • Key Identification: Identify potential primary keys in your tables.
  • LLM-Powered Link Prediction: Use GenAI to automatically discover relationships (foreign keys) between tables.
  • Business Glossary Generation: Generate a business glossary for each column, with support for industry-specific domains.
  • Semantic Layer Generation: Create YAML files that define your semantic layer, including models (tables) and their relationships.
  • SQL Generation: Generate SQL queries from the semantic layer, allowing you to query your data using business-friendly terms.
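
To make the profiling and key-identification features concrete, here is a minimal plain-Python sketch of the kind of statistics listed above. It does not use Intugle's API. The function names `profile_column` and `candidate_keys` are hypothetical, and the definitions are assumptions: uniqueness is taken as distinct non-null values over non-null values, and completeness as non-null values over all values.

```python
def profile_column(values):
    """Compute basic statistics for one column; None marks a missing value."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = len(set(non_null))
    return {
        "count": total,
        "distinct_count": distinct,
        # Assumed definition: share of non-null values that are distinct
        "uniqueness": distinct / len(non_null) if non_null else 0.0,
        # Assumed definition: share of rows that have a value
        "completeness": len(non_null) / total if total else 0.0,
    }


def candidate_keys(table):
    """Return columns that are fully populated and fully unique."""
    keys = []
    for name, values in table.items():
        stats = profile_column(values)
        if stats["completeness"] == 1.0 and stats["uniqueness"] == 1.0:
            keys.append(name)
    return keys


# Illustrative data only
patients = {
    "id": [1, 2, 3, 4],
    "first": ["Ana", "Ben", "Ana", None],
}

print(profile_column(patients["first"]))
print(candidate_keys(patients))  # ['id']
```

A column that passes both checks is only a candidate: a small sample can look unique by chance, so the result still needs a human or LLM review.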

Getting Started

Installation

pip install intugle

Configuration

Before running the project, you need to configure an LLM. This is used for tasks like generating business glossaries and predicting links between tables.

You can configure the LLM by setting the following environment variables:

  • LLM_PROVIDER: The LLM provider and model to use, following LangChain's conventions (e.g., openai:gpt-3.5-turbo).
  • OPENAI_API_KEY: Your API key for the LLM provider.

Here's an example of how to set these variables in your environment:

export LLM_PROVIDER="openai:gpt-3.5-turbo"
export OPENAI_API_KEY="your-openai-api-key"
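
Before starting a long run, it can help to check that both variables are set and well formed. The sketch below is not part of Intugle. The helper `read_llm_config` is hypothetical, and it assumes only the `provider:model` shape shown above, splitting on the first colon.

```python
import os


def read_llm_config(env=os.environ):
    """Split LLM_PROVIDER into (provider, model) and check that an API key is set."""
    raw = env.get("LLM_PROVIDER", "")
    provider, sep, model = raw.partition(":")
    if not sep or not provider or not model:
        raise ValueError(
            "LLM_PROVIDER must look like 'provider:model', e.g. 'openai:gpt-3.5-turbo'"
        )
    # Only the OpenAI key is checked, since it is the only key named above
    if provider == "openai" and not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is not set")
    return provider, model


# Example with an explicit mapping instead of the real environment
example_env = {
    "LLM_PROVIDER": "openai:gpt-3.5-turbo",
    "OPENAI_API_KEY": "your-openai-api-key",
}
print(read_llm_config(example_env))  # ('openai', 'gpt-3.5-turbo')
```

Calling `read_llm_config()` with no argument reads the real process environment.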

Quickstart

For a detailed, hands-on introduction to the project, please see the quickstart.ipynb notebook. It will walk you through the entire process of profiling your data, predicting links, generating a semantic layer, and querying your data.

Usage

The core workflow of the project involves the following steps:

  1. Load your data: Load your data into a DataSet object.

  2. Run the analysis pipeline: Use the run() method to profile your data and generate a business glossary.

  3. Predict links: Use the LinkPredictor to discover relationships between your tables.

    from intugle import LinkPredictor
    
    # Initialize the predictor
    # (`datasets` holds the DataSet objects prepared in steps 1 and 2)
    predictor = LinkPredictor(datasets)
    
    # Run the prediction
    results = predictor.predict()
    results.show_graph()
    
  4. Generate SQL: Use the SqlGenerator to generate SQL queries from the semantic layer.

    from intugle import SqlGenerator
    
    # Create a SqlGenerator
    sql_generator = SqlGenerator()
    
    # Create an ETL model: the output fields and the filter to apply
    etl_model = {
        "name": "test_etl",
        "fields": [
            {"id": "patients.first", "name": "first_name"},
            {"id": "patients.last", "name": "last_name"},
            {"id": "allergies.start", "name": "start_date"},
        ],
        "filter": {
            "selections": [{"id": "claims.departmentid", "values": ["3", "20"]}],
        },
    }
    
    # Generate the query
    sql_query = sql_generator.generate_query(etl_model)
    print(sql_query)
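
The ETL model in step 4 is a plain dictionary, so it can be sanity-checked before it is passed to the generator. The sketch below is not Intugle code. The helper `referenced_tables` is hypothetical, and it assumes that each `id` follows a `table.column` pattern, as the example values suggest.

```python
def referenced_tables(etl):
    """Collect table names from the 'table.column' ids in fields and filter selections."""
    ids = [field["id"] for field in etl.get("fields", [])]
    ids += [sel["id"] for sel in etl.get("filter", {}).get("selections", [])]

    tables = []
    for identifier in ids:
        table, sep, column = identifier.partition(".")
        if not sep or not table or not column:
            raise ValueError(f"expected 'table.column', got {identifier!r}")
        if table not in tables:  # keep first-seen order, no duplicates
            tables.append(table)
    return tables


etl_model = {
    "name": "test_etl",
    "fields": [
        {"id": "patients.first", "name": "first_name"},
        {"id": "patients.last", "name": "last_name"},
        {"id": "allergies.start", "name": "start_date"},
    ],
    "filter": {
        "selections": [{"id": "claims.departmentid", "values": ["3", "20"]}],
    },
}

print(referenced_tables(etl_model))  # ['patients', 'allergies', 'claims']
```

Listing the tables this way shows which relationships the semantic layer must already contain for the query to be joinable.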
    

For detailed code examples and a complete walkthrough, please refer to the quickstart.ipynb notebook.
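
For intuition about what a predicted link in step 3 represents, the sketch below finds foreign-key candidates with a simple value-overlap heuristic. This is a swapped-in technique for illustration, not the LLM-based method that LinkPredictor uses, and none of the names here come from Intugle. The sample tables, the `patient` column, and the 0.9 threshold are assumptions.

```python
def overlap_ratio(child_values, parent_values):
    """Fraction of distinct non-null child values that also appear in the parent column."""
    child = {v for v in child_values if v is not None}
    parent = {v for v in parent_values if v is not None}
    return len(child & parent) / len(child) if child else 0.0


def candidate_links(tables, threshold=0.9):
    """Return (child_column, parent_column, ratio) for likely foreign-key pairs."""
    links = []
    for parent_table, parent_cols in tables.items():
        for parent_col, parent_vals in parent_cols.items():
            non_null = [v for v in parent_vals if v is not None]
            # The parent side must look like a key: non-empty and unique
            if not non_null or len(set(non_null)) != len(non_null):
                continue
            for child_table, child_cols in tables.items():
                if child_table == parent_table:
                    continue
                for child_col, child_vals in child_cols.items():
                    ratio = overlap_ratio(child_vals, parent_vals)
                    if ratio >= threshold:
                        links.append((
                            f"{child_table}.{child_col}",
                            f"{parent_table}.{parent_col}",
                            ratio,
                        ))
    return links


# Illustrative data only
tables = {
    "patients": {"id": [1, 2, 3], "first": ["Ana", "Ben", "Cy"]},
    "allergies": {"patient": [1, 1, 3], "code": ["A10", "B20", "A10"]},
}

print(candidate_links(tables))  # [('allergies.patient', 'patients.id', 1.0)]
```

A heuristic like this produces false positives on small integer columns that happen to overlap, which is one reason to bring in column names, glossary text, and an LLM when ranking candidates.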

MCP Server

This tool also includes an MCP server that exposes your semantic layer as a set of tools that can be used by an LLM client. This enables you to interact with your semantic layer using natural language to generate SQL queries, discover data, and more.

To start the MCP server, run the following command:

intugle-mcp

You can then connect to the server from any MCP client, such as Claude Desktop or Gemini CLI, at http://localhost:8000/semantic_layer/mcp.

Contributing

Contributions are welcome! Please see the CONTRIBUTING.md file for guidelines.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.
