intugle

A GenAI-powered Python library for building semantic layers.

These details have not been verified by PyPI

The GenAI-powered toolkit for automated data intelligence.

Transform Fragmented Data into a Connected Semantic Data Model

Overview

Intugle’s GenAI-powered open-source Python library builds a semantic data model over your existing data systems. At its core, it discovers meaningful links and relationships across data assets — enriching them with profiles, classifications, and business glossaries. With this connected knowledge layer, you can enable semantic search and auto-generate queries to create unified data products, making data integration and exploration faster, more accurate, and far less manual.

Who is this for?

Data Engineers & Architects often spend weeks manually profiling, classifying, and stitching together fragmented data assets. With Intugle, they can automate this process end-to-end, uncovering meaningful links and relationships to instantly generate a connected semantic layer.
Data Analysts & Scientists spend endless hours on data readiness and preparation before they can even start the real analysis. Intugle accelerates this by providing contextual intelligence, automatically generating SQL and reusable data products enriched with relationships and business meaning.
Business Analysts & Decision Makers are slowed down by constant dependence on technical teams for answers. Intugle removes this bottleneck by enabling natural language queries and semantic search, giving them trusted insights on demand.

Features

Semantic Data Model: Transform raw, fragmented datasets into a semantic graph that captures entities, relationships, and context, forming the foundation for connected intelligence.
Business Glossary & Semantic Search: Auto-generate a business glossary and enable search that understands meaning, not just keywords, making data more accessible to both technical and business users.
Data Products: Generate SQL and reusable data products enriched with context, reducing manual pipeline work and shortening the path from data to insight.

Getting Started

Installation

For Windows and Linux, you can follow these steps. For macOS, please see the additional steps in the macOS section below.

Before installing, it is recommended to create a virtual environment:

python -m venv .venv
source .venv/bin/activate    # Linux and macOS
# On Windows, run this instead: .venv\Scripts\activate

Then, install the package:

pip install intugle
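
To confirm that the package landed in the active environment, a check along these lines may help. It is a minimal sketch that uses only the standard library; the helper name `installed_version` is ours and is not part of Intugle.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("intugle")
    if found:
        print(f"intugle {found} is installed")
    else:
        print("intugle is not installed in this environment")
```

If the script reports that the package is missing, the virtual environment is most likely not activated in the current shell.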

macOS

For macOS users, you may need to install the libomp library:

brew install libomp

If you installed Python using the official installer from python.org, you may also need to install SSL certificates by running the following command in your terminal. Please replace 3.XX with your specific Python version. This step is not necessary if you installed Python using Homebrew.

/Applications/Python\ 3.XX/Install\ Certificates.command

Configuration

Before running the project, you need to configure an LLM. This is used for tasks like generating business glossaries and predicting links between tables.

You can configure the LLM by setting the following environment variables:

LLM_PROVIDER: The LLM provider and model to use (e.g., openai:gpt-3.5-turbo), following LangChain's conventions.
API_KEY: Your API key for the LLM provider. The exact variable name varies by provider (e.g., OPENAI_API_KEY for OpenAI).

Here's an example of how to set these variables in your environment:

export LLM_PROVIDER="openai:gpt-3.5-turbo"
export OPENAI_API_KEY="your-openai-api-key"
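
Because a missing variable usually surfaces only once the first LLM call fails, it can be useful to check the configuration up front. The sketch below is not part of Intugle: `split_llm_provider` and `check_llm_config` are hypothetical helpers, and the rule that the key variable is named `<PROVIDER>_API_KEY` is an assumption generalized from the OpenAI example above.

```python
import os
from typing import List, Mapping, Tuple


def split_llm_provider(value: str) -> Tuple[str, str]:
    """Split a 'provider:model' string into (provider, model)."""
    provider, sep, model = value.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {value!r}")
    return provider, model


def check_llm_config(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems; an empty list means nothing obvious is missing."""
    raw = env.get("LLM_PROVIDER", "")
    if not raw:
        return ["LLM_PROVIDER is not set"]
    try:
        provider, _model = split_llm_provider(raw)
    except ValueError as exc:
        return [str(exc)]
    # Assumption: the key variable follows the <PROVIDER>_API_KEY pattern.
    key_name = f"{provider.upper()}_API_KEY"
    if not env.get(key_name):
        return [f"{key_name} is not set"]
    return []


if __name__ == "__main__":
    for problem in check_llm_config():
        print("config problem:", problem)
```

Providers that use a different variable name will be reported as missing by this check, so treat its output as a hint and not a verdict.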

Quickstart

For a detailed, hands-on introduction to the project, please see our quickstart notebooks:

Domain	Notebook	Open in Colab
Healthcare	`quickstart_healthcare.ipynb`
Tech Manufacturing	`quickstart_tech_manufacturing.ipynb`
FMCG	`quickstart_fmcg.ipynb`
Sports Media	`quickstart_sports_media.ipynb`
Databricks Unity Catalog [Healthcare]	`quickstart_healthcare_databricks.ipynb`	Databricks Notebook Only
Snowflake Horizon Catalog [FMCG]	`quickstart_fmcg_snowflake.ipynb`	Snowflake Notebook Only
Native Snowflake with Cortex Analyst [Tech Manufacturing]	`quickstart_native_snowflake.ipynb`
Native Databricks with AI/BI Genie [Tech Manufacturing]	`quickstart_native_databricks.ipynb`

These notebooks will take you through the following steps:

1. Generate Semantic Model → The unified layer that transforms fragmented datasets, creating the foundation for connected intelligence.
- 1.1 Profile and classify data → Analyze your data sources to understand their structure, data types, and other characteristics.
- 1.2 Discover links & relationships among data → Reveal meaningful connections (PK & FK) across fragmented tables.
- 1.3 Generate a business glossary → Create business-friendly terms and use them to query data with context.
- 1.4 Enable semantic search → Intelligent search that understands meaning, not just keywords, making data more accessible to both technical and business users.
- 1.5 Visualize semantic model → Access the enriched metadata of the semantic layer as YAML files and visualize it as a graph.
2. Build Unified Data Products → Simply pick the attributes across your data tables, and let the toolkit auto-generate queries with all the required joins, transformations, and aggregations using the semantic layer. When executed, these queries produce reusable data products.
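
To make the PK & FK idea in step 1.2 concrete, here is a toy illustration of what such a link means in the data. It does not show how Intugle predicts links (the library uses profiling and an LLM for that); the two helper functions and the sample rows are made up for this example.

```python
def is_candidate_key(rows, column):
    """A column can act as a primary key if every value is present and unique."""
    values = [row.get(column) for row in rows]
    return None not in values and len(set(values)) == len(values)


def is_foreign_key(child_rows, child_col, parent_rows, parent_col):
    """A child column references a parent key if all its non-null values occur in that key."""
    parent_values = {row[parent_col] for row in parent_rows}
    child_values = {
        row[child_col] for row in child_rows if row.get(child_col) is not None
    }
    return bool(child_values) and child_values <= parent_values


# Hypothetical sample rows, not taken from the quickstart datasets
patients = [{"id": "p1", "first": "Ana"}, {"id": "p2", "first": "Ben"}]
claims = [
    {"id": "c1", "patient_id": "p1"},
    {"id": "c2", "patient_id": "p1"},
    {"id": "c3", "patient_id": "p2"},
]

print(is_candidate_key(patients, "id"))                      # True
print(is_candidate_key(claims, "patient_id"))                # False
print(is_foreign_key(claims, "patient_id", patients, "id"))  # True
```

A link between `claims.patient_id` and `patients.id` is what later lets a data product join the two tables without the join being written by hand.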

Documentation

For more detailed information, advanced usage, and tutorials, please refer to our full documentation site.

Usage

The core workflow of the project involves using the SemanticModel to build a semantic layer, and then using the DataProduct to generate data products from that layer.

from intugle import SemanticModel

# Define your datasets
datasets = {
    "allergies": {"path": "path/to/allergies.csv", "type": "csv"},
    "patients": {"path": "path/to/patients.csv", "type": "csv"},
    "claims": {"path": "path/to/claims.csv", "type": "csv"},
    # ... add other datasets
}

# Build the semantic model
sm = SemanticModel(datasets, domain="Healthcare")
sm.build()

# Access the profiling results
print(sm.profiling_df.head())

# Access the discovered links
print(sm.links_df)

For detailed code examples and a complete walkthrough, please see our quickstart notebooks.
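
When a project has many CSV files, the `datasets` dictionary can be generated and does not need to be typed by hand. The sketch below assumes every file in one directory should become a dataset named after its file stem; `datasets_from_dir` is a hypothetical helper, and only the `{"path": ..., "type": "csv"}` entry shape comes from the example above.

```python
from pathlib import Path
from typing import Dict


def datasets_from_dir(directory: str) -> Dict[str, Dict[str, str]]:
    """Map each CSV file in `directory` to a dataset entry keyed by its file stem."""
    return {
        csv_path.stem: {"path": str(csv_path), "type": "csv"}
        for csv_path in sorted(Path(directory).glob("*.csv"))
    }


if __name__ == "__main__":
    # "path/to" is a placeholder; point it at the folder holding your CSV files.
    for name, entry in datasets_from_dir("path/to").items():
        print(name, "->", entry["path"])
```

The resulting dictionary can be passed to `SemanticModel` in place of the hand-written one.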

Data Product

Once the semantic model is built, you can use the DataProduct class to generate unified data products from the semantic layer.

from intugle import DataProduct

# Define an ETL model
etl = {
  "name": "top_patients_by_claim_count",
  "fields": [
    {
      "id": "patients.first",
      "name": "first_name",
    },
    {
      "id": "patients.last",
      "name": "last_name",
    },
    {
      "id": "claims.id",
      "name": "number_of_claims",
      "category": "measure",
      "measure_func": "count"
    }
  ],
  "filter": {
    "sort_by": [
      {
        "id": "claims.id",
        "alias": "number_of_claims",
        "direction": "desc"
      }
    ],
    "limit": 10
  }
}

# Create a DataProduct and build it
dp = DataProduct()
data_product = dp.build(etl)

# View the data product as a DataFrame
print(data_product.to_df())
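
Since the specification is a plain dictionary, small mistakes such as a misspelled alias are easy to make. The following pre-flight check is a sketch of our own and is not validation performed by the library. Its rules (ids look like `table.column`, measures carry a `measure_func`, sort aliases match a field name) are assumptions read off the example above.

```python
from typing import Any, Dict, List


def check_product_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a data product spec; empty means none found."""
    problems = []
    fields = spec.get("fields", [])
    names = [field.get("name") for field in fields]

    if not spec.get("name"):
        problems.append("spec has no name")
    duplicates = {n for n in names if n is not None and names.count(n) > 1}
    for name in sorted(duplicates):
        problems.append(f"duplicate field name: {name}")
    for field in fields:
        if "." not in field.get("id", ""):
            problems.append(f"field id should look like 'table.column': {field.get('id')!r}")
        if field.get("category") == "measure" and not field.get("measure_func"):
            problems.append(f"measure {field.get('name')!r} has no measure_func")
    for sort in spec.get("filter", {}).get("sort_by", []):
        if sort.get("alias") not in names:
            problems.append(f"sort alias {sort.get('alias')!r} matches no field name")
    return problems
```

Calling `check_product_spec(etl)` on the dictionary above returns an empty list. A non-empty result points at the entries worth reviewing before calling `dp.build(etl)`.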

Semantic Search

The semantic search feature allows you to search for columns in your datasets using natural language. It is built on top of the Qdrant vector database.
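
For readers new to vector search, the toy example below shows the general idea: text is mapped to vectors, and results are ranked by how closely their vectors align with the query vector. It illustrates the concept only and is not Intugle's or Qdrant's implementation. The column names and the three-dimensional vectors are invented; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_columns(query_vector, column_vectors):
    """Return column names ordered from most to least similar to the query."""
    return sorted(
        column_vectors,
        key=lambda name: cosine_similarity(query_vector, column_vectors[name]),
        reverse=True,
    )


# Hypothetical hand-written "embeddings" for three made-up columns
column_vectors = {
    "visits.reason": [0.9, 0.1, 0.0],
    "patients.birthdate": [0.0, 0.2, 0.9],
    "claims.id": [0.1, 0.9, 0.1],
}
# Stands in for the embedding of a query such as "reason for hospital visit"
query_vector = [0.8, 0.2, 0.1]

print(rank_columns(query_vector, column_vectors))
# ['visits.reason', 'claims.id', 'patients.birthdate']
```

A vector database such as Qdrant performs this kind of nearest-neighbour ranking at scale, using indexes so that it does not need to compare the query against every stored vector.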

Prerequisites

To use the semantic search feature, you need to have a running Qdrant instance. You can start one using the following Docker command:

docker run -d -p 6333:6333 -p 6334:6334 \
    -v qdrant_storage:/qdrant/storage:z \
    --name qdrant qdrant/qdrant

You also need to configure the Qdrant URL and API key (if using authorization) in your environment variables:

export QDRANT_URL="http://localhost:6333"
export QDRANT_API_KEY="your-qdrant-api-key" # if authorization is used
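
Before building the model, it may be worth confirming that the Qdrant instance answers at the configured URL. The sketch below uses only the standard library and issues a plain GET against the base URL. The helper `qdrant_is_reachable` is hypothetical, and the check only tells you that something responds with HTTP 200 at that address.

```python
import os
import urllib.request


def qdrant_is_reachable(url, api_key=None, timeout=2.0):
    """Return True if an HTTP GET on the given base URL answers with status 200."""
    try:
        request = urllib.request.Request(url)
        if api_key:
            # Qdrant reads the key from the `api-key` HTTP header.
            request.add_header("api-key", api_key)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors, and malformed URLs.
        return False


if __name__ == "__main__":
    url = os.environ.get("QDRANT_URL", "http://localhost:6333")
    ok = qdrant_is_reachable(url, os.environ.get("QDRANT_API_KEY"))
    print(f"Qdrant at {url}:", "reachable" if ok else "not reachable")
```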

Currently, the semantic search feature supports only OpenAI embedding models, accessed either through OpenAI directly or through Azure OpenAI. You therefore need the corresponding API key set in your environment. The default model is text-embedding-ada-002. You can change the embedding model by setting the EMBEDDING_MODEL_NAME environment variable.

For OpenAI models:

export OPENAI_API_KEY="your-openai-api-key"
export EMBEDDING_MODEL_NAME="openai:ada"

For Azure OpenAI models:

export AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
export AZURE_OPENAI_ENDPOINT="your-azure-openai-endpoint"
export OPENAI_API_VERSION="your-openai-api-version"
export EMBEDDING_MODEL_NAME="azure_openai:ada"
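
The two variable sets above can be checked with a few lines of Python. This is a sketch under stated assumptions: `missing_embedding_vars` is a hypothetical helper, the prefix-to-variables mapping mirrors the two examples, and an unset EMBEDDING_MODEL_NAME is treated as the OpenAI case because the default model is an OpenAI one.

```python
import os
from typing import List, Mapping

# Variables from the examples above, keyed by the prefix of EMBEDDING_MODEL_NAME.
REQUIRED_BY_PREFIX = {
    "openai": ["OPENAI_API_KEY"],
    "azure_openai": [
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "OPENAI_API_VERSION",
    ],
}


def missing_embedding_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    # Assumption: with no model name set, the OpenAI variables apply.
    model_name = env.get("EMBEDDING_MODEL_NAME", "openai:ada")
    prefix = model_name.split(":", 1)[0]
    if prefix not in REQUIRED_BY_PREFIX:
        raise ValueError(f"unrecognized embedding provider prefix: {prefix!r}")
    return [name for name in REQUIRED_BY_PREFIX[prefix] if not env.get(name)]


if __name__ == "__main__":
    missing = missing_embedding_vars()
    print("missing:", ", ".join(missing) if missing else "nothing")
```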

Usage

Once you have built the semantic model, you can use the search method to perform a semantic search. The search function returns a pandas DataFrame containing the search results, including the column's profiling metrics, category, table name, and table glossary.

from intugle import SemanticModel

# Define your datasets
datasets = {
    "allergies": {"path": "path/to/allergies.csv", "type": "csv"},
    "patients": {"path": "path/to/patients.csv", "type": "csv"},
    "claims": {"path": "path/to/claims.csv", "type": "csv"},
    # ... add other datasets
}

# Build the semantic model
sm = SemanticModel(datasets, domain="Healthcare")
sm.build()

# Perform a semantic search
search_results = sm.search("reason for hospital visit")

# View the search results
print(search_results)

For detailed code examples and a complete walkthrough, please see our quickstart notebooks.

MCP Server

Intugle includes a built-in MCP (Model Context Protocol) server that exposes your semantic layer to AI assistants and LLM-powered clients. Its main purpose is to allow agents to understand your data's structure by using tools like get_tables and get_schema.

Once your semantic model is built, you can start the server with a simple command:

intugle-mcp

This enables AI agents to programmatically interact with your data context. It also enables vibe coding with the library.

For detailed instructions on setting up the server and connecting your favorite client, please see our full documentation.

Community

Join our community to ask questions, share your projects, and connect with other users.

Join our Discord

Contributing

Contributions are welcome! Please see the CONTRIBUTING.md file for guidelines.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.

Release history

Version	Release date	Notes
1.3.1	Mar 30, 2026
1.3.0	Dec 30, 2025
1.2.4rc2	Dec 18, 2025	pre-release
1.2.4rc1	Dec 16, 2025	pre-release
1.2.3	Dec 6, 2025
1.2.1	Nov 29, 2025
1.2.0	Nov 18, 2025
1.2.0rc1	Nov 16, 2025	pre-release
1.1.0	Nov 11, 2025
1.0.13	Nov 10, 2025
1.0.12	Nov 8, 2025
1.0.11	Oct 29, 2025
1.0.10	Oct 23, 2025
1.0.9	Oct 23, 2025
1.0.8	Oct 21, 2025	this version
1.0.7	Oct 14, 2025
1.0.6	Oct 8, 2025
1.0.5	Oct 8, 2025
1.0.4	Oct 5, 2025
1.0.3	Oct 2, 2025
1.0.3rc1	Oct 2, 2025	pre-release
1.0.2	Oct 2, 2025
1.0.2rc2	Oct 1, 2025	pre-release
1.0.2rc1	Oct 1, 2025	pre-release
1.0.1	Sep 29, 2025
1.0.0	Sep 25, 2025
0.1.10	Sep 25, 2025
0.1.9	Sep 18, 2025
0.1.8	Sep 17, 2025
0.1.7	Sep 15, 2025
0.1.6	Sep 15, 2025
0.1.5	Sep 10, 2025
0.1.4	Sep 10, 2025
0.1.3	Sep 8, 2025
0.1.2	Sep 8, 2025
0.1.1	Aug 25, 2025
0.1.0	Aug 22, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

intugle-1.0.8.tar.gz (8.4 MB)

Uploaded Oct 21, 2025 Source

Built Distribution

intugle-1.0.8-py3-none-any.whl (8.6 MB)

Uploaded Oct 21, 2025 Python 3

File details

Details for the file intugle-1.0.8.tar.gz.

File metadata

Download URL: intugle-1.0.8.tar.gz
Upload date: Oct 21, 2025
Size: 8.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for intugle-1.0.8.tar.gz
Algorithm	Hash digest
SHA256	`bcbcab41e06d8ee46f3c906b807bc3e02bf261e2c2636513e71aab955cdea751`
MD5	`85ed434f4512402e4d73a0c316a8b010`
BLAKE2b-256	`0f909a0762876394de0ed714e22d47b8c428c24385d4ea6d5ebe58538a6cd3a5`

See more details on using hashes here.

File details

Details for the file intugle-1.0.8-py3-none-any.whl.

File metadata

Download URL: intugle-1.0.8-py3-none-any.whl
Upload date: Oct 21, 2025
Size: 8.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for intugle-1.0.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`805aed0ee7841bb8caa8cd9639b84ecb9b777f88586787582d1b4ac74c18d420`
MD5	`3438be5a88ca8da932f7f47b1839b92c`
BLAKE2b-256	`dd9f1817cddcdf6f96924009663ff51495587b9bf1bd272f4e44c962cc86431d`
