A Python package to simplify the use of a feature store with Teradata Vantage ...


tdfs4ds — A Feature Store Library for Data Scientists working with ClearScape Analytics

tdfs4ds (Teradata Feature Store for Data Scientists) is a Python package for managing temporal feature stores in Teradata Vantage databases. It provides easy-to-use functions for creating, registering, storing, and retrieving features — with full time-travel support, lineage tracking, and process operationalization.

Installation

pip install tdfs4ds

Quick Start

Import tdfs4ds after establishing a teradataml connection so the package can auto-detect your default database:

import teradataml as tdml
tdml.create_context(host=..., username=..., password=...)

import tdfs4ds
# tdfs4ds.SCHEMA is auto-set from the teradataml context;
# override if needed: tdfs4ds.SCHEMA = 'my_database'

# Data domain management — use the dedicated functions:
tdfs4ds.create_data_domain('MY_PROJECT')   # create and activate a new domain
# or
tdfs4ds.select_data_domain('MY_PROJECT')   # activate an existing domain
# or
tdfs4ds.get_data_domains()                 # list all available domains (* marks the active one)

Core API

Function	Description
`tdfs4ds.setup(database)`	Create feature catalog, process catalog, and follow-up tables in `database`
`tdfs4ds.upload_features(df, entity_id, feature_names, metadata={})`	Ingest features from a teradataml DataFrame into the feature store
`tdfs4ds.build_dataset(entity_id, selected_features, view_name, comment='dataset', grouped=False)`	Assemble a dataset view from registered features
`tdfs4ds.run(process_id)`	Re-execute a registered feature engineering process
`tdfs4ds.roll_out(...)`	Operationalize processes at scale
`tdfs4ds.connect(database)`	Connect to an existing feature store

`entity_id` must specify SQL data types (dict, not list)

entity_id = {'CUSTOMER_ID': 'BIGINT', 'EVENT_DATE': 'DATE'}   # correct: column name -> SQL type
# entity_id = ['CUSTOMER_ID', 'EVENT_DATE']                    # wrong: a list carries no SQL types
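A small guard can catch the list form before anything is sent to the database. The sketch below is not part of tdfs4ds: `check_entity_id` is a hypothetical helper, and it only checks the shape described above (a non-empty dict mapping column names to SQL type strings), not whether the types are valid in Teradata.

```python
def check_entity_id(entity_id):
    """Hypothetical helper (not a tdfs4ds function): verify the dict-of-SQL-types shape."""
    if not isinstance(entity_id, dict) or not entity_id:
        raise TypeError("entity_id must be a non-empty dict of column name -> SQL type")
    for column, sql_type in entity_id.items():
        # Both the column name and its SQL type are expected to be non-empty strings.
        if not isinstance(column, str) or not isinstance(sql_type, str) or not sql_type.strip():
            raise TypeError(f"invalid entry for {column!r}: expected a SQL type string")
    return entity_id


check_entity_id({'CUSTOMER_ID': 'BIGINT', 'EVENT_DATE': 'DATE'})   # passes
# check_entity_id(['CUSTOMER_ID', 'EVENT_DATE'])                   # would raise TypeError
```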

Walkthrough Example

Step 1 — Set up a feature store

import teradataml as tdml
tdml.create_context(host=..., username=..., password=...)

import tdfs4ds
tdfs4ds.setup(database='my_database')

Step 2 — Configure the active context

tdfs4ds.SCHEMA = 'my_database'   # override if not auto-detected

# Use dedicated functions to manage the data domain:
tdfs4ds.create_data_domain('DATA_QUALITY')   # create and activate (first time)
# tdfs4ds.select_data_domain('DATA_QUALITY') # activate an existing domain
# tdfs4ds.get_data_domains()                 # list all domains

Step 3 — Define your feature engineering view

df = tdml.DataFrame(tdml.in_schema('my_database', 'my_feature_view'))
# If teradataml created intermediate views, make them permanent first:
# tdfs4ds.crystallize_view(df)

Step 4 — Upload and operationalize

entity_id     = {'EVENT_DT': 'DATE', 'ID': 'BIGINT'}
feature_names = ['KPI1', 'KPI2']

tdfs4ds.upload_features(
    df=df,
    entity_id=entity_id,
    feature_names=feature_names,
    metadata={'project': 'data quality'}
)

This registers entities and features (if not already present), registers a feature engineering process in the process catalog, and writes the feature values into the feature store.

Step 5 — Re-run a process

# List all registered processes to find the process ID
tdfs4ds.process_catalog()

# Re-execute by process ID (replace the placeholder with an ID from the catalog)
process_id = '<process_uuid>'
tdfs4ds.run(process_id)

Step 6 — Build a dataset

selected_features = {
    'KPI1': '<process_uuid>',
    'KPI2': '<process_uuid>',
}

dataset = tdfs4ds.build_dataset(
    entity_id={'ID': 'BIGINT'},
    selected_features=selected_features,
    view_name='my_dataset',
    comment='Dataset for churn model'
)

selected_features maps each feature name to the UUID of the process that computed it.

Pass grouped=True to activate the grouped pivot strategy — when many features share the same source table and process, they are collapsed into a single MAX(CASE WHEN FEATURE_ID=…) … GROUP BY sub-query instead of one sub-query per feature. This reduces JOIN fan-out for wide feature sets:

dataset = tdfs4ds.build_dataset(
    entity_id={'ID': 'BIGINT'},
    selected_features=selected_features,
    view_name='my_dataset',
    comment='Dataset for churn model',
    grouped=True,   # MAX(CASE WHEN) + GROUP BY pivot strategy
)
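To make the grouped pivot strategy concrete, the sketch below builds the kind of sub-query described above as a plain string. It is an illustration only, not the SQL that tdfs4ds generates: the function name, the table name `FS_T_EXAMPLE`, the `FEATURE_VALUE` column, and the numeric feature IDs are all assumptions.

```python
def grouped_pivot_sql(table, entity_cols, feature_ids):
    """Illustrative only: one pivot sub-query for features stored in the same table.

    `feature_ids` maps output column name -> FEATURE_ID.
    FEATURE_VALUE is an assumed column name for this sketch.
    """
    keys = ", ".join(entity_cols)
    cases = ",\n    ".join(
        f"MAX(CASE WHEN FEATURE_ID = {fid} THEN FEATURE_VALUE END) AS {name}"
        for name, fid in feature_ids.items()
    )
    return f"SELECT {keys},\n    {cases}\nFROM {table}\nGROUP BY {keys}"


sql = grouped_pivot_sql("my_database.FS_T_EXAMPLE", ["ID"], {"KPI1": 1, "KPI2": 2})
print(sql)
```

With two features the sketch produces one sub-query with two `MAX(CASE WHEN ...)` columns, where the ungrouped form would need two sub-queries joined on `ID`.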

Use return_query=True to inspect the generated DDL without executing it:

sql = tdfs4ds.build_dataset(
    entity_id={'ID': 'BIGINT'},
    selected_features=selected_features,
    view_name='my_dataset',
    return_query=True,
)
print(sql)

Configuration

Programmatic (in-session)

tdfs4ds.SCHEMA                = 'my_database'        # target database (auto-set from context)
# Data domain: use tdfs4ds.create_data_domain() / select_data_domain() / get_data_domains()
tdfs4ds.FEATURE_STORE_TIME    = None                 # None = current; '2024-01-01 00:00:00' = time travel
tdfs4ds.DISPLAY_LOGS          = True                 # verbose logging
tdfs4ds.DEBUG_MODE            = False
tdfs4ds.STORE_FEATURE         = 'MERGE'              # 'MERGE' or 'UPDATE_INSERT'

# GenAI documentation
tdfs4ds.INSTRUCT_MODEL_PROVIDER = 'openai'           # or 'bedrock', 'vllm', 'azure'
tdfs4ds.INSTRUCT_MODEL_MODEL    = 'gpt-4o'
tdfs4ds.INSTRUCT_MODEL_API_KEY  = 'sk-...'           # prefer env var instead (see below)

# Embedding model (consumer agent vector index — falls back to INSTRUCT_MODEL_* if unset)
tdfs4ds.EMBEDDING_MODEL_PROVIDER = 'vllm'
tdfs4ds.EMBEDDING_MODEL_URL      = 'https://api.example.com/v1/e5'
tdfs4ds.EMBEDDING_MODEL_MODEL    = 'text-embedding-3-small'
tdfs4ds.EMBEDDING_MODEL_DIM      = 1536

# Chroma vector store
tdfs4ds.CHROMA_MODE = 'local'                        # 'local' or 'server'
tdfs4ds.CHROMA_PATH = './tdfs4ds_chroma'             # persist directory (local mode)

# MCP server (optional — consumer agent external tools)
tdfs4ds.MCP_SERVER_URL = 'http://localhost:8000/sse' # SSE endpoint; None = disabled

Config file (persistent per-project or per-user)

Create a tdfs4ds.json file in your project directory (or ~/.tdfs4ds/config.json for user-wide defaults) to avoid repeating the setup cell in every notebook:

{
    "schema": "MY_DATABASE",
    "data_domain": "MY_PROJECT",
    "display_logs": true,
    "store_feature": "MERGE",
    "varchar_size": 1024,
    "instruct_model_provider": "openai",
    "instruct_model_model": "gpt-4o",
    "instruct_model_url": null,
    "embedding_model_provider": "vllm",
    "embedding_model_model": "text-embedding-3-small",
    "embedding_model_dim": 1536,
    "chroma_mode": "local",
    "chroma_path": "./tdfs4ds_chroma",
    "mcp_server_url": null
}

Keys are case-insensitive. instruct_model_api_key is rejected from JSON config to prevent accidental commits — use a .env file or OS env var for credentials.
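The two rules above can be sketched in a few lines. This is not the library's loader: `read_json_settings` and `FORBIDDEN_JSON_KEYS` are hypothetical names, and the source does not say whether tdfs4ds raises or silently skips the forbidden key, so raising here is an assumption.

```python
import json

# Assumption: mirrors the rule stated above; the real list may differ.
FORBIDDEN_JSON_KEYS = {"instruct_model_api_key"}


def read_json_settings(text):
    """Sketch: normalise keys to lower case and refuse credentials in JSON."""
    raw = json.loads(text)
    settings = {}
    for key, value in raw.items():
        name = key.lower()  # "SCHEMA", "Schema" and "schema" all map to the same setting
        if name in FORBIDDEN_JSON_KEYS:
            raise ValueError(f"{key} must not be stored in a JSON config file")
        settings[name] = value
    return settings


settings = read_json_settings('{"SCHEMA": "MY_DATABASE", "Display_Logs": true}')
```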

`.env` file (local secrets and overrides)

Place a .env file in your project directory (or ~/.tdfs4ds/.env for user-wide defaults). Only TDFS4DS_* variables are read — the file is parsed without touching os.environ:

TDFS4DS_SCHEMA=MY_DATABASE
TDFS4DS_DATA_DOMAIN=MY_PROJECT
TDFS4DS_INSTRUCT_MODEL_API_KEY=sk-...
TDFS4DS_INSTRUCT_MODEL_PROVIDER=openai
TDFS4DS_INSTRUCT_MODEL_MODEL=gpt-4o
TDFS4DS_EMBEDDING_MODEL_PROVIDER=vllm
TDFS4DS_EMBEDDING_MODEL_URL=https://api.example.com/v1/e5
TDFS4DS_EMBEDDING_MODEL_MODEL=text-embedding-3-small
TDFS4DS_EMBEDDING_MODEL_DIM=1536
TDFS4DS_CHROMA_MODE=local
TDFS4DS_CHROMA_PATH=./tdfs4ds_chroma
TDFS4DS_MCP_SERVER_URL=https://your-mcp-server/endpoint/
TDFS4DS_MCP_SERVER_TRANSPORT=streamable_http

Add .env to .gitignore to keep secrets out of source control. Quoted values and export KEY=VALUE syntax are supported.
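For readers who want to see what "parsed without touching os.environ" can look like, here is a minimal sketch of such a parser. It is not the tdfs4ds implementation: `parse_dotenv` is a hypothetical function that handles only the cases named above (the `TDFS4DS_` prefix, quoted values, and `export KEY=VALUE`), and it does not handle inline comments or escapes.

```python
def parse_dotenv(text, prefix="TDFS4DS_"):
    """Sketch: return {name: value} for prefixed variables; os.environ is never modified."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
            value = value[1:-1]
        if key.startswith(prefix):
            values[key] = value
    return values


sample = 'export TDFS4DS_SCHEMA="MY_DATABASE"\nPATH=/usr/bin\n# comment\nTDFS4DS_CHROMA_MODE=local\n'
parsed = parse_dotenv(sample)
```

Returning a plain dict keeps the file's values separate from the process environment, so they can be ranked below real OS environment variables.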

Environment variables

All settings can also be set via TDFS4DS_<VAR_NAME> OS environment variables (useful in CI/CD):

Variable	Corresponding setting
`TDFS4DS_SCHEMA`	`tdfs4ds.SCHEMA`
`TDFS4DS_DATA_DOMAIN`	`tdfs4ds.DATA_DOMAIN`
`TDFS4DS_DISPLAY_LOGS`	`tdfs4ds.DISPLAY_LOGS`
`TDFS4DS_DEBUG_MODE`	`tdfs4ds.DEBUG_MODE`
`TDFS4DS_STORE_FEATURE`	`tdfs4ds.STORE_FEATURE`
`TDFS4DS_VARCHAR_SIZE`	`tdfs4ds.VARCHAR_SIZE`
`TDFS4DS_INSTRUCT_MODEL_PROVIDER`	`tdfs4ds.INSTRUCT_MODEL_PROVIDER`
`TDFS4DS_INSTRUCT_MODEL_MODEL`	`tdfs4ds.INSTRUCT_MODEL_MODEL`
`TDFS4DS_INSTRUCT_MODEL_URL`	`tdfs4ds.INSTRUCT_MODEL_URL`
`TDFS4DS_INSTRUCT_MODEL_API_KEY`	`tdfs4ds.INSTRUCT_MODEL_API_KEY`
`TDFS4DS_EMBEDDING_MODEL_PROVIDER`	`tdfs4ds.EMBEDDING_MODEL_PROVIDER`
`TDFS4DS_EMBEDDING_MODEL_MODEL`	`tdfs4ds.EMBEDDING_MODEL_MODEL`
`TDFS4DS_EMBEDDING_MODEL_URL`	`tdfs4ds.EMBEDDING_MODEL_URL`
`TDFS4DS_EMBEDDING_MODEL_API_KEY`	`tdfs4ds.EMBEDDING_MODEL_API_KEY`
`TDFS4DS_EMBEDDING_MODEL_DIM`	`tdfs4ds.EMBEDDING_MODEL_DIM`
`TDFS4DS_CHROMA_MODE`	`tdfs4ds.CHROMA_MODE`
`TDFS4DS_CHROMA_PATH`	`tdfs4ds.CHROMA_PATH`
`TDFS4DS_CHROMA_HOST`	`tdfs4ds.CHROMA_HOST`
`TDFS4DS_CHROMA_PORT`	`tdfs4ds.CHROMA_PORT`
`TDFS4DS_MCP_SERVER_URL`	`tdfs4ds.MCP_SERVER_URL`
`TDFS4DS_MCP_SERVER_TRANSPORT`	`tdfs4ds.MCP_SERVER_TRANSPORT` (`'streamable_http'` or `'sse'`)
`TDFS4DS_SKILLS_FOLDER`	Path to a folder containing plugin consumer-agent skill subdirectories

`load_config()` — explicit reload

# Reload from default search paths
tdfs4ds.load_config()

# Point at specific files
tdfs4ds.load_config(
    path='/configs/feature_store.json',
    dotenv_path='/project/.env.production',
)

Priority chain

programmatic (tdfs4ds.X = value)
  > OS environment variable (TDFS4DS_X)
  > .env file (./.env or ~/.tdfs4ds/.env)
  > JSON config file (./tdfs4ds.json or ~/.tdfs4ds/config.json)
  > teradataml auto-detection (SCHEMA only)
  > built-in defaults
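The chain reads as "first source that defines the setting wins". A minimal sketch of that lookup, assuming each source has already been loaded into a dict, might look like this; `resolve_setting` and the layer labels are hypothetical and not part of the tdfs4ds API.

```python
_UNSET = object()  # sentinel so that None remains a legitimate configured value


def resolve_setting(name, layers):
    """Return (value, source) from the first layer that defines `name`.

    `layers` is an ordered list of (source_label, mapping), highest priority first.
    """
    for source, mapping in layers:
        value = mapping.get(name, _UNSET)
        if value is not _UNSET:
            return value, source
    raise KeyError(name)


layers = [
    ("programmatic", {}),
    ("os_env", {"store_feature": "UPDATE_INSERT"}),
    ("dotenv", {"store_feature": "MERGE", "schema": "MY_DATABASE"}),
    ("json", {"schema": "OTHER_DB", "varchar_size": 1024}),
    ("defaults", {"display_logs": True}),
]

print(resolve_setting("store_feature", layers))  # OS env var outranks the .env file
print(resolve_setting("schema", layers))         # .env file outranks the JSON config
```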

Time Travel

All catalogs and feature stores are temporal. Point-in-time queries are available via:

tdfs4ds.FEATURE_STORE_TIME = '2024-01-01 00:00:00'   # query historical state
tdfs4ds.FEATURE_STORE_TIME = None                     # back to current state
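Conceptually, a point-in-time query keeps the row versions whose validity period contains the requested timestamp. The pure-Python sketch below mimics that selection on an in-memory list; it is only an analogy for what the database does, and the `valid_start` / `valid_end` field names and the half-open interval convention are assumptions of this example.

```python
from datetime import datetime


def as_of(rows, when=None):
    """Sketch: keep rows whose [valid_start, valid_end) period contains `when`.

    when=None means "now", mirroring FEATURE_STORE_TIME = None.
    """
    moment = datetime.now() if when is None else datetime.strptime(when, "%Y-%m-%d %H:%M:%S")
    return [r for r in rows if r["valid_start"] <= moment < r["valid_end"]]


# Two versions of the same feature value; the second replaced the first on 2024-03-01.
history = [
    {"value": 10, "valid_start": datetime(2023, 6, 1), "valid_end": datetime(2024, 3, 1)},
    {"value": 12, "valid_start": datetime(2024, 3, 1), "valid_end": datetime(9999, 12, 31)},
]

past = as_of(history, "2024-01-01 00:00:00")   # historical state
current = as_of(history)                        # current state
```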

Package Structure

tdfs4ds/
├── __init__.py                    — Global config variables & re-exported public API
├── config.py                      — External config loading (JSON, .env, env vars); load_config()
├── lifecycle.py                   — setup(), connect()
├── execution.py                   — run(), upload_features(), roll_out()
├── catalog.py                     — feature_catalog(), process_catalog(), dataset_catalog()
├── data_domain.py                 — get_data_domains(), select_data_domain(), create_data_domain()
├── datasets.py                    — Utility dataset helpers
├── agent/
│   ├── __init__.py                — Public exports: consumer_agent, query_optimizer, aquery_optimizer, display_optimization_result, …
│   ├── consumer_agent.py          — Intent classifier, 8 skills, feature doc resolver, LLM helpers
│   ├── graph.py                   — LangGraph StateGraph: classify → detect_domain → skill → synthesize
│   ├── query_optimizer.py         — skill_optimize_query(), query_optimizer(), aquery_optimizer(), display_optimization_result()
│   ├── embedding.py               — get_embeddings(), list_embedding_models()
│   ├── vector_index.py            — build_vector_index(), search_vector_index() (Chroma)
│   └── chatbot.py                 — launch_chatbot(), launch_chatbot_with_index() Gradio UI
├── feature_store/
│   ├── entity_management.py       — register_entity(), remove_entity()
│   ├── feature_data_processing.py — prepare_feature_ingestion(), store_feature(), apply_collect_stats()
│   ├── feature_query_retrieval.py — get_list_features(), get_available_features(), get_feature_versions()
│   └── feature_store_management.py — register_features(), feature_store_table_creation()
├── process_store/
│   ├── process_followup.py        — followup_open(), followup_close(), follow_up_report()
│   ├── process_query_administration.py — list_processes(), get_process_id(), remove_process()
│   ├── process_registration_management.py — register_process_view()
│   └── process_store_catalog_management.py — process_store_catalog_creation()
├── dataset/
│   ├── builder.py                 — build_dataset(), build_dataset_opt(), augment_source_with_features()
│   ├── dataset.py                 — Dataset class
│   └── dataset_catalog.py        — DatasetCatalog class
├── genai/
│   └── documentation.py          — LLM-powered auto-documentation of SQL processes (OpenAI / Bedrock)
├── lineage/
│   ├── lineage.py                 — SQL query parsing, DDL analysis
│   ├── network.py                 — Dependency graph construction
│   └── indexing.py                — Lineage indexing utilities
└── utils/
    ├── query_management.py        — execute_query(), execute_query_wrapper()
    ├── filter_management.py       — FilterManager class
    ├── time_management.py         — TimeManager class
    ├── lineage.py                 — crystallize_view(), analyze_sql_query(), generate_view_dependency_network()
    ├── info.py                    — update_varchar_length(), get_column_types(), seconds_to_dhms()
    └── visualization.py           — plot_graph(), visualize_graph(), display_table()

GenAI Documentation

The genai module provides two complementary ways to document the feature store.

LLM-powered process documentation

document_process() calls an LLM (OpenAI, Azure, vLLM, or AWS Bedrock) to generate:

Business-logic description of the SQL query
Entity description and per-column annotations
EXPLAIN-plan quality metrics: two 1–5 scores (User score / Overall score) plus two deterministic execution counters (n_steps, n_spool_objects) parsed from the raw Teradata EXPLAIN text, with warnings and recommendations

import tdfs4ds
from tdfs4ds.genai import document_process

# Configure the LLM (or use TDFS4DS_INSTRUCT_MODEL_* env vars / .env file)
tdfs4ds.INSTRUCT_MODEL_PROVIDER = 'openai'
tdfs4ds.INSTRUCT_MODEL_MODEL    = 'gpt-4o'
tdfs4ds.INSTRUCT_MODEL_API_KEY  = 'sk-...'

process_info = document_process(process_id='<UUID>', show_explain_plan=True)

LLM-powered dataset documentation

document_dataset_incremental() documents a dataset by walking its full lineage bottom-up:

Source tables — uses the business dictionary if available
Intermediate views — auto-documented via LLM if undocumented
Process views — actively calls document_process_incremental if undocumented
Feature/entity column descriptions are propagated from process docs (no extra LLM call)
A single JSON-constrained LLM call generates five structured sections for the dataset

from tdfs4ds.genai import document_dataset_incremental

result = document_dataset_incremental(
    dataset_id   = '<UUID>',  # from dataset_catalog()
    force_update = False,
    upload       = True,
)

# result['DATASET_SECTIONS'] contains:
#   OVERVIEW, ENTITY, FEATURE_THEMES, BUSINESS_QUESTIONS, INTENDED_AUDIENCE

Each section is stored as an independent row in FS_BUSINESS_DICTIONARY_SECTIONS — no chunking needed for RAG retrieval.

Full-store documentation in one call

document_feature_store_incremental() documents every registered process and dataset in a single optimised pass. Objects are processed in dependency order (leaves first, roots last) so upstream context is always available. A shared pair of visited-sets ensures each process view is documented at most once, even when referenced by multiple datasets.

from tdfs4ds.genai import document_feature_store_incremental

summary = document_feature_store_incremental(
    language     = 'English',
    force_update = False,
    upload       = True,
)
# summary keys: processes_documented, datasets_documented,
#               processes_skipped, datasets_skipped
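The "leaves first, roots last" ordering with a shared visited set can be sketched as a depth-first post-order walk. This is an illustration of the idea, not the library's traversal: the function name and the toy dependency map are hypothetical, and a single set stands in for the pair of visited-sets mentioned above.

```python
def documentation_order(dependencies, roots, visited=None):
    """Depth-first post-order: every upstream object is emitted before its dependants.

    `dependencies` maps object -> list of upstream objects it reads from.
    Passing the same `visited` set across calls prevents documenting an object twice.
    """
    visited = set() if visited is None else visited
    order = []

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for upstream in dependencies.get(node, []):
            visit(upstream)
        order.append(node)  # appended only after all of its upstream objects

    for root in roots:
        visit(root)
    return order


deps = {
    "dataset_a": ["process_view_1"],
    "dataset_b": ["process_view_1", "process_view_2"],
    "process_view_1": ["source_table"],
    "process_view_2": ["source_table"],
}
shared = set()
first = documentation_order(deps, ["dataset_a"], shared)
second = documentation_order(deps, ["dataset_b"], shared)  # process_view_1 is not repeated
```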

After documentation, process descriptions are automatically mirrored to the business dictionary (object overview + column-level feature descriptions) so the consumer agent can resolve them without any extra step.

Business dictionary (no LLM required)

Three temporal tables store business-oriented descriptions for any database object, its columns, and its documentation sections. They form a 3-level hierarchy designed for chunking-free hierarchical RAG:

Level	Table	Key	Purpose
0	`FS_BUSINESS_DICTIONARY_OBJECTS`	`(DATABASE_NAME, OBJECT_NAME)`	One summary per object (`OBJECT_TYPE`: `'T'`/`'V'`/`'D'`)
1	`FS_BUSINESS_DICTIONARY_SECTIONS`	`(DATABASE_NAME, OBJECT_NAME, SECTION_NAME)`	One row per documentation section per object
2	`FS_BUSINESS_DICTIONARY_COLUMNS`	`(DATABASE_NAME, TABLE_NAME, COLUMN_NAME)`	One description per column

All tables are VALIDTIME temporal and provisioned automatically by tdfs4ds.connect(create_if_missing=True).

import pandas as pd
from tdfs4ds.genai import (
    upload_business_dictionary_objects,
    upload_business_dictionary_columns,
    upload_business_dictionary_sections,
)

# Level 0 — Object-level descriptions
upload_business_dictionary_objects(pd.DataFrame([
    {
        'DATABASE_NAME'       : 'MY_DB',
        'OBJECT_NAME'         : 'CUSTOMER',
        'OBJECT_TYPE'         : 'T',
        'BUSINESS_DESCRIPTION': 'Core customer table. Each row represents a unique enrolled customer.',
    },
]))

# Level 1 — Section-level descriptions (typically LLM-generated for datasets)
upload_business_dictionary_sections(pd.DataFrame([
    {
        'DATABASE_NAME'  : 'MY_DB',
        'OBJECT_NAME'    : 'DATASET_CUSTOMER',
        'SECTION_NAME'   : 'OVERVIEW',
        'SECTION_CONTENT': 'Customer-level analytical dataset combining spending and category features...',
    },
]))

# Level 2 — Column-level descriptions
upload_business_dictionary_columns(pd.DataFrame([
    {
        'DATABASE_NAME'       : 'MY_DB',
        'TABLE_NAME'          : 'CUSTOMER',
        'COLUMN_NAME'         : 'CUSTOMER_ID',
        'BUSINESS_DESCRIPTION': 'Unique customer identifier assigned at enrolment.',
    },
]))

All three functions validate required columns and perform a CURRENT VALIDTIME MERGE — re-running them updates existing descriptions and preserves the full change history.

Consumer Agent (Chatbot)

The agent module provides a conversational interface for business consumers. Non-technical users can ask natural-language questions about features, datasets, definitions, data freshness, usage guidance, data lineage, and calculation logic — in English or French.

Architecture

User question
  → Intent classifier (Pydantic structured output)
  → DATA_DOMAIN detector (finds which domain owns the feature; remembered across turns)
  → Skill dispatcher (8 skills)
  → Plain-language answer

The agent uses LangGraph StateGraph with MemorySaver for multi-turn conversations. Conversation context is persisted across turns:

State field	What is remembered
`resolved_data_domain`	Which DATA_DOMAIN owns the last named feature/dataset
`resolved_object_name`	Last explicitly named feature or dataset
`resolved_feature_triplet`	Full resolution: feature name, entity, process ID, view name
`resolved_entity_name`	Entity type in focus (e.g. `CustomerID`)
`resolved_feature_list`	Feature names currently in focus (one or many)
`resolved_column_sources`	Column→source-table map from the last EXPLAIN result (used by DEFINITION drill-down)

Follow-up questions that omit an explicit feature name (e.g. "when was it last updated?", "how is it calculated?") automatically reuse the previously resolved feature, entity, and domain — no need to repeat yourself. When a feature name is shared across multiple entity types, the remembered entity silently disambiguates without asking for clarification.
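The reuse of remembered context can be pictured as a merge in which the new turn only overrides what it states explicitly. The sketch below is a simplification, not the agent's LangGraph state logic: `resolve_reference` is a hypothetical function, and only the state field names are taken from the table above.

```python
def resolve_reference(turn, memory):
    """Sketch: fill in fields the new turn leaves out from what earlier turns resolved."""
    resolved = dict(memory)
    # Only explicitly provided (non-None) fields override the remembered ones.
    resolved.update({key: value for key, value in turn.items() if value is not None})
    return resolved


memory = {
    "resolved_data_domain": "MY_PROJECT",
    "resolved_object_name": "total_amount",
    "resolved_entity_name": "CustomerID",
}
# "When was it last updated?" names no feature, so every field is inherited.
follow_up = resolve_reference({"resolved_object_name": None}, memory)
# A turn that names a new feature replaces only that field.
new_feature = resolve_reference({"resolved_object_name": "avg_amount"}, memory)
```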

After an EXPLAIN turn, the agent lists every variable involved in the formula with its source table. Asking "what is <column>?" immediately after an EXPLAIN resolves the column through the business dictionary — even if it is not a registered feature. Vague references (e.g. "what does the date mean?") are fuzzy-matched against remembered column and table names.

Feature descriptions are resolved via the process documentation chain: entity → features → process_id → VIEW_NAME → FS_BUSINESS_DICTIONARY_COLUMNS

Quick start

import tdfs4ds
from tdfs4ds.agent import launch_chatbot_with_index

# Configure LLM and embedding model
tdfs4ds.INSTRUCT_MODEL_PROVIDER = 'vllm'
tdfs4ds.INSTRUCT_MODEL_URL      = 'https://api.example.com/v1'
tdfs4ds.INSTRUCT_MODEL_API_KEY  = 'my-key'
tdfs4ds.INSTRUCT_MODEL_MODEL    = 'mistral-7b-instruct'
tdfs4ds.EMBEDDING_MODEL_URL     = 'https://api.example.com/v1/e5'
tdfs4ds.EMBEDDING_MODEL_MODEL   = 'bge-m3'

# Build vector index (incremental) then launch the Gradio chatbot — one call
demo = launch_chatbot_with_index(port=7860)

Or call the agent programmatically:

from tdfs4ds.agent import consumer_agent

answer = consumer_agent("What features are available?", thread_id="session-1")
answer = consumer_agent("How is nb_days_since_last_transactions calculated?", thread_id="session-1")
answer = consumer_agent("When was it last updated?", thread_id="session-1")  # feature + entity remembered
answer = consumer_agent("What about for CustomerID?", thread_id="session-1")  # entity remembered, new feature group

Skills

The 8 available intents are defined by the ca-* SKILL.md files bundled with the package. Removing a file removes that intent; reset_consumer_agent() clears the singleton cache so the change takes effect without restarting the process.

Intent	Trigger examples	What happens
`SEARCH`	"What features analyse customer spending?"	Semantic search across vector index + feature catalog
`DEFINITION`	"What does total_amount measure?"	Resolves feature → process view → column doc
`USAGE`	"How do I use avg_amount in Tableau?"	Audience, granularity, regulatory guidance
`FRESHNESS`	"Is total_amount up to date?"	Checks follow-up execution history
`SUMMARY`	"List all available features"	Full feature list with entity and description per feature
`LINEAGE`	"Where does total_amount come from?"	Walks upstream dependency graph via `build_teradata_dependency_graph`
`EXPLAIN`	"How is total_amount calculated?"	Fetches `SHOW VIEW` DDL → LLM explains logic in plain language + lists source columns so you can drill into any variable
`DATASET`	"Which dataset exposes total_amount?"	Looks up dataset catalog for datasets that contain the named feature

When the user asks about multiple features at once (e.g. "is there a dataset with feature1 and feature2?"), the DATASET skill returns per-feature results and the intersection of datasets that expose all requested features simultaneously.
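The multi-feature case comes down to a set intersection. The sketch below shows that step on its own, under the assumption that a per-feature lookup has already produced the dataset names; `datasets_exposing_all` and the sample names are hypothetical, not part of the agent.

```python
def datasets_exposing_all(feature_to_datasets):
    """Return (per-feature results, datasets that expose every requested feature)."""
    per_feature = {feature: sorted(names) for feature, names in feature_to_datasets.items()}
    sets = [set(names) for names in feature_to_datasets.values()]
    common = sorted(set.intersection(*sets)) if sets else []
    return per_feature, common


per_feature, common = datasets_exposing_all({
    "feature1": ["DATASET_CUSTOMER", "DATASET_CHURN"],
    "feature2": ["DATASET_CHURN", "DATASET_RISK"],
})
```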

Restrict the agent to a subset at runtime:

answer = consumer_agent("What features are available?", skills=["SEARCH", "SUMMARY"])

Discover installed consumer-agent skills programmatically:

tdfs4ds.consumer_agent_skill_catalog()   # {skill_name: {intent, description, ...}}

MCP Tools (optional)

When tdfs4ds.MCP_SERVER_URL is set, the consumer agent can delegate questions to an external MCP (Model Context Protocol) server — useful for general data queries, external lookups, or calculations that fall outside the feature store domain.

Configure the endpoint:

# .env
TDFS4DS_MCP_SERVER_URL=https://your-mcp-server/endpoint/
TDFS4DS_MCP_SERVER_TRANSPORT=streamable_http   # default — modern MCP standard
# TDFS4DS_MCP_SERVER_TRANSPORT=sse             # legacy SSE transport

or programmatically:

tdfs4ds.MCP_SERVER_URL       = 'https://your-mcp-server/endpoint/'
tdfs4ds.MCP_SERVER_TRANSPORT = 'streamable_http'   # 'streamable_http' (default) or 'sse'

Enable MCP in programmatic calls:

from tdfs4ds.agent import consumer_agent

answer = consumer_agent("How many rows are in the transactions table?",
                        thread_id="session-1", mcp_enabled=True)

Gradio chatbot — when MCP_SERVER_URL is set, the chatbot shows an Enable MCP Tools checkbox. Tick it to route applicable questions to the MCP server; leave it unticked for pure feature store mode.

Safety guardrails (enforced at both the tool and LLM-prompt level):

Rule	What happens
Forbidden objects	SELECT statements against feature store process views or feature storage tables are refused outright
Allowed data access	When feature data is needed, only datasets registered in the dataset catalog are queried
Row cap	All SELECT queries are automatically rewritten to `SELECT TOP 20`; a ⚠ notice is added to the answer when the limit is reached

The langchain-mcp-adapters package is a lazy optional dependency — it is only imported when MCP is actually invoked. Install it with pip install langchain-mcp-adapters.

Plugin skills

You can extend the agent with custom skills — for example, to run domain-specific analyses or query live data — by pointing TDFS4DS_SKILLS_FOLDER at a directory that contains skill subdirectories. Plugin skills are merged transparently into the catalog alongside the bundled ca-* skills.

$TDFS4DS_SKILLS_FOLDER/
  my-analysis-skill/
    SKILL.md    ← required — defines the intent name and description
    skill.py    ← optional — if present, adds a routable node to the agent graph

SKILL.md uses the same frontmatter format as the bundled skills:

---
name: Revenue Trend Analysis
intent: REVENUE_TREND
description: user wants to analyse revenue trends or compare figures across periods
---
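To check a SKILL.md header before dropping it into the plugin folder, a tiny reader such as the one below is enough. It is a sketch under simplifying assumptions: it handles only flat `key: value` lines between the two `---` markers, `parse_frontmatter` is a hypothetical name, and the package may well use a full YAML parser instead.

```python
def parse_frontmatter(text):
    """Sketch: read flat `key: value` pairs between the two leading '---' lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing marker reached
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("missing closing '---'")


skill_md = """---
name: Revenue Trend Analysis
intent: REVENUE_TREND
description: user wants to analyse revenue trends or compare figures across periods
---
"""
meta = parse_frontmatter(skill_md)
```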

skill.py must define a node(state) function that receives the full AgentState dict and returns a partial state update:

def node(state: dict) -> dict:
    question = state.get("question", "")
    domain   = state.get("resolved_data_domain")
    # ... your analysis logic (teradataml queries, aggregations, etc.) ...
    return {
        "skill_result": {
            "answer": "Revenue grew 12 % YoY, driven by the EMEA region."
        }
    }

If skill_result contains an "answer" key, it is returned directly to the user without a second LLM pass.
If skill.py is absent, the skill still appears in the intent catalog (its description is included in the classifier prompt) but the agent cannot execute it.
Plugin skills do not need a ca- prefix — any directory name under TDFS4DS_SKILLS_FOLDER is picked up.

Setting the folder:

import os
os.environ["TDFS4DS_SKILLS_FOLDER"] = "/path/to/my/plugins"

After changing the folder, call reset_consumer_agent() to invalidate the graph cache:

from tdfs4ds.agent import reset_consumer_agent
reset_consumer_agent()

Notebooks

09 - Consumer Agent Chatbot with tdfs4ds.ipynb — architecture walkthrough, 7-intent test suite, 4-turn multi-turn demo
10 - Launch Consumer Agent Chatbot.ipynb — minimal 4-cell one-command launch

Gradio trace panel

The chatbot includes a collapsible Agent Trace accordion showing, for each turn:

Intent Classification — detected intent, object name, domain
DATA_DOMAIN Detection — available domains, resolved domain, source (detected / remembered)
Skill Executed — skill name and inputs
Skill Result — structured output summary; errors include per-step diagnostic messages

Model listing

from tdfs4ds.agent import list_instruct_models, list_embedding_models

list_instruct_models()                          # models on INSTRUCT_MODEL_URL
list_embedding_models(sub_paths=['e5', 'code']) # models on EMBEDDING_MODEL_URL sub-paths

Query Optimizer Agent

The query_optimizer module analyses and rewrites Teradata SQL feature-engineering queries for better performance. It is process-aware — when a process_id is supplied it pulls the registered SQL and any stored EXPLAIN documentation directly from the tdfs4ds process catalog, avoiding redundant LLM calls.

Pipeline (9 steps)

Step	What happens
0	Process context — SQL + stored EXPLAIN analysis fetched from the catalog (skipped when no `process_id`)
0.5	SQL simplification — structural compaction pass merges unnecessary nesting layers into CTE + single outer SELECT, giving the LLM a cleaner baseline; accepts the simplified form only when its EXPLAIN score ≥ original
1	Structured EXPLAIN analysis — `document_sql_query_explain` scores the simplified query 1–5 with `[You]`-prefixed author-actionable warnings and recommendations
2	Lineage graph — Primary Index + partition columns collected for every underlying object
3	DDL fetch — `SHOW TABLE` / `SHOW VIEW` for every referenced object
4	Candidate generation — LLM proposes up to `N` rewrites focused on `[You]`-actionable items (`N = tdfs4ds.QUERY_OPTIMIZER_MAX_CANDIDATES`, default 5)
5	Candidate EXPLAIN — `document_sql_query_explain` run per candidate
6	Plan comparison — LLM selects the best plan by score delta and resolved warnings
7	FilterManager check — partitioned-but-unfiltered objects are flagged for incremental processing
8	Final report — Markdown with Score Summary, Simplification section, 3-stage query comparison (Input → After Simplification → After Optimization), Candidates, Selected Optimisation, FilterManager

Quick start

import tdfs4ds

# Configure LLM — vllm / OpenAI / Azure / Bedrock
tdfs4ds.INSTRUCT_MODEL_PROVIDER = 'vllm'   # 'openai' does not require INSTRUCT_MODEL_URL
tdfs4ds.INSTRUCT_MODEL_MODEL    = '...'
tdfs4ds.INSTRUCT_MODEL_API_KEY  = '...'
# tdfs4ds.INSTRUCT_MODEL_URL = '...'  # required for vllm/azure; omit for openai/bedrock

tdfs4ds.QUERY_OPTIMIZER_MAX_CANDIDATES       = 5     # max valid rewrites to evaluate (default 5)
tdfs4ds.QUERY_OPTIMIZER_MAX_FAILURES         = 5     # max EXPLAIN/syntax failures before stopping (default 5)
tdfs4ds.QUERY_OPTIMIZER_MAX_CANDIDATE_TOKENS = None  # cap completion tokens per candidate call (None = model default; set e.g. 4096 for small-context models)

# Process-aware — SQL and stored EXPLAIN docs pulled from the catalog
result = tdfs4ds.query_optimizer(process_id='<UUID>', thread_id='session-1')

# Or pass raw SQL directly
result = tdfs4ds.query_optimizer(
    sql_query="SELECT ... FROM db.tbl",
    thread_id='session-1',
)

Multi-turn follow-ups sharing the same thread_id use the MemorySaver singleton — the agent retrieves the SQL from history and re-runs the pipeline with additional context:

result = tdfs4ds.query_optimizer(
    "Would adding a Secondary Index on the join column improve the plan?",
    thread_id='session-1',
)

Inside a Jupyter notebook, use the async entry point to avoid background-thread overhead:

from tdfs4ds.agent import aquery_optimizer

result = await aquery_optimizer(process_id='<UUID>', thread_id='session-async')

Result keys

Key	Content
`answer`	Structured Markdown optimisation report (Summary, Score Summary, Lineage, …)
`best_sql`	Recommended SQL (original if already optimal)
`score_delta`	Before/after comparison of scores + execution metrics — see below
`original_analysis`	Scored EXPLAIN: `explanation`, `user_score`, `global_score`, `n_steps`, `n_spool_objects`, `warnings`, `recommendations`
`candidates`	List of candidate dicts: `sql`, `strategy`, `rationale`, `analysis`
`comparison`	Plan comparison: `best_index`, `reasoning`, `business_logic_preserved`
`filtermanager_applicable`	`True` if a FilterManager loop was recommended
`process_info`	Full process catalog record (when `process_id` is supplied)
`steps`	Every pipeline step with inputs and outputs

Score comparison (`score_delta`)

After each optimization run, score_delta captures exactly how much the rewrite improved the query — across both LLM-assessed scores and deterministic execution-plan metrics:

sd = result['score_delta']
# {
#   'optimized':                True,
#   'best_strategy':            'partition_pruning',
#   'original_user_score':      2,
#   'original_global_score':    3,
#   'best_user_score':          4,
#   'best_global_score':        4,
#   'user_score_delta':         2.0,   # +2 improvement
#   'global_score_delta':       1.0,   # +1 improvement
#   'original_n_steps':         14,
#   'best_n_steps':             9,
#   'steps_delta':              -5,    # 5 fewer execution steps
#   'original_n_spools':        6,
#   'best_n_spools':            4,
#   'spools_delta':             -2,    # 2 fewer spool materialisations
#   'business_logic_preserved': True,
# }

Four signals are reported side by side:

user_score (1–5) — quality of what the SQL author controls.
global_score (1–5) — overall plan quality, including infrastructure factors (Primary Index placement, statistics, etc.). When the bottleneck is infrastructure rather than the SQL itself, a rewrite can raise user_score but not global_score.
n_steps — number of numbered execution steps in the Teradata EXPLAIN plan. Parsed deterministically from the raw EXPLAIN text.
n_spool_objects — number of distinct Spool objects the plan materialises. A rough proxy for intermediate-result memory/IO pressure.

Negative steps_delta / spools_delta mean the rewrite is lighter than the baseline. When the rewrite improves the score but adds a small number of steps or spools (typically single-row CTEs backing a precomputed threshold), the report appends an explanatory note describing the trade-off.

The optimizer Score Summary table in the generated report always shows all four metrics as Before → After (or Input → Simplified → Optimized when the simplification pass changed the SQL).
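
As a minimal sketch of how the fields above might be consumed, the helper below turns a score_delta dict into a one-line verdict. `summarize_score_delta` is a hypothetical name and is not part of tdfs4ds; it reads only the keys shown in the example above, and the sample dict is a hard-coded copy of that example rather than real optimizer output.

```python
def summarize_score_delta(sd):
    """Return a short human-readable verdict for a score_delta dict.

    Hypothetical helper for illustration only; it relies solely on the
    keys shown in the example score_delta above.
    """
    if not sd.get("optimized"):
        return "no rewrite recommended"
    if not sd.get("business_logic_preserved"):
        return "business logic not preserved: review before use"

    parts = [
        f"user_score {sd['original_user_score']} -> {sd['best_user_score']}",
        f"global_score {sd['original_global_score']} -> {sd['best_global_score']}",
    ]
    # Negative deltas mean the rewritten plan is lighter than the baseline.
    for label, key in (("steps", "steps_delta"), ("spools", "spools_delta")):
        delta = sd[key]
        if delta < 0:
            parts.append(f"{-delta} fewer {label}")
        elif delta > 0:
            parts.append(f"{delta} more {label} (trade-off)")
    return f"{sd['best_strategy']}: " + ", ".join(parts)


# Sample values copied from the example above (not real optimizer output).
example = {
    "optimized": True,
    "best_strategy": "partition_pruning",
    "original_user_score": 2,
    "original_global_score": 3,
    "best_user_score": 4,
    "best_global_score": 4,
    "user_score_delta": 2.0,
    "global_score_delta": 1.0,
    "original_n_steps": 14,
    "best_n_steps": 9,
    "steps_delta": -5,
    "original_n_spools": 6,
    "best_n_spools": 4,
    "spools_delta": -2,
    "business_logic_preserved": True,
}

print(summarize_score_delta(example))
# partition_pruning: user_score 2 -> 4, global_score 3 -> 4, 5 fewer steps, 2 fewer spools
```

In practice the same function could be called on result['score_delta'] from an optimizer run, for example to log one line per optimized process.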

Standalone simplification

The simplification pass can be called independently of the full optimizer:

result = tdfs4ds.simplify_query(sql_query="SELECT ...")
# or from a registered process
result = tdfs4ds.simplify_query(process_id='<UUID>')

# result keys: simplified_sql, original_sql, simplified (bool),
#              original_score (1-5), simplified_score (1-5)
if result["simplified"]:
    print(result["simplified_sql"])

Notebook display

display_optimization_result renders a score comparison widget at the top of the report, followed by the full Markdown analysis. The widget shows a before/after table with colour-coded Δ Change cells (green for improvement, red for regression) and a footer line for strategy, optimization status, and business-logic preservation.

from tdfs4ds.agent import display_optimization_result

display_optimization_result(result)

FilterManager integration

When filtermanager_applicable is True, the report includes a ready-to-use code snippet for iterating over partitions one at a time — reducing per-run spool usage and enabling full partition elimination:

fm = tdfs4ds.FilterManager(
    schema_name = tdfs4ds.SCHEMA,
    view_name   = 'TRANSACTIONS',
    col_names   = ['transaction_date'],
)

for filter_id in range(fm.nb_filters):
    fm.update(filter_id)
    tdfs4ds.run(process_id)

Notebook

11 - Query Optimizer Agent with tdfs4ds.ipynb (notebook dev/genai/) walks through the full pipeline end-to-end — process-aware entry, multi-turn follow-ups, score comparison widget, and FilterManager recommendation.

Discover Registered Features

from tdfs4ds.feature_store.feature_query_retrieval import (
    get_list_entity,
    get_list_features,
    get_available_features,
    get_feature_versions,
)

Lineage

The lineage module builds end-to-end dependency graphs from a SQL query or a dataset view DDL.

Dependency graph

from tdfs4ds.lineage import build_teradata_dependency_graph, plot_lineage_sankey, show_plotly_robust

# Start from a dataset view DDL (obtained via SHOW VIEW)
sql = tdml.execute_sql("SHOW VIEW DATASET_CUSTOMER").fetchall()[0][0]

graph = build_teradata_dependency_graph(sql_query=sql)
# Returns: {"nodes": {...}, "edges": [...], "roots": [...]}

By default (expand_datasets_via_process_catalog=True) dataset nodes are resolved through the process catalog: FEATURE_VERSION UUIDs embedded in the dataset DDL are matched to PROCESS_ID entries in FS_V_PROCESS_CATALOG, and edges are drawn directly to the registered feature-engineering views.

DATASET_CUSTOMER  →  FEAT_ENG_CUST  →  DB_SOURCE.TRANSACTIONS

Set expand_datasets_via_process_catalog=False to connect the dataset directly to the raw feature-store storage tables (previous behaviour).

fig = plot_lineage_sankey(graph, title="Customer Dataset Lineage")
show_plotly_robust(fig)

Migration manifest

graph_to_migration_manifest converts any lineage graph into a flat, JSON-serialisable dict — useful for planning a feature store migration.

from tdfs4ds.lineage import graph_to_migration_manifest
import json

# All databases
manifest = graph_to_migration_manifest(graph)

# Scoped to the feature store schema only (cross-boundary edges excluded)
manifest_fs = graph_to_migration_manifest(graph, filter_database=tdfs4ds.SCHEMA)
print(json.dumps(manifest_fs, indent=2))
# {
#   "views":  [{"database": "demo_user", "name": "DATASET_CUSTOMER", "type": "dataset"},
#              {"database": "demo_user", "name": "FEAT_ENG_CUST",    "type": "view"}],
#   "tables": [],
#   "edges":  [{"from": "demo_user.DATASET_CUSTOMER", "to": "demo_user.FEAT_ENG_CUST"}]
# }

with open("migration_manifest.json", "w") as f:
    json.dump(manifest_fs, f, indent=2)
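
One way such a manifest might feed a migration plan is to derive a creation order in which every object is recreated after the objects it depends on. The sketch below assumes the manifest layout shown above (`views`, `tables`, and `edges` with `from`/`to` keys) and assumes an edge points from an object to the object it reads from, as in the lineage chain shown earlier. `migration_order` is a hypothetical helper, not part of tdfs4ds, and it uses the standard-library `graphlib` module, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def migration_order(manifest):
    """Return qualified object names with dependencies before dependents.

    Hypothetical helper: assumes each edge means "from" reads from "to".
    """
    sorter = TopologicalSorter()
    # Register every object, including those without any edge.
    for obj in manifest.get("views", []) + manifest.get("tables", []):
        sorter.add(f"{obj['database']}.{obj['name']}")
    for edge in manifest.get("edges", []):
        # "to" is a prerequisite of "from", so it must be created first.
        sorter.add(edge["from"], edge["to"])
    return list(sorter.static_order())


# Sample manifest copied from the example output above.
sample_manifest = {
    "views": [
        {"database": "demo_user", "name": "DATASET_CUSTOMER", "type": "dataset"},
        {"database": "demo_user", "name": "FEAT_ENG_CUST", "type": "view"},
    ],
    "tables": [],
    "edges": [
        {"from": "demo_user.DATASET_CUSTOMER", "to": "demo_user.FEAT_ENG_CUST"}
    ],
}

for name in migration_order(sample_manifest):
    print(name)
```

With the sample manifest, the feature-engineering view is listed before the dataset view that depends on it. A cyclic manifest would raise `graphlib.CycleError`, which is a reasonable signal that the lineage needs manual review.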

Claude Code Skills

tdfs4ds ships 22 bundled SKILL.md files organized in four families that teach Claude Code (and compatible agents) how to drive the feature store end-to-end — from first connection to consumer chatbot. This includes:

10 workflow skills (fs-*) — core feature store operations
1 Teradata utility (td-explain) — EXPLAIN visualization
3 query optimizer skills (qo-skill-*) with 11 nested LLM prompts
8 consumer-agent reference skills (ca-*) that document each intent of the conversational agent

Installing after `pip install tdfs4ds`

Convenience functions — recommended for most users:

import tdfs4ds

# Install globally (all projects on this machine)
tdfs4ds.install_skills_global()

# Or install locally (just this project — put .claude/skills/ in git)
tdfs4ds.install_skills_local()

# Install a specific subset of skills
tdfs4ds.install_skills_global(skills=['fs-setup', 'fs-upload', 'ca-search'])
tdfs4ds.install_skills_local(skills=['fs-document', 'fs-lineage', 'fs-analyze'])

Lower-level function — when you need custom target directories:

# Copy all skills to a custom location (e.g. ~/.config/mycli/skills/)
tdfs4ds.install_skills(target_dir='~/.config/mycli/skills')

# Only add skills that do not exist yet (safe for shared project dirs)
tdfs4ds.export_skills('.claude/skills')        # overwrite=False by default

# Install a specific subset into a custom location
tdfs4ds.install_skills(target_dir='./backup/skills', skills=['fs-setup', 'fs-upload'])

Or from the command line with the venv active:

# User-level (all projects on this machine)
tdfs4ds-install-skills

# Project-level
tdfs4ds-install-skills --target .claude/skills

# Skip skills already present
tdfs4ds-install-skills --target .claude/skills --no-overwrite

# List bundled skill names
tdfs4ds-install-skills --list
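
To confirm what an installation actually wrote, a short check like the one below may help. It is a sketch under one assumption that the text above does not state explicitly: each skill is installed as its own sub-directory containing a SKILL.md file (for example `.claude/skills/fs-setup/SKILL.md`). `list_installed_skills` is a hypothetical helper and not part of tdfs4ds.

```python
from pathlib import Path


def list_installed_skills(skills_dir):
    """Return sorted names of sub-directories that contain a SKILL.md file.

    Hypothetical helper; assumes the layout <skills_dir>/<skill-name>/SKILL.md.
    """
    root = Path(skills_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(p.parent.name for p in root.glob("*/SKILL.md"))


# Project-level and user-level targets mentioned above.
print(list_installed_skills(".claude/skills"))
print(list_installed_skills("~/.claude/skills"))
```

An empty list means either that nothing was installed at that location or that the layout assumption does not hold for the installed package version.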

Skill catalogue

Feature Store Workflow Skills (fs-*)

Skill	Purpose
`fs-setup`	Connect to Teradata, run `setup()`, activate a data domain
`fs-upload`	Engineer features in SQL / teradataml, register with `upload_features`
`fs-filter`	Segmented ingestion with `FilterManager` (standard, hybrid, clone)
`fs-rollout`	Backfill across a date range with `TimeManager` + `roll_out`
`fs-dataset`	Resolve feature versions, build a denormalised dataset view
`fs-inspect`	Browse process / feature / dataset catalogs and follow-up table
`fs-document`	LLM process documentation + EXPLAIN score 1–5
`fs-lineage`	Build a dependency graph and render a Sankey diagram
`fs-analyze`	Scalability analysis — EXPLAIN + partition opportunities + FilterManager proposal
`fs-agent`	Launch the consumer agent chatbot (LangGraph + Chroma + Gradio)

Teradata Utilities (td-*)

Skill	Purpose
`td-explain`	Interactive HTML flow diagram from any raw Teradata EXPLAIN output

Query Optimization Skills (qo-skill-* with nested prompts)

Skill	Purpose	Nested Prompts
`qo-skill-explain`	Analyze EXPLAIN plans and score query quality (1–5)	3 prompts: EXPLAIN analysis, false-positive classification, SQL documentation
`qo-skill-simplify`	Flatten SQL nesting and remove redundant wrappers	2 prompts: simplification logic, syntax repair loop
`qo-skill-optimize`	Full optimization pipeline: EXPLAIN → strategies → candidates → best plan	6 prompts: strategies, generation, refinement, candidate comparison, summary, fallback generation

Consumer-agent reference skills (also used by build_graph() to configure the agent at runtime)

Skill	Intent	Trigger example
`ca-search`	`SEARCH`	"What features measure customer spending?"
`ca-definition`	`DEFINITION`	"What is `total_amount`?"
`ca-usage`	`USAGE`	"How do I use `avg_amount` in Tableau?"
`ca-freshness`	`FRESHNESS`	"Is `total_amount` up to date?"
`ca-summary`	`SUMMARY`	"Give me a snapshot of the feature store"
`ca-lineage`	`LINEAGE`	"Where does `total_amount` come from?"
`ca-explain`	`EXPLAIN`	"How is `total_amount` calculated?"
`ca-dataset`	`DATASET`	"Which dataset exposes `total_amount`?"

Sharing skills with teammates

Choose between global (user-level) and local (project-level) installation based on your workflow:

Global installation — shared across all projects on this machine:

tdfs4ds.install_skills_global()
# Skills available in ~/.claude/skills/ for Claude Code, IDE extensions, and CLI

Use this when:

You work solo or each team member manages their own Claude Code environment
You want one-time setup and skills available everywhere
You don't need to version-control the skills with the project

Local installation — one-per-project, version-controlled in git:

tdfs4ds.install_skills_local()
# Skills available in .claude/skills/ (commit to git for team sharing)

Use this when:

You have a shared git repository and want teammates to get skills automatically on git pull
Your team has customized skills (edited SKILL.md files) that should be tracked
You want skills pinned to a specific package version

Function	Target	Scope	Commit to git?
`install_skills_global()`	`~/.claude/skills/`	All projects for this user	No — personal setup
`install_skills_local()`	`.claude/skills/`	This project only	Yes — shared with team

Requirements

Python >= 3.6
teradataml >= 17.20
Active Teradata Vantage connection
VALIDTIME temporal tables must be enabled on the Teradata Vantage system — all feature catalogs, process catalogs, and feature stores rely on VALIDTIME support
