Sema4.ai Data
⚡️ sema4ai-data
Python library to develop data packages for Sema4.ai. Build powerful data-driven actions that can query databases and work with various data sources.
This library is designed to work with Sema4.ai Data Server, which is included in the Sema4.ai Data Access VSCode extension.
Installation
pip install sema4ai-data
Quick Start
from typing import Annotated
from sema4ai.data import query, DataSource, DataSourceSpec
from sema4ai.actions import Response, Table
# Define a data source
PostgresDataSource = Annotated[DataSource, DataSourceSpec(
    name="my_postgres_db",
    engine="postgres",
    description="Main PostgreSQL database"
)]
# Create a data query
@query
def get_users(datasource: PostgresDataSource, limit: int = 10) -> Response[Table]:
    """Get users from the database."""
    # The `?` placeholder is bound to `limit` from the params list.
    result = datasource.query("SELECT * FROM `my_postgres_db`.users LIMIT ?", [limit])
    return Response(result=result.to_table())
Core Concepts
DataSource
The DataSource class is the main interface for executing queries against configured data sources. It's automatically injected by the framework when you use the @query decorator.
Key Methods:
- query(sql, params=None) - Execute SQL queries with optional parameters
- native_query(sql, params=None) - Execute engine-specific queries
- connection() - Get the underlying data server connection
DataSourceSpec
Used to specify the configuration of a data source through type annotations:
from typing import Annotated
from sema4ai.data import DataSource, DataSourceSpec
# Database data source
DatabaseSource = Annotated[DataSource, DataSourceSpec(
    name="my_database",
    engine="postgres",  # or "mysql", "sqlite", etc.
    description="Production database"
)]
# File-based data source
FileSource = Annotated[DataSource, DataSourceSpec(
    engine="files",
    file="data/customers.csv",
    created_table="customers",
    description="Customer data from CSV"
)]
# Knowledge base for semantic search
KnowledgeBaseSource = Annotated[DataSource, DataSourceSpec(
    name="company_kb",
    engine="sema4_knowledge_base",
    description="Company knowledge base for semantic search"
)]
Parameters:
- engine (required) - The data source engine type
- name - Name of the data source
- description - Human-readable description
- file - File path for file-based sources
- created_table - Table name created from files
- setup_sql - SQL commands to run on setup
- setup_sql_files - SQL files to execute on setup
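To make the Annotated pattern more concrete, here is a minimal sketch of how metadata attached through Annotated can be read back from a function signature. It does not import sema4ai.data: SpecStandIn, SourceStandIn and find_specs are hypothetical names used only for illustration, and the library's own discovery logic may work differently.

```python
from dataclasses import dataclass
from typing import Annotated, Optional, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class SpecStandIn:
    """Hypothetical stand-in for DataSourceSpec (not the library class)."""
    engine: str
    name: Optional[str] = None
    description: Optional[str] = None


class SourceStandIn:
    """Hypothetical stand-in for DataSource."""


ProductDB = Annotated[SourceStandIn, SpecStandIn(
    name="products",
    engine="postgres",
    description="Product catalog database",
)]


def search_products(category: str, datasource: ProductDB) -> None:
    """Placeholder function; only its annotations matter here."""


def find_specs(func) -> dict:
    """Map parameter names to the SpecStandIn attached via Annotated."""
    hints = get_type_hints(func, include_extras=True)
    specs = {}
    for param, hint in hints.items():
        if get_origin(hint) is Annotated:
            # get_args returns (base_type, metadata_1, metadata_2, ...)
            for meta in get_args(hint)[1:]:
                if isinstance(meta, SpecStandIn):
                    specs[param] = meta
    return specs


specs = find_specs(search_products)
print(specs["datasource"].engine)
```

Because the specification travels with the type alias, the same alias can be reused across several functions without repeating the configuration.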
Decorators
@query
The main decorator for creating data queries that can be executed by sema4ai actions:
from sema4ai.data import query
from sema4ai.actions import Response, Table
@query
# PostgresCustomersDataSource is assumed to be an Annotated DataSource defined elsewhere.
def get_countries(datasource: PostgresCustomersDataSource) -> str:
    sql = """
        SELECT distinct(country)
        FROM public_demo.demo_customers
        LIMIT 100;
    """
    result = datasource.query(sql)
    return result.to_markdown()
Parameters:
- is_consequential - Whether the action has side effects or updates a resource (default: False)
- display_name - Custom display name for the action
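The signature query(func=None, *, is_consequential=None, display_name=None) suggests the decorator can be applied both bare and with keyword arguments. The sketch below shows that general pattern with a hypothetical query_like decorator; it is not the library's implementation, and the options attribute and the fallback to the function name are assumptions made for illustration.

```python
import functools


def query_like(func=None, *, is_consequential=None, display_name=None):
    """Hypothetical decorator mirroring the documented signature shape."""

    def decorate(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            return f(*args, **kwargs)

        # Assumption: record the options so a framework could inspect them later.
        wrapper.options = {
            "is_consequential": bool(is_consequential),
            "display_name": display_name or f.__name__,
        }
        return wrapper

    if func is not None:
        # Bare usage: @query_like
        return decorate(func)
    # Parameterized usage: @query_like(...)
    return decorate


@query_like
def list_countries():
    return ["FI", "SE"]


@query_like(is_consequential=True, display_name="Archive customer")
def archive_customer(customer_id: int):
    return f"archived {customer_id}"


print(list_countries.options)
print(archive_customer.options)
```

Keeping the options keyword-only avoids ambiguity between a decorated function and a positional argument.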
@predict ⚠️ DEPRECATED
Note: The @predict decorator is deprecated as of version 1.0.3. Use @query instead for all operations including predictions.
# OLD (deprecated):
@predict
def predict_something(datasource: SomeDataSource):
    pass
# NEW (recommended):
@query
def predict_something(datasource: SomeDataSource):
    pass
ResultSet
The ResultSet class represents query results and provides various methods to work with the data:
# Convert to different formats
result = datasource.query("SELECT * FROM `my_database`.users")
# As a table for actions
table = result.to_table()
# As a list of dictionaries
dicts = result.to_dict_list()
# As structured objects
from pydantic import BaseModel
class User(BaseModel):
    id: int
    name: str
    email: str
users = result.build_list(User)
# Iterate over results
for row_dict in result.iter_as_dicts():
    print(row_dict)
for row_tuple in result.iter_as_tuples():
    print(row_tuple)
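The conversion methods above can be pictured with a small stand-in that holds column names and row tuples. This is only a sketch under that assumption: MiniResultSet is a hypothetical class, it uses a dataclass rather than pydantic so it runs without extra dependencies, and the real ResultSet may store and convert data differently.

```python
from dataclasses import dataclass
from typing import Iterator, TypeVar

T = TypeVar("T")


@dataclass
class User:
    id: int
    name: str
    email: str


class MiniResultSet:
    """Hypothetical stand-in for ResultSet: column names plus row tuples."""

    def __init__(self, columns: list[str], rows: list[tuple]):
        self.columns = columns
        self.rows = rows

    def iter_as_tuples(self) -> Iterator[tuple]:
        return iter(self.rows)

    def iter_as_dicts(self) -> Iterator[dict]:
        # Pair each value with its column name.
        for row in self.rows:
            yield dict(zip(self.columns, row))

    def to_dict_list(self) -> list[dict]:
        return list(self.iter_as_dicts())

    def build_list(self, item_class: type[T]) -> list[T]:
        # Each row dict is passed as keyword arguments to item_class.
        return [item_class(**row) for row in self.iter_as_dicts()]


result = MiniResultSet(
    ["id", "name", "email"],
    [(1, "Ada", "ada@example.com"), (2, "Linus", "linus@example.com")],
)
users = result.build_list(User)
print(users[0].name)
```

Building typed objects this way only works when the column names match the field names of the target class, which is why the query in such a function usually selects explicit columns.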
Basic Database Query
from typing import Annotated
from pydantic import BaseModel
from sema4ai.data import query, DataSource, DataSourceSpec
from sema4ai.actions import Response
class Product(BaseModel):
    id: int
    name: str
    price: float
    category: str
ProductDB = Annotated[DataSource, DataSourceSpec(
    name="products",
    engine="postgres",
    description="Product catalog database"
)]
@query
def search_products(
    category: str,
    max_price: float,
    datasource: ProductDB
) -> Response[list[Product]]:
    """Search products by category and price."""
    result = datasource.query(
        """
        SELECT id, name, price, category
        FROM products.products
        WHERE category = ? AND price <= ?
        ORDER BY price ASC
        """,
        [category, max_price]
    )
    return Response(result=result.build_list(Product))
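The `?` placeholders in the query above are filled from the params list in order. To show that binding style in something runnable without a data server, the sketch below uses Python's built-in sqlite3 module, which happens to accept the same positional `?` syntax. It is a stand-in only: the table contents are made up, and the data server may bind or escape parameters differently.

```python
import sqlite3

# In-memory database used purely as a stand-in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER, name TEXT, price REAL, category TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (1, "Hammer", 12.5, "tools"),
        (2, "Drill", 89.0, "tools"),
        (3, "Saw", 24.0, "tools"),
        (4, "Lamp", 30.0, "home"),
    ],
)


def search_products(category: str, max_price: float) -> list[tuple]:
    """Bind values positionally: first `?` gets category, second gets max_price."""
    cursor = conn.execute(
        """
        SELECT id, name, price, category
        FROM products
        WHERE category = ? AND price <= ?
        ORDER BY price ASC
        """,
        [category, max_price],
    )
    return cursor.fetchall()


cheap_tools = search_products("tools", 30.0)
# A value containing quotes is treated as data, not as SQL.
nothing = search_products("tools' OR '1'='1", 100.0)
print(cheap_tools)
print(nothing)
```

Passing values as parameters instead of formatting them into the SQL string keeps user input from changing the structure of the query.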
File-based Data Source
SalesData = Annotated[DataSource, DataSourceSpec(
    engine="files",
    file="data/sales_2024.csv",
    created_table="sales",
    description="Sales data for 2024"
)]
@query
def monthly_sales_report(
    month: int,
    datasource: SalesData
) -> Response[Table]:
    """Generate monthly sales report."""
    result = datasource.query(
        """
        SELECT
            product_category,
            SUM(amount) as total_sales,
            COUNT(*) as transaction_count
        FROM files.sales
        WHERE MONTH(sale_date) = ?
        GROUP BY product_category
        ORDER BY total_sales DESC
        """,
        [month]
    )
    return Response(result=result.to_table())
Knowledge Base Search
KnowledgeBase = Annotated[DataSource, DataSourceSpec(
    name="company_kb",
    engine="sema4_knowledge_base",
    description="Company knowledge base for semantic search"
)]
@query
def search_knowledge(
    query_text: str,
    datasource: KnowledgeBase,
    relevance_threshold: float = 0.7
) -> Response[Table]:
    """Search company knowledge base."""
    # Parameters with defaults must come after those without, so
    # relevance_threshold is listed after datasource.
    result = datasource.query(
        """
        SELECT chunk_content, relevance_score, document_name
        FROM company_kb
        WHERE content = ? AND relevance_threshold = ?
        ORDER BY relevance_score DESC
        LIMIT 5
        """,
        [query_text, relevance_threshold]
    )
    return Response(result=result.to_table())
Using native_query for Engine-Specific Syntax
@query
def get_user_by_id(
    user_id: int,
    datasource: MyDataSource
) -> Response[Table]:
    """Get user using native SQL syntax."""
    # Uses engine-specific syntax, automatically wrapped
    result = datasource.native_query(
        "SELECT * FROM user_info WHERE id = $id",
        {"id": user_id}
    )
    return Response(result=result.to_table())
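According to the changelog entry for 0.1.0, native_query() wraps the query in a SELECT * FROM <datasource_name> (<query>) clause. The sketch below only illustrates that documented shape with a hypothetical wrap_native_query helper; it is not the library's code and it does not cover how named parameters such as $id are substituted.

```python
def wrap_native_query(datasource_name: str, native_sql: str) -> str:
    """Hypothetical helper: wrap engine-specific SQL in the documented clause."""
    # Assumption: surrounding whitespace is not significant and can be trimmed.
    inner = native_sql.strip()
    return f"SELECT * FROM {datasource_name} ({inner})"


wrapped = wrap_native_query(
    "my_datasource",
    "SELECT * FROM user_info WHERE id = 1",
)
print(wrapped)
```

The inner query keeps the native syntax of the underlying engine, while the outer SELECT is what the data server itself parses.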
API Reference
Functions
query(func=None, *, is_consequential=None, display_name=None)
Decorator for creating query actions.
predict(func=None, *, is_consequential=None, display_name=None) ⚠️ DEPRECATED
Deprecated: Use @query instead. This decorator is deprecated as of version 1.0.3.
get_connection() -> DataServerConnection
Get a connection to the data server.
metadata(package_root: Path) -> dict
Get metadata about data sources in a package.
get_snowflake_connection_details()
Get Snowflake-specific connection configuration.
Classes
DataSource
Main interface for executing queries against data sources.
Methods:
- query(sql: str, params: list = None) -> ResultSet
- native_query(sql: str, params: dict = None) -> ResultSet
- connection() -> DataServerConnection
Properties:
- datasource_name: str - Name of the data source
DataSourceSpec
Configuration specification for data sources.
ResultSet
Container for query results with conversion methods.
Methods:
- to_table() -> Table - Convert to sema4ai Table
- to_dict_list() -> list[dict] - Convert to list of dictionaries
- build_list(item_class: type[T]) -> list[T] - Build typed object list
- iter_as_dicts() -> Iterator[dict] - Iterate as dictionaries
- iter_as_tuples() -> Iterator[tuple] - Iterate as tuples
- to_pandas_df() -> pd.DataFrame - Convert to pandas DataFrame
- to_markdown_table() -> str - Convert to markdown table
Data Models
SourceInfo
Information about a data source configuration.
TableInfo
Metadata about database tables.
ColumnInfo
Information about table columns.
KnowledgeBaseInfo
Metadata about knowledge base configurations.
Changelog
Unreleased
1.2.2 - 2025-12-18
- CVE fixes
1.2.1 - 2025-10-30
- Fix to the performance hit in creating snowflake connection.
1.2.0 - 2025-10-30
- Bring the Snowflake connection and query-execution functions into the library to reduce the load on customer code:
  - get_snowflake_connection
  - execute_snowflake_query
  - get_snowflake_connection_details
  - get_snowflake_rest_api_headers
- Support for Snowflake OAuth linking via Sema4.ai Studio
1.1.0 - 2025-10-21
- Add support for Snowflake SNOWFLAKE_OAUTH_PARTNER and SNOWFLAKE_OAUTH_CUSTOM auth types.
1.0.10 - 2025-09-08
- Fix KnowledgeBaseInfo params optionality
1.0.9 - 2025-09-08
- Implement _get_datasource_info private method on DataServerConnection class
1.0.8 - 2025-08-21
- CVE updates
- Expose the underlying SQL error when running a query
1.0.7 - 2025-07-28
- Improve readme and add changelog when publishing to pypi
1.0.6 - 2025-06-18
- Simplify error message on run_sql function call.
1.0.5 - 2025-05-20
- Allow extra fields in sf-auth.json without changing behaviour of get_snowflake_connection_details.
1.0.4 - 2025-05-13
- Add sema4_knowledge_base engine to support knowledge base as a data source
1.0.3 - 2025-04-24
- Add deprecation warning for @predict decorator and DataServerConnection.predict method as Lightwood is being phased out for data server predictions. Use @query or connection.query() instead.
- Update to latest sema4ai-actions version
1.0.2 - 2025-03-06
- Fix Snowflake local auth file path for Windows
1.0.1 - 2025-02-28
- Fix to the private key passphrase handling
1.0.0 - 2025-02-25
- Add private_key_file_pwd to snowflake connection details when it exists in auth config file
- SnowflakeAuthenticationError now inherits from ActionError.
0.1.0 - 2025-02-18
- Added native_query() method which will automatically wrap the query in a SELECT * FROM <datasource_name> (<query>) clause so that the query can be executed in the native SQL syntax of the data source instead of the syntax required by the data server.
- If no parameters are provided, the query is returned as is, even if parameters are detected in the query. This lets the user do the escaping themselves when the SQL syntax accepts parameters in a different way.
0.0.9 - 2025-02-14
- Correct the local authentication JSON file path for Snowflake in get_snowflake_connection_details
0.0.8 - 2025-02-14
- Add get_snowflake_connection_details helper function to get the connection details for Snowflake.
0.0.7 - 2025-02-06
- Corrected typo in ColumInfo.
- Updated list_knowledge_bases method to return KnowledgeBaseInfo.
0.0.6 - 2025-01-31
- Add data utility methods to DataServerConnection
0.0.5 - 2024-12-20
- Added execute_sql() to the DataSource class.
0.0.4 - 2024-12-19
- New utility methods for the ResultSet class:
  - to_dataframe() (alias for as_dataframe)
  - to_table() (creates a Table object that can be used to build a structured response)
  - to_dict_list() (returns a list of dictionaries)
  - __iter__() (same as iter_as_dicts)
  - __len__()
- Retry login if the server returns a 401 error.
- Retry SQL requests (once) if the server returns an unexpected error (as it may be a transient error).
- Added sema4ai.data.get_connection() to get the configured connection to the data server.
- Backward incompatible change: The queries/predictions must always use the full data source name to access a table and not just the table name, regardless of the data source name configured in the DataSourceSpec. For example, SQL like SELECT * FROM my_datasource.my_table is required instead of SELECT * FROM my_table.
0.0.3 - 2024-11-27
- Using REST API instead of PyMySQL.
- ResultSet APIs (provisional):
  - iter_as_dicts() (new in 0.0.3)
  - iter_as_tuples() (new in 0.0.3)
  - as_dataframe() (new in 0.0.1)
  - build_list(item_class) (new in 0.0.1)
  - to_markdown() (new in 0.0.1)
0.0.2 - 2024-11-25
- Changed metadata format to have _ instead of - in names.
- Made defined_at/file in metadata relative.
- Added support for setup_sql_files in DataSourceSpec.
- Default datasource named models is used for custom and prediction engines.
0.0.1 - 2024-11-18
- Initial release
- Added API:
  - from sema4ai.data import query to mark function as @query
  - from sema4ai.data import predict to mark function as @predict
  - from sema4ai.data import DataSource to define a data source
  - from sema4ai.data import DataSourceSpec to define a data source specification using an Annotated type
License
See LICENSE - Sema4.ai End User License Agreement