
NLP interface for Trilogy

Project description

Trilogy NLP

pytrilogy-nlp is a natural language interface for generating SQL queries via a Trilogy data model.

When you write SQL, most of the value you're creating comes from the column selection, transformation, and filtering.

Joins, table selection, and group-bys are mostly opportunities to introduce errors. Maybe the source table is stale; maybe you group by more fields than you need and hurt performance; maybe you didn't realize a table had partial data and needed a left join.

Trilogy aims to make SQL easier for humans by moving those parts, which add little value and mostly add risk, out of the query language and into a reusable metadata layer that can be tested independently. The exact same benefits apply to an LLM.
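
To make the idea concrete, here is a toy sketch (not Trilogy's implementation) of what a metadata layer buys you: the caller, human or LLM, only names concepts, and the joins and GROUP BY are derived from the model. The concept names, tables, join condition, and the `compile_select` helper are all hypothetical.

```python
# Toy illustration only: NOT how Trilogy compiles queries.
# The model (concepts, tables, joins) below is entirely hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    table: str
    expr: str
    aggregate: bool = False


# Hypothetical metadata layer: each concept knows its table and expression.
CONCEPTS = {
    "customer.state": Concept("customer.state", "customer", "customer.state"),
    "sales.total": Concept(
        "sales.total", "store_sales", "SUM(store_sales.amount)", aggregate=True
    ),
}
# How pairs of tables join, defined once and reused by every query.
JOINS = {
    frozenset({"customer", "store_sales"}): "store_sales.customer_id = customer.id",
}


def compile_select(selected: list[str], base: str = "store_sales") -> str:
    """Compile concept names into SQL; the caller never writes joins or group-bys."""
    concepts = [CONCEPTS[name] for name in selected]
    sql = "SELECT " + ", ".join(f'{c.expr} AS "{c.name}"' for c in concepts)
    sql += f" FROM {base}"
    # Join in any extra tables using the predefined join conditions.
    for table in sorted({c.table for c in concepts} - {base}):
        sql += f" JOIN {table} ON {JOINS[frozenset({base, table})]}"
    # Group by exactly the non-aggregated concepts, no more and no fewer.
    dims = [c.expr for c in concepts if not c.aggregate]
    if dims and len(dims) < len(concepts):
        sql += " GROUP BY " + ", ".join(dims)
    return sql


print(compile_select(["customer.state", "sales.total"]))
```

The point of the design is that the join condition and grouping rule live in one place and can be tested once, instead of being regenerated (and possibly gotten wrong) in every query.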

Beyond removing those sources of error, the much smaller target space for generation, compared with full SQL syntax, cuts down on common LLM mistakes.

This makes it more testable and less prone to hallucination than generating SQL directly.
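
As a rough illustration of why a small target space is easier to test: if the model can only pick from a known set of concepts and operators, its output can be checked mechanically before anything runs, and a hallucinated field is caught immediately. This is a hypothetical sketch; the concept names, operator list, and `validate_selection` helper are assumptions, not part of trilogy-nlp.

```python
# Hypothetical validation sketch; not trilogy-nlp's API.
# The known concepts and allowed operators stand in for a real data model.
KNOWN_CONCEPTS = {"customer.state", "store_sales.date", "store_sales.total"}
ALLOWED_OPERATORS = {"=", "!=", ">", ">=", "<", "<="}


def validate_selection(
    columns: list[str], filters: list[tuple[str, str, object]]
) -> list[str]:
    """Return a list of problems; an empty list means the selection is valid."""
    problems = []
    for col in columns:
        if col not in KNOWN_CONCEPTS:
            problems.append(f"unknown concept: {col}")
    for concept, op, _value in filters:
        if concept not in KNOWN_CONCEPTS:
            problems.append(f"unknown filter concept: {concept}")
        if op not in ALLOWED_OPERATORS:
            problems.append(f"unsupported operator: {op}")
    return problems
```

For example, `validate_selection(["customer.zipcode"], [])` reports the made-up column instead of letting it reach the database.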

Trilogy-NLP is built on common open-source LLM toolchains, but is designed so they can easily be swapped out, and it supports configurable backends (OpenAI, Anthropic, Google, local Llama).

Examples

Tip: These examples use the trilogy-public-models package to get predefined models. Install it with pip install trilogy-public-models

Hello World

from trilogy_public_models import get_executor
from trilogy_nlp import NLPEngine, Provider, CacheType

# we use this to run queries:
# a Trilogy executor preloaded with the tpc_ds schema in DuckDB.
# Executors run queries against a model using an engine
executor = get_executor("duckdb.tpc_ds")

# create an NLP engine
# we use this to generate queries against the model
engine = NLPEngine(
    provider=Provider.OPENAI,
    model="gpt-4o-mini",
    cache=CacheType.SQLLITE,
    cache_kwargs={"database_path": ".demo.db"},
)

# We can pass the executor to the engine
# to run a query directly
results = engine.run_query(
    "What was the store sales for the first 5 days of January 2000 for customers in CA?",
    executor=executor,
)

for row in results:
    print(row)

# Or generate a query without executing it
query = engine.generate_query(
    "What was the store sales for the first 5 days of January 2000 for customers in CA?",
    env=executor.environment,
)

# which the executor can compile to SQL;
# this might be multiple statements in some cases,
# but here we can just grab the last one
print(executor.generate_sql(query)[-1])
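
The example above enables a SQLite cache through CacheType.SQLLITE and a database_path. Caching LLM responses means rerunning the same question against the same model should not cost another API call. The sketch below shows the general idea using the standard library's sqlite3 module; it is an illustrative assumption, not how trilogy_nlp's cache is actually implemented, and `PromptCache` is a hypothetical name.

```python
# Illustrative prompt cache; NOT trilogy_nlp's internal cache implementation.
import hashlib
import sqlite3
from typing import Callable


class PromptCache:
    """Minimal SQLite-backed cache keyed on (model, prompt)."""

    def __init__(self, database_path: str = ":memory:"):
        self.conn = sqlite3.connect(database_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses (key TEXT PRIMARY KEY, response TEXT)"
        )

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Include the model so different models never share cached answers.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, call: Callable[[str], str]) -> str:
        key = self._key(model, prompt)
        row = self.conn.execute(
            "SELECT response FROM responses WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no LLM call
        response = call(prompt)
        with self.conn:  # the context manager commits the insert
            self.conn.execute(
                "INSERT INTO responses (key, response) VALUES (?, ?)", (key, response)
            )
        return response
```

Passing a file path such as ".demo.db" instead of ":memory:" keeps cached answers across runs, which is the behavior the database_path setting suggests.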

BigQuery Example

from trilogy_public_models import models
from trilogy import Executor, Dialects
from trilogy_nlp import build_query

# define the model we want to parse
environment = models["bigquery.stack_overflow"]

# set up a Trilogy executor
# the default BigQuery executor requires local default credentials to be configured
executor = Dialects.BIGQUERY.default_executor(environment=environment)

# build a query off text and the selected model
processed_query = build_query(
    "How many questions are asked per year?",
    environment,
)

# make sure we got reasonable outputs
for concept in processed_query.output_columns:
    print(concept.name)

# and run that to get our answer
results = executor.execute_query(processed_query)
for row in results:
    print(row)

Warning: Don't expect perfection. Results are non-deterministic; review the generated Trilogy to make sure it matches your expectations, and treat queries as a starting point for refinement.
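
Because results are non-deterministic, one cheap sanity check is to generate the same question several times and see whether the outputs agree. In this sketch `generate` is any callable that returns output column names; in practice it might wrap build_query and collect concept.name from output_columns as in the BigQuery example, but that wrapper is an assumption, and `consistency_check` is a hypothetical helper. Run it with caching disabled, otherwise every run will likely return the same cached answer.

```python
# Hypothetical helper; `generate` stands in for whatever produces a query.
from collections import Counter
from typing import Callable, Iterable


def consistency_check(
    generate: Callable[[str], Iterable[str]], question: str, runs: int = 3
) -> tuple[frozenset[str], float]:
    """Generate `runs` times; return the most common set of output columns
    and the fraction of runs that agreed with it."""
    outcomes = Counter(frozenset(generate(question)) for _ in range(runs))
    most_common, count = outcomes.most_common(1)[0]
    return most_common, count / runs
```

Low agreement is a signal that the question is ambiguous against the model and the generated Trilogy deserves a closer review.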

Setting Up Your Environment

We recommend working in a virtual environment with the requirements from both requirements.txt and requirements-test.txt installed. The latter is needed to run the tests (surprise).

trilogy-nlp requires Python 3.10+.
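
If you script your setup, a small guard like the following (a hypothetical helper, not shipped with trilogy-nlp) fails fast with a readable message on an older interpreter.

```python
# Hypothetical version guard; not part of trilogy-nlp.
import sys

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 10)


def check_python_version(version_info: tuple[int, ...] = sys.version_info[:2]) -> None:
    """Raise RuntimeError if the interpreter is older than MIN_PYTHON."""
    major_minor = tuple(version_info[:2])
    if major_minor < MIN_PYTHON:
        raise RuntimeError(
            f"trilogy-nlp requires Python 3.10+, found {major_minor[0]}.{major_minor[1]}"
        )
```

Calling check_python_version() at the top of a setup script surfaces the problem before any dependency installation or import errors.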

OpenAI Config

Requires setting the following environment variables, or passing the values in when creating the NLPEngine.

  • OPENAI_API_KEY
  • OPENAI_MODEL

We recommend "gpt-4o-mini" or a more capable model.

Gemini Config

Requires setting the following environment variable, or passing the value in when creating the NLPEngine.

  • GOOGLE_API_KEY
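
Before creating an engine, it can help to check which of the variables listed above are missing. The mapping below only mirrors the variable names in this README; the `missing_env_vars` helper is hypothetical, and if you pass the values directly to NLPEngine instead, this check does not apply.

```python
# Hypothetical pre-flight check; variable names taken from this README.
import os
from collections.abc import Mapping

REQUIRED_ENV_VARS = {
    "openai": ["OPENAI_API_KEY", "OPENAI_MODEL"],
    "gemini": ["GOOGLE_API_KEY"],
}


def missing_env_vars(provider: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables for `provider` that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS[provider] if not environ.get(name)]


# Example:
# missing = missing_env_vars("openai")
# if missing:
#     raise SystemExit(f"Set {', '.join(missing)} or pass the values to NLPEngine")
```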

LlamaFile Config

Run the llamafile server locally.

Download files


Source Distribution

pytrilogy_nlp-0.1.7.tar.gz (38.0 kB)

Uploaded Source

Built Distribution


pytrilogy_nlp-0.1.7-py3-none-any.whl (40.1 kB)

Uploaded Python 3

File details

Details for the file pytrilogy_nlp-0.1.7.tar.gz.

File metadata

  • Download URL: pytrilogy_nlp-0.1.7.tar.gz
  • Upload date:
  • Size: 38.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for pytrilogy_nlp-0.1.7.tar.gz
  • SHA256: fa06a89959d43d798936bee4f111f4d6596d2cc9cce94a77778e670e60d7c54a
  • MD5: c8c8002f4ecbe37bd9ef6016d8861826
  • BLAKE2b-256: 197766e9cd8db8227443c2491b6d13e3f0aa4e9a6b0f48e220a0c1f4521f7dd6


Provenance

The following attestation bundles were made for pytrilogy_nlp-0.1.7.tar.gz:

Publisher: pythonpublish.yml on trilogy-data/pytrilogy-nlp


File details

Details for the file pytrilogy_nlp-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: pytrilogy_nlp-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 40.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for pytrilogy_nlp-0.1.7-py3-none-any.whl
  • SHA256: efee321de411830f7780efd8a58c87971a5b6e58d86a014b4fb37c2cc1e7b9ed
  • MD5: a1fbdd982c7ab0478db3a0d2484053b6
  • BLAKE2b-256: c02e208df3394e6e6880e2546f134414f51065864e3dae0027a76210b3bb2da5


Provenance

The following attestation bundles were made for pytrilogy_nlp-0.1.7-py3-none-any.whl:

Publisher: pythonpublish.yml on trilogy-data/pytrilogy-nlp

