Skip to main content

A modular text-to-SQL toolkit.

Project description

🐷 piglets

A modular library of text-to-SQL tools.

Status

piglets is currently an alpha-stage package. The API is expected to evolve before 1.0.

Get started

Install

venv

pip install piglets

uv

uv add piglets

Install the optional dependency for the model provider you use. For OpenAI:

venv

pip install "piglets[openai]"

uv

uv add "piglets[openai]"

Other provider extras include anthropic, google_genai, google_vertexai, bedrock, cohere, mistralai, groq, ollama, and openrouter.

Install the optional dependency for the database backend you use. For BigQuery:

venv

pip install "piglets[bigquery]"

uv

uv add "piglets[bigquery]"

Logical planning

Use gpt-5.2 to generate 3 logical plans from a natural language query.

from piglets import LogicalPlanner

# initialise a logical planner
logical_planner = LogicalPlanner('gpt-5.2')

# generate 3 logical plan samples and aggregate them
logical_plan = logical_planner.plan(
    natural_language_query="What was the average number of piglets per week for Q4 2025?",
    num_samples=3,
)

# print the aggregated logical plan
for i, step in enumerate(logical_plan.logical_steps):
    print(f"Step {i + 1}: ")
    print(step)

# inspect the candidate plans used to create the aggregate
print(f"Aggregated from {len(logical_plan.sample_plans)} sample plans.")
>>> Step 1:
>>> 1. Identify all piglet birth (or piglet addition) events with their event dates and piglet counts.
>>> Step 2:
>>> 2. Filter the events to the Q4 2025 date range (Oct 1, 2025 through Dec 31, 2025).
>>> Step 3:
>>> 3. Assign each event to a calendar week within that quarter using a consistent week definition (e.g., week starting Monday or Sunday).
>>> Aggregated from 3 sample plans.
...

Database connector

Use DatabaseConnector to inspect a database and return a typed schema. Pass either a SQLAlchemy URL, a connection string, or one of Piglets' helper URL classes.

from piglets import BigQueryURL, DatabaseConnector

database_connector = DatabaseConnector(
    connection=BigQueryURL(
        dataset="my_bigquery_dataset",
    ),
)

database = database_connector.get_database_schema()

print(database.name)
for table in database.tables:
    print(table.name)
    for column in table.columns:
        print(f"- {column.name} ({column.data_type})")

BigQuery connections can include an explicit GCP project ID:

database_connector = DatabaseConnector(
    connection=BigQueryURL(
        project_id="my-gcp-project",
        dataset="my_bigquery_dataset",
    ),
)

Supported databases

DatabaseConnector supports any database URL accepted by SQLAlchemy. Use URL for SQLAlchemy-native dialects and Piglets helper URL classes where the connection string has backend-specific parameters.

Backend Connection object Install requirement Notes
SQLAlchemy-supported databases URL or a connection string Depends on the SQLAlchemy dialect and DBAPI driver Use this for SQLite, PostgreSQL, MySQL, Oracle, SQL Server, and other standard SQLAlchemy dialects.
BigQuery BigQueryURL piglets[bigquery] Uses GOOGLE_CLOUD_PROJECT, then GOOGLE_CLOUD_PROJECT_ID, when project_id is omitted.
Snowflake SnowflakeURL piglets[snowflake] Builds Snowflake URLs from explicit connection parameters.
DuckDB DuckDBURL piglets[duckdb] Builds local or in-memory DuckDB URLs.
MotherDuck MotherDuckURL piglets[duckdb] Builds MotherDuck URLs through the DuckDB SQLAlchemy dialect.

For a SQLAlchemy-native database, create a standard SQLAlchemy URL:

from piglets import DatabaseConnector, URL

database_connector = DatabaseConnector(
    connection=URL.create(
        drivername="sqlite",
        database="example.db",
    ),
)
database = database_connector.get_database_schema()

For a backend with a Piglets helper class, pass that URL object directly:

from piglets import DatabaseConnector, SnowflakeURL

database_connector = DatabaseConnector(
    connection=SnowflakeURL(
        account="my-account",
        user="my-user",
        password="my-password",
        database="SNOWFLAKE_SAMPLE_DATA",
        schema="TPCH_SF1",
    ),
)
database = database_connector.get_database_schema()

Dual-pathway pruning

Use Pruner to reduce a database schema with both preservation and deletion signals. The preservation pathway selects tables and columns that look useful for the query. The deletion pathway removes tables and columns that look irrelevant. dual_pathway_pruning() combines both paths into a final Database schema.

from piglets import BigQueryURL, DatabaseConnector, LogicalPlanner, Pruner

question = "Which tags saw the largest increase in average answer score from 2022 to 2023, considering only questions with at least 5 answers?"

logical_planner = LogicalPlanner("gpt-5.2")
logical_plan = logical_planner.plan(
    natural_language_query=question,
    num_samples=3,
)

database_connector = DatabaseConnector(
    connection=BigQueryURL(
        dataset="stack_overflow",
    ),
)
database = database_connector.get_database_schema()

pruner = Pruner(model_name="gpt-5.2")
pruned_database = pruner.dual_pathway_pruning(
    natural_language_query=question,
    database=database,
    logical_plan=logical_plan,
)

print(pruned_database.export_as_string())

Semantic Linking

You can use SemanticLinker to combine a Database and a LogicalPlan into schema-specific instructions.

from piglets import SemanticLinker

semantic_linker = SemanticLinker(model_name=MODEL_NAME)

semantic_linking_result = semantic_linker.link(
    natural_language_query=QUESTION,
    database=database,
    logical_plan=logical_plan,
)

Here database is a Database and logical_plan is a LogicalPlan or AggregatePlan.

The semantic_linking_result is of type SemanticLinkingResult and includes the following fields:

  • database_structure: Overview of the database structure relevant to the user query.
  • query_specific_content_analysis: Detailed analysis mapping the query intent to tables, columns, filters, and joins.
  • table_functions: Mapping of table name to its functional role for the query.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

piglets-0.1.16.tar.gz (28.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

piglets-0.1.16-py3-none-any.whl (35.3 kB view details)

Uploaded Python 3

File details

Details for the file piglets-0.1.16.tar.gz.

File metadata

  • Download URL: piglets-0.1.16.tar.gz
  • Upload date:
  • Size: 28.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for piglets-0.1.16.tar.gz
Algorithm Hash digest
SHA256 8763f41da5cbe3028e51f73cb7550c81e565cf06313b0c3b25052b13c620d55c
MD5 d57ed2040dfdccc51d6b044bcd45609d
BLAKE2b-256 cd51fe304d21130ec73d6da953914c0ecc9c4e97a9d648864d7ac141e47a0de4

See more details on using hashes here.

Provenance

The following attestation bundles were made for piglets-0.1.16.tar.gz:

Publisher: publish.yml on mportdata/piglets

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file piglets-0.1.16-py3-none-any.whl.

File metadata

  • Download URL: piglets-0.1.16-py3-none-any.whl
  • Upload date:
  • Size: 35.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for piglets-0.1.16-py3-none-any.whl
Algorithm Hash digest
SHA256 f1637a4bf51f7c935fa7dcd5c1ae05d8b0e98a7e6a667710b6c0c4db03f60558
MD5 9266fcf3e442d198002e892ef1ba54eb
BLAKE2b-256 a923f5bf7f5c38f004cf0617b4f7a409764b43307a074a7082d63e7f4c724bd8

See more details on using hashes here.

Provenance

The following attestation bundles were made for piglets-0.1.16-py3-none-any.whl:

Publisher: publish.yml on mportdata/piglets

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page