Gyyre - Context-Aware Semantic Operators for Machine Learning Pipelines

Gyyre is a research project that extends Python-based machine learning scripts with semantic operators. It builds heavily on the awesome work of the skrub project!

Semantic Operators

  • sem_choose(nl_prompt) -- a semantic drop-in alternative for skrub's choose_from to suggest hyperparameter ranges and other pipeline components
  • sem_fillna(target_column, nl_prompt, impute_with_existing_values_only) -- imputation of missing values in a target column
  • with_sem_features(nl_prompt, how_many) -- automated generation of additional feature columns in dataframes
  • sem_select(nl_prompt) -- a semantic drop-in alternative for skrub's selectors to select columns from dataframes
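
The list above only names the impute_with_existing_values_only flag of sem_fillna. As a rough illustration of one plausible reading of it (imputed values must already occur in the column), here is a dependency-free sketch. It is not Gyyre's implementation: fill_missing and the suggest callable are hypothetical stand-ins, and the suggestions are made-up data.

```python
from typing import Callable, Optional, Sequence


def fill_missing(
    values: Sequence[Optional[str]],
    suggest: Callable[[int], Optional[str]],
    existing_values_only: bool = True,
) -> list[Optional[str]]:
    """Fill None entries using `suggest`, a hypothetical stand-in for a model call."""
    # Values already observed in the column form the allowed vocabulary.
    allowed = {v for v in values if v is not None}
    filled = []
    for index, value in enumerate(values):
        if value is not None:
            filled.append(value)  # observed values are never overwritten
            continue
        candidate = suggest(index)
        if existing_values_only and candidate not in allowed:
            candidate = None  # reject values the column has never contained
        filled.append(candidate)
    return filled


makes = ["Apple", None, "Samsung", None]
suggestions = {1: "Samsung", 3: "Acme"}  # made-up suggestions for rows 1 and 3
result = fill_missing(makes, suggestions.get)
print(result)
```

Under this reading, the suggestion for row 3 is discarded because "Acme" never appears in the column, so that row stays missing. Passing existing_values_only=False would keep it.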

Example

import gyyre
import skrub
from sklearn.ensemble import HistGradientBoostingClassifier
from gyyre import sem_choose

dataset = skrub.datasets.fetch_credit_fraud()

products = skrub.var("products", dataset.products)
baskets = skrub.var("baskets", dataset.baskets)
baskets = baskets.skb.subsample(n=5000, how="random")

basket_ids = baskets[["ID"]].skb.mark_as_X()
fraud_flags = baskets["fraud_flag"].skb.mark_as_y()

# Impute missing values in your data
products = products.sem_fillna(
    target_column="make",
    nl_prompt="Infer the manufacturer from relevant product-related attributes like title or description.",
    impute_with_existing_values_only=True,
)

kept_products = products[products["basket_ID"].isin(basket_ids["ID"])]

# Generate new features for the model to train on
kept_products = kept_products.with_sem_features(
    nl_prompt="""
    Generate additional brand- and manufacturer-related product features. Make sure that they can be
    efficiently computed on large datasets, and that they work across a large number of brands and
    manufacturers. Use your intrinsic knowledge about what products and brands fraudsters focus on
    to make sure that the new features are helpful for the prediction task at hand.
    """,
    name="brand_features",
    how_many=5,
)

vectorizer = skrub.TableVectorizer()
vectorized_products = kept_products.skb.apply_with_sem_choose(
    vectorizer,
    exclude_cols="basket_ID",
    # Choose encoders for your data
    choices=sem_choose(
        high_cardinality="""
        A fast encoder for messy columns with potentially invalid data that can scale to many unique
        values, can handle missing values and that outputs a pandas DataFrame as result.
        """
    ),
)

aggregated_products = vectorized_products.groupby("basket_ID").agg("mean").reset_index()
augmented_baskets = basket_ids.merge(aggregated_products, left_on="ID", right_on="basket_ID").drop(
    columns=["ID", "basket_ID"]
)

hgb = HistGradientBoostingClassifier()
fraud_detector = augmented_baskets.skb.apply_with_sem_choose(
    hgb,
    y=fraud_flags,
    # Get suggestions for hyperparameters
    choices=sem_choose(learning_rate="A range of reasonable learning rates to try")
)
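
To make the aggregation and merge step of the example easier to follow, here is a small sketch of the same groupby / merge / drop chain in plain pandas on toy data. The column names price and n_items and all values are invented for illustration. The sketch assumes the calls in the example behave like their pandas counterparts.

```python
import pandas as pd

# Toy stand-in for the vectorized product table: one row per product.
products = pd.DataFrame(
    {
        "basket_ID": [1, 1, 2],
        "price": [10.0, 30.0, 5.0],
        "n_items": [1.0, 3.0, 2.0],
    }
)
# Toy stand-in for the basket IDs marked as X in the example.
basket_ids = pd.DataFrame({"ID": [1, 2, 3]})

# Average every numeric product feature within each basket.
aggregated = products.groupby("basket_ID").agg("mean").reset_index()

# Inner join (the pandas default): baskets without any products are dropped.
augmented = basket_ids.merge(aggregated, left_on="ID", right_on="basket_ID").drop(
    columns=["ID", "basket_ID"]
)
print(augmented)
```

In this toy data, basket 3 has no products and therefore disappears from the result. Whether that is the desired behaviour depends on the task. Passing how="left" to merge would keep such baskets with missing feature values instead.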
