Regex Inference Engine based on ChatGPT

Introduction to regex_inference:

This is a Python package for inferring a regex from a list of strings using ChatGPT. It supports multi-threading to speed up the interaction with ChatGPT. It also comes with a built-in evaluator that calculates precision, recall, and F1 score to quantify the performance of the inferred regex.

Installation

You can install regex_inference using pip:

pip install regex_inference

Obtain an API key and add it to the environment

  1. Get an OpenAI API key by following the link below:

https://www.maisieai.com/help/how-to-get-an-openai-api-key-for-chatgpt

  2. Set it as an environment variable:

export OPENAI_API_KEY=<key>
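
Before running any inference, it can help to confirm that the variable is actually visible to Python. The check below is a minimal sketch; `has_openai_key` is a hypothetical helper written for this example and is not part of regex_inference.

```python
import os


def has_openai_key(env=None):
    """Return True if OPENAI_API_KEY is set to a non-empty value.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if has_openai_key():
    print("OPENAI_API_KEY is set")
else:
    print("OPENAI_API_KEY is missing; run: export OPENAI_API_KEY=<key>")
```

If the second message appears, the variable was probably exported in a different shell session than the one running Python.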

Basic Usage

Here's a simple guide to using the regex_inference package:

from regex_inference import Evaluator, Inference
import random

# Define the number of training samples
TRAIN_CNT = 200

# Read all patterns from a text file, one per line, skipping blank lines
# (a trailing newline would otherwise add an empty pattern)
with open('data/version.txt', 'r') as f:
    whole_patterns = [line for line in f.read().splitlines() if line]

# Randomly sample some patterns for training
train_patterns = random.sample(whole_patterns, TRAIN_CNT)

# Use the remaining patterns for evaluation
eval_patterns = list(set(whole_patterns) - set(train_patterns))

# Create an instance of Inference class
inferencer = Inference(verbose=False, n_thread=1, engine='fado+ai')

# Run the inferencer on the training patterns
regex = inferencer.run(train_patterns)

# Evaluate the inferred regex
precision, recall, f1 = Evaluator.evaluate(regex, eval_patterns)

# Print the evaluation metrics
print('precision:', precision)
print('recall:', recall)
print('f1:', f1)

In the above code snippet, we first read the patterns from a file. Then we randomly select some of these patterns for training and use the remaining ones for evaluation. After that, we create an instance of the Inference class and run it on the training patterns to infer a regex. Lastly, we evaluate the inferred regex and print the evaluation metrics.
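
To make the three metrics concrete, here is a generic sketch of how precision, recall, and F1 can be computed for a regex treated as a full-match classifier. It assumes labeled positive and negative strings, which is an assumption of this example; `regex_metrics` is a hypothetical function, and the package's own Evaluator may compute its scores differently.

```python
import re


def regex_metrics(regex, positives, negatives):
    """Score `regex` as a full-match classifier.

    positives: strings the regex should match
    negatives: strings the regex should not match
    Returns (precision, recall, f1).
    """
    compiled = re.compile(regex)
    tp = sum(1 for s in positives if compiled.fullmatch(s))  # true positives
    fp = sum(1 for s in negatives if compiled.fullmatch(s))  # false positives
    fn = len(positives) - tp                                 # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative data only: two of the three positives match, no negative matches.
precision, recall, f1 = regex_metrics(
    r"\d+\.\d+\.\d+",
    positives=["1.0.0", "2.13.4", "v3.1.0"],
    negatives=["abc", "1.2"],
)
print("precision:", precision)
print("recall:", recall)
print("f1:", f1)
```

A regex that is too narrow lowers recall, while one that is too permissive lowers precision; F1 balances the two.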

You can adjust the number of threads (n_thread) and choose a different engine (engine) as needed. The Inference class currently supports two engine modes: fado+ai and ai.

The fado+ai engine infers a regex by minimizing a DFA built from the training patterns, converting the DFA to a regex, and asking ChatGPT to generalize that regex to other similar patterns.
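
The idea of turning a set of strings into an exact regex before generalization can be illustrated with a simpler stand-in. The sketch below uses a prefix tree (trie) rather than a minimized DFA, so it is not the package's implementation and does not use FAdo; `build_trie` and `trie_to_regex` are hypothetical names, and the input is assumed to be a non-empty list of strings.

```python
import re


def build_trie(strings):
    """Build a nested-dict prefix tree; the key '' marks the end of a string."""
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_regex(node):
    """Return a regex that matches exactly the suffixes stored under `node`."""
    ends_here = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(node)
        if ch != ""
    ]
    if not branches:
        return ""  # leaf: nothing more to match
    if len(branches) == 1 and not ends_here:
        return branches[0]  # single path, no grouping needed
    group = "(?:" + "|".join(branches) + ")"
    # If a string may also stop at this node, the continuation is optional.
    return group + "?" if ends_here else group


train = ["1.0", "1.1", "2.0"]
exact_regex = trie_to_regex(build_trie(train))
print(exact_regex)
print(all(re.fullmatch(exact_regex, s) for s in train))
```

A regex built this way matches only the training strings, which is why a generalization step (here delegated to ChatGPT by the fado+ai engine) is still needed afterwards.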

The ai engine simply sends the training patterns to ChatGPT and asks it to produce a regex that matches them.
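
As a rough picture of what sending the training patterns to ChatGPT involves, the sketch below assembles a prompt from a list of patterns. The wording of the prompt, the `build_prompt` name, and the `max_examples` cap are all assumptions made for this example; the actual prompt used by regex_inference is not shown in this document, and no API call is made here.

```python
def build_prompt(patterns, max_examples=50):
    """Assemble a hypothetical regex-inference prompt from example strings.

    Only the first `max_examples` patterns are included, so the prompt
    size stays bounded no matter how many patterns are passed in.
    """
    examples = "\n".join(patterns[:max_examples])
    return (
        "Write one regular expression that fully matches every string below "
        "and generalizes to similar strings. Reply with the regex only.\n\n"
        + examples
    )


print(build_prompt(["1.0.0", "2.13.4", "10.2.1"]))
```

Because every training pattern is included verbatim, the prompt grows with the number of patterns sent.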

The fado+ai approach is cheaper than the ai approach because it sends fewer tokens to ChatGPT.

Contribute

We welcome contributions to regex_inference. Whether it's improving the documentation, adding new features, reporting bugs, or any other improvements, we appreciate all kinds of contributions.
