
dpq is an open-source Python library that makes prompt-based data processing and feature engineering easy.


dpq: data. prompt. query.

dpq is a Python library that makes it easy to process data and engineer features using generative AI.


installation

pip install dpq

quick start

import dpq

# Initialize dpq agent with API configuration
dpq_agent = dpq.Agent(
    url="ENDPOINT_URL",
    api_key="YOUR_API_KEY",
    model="MODEL_ID",
    custom_messages_path="OPTIONAL_PATH_TO_CUSTOM_PROMPTS"
)

# Apply a prompt to each item in a list-like iterable, such as a pandas Series
# (df is assumed to be an existing pandas DataFrame with a text column 'some_column')
dpq_agent.classify_sentiment(df['some_column'])

adding functions

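Each function is defined by a JSON file holding a list of chat messages, for example a system prompt followed by an example user input and the expected assistant reply: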

[
    {
        "role": "system",
        "content": "You are a sentiment classifier. You classify statements as having either a positive or negative sentiment. You return only one of two words: positive, negative."
    },
    {
        "role": "user",
        "content": "I like dpq. It makes prompt-based feature engineering a breeze."
    },
    {
        "role": "assistant",
        "content": "positive"
    }
]
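Because a prompt file is plain JSON, it can be checked before use. Note that JSON strings cannot contain literal line breaks, so each `content` value must stay on one line (or use `\n`). The following is a minimal sketch of a hypothetical `validate_messages` helper (not part of dpq) that parses such a definition and checks that every entry has a `role` of `system`, `user`, or `assistant` and a string `content`.

```python
import json

# Roles used in OpenAI-style chat messages.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(raw: str) -> list[dict]:
    """Parse a prompt definition and check its structure.

    Hypothetical helper for illustration; dpq's own loader may differ.
    """
    messages = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(messages, list) or not messages:
        raise ValueError("expected a non-empty list of messages")
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f"message {i} is not an object")
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {i} has an unknown role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {i} has no string content")
    return messages


example = """[
    {"role": "system", "content": "You are a sentiment classifier."},
    {"role": "user", "content": "I like dpq."},
    {"role": "assistant", "content": "positive"}
]"""
print(len(validate_messages(example)))  # 3
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a caller only needs to catch `ValueError` to handle both malformed JSON and a malformed message list.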

To add a new function, save its JSON file in a prompts folder on your system and initialize the dpq agent with custom_messages_path pointing to that folder. The function name is taken from the name of the JSON file; for example, classify_sentiment.json becomes dpq_agent.classify_sentiment.
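Following this naming rule, a file named `return_capital.json` in that folder would become a function named `return_capital`. The sketch below illustrates the file-name-to-function-name mapping with `pathlib`; `load_prompt_folder` and `return_capital` are hypothetical names chosen for illustration, not dpq's actual loader.

```python
import json
import tempfile
from pathlib import Path


def load_prompt_folder(folder: str) -> dict[str, list[dict]]:
    """Map each *.json file in `folder` to its message list, keyed by file stem.

    Hypothetical helper illustrating the naming rule; not dpq's actual code.
    """
    prompts = {}
    for path in sorted(Path(folder).glob("*.json")):
        # "return_capital.json" -> key "return_capital"
        prompts[path.stem] = json.loads(path.read_text(encoding="utf-8"))
    return prompts


# Usage: write a prompt file into a temporary folder, then load it.
with tempfile.TemporaryDirectory() as folder:
    messages = [
        {"role": "system", "content": "You return the capital of a country."},
        {"role": "user", "content": "France"},
        {"role": "assistant", "content": "Paris"},
    ]
    Path(folder, "return_capital.json").write_text(json.dumps(messages), encoding="utf-8")
    prompts = load_prompt_folder(folder)

print(list(prompts))  # ['return_capital']
```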

Alternatively, you can create a new function directly in your code by passing the messages to generate_function.

# Define messages
messages = [
    {
        "role": "system",
        "content": "You return the country of a city."
    },
    {
        "role": "user",
        "content": "Berlin"
    },
    {
        "role": "assistant",
        "content": "Germany"
    },
]

# Add new function
dpq_agent.return_country = dpq_agent.generate_function(messages)

# Apply to a list
dpq_agent.return_country(["Berlin", "London", "Paris"])
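One plausible way such a generated function works is to append each input item as a final user message after the example messages and send the result to the model. The sketch below assumes exactly that; `make_prompt_function` is hypothetical, and the `send` argument stands in for the real API call so the code runs without an API key. It is not dpq's implementation.

```python
from typing import Callable, Iterable


def make_prompt_function(
    messages: list[dict], send: Callable[[list[dict]], str]
) -> Callable[[Iterable[str]], list[str]]:
    """Return a function that applies the few-shot prompt to each item.

    Hypothetical sketch; `send` stands in for the actual API call.
    """
    def apply(items: Iterable[str]) -> list[str]:
        results = []
        for item in items:
            # The item becomes the final user message after the examples;
            # `messages + [...]` builds a new list, leaving `messages` unchanged.
            request = messages + [{"role": "user", "content": str(item)}]
            results.append(send(request))
        return results

    return apply


# A fake sender that upper-cases the last user message, so the sketch runs offline.
def fake_send(request: list[dict]) -> str:
    return request[-1]["content"].upper()


base_messages = [{"role": "system", "content": "You return the country of a city."}]
return_country = make_prompt_function(base_messages, fake_send)
print(return_country(["Berlin", "London"]))  # ['BERLIN', 'LONDON']
```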

examples

In addition to the prompts in the prompts directory, which are loaded by default when you initialize dpq.Agent(), we maintain a library of additional prompts in the examples directory. These are typically more specialized and less general-purpose. Feel free to open a pull request to share prompts you have found useful!

features

  • feature engineering using prompts
  • library of standard functions
  • parallelized by default
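The source does not say how dpq parallelizes requests. Because each item triggers an independent HTTP request, a thread pool is one common approach for this kind of I/O-bound work; the sketch below assumes that technique, and `apply_parallel` is a hypothetical helper. `Executor.map` returns results in input order, so outputs line up with inputs even when requests finish out of order.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_parallel(func: Callable[[T], R], items: Iterable[T], max_workers: int = 8) -> list[R]:
    """Apply `func` to every item using a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


def slow_length(text: str) -> int:
    time.sleep(0.1)  # stands in for a network round trip to the model
    return len(text)


print(apply_parallel(slow_length, ["a", "bb", "ccc"]))  # [1, 2, 3]
```

With three items and eight workers, the three sleeps overlap, so the call takes roughly 0.1 s instead of 0.3 s.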

compatibility

dpq uses the requests library to send OpenAI-style Chat Completions API requests, so it can be pointed at any endpoint that accepts this request format. For GPT-3.5 Turbo, the configuration is as follows.

dpq_agent = dpq.Agent(
    url="https://api.openai.com/v1/chat/completions",
    api_key="YOUR_API_KEY",
    model="gpt-3.5-turbo",
)
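For reference, an OpenAI-style Chat Completions request is a POST with a bearer-token `Authorization` header and a JSON body containing `model` and `messages`. The sketch below assembles such a request without sending it; `build_chat_request` is a hypothetical helper, and dpq builds its requests internally.

```python
import json


def build_chat_request(url: str, api_key: str, model: str, messages: list[dict]) -> tuple[str, dict, str]:
    """Assemble an OpenAI-style Chat Completions request without sending it.

    Hypothetical helper for illustration; not part of dpq's public API.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body


url, headers, body = build_chat_request(
    "https://api.openai.com/v1/chat/completions",
    "YOUR_API_KEY",
    "gpt-3.5-turbo",
    [{"role": "user", "content": "Berlin"}],
)
# With the requests library, this could be sent as:
#   response = requests.post(url, headers=headers, data=body, timeout=60)
# and the reply text read from response.json()["choices"][0]["message"]["content"].
print(json.loads(body)["model"])  # gpt-3.5-turbo
```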

costs and speed

dpq currently comes as is, without cost or speed guarantees. As a very rough estimate: on a test data set of 1,000 product reviews, the classify_sentiment function finishes in approximately 30 seconds (parallelized) on a standard MacBook and costs about $0.05 using gpt-3.5-turbo.
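These figures work out to about $0.00005 and 0.03 seconds of wall-clock time per review. The sketch below extrapolates them to other data set sizes under the loud assumption that time and cost grow linearly with the number of items, which ignores rate limits, varying text lengths, and pricing changes; `estimate` is a hypothetical helper.

```python
# Reference figures from the text: 1,000 reviews, ~30 s, ~$0.05 with gpt-3.5-turbo.
REFERENCE_ITEMS = 1000
REFERENCE_SECONDS = 30
REFERENCE_COST_USD = 0.05


def estimate(n_items: int) -> tuple[float, float]:
    """Return (seconds, dollars) for n_items, assuming linear scaling."""
    seconds = REFERENCE_SECONDS * n_items / REFERENCE_ITEMS
    cost = REFERENCE_COST_USD * n_items / REFERENCE_ITEMS
    return seconds, cost


seconds, cost = estimate(100_000)
print(f"~{seconds / 60:.0f} min, ~${cost:.2f}")  # ~50 min, ~$5.00
```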

is using LLMs a good idea?

Recent studies have shown promising results for using general-purpose LLMs in text annotation and classification. For example, Gilardi, Alizadeh, and Kubli (2023) and Törnberg (2023) report that ChatGPT outperforms crowd workers on a range of annotation tasks. This is an active research area, and we look forward to seeing more results. In general, we believe that LLMs can deliver consistent, high-quality output, making annotation more scalable while reducing time and cost (see also Aguda (2024)).



Download files


Source Distribution

dpq-0.1.4.tar.gz (8.4 kB)

Built Distribution


dpq-0.1.4-py3-none-any.whl (9.6 kB, Python 3)

File details

Details for the file dpq-0.1.4.tar.gz.

File metadata

  • Download URL: dpq-0.1.4.tar.gz
  • Upload date:
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.2 Darwin/22.6.0

File hashes

Hashes for dpq-0.1.4.tar.gz

  • SHA256: a2790b7472ceb8a87e3af8e429dbe88d53cf508e8b594906abab491be4308a82
  • MD5: 28c407595e23ea883f89d542e36c8049
  • BLAKE2b-256: ae3af35f87de3b1c8d9ca202a687a3c5714c1959e370a5a0607f32d4a6d3ff86


File details

Details for the file dpq-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: dpq-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 9.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.2 Darwin/22.6.0

File hashes

Hashes for dpq-0.1.4-py3-none-any.whl

  • SHA256: 028ccebfcaf0cc11bdcc12240fff8403fad707d3c182c8159f3390eca7b7ce08
  • MD5: 0b35ef5114f8128a13d3c319c0da2e7d
  • BLAKE2b-256: 10ead1902cd84547c5e7ea415a347a994e406ae2c75860bd93b2f77612a040ad

