From Dataset Labeling to Deployment: The Power of NLP and LLMs Combined.

PromptedGraphs

Description

PromptedGraphs is a Python library that integrates traditional NLP methods with modern Large Language Models (LLMs) for building knowledge graphs. It offers tools for dataset labeling, model training, and deployment to production environments. We use spaCy for core NLP tasks, Snorkel for data labeling, and asynchronous execution for better performance. Our mission is to provide a single workflow for knowledge graph development when you have to merge traditional and LLM-driven approaches, addressing the challenges of accuracy, efficiency, and affordability.

✨ Features

  • Named Entity Recognition (NER): Customize entity labels based on your domain.
  • Structured Data Extraction: Extract structured data from unstructured text.
  • Entity Resolution: Deduplicate and normalize entities.
  • Relationship Extraction: Use open-ended labels or constrain them to your domain.
  • Entity Linking: Link references in text to entities in a graph.
  • Graph Construction: Create or update knowledge graphs.
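
To make the Entity Resolution bullet concrete, here is a minimal sketch of deduplication and normalization on raw mention strings. It is not PromptedGraphs' implementation; the function names `normalize_mention` and `deduplicate` are hypothetical, and the approach (Unicode normalization, case folding, whitespace collapsing) is only one simple way to group surface forms.

```python
import unicodedata
from collections import defaultdict


def normalize_mention(mention: str) -> str:
    """Return a canonical key: NFKC-normalized, case-folded, single-spaced."""
    text = unicodedata.normalize("NFKC", mention)
    return " ".join(text.casefold().split())


def deduplicate(mentions: list[str]) -> dict[str, list[str]]:
    """Group surface forms that share the same normalized key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for mention in mentions:
        groups[normalize_mention(mention)].append(mention)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative mentions only; three of them collapse into one entity key.
    print(deduplicate(["Acme Corp", "acme  corp", "ACME CORP ", "Globex"]))
```

A real pipeline would likely add fuzzy matching or LLM-assisted merging on top of exact-key grouping like this.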

Core Functions

  • Dataset Labeling: Efficient tools for labeling datasets, powered by haystack.
  • Model Training: Combine the reliability of NLP and the prowess of LLMs.
  • Deployment: Streamlined processes to ensure smooth transition to production.

Requirements

  • Python 3.10 or newer.

📦 Installation

To install PromptedGraphs via pip:

pip install promptedgraphs
# or
poetry add promptedgraphs
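
After installing, it can help to confirm that the interpreter meets the Python 3.10 requirement and that the package is visible to it. The sketch below uses only the standard library; the helper names `meets_python_requirement` and `installed_version` are hypothetical and not part of PromptedGraphs.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def meets_python_requirement(minimum: tuple = (3, 10)) -> bool:
    """True when the running interpreter is at least the given version."""
    return sys.version_info >= minimum


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", meets_python_requirement())
    print("promptedgraphs version:", installed_version("promptedgraphs"))
```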

Usage

Entity Recognition

from examples/er_reviews.ipynb

from promptedgraphs.config import Config
from promptedgraphs.extraction.entities_from_text import entities_from_text
from promptedgraphs.vis import render_entities  # used below to display the results

labels = {
    "POSITIVE": "A positive review of a product or service.",
    "NEGATIVE": "A negative review of a product or service.",
    "NEUTRAL": "A neutral review of a product or service.",
}

text_of_reviews = """
1. "I absolutely love this product. It's been a game changer!"
2. "The service was quite poor and the staff was rude."
3. "The item is okay. Nothing special, but it gets the job done."
""".strip()


# Label Sentiment
ents = []
async for msg in entities_from_text(
    name="sentiment",
    description="Sentiment Analysis of Customer Reviews",
    text=text_of_reviews,
    labels=labels,
    config=Config(),  # Reads `OPENAI_API_KEY` from .env file or environment
):
    ents.append(msg)

# Show Results using spacy.displacy
render_entities(
    text=text_of_reviews,
    entities=ents,
    labels=labels,
    colors={"POSITIVE": "#7aecec", "NEGATIVE": "#f44336", "NEUTRAL": "#f4f442"},
)

[Image: displacy-sentiment-example]
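
The example above uses a top-level `async for`, which works in a notebook but is a syntax error in a plain Python script. The sketch below shows one way to drive an async generator from synchronous code with `asyncio.run`. To stay self-contained it uses `fake_entities_from_text`, a hypothetical stand-in that yields one item per line; in real use you would pass the generator returned by `entities_from_text(...)` to `collect` instead.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_entities_from_text(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for entities_from_text: yields one item per line."""
    for line in text.splitlines():
        await asyncio.sleep(0)  # yield control to the event loop between items
        yield line


async def collect(stream: AsyncIterator[str]) -> list[str]:
    """Drain an async generator into a list."""
    return [item async for item in stream]


def run_extraction(text: str) -> list[str]:
    """Synchronous entry point; asyncio.run creates and closes the event loop."""
    return asyncio.run(collect(fake_entities_from_text(text)))


if __name__ == "__main__":
    print(run_extraction("first review\nsecond review"))
```

Note that `asyncio.run` cannot be called from inside an already running event loop, so keep the `async for` form when working in a notebook.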

Brainstorming Data

Generate a list of data that fits a given data model.

from examples/er_reviews.ipynb

from pydantic import BaseModel, Field

from promptedgraphs.config import Config
from promptedgraphs.ideation import brainstorm
from promptedgraphs.vis import render_entities


class BusinessIdea(BaseModel):
    """A business idea generated using the Jobs-to-be-done framework
    For example "We help [adj] [target_audience] [action] so they can [benefit or do something else]"
    """

    target_audience: str = Field(title="Target Audience")
    action: str = Field(title="Action")
    benefit: str = Field(title="Benefit or next action")
    adj: str | None = Field(
        default=None,  # without a default, Pydantic v2 treats this field as required
        title="Adjective",
        description="Optional adjective describing the target audience's condition",
    )


ideas = []
async for idea in brainstorm(
    text=BusinessIdea.__doc__,
    output_type=list[BusinessIdea],
    config=Config(),
    n=10,
    max_workers=2,
):
    ideas.append(idea)
    render_entities(
        f"We help {idea.adj} {idea.target_audience} {idea.action} so they can {idea.benefit}",
        idea,
    )

[Image: brainstorm-examples]
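
Because `adj` is optional, the f-string in the loop above would render the literal text "None" whenever the model omits the adjective. A small helper can skip it. This is a sketch: `format_idea` is a hypothetical name, and `SimpleNamespace` stands in for a `BusinessIdea` instance so the example runs without Pydantic.

```python
from types import SimpleNamespace


def format_idea(idea) -> str:
    """Build the Jobs-to-be-done sentence, omitting the adjective when it is None or empty."""
    audience = f"{idea.adj} {idea.target_audience}" if idea.adj else idea.target_audience
    return f"We help {audience} {idea.action} so they can {idea.benefit}"


if __name__ == "__main__":
    # Illustrative values only; any object with these four attributes works.
    with_adj = SimpleNamespace(
        adj="busy", target_audience="parents", action="plan meals", benefit="save time"
    )
    without_adj = SimpleNamespace(
        adj=None, target_audience="parents", action="plan meals", benefit="save time"
    )
    print(format_idea(with_adj))     # We help busy parents plan meals so they can save time
    print(format_idea(without_adj))  # We help parents plan meals so they can save time
```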

Structured Data Extraction

from examples/de_chatintents.ipynb

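from pydantic import BaseModel, Field

from promptedgraphs.config import Config
# Import path assumed by analogy with promptedgraphs.extraction.entities_from_text
from promptedgraphs.extraction.data_from_text import data_from_text
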

class UserIntent(BaseModel):
    """The UserIntent entity, representing the canonical description of what a user desires to achieve in a given conversation."""

    intent_name: str = Field(
        title="Intent Name",
        description="Canonical name of the user's intent",
        examples=[
            "question",
            "command",
            "clarification",
            "chit_chat",
            "greeting",
            "feedback",
            "nonsensical",
            "closing",
            "harassment",
            "unknown",
        ],
    )
    description: str | None = Field(
        title="Intent Description",
        description="A detailed explanation of the user's intent",
    )


msg = """It's a busy day, I need to send an email and to buy groceries"""

async for intent in data_from_text(
    text=msg, output_type=UserIntent, config=Config()
):
    print(intent)

Output:

intent_name='task' description='User wants to complete a task'
intent_name='communication' description='User wants to send an email'
intent_name='shopping' description='User wants to buy groceries'
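
The printed intents (`task`, `communication`, `shopping`) are not among the names listed in the `examples` argument, which suggests the examples guide the model rather than constrain it. If downstream code needs a closed set, one option is to map anything outside that set to a fallback after extraction. The sketch below is not part of PromptedGraphs; `CANONICAL_INTENTS` and `normalize_intent` are hypothetical names, and the set mirrors the `examples` list from the model above.

```python
CANONICAL_INTENTS = frozenset({
    "question", "command", "clarification", "chit_chat", "greeting",
    "feedback", "nonsensical", "closing", "harassment", "unknown",
})


def normalize_intent(name: str, allowed=CANONICAL_INTENTS, fallback: str = "unknown") -> str:
    """Snake-case the name and return it if allowed, otherwise the fallback."""
    key = name.strip().lower().replace(" ", "_").replace("-", "_")
    return key if key in allowed else fallback


if __name__ == "__main__":
    for raw in ["Chit Chat", "question", "shopping"]:
        print(raw, "->", normalize_intent(raw))
```

Mapping to a fallback discards information, so logging the original name alongside the normalized one is usually worthwhile.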

📚 Resources

Related Libraries

Contributing

We welcome contributions! DM @seankruzel, open an issue, or submit a pull request.

📝 License

This project is licensed under the terms of the MIT license.

Built with quantready, using the template at https://github.com/closedloop-technologies/quantready-api
