Skip to main content

PyLLMCore provides a light-weighted interface with LLMs

Project description

PyLLMCore

Overview

PyLLMCore provides a light-weighted structured interface with Large Language Models API.

Use cases with examples are described :

  • Question answering using Chain of Verification
  • Summarize using the Chain of Density prompting
  • parse unstructured text and obtain Python objects (a populated dataclass)
  • extract information
  • describe arbitrary tasks for the LLM to perform (translation, ...)

The current version supports OpenAI API through the openai package.

Changelog

  • 1.2.0: Chain of density prompting implemented
  • 1.1.0: Chain of Verification implemented
  • 1.0.0: Initial version

How to install

pip install py-llm-core

# Add you OPENAI_API_KEY to the environment
export OPENAI_API_KEY=sk-<replace with your actual api key>

Use cases

Question answering with Chain of Verification

The following example implements the technique from the paper Chain-of-Verification Reduces Hallucination in Large Language Models from Shehzaad Dhuliawala et al. (2023)

>>> from llm_core.assistants import COVQuestionAnswering
>>> cov_qa = COVQuestionAnswering.ask(
...     question="Name some politicians who were born in NY, New York"
... )
>>> print(cov_qa.revised_answer)

Some politicians who were born in NY, New York include Donald Trump,
Franklin D. Roosevelt, Theodore Roosevelt, and Andrew Cuomo.

Summarizing with Chain of Density Prompting

The following example implements the technique from the paper From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting from Adams et al. (2023).

>>> from llm_core.assistants import DenserSummaryCollection
>>> import wikipedia
>>> text = wikipedia.page("Foundation from Isaac Asimov").content
>>> summary_collection = DenserSummaryCollection.summarize(text)
>>> print(summary_collection)
DenserSummaryCollection(
    summaries=[
        DenseSummary(
            denser_summary="""This article discusses the Foundation series, a
            science fiction book series written by American author Isaac Asimov.
            The series was first published as a series of short stories and
            novellas in 1942–50, and subsequently in three collections in
            1951–53. The premise of the stories is that in the waning days of
            a future Galactic Empire, the mathematician Hari Seldon spends his
            life developing a theory of psychohistory, a new and effective
            mathematics of sociology.""",
            missing_entities=["Isaac Asimov", "Hari Seldon", "psychohistory"],
        ),

        ...

        DenseSummary(
            denser_summary="""Isaac Asimov's Foundation series, inspired by
            Edward Gibbon's History of the Decline and Fall of the Roman
            Empire, explores Hari Seldon's psychohistory predicting the fall of
            a future Galactic Empire and a 30,000-year Dark Age. Seldon's plan
            aims to limit this interregnum to a thousand years. The series,
            initially a trilogy, was expanded with two sequels and two
            prequels. The plot follows the series' in-universe chronology, not
            the order of publication, and won the one-time Hugo Award for
            'Best All-Time Series' in 1966.""",
            missing_entities=["Hugo Award for 'Best All-Time Series'"],
        ),
    ]
)

Parsing

When given unstructured content, LLMs are powerful enough to extract information and produce structured content.

We use a dataclass as a light-weighted structure to hold parsed data.

from dataclasses import dataclass
from llm_core.parsers import OpenAIParser


@dataclass
class Book:
    title: str
    summary: str
    author: str
    published_year: int


text = """Foundation is a science fiction novel by American writer
Isaac Asimov. It is the first published in his Foundation Trilogy (later
expanded into the Foundation series). Foundation is a cycle of five
interrelated short stories, first published as a single book by Gnome Press
in 1951. Collectively they tell the early story of the Foundation,
an institute founded by psychohistorian Hari Seldon to preserve the best
of galactic civilization after the collapse of the Galactic Empire.
"""


with OpenAIParser(Book) as parser:
    book = parser.parse(text)
    print(book)
Book(
    title='Foundation',
    summary="""Foundation is a cycle of five interrelated
    short stories, first published as a single book by Gnome Press in 1951.
    Collectively they tell the early story of the Foundation, an institute
    founded by psychohistorian Hari Seldon to preserve the best of galactic
    civilization after the collapse of the Galactic Empire.""",
    author='Isaac Asimov',
    published_year=1951
)

Summary and advanced information extraction

We can use all the abilities to perform all kind of text processing with the same class.

from typing import List
import wikipedia
from dataclasses import dataclass
from llm_core.parsers import OpenAIParser


@dataclass
class Book:
    title: str
    summary: str
    author: str
    published_year: int


@dataclass
class BookCollection:
    books: List[Book]


text = wikipedia.page("Foundation from Isaac Asimov").content


with OpenAIParser(BookCollection, model='gpt-3.5-turbo-16k') as parser:
    book_collection = parser.parse(text)
    print(book_collection)
BookCollection(
    books=[
        Book(
            title="Foundation",
            summary="The first book in the Foundation series. It introduces the concept of psychohistory and follows the mathematician Hari Seldon as he predicts the fall of the Galactic Empire and establishes the Foundation to preserve knowledge and shorten the Dark Age.",
            author="Isaac Asimov",
            published_year=1951,
        ),
        ...
        
        Book(
            title="Forward the Foundation",
            summary="The final book in the Foundation series. It takes place eight years after Prelude to Foundation and explores Hari Seldon's final years as he works to establish the Second Foundation.",
            author="Isaac Asimov",
            published_year=1993,
        ),
    ]
)

Performing arbitrary tasks (summary, translations,...)

When a task should be performed by the language model, we add an explicit prompt (and system_prompt) to the desired structure.

from typing import List
from dataclasses import dataclass
from llm_core.assistants import OpenAIAssistant


@dataclass
class SummaryWithInsights:
    system_prompt = """
    You are a world-class copy writer and work in broad domains.
    You help users produce better analysis of content by summarizing
     written content.
    """

    prompt = """
    Article:

    {content}

    - Summarize the previous content in approx. {word_count} words.
    - Provide a list key facts

    """

    summary: str
    facts: List[str]

    @classmethod
    def summarize(cls, content, word_count=100):
        with OpenAIAssistant(cls, model='gpt-3.5-turbo-16k') as assistant:
            return assistant.process(content=content, word_count=word_count)




import wikipedia  # Run `make test-setup` to install wikipedia package
text = wikipedia.page("Foundation from Isaac Asimov").content

response = SummaryWithInsights.summarize(text)

print(response)
SummaryWithInsights(
    summary="""The Foundation series is a science fiction book series written 
    by Isaac Asimov. It was first published as a series of short stories and
     novellas in 1942–50 and subsequently in three collections in 1951–53. The 
     series follows the mathematician Hari Seldon as he develops a theory of
      psychohistory, a new mathematics of sociology that can predict the future
       of large populations. The series explores the rise and fall of a future
        Galactic Empire and the efforts of the Foundation to preserve 
        civilization during a Dark Age. The series has had a significant
         cultural impact and has won several awards.""",
    facts=[
        "The Foundation series was written by Isaac Asimov.",
        "It was first published as a series of short stories and novellas in 1942–50.",
        "The series follows the mathematician Hari Seldon and his development of psychohistory.",
        "The series explores the rise and fall of a future Galactic Empire.",
        "The Foundation works to preserve civilization during a Dark Age.",
        "The series has had a significant cultural impact and has won several awards.",
    ],
)

Tokenizer

Tiktoken library is registered as a codec within the Python codecs registry :

import llm_core
import codecs

text = """Foundation is a science fiction novel by American writer
Isaac Asimov. It is the first published in his Foundation Trilogy (later
expanded into the Foundation series). Foundation is a cycle of five
interrelated short stories, first published as a single book by Gnome Press
in 1951. Collectively they tell the early story of the Foundation,
an institute founded by psychohistorian Hari Seldon to preserve the best
of galactic civilization after the collapse of the Galactic Empire.
"""


# You can encode the text into tokens like that:

tokens = codecs.encode(text, 'gpt-3.5-turbo')

print(tokens)
[19137, 374, 264, 8198, ... 627]


print(len(tokens))

100

Chunking and splitting

from llm_core.splitters import TokenSplitter


text = """Foundation is a science fiction novel by American writer
Isaac Asimov. It is the first published in his Foundation Trilogy (later
expanded into the Foundation series). Foundation is a cycle of five
interrelated short stories, first published as a single book by Gnome Press
in 1951. Collectively they tell the early story of the Foundation,
an institute founded by psychohistorian Hari Seldon to preserve the best
of galactic civilization after the collapse of the Galactic Empire.
"""


splitter = TokenSplitter(chunk_size=50, chunk_overlap=0)

for chunk in splitter.chunkify(text):
    print(chunk)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

py-llm-core-1.2.0.tar.gz (12.8 kB view details)

Uploaded Source

Built Distribution

py_llm_core-1.2.0-py3-none-any.whl (12.5 kB view details)

Uploaded Python 3

File details

Details for the file py-llm-core-1.2.0.tar.gz.

File metadata

  • Download URL: py-llm-core-1.2.0.tar.gz
  • Upload date:
  • Size: 12.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for py-llm-core-1.2.0.tar.gz
Algorithm Hash digest
SHA256 37554d06d3bd7d95bdbeb6b4c768003781ddf3574fc50e9573a579e2677c123f
MD5 5e053facd59d82c7d6932c319df7bc7b
BLAKE2b-256 df93a76a8a428c1e6f24dbdb039980f9dbe0ab73f77c6d3b98a99d6ec6f09cdd

See more details on using hashes here.

File details

Details for the file py_llm_core-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: py_llm_core-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for py_llm_core-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4c602bae454ce835b9549731be382510cf3d74c4146bbe7b9f06a19e0cb02cd1
MD5 b4797ece8c62468697eed9d67a00ff5f
BLAKE2b-256 b59401f93085598ce143f46c1fb68e91f28b669d9b2d72ba6bfdeaac604cefa2

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page