
A Pathway LLM App

LLM App

Pathway's LLM (Large Language Model) App enables AI applications that provide real-time, human-like responses to user queries, based on the most up-to-date knowledge available in a document store. What sets LLM App apart is that it does not require a separate vector database, which avoids the complex and fragmented stacks typical of LLM applications (such as Pinecone/Weaviate + Langchain + Redis + FastAPI +...). Your document data remains secure and undisturbed in its original storage location. LLM App is designed for high performance and is easy to customize and extend. It is particularly recommended for privacy-preserving LLM applications.

To get started, explore one of the examples:

| Example | Description |
| --- | --- |
| contextless | This simple example calls the OpenAI ChatGPT API but does not use an index when processing queries. It relies solely on the given user query. We recommend it to start your Pathway LLM journey. |
| contextful | This default example of the app will index the documents located in the data/pathway-docs directory. These indexed documents are then taken into account when processing queries. The Pathway pipeline being run in this mode is located at examples/pipelines/contextful/app.py. |
| contextful_s3 | This example operates similarly to the contextful mode. The main difference is that the documents are stored and indexed from an S3 bucket, allowing the handling of a larger volume of documents. This can be more suitable for production environments. |
| local | This example runs the application using Hugging Face Transformers, which eliminates the need for the data to leave the machine. It provides a convenient way to use state-of-the-art NLP models locally. |

Quick links - 💡Use cases 📚 How it works 🌟 Key Features 🏁 Getting Started 🛠️ Troubleshooting 👥 Contributing

Use cases

LLM App examples can be used as templates for developing multiple applications running on top of Pathway. Here are examples of possible uses:

  • Build your own Discord AI chatbot that answers questions (this is what you see covered in the video!). Or any similar AI chatbot.
  • Ask privacy-preserving queries to an LLM using a private knowledge base that is frequently updated.
  • Extend Kafka-based streaming architectures with LLMs.
  • Process LLM queries in bulk with prompts created automatically out of input data streams.
  • Obtain structured data on the fly out of streams of documents.
  • Validate incoming documents against existing documents with an LLM.
  • Monitor live information streams with an LLM: news and social media, spotting fake news, travel disruptions...

How it works

The default contextful LLM App takes a set of documents stored in AWS S3 or locally on your computer. It then processes and organizes these documents by building a 'vector index' using the Pathway package. It waits for user queries that arrive as HTTP REST requests, uses the index to find relevant documents, and responds in natural language using the OpenAI API or Hugging Face. The cool part is that the app is always aware of changes in the documents. If new pieces of information are added, it updates its index in real time and uses this new knowledge to answer subsequent questions. In this way, its answers reflect the most recent data.
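
The flow described above (index documents, retrieve the most relevant ones for a query, build a prompt, and pick up new documents as they arrive) can be illustrated with a small, self-contained sketch. This is not Pathway's implementation: the bag-of-words "embedding", the `ToyIndex` class, and the `build_prompt` helper are all hypothetical stand-ins chosen so the example runs with the standard library only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical in-memory index that can be updated between queries."""

    def __init__(self) -> None:
        self._docs = {}  # doc_id -> (text, vector)

    def upsert(self, doc_id: str, text: str) -> None:
        # Adding or changing a document updates the index immediately.
        self._docs[doc_id] = (text, embed(text))

    def top_k(self, query: str, k: int = 2) -> list:
        q = embed(query)
        ranked = sorted(self._docs.values(), key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(query: str, context: list) -> str:
    """Combine the retrieved documents and the user query into one prompt."""
    docs = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these documents:\n{docs}\n\nQuestion: {query}"


index = ToyIndex()
index.upsert("a", "Pathway reads messages from Kafka topics with a Kafka connector.")
index.upsert("b", "Poetry installs the Python dependencies of the project.")
before = index.top_k("How do I use LLMs?", k=1)

# A new document arrives; the next query sees it without a separate rebuild step.
index.upsert("c", "LLMs can be called from a pipeline to answer user questions.")
after = index.top_k("How do I use LLMs?", k=1)
print(build_prompt("How do I use LLMs?", after))
```

In the real app, the prompt would then be sent to the OpenAI API or a local Hugging Face model; that call is left out here to keep the sketch offline.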

The app can also be combined with streams of fresh data, such as news feeds or status reports, either through REST or a technology like Kafka. It can also be combined with extra static data sources and user-specific contexts, for example to eliminate ambiguity problems of natural language with clearer prompts and better contexts.

Read more about the implementation details and how to extend this application in our blog article.

Watch it in action

Build your LLM App without a vector database (in 30 lines of code)

▶️ Building an LLM Application without a vector database - by Jan Chorowski

Features

Key Features

  • HTTP REST queries - The system is capable of responding in real time to HTTP REST queries.
  • Real-time document indexing pipeline - This pipeline reads data directly from S3-compatible storage, without the need to query an extra vector document database.
  • Code reusability for offline evaluation - The same code can be used for static evaluation of the system.
  • Model testing - Present and past queries can be run against fresh models to evaluate their quality.
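
As a rough illustration of the offline evaluation and model testing points above, logged queries can be replayed through two answering functions and the results compared. The `replay` and `changed_answers` helpers and both `model_*` functions are hypothetical; a real setup would call actual models and use a proper quality metric rather than a plain equality check.

```python
from typing import Callable, Dict, List


def replay(queries: List[str], answer: Callable[[str], str]) -> Dict[str, str]:
    """Run every logged query through one answering function."""
    return {q: answer(q) for q in queries}


def changed_answers(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return the queries whose answer differs between two runs."""
    return [q for q in old if old[q] != new.get(q)]


# Hypothetical stand-ins for two model versions; real ones would call an LLM.
def model_v1(query: str) -> str:
    return query.upper()


def model_v2(query: str) -> str:
    return query.upper() if "kafka" in query.lower() else query.title()


logged = ["How to connect to Kafka?", "How to use LLMs?"]
diff = changed_answers(replay(logged, model_v1), replay(logged, model_v2))
print(diff)  # queries that deserve a closer look
```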

Advanced Features

  • Local Machine Learning models - LLM App can be configured to run with local Machine Learning models, without making API calls outside of the User's Organization.

  • Live data sources - It can also be extended to handle live data sources (news feeds, APIs, data streams in Kafka), to include user permissions, a data security layer, and an LLMops monitoring layer.

  • User session handling - The query-building process can be extended to handle user sessions.

  • To learn more about advanced features see: Features for Organizations.

Coming Soon:

  • Splitting the application into indexing and request-serving processes easily.
  • Expanding context doc selection with a graph walk.
  • Model drift and monitoring setup.
  • A guide to model A/B testing.

Getting Started

Follow these steps to install the app and get started.

Step 1: Clone the repository

This is done with the git clone command followed by the URL of the repository:

git clone https://github.com/pathwaycom/llm-app.git

Next, navigate to the repository:

cd llm-app

Step 2: Set environment variables

Create an .env file in the root directory and add the following environment variables, adjusting their values according to your specific requirements and setup.

| Environment Variable | Description |
| --- | --- |
| APP_VARIANT | Determines which pipeline to run in your application. Available modes are [contextful, contextful_s3, contextless, local]. By default, the mode is set to contextful. |
| PATHWAY_REST_CONNECTOR_HOST | Specifies the host IP for the REST connector in Pathway. For the dockerized version, set it to 0.0.0.0. Natively, you can use 127.0.0.1. |
| PATHWAY_REST_CONNECTOR_PORT | Specifies the port number on which the Pathway REST connector service should listen. Here, it is set to 8080. |
| OPENAI_API_TOKEN | The API token for accessing OpenAI services. If you are not running the local version, please remember to replace it with your personal API token, which you can generate from your account on openai.com. |
| PATHWAY_CACHE_DIR | Specifies the directory where the cache is stored. You could use /tmp/cache. |

For example:

APP_VARIANT=contextful
PATHWAY_REST_CONNECTOR_HOST=0.0.0.0
PATHWAY_REST_CONNECTOR_PORT=8080
OPENAI_API_TOKEN=<Your Token>
PATHWAY_CACHE_DIR=/tmp/cache
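
To sanity-check an .env file before starting the app, a small script can parse it and apply fallback values. This is a minimal sketch, not how the app itself reads its configuration: `parse_env_file` and `load_settings` are hypothetical helpers, and the fallback values for host, port, and cache directory are assumptions taken from the suggestions above rather than confirmed defaults.

```python
def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(env: dict) -> dict:
    """Apply fallback values (assumed for illustration, not read from the app)."""
    return {
        "variant": env.get("APP_VARIANT", "contextful"),
        "host": env.get("PATHWAY_REST_CONNECTOR_HOST", "127.0.0.1"),
        "port": int(env.get("PATHWAY_REST_CONNECTOR_PORT", "8080")),
        "api_token": env.get("OPENAI_API_TOKEN"),  # None when not set
        "cache_dir": env.get("PATHWAY_CACHE_DIR", "/tmp/cache"),
    }


sample = """
APP_VARIANT=contextful
PATHWAY_REST_CONNECTOR_HOST=0.0.0.0
PATHWAY_REST_CONNECTOR_PORT=8080
PATHWAY_CACHE_DIR=/tmp/cache
"""
settings = load_settings(parse_env_file(sample))
print(settings["variant"], settings["host"], settings["port"])
```

To check the variables of the current shell instead of a file, `load_settings(dict(os.environ))` (after `import os`) would work the same way.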

Step 3: Build and run the app

You can install and run the LLM App in two different ways.

Using Docker

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Here is how to use Docker to build and run the LLM App:

```bash
docker compose run --build --rm -p 8080:8080 llm-app-examples
```

If you have set a different port in PATHWAY_REST_CONNECTOR_PORT, replace the second 8080 with this port in the command above.

When the process is complete, the App will be up and running inside a Docker container and accessible at localhost:8080. From there, you can proceed to "Step 4: Start to use it" for information on how to interact with the application.

Native Approach

Important: The instructions in this section are intended for users operating Unix-like systems (such as Linux, macOS, BSD). If you are a Windows user, we highly recommend leveraging Windows Subsystem for Linux (WSL) or Docker, as outlined in the previous sections, to ensure optimal compatibility and performance.

  • Install poetry:

    pip install poetry
    
  • Install llm_app and dependencies:

    poetry install --with examples --extras local
    

    You can omit the --extras local part if you're not going to run the local example.

  • Run the examples: You can start an example, here contextful, with the command:

    poetry run ./run_examples.py contextful
    

Step 4: Start to use it

  1. Send REST queries (in a separate terminal window): These are examples of how to interact with the application once it's running. curl is a command-line tool used to send data using various network protocols. Here, it's being used to send HTTP requests to the application.

    curl --data '{"user": "user", "query": "How to connect to Kafka in Pathway?"}' http://localhost:8080/
    
    curl --data '{"user": "user", "query": "How to use LLMs in Pathway?"}' http://localhost:8080/
    

    If you are on Windows CMD, the query would instead look like this:

    curl --data "{\"user\": \"user\", \"query\": \"How to use LLMs in Pathway?\"}" http://localhost:8080/
    
  2. Test reactivity by adding a new file: This shows how to test the application's ability to react to changes in data by adding a new file and sending a query.

    cp ./data/documents_extra.jsonl ./data/pathway-docs/
    

    Or if using docker compose:

    docker compose exec llm-app-examples mv /app/examples/data/documents_extra.jsonl /app/examples/data/pathway-docs/
    

    Let's query again:

    curl --data '{"user": "user", "query": "How to use LLMs in Pathway?"}' http://localhost:8080/
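
The curl commands above can also be issued from Python. The following is a minimal sketch using only the standard library; `build_query_request` and `ask` are hypothetical helper names, the URL assumes the default port 8080, and the response is returned as raw text because its exact format is not specified here.

```python
import json
import urllib.request


def build_query_request(query: str, user: str = "user",
                        url: str = "http://localhost:8080/") -> urllib.request.Request:
    """Build a POST request with the same JSON body as the curl commands."""
    body = json.dumps({"user": user, "query": query}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def ask(query: str, timeout: float = 30.0) -> str:
    """Send the query to a running app and return the raw response text."""
    with urllib.request.urlopen(build_query_request(query), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request needs no running server; ask() does.
request = build_query_request("How to use LLMs in Pathway?")
print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

With the app running, `print(ask("How to use LLMs in Pathway?"))` before and after copying the extra file should show whether the new document is picked up.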
    

Step 5: Build your own Pathway-powered LLM App

Simply add llm-app to your project's dependencies and copy one of the examples to get started!

Troubleshooting

Please check out our Q&A to get solutions for common installation problems and other issues.

Raise an issue

To provide feedback or report a bug, please raise an issue on our issue tracker.

Contributing

Anyone who wishes to contribute to this project, whether documentation, features, bug fixes, code cleanup, testing, or code reviews, is very much encouraged to do so.

To join, just raise your hand on the Pathway Discord server (#get-help) or the GitHub discussion board.

If you are unfamiliar with how to contribute to GitHub projects, here is a Getting Started Guide. A full set of contribution guidelines, along with templates, is in progress.

Supported and maintained by

Pathway is a free ultra-performant data processing framework to power your real-time data products and pipelines. To learn more, check out Pathway's website.
