Integration: TitanML

Use TitanML's Takeoff server to serve local models efficiently with Haystack 2.0

Introduction

You can use the Takeoff inference server to deploy local models efficiently in your Haystack 2.0 pipelines. Takeoff is a state-of-the-art inference server focused on deploying openly available language models at scale. It can run LLMs on local machines with consumer GPUs as well as on cloud infrastructure.

The TakeoffGenerator component in Haystack 2.0 is a wrapper around the Takeoff server API. Use it to serve Takeoff-deployed models efficiently in Haystack pipelines.

Installation

pip install takeoff_haystack

Usage

You can interact with Takeoff-deployed models using the TakeoffGenerator component in Haystack. To do so, you must first have a model deployed with Takeoff. For deployment instructions, see the Takeoff documentation.

The following example uses Takeoff to deploy the TheBloke/Llama-2-7B-Chat-AWQ model locally on port 3000.

docker run --gpus all -e TAKEOFF_MODEL_NAME=TheBloke/Llama-2-7B-Chat-AWQ \
                      -e TAKEOFF_DEVICE=cuda \
                      -e TAKEOFF_MAX_SEQUENCE_LENGTH=256 \
                      -it \
                      -p 3000:3000 tytn/takeoff-pro:0.11.0-gpu
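
Before wiring the generator into a pipeline, it can help to confirm that something is listening on the mapped port. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical and not part of takeoff_haystack. It assumes the container is reachable on localhost at port 3000, as in the command above. An open port only shows that the container accepts TCP connections, not that the model has finished loading.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 3000 matches the -p 3000:3000 mapping in the docker command (an assumption
    # if you changed the mapping).
    if is_port_open("localhost", 3000):
        print("Port 3000 is accepting connections.")
    else:
        print("Nothing is listening on port 3000 yet.")
```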

Text Generation

Below is an example of using Takeoff models in a Haystack RAG pipeline. It summarizes headlines from popular technology news sites.

from typing import Dict, List
from haystack import Document, Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder  
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
import feedparser
from takeoff_haystack import TakeoffGenerator

# Dict of website RSS feeds  
urls = {
  'theverge': 'https://www.theverge.com/rss/frontpage/',
  'techcrunch': 'https://techcrunch.com/feed',
  'mashable': 'https://mashable.com/feeds/rss/all',
  'cnet': 'https://cnet.com/rss/news',
  'engadget': 'https://engadget.com/rss.xml',
  'zdnet': 'https://zdnet.com/news/rss.xml',
  'venturebeat': 'https://feeds.feedburner.com/venturebeat/SZYF',
  'readwrite': 'https://readwrite.com/feed/',    
  'wired': 'https://wired.com/feed/rss',
  'gizmodo': 'https://gizmodo.com/rss',
}

# Configurable parameters
NUM_WEBSITES = 3  
NUM_TITLES = 1

def get_titles(urls: Dict[str, str], num_sites: int, num_titles: int) -> List[str]:
  titles: List[str] = []
  sites = list(urls.keys())[:num_sites]
  
  for site in sites:
    feed = feedparser.parse(urls[site])  
    entries = feed.entries[:num_titles]
    
    for entry in entries:
      titles.append(entry.title)
      
  return titles
  
titles = get_titles(urls, NUM_WEBSITES, NUM_TITLES)
titles_string = " - ".join(titles)

document_store = InMemoryDocumentStore()
document_store.write_documents([Document(content=titles_string)])

template = """
HEADLINES:  
{% for document in documents %}
  {{ document.content }}  
{% endfor %}
REQUEST: {{ query }}
"""

pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", TakeoffGenerator(base_url="http://localhost", port="3000"))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

query = f"Summarize each of the {NUM_WEBSITES * NUM_TITLES} provided headlines in three words."
response = pipe.run({"prompt_builder": {"query": query}, "retriever": {"query": query}})
print(response["llm"]["replies"])

You should see a response like the following:

['\n\n\nANSWER:\n\n1. Poker Roguelike - Exciting gameplay\n2. AI-powered news reader - Personalized feed\n3. Best laptops MWC 2024 - Powerful devices']
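
The reply arrives as a single string inside a list, so some post-processing is usually needed to work with the individual summaries. The following is a minimal sketch, assuming the model answers with a numbered list as in the sample above. The helper `split_numbered_reply` is hypothetical and not part of takeoff_haystack, and the model is not guaranteed to follow this format on every run.

```python
import re
from typing import List


def split_numbered_reply(reply: str) -> List[str]:
    """Extract the text of each 'N. ...' line from a generated reply."""
    items: List[str] = []
    for line in reply.splitlines():
        # Match lines such as "1. Some summary" and keep the text after the number.
        match = re.match(r"\s*\d+\.\s+(.*\S)", line)
        if match:
            items.append(match.group(1))
    return items


# Sample reply copied from the example output above.
replies = ['\n\n\nANSWER:\n\n1. Poker Roguelike - Exciting gameplay\n2. AI-powered news reader - Personalized feed\n3. Best laptops MWC 2024 - Powerful devices']

summaries = split_numbered_reply(replies[0])
for summary in summaries:
    print(summary)
```

Lines that do not start with a number, such as the `ANSWER:` header, are skipped, so a reply in an unexpected format yields an empty list rather than an error.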
