MLflow automatic tracing for txtai

This project is an extension that adds MLflow automatic tracing for txtai.

Installation

The easiest way to install is via pip and PyPI:

pip install mlflow-txtai
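
To confirm the package landed in the active environment, a quick check along these lines may help. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and is not part of mlflow-txtai.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string when mlflow-txtai is installed, otherwise None
print(installed_version("mlflow-txtai"))
```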

Examples

The following examples show how this plugin works. The project's example notebook also contains all of them.

Initialization

The code below initializes the environment. It assumes an MLflow server is running locally, which can be started with this command:

mlflow server --host 127.0.0.1 --port 8000

With the server running, set the tracking URI and enable tracing:

import mlflow

mlflow.set_tracking_uri(uri="http://localhost:8000")
mlflow.set_experiment("txtai")

# Enable txtai automatic tracing
mlflow.txtai.autolog()
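
Because the setup assumes a server is already listening, it can be useful to check that before running the examples. The following is a minimal sketch using only the standard library. The helper name `server_reachable` is hypothetical, and the sketch assumes the MLflow server answers a plain HTTP GET on `/health`; adjust the path if your deployment differs.

```python
import http.client
import urllib.error
import urllib.request


def server_reachable(base_url, timeout=2.0):
    """Return True if an HTTP GET to <base_url>/health succeeds with status 200."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, bad status or malformed URL
        return False


# Same address as the tracking URI configured above
print(server_reachable("http://localhost:8000"))
```

If this prints False, start the server first; otherwise traces have nowhere to go.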

Textractor

The first example traces a Textractor pipeline.

from txtai.pipeline import Textractor

with mlflow.start_run():
    textractor = Textractor()
    textractor("https://github.com/neuml/txtai")

Embeddings

Next, we'll trace an Embeddings query.

from txtai import Embeddings

with mlflow.start_run():
    wiki = Embeddings()
    wiki.load(provider="huggingface-hub", container="neuml/txtai-wikipedia-slim")

    embeddings = Embeddings(content=True, graph=True)
    embeddings.index(wiki.search("SELECT id, text FROM txtai LIMIT 25"))

    embeddings.search("MATCH (A)-[]->(B) RETURN A")

Retrieval Augmented Generation (RAG)

The next example traces a RAG pipeline.

from txtai import Embeddings, RAG

with mlflow.start_run():
    wiki = Embeddings()
    wiki.load(provider="huggingface-hub", container="neuml/txtai-wikipedia-slim")

    # Define prompt template
    template = """
    Answer the following question using only the context below. Only include information
    specifically discussed.

    question: {question}
    context: {context} """

    # Create RAG pipeline
    rag = RAG(
        wiki,
        "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
        system="You are a friendly assistant. You answer questions from users.",
        template=template,
        context=10
    )

    rag("Tell me about the Roman Empire", maxlength=2048)
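
To make the role of the `{question}` and `{context}` placeholders concrete, the sketch below fills the same template with plain `str.format`. It is an illustration only: the helper `build_prompt` is hypothetical, the passages are placeholders, and joining context passages with newlines is an assumption, not a statement about how txtai's RAG pipeline builds its prompt internally.

```python
# Template text from the RAG example above (indentation dropped)
template = """
Answer the following question using only the context below. Only include information
specifically discussed.

question: {question}
context: {context} """


def build_prompt(question, passages):
    """Fill the template with a question and newline-joined context passages."""
    context = "\n".join(passages)
    return template.format(question=question, context=context)


# Placeholder passages standing in for search results
prompt = build_prompt(
    "Tell me about the Roman Empire",
    ["Placeholder passage one.", "Placeholder passage two."],
)
print(prompt)
```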

Workflow

This example traces a workflow that runs an embeddings query and then translates each result to French.

from txtai import Embeddings, Workflow
from txtai.pipeline import Translation
from txtai.workflow import Task

with mlflow.start_run():
    wiki = Embeddings()
    wiki.load(provider="huggingface-hub", container="neuml/txtai-wikipedia-slim")

    # Translation instance
    translate = Translation()

    workflow = Workflow([
        Task(lambda x: [y[0]["text"] for y in wiki.batchsearch(x, 1)]),
        Task(lambda x: translate(x, "fr"))
    ])

    print(list(workflow(["Roman Empire", "Greek Empire", "Industrial Revolution"])))
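
The chaining idea behind the workflow above can be sketched in plain Python: each task takes a batch (a list) and returns a new list, which feeds the next task. This is a simplified illustration, not txtai's Workflow implementation; `run_tasks`, `fake_search` and `fake_translate` are hypothetical stand-ins that need no models.

```python
def run_tasks(tasks, elements):
    """Pass a batch of elements through each task in order and return the final batch."""
    for task in tasks:
        elements = task(elements)
    return elements


def fake_search(queries):
    """Stand-in for the search task: pretend each query returns one matching text."""
    return [f"text about {query}" for query in queries]


def fake_translate(texts):
    """Stand-in for the translation task: tag each text with the target language."""
    return [f"[fr] {text}" for text in texts]


results = run_tasks([fake_search, fake_translate], ["Roman Empire", "Industrial Revolution"])
print(results)
# → ['[fr] text about Roman Empire', '[fr] text about Industrial Revolution']
```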

Agent

The last example runs a txtai agent designed to research questions on astronomy.

from txtai import Agent, Embeddings

def search(query):
    """
    Searches a database of astronomy data.

    Make sure to call this tool only with a string input, never use JSON.

    Args:
        query: concepts to search for using similarity search

    Returns:
        list of search results with id, text and distance for each match
    """

    return embeddings.search(
        "SELECT id, text, distance FROM txtai WHERE similar(:query)",
        10, parameters={"query": query}
    )

embeddings = Embeddings()
embeddings.load(provider="huggingface-hub", container="neuml/txtai-astronomy")

agent = Agent(
    tools=[search],
    llm="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    max_iterations=10,
)

researcher = """
{command}

Do the following.
 - Search for results related to the topic.
 - Analyze the results
 - Continue querying until conclusive answers are found
 - Write a Markdown report
"""

with mlflow.start_run():
    agent(researcher.format(command="""
    Write a detailed list with explanations of 10 candidate stars that could potentially be habitable to life.
    """), maxlength=16000)
