MLflow Automatic Tracing for txtai

This project is an extension that adds MLflow automatic tracing for txtai.

Installation

The easiest way to install is via pip and PyPI.

pip install mlflow-txtai

Examples

The following examples show how this plugin works. An accompanying notebook also has all of these examples.

Initialization

The following code initializes the environment. It assumes an MLflow server is running locally, which can be started as follows.

mlflow server --host 127.0.0.1 --port 8000

import mlflow
import mlflow_txtai

mlflow.set_tracking_uri(uri="http://localhost:8000")
mlflow.set_experiment("txtai")

mlflow_txtai.autolog()

Textractor

The first example traces a Textractor pipeline.

from txtai.pipeline import Textractor

with mlflow.start_run():
    textractor = Textractor()
    textractor("https://github.com/neuml/txtai")

(Screenshot: textractor trace)
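
Captured traces can also be retrieved programmatically. The following is a minimal sketch using MLflow's trace search API; it assumes the active experiment is still the txtai experiment set during initialization.

# List the most recent traces for the active experiment as a DataFrame
traces = mlflow.search_traces(max_results=5)
print(traces.head())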

Embeddings

Next, we'll trace an Embeddings query.

from txtai import Embeddings

with mlflow.start_run():
    wiki = Embeddings()
    wiki.load(provider="huggingface-hub", container="neuml/txtai-wikipedia-slim")

    embeddings = Embeddings(content=True, graph=True)
    embeddings.index(wiki.search("SELECT id, text FROM txtai LIMIT 25"))

    embeddings.search("MATCH (A)-[]->(B) RETURN A")

(Screenshots: embeddings load and embeddings index traces)
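
Plain vector searches against the same index are traced in the same way. The following is a minimal sketch reusing the index built above; the query text and result limit are illustrative.

with mlflow.start_run():
    # Run a standard semantic search; the call is captured as its own trace
    embeddings.search("Roman Empire", 3)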

Retrieval Augmented Generation (RAG)

The next example traces a RAG pipeline.

from txtai import Embeddings, RAG

with mlflow.start_run():
    wiki = Embeddings()
    wiki.load(provider="huggingface-hub", container="neuml/txtai-wikipedia-slim")

    # Define prompt template
    template = """
    Answer the following question using only the context below. Only include information
    specifically discussed.

    question: {question}
    context: {context} """

    # Create RAG pipeline
    rag = RAG(
        wiki,
        "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
        system="You are a friendly assistant. You answer questions from users.",
        template=template,
        context=10
    )

    rag("Tell me about the Roman Empire", maxlength=2048)

(Screenshot: rag trace)
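
The same pipeline can answer additional questions, each producing its own trace. The following is a minimal sketch reusing the pipeline above; the question is illustrative.

with mlflow.start_run():
    # Ask another question with the same RAG pipeline
    rag("Tell me about Julius Caesar", maxlength=2048)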

Workflow

This example runs a workflow that executes an embeddings query and then translates each result to French.

from txtai import Embeddings, Workflow
from txtai.pipeline import Translation
from txtai.workflow import Task

with mlflow.start_run():
    wiki = Embeddings()
    wiki.load(provider="huggingface-hub", container="neuml/txtai-wikipedia-slim")

    # Translation instance
    translate = Translation()

    workflow = Workflow([
        Task(lambda x: [y[0]["text"] for y in wiki.batchsearch(x, 1)]),
        Task(lambda x: translate(x, "fr"))
    ])

    print(list(workflow(["Roman Empire", "Greek Empire", "Industrial Revolution"])))

(Screenshot: workflow trace)

Agent

The last example runs a txtai agent designed to research questions on astronomy.

from txtai import Agent, Embeddings

def search(query):
    """
    Searches a database of astronomy data.

    Make sure to call this tool only with a string input, never use JSON.    

    Args:
        query: concepts to search for using similarity search

    Returns:
        list of search results (id, text, distance) for each match
    """

    return embeddings.search(
        "SELECT id, text, distance FROM txtai WHERE similar(:query)",
        10, parameters={"query": query}
    )

embeddings = Embeddings()
embeddings.load(provider="huggingface-hub", container="neuml/txtai-astronomy")

agent = Agent(
    tools=[search],
    llm="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    max_iterations=10,
)

researcher = """
{command}

Do the following.
 - Search for results related to the topic.
 - Analyze the results
 - Continue querying until conclusive answers are found
 - Write a Markdown report
"""

with mlflow.start_run():
    agent(researcher.format(command="""
    Write a detailed list with explanations of 10 candidate stars that could potentially be habitable to life.
    """), maxlength=16000)

(Screenshot: agent trace)
