Run multiple 'Large Language Model' predictions against a table. The predictions run row-wise over a specified column.

Project description

Use Large Language Models (LLMs) to run Natural Language Processing (NLP) operations against your data. It takes advantage of the LLM's general language training to produce the predictions, removing the need to train a new, traditional NLP model. mall is available for R and Python.

It works by running multiple LLM predictions against your data. The predictions are processed row-wise over a specified column. The package includes prompts to perform the following specific NLP operations:

  • Sentiment analysis
  • Text summarization
  • Text classification
  • Extract one, or several, specific pieces of information from the text
  • Translate text
  • Verify that something is true about the text (binary)
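The row-wise pattern behind all of these operations can be sketched in plain Python. This is an illustrative stand-in only, not mall's actual internals: `fake_llm` is a hypothetical placeholder for the real model call, and `row_wise` mimics how one prompt is run per row of a column.

```python
# Illustrative sketch only: one LLM prediction per row of a column.
# `fake_llm` is a hypothetical stand-in for the real model call.

def fake_llm(prompt: str) -> str:
    # Pretend the model classifies sentiment from simple keywords.
    text = prompt.lower()
    if "best" in text or "great" in text:
        return "positive"
    if "regret" in text:
        return "negative"
    return "neutral"

def row_wise(column: list[str], instruction: str) -> list[str]:
    # One prediction per row: prepend the instruction, call the model.
    return [fake_llm(f"{instruction}\n{text}") for text in column]

reviews = [
    "This has been the best TV I've ever used.",
    "I regret buying this laptop.",
]
print(row_wise(reviews, "Classify the sentiment of this text:"))
# -> ['positive', 'negative']
```

The real package sends each row's text, wrapped in a task-specific prompt, to the configured LLM and collects the answers into a new column.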

For other NLP operations, mall offers the ability for you to write your own prompt.

mall lets you use local and external LLMs, such as OpenAI, Gemini, and Anthropic. It uses the chatlas package to perform the integration. mall is a library extension to Polars. To interact with Ollama, it uses the official Python library.

Get started

Install mall:

  • From PyPI:

    pip install mlverse-mall
    
  • From GitHub:

    pip install "mall @ git+https://git@github.com/mlverse/mall.git#subdirectory=python"
    

LLM functions

We will start with loading a very small data set contained in mall. It has 3 product reviews that we will use as the source of our examples.

import mall 
reviews = mall.MallData.reviews
reviews 
review
"This has been the best TV I've ever used. Great screen, and sound."
"I regret buying this laptop. It is too slow and the keyboard is too noisy"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"

Because mall is loaded, the reviews Polars data frame contains a new namespace named llm. This is the namespace that enables access to all of the NLP functions.

Setup

The connection to the LLM is created via a Chat object from chatlas. For this article, an Ollama chat connection is created:

from chatlas import ChatOllama
chat = ChatOllama(model = "llama3.2", seed = 100)

Now, reviews is “told” to use the chat object by calling .llm.use(). In this case, the _cache path is set in order to re-render this article faster as edits are made to the prose:

reviews.llm.use(chat, _cache = "_readme_cache")
{'backend': 'chatlas',
 'chat': <Chat Ollama/llama3.2 turns=0 tokens=0/0>,
 '_cache': '_readme_cache'}

Sentiment

Automatically returns “positive”, “negative”, or “neutral” based on the text.

reviews.llm.sentiment("review")
review sentiment
"This has been the best TV I've ever used. Great screen, and sound." "positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "neutral"

Summarize

There may be a need to reduce the number of words in a given text, typically to make its intent easier to understand. The function has an argument that controls the maximum number of words to output (max_words):

reviews.llm.summarize("review", 5)
review summary
"This has been the best TV I've ever used. Great screen, and sound." "exceptional tv for its price"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "not a good laptop purchase"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "some assembly required included"

Classify

Use the LLM to categorize the text into one of the options you provide:

reviews.llm.classify("review", ["computer", "appliance"])
review classify
"This has been the best TV I've ever used. Great screen, and sound." "appliance"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "appliance"
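One way to picture what .llm.classify() does is as a constrained-choice prompt built from the options you pass in. The sketch below is hypothetical, not mall's actual internal prompt; the helper name `build_classify_prompt` is invented for illustration.

```python
# Hypothetical sketch of a constrained-choice prompt like the one a
# classify operation needs; this is NOT mall's actual internal prompt.

def build_classify_prompt(options: list[str]) -> str:
    labels = ", ".join(options)
    return (
        f"Classify the following text as one of: {labels}. "
        "Return only the label, with no explanation."
    )

print(build_classify_prompt(["computer", "appliance"]))
# -> Classify the following text as one of: computer, appliance. Return only the label, with no explanation.
```

Constraining the answer to a fixed label set is what makes the resulting column safe to group and filter on afterwards.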

Extract

This is one of the most interesting use cases. Using natural language, we can tell the LLM to return a specific part of the text. In the following example, we request that the LLM return the product being referred to. We do this by simply saying “product”. The LLM understands what we mean by that word and looks for it in the text.

reviews.llm.extract("review", "product")
review extract
"This has been the best TV I've ever used. Great screen, and sound." "tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "washing machine"

Verify

This function allows you to check whether a statement is true, based on the provided text. By default, it returns 1 for “yes” and 0 for “no”. This can be customized.

reviews.llm.verify("review", "is the customer happy with the purchase")
review verify
"This has been the best TV I've ever used. Great screen, and sound." 1
"I regret buying this laptop. It is too slow and the keyboard is too noisy" 0
"Not sure how to feel about my new washing machine. Great color, but hard to figure" 0

Translate

As the title implies, this function translates the text into a specified language. What is really nice is that you don’t need to specify the language of the source text; only the target language needs to be defined. The translation accuracy will depend on the LLM.

reviews.llm.translate("review", "spanish")
review translation
"This has been the best TV I've ever used. Great screen, and sound." "Este ha sido la mejor televisión que he utilizado. Una pantalla excelente y buena calidad de sonido."
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "Me arrepiento de haber comprado este portátil. Es demasiado lento y la tecla tiene un ruido excesivo…
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "No estoy seguro de cómo sentirme sobre mi nueva lavadora. Una buena cromática, pero difícil de compr…

Custom prompt

It is possible to pass your own prompt to the LLM and have mall run it against each text entry:

my_prompt = (
    "Answer a question."
    "Return only the answer, no explanation."
    "Only 'yes' and 'no' are the acceptable answers."
    "If unsure about the answer, return 'no'."
    "Answer this about the following text: 'is this a happy customer?':"
)

reviews.llm.custom("review", prompt = my_prompt)
review custom
"This has been the best TV I've ever used. Great screen, and sound." "Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "No"
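A small Python detail worth noting in the prompt above: adjacent string literals concatenate with no separator, so the parenthesized lines become one run-on string. The model usually copes either way, but trailing spaces keep the sentences separated. A minimal demonstration:

```python
# Adjacent string literals concatenate with no separator, so a prompt
# written this way becomes one run-on string.
my_prompt = (
    "Answer a question."
    "Return only the answer, no explanation."
)
print(my_prompt)  # -> Answer a question.Return only the answer, no explanation.

# Adding trailing spaces keeps the sentences separated:
spaced_prompt = (
    "Answer a question. "
    "Return only the answer, no explanation."
)
print(spaced_prompt)  # -> Answer a question. Return only the answer, no explanation.
```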

Results caching

By default, mall caches the requests and corresponding results from a given LLM run. Each response is saved as an individual JSON file. By default, the folder name is _mall_cache. The folder name can be customized, if needed. Caching can also be turned off by setting the argument to an empty string (""):

reviews.llm.use(_cache = "my_cache")

To turn off:

reviews.llm.use(_cache = "")
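To see why this matters, here is a sketch of how per-request JSON caching could work: key each request by a hash of its content, and serve repeats from disk instead of calling the model again. The key scheme and file layout here are hypothetical; mall's actual implementation may differ.

```python
# Hypothetical sketch of request caching: key each request by a hash of
# its content and store the response as an individual JSON file.
import hashlib
import json
import os
import tempfile
from pathlib import Path

def cached_call(prompt: str, model_fn, cache_dir: str = "_mall_cache") -> str:
    if not cache_dir:                      # "" disables caching
        return model_fn(prompt)
    folder = Path(cache_dir)
    folder.mkdir(exist_ok=True)
    key = hashlib.sha256(prompt.encode()).hexdigest()
    path = folder / f"{key}.json"
    if path.exists():                      # cache hit: skip the LLM call
        return json.loads(path.read_text())["response"]
    response = model_fn(prompt)            # cache miss: call, then save
    path.write_text(json.dumps({"prompt": prompt, "response": response}))
    return response

calls = []
def fake_model(prompt: str) -> str:
    calls.append(prompt)
    return "positive"

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "demo_cache")
    cached_call("Classify: great TV", fake_model, cache)
    cached_call("Classify: great TV", fake_model, cache)  # served from cache
print(len(calls))  # -> 1: the model was only called once
```

The practical payoff is the one described above: re-running the same script, or re-rendering a document, does not re-bill or re-wait for rows the LLM has already answered.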

Vectors

mall also includes a class to work with character vectors. This is a separate module from the Polars extension, but it offers the same functionality. To start, import the LLMVec class from mall and assign it to a new variable. The constructor works just like <df>.llm.use(); this is where the cache can be specified.

from mall import LLMVec
llm_ollama = LLMVec(chat, _cache="_readme_cache")

To use it, call the same NLP functions used with data frames. For example, sentiment:

llm_ollama.sentiment(["I am happy", "I am sad"])
['positive', 'negative']

The functions also return a character vector. As mentioned before, all of the same functions are accessible via this class:

  • Classify
  • Custom
  • Extract
  • Sentiment
  • Summarize
  • Translate
  • Verify

Key considerations

The main consideration is cost, either in time or in money.

If using this method with a locally available LLM, the cost will be a long running time. Unless you use a very specialized model, a given LLM is a general model: it was fitted using a vast amount of data, so determining a response for each row takes longer than with a purpose-built NLP model. The default model used in Ollama is Llama 3.2, which has 3 billion parameters.

If using an external LLM service, the consideration will be the billing cost of using that service. Keep in mind that you will be sending a lot of data to be evaluated.

Another consideration is the novelty of this approach. Early tests are providing encouraging results, but you, as a user, will still need to keep in mind that the predictions are not infallible, so always check the output. At this time, I think the best use for this method is quick analysis.

Project details


Download files


Source Distribution

mlverse_mall-0.2.0.tar.gz (362.6 kB)

Uploaded Source

Built Distribution


mlverse_mall-0.2.0-py3-none-any.whl (12.9 kB)

Uploaded Python 3

File details

Details for the file mlverse_mall-0.2.0.tar.gz.

File metadata

  • Download URL: mlverse_mall-0.2.0.tar.gz
  • Upload date:
  • Size: 362.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for mlverse_mall-0.2.0.tar.gz:

  • SHA256: 0292c4603e3f137798cffc71de9a52a8c59ec915ebbcaf8f96afee26c9452783
  • MD5: eae27ae6bea898e83507998e244cfe83
  • BLAKE2b-256: 15e3faeb46e692e704abfc572c6c7150632798aefcc06005e3be20361941c7b7


File details

Details for the file mlverse_mall-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: mlverse_mall-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 12.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for mlverse_mall-0.2.0-py3-none-any.whl:

  • SHA256: 56895d8eda45f46d9cdf53875e892565eff8e5319969d9043dfe2fa922cf9f62
  • MD5: 05c670e43d48a898090756e8e8265948
  • BLAKE2b-256: 064d3bf0b11523bfc4b7e33f114843b9510b11ea5c2c1c0351d0a5fda14af45a

