A trained RL sales policy that augments any LLM: it picks the strategic move, your LLM writes the words.


rl-sales-augment

A trained RL sales policy that augments any LLM. The reinforcement-learning policy has learned, from a chaotic multi-segment sales world, which strategic move works next given the buyer's state. At serve time it reads the conversation, picks the move (RAPPORT, PITCH, HANDLE_OBJECTION, DISCOUNT, CLOSE, ...), and your LLM writes the words. The policy is bundled in the package — no training or GPU required to use it.

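The LLM handles language and empathy; the RL policy supplies the timing and strategy that the LLM does not get from its priors. In a grounded conversational A/B, the same GPT/Gemini/Claude model closes far more deals with the policy than without it. The measured run reported under Evidence used Gemini 3.5 Flash: 16 paired conversations, 100% closed with the policy, at about 3.5× the revenue.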

▶ 67-second demo — real paired transcripts: the same LLM answers objections forever (no close) vs closes in 8 turns with the policy choosing the moves.

Install

pip install rl-sales-augment                 # core (numpy + torch) + the bundled model
pip install "rl-sales-augment[gemini]"       # + Google Gemini (Vertex or API key)
pip install "rl-sales-augment[openai]"       # + OpenAI
pip install "rl-sales-augment[anthropic]"    # + Anthropic Claude
pip install "rl-sales-augment[gemma]"        # + local Gemma 4 via transformers (Python >=3.10)
pip install "rl-sales-augment[all]"          # everything

The core installs on any Python that PyTorch supports (3.9–3.13); provider SDKs and the local Gemma path are optional extras.

Quickstart

import rl_sales_augment as rsa

# 1. pick any LLM  (or bring your own: gen = lambda prompt: my_llm(prompt))
gen = rsa.providers.gemini_vertex(project="my-gcp-project")     # uses your gcloud ADC, no API key
# gen = rsa.providers.openai_chat(model="gpt-4o")
# gen = rsa.providers.anthropic_chat(model="claude-opus-4-8")

# 2. load the bundled policy and wrap the LLM (optionally ground it in your company's facts)
bot = rsa.load_agent(gen, company_ctx="""
Company: NimbusEdge. Products: NimbusBox (on-prem appliance, ~$8k), NimbusOne (edge SaaS, ~$2k/mo).
Edge: 1-day deploy, ~30% lower TCO than hyperscalers.
""")
bot.new_conversation(segment=7)     # optional bias (0-9, see rsa.SEG_NAMES)

# 3. converse — perception -> RL move -> grounded reply, with internal memory
out = bot.reply("honestly it feels expensive compared to just using AWS")
print(out["chosen_move"])   # e.g. 'RAPPORT'  (the RL-chosen strategy)
print(out["belief"])        # perceived buyer state {interest, trust, budget_fit, objection, patience}
print(out["reply"])         # the words your LLM produced for that move

bot keeps its own memory (belief state + full history), so it works over any stateless LLM API.
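
To make the perception -> RL move -> grounded reply loop concrete, here is a minimal sketch of the control flow with toy stand-ins. Everything in it is hypothetical: `toy_perceive` uses keyword spotting instead of an LLM, `toy_policy` is a hand-written rule rather than the bundled trained policy, and `ToyAgent` is not the package's `AugmentedAgent`. It only illustrates how an agent that owns its history can sit on top of a stateless `prompt -> str` function.

```python
from typing import Callable, Dict, List

Belief = Dict[str, float]


def toy_perceive(history: List[dict]) -> Belief:
    """Stand-in for the perception step: keyword spotting, not an LLM call."""
    last = history[-1]["content"].lower()
    objection = 1.0 if ("expensive" in last or "budget" in last) else 0.0
    return {"interest": 0.5, "trust": 0.5, "budget_fit": 0.5,
            "objection": objection, "patience": 0.5}


def toy_policy(belief: Belief) -> str:
    """Stand-in for the trained policy: a fixed rule, NOT the bundled model."""
    return "HANDLE_OBJECTION" if belief["objection"] > 0.5 else "PITCH"


class ToyAgent:
    """Keeps the conversation itself, so the LLM callable can stay stateless."""

    def __init__(self, generate: Callable[[str], str]):
        self.generate = generate
        self.history: List[dict] = []

    def reply(self, user_text: str) -> dict:
        self.history.append({"role": "user", "content": user_text})
        belief = toy_perceive(self.history)   # 1. read the buyer's state
        move = toy_policy(belief)             # 2. pick the strategic move
        prompt = f"Strategic move for this turn: {move}\nBuyer said: {user_text}"
        text = self.generate(prompt)          # 3. the LLM writes the words
        self.history.append({"role": "assistant", "content": text})
        return {"chosen_move": move, "belief": belief, "reply": text}


# A fake LLM that echoes the first prompt line, so the sketch runs offline.
agent = ToyAgent(lambda prompt: "[LLM reply for] " + prompt.splitlines()[0])
out = agent.reply("that sounds expensive for us")
print(out["chosen_move"])
print(out["reply"])
```

The returned dict mirrors the keys shown in the Quickstart (`chosen_move`, `belief`, `reply`); the values come from the toy rule, not from the real policy.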

Providers & models

Every provider takes a model= argument (defaults reflect mid-2026 lineups; pass any id you have):

rsa.providers.gemini_vertex(project="...", model="gemini-3.5-flash")   # or gemini-3.5-pro
rsa.providers.gemini_api(model="gemini-3.5-flash")                     # AI Studio key
rsa.providers.openai_chat(model="gpt-5.5")                             # or gpt-5.4, gpt-5.6-*
rsa.providers.anthropic_chat(model="claude-sonnet-5")                  # or claude-opus-4-8
rsa.providers.gemma_e2b(model="google/gemma-4-E2B-it")                 # local, needs [gemma]

Conversation history & chat templates

The agent keeps the full conversation and sends it to the LLM as native chat turns (proper system instruction + user/assistant roles), not history flattened into one string — so the model has real multi-turn context. The RL-chosen move for the current turn goes in the system prompt.
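
As an illustration of the difference between native chat turns and flattened history, the sketch below assembles a system instruction that carries the current move and passes the prior turns as separate role-tagged messages. The function name `build_turn_messages` and the wording of the system prompt are assumptions made for this example; the package's actual prompt text is not shown in this README.

```python
from typing import List, Optional


def build_turn_messages(move: str, history: List[dict],
                        company_ctx: Optional[str] = None) -> dict:
    """Return a system instruction plus role-tagged turns (hypothetical layout)."""
    system_parts = ["You are a sales assistant."]
    if company_ctx:
        system_parts.append("Company facts:\n" + company_ctx.strip())
    # The move chosen for the current turn lives in the system prompt.
    system_parts.append(f"Strategic move for this turn: {move}")
    # Prior turns stay as separate user/assistant messages (copied, not mutated).
    turns = [dict(m) for m in history if m["role"] in ("user", "assistant")]
    return {"system": "\n\n".join(system_parts), "history": turns}


history = [
    {"role": "user", "content": "hey, what does NimbusBox cost?"},
    {"role": "assistant", "content": "Depends on the setup. What are you running today?"},
    {"role": "user", "content": "a few old racks. honestly budget is tight this quarter"},
]
req = build_turn_messages("HANDLE_OBJECTION", history, company_ctx="Company: NimbusEdge.")
print(req["system"])
print(len(req["history"]), "turns kept as separate messages")
```

The `system` / `history` pair matches the keyword arguments of the richer `generate_fn` signature described under "Bring your own LLM / any API".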

Multi-turn conversations

There are two ways to run a conversation. Stateful: the agent remembers, so you keep calling reply() with each new user message:

bot = rsa.load_agent(gen, company_ctx="...")
bot.new_conversation(segment=7)

print(bot.reply("hey, what does NimbusBox actually cost?")["reply"])   # turn 1
print(bot.reply("hmm, and how is that cheaper than AWS?")["reply"])    # turn 2 — remembers turn 1
print(bot.reply("ok. what would rollout look like for us?")["reply"])  # turn 3 — full context

Stateless (ChatGPT template) — pass the whole conversation in OpenAI message format each call; ideal behind an API where the client owns the history:

messages = [
    {"role": "system", "content": "Extra facts for this call (optional)."},
    {"role": "user", "content": "hey, what does NimbusBox cost?"},
    {"role": "assistant", "content": "Depends on the setup. What are you running today?"},
    {"role": "user", "content": "a few old racks. honestly budget is tight this quarter"},
]
out = bot.chat(messages)         # rebuilds belief from the history, RL picks the move
print(out["chosen_move"], "->", out["reply"])
messages.append({"role": "assistant", "content": out["reply"]})   # ...and continue the loop
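
When the client owns the history, the append-and-resend loop can be wrapped in a small helper. The sketch below is one possible shape, not part of the package: `ClientSession` is a hypothetical name, and `fake_chat` stands in for `bot.chat` so the example runs without a model or an API key.

```python
class ClientSession:
    """Client-side conversation state for a stateless chat function."""

    def __init__(self, chat_fn, system=None):
        self.chat_fn = chat_fn
        self.messages = [{"role": "system", "content": system}] if system else []

    def send(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        # Pass a copy so the chat function cannot mutate the client's history.
        out = self.chat_fn(list(self.messages))
        self.messages.append({"role": "assistant", "content": out["reply"]})
        return out


def fake_chat(messages):
    """Stand-in for bot.chat: reports how many user turns it received."""
    n_user = sum(1 for m in messages if m["role"] == "user")
    return {"chosen_move": "PITCH", "reply": f"reply to user turn {n_user}"}


session = ClientSession(fake_chat, system="Extra facts for this call (optional).")
first = session.send("hey, what does NimbusBox cost?")
second = session.send("a few old racks. honestly budget is tight this quarter")
print(second["reply"])
print(len(session.messages), "messages held by the client")
```

In real use, `chat_fn` would be `bot.chat` or a function that posts the messages to a server.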

REST API (FastAPI)

Serve it as a web service — a complete server is in examples/fastapi_server.py:

pip install "rl-sales-augment[gemini,api]"      # [api] = fastapi + uvicorn
# run from the directory that contains fastapi_server.py, so uvicorn can import it
GCP_PROJECT=my-project uvicorn fastapi_server:app --port 8000

from fastapi import FastAPI
import rl_sales_augment as rsa

app = FastAPI()
bot = rsa.load_agent(rsa.providers.gemini_vertex(project="my-project"), company_ctx="...")

@app.post("/v1/chat")
def chat(payload: dict):                        # {"messages": [...OpenAI format...]}
    return bot.chat(payload["messages"])        # {chosen_move, reply, belief, history_len}
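
Once the server above is running, the endpoint can also be called from Python using only the standard library. This is a sketch: the base URL assumes the `--port 8000` used above, and the helper names `build_chat_request` and `post_chat` are made up for this example.

```python
import json
import urllib.request


def build_chat_request(messages, base_url="http://localhost:8000"):
    """Build a POST request for /v1/chat with an OpenAI-format message list."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_chat(messages, base_url="http://localhost:8000", timeout=30.0):
    """Send the request and decode the JSON response (needs a running server)."""
    req = build_chat_request(messages, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network, so this part runs offline.
req = build_chat_request(
    [{"role": "user", "content": "honestly it feels expensive vs AWS"}]
)
print(req.get_method(), req.full_url)
```

Calling `post_chat(...)` against a live server should return the JSON produced by `bot.chat`, which the server code above describes as `{chosen_move, reply, belief, history_len}`.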

curl -X POST localhost:8000/v1/chat -H 'Content-Type: application/json' -d '{
  "messages": [{"role": "user", "content": "honestly it feels expensive vs AWS"}]}'

Bring your own LLM / any API

Pass any generate_fn to load_agent. Two signatures are supported:

bot = rsa.load_agent(lambda prompt: my_client.complete(prompt))          # simplest: prompt -> str

def gen(prompt="", *, system=None, history=None) -> str:                 # richer: native chat + history
    msgs = ([{"role": "system", "content": system}] if system else []) + (history or [])
    if prompt:
        msgs.append({"role": "user", "content": prompt})
    return my_client.chat(msgs)
bot = rsa.load_agent(gen)

For any OpenAI-compatible endpoint (vLLM, Together, Groq, OpenRouter, a local server), just point openai_chat at it: rsa.providers.openai_chat(base_url="https://...", api_key="...", model="..."). A prompt containing "Return ONLY JSON" (the perception step) is decoded greedily.
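
If you write your own generate_fn, you may want to mirror the greedy-decoding behaviour for the perception step. The sketch below does this by switching to temperature 0 (the usual approximation of greedy decoding) when the prompt contains "Return ONLY JSON". `FakeClient` and `make_gen` are hypothetical names; a real client's `chat` method and temperature parameter may look different, and checking only `prompt` (not `system`) is an assumption.

```python
class FakeClient:
    """Hypothetical client that records the temperature of each call."""

    def __init__(self):
        self.temperatures = []

    def chat(self, messages, temperature):
        self.temperatures.append(temperature)
        return '{"interest": 0.5}' if temperature == 0.0 else "Sure, happy to help."


def make_gen(client, sample_temperature=0.7):
    """Wrap a client in the richer generate_fn signature with greedy JSON calls."""

    def gen(prompt="", *, system=None, history=None) -> str:
        msgs = ([{"role": "system", "content": system}] if system else []) + list(history or [])
        if prompt:
            msgs.append({"role": "user", "content": prompt})
        # Perception prompts ask for JSON only; decode those deterministically.
        temperature = 0.0 if "Return ONLY JSON" in prompt else sample_temperature
        return client.chat(msgs, temperature=temperature)

    return gen


client = FakeClient()
gen = make_gen(client)
gen("Estimate the buyer state. Return ONLY JSON.")
gen("Write the next reply.", system="Strategic move for this turn: PITCH")
print(client.temperatures)
```

The resulting `gen` can be passed to `rsa.load_agent(gen)` in the same way as the richer signature shown above.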

Gemma 4 E2B (open weights)

The policy was trained alongside Google's Gemma 4 E2B (not gated). Two ways to use it — both need pip install "rl-sales-augment[gemma]" and run on MPS / CUDA / CPU (auto-detected):

import rl_sales_augment as rsa

# 1) simple: Gemma writes the words for the portable agent (like any other LLM)
gen = rsa.providers.gemma_e2b()               # downloads google/gemma-4-E2B-it on first use
bot = rsa.load_agent(gen, company_ctx="...")

# 2) Gemma-native: a SalesBot with the bundled experience bridge + trained style reranker,
#    the open-weights-only path that can inject the RL 'experience' latent into Gemma's
#    residual stream (the "common latent space")
bot = rsa.load_gemma_bot(company_ctx="...")   # or pass a local path to avoid re-download
out = bot.reply("we keep getting random crashes")

Honest note: the bundled bridge's output layer is zero-initialised, so route 2's latent injection is currently inert — augmentation runs at the prompt level (the RL move) plus Gemma-native perception and best-of-N style reranking. Run the bridge-alignment step to activate the latent path. Route 1 gives you a fully working Gemma bot today.
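
Why a zero-initialised output layer makes the injection inert can be seen with a few lines of arithmetic. The sketch assumes the bridge adds a linear projection of the experience latent to the residual stream (hidden + W_out · latent); this README does not spell out the bridge architecture, so treat that form, the function names, and the numbers as illustrative only.

```python
def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def inject(hidden, latent, w_out):
    """Additive injection: hidden + W_out @ latent (assumed form of the bridge)."""
    delta = matvec(w_out, latent)
    return [h + d for h, d in zip(hidden, delta)]


hidden = [0.2, -1.0, 0.7]          # toy residual-stream activation
latent = [3.0, -2.0]               # toy RL 'experience' latent

zero_w = [[0.0, 0.0] for _ in hidden]              # zero-initialised output layer
nonzero_w = [[0.1, 0.0], [0.0, 0.5], [0.0, 0.0]]   # arbitrary non-zero weights

unchanged = inject(hidden, latent, zero_w)
changed = inject(hidden, latent, nonzero_w)
print(unchanged == hidden)   # the zero layer leaves the activation untouched
print(changed != hidden)     # non-zero weights shift it
```

With all-zero weights the added term is zero whatever the latent contains, which is consistent with the note that route 2 currently augments only at the prompt level until the bridge-alignment step produces non-zero weights.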

Connect via MCP

Expose the RL policy as an MCP server, so any MCP client (Claude Desktop, Cursor, Windsurf, ...) can call it. The client's LLM does perception and writes the words; the server supplies the RL strategy (the tiny policy runs on CPU, no LLM on the server).

pip install "rl-sales-augment[mcp]"
rl-sales-augment-mcp                 # stdio server (or: rl-sales-augment-mcp streamable-http)

{
  "mcpServers": {
    "rl-sales-augment": { "command": "rl-sales-augment-mcp" }
  }
}

Tools: next_move (perceived belief → RL-chosen move + what it should accomplish), perception_prompt (rubric to estimate the belief from a conversation), list_moves, list_segments. The client's workflow: estimate the buyer's state → call next_move → write the reply that executes the returned move.
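
The client workflow described above can be sketched as a single function. The tool names `perception_prompt` and `next_move` come from the list above; everything else is an assumption for illustration: the `call_tool(name, arguments)` callable stands in for whatever your MCP client provides, and the argument key `belief` and result keys `move` and `goal` are hypothetical, since the tool schemas are not given here.

```python
def run_turn(call_tool, estimate_belief, write_reply, conversation):
    """One turn: estimate the buyer's state, ask for the move, write the reply."""
    rubric = call_tool("perception_prompt", {})
    belief = estimate_belief(rubric, conversation)            # client LLM does perception
    decision = call_tool("next_move", {"belief": belief})     # server supplies strategy
    reply = write_reply(decision["move"], decision["goal"], conversation)
    return {"move": decision["move"], "reply": reply}


# Stubs so the sketch runs without an MCP server or an LLM.
calls = []


def fake_call_tool(name, arguments):
    calls.append(name)
    if name == "perception_prompt":
        return "Rate interest, trust, budget_fit, objection, patience."
    if name == "next_move":
        return {"move": "HANDLE_OBJECTION", "goal": "address the price concern"}
    raise ValueError(f"unknown tool: {name}")


result = run_turn(
    fake_call_tool,
    estimate_belief=lambda rubric, convo: {"interest": 0.4, "objection": 0.8},
    write_reply=lambda move, goal, convo: f"[{move}] {goal}",
    conversation="Buyer: it feels expensive vs AWS",
)
print(result["move"], calls)
```

In a real MCP client the two lambdas would be calls to the client's own LLM, and `call_tool` would be the client's tool-invocation method.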

What's in the box

| Symbol | Meaning |
| --- | --- |
| `rsa.load_agent(gen, ...)` | load the bundled policy, wrap an LLM → an `AugmentedAgent` |
| `rsa.AugmentedAgent` | the portable serving agent (policy + perception + memory) |
| `rsa.providers` | `gemini_vertex`, `gemini_api`, `openai_chat`, `anthropic_chat`, `gemma_e2b`, `local_gemma` |
| `rsa.load_gemma_bot(...)` | Gemma-native `SalesBot` (open-weights experience-injection path) |
| `rl-sales-augment-mcp` | MCP server exposing the policy as tools (`[mcp]` extra) |
| `rsa.MODEL_PATH` | filesystem path to the bundled `rl_sales_agent.pt` |
| `rsa.estimate_state_via(gen, history)` | the perception step alone (LLM → belief JSON) |
| `rsa.ACTION_NAMES`, `rsa.SEG_NAMES` | the 8 moves and 10 market segments |
| `rsa.SalesWorld`, `rsa.SalesConfig` | the world model (for retraining / evaluation) |
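
Because the perception step turns LLM output into belief JSON, a defensive parser can be useful if you handle that output yourself. The sketch below is not the package's parser: `parse_belief` is a hypothetical helper that pulls the first-to-last brace span out of the text and keeps the five belief fields named in the Quickstart. It does not assume value types or ranges, which this README does not specify.

```python
import json

BELIEF_KEYS = ("interest", "trust", "budget_fit", "objection", "patience")


def parse_belief(llm_text: str) -> dict:
    """Extract a belief dict from LLM output that may wrap the JSON in prose."""
    start = llm_text.find("{")
    end = llm_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in LLM output")
    data = json.loads(llm_text[start:end + 1])
    missing = [k for k in BELIEF_KEYS if k not in data]
    if missing:
        raise ValueError(f"belief is missing fields: {missing}")
    # Keep only the known fields; extra keys from the LLM are dropped.
    return {k: data[k] for k in BELIEF_KEYS}


raw = ('Sure, here is the estimate: {"interest": 0.6, "trust": 0.4, '
       '"budget_fit": 0.3, "objection": 0.8, "patience": 0.5, "note": "x"} Done.')
belief = parse_belief(raw)
print(sorted(belief))
```

Raising on missing fields, rather than filling defaults, keeps a bad perception result from silently steering the move selection.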

Evidence (grounded conversational A/B)

A hidden verified sales world is the outcome oracle; the same LLM plays with and without the policy.

- vs Gemini 3.5 Flash (16 paired conversations): with-RL closed 100% in ~8 turns at ~3.5× the revenue of pure Gemini (which gets trapped answering objections and rarely asks for the sale).
- Quantitative (~49k quarters): RL 3.2× revenue and 94% win rate vs the best hand-tuned heuristic; emergent segment-specific tactics (DISCOUNT for price-sensitive SMB, HANDLE_OBJECTION for enterprise), and it only CLOSEs when the deal is actually ripe.

Honest caveat: the policy's strength is timing/strategy, not magic. The perception step reads the buyer's state from the conversation; feed it strong signals and it will reach the close. See the project's EVALUATION.md, GEMINI_AB.md, and LOCAL_GEMMA_AB.md for full transcripts and caveats.

Training & company fine-tuning (commercial)

This package distributes the trained policy and the serving stack only. The world model's dynamics, the GPU-vectorized training pipeline, and company-specific fine-tuning (aligning the policy and its knowledge to your company from your documents) are the proprietary training stack of Convai Innovations Pvt. Ltd. For a policy trained on your market, your segments, and your playbook, contact nandakishor@convaiinnovations.com. The bundled policy is ready to serve as-is.

License

GNU AGPL-3.0-or-later. Copyright (C) 2026 Nandakishor M, Convai Innovations Pvt. Ltd. If you run a modified version of this software as a network service, the AGPL requires you to offer its source to the users of that service. For commercial licensing outside the AGPL, contact nandakishor@convaiinnovations.com.

Credits

Built by Nandakishor M (Convai Innovations Pvt. Ltd.) with Claude (Anthropic) as engineering co-author — architecture, evaluation harnesses, and packaging were pair-built end to end.


Release history

0.4.0 (this version): Jul 2, 2026
0.3.0: Jul 2, 2026

