A trained RL sales policy that augments any LLM: it picks the strategic move, your LLM writes the words.

rl-sales-augment

A trained RL sales policy that augments any LLM. The reinforcement-learning policy has learned, from a chaotic multi-segment sales world, which strategic move works next given the buyer's state. At serve time it reads the conversation, picks the move (RAPPORT, PITCH, HANDLE_OBJECTION, DISCOUNT, CLOSE, ...), and your LLM writes the words. The policy is bundled in the package — no training or GPU required to use it.

The LLM handles language and empathy; the RL policy supplies the timing and strategy the LLM can't get from its priors. In a grounded conversational A/B, the same GPT/Gemini/Claude closes far more deals with the policy than without it.

▶ 67-second demo — real paired transcripts: the same LLM answers objections forever (no close) vs closes in 8 turns with the policy choosing the moves.

Install

pip install rl-sales-augment                 # core (numpy + torch) + the bundled model
pip install "rl-sales-augment[gemini]"       # + Google Gemini (Vertex or API key)
pip install "rl-sales-augment[openai]"       # + OpenAI
pip install "rl-sales-augment[anthropic]"    # + Anthropic Claude
pip install "rl-sales-augment[gemma]"        # + local Gemma 4 via transformers (Python >=3.10)
pip install "rl-sales-augment[all]"          # everything

The core installs on any Python that PyTorch supports (3.9–3.13); provider SDKs and the local Gemma path are optional extras.

Quickstart

import rl_sales_augment as rsa

# 1. pick any LLM  (or bring your own: gen = lambda prompt: my_llm(prompt))
gen = rsa.providers.gemini_vertex(project="my-gcp-project")     # uses your gcloud ADC, no API key
# gen = rsa.providers.openai_chat(model="gpt-4o")
# gen = rsa.providers.anthropic_chat(model="claude-opus-4-8")

# 2. load the bundled policy and wrap the LLM (optionally ground it in your company's facts)
bot = rsa.load_agent(gen, company_ctx="""
Company: NimbusEdge. Products: NimbusBox (on-prem appliance, ~$8k), NimbusOne (edge SaaS, ~$2k/mo).
Edge: 1-day deploy, ~30% lower TCO than hyperscalers.
""")
bot.new_conversation(segment=7)     # optional bias (0-9, see rsa.SEG_NAMES)

# 3. converse — perception -> RL move -> grounded reply, with internal memory
out = bot.reply("honestly it feels expensive compared to just using AWS")
print(out["chosen_move"])   # e.g. 'RAPPORT'  (the RL-chosen strategy)
print(out["belief"])        # perceived buyer state {interest, trust, budget_fit, objection, patience}
print(out["reply"])         # the words your LLM produced for that move

The bot keeps its own memory (belief state plus the full history), so it works over any stateless LLM API.
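
The perception -> move -> reply loop described above can be pictured with a small offline sketch. Everything here is a hypothetical stand-in: `ToyAgent`, `toy_policy`, and `fake_perceive` are not part of the package, and the hand-written rules only imitate the shape of the flow, not the trained policy's behaviour.

```python
from typing import Callable, Dict, List

Belief = Dict[str, float]
Message = Dict[str, str]


def toy_policy(belief: Belief) -> str:
    """Hypothetical rule-based stand-in for the bundled RL policy."""
    if belief["objection"] > 0.5:
        return "HANDLE_OBJECTION"
    if belief["trust"] < 0.4:
        return "RAPPORT"
    if belief["interest"] > 0.7 and belief["budget_fit"] > 0.6:
        return "CLOSE"
    return "PITCH"


class ToyAgent:
    """Keeps belief and history itself, so the LLM call can stay stateless."""

    def __init__(self, gen: Callable[[str], str],
                 perceive: Callable[[List[Message]], Belief]) -> None:
        self.gen = gen
        self.perceive = perceive
        self.history: List[Message] = []
        self.belief: Belief = {}

    def reply(self, user_text: str) -> dict:
        self.history.append({"role": "user", "content": user_text})
        self.belief = self.perceive(self.history)   # 1. perception
        move = toy_policy(self.belief)              # 2. strategy
        text = self.gen(f"Move: {move}. Buyer said: {user_text}")  # 3. words
        self.history.append({"role": "assistant", "content": text})
        return {"chosen_move": move, "belief": self.belief, "reply": text}


def fake_perceive(history: List[Message]) -> Belief:
    """Assumed belief shape: the five fields listed above, scaled 0..1."""
    last = history[-1]["content"].lower()
    return {"interest": 0.5, "trust": 0.6, "budget_fit": 0.4,
            "objection": 0.9 if "expensive" in last else 0.1,
            "patience": 0.7}


agent = ToyAgent(gen=lambda p: f"[reply for: {p}]", perceive=fake_perceive)
out = agent.reply("honestly it feels expensive compared to just using AWS")
print(out["chosen_move"])
```

Because the agent object owns both the belief and the history, the `gen` callable only ever sees a single prompt, which is what makes the pattern usable over a stateless API.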

Providers & models

Every provider takes a model= argument (defaults reflect mid-2026 lineups; pass any id you have):

rsa.providers.gemini_vertex(project="...", model="gemini-3.5-flash")   # or gemini-3.5-pro
rsa.providers.gemini_api(model="gemini-3.5-flash")                     # AI Studio key
rsa.providers.openai_chat(model="gpt-5.5")                             # or gpt-5.4, gpt-5.6-*
rsa.providers.anthropic_chat(model="claude-sonnet-5")                  # or claude-opus-4-8
rsa.providers.gemma_e2b(model="google/gemma-4-E2B-it")                 # local, needs [gemma]

Conversation history & chat templates

The agent keeps the full conversation and sends it to the LLM as native chat turns (proper system instruction + user/assistant roles), not history flattened into one string — so the model has real multi-turn context. The RL-chosen move for the current turn goes in the system prompt.
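
As a rough illustration of "native chat turns plus the move in the system prompt", the sketch below assembles such a request by hand. The function name `build_native_turns` and the exact system-prompt wording are assumptions made for this example; the package's real prompt text is not shown in this README.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_native_turns(company_ctx: str, move: str,
                       history: List[Message]) -> dict:
    """Return a system instruction plus role-tagged turns (hypothetical helper)."""
    system = (
        f"{company_ctx.strip()}\n\n"
        f"Strategic move for this turn: {move}. "
        "Write the reply that carries out this move."
    )
    # Keep only real conversation turns; each stays a separate message
    # instead of being flattened into one long string.
    turns = [
        {"role": t["role"], "content": t["content"]}
        for t in history
        if t["role"] in ("user", "assistant")
    ]
    return {"system": system, "messages": turns}


request = build_native_turns(
    "Company: NimbusEdge. Products: NimbusBox, NimbusOne.",
    "HANDLE_OBJECTION",
    [
        {"role": "user", "content": "what does NimbusBox cost?"},
        {"role": "assistant", "content": "Depends on the setup."},
        {"role": "user", "content": "it feels expensive vs AWS"},
    ],
)
print(request["system"])
```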

Multi-turn conversations

There are two ways to run a conversation. Stateful: the agent remembers the conversation, so you keep calling reply() with each new user message:

bot = rsa.load_agent(gen, company_ctx="...")
bot.new_conversation(segment=7)

print(bot.reply("hey, what does NimbusBox actually cost?")["reply"])   # turn 1
print(bot.reply("hmm, and how is that cheaper than AWS?")["reply"])    # turn 2 — remembers turn 1
print(bot.reply("ok. what would rollout look like for us?")["reply"])  # turn 3 — full context

Stateless (ChatGPT template): pass the whole conversation in OpenAI message format on each call. This is ideal behind an API where the client owns the history:

messages = [
    {"role": "system", "content": "Extra facts for this call (optional)."},
    {"role": "user", "content": "hey, what does NimbusBox cost?"},
    {"role": "assistant", "content": "Depends on the setup. What are you running today?"},
    {"role": "user", "content": "a few old racks. honestly budget is tight this quarter"},
]
out = bot.chat(messages)         # rebuilds belief from the history, RL picks the move
print(out["chosen_move"], "->", out["reply"])
messages.append({"role": "assistant", "content": out["reply"]})   # ...and continue the loop
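
The "continue the loop" step can be wrapped in a small client-side helper. This is a sketch only: `run_turn` is a hypothetical name, and `fake_chat` stands in for `bot.chat` so the example runs without a model or API key. It assumes nothing about the return value beyond the `reply` key shown above.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def run_turn(chat_fn: Callable[[List[Message]], dict],
             messages: List[Message],
             user_text: str) -> Tuple[List[Message], dict]:
    """Append the user turn, call the stateless endpoint, append the reply."""
    convo = list(messages) + [{"role": "user", "content": user_text}]
    out = chat_fn(convo)
    convo.append({"role": "assistant", "content": out["reply"]})
    return convo, out  # the caller keeps the updated history


def fake_chat(messages: List[Message]) -> dict:
    """Stand-in for bot.chat: no model, just echoes the turn count."""
    return {"chosen_move": "RAPPORT",
            "reply": f"(turn {len(messages)}) tell me more",
            "history_len": len(messages)}


history: List[Message] = []
history, out1 = run_turn(fake_chat, history, "hey, what does NimbusBox cost?")
history, out2 = run_turn(fake_chat, history, "budget is tight this quarter")
print(len(history), out2["reply"])
```

The helper copies the incoming list rather than mutating it, so the client can retry a failed call with the same history.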

REST API (FastAPI)

Serve it as a web service — a complete server is in examples/fastapi_server.py:

pip install "rl-sales-augment[gemini,api]"      # [api] = fastapi + uvicorn
GCP_PROJECT=my-project uvicorn fastapi_server:app --port 8000

A minimal server:

from fastapi import FastAPI
import rl_sales_augment as rsa

app = FastAPI()
bot = rsa.load_agent(rsa.providers.gemini_vertex(project="my-project"), company_ctx="...")

@app.post("/v1/chat")
def chat(payload: dict):                        # {"messages": [...OpenAI format...]}
    return bot.chat(payload["messages"])        # {chosen_move, reply, belief, history_len}

Example request:

curl -X POST localhost:8000/v1/chat -H 'Content-Type: application/json' -d '{
  "messages": [{"role": "user", "content": "honestly it feels expensive vs AWS"}]}'
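
The same request can be issued from Python with only the standard library. The sketch below builds the POST without sending it, so it runs without a server; `make_chat_request` is a hypothetical helper, and the URL assumes the server above is listening on localhost:8000.

```python
import json
import urllib.request
from typing import Dict, List


def make_chat_request(messages: List[Dict[str, str]],
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST to the /v1/chat route shown above."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = make_chat_request(
    [{"role": "user", "content": "honestly it feels expensive vs AWS"}]
)
print(req.get_method(), req.full_url)

# To actually send it once the server is running:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     result = json.load(resp)   # expected keys: chosen_move, reply, belief, history_len
```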

Bring your own LLM / any API

Pass any generate_fn to load_agent. Two signatures are supported:

bot = rsa.load_agent(lambda prompt: my_client.complete(prompt))          # simplest: prompt -> str

def gen(prompt="", *, system=None, history=None) -> str:                 # richer: native chat + history
    msgs = ([{"role": "system", "content": system}] if system else []) + (history or [])
    if prompt: msgs.append({"role": "user", "content": prompt})
    return my_client.chat(msgs)
bot = rsa.load_agent(gen)

For any OpenAI-compatible endpoint (vLLM, Together, Groq, OpenRouter, a local server), just point openai_chat at it: rsa.providers.openai_chat(base_url="https://...", api_key="...", model="..."). A prompt containing "Return ONLY JSON" (the perception step) is decoded greedily.
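
For tests or CI it can help to have a generate_fn that never touches the network. The sketch below follows the richer signature shown above. `make_offline_gen` is a hypothetical helper, and the belief JSON it returns is an assumption: it uses the five field names listed in the Quickstart with values assumed to lie in 0..1, which may not match the exact schema the perception step expects.

```python
import json
from typing import Dict, List, Optional


def make_offline_gen(canned: str = "Thanks for sharing that. Could you tell me more?"):
    """Return a deterministic generate_fn that records every call it receives."""
    calls: List[dict] = []

    def gen(prompt: str = "", *, system: Optional[str] = None,
            history: Optional[List[Dict[str, str]]] = None) -> str:
        calls.append({"prompt": prompt, "system": system,
                      "history": list(history or [])})
        # The perception step is identified by this marker (see the note above).
        if "Return ONLY JSON" in prompt or "Return ONLY JSON" in (system or ""):
            # Assumed schema: field names from the Quickstart, values in 0..1.
            return json.dumps({"interest": 0.5, "trust": 0.5, "budget_fit": 0.5,
                               "objection": 0.5, "patience": 0.5})
        return canned

    gen.calls = calls  # expose the call log for assertions in tests
    return gen


gen = make_offline_gen()
belief_json = gen("Estimate the buyer state. Return ONLY JSON.")
text = gen("say hello", system="Move: RAPPORT",
           history=[{"role": "user", "content": "hi"}])
print(json.loads(belief_json)["trust"], text)
```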

Gemma 4 E2B (open weights)

The policy was trained alongside Google's Gemma 4 E2B (not gated). There are two ways to use it; both need pip install "rl-sales-augment[gemma]" and run on MPS, CUDA, or CPU (auto-detected):

import rl_sales_augment as rsa

# 1) simple: Gemma writes the words for the portable agent (like any other LLM)
gen = rsa.providers.gemma_e2b()               # downloads google/gemma-4-E2B-it on first use
bot = rsa.load_agent(gen, company_ctx="...")

# 2) Gemma-native: a SalesBot with the bundled experience bridge + trained style reranker,
#    the open-weights-only path that can inject the RL 'experience' latent into Gemma's
#    residual stream (the "common latent space")
bot = rsa.load_gemma_bot(company_ctx="...")   # or pass a local path to avoid re-download
out = bot.reply("we keep getting random crashes")

Honest note: the bundled bridge's output layer is zero-initialised, so route 2's latent injection is currently inert — augmentation runs at the prompt level (the RL move) plus Gemma-native perception and best-of-N style reranking. Run the bridge-alignment step to activate the latent path. Route 1 gives you a fully working Gemma bot today.
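
Why a zero-initialised output layer makes the injection inert can be seen with a few lines of arithmetic. The sketch assumes the bridge adds a linear projection of the experience latent to the residual stream (h' = h + W_out z); the actual bridge architecture is not documented here, so treat the shapes and names as illustrative only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, z: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, z)) for row in W]


def inject(h: Vector, W_out: Matrix, z: Vector) -> Vector:
    """Assumed additive injection: residual plus projected latent."""
    return [hi + di for hi, di in zip(h, matvec(W_out, z))]


h = [0.5, -1.0, 2.0]   # toy residual-stream activation
z = [3.0, 4.0]         # toy RL 'experience' latent

W_zero = [[0.0, 0.0] for _ in h]            # zero-initialised output layer
W_trained = [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0]]  # made-up non-zero weights

unchanged = inject(h, W_zero, z)
shifted = inject(h, W_trained, z)
print(unchanged == h, shifted != h)
```

With all-zero weights the projected latent is the zero vector, so the residual stream passes through untouched regardless of what the latent contains. Only after the weights move away from zero does the latent change the activations.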

Connect via MCP

Expose the RL policy as an MCP server, so any MCP client (Claude Desktop, Cursor, Windsurf, ...) can call it. The client's LLM does perception and writes the words; the server supplies the RL strategy (the tiny policy runs on CPU, no LLM on the server).

pip install "rl-sales-augment[mcp]"
rl-sales-augment-mcp                 # stdio server (or: rl-sales-augment-mcp streamable-http)

Register it with your client (e.g. Claude Desktop / Cursor mcp.json):

{
  "mcpServers": {
    "rl-sales-augment": { "command": "rl-sales-augment-mcp" }
  }
}

Tools: next_move (perceived belief → RL-chosen move + what it should accomplish), perception_prompt (rubric to estimate the belief from a conversation), list_moves, list_segments. The client's workflow: estimate the buyer's state → call next_move → write the reply that executes the returned move.
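
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for next_move. The envelope follows the MCP specification, but the argument names (`belief`, `segment`) are assumptions for illustration; the server's real input schema is whatever its `tools/list` response advertises.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # JSON-RPC requests need unique ids per session


def tools_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP tools/call request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Assumed argument shape: the belief fields named in the Quickstart.
msg = tools_call("next_move", {
    "belief": {"interest": 0.6, "trust": 0.5, "budget_fit": 0.3,
               "objection": 0.8, "patience": 0.7},
    "segment": 7,
})
print(json.dumps(msg, indent=2))
```

In practice an MCP client library handles this framing; the sketch is only meant to show what "call next_move" amounts to.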

What's in the box

| Symbol | Meaning |
| --- | --- |
| rsa.load_agent(gen, ...) | load the bundled policy, wrap an LLM → an AugmentedAgent |
| rsa.AugmentedAgent | the portable serving agent (policy + perception + memory) |
| rsa.providers | gemini_vertex, gemini_api, openai_chat, anthropic_chat, gemma_e2b, local_gemma |
| rsa.load_gemma_bot(...) | Gemma-native SalesBot (open-weights experience-injection path) |
| rl-sales-augment-mcp | MCP server exposing the policy as tools ([mcp] extra) |
| rsa.MODEL_PATH | filesystem path to the bundled rl_sales_agent.pt |
| rsa.estimate_state_via(gen, history) | the perception step alone (LLM → belief JSON) |
| rsa.ACTION_NAMES, rsa.SEG_NAMES | the 8 moves and 10 market segments |
| rsa.SalesWorld, rsa.SalesConfig | the world model (for retraining / evaluation) |

Evidence (grounded conversational A/B)

A hidden verified sales world is the outcome oracle; the same LLM plays with and without the policy.

  • vs Gemini 3.5 Flash (16 paired conversations): with-RL closed 100% in ~8 turns at ~3.5× the revenue of pure Gemini (which gets trapped answering objections and rarely asks for the sale).
  • Quantitative (~49k quarters): RL 3.2× revenue and 94% win rate vs the best hand-tuned heuristic; emergent segment-specific tactics (DISCOUNT for price-sensitive SMB, HANDLE_OBJECTION for enterprise), and it only CLOSEs when the deal is actually ripe.

Honest caveat: the policy's strength is timing/strategy, not magic. The perception step reads the buyer's state from the conversation; feed it strong signals and it will reach the close. See the project's EVALUATION.md, GEMINI_AB.md, and LOCAL_GEMMA_AB.md for full transcripts and caveats.

Training & company fine-tuning (commercial)

This package distributes the trained policy and the serving stack only. The world model's dynamics, the GPU-vectorized training pipeline, and company-specific fine-tuning (aligning the policy and its knowledge to your company from your documents) are the proprietary training stack of Convai Innovations Pvt. Ltd. For a policy trained on your market, your segments, and your playbook, contact nandakishor@convaiinnovations.com. The bundled policy is ready to serve as-is.

License

GNU AGPL-3.0-or-later. Copyright (C) 2026 Nandakishor M, Convai Innovations Pvt. Ltd. If you run a modified version of this software as a network service, the AGPL requires you to offer its source to the users of that service. For commercial licensing outside the AGPL, contact nandakishor@convaiinnovations.com.

Credits

Built by Nandakishor M (Convai Innovations Pvt. Ltd.) with Claude (Anthropic) as engineering co-author — architecture, evaluation harnesses, and packaging were pair-built end to end.
