Specialized multi-agent orchestrator with cyclic personal/organizational memory and skills

Praxia

Live: praxia.tools (primary, Cloudflare) · praxia-dev.github.io/praxia (mirror, GitHub Pages)

Specialized Multi-Agent Orchestrator with Cyclic Personal/Organizational Memory

A workflow-specific multi-agent orchestrator that automatically promotes individual tacit knowledge into organizational know-how. Built on a 5-layer memory stack with three independent promotion paths.

License: Apache 2.0 · Python: 3.11+ · Status: Alpha · Tests: 431 · Connectors: 20 · Languages: 8 · MCP: stdio + HTTP/SSE

🔍 Complete feature reference: docs/FEATURES.md
📊 Concrete Before/After tables: docs/use-cases.md


🎯 Why Praxia?

General-purpose multi-agent frameworks (CrewAI, AutoGen, LangGraph, …) are powerful but stop short on these four problems:

| Problem with existing frameworks | Praxia's approach |
| --- | --- |
| Setup is complex; production deployment is hard | Workflow-specific templates (sales prep / logic check / RAG optimization) that run in 5 minutes |
| Senior-engineer "magic prompts" stay locked in one person's editor | Personal-to-org auto-promotion pipeline built in |
| "It works" doesn't prove "it works well" | Hallucination detection + retrieval evals shipped by default |
| Agents stagnate after launch | Sleep-time consolidation distills your past flows nightly |

Praxia turns "one expert's drawer" into "everyone's best practices."


👥 Who Praxia is for

| Persona | What they need | How Praxia fits | Typical year-1 result |
| --- | --- | --- | --- |
| 🏢 Information Systems / Platform team (300–5,000 employees) | Roll out AI tools without paywalled SSO/RBAC/audit, on-prem option | Auth + RBAC + ACL + per-user OAuth + audit log all in OSS, self-hostable | 100 KW × ~$1.25M net benefit, full audit trail |
| 🏗️ Engineering / Product VP (50–500 in scope) | Senior architect bottleneck; junior PM 12–18mo ramp | DesignSkill + sleep-time consolidation distills senior review patterns; Markdown+git frozen layer fits PR workflow | Senior load 16h/wk → 4h/wk; junior ramp 6–9mo |
| ⚖️ Legal / Compliance lead (regulated industry) | 50–100 contracts/mo bottleneck; need auditable AI workflow without lock-in | LegalSkill (RACE) + read-only memory mode + per-user OAuth + every action audited; Apache 2.0 source for auditors | 60–90min → 10–15min/contract; throughput 50–80/mo → 200–300/mo |
| 🧪 OSS / Research integrator | Build domain agent system without re-implementing auth, memory cycling, exporters | 7 plugin types (~50 LoC each); use as library, run praxia serve as backend, embed in LangGraph | Day-30: domain skill + custom connector + memory cycling working, ~3 weeks ahead of from-scratch |

Detailed Before/After by industry: docs/use-cases.md.


💎 Why OSS matters here

The capabilities you typically pay enterprise-tier prices for are already in the Apache 2.0 package:

  • SSO + RBAC + audit are not paywalled. OIDC SSO (Google / Microsoft / Okta / GitHub / Keycloak) is in the OSS. Most agent frameworks ship without it; most agent platforms paywall it. Praxia treats it as table stakes.
  • Memory format is not locked in. Layer 4 is plain Markdown in your git repo. Layer 3 exports to JSONL. Layer 1 is your chosen backend's native format. Leaving costs nothing.
  • You can read every line. Apache 2.0. Show the source to your auditors, your security team, your customers.
  • Multi-LTM ensembles, not single-vendor. Run Mem0 + Zep + HindSight in parallel, fuse with RRF, or route per query. No commercial agent platform exposes this; they pick a backend and lock you in.
  • Per-user OAuth respects external ACL. When Alice pulls from Box, Box's own ACL applies: Alice sees only what Alice can see. Service-account designs (typical SaaS shortcut) leak data across users.
  • Air-gapped operation. PRAXIA_LOCAL_MODEL=gemma, Ollama, backend=json: no cloud LLM, no cloud vector DB, no telemetry. Same code as cloud customers.
  • Production-grade OAuth + KMS in OSS. Multi-worker safe state cache, 5 KMS adapters (AWS / Azure / GCP / Vault / local). Most agent platforms paywall this; Praxia ships it.
  • A/B experiments + quality eval included. Test prompt variants on real users with deterministic assignment; catch LLM output quality regressions in CI.

🏗 Architecture: 5-Layer Memory Stack

Architecture diagram

The same picture as ASCII art:

┌───────────────────────────────────────────────────────────┐
│  AI Agents (Skills + MCP)                                 │
└──────────────┬────────────────────────────────────────────┘
               │ Users just have normal conversations
               ▼
╔═══════════════════════════════════════════════════════════╗
║ Layer 1: Personal memory (auto-extracted)                 ║
║   Mem0 / LangMem / HindSight / Letta / Zep / JSON         ║
║   namespace = user_id                                     ║
║   ★ Zero-effort tacit-knowledge capture                   ║
╚══════════════╤════════════════════════════════════════════╝
               │ Sleep-time Consolidation (nightly batch)
               ▼
╔═══════════════════════════════════════════════════════════╗
║ Layer 2: Distillation & promotion engine                  ║
║   Three parallel "validity tests":                        ║
║     ① Frequency  (recurring across N+ users)              ║
║     ② Outcome    (correlated with wins/losses)            ║
║     ③ Self-eval  (LLM scored)                             ║
╚══════════════╤════════════════════════════════════════════╝
               │ Auto-promote above threshold; queue otherwise
               ▼
╔═══════════════════════════════════════════════════════════╗
║ Layer 3: Shared memory (living organizational knowledge)  ║
║   Letta-style shared blocks; all agents read/write        ║
╚══════════════╤════════════════════════════════════════════╝
               │ PR review for high-impact items
               ▼
╔═══════════════════════════════════════════════════════════╗
║ Layer 4: Frozen layer (git-managed best practices)        ║
║   Markdown + git + PR review                              ║
║   GitHub Copilot / Cursor Rules-compatible format         ║
╚══════════════╤════════════════════════════════════════════╝
               │ (optional)
               ▼
╔═══════════════════════════════════════════════════════════╗
║ Layer 5: Graph layer (only relationship-heavy domains)    ║
║   Zep / Graphiti: decisions, customer 360, incident DAG   ║
╚═══════════════════════════════════════════════════════════╝

Parallel Layer 6: Skills registry
  Personal skills get promoted to the organizational catalog.
  MCP / Claude Skills / Cursor Skills compatible.

Three promotion paths (auto / statistical / manual) run side by side, so promotion never depends on a single mechanism.
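The Layer 2 decision can be pictured as a small function. This is a minimal sketch, not Praxia's implementation: the `Verdicts` class, the `decide` function, the 0..1 score scale, and both thresholds are assumptions made for illustration. It auto-promotes only when all three validity tests agree and queues medium-confidence items for review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdicts:
    """Scores from the three validity tests, each normalized to 0..1 (assumed scale)."""
    frequency: float  # how widely the pattern recurs across users
    outcome: float    # how strongly it correlates with wins
    self_eval: float  # score assigned by an LLM judge


def decide(v: Verdicts, promote_at: float = 0.8, review_at: float = 0.5) -> str:
    """Return "promote", "review", or "discard" (hypothetical thresholds)."""
    scores = (v.frequency, v.outcome, v.self_eval)
    # High consensus: every test must clear the promotion bar on its own.
    if min(scores) >= promote_at:
        return "promote"
    # Medium confidence: the average is decent, so a human reviews the item.
    if sum(scores) / len(scores) >= review_at:
        return "review"
    return "discard"


print(decide(Verdicts(frequency=0.9, outcome=0.85, self_eval=0.8)))
print(decide(Verdicts(frequency=0.9, outcome=0.4, self_eval=0.6)))
```

Requiring the minimum score (rather than the mean) to clear the bar is one way to express "consensus"; a single weak verdict is enough to send an item to the review queue.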

For details, see docs/architecture.md.


✨ What's Bundled

Autonomous agent (LLM-driven tool-use loop)

praxia.agent.AutonomousAgent runs an LLM-driven tool-use loop over the full Praxia stack (personal/org memory, skills, frozen layer, connectors) with ACL checks and audit logging built in. The LLM picks tools on its own until it has the information it needs, mirroring how modern code-editing assistants drive their own tool use.

from praxia.agent import AutonomousAgent
from praxia.core.llm import LLM

# Build an agent scoped to one user and one organization, backed by Claude
agent = AutonomousAgent(user_id="alice", org_id="acme", llm=LLM("claude"))
result = agent.run("Tell me what we know about Acme and draft a proposal.")
print(result.final_text)

# CLI equivalent
praxia agent run "Summarize where we stand with Acme this quarter and draft a proposal"
praxia agent tools     # list the 11 built-in tools

The agent is also exposed as a single MCP meta-tool (autonomous_agent) so remote clients (Claude Desktop, Cursor) can delegate an entire investigation without orchestrating individual tools by hand. See FEATURES § 38.

3 Specialized Multi-Agent Flows

| Flow | What it does |
| --- | --- |
| SalesAgentFlow | Reads customer IR, past minutes, RAG context → generates hypotheses → FAQ → proposal outline |
| LogicCheckerFlow | Three agents (structure / contradiction / reader) review long documents for logical consistency |
| RAGOptimizationFlow | Self-correcting RAG: query expansion → retrieval → relevance eval → hallucination check loop |

6 Default Business-Domain Skills

| Skill | Domain | Use cases |
| --- | --- | --- |
| InvestmentSkill | Investment | Equity research, due diligence, portfolio decisions |
| SalesSkill | Sales | Account research, proposal drafting, FAQ prep |
| DesignSkill | Engineering Design | System design review, requirements engineering |
| PurchasingSkill | Procurement | Supplier evaluation, RFQ analysis, TCO, BCP risk |
| PatentSkill | IP / Patent | Prior-art search, claims drafting, patent maps |
| LegalSkill | Legal | Contract review, compliance, M&A diligence |

Each skill serializes to Claude-Skills / MCP-compatible SKILL.md.

Plus two utility skills:

| Skill | What it does |
| --- | --- |
| PromptDesignerSkill | Take a one-line task description → produce a production-grade prompt template (system + user + 2-3 few-shot examples + 5-criterion rubric) tuned for the target LLM (Claude / OpenAI / DeepSeek / Mistral / Llama / …). Save to PromptStore, A/B-test via praxia.experiments. |
| OutputFormatSkill | Detect "PowerPoint で出して" (Japanese for "output as PowerPoint") / "as Word doc" / etc. in natural language and dispatch to the matching exporter (PPTX / DOCX / HTML / MD / JSON). |

# Generate a prompt template for any task
praxia skill run prompt_designer "Have in-house legal score contract risk on a 5-point scale"

All Major LLMs

LiteLLM-powered single-line provider switching:

| Provider | Aliases | Auth |
| --- | --- | --- |
| Anthropic Claude | claude / claude-sonnet / claude-haiku | ANTHROPIC_API_KEY |
| OpenAI ChatGPT | chatgpt / gpt-4o / o1 | OPENAI_API_KEY |
| Google Gemini | gemini / gemini-flash | GEMINI_API_KEY |
| Google Gemma (open) | gemma / gemma-2b / gemma-9b / gemma-27b (Ollama) · gemma-cloud (Vertex AI) | (none for local) / Vertex auth |
| Alibaba Qwen (cloud) | qwen / qwen-72b | DASHSCOPE_API_KEY |
| DeepSeek | deepseek (v3 chat) · deepseek-reasoner (R1) | DEEPSEEK_API_KEY |
| Mistral | mistral (large) · mistral-small · codestral | MISTRAL_API_KEY |
| xAI Grok | grok | XAI_API_KEY |
| Llama (Groq fast) | llama (3.3 70B Versatile via Groq) | GROQ_API_KEY |
| Cohere | command-r (Command R+) | COHERE_API_KEY |
| Perplexity Sonar | perplexity (web-search-augmented) | PERPLEXITY_API_KEY |
| Microsoft Phi (local) | phi (3.5 3.8B Ollama) | (none, runs in-house) |
| Qwen / Llama / Phi (local) | qwen-local · llama-local · phi (Ollama) | (none, runs in-house) |

LLM("claude")        # Anthropic Claude
LLM("deepseek")      # DeepSeek v3: strong + low cost
LLM("mistral")       # Mistral large: EU-friendly
LLM("llama")         # Llama 3.3 70B via Groq (fast)
LLM("gemma")         # Google Gemma 9B via local Ollama
LLM("phi")           # Microsoft Phi 3.5: small / edge
LLM("qwen-local")    # Local Qwen via Ollama
LLM("openai/gpt-4o") # Any LiteLLM-compatible model string

File parsing: PDF · Office · CSV · TXT · HTML · MD · code

Auto-dispatched by extension:

| Extension | Parser | Optional dep |
| --- | --- | --- |
| .txt .md .rst .py .ts .js | TextParser | (none) |
| .csv .tsv | CsvParser | (stdlib) |
| .json .yaml .yml | StructuredParser | (core) |
| .html .xml | HtmlParser | (stdlib) |
| .pdf | PdfParser | praxia[office] |
| .docx | DocxParser | praxia[office] |
| .pptx | PptxParser | praxia[office] |
| .xlsx .xlsm | XlsxParser | praxia[office] |

from praxia.io.parsers import parse_file

doc = parse_file("contract.pdf")          # works
doc = parse_file("Q3_results.xlsx")       # also works
print(doc.content)

Third-party formats register via [project.entry-points."praxia.parsers"], so no fork is required.
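Extension-based dispatch with plug-in registration can be sketched in a few lines. This is an illustrative toy, not Praxia's parser code: `PARSERS`, `register`, `parse`, and `load_plugins` are hypothetical names, and the convention that an entry-point's name is the file extension it handles is an assumption. Only `importlib.metadata.entry_points` and the `praxia.parsers` group name come from Python and the text above.

```python
import csv
import io
from importlib.metadata import entry_points
from pathlib import Path
from typing import Callable, Dict

# Maps a lowercase file extension to a function that parses raw text.
PARSERS: Dict[str, Callable[[str], object]] = {}


def register(*extensions: str):
    """Decorator that registers one parser for several extensions."""
    def wrap(fn):
        for ext in extensions:
            PARSERS[ext.lower()] = fn
        return fn
    return wrap


@register(".txt", ".md")
def parse_text(raw: str) -> str:
    return raw.strip()


@register(".csv")
def parse_csv(raw: str) -> list:
    return list(csv.reader(io.StringIO(raw)))


def load_plugins(group: str = "praxia.parsers") -> int:
    """Load third-party parsers (assumes entry-point name == extension)."""
    found = entry_points(group=group)  # keyword form needs Python 3.10+
    for ep in found:
        PARSERS[ep.name.lower()] = ep.load()
    return len(found)


def parse(filename: str, raw: str):
    """Pick a parser from the file extension and apply it."""
    ext = Path(filename).suffix.lower()
    if ext not in PARSERS:
        raise ValueError(f"no parser registered for {ext!r}")
    return PARSERS[ext](raw)
```

With this shape, a third-party package only needs to declare an entry point in its own metadata; the core dispatch table never has to be edited.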

Output exporters: render skill output to HTML / PPTX / DOCX / MD / JSON

Skills produce Markdown by default. Convert to whatever the user requested:

from praxia.io.exporters import export_as
result = export_as(md_text, format="pptx", title="Q3 Review")
# result.bytes → write to disk, stream over HTTP, push via a connector

OutputFormatSkill infers the format from natural-language hints (English and Japanese):

from praxia.skills.output_format import OutputFormatSkill
fs = OutputFormatSkill()
fs.detect("レポートをパワポで").format       # → "pptx" (Japanese: "the report as PowerPoint")
fs.detect("as a Word document").format       # → "docx"
fs.deliver(md, user_request="HTML please")    # ExporterResult with .bytes

CLI shortcut:

praxia export report.md report.html
praxia export report.md slides.pptx --title "Q3 Review"

Custom formats register via the praxia.exporters entry-point, following the same pattern as connectors.

Memory mode: accumulate or read-only, per user

Some sessions shouldn't leave a trail (legal review, sensitive data exploration). Toggle per-user:

praxia memory mode --user-id alice read_only      # writes silently dropped
praxia memory mode --user-id alice accumulate     # back on
praxia memory show --user-id alice                # see the resolved config + reason

Admins can lock the mode for the whole tenant or for specific roles:

praxia admin memory-policy-set --default-mode read_only --mode-locked
praxia admin memory-policy-set --enforced-backend mem0 --allowed mem0,zep
praxia admin memory-policy-set --accumulate-locked-roles operator,admin

Resolution order: admin enforced > call-site argument > user pref > admin default. See praxia.memory.policy.
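The resolution order reads naturally as a "first non-empty value wins" lookup. A minimal sketch follows; `resolve_mode` and its parameter names are hypothetical, and the fallback default of "accumulate" is an assumption rather than documented behavior. See praxia.memory.policy for the real logic.

```python
from typing import Optional, Tuple


def resolve_mode(
    admin_enforced: Optional[str] = None,
    call_site: Optional[str] = None,
    user_pref: Optional[str] = None,
    admin_default: Optional[str] = "accumulate",  # assumed fallback
) -> Tuple[str, str]:
    """Return (mode, reason) following the documented precedence."""
    candidates = [
        ("admin enforced", admin_enforced),
        ("call-site argument", call_site),
        ("user pref", user_pref),
        ("admin default", admin_default),
    ]
    # The first source that actually specifies a mode decides the outcome.
    for reason, mode in candidates:
        if mode is not None:
            return mode, reason
    raise ValueError("no memory mode configured")


# A locked tenant policy beats both the caller and the user's preference.
print(resolve_mode(admin_enforced="read_only", call_site="accumulate"))
print(resolve_mode(user_pref="read_only"))
```

Returning the reason alongside the mode mirrors what `praxia memory show` is described as doing: reporting the resolved config together with why it was chosen.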

Multi-LTM fusion + dynamic routing (accuracy boost)

Each LTM has different strengths: entity linking (Mem0), temporal KG (Zep), audit trail (JSON), vector recall (HindSight). You can run several at once and either fuse the results or pick a backend per query:

from praxia.memory.composite import CompositeBackend, WeightedBackend
from praxia.memory.router import RoutedBackend, RuleRouter

# A. Parallel fan-out + Reciprocal Rank Fusion
composite = CompositeBackend(
    backends=[WeightedBackend("mem0", ..., weight=1.5),
              WeightedBackend("zep", ..., weight=1.0),
              WeightedBackend("hindsight", ..., weight=1.0)],
    fusion="rrf",
)

# B. Query-aware dispatch (RuleRouter handles English + Japanese keywords)
routed = RoutedBackend(
    backends={"mem0": ..., "zep": ..., "hindsight": ..., "json": ...},
    router=RuleRouter(),
    write_to="mem0",
)

Full design + tradeoffs: docs/FEATURES.md § 5.1.
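For readers unfamiliar with Reciprocal Rank Fusion, here is a standalone sketch of the weighted variant. It is not CompositeBackend's code: the `rrf` function, its input shape, and the constant k=60 (the value commonly used in the RRF literature) are assumptions for illustration. Each backend contributes weight / (k + rank) for every item it returns, and items are sorted by their summed score.

```python
from collections import defaultdict
from typing import Dict, List


def rrf(
    rankings: Dict[str, List[str]],
    weights: Dict[str, float],
    k: int = 60,
) -> List[str]:
    """Fuse per-backend ranked ID lists into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for backend, ranked_ids in rankings.items():
        weight = weights.get(backend, 1.0)
        for rank, item_id in enumerate(ranked_ids, start=1):
            # Items near the top of any list earn more; k damps the tail.
            scores[item_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


fused = rrf(
    rankings={
        "mem0": ["a", "b", "c"],
        "zep": ["b", "a"],
        "hindsight": ["b", "d"],
    },
    weights={"mem0": 1.5, "zep": 1.0, "hindsight": 1.0},
)
print(fused)  # ['b', 'a', 'c', 'd']
```

Because RRF uses only ranks, backends with incomparable similarity scores can be fused without any score normalization.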

Voice input / output

from praxia.io.audio import STT, TTS

text = STT().transcribe(audio_bytes, filename="meeting.wav", language="ja")
audio = TTS().synthesize("Hello world", voice="alloy", format="mp3")

Both Streamlit UI tabs (Run Flow, Skill) include 🎙 Audio input and 🔊 Read response aloud toggles. Providers: OpenAI Whisper / TTS (default), ElevenLabs (premium voices), local Whisper / Piper (praxia[audio-local]).

6 Pluggable LTM Backends

| Backend | Notes |
| --- | --- |
| json (default) | Zero-dependency, JSONL on disk, fully auditable |
| mem0 | Entity linking + hybrid search (recommended for production) |
| langmem | LangChain LangMem SDK |
| letta | Letta shared blocks (with read-only policy support) |
| zep | Zep / Graphiti for temporal KGs (Layer 5) |
| hindsight | vectorize-io/hindsight, an agent memory store |

Switch with one line:

PersonalMemory(user_id="alice", backend="mem0")

Built-in Authentication, RBAC, SSO & Resource Policies

  • API-key + JWT auth (praxia.auth) with 4 default roles (admin / operator / member / viewer)
  • SSO via OIDC: Google, Microsoft Entra ID, Okta, GitHub, Keycloak, custom OIDC, plus SAML skeleton
  • Resource access policies (ACL): glob-pattern allow/deny rules per resource (built for enterprise IS departments)
  • Append-only audit log: every authn / authz / policy decision / privileged action recorded
  • Admin data exports: CSV / JSON / JSONL dumps of audit, users, usage, memory, policies, shared blocks (chain-of-custody preserved)
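Glob-pattern allow/deny rules can be evaluated with the standard library alone. The sketch below is illustrative, not Praxia's policy engine: the `Rule` class, `is_allowed`, the "deny beats allow" precedence, the allow-by-default fallback, and the `user:` principal prefix are all assumptions (only the `role:` prefix and the `box:/Confidential/*` pattern appear in the CLI examples).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Rule:
    effect: str                 # "allow" or "deny"
    resource_type: str          # e.g. "connector"
    pattern: str                # glob, e.g. "box:/Confidential/*"
    principals: FrozenSet[str]  # e.g. {"role:member", "role:viewer"}


def is_allowed(
    rules: Iterable[Rule],
    user: str,
    role: str,
    resource_type: str,
    resource: str,
    default: bool = True,  # assumed: allow when no rule matches
) -> bool:
    """Evaluate rules for one access; any matching deny wins (assumed)."""
    me = {f"user:{user}", f"role:{role}"}
    matched = [
        r for r in rules
        if r.resource_type == resource_type
        and r.principals & me
        and fnmatchcase(resource, r.pattern)
    ]
    if any(r.effect == "deny" for r in matched):
        return False
    if any(r.effect == "allow" for r in matched):
        return True
    return default


rules = [
    Rule("deny", "connector", "box:/Confidential/*",
         frozenset({"role:member", "role:viewer"})),
]
print(is_allowed(rules, "alice", "member", "connector", "box:/Confidential/q3.pdf"))
print(is_allowed(rules, "bob", "operator", "connector", "box:/Confidential/q3.pdf"))
```

Note that `fnmatchcase` lets `*` match across `/`, so the pattern above also covers nested folders; a stricter engine might treat path separators specially.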

Admin User Management

  • Create / read / update / delete users
  • Activate / deactivate, role grants, API-key rotation
  • All actions audited
  • Available via CLI, Streamlit UI, and SDK

Custom Prompts (per-user + admin-distributed)

  • Users save personal prompts; admins promote them to org or distribute to specific users / roles
  • Three scopes (personal / org / distributed) with merge precedence
  • Same model as the skill registry

Per-user OAuth for connectors

Each Praxia user can authorize external systems with their own credentials, so the external system's native ACL is enforced per user.

# Set the OAuth app credentials once
export PRAXIA_OAUTH_BOX_CLIENT_ID=...
export PRAXIA_OAUTH_BOX_CLIENT_SECRET=...

# Each user authorizes individually (CLI loopback)
praxia oauth start box --user-id alice
# → opens authorization URL → user logs in → redirect captures code
# → token saved encrypted to .praxia/auth/oauth_tokens.jsonl

# From now on, alice's connector calls use her token
praxia connector pull box 0 --user-id alice
# alice can only see Box folders alice has access to

Supported providers: Box, Microsoft (SharePoint/OneDrive), Dropbox, Google Drive, Salesforce. Tokens auto-refresh; access logged in audit log.

Production HTTP callback (praxia serve): four endpoints under /api/v1/oauth/{provider}/: start, callback, status, revoke (DELETE). State cache is multi-worker-safe (PersistentStateStore, a TTL-pruned JSON store), so the IdP redirect can land on any FastAPI worker. Set PRAXIA_PUBLIC_URL to pin the redirect URI.
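To make "TTL-pruned JSON" concrete, here is a toy file-backed state store. It is a sketch under stated assumptions, not PersistentStateStore itself: the class name `JsonStateStore`, its methods, and the file layout are hypothetical. It shows the two ideas that matter: the OAuth `state` value lives on disk so any worker can find it, and expired entries are dropped on every read.

```python
import json
import os
import secrets
import tempfile
import time
from pathlib import Path
from typing import Optional


class JsonStateStore:
    """File-backed OAuth state cache with expiry (illustrative only)."""

    def __init__(self, path: str, ttl_seconds: float = 600.0):
        self.path = Path(path)
        self.ttl = ttl_seconds

    def _load(self, now: float) -> dict:
        try:
            data = json.loads(self.path.read_text())
        except FileNotFoundError:
            data = {}
        # Prune: keep only entries that have not expired yet.
        return {s: e for s, e in data.items() if e["expires"] > now}

    def _save(self, data: dict) -> None:
        # Write to a temp file, then rename, so readers never see half a file.
        fd, tmp = tempfile.mkstemp(dir=str(self.path.parent))
        with os.fdopen(fd, "w") as fh:
            json.dump(data, fh)
        os.replace(tmp, self.path)

    def put(self, payload: dict, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        data = self._load(now)
        state = secrets.token_urlsafe(16)
        data[state] = {"payload": payload, "expires": now + self.ttl}
        self._save(data)
        return state

    def pop(self, state: str, now: Optional[float] = None) -> Optional[dict]:
        """Return the payload once; a second pop or an expired state gives None."""
        now = time.time() if now is None else now
        data = self._load(now)
        entry = data.pop(state, None)
        self._save(data)
        return entry["payload"] if entry else None
```

The atomic rename protects readers from torn writes, but two workers doing read-modify-write at the same moment could still lose an update; a production store would add a file lock or use a shared database.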

KMS-backed token encryption (production)

OAuth tokens use envelope encryption: a fresh 256-bit data key per write, AES-GCM payload encryption, and the data key wrapped by a configurable KmsAdapter. The master key never lives on the application host:

| Adapter | Install | Use |
| --- | --- | --- |
| local (default) | (none) | dev / single-host |
| aws | pip install 'praxia[kms-aws]' | AWS KMS CMK |
| azure | pip install 'praxia[kms-azure]' | Azure Key Vault Keys |
| gcp | pip install 'praxia[kms-gcp]' | GCP Cloud KMS |
| vault | pip install 'praxia[kms-vault]' | HashiCorp Vault Transit |

export PRAXIA_KMS_ADAPTER=aws
export PRAXIA_KMS_KEY_ID=arn:aws:kms:us-east-1:111122223333:key/...

Legacy v0.1 tokens decode transparently; re-saving rewrites them in the new envelope format.

A/B experiments: prompts / skills / LLMs

Test variants of any payload (system prompt, LLM provider, memory backend) with deterministic per-user assignment + outcome tracking:

praxia experiment create proposal_v2 \
    --name "Proposal: shorter vs longer prompt" \
    --variants '{"control":{"prompt":"<800-word>"},"candidate":{"prompt":"<400-word>"}}' \
    --traffic-split "control=0.5,candidate=0.5"
praxia experiment start proposal_v2
# ...users run flows; outcomes recorded automatically...
praxia experiment results proposal_v2
# → 🏆 Tentative winner: candidate (confidence 0.41)

The same user always sees the same variant for the duration of the experiment (SHA-256 bucket). An audience filter (roles / users / time window) limits who is enrolled. See praxia.experiments.
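Deterministic assignment by hashing is a standard technique and fits in a few lines. The sketch below is an assumption about how such bucketing could work, not the praxia.experiments implementation: `assign_variant`, the "experiment:user" key format, and the use of the first 8 digest bytes are all illustrative choices.

```python
import hashlib
from typing import Dict


def assign_variant(experiment_id: str, user_id: str, split: Dict[str, float]) -> str:
    """Map a user to a variant; the same inputs always give the same answer."""
    if not split:
        raise ValueError("split must name at least one variant")
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Turn the first 8 bytes into a roughly uniform point in the unit interval.
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for variant, share in split.items():
        cumulative += share
        if point < cumulative:
            return variant
    # Guard against floating-point rounding at the upper edge.
    return variant


split = {"control": 0.5, "candidate": 0.5}
first = assign_variant("proposal_v2", "alice", split)
second = assign_variant("proposal_v2", "alice", split)
print(first == second)  # True
```

Including the experiment ID in the hashed key means a user who lands in "control" for one experiment is not systematically placed in "control" for every other experiment.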

LLM output quality evaluation

Separate from the deterministic regression suite, tests/llm_eval/ runs real LLM calls and grades output against rubrics plus a committed baseline. CI flags PRs where quality drops by more than 5 points:

# Skipped by default (requires API keys + costs tokens)
pytest tests/llm_eval -m llm_eval -v

# Update baselines after a known-good change
pytest tests/llm_eval --update-baselines

# Compare providers on the same cases
pytest tests/llm_eval --llm-eval-model gpt-4o

Built-in rubrics: keyword match, structure (heading) match, length band, must-not-contain, LLM-as-judge. One canonical case per business skill ships out of the box.
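Two of the simpler rubrics and the baseline comparison can be sketched as pure functions. This is a hedged illustration rather than the contents of tests/llm_eval/: the function names, the 0..100 scale, and the exact scoring formulas are assumptions; only the "more than 5 points" regression rule comes from the text above.

```python
from typing import Sequence


def keyword_score(output: str, keywords: Sequence[str]) -> float:
    """Percentage of expected keywords found in the output (case-insensitive)."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    text = output.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return 100.0 * hits / len(keywords)


def length_band_score(output: str, min_words: int, max_words: int) -> float:
    """Full marks if the word count falls inside the band, zero otherwise."""
    n = len(output.split())
    return 100.0 if min_words <= n <= max_words else 0.0


def regressed(score: float, baseline: float, tolerance: float = 5.0) -> bool:
    """True when the score fell by more than `tolerance` points."""
    return baseline - score > tolerance


answer = "Indemnity and liability cap reviewed; no issues found."
score = keyword_score(answer, ["indemnity", "liability", "termination"])
print(round(score, 1))          # 66.7
print(regressed(score, 75.0))   # True
```

Keeping rubrics deterministic where possible (keywords, structure, length) reserves the costlier LLM-as-judge rubric for qualities that cannot be checked mechanically.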

External Connectors: 20 systems, Pull + Push

Storage / Files (8)

| Connector | Pull | Push | Auth |
| --- | --- | --- | --- |
| Box | ✅ folder → files | ✅ upload | OAuth2 / JWT |
| SharePoint / M365 | ✅ | ✅ | Microsoft Entra |
| Dropbox | ✅ | ✅ | OAuth2 |
| Google Drive | ✅ | ✅ | OAuth / SA |
| AWS S3 | ✅ bucket/prefix | ✅ object upload | IAM (boto3 chain) |
| Azure Blob Storage | ✅ | ✅ | DefaultAzureCredential / connstr / SAS |
| GCS | ✅ | ✅ | ADC / service account |
| WebDAV / Nextcloud | ✅ | ✅ | HTTP Basic |

Knowledge / Docs (3)

| Connector | Pull | Push | Auth |
| --- | --- | --- | --- |
| Notion | ✅ database query | ✅ child page | OAuth (Notion) |
| Confluence | ✅ CQL search | ✅ child page | OAuth (Atlassian) |
| Jira | ✅ JQL search | ✅ create issue | OAuth (Atlassian) |

Communication (3)

| Connector | Pull | Push | Auth |
| --- | --- | --- | --- |
| Slack | ✅ history / search | ✅ post message | OAuth (Slack) |
| Microsoft Teams | ✅ channel messages | ✅ post message | OAuth (Microsoft) |
| Email (IMAP / Gmail / Outlook) | ✅ folder + query | ✅ send | IMAP/SMTP / Google / Microsoft OAuth |

CRM / Tickets / Engineering (6)

| Connector | Pull | Push | Auth |
| --- | --- | --- | --- |
| kintone | ✅ | ✅ | API token |
| Salesforce | ✅ SOQL | ✅ sObject create | OAuth |
| HubSpot CRM | ✅ contacts/companies/deals | ✅ note attach | OAuth |
| Zendesk | ✅ ticket search | ✅ create ticket | OAuth or API token |
| GitHub | ✅ issues/code/files | ✅ issue / comment | OAuth (GitHub) |
| Linear | ✅ issues by team | ✅ create issue | OAuth or API key |

Pull data into agent flows; push agent outputs back to your system of record. All access subject to admin policies. Per-user OAuth means alice only sees what alice has access to in each system.

Dashboards

  • Personal: flow runs, skill invocations, memory entries, outcome success rate, token usage, top skills, recent episodes
  • Organizational: active users, total invocations, promoted/frozen/distributed counts, top users, top skills, audit event counts

🖼 UI Tour

The bundled Streamlit UI gives non-technical users access to flows, skills, memory, dashboards, prompts, user / policy management, and connectors.

| Tab | Screenshot |
| --- | --- |
| 🎬 Run Flow | Run Flow |
| 🛠 Skill | Business Skill |
| 📊 Dashboard | Dashboard |
| 📝 Prompts | Custom Prompts |
| 👥 Users | User management |
| 🔌 Connectors | External connectors |
| 🛡 Policies | Resource access policies |
| 💾 Admin Downloads | Admin downloads |

CLI users get the same functionality with rich-formatted output:

CLI terminal


🚀 Quickstart

# 1. Install (pick the extras you actually need)
pip install praxia                              # Core
pip install "praxia[ui,connectors,office,audio]" # Common stack
pip install "praxia[all]"                       # Everything

# 2. Configure once: all keys live in one place
praxia config init      # interactive walkthrough
praxia config show      # display resolved config (secrets masked)
praxia config path      # show key resolution order
# Or: cp .env.example .env  and edit

# 3. Initialize
praxia init --backend json --model auto

# 4. Run a flow (auto-parses .pdf / .docx / .xlsx / .pptx if attached)
praxia run sales --customer-name "Acme" --product "BizFlow"
praxia run logic --document spec.pdf
praxia run rag --question "What license is Praxia released under?"

# Run a business skill
praxia skill run investment "Mid-term thesis on a hypothetical mid-cap electronics issuer"
praxia skill run legal "Review the risk in this services agreement"

# Launch the UI (11 tabs incl. Dashboard / Policies / Admin / Connectors)
praxia ui --port 8501

# Personal → org memory distillation
praxia consolidate --dry-run
praxia freeze --block team_norms

# Dashboards
praxia dashboard --scope personal --user-id alice
praxia dashboard --scope org

# Admin: user management
praxia user create alice --role member
praxia user update alice --role operator --email alice@a.test
praxia user deactivate alice
praxia user delete alice --yes
praxia user audit --limit 100

# Admin: resource access policies (ACL, for IS depts)
praxia policy add deny connector "box:/Confidential/*" \
    --principals "role:member,role:viewer" \
    --description "Lock Confidential folder to operators+"
praxia policy list
praxia policy test alice member connector box:/Confidential/q3.pdf read

# Admin: data exports (CSV / JSON / JSONL; every export is audit-logged)
praxia admin export-audit audit.csv --since-days 30
praxia admin export-users users.json --format json
praxia admin export-memory ./memory_backup --all
praxia admin export-policies policies.json

# External connectors (Pull / Push, subject to ACL)
praxia connector list
praxia connector pull box 0 --limit 20 --save-to ./box_pulled
praxia connector push salesforce Lead lead.json
praxia connector pull kintone "42?status='open'"

# Custom prompts (per-user + admin distribution)
praxia prompt create my_qualifier prompt_body.txt
praxia prompt list
praxia prompt distribute curated_prompt body.md --target-roles member

# Skill registry: promotion and admin distribution
praxia skill promote --candidates
praxia skill distribute investment_analyst --target-roles member,operator

Minimal Python example:

from praxia import Praxia
from praxia.flows import SalesAgentFlow
from praxia.skills import InvestmentSkill

m = Praxia(user_id="alice", default_model="claude")

# Run a multi-agent flow
result = m.run(SalesAgentFlow, inputs={
    "customer_name": "Acme",
    "product": "BizFlow",
})

# Run a single business skill
print(InvestmentSkill().run("3-year investment thesis on Acme Mfg (TYO:0000)"))

# Personal memory accumulates automatically; no explicit save is needed.
# The nightly consolidator promotes effective patterns to org memory.
m.consolidate(dry_run=True)

Full guide: docs/quickstart.md.

  • Deploying it? Two paths: the fastest is praxia ui (full-stack); for "Praxia as a brain behind your own frontend", use the SDK or praxia serve (HTTP API). Setup recipes: docs/deployment-modes.md.
  • Building a connector? Step-by-step recipe in docs/CUSTOM_CONNECTORS.md. The pattern is ~50 lines + an entry-point.
  • Formal specs? Basic design / I/F / detailed design / functional spec (機能仕様書) (EN + JA) under docs/specs/.
  • Regression suite? 364 tests covering auth/memory/exporters/CLI/i18n/etc.; see docs/EVALUATION.md.
  • Multilingual? Landing page + Streamlit UI ship in 8 languages (en / ja / zh-CN / ko / es / fr / de / pt-BR) with browser-language auto-detection; see docs/i18n.md.
  • Contributing? PRs require a DCO sign-off (git commit -s …). Trademark policy + GDPR notes for operators are in docs/legal/.
  • MCP for Claude Desktop / Cursor? Local stdio (praxia mcp serve) or remote HTTP+SSE (/api/v1/mcp after praxia serve). Every skill + flow becomes an MCP tool automatically.
  • OAuth scopes for connectors? Per-provider scopes, app registration steps, least-privilege alternatives in docs/OAUTH_SCOPES.md.
  • Mobile-friendly? Both the landing page and the Streamlit UI are responsive: chip-style nav on phones, scrollable tabs, ≥44px touch targets, compact mode toggle.


Design Philosophy

1. Capture tacit knowledge with zero effort

No explicit CLAUDE.md-style writing. Mem0/LangMem/HindSight extract entities and preferences from ordinary conversations.

2. Promote only what's effective, automatically

Three independent verdicts run in parallel. The framework auto-promotes only when consensus is high; medium-confidence items go to a review queue.

3. Separate "frozen" from "living" knowledge

  • Living layer (shared blocks): updated instantly, all agents see it
  • Frozen layer (Markdown + git): only PR-reviewed, stable best practices

This keeps both freshness and trust intact.

4. Use Graph storage only where relationships are the value

Mem0 OSS removed graph_store support in April 2026. We follow that signal: vector + entity linking is the default; graphs apply only to decision histories, customer 360, and incident causal chains.

5. Vendor lock-in is a non-goal

  • LiteLLM lets any provider work
  • LTM backends are pluggable, and you can run several at once via CompositeBackend / RoutedBackend for higher recall without picking a winner
  • Markdown + git is the persistence layer of last resort
  • Apache 2.0 license, evolving toward an open-core model

6. Ship "evidence" alongside the framework

Hallucination detection (praxia.eval.hallucination) and retrieval metrics (praxia.eval.metrics) are first-class. Customers don't have to take "it works" on faith.

For more, see docs/design-philosophy.md.


📊 Use Cases by Industry

Detailed Before/After tables for each domain are in docs/use-cases.md. Highlights:

| Industry | Representative use case | Headline impact |
| --- | --- | --- |
| Investment | Seed-stage VC due diligence | 4–6h → 45–60 min per deck |
| Sales | Pre-meeting research + storyboard | Proposal-acceptance rate +15–20pt |
| Engineering Design | Requirements doc review | Senior architect time freed: 16h/wk → 4h/wk |
| Procurement | RFQ TCO comparison | Hidden costs found: +30% vs initial quote |
| Patent | Prior-art search + novelty assessment | External patent-attorney fees −50–70% |
| Legal | M&A contract review | External law-firm costs halved (~$100k/deal) |

3-year compounding effects: New-hire ramp 6–12mo → 2–3mo / Veteran-departure knowledge loss → zero / Cross-team best-practice diffusion 30+ items/month.


🆚 When to pick what

Praxia is opinionated for organizations that want OSS + workflow templates + auto personal-to-org memory cycling + integrated auth/ACL/audit, all in one library. Adjacent tools have different goals; pick the one that fits your need:

  • LangGraph: generic agent graph builder, fine-grained state machines, deep LangChain integration
  • CrewAI: lightweight role-based crew abstraction
  • AutoGen: research-grade conversational multi-agents from Microsoft Research
  • Glean: hosted enterprise knowledge platform (no operational burden, commercial)
  • Mem0 / LangMem / Letta / Zep / HindSight: memory backends (Praxia uses them as plug-in backends, and you can run several at once)

These are not mutually exclusive: Praxia uses Mem0 as a backend and can be embedded inside a LangGraph node.

For a feature-level matrix, see docs/COMPARISON.md (verifiable against each project's public documentation; corrections welcome via Issues).


🗺 Roadmap

| Phase | Scope | Status |
| --- | --- | --- |
| Phase 1 | Personal memory + 3 specialized flows + 6 business skills | ✅ Done |
| Phase 2 | Sleep-time consolidator + statistical (outcome-correlated) promotion | ✅ Done |
| Phase 3 | Shared blocks + Markdown freeze workflow + CLI | ✅ Done |
| Phase 4 | Skill registry promotion (personal → org) | ✅ Done |
| Phase 5 | Auth + RBAC + SSO + audit log + admin user CRUD | ✅ Done |
| Phase 5+ | Resource access policies (ACL) + admin data exports + custom prompts + 6 connectors + dashboards | ✅ Done |
| Phase 6 | Multi-tenant SaaS, advanced GUI, vertical editions | 🚧 Commercial |

🤝 Contributing

We're building a community-driven library of industry recipes. Three primary contribution paths:

  1. New workflow flows (praxia/flows/)
  2. New business skills (praxia/skills/business/)
  3. Industry recipes (docs/recipes/)

See CONTRIBUTING.md.


📜 License

Apache License 2.0: commercial use, modification, and redistribution permitted.

Copyright holder: GenArch and Praxia Contributors.

Third-party dependencies retain their own licenses; see NOTICE.md for the full attribution list.

Trademarks: All product and company names referenced in Praxia documentation (Claude, ChatGPT, Gemini, Qwen, Box, SharePoint, Dropbox, Google Drive, kintone, Salesforce, Mem0, Letta, LangChain, CrewAI, Glean, etc.) are trademarks or registered trademarks of their respective owners. Praxia is not affiliated with, sponsored by, or endorsed by any of these companies; references are descriptive (nominative fair use) only. See NOTICE.md § Trademark notice for the full list.

Demo data: Company names in code examples (e.g., "Acme Manufacturing", "AcmeAuto Inc.") are fictional and for illustration only. Built-in skills include guardrails reminding users that final professional advice (investment, legal, patent, etc.) requires a qualified professional.

We may evolve toward an open-core model: enterprise GUI / advanced audit features under a separate license, while the framework remains Apache 2.0.


🚢 Deployment modes

Praxia ships in two halves you can mix:

| Mode | What you run | When to choose it |
| --- | --- | --- |
| A. Full-stack | praxia ui (Streamlit) + Praxia core, one process | Internal team, fastest path |
| B-1. Embedded SDK | Your Python service (import praxia) | You already have a Python backend |
| B-2. HTTP service | praxia serve (FastAPI) + your own frontend | Non-Python frontend, mobile, or CDN-cached UI |

Both modes share the same auth, memory, and skills; only the frontend differs. Step-by-step setup, production checklist, and migration path: docs/deployment-modes.md / Japanese version.

# Full-stack
praxia ui --port 8501

# Backend-only HTTP API (8 endpoints under /api/v1)
pip install "praxia[server]"
praxia serve --host 0.0.0.0 --port 8000 --cors-origin https://your-frontend.example

๐Ÿ“ Design specs (formal documents)

For procurement / architecture review / extension work, formal design specs are available in EN + JA:

| Document | English | Japanese |
| --- | --- | --- |
| Basic design (基本設計仕様書) | basic-design.en.md | basic-design.ja.md |
| Interface spec (I/F 仕様書) | interface-spec.en.md | interface-spec.ja.md |
| Detailed design (詳細設計仕様書) | detailed-design.en.md | detailed-design.ja.md |

🛠 Extending Praxia

Praxia uses a single extensibility primitive (praxia.extensions.Registry) for every plugin point: connectors, memory backends, skills, flows, file parsers, output exporters, and OAuth providers. Adding a plugin does not require editing any core file.

| Plugin type | Base | Registry | Entry-point group | Lines |
| --- | --- | --- | --- | --- |
| Connector | Connector protocol | CONNECTORS | praxia.connectors | ~50 |
| Memory backend | MemoryBackend protocol | BACKENDS | praxia.memory_backends | ~80 |
| File parser | Parser protocol | PARSERS | praxia.parsers | ~30 |
| Output exporter | Exporter protocol | EXPORTERS | praxia.exporters | ~40 |
| OAuth provider | OAuthProviderConfig | (instance) | praxia.oauth_providers | ~10 |
| KMS adapter | KmsAdapter protocol | KMS_ADAPTERS | praxia.kms_adapters | ~30 |
| Business skill | Skill | SKILLS | praxia.skills | ~20 |
| Multi-agent flow | Flow | FLOWS | praxia.flows | ~30 |
| Industry recipe | Markdown | n/a | - | n/a |

Custom connector tutorial (end-to-end Notion example): docs/CUSTOM_CONNECTORS.md / Japanese version.

Two ways to register:

# (a) Decorator (in-tree contributions)
from praxia.connectors.registry import CONNECTORS

@CONNECTORS.register_decorator("notion")
class NotionConnector: ...

# (b) Entry point (third-party packages, no fork needed), declared in the package's pyproject.toml
[project.entry-points."praxia.connectors"]
notion = "praxia_connector_notion:NotionConnector"

After pip install praxia-connector-notion, the new connector shows up automatically in praxia connector list, the Streamlit UI, and the SDK, with no edit to Praxia itself.

Full guide with examples for all 4 plugin types: docs/PLUGINS.md.


📈 ROI estimate (100-knowledge-worker mid-cap)

| Variable | Year 1 | Year 2 |
| --- | --- | --- |
| Workers in scope (N) | 100 | 100 |
| Loaded cost / FTE (C) | $90k | $90k |
| Routine work share (t) | 40% | 40% |
| Time savings (s) | 35% | 60% |
| Quality lift (Q) | $65k | $200k |
| Praxia cost (P) | $80k | $80k |
| Net benefit | $1.25M | $2.30M |

3-year cumulative net ≈ $5.2M. Even halving every parameter still produces > 10× ROI.

Full model + worked examples: docs/FEATURES.md#roi-projection-model.
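
As a quick sanity check on the table, the sketch below assumes the net benefit is computed as N × C × t × s + Q − P. That formula is an inference from the table's variable names, not something stated here; the authoritative model is the one in docs/FEATURES.md. The function and parameter names are made up for the example.

```python
def net_benefit(n_workers: int, loaded_cost: float, routine_share: float,
                time_savings: float, quality_lift: float, praxia_cost: float) -> float:
    """Assumed model: labor savings on routine work, plus quality lift, minus cost."""
    labor_savings = n_workers * loaded_cost * routine_share * time_savings
    return labor_savings + quality_lift - praxia_cost


# Inputs copied from the Year 1 and Year 2 columns of the table.
year1 = net_benefit(100, 90_000, 0.40, 0.35, 65_000, 80_000)
year2 = net_benefit(100, 90_000, 0.40, 0.60, 200_000, 80_000)

print(f"Year 1: ${year1:,.0f}")  # Year 1: $1,245,000
print(f"Year 2: ${year2:,.0f}")  # Year 2: $2,280,000
```

Under this assumption the results land close to the table's rounded $1.25M and $2.30M, which suggests the table follows a model of roughly this shape.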


📚 Acknowledgements & Inspirations

  • Mem0: personal memory layer
  • Letta: shared memory blocks concept
  • LangMem: long-term memory SDK
  • LiteLLM: unified provider abstraction
  • Claude Skills: skills registry conventions
  • Model Context Protocol: tool/skill interop
  • HindSight: Experience / Entity Summary / Belief model

Theoretical groundwork:

  • LinkedIn Cognitive Memory Agent (Episodic + Semantic + Procedural)
  • Mem0 paper (arXiv:2504.19413)
  • Letta sleep-time agents

Mission: Bridge "individual brilliance" and "organizational continuity" with AI.
