Conversational investor onboarding chatbot — PDF form-filling via LLM extraction

chatbot Module

Conversational investor onboarding chatbot — collects investor data through natural language and fills PDF subscription forms.

🚀 Quick Start

1. Configure

cd modules/chatbot
cp .env.example .env
nano .env          # add OPENAI_API_KEY at minimum

See SETUP_GUIDE.md for full configuration.

2. Install Dependencies

pip install -r requirements.txt
pip install -r requirements-api.txt

3. Run

# API server (recommended)
python api_server.py

# Interactive CLI session
python -m entrypoints.local

# Command-line interface
python -m entrypoints.cli

Server: http://localhost:8001
API Docs: http://localhost:8001/docs
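
Once the server is running, a quick liveness probe can confirm it is reachable. The sketch below uses only the standard library and the /health endpoint listed in this README. The helper names `health_url` and `is_healthy` are illustrative, and the sketch assumes only that a healthy server answers with HTTP 200; the response body is not inspected.

```python
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL from the server base URL."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url: str = "http://localhost:8001", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("chatbot API reachable:", is_healthy())
```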


📁 Structure

modules/chatbot/
├── .env.example              ← Copy to .env — add your API keys
├── api_server.py             ← FastAPI server (run this!)
├── requirements.txt          ← Core dependencies
├── requirements-api.txt      ← FastAPI + uvicorn
├── requirements-mapper.txt   ← PDF mapper connector (optional)
├── requirements-s3.txt       ← AWS S3 storage (optional)
├── requirements-full.txt     ← Everything (used by Docker)
├── pyproject.toml            ← Package metadata
├── Dockerfile                ← Container build
├── SETUP_GUIDE.md            ← Detailed setup
├── API_SERVER.md             ← API endpoint reference
├── entrypoints/
│   ├── local.py              ← Interactive CLI / Python-callable
│   ├── cli.py                ← Command-line interface
│   ├── fastapi_app.py        ← Bare FastAPI app (no /chatbot prefix)
│   └── aws_lambda.py         ← AWS Lambda handler
├── src/chatbot/              ← Core SDK source
│   ├── client.py             ← chatbotClient — main entry point
│   ├── config/               ← Settings, FormConfig
│   ├── core/                 ← Engine, router, session, states (13-state machine)
│   ├── extraction/           ← LLM extractor, fallback, prompt builder
│   ├── handlers/             ← One handler per conversation state
│   ├── limits/               ← Rate limiter
│   ├── logging/              ← Debug logger
│   ├── managed/              ← Stub for private managed PDF service
│   ├── pdf/                  ← PDFFillerInterface, MapperPDFFiller, workflow
│   ├── storage/              ← LocalStorage, S3Storage, StorageBackend
│   ├── telemetry/            ← Opt-in telemetry collector
│   ├── utils/                ← Field utils, address utils, intent detection
│   └── validation/           ← Field + phone validators
├── config_samples/           ← Form config JSON files (10 investor types)
├── tests/
│   ├── conftest.py           ← Shared fixtures
│   ├── unit/                 ← Fast, no I/O tests
│   └── integration/          ← Full-stack tests (TestClient)
└── data/
    ├── input/                ← Place blank PDFs here
    ├── output/               ← Filled PDFs and session data written here
    └── cache/                ← Optional: session cache

🎯 What This Module Does

A 13-state conversation engine that:

  1. Greets the investor and checks for existing saved data
  2. Asks the investor to select their type (Individual, Corporation, LLC, Trust, etc.)
  3. Collects all mandatory fields through natural conversation using GPT-4o-mini extraction
  4. Validates fields (email, phone format, boolean checks)
  5. Handles address copy (mailing same as registered), boolean checkbox groups, and one-field-at-a-time sequential fill for stubborn fields
  6. Fills the blank PDF via the mapper module (optional)
  7. Completes the session and saves structured JSON output
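
The validation step (item 4) can be pictured with a small sketch. The patterns and helper names below are assumptions chosen for illustration, not the module's actual validators in validation/; production email and phone checks are usually stricter.

```python
import re
from typing import Optional

# Deliberately simple patterns; assumptions for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[0-9]{7,15}$")

TRUE_WORDS = {"yes", "y", "true"}
FALSE_WORDS = {"no", "n", "false"}


def is_valid_email(value: str) -> bool:
    """Accept anything shaped like local@domain.tld."""
    return EMAIL_RE.match(value.strip()) is not None


def is_valid_phone(value: str) -> bool:
    """Strip common separators, then require 7 to 15 digits with an optional '+'."""
    compact = re.sub(r"[\s\-().]", "", value)
    return PHONE_RE.match(compact) is not None


def parse_boolean(value: str) -> Optional[bool]:
    """Map a yes/no style answer to a bool, or None when it is ambiguous."""
    word = value.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    return None
```

Returning None for an ambiguous answer lets the caller re-ask the question instead of silently recording a wrong value.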

Conversation states

State                   Description
INIT                    Greeting, check for existing profile
UPDATE_EXISTING_PROMPT  Offer to pre-fill from previous session
INVESTOR_TYPE_SELECT    Choose from 10 investor types
DATA_COLLECTION         Main loop: LLM extraction per turn
MISSING_FIELDS_PROMPT   Re-ask skipped mandatory fields
BOOLEAN_GROUP_SELECT    Handle yes/no checkbox groups
SEQUENTIAL_FILL         One field at a time for stubborn fields
MAILING_ADDRESS_CHECK   Is mailing same as registered?
CONTINUE_PROMPT         Mid-session checkpoint
OPTIONAL_FIELDS_PROMPT  Offer non-mandatory fields
ANOTHER_INFO_PROMPT     Any corrections before submit?
CONFIRM_AND_SUBMIT      Final confirmation
COMPLETE                Session done, outputs saved
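
The thirteen states above map naturally onto an enum. The sketch below is a minimal illustration and not the module's core/ implementation: the `State` member names come from the table, while `HAPPY_PATH` and `next_state` are hypothetical and cover only a simplified straight-through route, ignoring the branching the real router performs.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    UPDATE_EXISTING_PROMPT = "UPDATE_EXISTING_PROMPT"
    INVESTOR_TYPE_SELECT = "INVESTOR_TYPE_SELECT"
    DATA_COLLECTION = "DATA_COLLECTION"
    MISSING_FIELDS_PROMPT = "MISSING_FIELDS_PROMPT"
    BOOLEAN_GROUP_SELECT = "BOOLEAN_GROUP_SELECT"
    SEQUENTIAL_FILL = "SEQUENTIAL_FILL"
    MAILING_ADDRESS_CHECK = "MAILING_ADDRESS_CHECK"
    CONTINUE_PROMPT = "CONTINUE_PROMPT"
    OPTIONAL_FIELDS_PROMPT = "OPTIONAL_FIELDS_PROMPT"
    ANOTHER_INFO_PROMPT = "ANOTHER_INFO_PROMPT"
    CONFIRM_AND_SUBMIT = "CONFIRM_AND_SUBMIT"
    COMPLETE = "COMPLETE"


# Hypothetical simplified route; the real engine has many more branches.
HAPPY_PATH = [
    State.INIT,
    State.INVESTOR_TYPE_SELECT,
    State.DATA_COLLECTION,
    State.CONFIRM_AND_SUBMIT,
    State.COMPLETE,
]


def next_state(current: State) -> State:
    """Advance one step along HAPPY_PATH; COMPLETE is terminal."""
    if current not in HAPPY_PATH:
        raise ValueError(f"{current.name} is not on the simplified path")
    index = HAPPY_PATH.index(current)
    return HAPPY_PATH[min(index + 1, len(HAPPY_PATH) - 1)]
```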

🔌 PDF Filling

Three modes, selected with the chatbot_PDF_FILLER environment variable:

Mode            Description
none (default)  Data only, no PDF filling
mapper          Connect to the mapper module (modules/mapper/) via its API
managed         Private Auth0+Lambda service (requires chatbot-managed package)

For mapper mode, start the mapper API server first:

cd ../mapper
python api_server.py     # runs on port 8000

# Then set these in modules/chatbot/.env:
chatbot_PDF_FILLER=mapper
MAPPER_API_URL=http://localhost:8000
MAPPER_URL_PREFIX=/mapper
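
To make the mode selection concrete, here is a minimal sketch of how these variables could be read and checked at startup. It is not the module's config/ code: the function names are hypothetical, case-insensitive matching is an assumption, and the fallback value for MAPPER_API_URL is taken from the example above rather than from a documented default.

```python
import os
from typing import Mapping

VALID_MODES = ("none", "mapper", "managed")


def resolve_pdf_filler_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return the PDF filler mode, defaulting to 'none' when unset or empty."""
    # Lower-casing the value is an assumption made for convenience.
    raw = (env.get("chatbot_PDF_FILLER") or "none").strip().lower()
    if raw not in VALID_MODES:
        raise ValueError(f"chatbot_PDF_FILLER must be one of {VALID_MODES}, got {raw!r}")
    return raw


def mapper_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Join MAPPER_API_URL and MAPPER_URL_PREFIX into one base URL."""
    # The localhost fallback mirrors the example settings, not a documented default.
    base = env.get("MAPPER_API_URL", "http://localhost:8000").rstrip("/")
    prefix = env.get("MAPPER_URL_PREFIX", "")
    return base + prefix
```

With the settings shown above, `mapper_base_url` yields http://localhost:8000/mapper.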

🌐 API Endpoints

Endpoint                                             Method  Description
/                                                    GET     API info
/health                                              GET     Health check
/chatbot/chat                                        POST    Send a message
/chatbot/session/{user_id}/{session_id}              GET     Get completed session data
/chatbot/session/{user_id}/{session_id}/fill-report  GET     Fill statistics report
/chatbot/session/{user_id}/{session_id}              DELETE  Delete session

See API_SERVER.md for full request/response schemas.
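
As a rough illustration of calling the chat endpoint from Python, the sketch below builds a POST request with the standard library. The JSON field names (`user_id`, `session_id`, `message`) are an assumption mirroring the parameters of `send_message` in the library example; API_SERVER.md remains the authority on the real schema. The helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, user_id: str, session_id: str,
                       message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chatbot/chat."""
    # Assumed payload shape; check API_SERVER.md for the actual schema.
    payload = {"user_id": user_id, "session_id": session_id, "message": message}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chatbot/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send_chat(build_chat_request("http://localhost:8001", "investor_123", "session_abc", ""))` would send the first turn of a session.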


📦 Using as a Python Library

from src.chatbot import chatbotClient, LocalStorage, FormConfig

client = chatbotClient(
    openai_api_key="sk-...",
    storage=LocalStorage("./chatbot_data", "./config_samples"),
    form_config=FormConfig.from_directory("./config_samples"),
    pdf_filler=None,
)

# Send a message; in this example an empty first message returns the greeting
response, complete, data = client.send_message(
    user_id="investor_123",
    session_id="session_abc",
    message="",
)
print(response)   # → "Hi! I am here to help you fill out..."
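
A full session repeats this call until the second return value reports completion. The loop below sketches that pattern under stated assumptions: `run_conversation` and `FakeClient` are hypothetical names, the fake stands in for chatbotClient so the sketch runs without an API key, and treating `data` as populated only once the session completes is an assumption about the return value.

```python
from typing import Any, Iterable, Optional, Protocol, Tuple


class ChatClient(Protocol):
    """Anything exposing send_message with the (response, complete, data) shape."""

    def send_message(self, user_id: str, session_id: str,
                     message: str) -> Tuple[str, bool, Any]: ...


def run_conversation(client: ChatClient, user_id: str, session_id: str,
                     user_messages: Iterable[str]) -> Optional[Any]:
    """Feed messages until the client reports completion; return its data or None."""
    # Open with an empty message to request the greeting.
    response, complete, data = client.send_message(
        user_id=user_id, session_id=session_id, message="")
    print(response)
    for text in user_messages:
        if complete:
            break
        response, complete, data = client.send_message(
            user_id=user_id, session_id=session_id, message=text)
        print(response)
    return data if complete else None


class FakeClient:
    """Stand-in that completes after a fixed number of non-empty turns."""

    def __init__(self, turns_needed: int = 2) -> None:
        self.turns_needed = turns_needed
        self.turns = 0

    def send_message(self, user_id: str, session_id: str,
                     message: str) -> Tuple[str, bool, Any]:
        if message == "":
            return ("Hi!", False, None)
        self.turns += 1
        done = self.turns >= self.turns_needed
        return ("Thanks.", done, {"turns": self.turns} if done else None)
```

Typing the client as a Protocol keeps the loop independent of the concrete class, so the same function works with the real client or a test double.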

🧪 Testing

# Run all tests
pytest

# Unit tests only (fast, no network)
pytest tests/unit/ -v

# Integration tests
pytest tests/integration/ -v

# With coverage
pytest --cov=src/chatbot --cov-report=term-missing

# A specific test
pytest tests/unit/test_rate_limiter.py -v

🐳 Docker

docker build -t chatbot-module .
docker run -p 8001:8001 --env-file .env chatbot-module

🔗 Integration with mapper module

rv1 repo/
├── modules/
│   ├── mapper/          ← PDF extraction + mapping + filling engine
│   │   └── api_server.py  runs on :8000
│   └── chatbot/         ← This module
│       └── api_server.py  runs on :8001
│           └── MAPPER_API_URL=http://localhost:8000

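📚 Documentation

SETUP_GUIDE.md  Detailed setup and configuration
API_SERVER.md   API endpoint reference with request/response schemas
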
Quick Command Reference

# Setup
cp .env.example .env && nano .env
pip install -r requirements.txt -r requirements-api.txt

# Run
python api_server.py

# Test
curl http://localhost:8001/health
pytest tests/unit/

# Interactive CLI
python -m entrypoints.local
