Conversational investor onboarding chatbot — PDF form-filling via LLM extraction
chatbot Module
Conversational investor onboarding chatbot — collects investor data through natural language and fills PDF subscription forms.
🚀 Quick Start
1. Configure
```bash
cd modules/chatbot
cp .env.example .env
nano .env   # at minimum, set OPENAI_API_KEY
```
See SETUP_GUIDE.md for full configuration.
2. Install Dependencies
```bash
pip install -r requirements.txt
pip install -r requirements-api.txt
```
3. Run
```bash
# API server (recommended)
python api_server.py

# Interactive CLI (also callable from Python)
python -m entrypoints.local

# Command-line interface
python -m entrypoints.cli
```

- Server: http://localhost:8001
- API docs: http://localhost:8001/docs
📁 Structure
```
modules/chatbot/
├── .env.example            ← Copy to .env, then add your API keys
├── api_server.py           ← FastAPI server (run this!)
├── requirements.txt        ← Core dependencies
├── requirements-api.txt    ← FastAPI + uvicorn
├── requirements-mapper.txt ← PDF mapper connector (optional)
├── requirements-s3.txt     ← AWS S3 storage (optional)
├── requirements-full.txt   ← Everything (used by Docker)
├── pyproject.toml          ← Package metadata
├── Dockerfile              ← Container build
├── SETUP_GUIDE.md          ← Detailed setup
├── API_SERVER.md           ← API endpoint reference
├── entrypoints/
│   ├── local.py            ← Interactive CLI / Python-callable
│   ├── cli.py              ← Command-line interface
│   ├── fastapi_app.py      ← Bare FastAPI app (no /chatbot prefix)
│   └── aws_lambda.py       ← AWS Lambda handler
├── src/chatbot/            ← Core SDK source
│   ├── client.py           ← chatbotClient, the main entry point
│   ├── config/             ← Settings, FormConfig
│   ├── core/               ← Engine, router, session, states (13-state machine)
│   ├── extraction/         ← LLM extractor, fallback, prompt builder
│   ├── handlers/           ← One handler per conversation state
│   ├── limits/             ← Rate limiter
│   ├── logging/            ← Debug logger
│   ├── managed/            ← Stub for private managed PDF service
│   ├── pdf/                ← PDFFillerInterface, MapperPDFFiller, workflow
│   ├── storage/            ← LocalStorage, S3Storage, StorageBackend
│   ├── telemetry/          ← Opt-in telemetry collector
│   ├── utils/              ← Field utils, address utils, intent detection
│   └── validation/         ← Field + phone validators
├── config_samples/         ← Form config JSON files (10 investor types)
├── tests/
│   ├── conftest.py         ← Shared fixtures
│   ├── unit/               ← Fast tests, no I/O
│   └── integration/        ← Full-stack tests (TestClient)
└── data/
    ├── input/              ← Place blank PDFs here
    ├── output/             ← Filled PDFs and session data written here
    └── cache/              ← Optional: session cache
```
🎯 What This Module Does
A 13-state conversation engine that:
- Greets the investor and checks for existing saved data
- Asks the investor to select their type (Individual, Corporation, LLC, Trust, etc.)
- Collects all mandatory fields through natural conversation using GPT-4o-mini extraction
- Validates fields (email, phone format, boolean checks)
- Handles address copying (mailing = registered), yes/no checkbox groups, and sequential one-field-at-a-time filling for stubborn fields
- Fills the blank PDF via the mapper module (optional)
- Completes the session and saves structured JSON output
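One data-collection turn (LLM extraction, validation, then working out what is still missing) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the module's implementation: the LLM reply is passed in as a JSON string, and `SCHEMA`, `validate`, `merge_turn` and the email/phone/boolean rules are hypothetical stand-ins for the real form config and validators.

```python
import json
import re

# Hypothetical field schema: name -> (kind, mandatory). The real module
# reads its field definitions from the JSON files in config_samples/.
SCHEMA = {
    "full_name": ("text", True),
    "email": ("email", True),
    "phone": ("phone", True),
    "is_accredited": ("boolean", True),
    "middle_name": ("text", False),
}

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
TRUE_WORDS = {"yes", "y", "true", "1"}
FALSE_WORDS = {"no", "n", "false", "0"}


def validate(kind, value):
    """Return a normalized value, or None if it fails the (assumed) rule."""
    if kind == "boolean" and isinstance(value, bool):
        return value
    text = str(value).strip()
    if kind == "email":
        return text.lower() if EMAIL_RE.match(text) else None
    if kind == "phone":
        digits = re.sub(r"\D", "", text)
        # Assumed rule: keep digits only and accept 10 to 15 of them.
        return digits if 10 <= len(digits) <= 15 else None
    if kind == "boolean":
        if text.lower() in TRUE_WORDS:
            return True
        if text.lower() in FALSE_WORDS:
            return False
        return None
    return text or None


def merge_turn(collected, llm_reply):
    """Merge one turn of LLM-extracted JSON into `collected` in place.

    Invalid values and unknown field names are dropped. Returns the
    mandatory fields that are still missing, in schema order.
    """
    try:
        extracted = json.loads(llm_reply)
    except json.JSONDecodeError:
        extracted = {}  # a fallback extractor could take over here
    if not isinstance(extracted, dict):
        extracted = {}
    for name, value in extracted.items():
        if name not in SCHEMA or value is None:
            continue
        normalized = validate(SCHEMA[name][0], value)
        if normalized is not None:
            collected[name] = normalized
    return [name for name, (_, mandatory) in SCHEMA.items()
            if mandatory and name not in collected]


collected = {}
missing = merge_turn(
    collected,
    '{"full_name": "Ada Lovelace", "email": "ADA@Example.com", '
    '"phone": "(555) 123-4567", "favourite_colour": "green"}',
)
print(collected)
# → {'full_name': 'Ada Lovelace', 'email': 'ada@example.com', 'phone': '5551234567'}
print(missing)  # → ['is_accredited']
```

A non-empty `missing` list is the kind of signal that would plausibly drive the re-ask and sequential-fill states, but how the real engine routes on it is not documented here.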
Conversation states
| State | Description |
|---|---|
| `INIT` | Greeting, check for existing profile |
| `UPDATE_EXISTING_PROMPT` | Offer to pre-fill from a previous session |
| `INVESTOR_TYPE_SELECT` | Choose from 10 investor types |
| `DATA_COLLECTION` | Main loop: LLM extraction per turn |
| `MISSING_FIELDS_PROMPT` | Re-ask skipped mandatory fields |
| `BOOLEAN_GROUP_SELECT` | Handle yes/no checkbox groups |
| `SEQUENTIAL_FILL` | One field at a time for stubborn fields |
| `MAILING_ADDRESS_CHECK` | Is the mailing address the same as the registered one? |
| `CONTINUE_PROMPT` | Mid-session checkpoint |
| `OPTIONAL_FIELDS_PROMPT` | Offer non-mandatory fields |
| `ANOTHER_INFO_PROMPT` | Any corrections before submit? |
| `CONFIRM_AND_SUBMIT` | Final confirmation |
| `COMPLETE` | Session done, outputs saved |
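The 13 states in the table can be modelled as a Python `Enum` plus an explicit transition table. The transition sets below are a hypothetical reading of the state descriptions; this README does not say which transitions the router actually allows, so treat `TRANSITIONS`, `advance` and `reachable_from` as illustrative assumptions rather than the engine's behaviour.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    UPDATE_EXISTING_PROMPT = auto()
    INVESTOR_TYPE_SELECT = auto()
    DATA_COLLECTION = auto()
    MISSING_FIELDS_PROMPT = auto()
    BOOLEAN_GROUP_SELECT = auto()
    SEQUENTIAL_FILL = auto()
    MAILING_ADDRESS_CHECK = auto()
    CONTINUE_PROMPT = auto()
    OPTIONAL_FIELDS_PROMPT = auto()
    ANOTHER_INFO_PROMPT = auto()
    CONFIRM_AND_SUBMIT = auto()
    COMPLETE = auto()


S = State
# Hypothetical transition table: the states each state may hand off to.
TRANSITIONS = {
    S.INIT: {S.UPDATE_EXISTING_PROMPT, S.INVESTOR_TYPE_SELECT},
    S.UPDATE_EXISTING_PROMPT: {S.INVESTOR_TYPE_SELECT, S.DATA_COLLECTION},
    S.INVESTOR_TYPE_SELECT: {S.DATA_COLLECTION},
    S.DATA_COLLECTION: {S.MISSING_FIELDS_PROMPT, S.BOOLEAN_GROUP_SELECT,
                        S.SEQUENTIAL_FILL, S.MAILING_ADDRESS_CHECK,
                        S.CONTINUE_PROMPT},
    S.MISSING_FIELDS_PROMPT: {S.DATA_COLLECTION, S.SEQUENTIAL_FILL},
    S.BOOLEAN_GROUP_SELECT: {S.DATA_COLLECTION},
    S.SEQUENTIAL_FILL: {S.DATA_COLLECTION},
    S.MAILING_ADDRESS_CHECK: {S.DATA_COLLECTION, S.OPTIONAL_FIELDS_PROMPT},
    S.CONTINUE_PROMPT: {S.DATA_COLLECTION, S.OPTIONAL_FIELDS_PROMPT},
    S.OPTIONAL_FIELDS_PROMPT: {S.ANOTHER_INFO_PROMPT},
    S.ANOTHER_INFO_PROMPT: {S.DATA_COLLECTION, S.CONFIRM_AND_SUBMIT},
    S.CONFIRM_AND_SUBMIT: {S.COMPLETE, S.ANOTHER_INFO_PROMPT},
    S.COMPLETE: set(),  # terminal state
}


def advance(current, target):
    """Move to `target`, rejecting transitions the table does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def reachable_from(start):
    """Breadth-first search over the transition table."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in TRANSITIONS[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(len(State), reachable_from(S.INIT) == set(State))  # → 13 True
```

Keeping transitions in data rather than scattered `if` branches makes a check like "every state is reachable from `INIT`" a one-line test.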
🔌 PDF Filling
Three modes, selected with the `chatbot_PDF_FILLER` environment variable:
| Mode | Description |
|---|---|
| `none` (default) | Data only: no PDF filling |
| `mapper` | Connect to the mapper module (`modules/mapper/`) via its API |
| `managed` | Private Auth0 + Lambda service (requires the `chatbot-managed` package) |
For mapper mode, start the mapper API server first:
```bash
cd ../mapper
python api_server.py   # runs on port 8000

# Then, in modules/chatbot (for example in .env):
chatbot_PDF_FILLER=mapper
MAPPER_API_URL=http://localhost:8000
MAPPER_URL_PREFIX=/mapper
```
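Mode selection can be sketched as a small function over these variables. The variable names come from this README; the rules (an unknown mode is an error, `MAPPER_API_URL` is required in mapper mode, the prefix is joined with exactly one slash) are assumptions, and `resolve_pdf_filler` is a hypothetical helper, not part of the package.

```python
import os
from typing import Mapping, Optional, Tuple

VALID_MODES = ("none", "mapper", "managed")


def resolve_pdf_filler(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Return (mode, mapper_base_url); the URL is None unless mode is mapper."""
    env = os.environ if env is None else env
    # Unset or blank means the documented default, "none".
    mode = (env.get("chatbot_PDF_FILLER") or "").strip().lower() or "none"
    if mode not in VALID_MODES:
        raise ValueError(f"chatbot_PDF_FILLER must be one of {VALID_MODES}, got {mode!r}")
    if mode != "mapper":
        return mode, None
    base = (env.get("MAPPER_API_URL") or "").strip().rstrip("/")
    if not base:
        raise ValueError("MAPPER_API_URL must be set when chatbot_PDF_FILLER=mapper")
    # Assumed: the prefix is optional and may be written with or without slashes.
    prefix = (env.get("MAPPER_URL_PREFIX") or "").strip().strip("/")
    return mode, (f"{base}/{prefix}" if prefix else base)


print(resolve_pdf_filler({
    "chatbot_PDF_FILLER": "mapper",
    "MAPPER_API_URL": "http://localhost:8000",
    "MAPPER_URL_PREFIX": "/mapper",
}))  # → ('mapper', 'http://localhost:8000/mapper')
```

Accepting a plain mapping instead of always reading `os.environ` keeps the function testable without touching the process environment.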
🌐 API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | API info |
| `/health` | GET | Health check |
| `/chatbot/chat` | POST | Send a message |
| `/chatbot/session/{user_id}/{session_id}` | GET | Get completed session data |
| `/chatbot/session/{user_id}/{session_id}/fill-report` | GET | Fill statistics report |
| `/chatbot/session/{user_id}/{session_id}` | DELETE | Delete session |
See API_SERVER.md for full request/response schemas.
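Requests to these endpoints can be built with the standard library alone. The JSON body fields (`user_id`, `session_id`, `message`) are an assumption that mirrors the `send_message` arguments of the Python client; check API_SERVER.md for the actual request schema. `build_chat_request`, `session_url` and `send_chat` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request


def build_chat_request(base_url, user_id, session_id, message):
    """Build (but do not send) a POST /chatbot/chat request."""
    # Assumed body schema; see API_SERVER.md for the real one.
    payload = {"user_id": user_id, "session_id": session_id, "message": message}
    return urllib.request.Request(
        base_url.rstrip("/") + "/chatbot/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def session_url(base_url, user_id, session_id, fill_report=False):
    """URL for GET/DELETE /chatbot/session/{user_id}/{session_id}[/fill-report]."""
    # Percent-encode ids so characters such as "/" cannot alter the path.
    path = "/chatbot/session/{}/{}".format(
        urllib.parse.quote(user_id, safe=""),
        urllib.parse.quote(session_id, safe=""),
    )
    if fill_report:
        path += "/fill-report"
    return base_url.rstrip("/") + path


def send_chat(base_url, user_id, session_id, message, timeout=30.0):
    """Send one chat turn and return the decoded JSON response."""
    request = build_chat_request(base_url, user_id, session_id, message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat("http://localhost:8001", "investor_123", "session_abc", "")` would open a conversation, and `urllib.request.Request(session_url(...), method="DELETE")` would target the delete endpoint.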
📦 Using as a Python Library
```python
from src.chatbot import chatbotClient, LocalStorage, FormConfig

client = chatbotClient(
    openai_api_key="sk-...",
    storage=LocalStorage("./chatbot_data", "./config_samples"),
    form_config=FormConfig.from_directory("./config_samples"),
    pdf_filler=None,  # no PDF filling, collected data only
)

# Send messages; an empty first message starts the conversation
response, complete, data = client.send_message(
    user_id="investor_123",
    session_id="session_abc",
    message="",
)
print(response)  # → "Hi! I am here to help you fill out..."
```
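Because `send_message` returns a `(response, complete, data)` tuple, a scripted session is a short loop. The sketch assumes that contract from the example above; `run_scripted_session` and `ScriptedClient` are hypothetical, and `ScriptedClient` is only an offline stand-in so the loop runs without an OpenAI key. A real `chatbotClient` would be passed in its place.

```python
from typing import Any, Iterable, List, Tuple


def run_scripted_session(client, user_id: str, session_id: str,
                         replies: Iterable[str]) -> Tuple[List[str], bool, Any]:
    """Open a session with an empty message, then send replies until complete."""
    response, complete, data = client.send_message(
        user_id=user_id, session_id=session_id, message="")
    transcript = [response]
    for reply in replies:
        if complete:
            break  # remaining replies are not sent
        response, complete, data = client.send_message(
            user_id=user_id, session_id=session_id, message=reply)
        transcript.append(response)
    return transcript, complete, data


class ScriptedClient:
    """Offline stand-in (hypothetical): completes after `turns_needed` replies."""

    def __init__(self, turns_needed: int):
        self.turns_needed = turns_needed
        self.answers: List[str] = []

    def send_message(self, user_id: str, session_id: str, message: str):
        if message:
            self.answers.append(message)
        done = len(self.answers) >= self.turns_needed
        data = {"answers": list(self.answers)} if done else None
        return f"turn {len(self.answers)}", done, data


transcript, complete, data = run_scripted_session(
    ScriptedClient(turns_needed=2), "investor_123", "session_abc",
    ["Individual", "Ada Lovelace", "ignored"])
print(transcript, complete, data)
# → ['turn 0', 'turn 1', 'turn 2'] True {'answers': ['Individual', 'Ada Lovelace']}
```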
🧪 Testing
```bash
# Run all tests
pytest

# Unit tests only (fast, no network)
pytest tests/unit/ -v

# Integration tests
pytest tests/integration/ -v

# With coverage
pytest --cov=src/chatbot --cov-report=term-missing

# A specific test
pytest tests/unit/test_rate_limiter.py -v
```
🐳 Docker
```bash
docker build -t chatbot-module .
docker run -p 8001:8001 --env-file .env chatbot-module
```
🔗 Integration with mapper module
```
rv1 repo/
├── modules/
│   ├── mapper/            ← PDF extraction + mapping + filling engine
│   │   └── api_server.py    runs on :8000
│   └── chatbot/           ← This module
│       └── api_server.py    runs on :8001
│                            with MAPPER_API_URL=http://localhost:8000
```
📚 Documentation
- SETUP_GUIDE.md — Configuration reference
- API_SERVER.md — API endpoint documentation
- config_samples/README.md — Form config format
Quick Command Reference
```bash
# Setup
cp .env.example .env && nano .env
pip install -r requirements.txt -r requirements-api.txt

# Run
python api_server.py

# Test
curl http://localhost:8001/health
pytest tests/unit/

# Interactive CLI
python -m entrypoints.local
```
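The `curl` health check can also be scripted, for example as a readiness probe before running the integration tests. This sketch only assumes that `/health` answers HTTP 200 when the server is up; the response body is not inspected because its shape is not documented here. `is_healthy` is a hypothetical helper.

```python
import urllib.request


def is_healthy(base_url: str = "http://localhost:8001", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts all subclass OSError.
        return False
```

Calling `is_healthy()` in a loop with a short sleep gives a simple wait-for-startup step, for example after `docker run`.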
Download files
File details
Details for the file pdf_autofillr_chatbot-0.2.6.tar.gz.
File metadata
- Download URL: pdf_autofillr_chatbot-0.2.6.tar.gz
- Upload date:
- Size: 105.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ebe2f45d37961a6cdab0d6b7f29e25065e6cf128578ca8be5cbce83b9cbe3caa
|
|
| MD5 |
379ef0cb14fa5c3b4dd5dbe65efd1c4f
|
|
| BLAKE2b-256 |
7be9d9e6ec87f5bbecb433e655c6f89f560e089687d773a010d33a8b83068781
|
File details
Details for the file pdf_autofillr_chatbot-0.2.6-py3-none-any.whl.
File metadata
- Download URL: pdf_autofillr_chatbot-0.2.6-py3-none-any.whl
- Upload date:
- Size: 114.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `fa91b2b21bae905aee84a5de64a2bfbe0ca2d7ec662b7f2f4960d1e545c9d69e` |
| MD5 | `771999320aecbb0b315906303073daed` |
| BLAKE2b-256 | `76717ef69404bc8381f271016318347d5cbec9f79e2cc5da15847f7da6e9af7d` |
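The published digests can be checked after downloading a file. This sketch streams the file through `hashlib.sha256` and compares the result with the SHA256 values listed for version 0.2.6; `sha256_of` and `verify_download` are hypothetical helpers. pip can do the same at install time in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path

# SHA256 digests as published in the file details for version 0.2.6.
PUBLISHED_SHA256 = {
    "pdf_autofillr_chatbot-0.2.6.tar.gz":
        "ebe2f45d37961a6cdab0d6b7f29e25065e6cf128578ca8be5cbce83b9cbe3caa",
    "pdf_autofillr_chatbot-0.2.6-py3-none-any.whl":
        "fa91b2b21bae905aee84a5de64a2bfbe0ca2d7ec662b7f2f4960d1e545c9d69e",
}


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's SHA256 matches the published digest for its name."""
    path = Path(path)
    expected = PUBLISHED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify_download("pdf_autofillr_chatbot-0.2.6-py3-none-any.whl")` returns `True` only for an unmodified copy of the wheel.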