Skip to main content

Conversational investor onboarding chatbot — PDF form-filling via LLM extraction

Project description

chatbot Module

Conversational investor onboarding chatbot — collects investor data through natural language and fills PDF subscription forms.

🚀 Quick Start

1. Configure

cd modules/chatbot
cp .env.example .env
nano .env          # add OPENAI_API_KEY at minimum

See SETUP_GUIDE.md for full configuration.

2. Install Dependencies

pip install -r requirements.txt
pip install -r requirements-api.txt

3. Run

# API server (recommended)
python api_server.py

# Interactive CLI
python -m entrypoints.local

# Or interactively via CLI tool
python -m entrypoints.cli

Server: http://localhost:8001
API Docs: http://localhost:8001/docs


📁 Structure

modules/chatbot/
├── .env.example              ← Copy to .env — add your API keys
├── api_server.py             ← FastAPI server (run this!)
├── requirements.txt          ← Core dependencies
├── requirements-api.txt      ← FastAPI + uvicorn
├── requirements-mapper.txt   ← PDF mapper connector (optional)
├── requirements-s3.txt       ← AWS S3 storage (optional)
├── requirements-full.txt     ← Everything (used by Docker)
├── pyproject.toml            ← Package metadata
├── Dockerfile                ← Container build
├── SETUP_GUIDE.md            ← Detailed setup
├── API_SERVER.md             ← API endpoint reference
├── entrypoints/
│   ├── local.py              ← Interactive CLI / Python-callable
│   ├── cli.py                ← Command-line interface
│   ├── fastapi_app.py        ← Bare FastAPI app (no /chatbot prefix)
│   └── aws_lambda.py         ← AWS Lambda handler
├── src/chatbot/              ← Core SDK source
│   ├── client.py             ← chatbotClient — main entry point
│   ├── config/               ← Settings, FormConfig
│   ├── core/                 ← Engine, router, session, states (13-state machine)
│   ├── extraction/           ← LLM extractor, fallback, prompt builder
│   ├── handlers/             ← One handler per conversation state
│   ├── limits/               ← Rate limiter
│   ├── logging/              ← Debug logger
│   ├── managed/              ← Stub for private managed PDF service
│   ├── pdf/                  ← PDFFillerInterface, MapperPDFFiller, workflow
│   ├── storage/              ← LocalStorage, S3Storage, StorageBackend
│   ├── telemetry/            ← Opt-in telemetry collector
│   ├── utils/                ← Field utils, address utils, intent detection
│   └── validation/           ← Field + phone validators
├── config_samples/           ← Form config JSON files (10 investor types)
├── tests/
│   ├── conftest.py           ← Shared fixtures
│   ├── unit/                 ← Fast, no I/O tests
│   └── integration/          ← Full-stack tests (TestClient)
└── data/
    ├── input/                ← Place blank PDFs here
    ├── output/               ← Filled PDFs and session data written here
    └── cache/                ← Optional: session cache

🎯 What This Module Does

A 13-state conversation engine that:

  1. Greets the investor and checks for existing saved data
  2. Asks the investor to select their type (Individual, Corporation, LLC, Trust, etc.)
  3. Collects all mandatory fields through natural conversation using GPT-4o-mini extraction
  4. Validates fields (email, phone format, boolean checks)
  5. Handles address copy (mailing = registered), boolean groups, sequential fill for stubborn fields
  6. Fills the blank PDF via the mapper module (optional)
  7. Completes the session and saves structured JSON output

Conversation states

State Description
INIT Greeting, check for existing profile
UPDATE_EXISTING_PROMPT Offer to pre-fill from previous session
INVESTOR_TYPE_SELECT Choose from 10 investor types
DATA_COLLECTION Main loop — LLM extraction per turn
MISSING_FIELDS_PROMPT Re-ask skipped mandatory fields
BOOLEAN_GROUP_SELECT Handle yes/no checkbox groups
SEQUENTIAL_FILL One field at a time for stubborn fields
MAILING_ADDRESS_CHECK Is mailing same as registered?
CONTINUE_PROMPT Mid-session checkpoint
OPTIONAL_FIELDS_PROMPT Offer non-mandatory fields
ANOTHER_INFO_PROMPT Any corrections before submit?
CONFIRM_AND_SUBMIT Final confirmation
COMPLETE Session done, outputs saved

🔌 PDF Filling

Three modes controlled by chatbot_PDF_FILLER env var:

Mode Description
none (default) Data-only — no PDF filling
mapper Connect to the mapper module (modules/mapper/) via its API
managed Private Auth0+Lambda service (requires chatbot-managed package)

For mapper mode, start the mapper API server first:

cd ../mapper
python api_server.py     # runs on port 8000

# Then in modules/chatbot:
chatbot_PDF_FILLER=mapper
MAPPER_API_URL=http://localhost:8000
MAPPER_URL_PREFIX=/mapper

🌐 API Endpoints

Endpoint Method Description
/ GET API info
/health GET Health check
/chatbot/chat POST Send a message
/chatbot/session/{user_id}/{session_id} GET Get completed session data
/chatbot/session/{user_id}/{session_id}/fill-report GET Fill statistics report
/chatbot/session/{user_id}/{session_id} DELETE Delete session

See API_SERVER.md for full request/response schemas.


📦 Using as a Python Library

from src.chatbot import chatbotClient, LocalStorage, FormConfig

client = chatbotClient(
    openai_api_key="sk-...",
    storage=LocalStorage("./chatbot_data", "./config_samples"),
    form_config=FormConfig.from_directory("./config_samples"),
    pdf_filler=None,
)

# Send messages
response, complete, data = client.send_message(
    user_id="investor_123",
    session_id="session_abc",
    message="",
)
print(response)   # → "Hi! I am here to help you fill out..."

🧪 Testing

# Run all tests
pytest

# Unit tests only (fast, no network)
pytest tests/unit/ -v

# Integration tests
pytest tests/integration/ -v

# With coverage
pytest --cov=src/chatbot --cov-report=term-missing

# A specific test
pytest tests/unit/test_rate_limiter.py -v

🐳 Docker

docker build -t chatbot-module .
docker run -p 8001:8001 --env-file .env chatbot-module

🔗 Integration with mapper module

rv1 repo/
├── modules/
│   ├── mapper/          ← PDF extraction + mapping + filling engine
│   │   └── api_server.py  runs on :8000
│   └── chatbot/         ← This module
│       └── api_server.py  runs on :8001
│           └── MAPPER_API_URL=http://localhost:8000

📚 Documentation


Quick Command Reference

# Setup
cp .env.example .env && nano .env
pip install -r requirements.txt requirements-api.txt

# Run
python api_server.py

# Test
curl http://localhost:8001/health
pytest tests/unit/

# Interactive CLI
python -m entrypoints.local

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pdf_autofillr_chatbot-0.1.1.tar.gz (529.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pdf_autofillr_chatbot-0.1.1-py3-none-any.whl (114.1 kB view details)

Uploaded Python 3

File details

Details for the file pdf_autofillr_chatbot-0.1.1.tar.gz.

File metadata

  • Download URL: pdf_autofillr_chatbot-0.1.1.tar.gz
  • Upload date:
  • Size: 529.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for pdf_autofillr_chatbot-0.1.1.tar.gz
Algorithm Hash digest
SHA256 2dabc3a577ae26e402ba38e022ee5fa135a34fd25e59212a25a0a18057a3ca4e
MD5 b915b611145962ed20bfeb8336f54fa1
BLAKE2b-256 8110f3378f0e7c54bdb56081434e86b602c24a88c755ab51741ddaf7b66f2090

See more details on using hashes here.

File details

Details for the file pdf_autofillr_chatbot-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for pdf_autofillr_chatbot-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 cf1a0d997b642df208e4ff0b1647024707d016e87201a746fcb94e9090bf390f
MD5 2ef26bcf4a5dfb0fc68cddb228f8d463
BLAKE2b-256 eafd7fe719a8842f7c2245e4c0c1b0c56a49cbef5e54e2b3d4319b206b1e2cc5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page