A pipeline and package to implement and evaluate LLM chat bot tutors in education.

🚀 Overview

This package offers a framework for researchers to map and quantify interactions between students and LLM-based tutors in educational settings. It supports structured, objective evaluation through classification, simulation, and visualization tools, and is designed for flexible use across tasks of any scale. The framework accommodates both researchers analyzing pre-collected, annotated data and those starting from scratch, providing modular support through each step of the evaluation process.

The package is designed to:

  • Provide a customized framework for classification, evaluation, and fine-tuning
  • Simulate student–tutor interactions using role-based prompts and seed messages when real data is unavailable
  • Initiate an interface with locally hosted, open-source models (e.g., via LM Studio or Hugging Face)
  • Log interactions in structured formats (JSON/CSV) for downstream analysis
  • Train and apply classifiers to predict customized interaction classes and visualize patterns across conversations
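
To make the logging point above concrete, here is a minimal sketch of writing dialogue turns to both JSON and CSV with the standard library. It is not the package's own logger: the function name `log_dialogue`, the `stem` parameter, and the `turn` key are assumptions made for illustration, and the `student_msg` / `tutor_msg` keys are borrowed from the usage example further down.

```python
import csv
import json
from pathlib import Path


def log_dialogue(rows, out_dir, stem="dialogue"):
    """Write a list of turn dictionaries to <stem>.json and <stem>.csv (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    json_path = out_dir / f"{stem}.json"
    csv_path = out_dir / f"{stem}.csv"

    # JSON preserves value types and is easy to reload for later analysis.
    json_path.write_text(
        json.dumps(rows, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # CSV is flat: one row per turn, one column per key of the first row.
    fieldnames = list(rows[0].keys()) if rows else []
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    return json_path, csv_path
```

Writing both formats from the same list of dictionaries keeps the two logs in sync; note that values read back from the CSV file are strings.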

Overview of the system architecture:

[Figure: flowchart of the system architecture]


โš™๏ธ Installation

pip install educhateval
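
After installing, one quick way to confirm that the package is visible to the current Python environment is to query its installed version. This is a small sketch using only the standard library; the helper name `installed_version` is an assumption, not part of educhateval.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if the install did not succeed.
print(installed_version("educhateval"))
```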

🤗 Integration

Note that the framework and dialogue generation are integrated with LM Studio, while the wrapper and classifiers are integrated with Hugging Face.
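
For readers unfamiliar with LM Studio, its local server speaks an OpenAI-compatible HTTP API. The sketch below shows what a raw completion request to such a server could look like, using the same URL as the usage example in this README. It is an illustration under assumptions, not the package's internal client: the helper names and the default `max_tokens` and `temperature` values are made up here, and the server must already be running with a model loaded for `complete` to succeed.

```python
import json
import urllib.request

# Same endpoint as in the usage example; adjust if your server uses another port.
API_URL = "http://localhost:1234/v1/completions"


def build_completion_request(prompt, model_name, max_tokens=128, temperature=0.7):
    """Build (but do not send) a POST request in the OpenAI completions format."""
    payload = {
        "model": model_name,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, model_name):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, model_name)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Explain what a noun is.", "llama-3.2-3b-instruct")` would return the model's text; separating request construction from sending makes the payload easy to inspect offline.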

📖 Documentation

| Documentation | Description |
|---|---|
| 📚 User Guide | Instructions on how to run the entire pipeline provided in the package |
| 💡 Prompt Templates | Overview of system prompts, role behaviors, and instructional strategies |
| 🧠 API References | Full reference for the educhateval API: classes, methods, and usage |
| 🤔 About | Learn more about the thesis project, context, and contributors |

โš™๏ธ Usage

from pathlib import Path

# Parentheses are required for an import that spans several lines
from educhateval import (
    FrameworkGenerator,
    DialogueSimulator,
    PredictLabels,
    Visualizer,
)

1. Generate Label Framework

generator = FrameworkGenerator(
    model_name="llama-3.2-3b-instruct",
    api_url="http://localhost:1234/v1/completions"
)

df_4 = generator.generate_framework(
    prompt_path="outline_prompts/prompt_default_4types.py",
    num_samples=200
)

filtered_df = generator.filter_with_classifier(
    train_data="data/tiny_labeled_default.csv",
    synth_data=df_4
)
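
The filtering step above takes a small labeled dataset and the synthetic data. One plausible reading, stated here as an assumption rather than a description of `filter_with_classifier` itself, is agreement filtering: a classifier trained on the labeled data re-labels each synthetic sample, and only samples whose predicted label matches the intended one are kept. The sketch below shows that idea with a trivial stand-in classifier; the labels "Question" and "Statement" and both function names are hypothetical.

```python
def filter_by_agreement(samples, predict):
    """Keep samples whose intended label matches the classifier's prediction."""
    return [s for s in samples if predict(s["text"]) == s["category"]]


def keyword_classifier(text):
    # Stand-in for a trained classifier: a trailing question mark means "Question".
    return "Question" if text.strip().endswith("?") else "Statement"


synthetic = [
    {"text": "What is a noun?", "category": "Question"},
    {"text": "I think I understand now.", "category": "Question"},  # mislabeled on purpose
    {"text": "A noun names a thing.", "category": "Statement"},
]

# The mislabeled second sample is dropped; the other two are kept.
kept = filter_by_agreement(synthetic, keyword_classifier)
```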

2. Synthesize Interaction

simulator = DialogueSimulator(
    backend="mlx",
    model_id="mlx-community/Qwen2.5-7B-Instruct-1M-4bit"
)

seed_message = "Hi, can you please help me with my English course?"

# Simulate a single student-tutor dialogue with a custom YAML file
df_single = simulator.simulate_dialogue(
    mode="general_task_solving",
    turns=10,
    seed_message_input=seed_message,
    custom_prompt_file=Path("prompts/my_custom_prompts.yaml")
)
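
Conceptually, a simulated dialogue is an alternating loop between a student agent and a tutor agent that starts from the seed message. The sketch below shows that loop with stub reply functions in place of real models. It assumes that one turn is one student message plus one tutor reply, and it reuses the `student_msg` / `tutor_msg` column names from the classification step; the function `simulate_turns` and the stubs are hypothetical and do not reflect how `DialogueSimulator` is implemented.

```python
def simulate_turns(seed_message, student_reply, tutor_reply, turns):
    """Alternate tutor and student replies, starting from the student's seed message."""
    rows = []
    student_msg = seed_message
    for turn in range(1, turns + 1):
        # The tutor answers the current student message.
        tutor_msg = tutor_reply(student_msg)
        rows.append({"turn": turn, "student_msg": student_msg, "tutor_msg": tutor_msg})
        # The student's next message is a reaction to the tutor's answer.
        student_msg = student_reply(tutor_msg)
    return rows


def stub_tutor(message):
    # Stand-in for a call to a language model playing the tutor role.
    return f"Tutor answer to: {message}"


def stub_student(message):
    # Stand-in for a call to a language model playing the student role.
    return f"Student follow-up on: {message}"


rows = simulate_turns(
    "Hi, can you please help me with my English course?",
    stub_student,
    stub_tutor,
    turns=3,
)
```

Each row holds one student–tutor exchange, which is the shape the later classification step expects when it is given two message columns.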

3. Classify and Predict

predictor = PredictLabels(model_name="distilbert/distilroberta-base")

annotaded_df = predictor.run_pipeline(
    train_data=filtered_df,
    new_data=df_single,
    text_column="text",
    label_column="category",
    columns_to_classify=["student_msg", "tutor_msg"],
    split_ratio=0.2
)
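
The `split_ratio=0.2` argument presumably controls how much of the training data is held out for evaluation. Under that assumption, a hold-out split can be sketched as follows; the function name, the fixed seed, and the rounding rule are illustrative choices, not the behavior of `run_pipeline`.

```python
import random


def train_test_split_rows(rows, split_ratio=0.2, seed=42):
    """Shuffle rows reproducibly and hold out a split_ratio share for testing."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * split_ratio)
    return shuffled[n_test:], shuffled[:n_test]


examples = [{"text": f"sample {i}", "category": "A"} for i in range(10)]
train_rows, test_rows = train_test_split_rows(examples, split_ratio=0.2)
```

With ten examples and a ratio of 0.2, eight rows go to training and two to testing, and every example lands in exactly one of the two sets.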

4. Visualize

viz = Visualizer()

summary = viz.create_summary_table(
    df=annotaded_df,
    label_columns=["predicted_labels_student_msg", "predicted_labels_tutor_msg"]
)

viz.plot_category_bars(
    df=annotaded_df,
    label_columns=["predicted_labels_student_msg", "predicted_labels_tutor_msg"],
    use_percent=True,
    title="Distribution of Predicted Classes"
)

viz.plot_turn_trends(
    df=annotaded_df,
    student_col="predicted_labels_student_msg",
    tutor_col="predicted_labels_tutor_msg",
    title="Category Distribution over Turns"
)

viz.plot_history_interaction(
    df=annotaded_df,
    student_col="predicted_labels_student_msg",
    tutor_col="predicted_labels_tutor_msg",
    focus_agent="student",
    use_percent=True
)
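
The summary table and the bar plot both boil down to counting how often each predicted label occurs in a column, optionally expressed as a percentage (compare `use_percent=True` above). A minimal sketch of that computation with the standard library is shown here; the function `label_distribution` and the label names are hypothetical, and the `Visualizer` methods may compute their numbers differently.

```python
from collections import Counter


def label_distribution(rows, label_column, use_percent=True):
    """Count labels in one column, optionally converted to percentages."""
    counts = Counter(row[label_column] for row in rows)
    if not use_percent:
        return dict(counts)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


predicted = [
    {"predicted_labels_student_msg": "Question"},
    {"predicted_labels_student_msg": "Question"},
    {"predicted_labels_student_msg": "Feedback"},
    {"predicted_labels_student_msg": "Question"},
]

shares = label_distribution(predicted, "predicted_labels_student_msg")
```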

๐Ÿซถ๐Ÿผ Acknowdledgement

This project builds on existing tools and ideas from the open-source community. While specific references are provided within the relevant scripts throughout the repository, the key sources of inspiration are also acknowledged here to highlight the contributions that have shaped the development of this package.

📬 Contact

Made by Laura Wulff Paaby
Feel free to reach out via:


Complete overview:

├── data/
│   ├── generated_dialogue_data/           # Generated dialogue samples
│   ├── generated_tuning_data/             # Generated framework data for fine-tuning
│   ├── logged_dialogue_data/              # Logged real dialogue data
│   ├── Final_output/                      # Final classified data
│
├── Models/                                # Folder for trained models and checkpoints (ignored)
│
├── src/educhateval/                       # Main source code for all components
│   ├── chat_ui.py                         # CLI interface for wrapping interactions
│   ├── descriptive_results/               # Scripts and tools for result analysis
│   ├── dialogue_classification/           # Tools and models for dialogue classification
│   ├── dialogue_generation/
│   │   ├── agents/                        # Agent definitions and role behaviors
│   │   ├── models/                        # Model classes and loading mechanisms
│   │   ├── txt_llm_inputs/                # System prompts and structured inputs for LLMs
│   │   ├── chat_instructions.py           # System prompt templates and role definitions
│   │   ├── chat_model_interface.py        # Interface layer for model communication
│   │   ├── chat.py                        # Main script for orchestrating chat logic
│   │   └── simulate_dialogue.py           # Script to simulate full dialogues between agents
│   ├── framework_generation/
│   │   ├── outline_prompts/               # Prompt templates for outlines
│   │   ├── outline_synth_LMSRIPT.py       # Synthetic outline generation pipeline
│   │   └── train_tinylabel_classifier.py  # Training classifier on manually made true data
│
├── .python-version                        # Python version file (for Poetry)
├── poetry.lock                            # Locked dependency versions (Poetry)
├── pyproject.toml                         # Main project config and dependencies
