
A pipeline and package to implement and evaluate LLM chatbot tutors in education.



🚀 Overview

This package offers a framework for researchers to map and quantify interactions between students and LLM-based tutors in educational settings. It supports structured, objective evaluation through classification, simulation, and visualization tools, and is designed for flexible use across tasks of any scale. The framework accommodates both researchers analyzing pre-collected, annotated data and those starting from scratch, providing modular support through each step of the evaluation process.

The package is designed to:

  • Provide a customized framework for classification, evaluation, and fine-tuning
  • Simulate student-tutor interactions using role-based prompts and seed messages when real data is unavailable
  • Interface with locally hosted, open-source models (e.g., via LM Studio or Hugging Face)
  • Log interactions in structured formats (JSON/CSV) for downstream analysis
  • Train and apply classifiers to predict custom interaction classes and visualize patterns across conversations
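To make the logging idea concrete, here is a minimal sketch that writes one simulated conversation to both JSON and CSV using only the standard library. The row schema (`turn`, `student_msg`, `tutor_msg`) and the helper `log_dialogue` are hypothetical and chosen for illustration; the package's actual log format may differ.

```python
import csv
import json
from pathlib import Path

# Hypothetical row schema: one row per turn, one column per agent.
FIELDS = ["turn", "student_msg", "tutor_msg"]


def log_dialogue(rows: list[dict], out_dir: Path, stem: str) -> tuple[Path, Path]:
    """Write the same dialogue rows to <stem>.json and <stem>.csv in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / f"{stem}.json"
    csv_path = out_dir / f"{stem}.csv"

    # JSON keeps the rows as a list of objects, convenient for nested tooling.
    json_path.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding="utf-8")

    # CSV gives a flat table that spreadsheet tools and pandas can read directly.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

    return json_path, csv_path


if __name__ == "__main__":
    import tempfile

    rows = [
        {"turn": 1, "student_msg": "What is a cell?", "tutor_msg": "The basic unit of life."},
        {"turn": 2, "student_msg": "Do all cells have a nucleus?", "tutor_msg": "No, bacteria do not."},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        j, c = log_dialogue(rows, Path(tmp), "dialogue_001")
        print(j.name, c.name)
```

Note that CSV round-trips every value as a string (`"1"` rather than `1`), while JSON preserves the integer turn index.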

Overview of the system architecture:

[Figure: system architecture flowchart]

🤗 Integration

Note that framework generation and dialogue generation are integrated with LM Studio, while the wrapper and the classifiers are integrated with Hugging Face.
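LM Studio serves downloaded models through a local server with OpenAI-compatible endpoints; the usage example below points `api_url` at `http://localhost:1234/v1/completions`. The sketch below only builds such a completion request with `urllib` so the payload shape is visible. The fields `model`, `prompt`, `max_tokens` and `temperature` follow the OpenAI completions format that LM Studio emulates; the helpers `build_completion_request` and `send` are hypothetical and not necessarily how educhateval issues the call internally.

```python
import json
import urllib.request

LMSTUDIO_URL = "http://localhost:1234/v1/completions"  # default local server address


def build_completion_request(model: str, prompt: str, max_tokens: int = 256,
                             temperature: float = 0.7,
                             url: str = LMSTUDIO_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style completion request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request to a running server and return the first completion text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the LM Studio server started and a model loaded, `send(build_completion_request("llama-3.2-3b-instruct", "Explain osmosis briefly."))` would return the generated text; without a running server, `urlopen` raises a connection error.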

The package currently requires Python 3.12 because of version constraints in its core dependencies, particularly the outlines library.
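Because of this constraint, scripts and notebooks can fail early with a clear message when run on a different interpreter. A minimal guard, assuming the requirement means the 3.12 series specifically (the helper `check_python` is hypothetical):

```python
import sys

REQUIRED = (3, 12)  # assumed: exactly the 3.12 minor series


def check_python(version_info=sys.version_info) -> None:
    """Raise RuntimeError unless running under Python 3.12.x."""
    if tuple(version_info[:2]) != REQUIRED:
        found = ".".join(map(str, version_info[:3]))
        raise RuntimeError(f"educhateval currently requires Python 3.12, found {found}")
```

Calling `check_python()` at the top of a script does nothing on 3.12 and raises otherwise.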


โš™๏ธ Installation

pip install educhateval

โš™๏ธ Usage

```python
from pathlib import Path

from educhateval import (
    FrameworkGenerator,
    DialogueSimulator,
    PredictLabels,
    Visualizer,
)
```

1. Generate Label Framework

```python
# initiate the generator
generator = FrameworkGenerator(
    model_name="llama-3.2-3b-instruct",              # a model already downloaded in LM Studio
    api_url="http://localhost:1234/v1/completions",  # address of the local LM Studio server (start it manually first)
)

# apply the generator to synthesize labeled data
df_4 = generator.generate_framework(
    prompt_path="../templates/prompt_default_4types.py",  # path to a prompt template; a dictionary can also be passed directly
    num_samples=200,                                      # number of samples to simulate per category
)

# quality-check and filter the synthetic data with a classifier trained on a few manually labeled examples
filtered_df = generator.filter_with_classifier(
    train_data="../templates/manual_labeled.csv",  # manually labeled training data
    synth_data=df_4,                               # the synthetic data to quality-check
)
```
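The filtering step is described only as a quality check with a classifier trained on a few manually labeled examples. One plausible reading is sketched below: fit a tiny classifier on the true examples, then keep only synthetic rows whose predicted label agrees with the category they were generated for. A bag-of-words nearest-centroid classifier (cosine similarity) stands in for whatever model educhateval actually trains, and the `text`/`category` field names are assumptions mirroring step 3.

```python
import math
from collections import Counter, defaultdict


def bow(text: str) -> Counter:
    """Lowercased whitespace tokens as a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train_centroids(rows: list[dict]) -> dict[str, Counter]:
    """Sum word counts per category; under cosine similarity this equals the class centroid."""
    centroids: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        centroids[row["category"]].update(bow(row["text"]))
    return dict(centroids)


def predict(centroids: dict[str, Counter], text: str) -> str:
    """Label of the most similar centroid."""
    vec = bow(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def filter_synthetic(train: list[dict], synth: list[dict]) -> list[dict]:
    """Keep synthetic rows whose predicted label matches their generated category."""
    centroids = train_centroids(train)
    return [row for row in synth if predict(centroids, row["text"]) == row["category"]]
```

The design choice is agreement filtering: a synthetic sample survives only if an independent model trained on trusted data would assign it the same class, which discards off-target generations at the cost of some recall.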

2. Synthesize Interaction

```python
# initiate the simulator
simulator = DialogueSimulator(
    backend="mlx",                                         # choose either a Hugging Face (HF) or an MLX-driven setup
    model_id="mlx-community/Qwen2.5-7B-Instruct-1M-4bit",  # model to load
)

# define the prompt scheme for each mode (here a single mode)
custom_prompts = {
    "conversation_types": {
        "general_task_solving": {  # the mode
            "student": "You are a student asking for help with your Biology homework.",
            "tutor": "You are a helpful tutor assisting a student. Provide short precise answers.",
        },
    }
}
prompt = custom_prompts["conversation_types"]["general_task_solving"]

# seed message that opens the conversation
seed_message = "I'm trying to understand some basic concepts of human biology, can you help?"

# simulate the student-tutor dialogue
df_sim = simulator.simulate_dialogue(
    mode="general_task_solving",
    turns=10,                        # number of turns
    seed_message_input=seed_message,
    system_prompts=prompt,
)
```
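Conceptually, a simulated dialogue alternates between two role-prompted agents, starting from the seed message. The sketch below shows that turn loop with plain Python callables standing in for the language models. The `Agent` type, the `simulate` helper, treating the seed as the student's first message, and the one-row-per-turn layout are assumptions for illustration, not educhateval's internals.

```python
from typing import Callable

# An agent maps (system prompt, conversation history) to its next message.
# In practice this would wrap an LLM call; here it is any callable.
Agent = Callable[[str, list[tuple[str, str]]], str]


def simulate(student: Agent, tutor: Agent, prompts: dict[str, str],
             seed_message: str, turns: int) -> list[dict]:
    """Alternate student and tutor messages, producing one row per turn."""
    history: list[tuple[str, str]] = []
    rows = []
    student_msg = seed_message  # assumed: the seed opens the conversation as the student
    for turn in range(1, turns + 1):
        if turn > 1:
            student_msg = student(prompts["student"], history)
        history.append(("student", student_msg))
        tutor_msg = tutor(prompts["tutor"], history)
        history.append(("tutor", tutor_msg))
        rows.append({"turn": turn, "student_msg": student_msg, "tutor_msg": tutor_msg})
    return rows


if __name__ == "__main__":
    # Deterministic stand-ins so the loop can be run without a model.
    def echo_student(system: str, history: list[tuple[str, str]]) -> str:
        return f"Follow-up {len(history) // 2 + 1}"

    def echo_tutor(system: str, history: list[tuple[str, str]]) -> str:
        return f"Answer to: {history[-1][1]}"

    prompts = {
        "student": "You are a student asking for help with your Biology homework.",
        "tutor": "You are a helpful tutor assisting a student. Provide short precise answers.",
    }
    for row in simulate(echo_student, echo_tutor, prompts, "Can you help?", turns=3):
        print(row)
```

Passing the full history to each agent is what lets a real model stay on topic across turns; the system prompt fixes each agent's role.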

3. Classify and Predict

```python
# initiate the module to train a classifier and predict labels
predictor = PredictLabels(model_name="distilbert/distilroberta-base")  # model to be fine-tuned and used for predictions

annotaded_df = predictor.run_pipeline(
    train_data=filtered_df,                            # the filtered, labeled framework data from step 1
    new_data=df_sim,                                   # the simulated dialogues from step 2
    text_column="text",                                # text column in train_data
    label_column="category",                           # label column in train_data
    columns_to_classify=["student_msg", "tutor_msg"],  # columns in new_data to label
    split_ratio=0.2,
)
```
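The parameter names suggest two mechanics: part of the labeled data is held out to evaluate the trained model (`split_ratio`), and each column in `columns_to_classify` receives its own prediction column, named like the `predicted_labels_student_msg` and `predicted_labels_tutor_msg` columns used in step 4. Both are sketched below with a generic `classify` callable in place of the fine-tuned transformer; that `split_ratio` is the held-out fraction, and the helper names, are assumptions.

```python
import random
from typing import Callable


def train_eval_split(rows: list[dict], split_ratio: float,
                     seed: int = 42) -> tuple[list[dict], list[dict]]:
    """Shuffle deterministically and hold out split_ratio of the rows for evaluation."""
    if not 0.0 < split_ratio < 1.0:
        raise ValueError("split_ratio must be between 0 and 1")
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, round(len(shuffled) * split_ratio))
    return shuffled[n_eval:], shuffled[:n_eval]


def add_predictions(rows: list[dict], columns: list[str],
                    classify: Callable[[str], str]) -> list[dict]:
    """Return copies of rows with a predicted_labels_<column> entry per classified column."""
    out = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for col in columns:
            new_row[f"predicted_labels_{col}"] = classify(row[col])
        out.append(new_row)
    return out
```

Seeding the shuffle makes the held-out set reproducible across runs, which matters when comparing models trained on the same labeled data.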

4. Visualize

```python
# initiate the module for descriptive visualizations
viz = Visualizer()

# table of predicted categories (n, %)
summary = viz.create_summary_table(
    df=annotaded_df,
    student_col="predicted_labels_student_msg",
    tutor_col="predicted_labels_tutor_msg",
)

# bar chart matching the table
viz.plot_category_bars(
    df=annotaded_df,
    student_col="predicted_labels_student_msg",
    tutor_col="predicted_labels_tutor_msg",
)

# line plot of predicted categories over turns
viz.plot_turn_trends(
    df=annotaded_df,
    student_col="predicted_labels_student_msg",
    tutor_col="predicted_labels_tutor_msg",
)

# bar chart of sequential category dependencies between the agents
viz.plot_history_interaction(
    df=annotaded_df,
    student_col="predicted_labels_student_msg",
    tutor_col="predicted_labels_tutor_msg",  # the only plot that requires both student and tutor columns
    focus_agent="student",                   # the agent whose category dependencies are visualized
)
```
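The summary table and the sequential-dependency plot both reduce to counting. The sketch below computes per-agent counts with percentages, in the spirit of `create_summary_table`, and counts how often each tutor category is followed by each student category on the next turn, which is one plausible reading of the category dependencies shown by `plot_history_interaction` with `focus_agent="student"`. Rows are assumed to be ordered by turn within a single dialogue; the helper names are hypothetical.

```python
from collections import Counter


def summary_table(rows: list[dict], student_col: str, tutor_col: str) -> dict:
    """Per agent: {category: (n, percent)}, percentages rounded to one decimal."""
    table = {}
    for agent, col in (("student", student_col), ("tutor", tutor_col)):
        counts = Counter(row[col] for row in rows)
        total = sum(counts.values())
        table[agent] = {cat: (n, round(100 * n / total, 1)) for cat, n in counts.most_common()}
    return table


def tutor_to_next_student(rows: list[dict], student_col: str, tutor_col: str) -> Counter:
    """Count (tutor label at turn t, student label at turn t+1) pairs."""
    return Counter(
        (prev[tutor_col], nxt[student_col]) for prev, nxt in zip(rows, rows[1:])
    )


if __name__ == "__main__":
    s, t = "predicted_labels_student_msg", "predicted_labels_tutor_msg"
    rows = [
        {s: "question", t: "hint"},
        {s: "attempt", t: "feedback"},
        {s: "question", t: "hint"},
    ]
    print(summary_table(rows, s, t)["student"])  # {'question': (2, 66.7), 'attempt': (1, 33.3)}
    print(tutor_to_next_student(rows, s, t))
```

For several dialogues in one table, the transition count would be computed per dialogue and then summed, so that the last turn of one conversation is not paired with the first turn of the next.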



📖 Documentation

| Documentation | Description |
|---|---|
| 📚 User Guide | Instructions on how to run the entire pipeline provided in the package |
| 💡 Prompt Templates | Overview of system prompts, role behaviors, and instructional strategies |
| 🧠 API References | Full reference for the educhateval API: classes, methods, and usage |
| 🤔 About | Learn more about the thesis project, context, and contributors |



📬 Contact

The package was developed by Laura Wulff Paaby.
Feel free to reach out via:



🫶🏼 Acknowledgement

This project builds on existing tools and ideas from the open-source community. While specific references are provided within the relevant scripts throughout the repository, the key sources of inspiration are also acknowledged here to highlight the contributions that have shaped the development of this package.




Complete overview of the repository structure:

```text
├── data/
│   ├── generated_dialogue_data/           # Generated dialogue samples
│   ├── generated_tuning_data/             # Generated framework data for fine-tuning
│   ├── logged_dialogue_data/              # Logged real dialogue data
│   ├── Final_output/                      # Final classified data
│   └── templates/                         # Prompt and seed templates
│
├── docs/                                  # Markdown files to publish with MkDocs
│
├── src/educhateval/                       # Main source code for all components
│   ├── chat_ui.py                         # CLI interface for wrapping interactions
│   ├── classification_utils.py            # Functions to run the different classification models deployed
│   ├── core.py                            # Main script behind the package, wrapping all functions as callable classes
│   ├── descriptive_results/               # Scripts and tools for result analysis
│   ├── dialogue_classification/           # Tools and models for dialogue classification
│   ├── dialogue_generation/
│   │   ├── agents/                        # Agent definitions and role behaviors
│   │   ├── models/                        # Model classes and loading mechanisms
│   │   ├── txt_llm_inputs/                # Prompt loading functions
│   │   ├── chat_model_interface.py        # Interface layer for model communication
│   │   ├── chat.py                        # Script for orchestrating chat logic
│   │   └── simulate_dialogue.py           # Script to simulate full dialogues between agents
│   └── framework_generation/
│       ├── outline_prompts/               # Prompt templates for outlines
│       ├── outline_synth_LMSRIPT.py       # Synthetic outline generation pipeline
│       └── train_tinylabel_classifier.py  # Trains a small classifier on manually labeled (true) data
│
├── tutorials/                             # Tutorials on how to use the package in different settings
│
├── mkdocs.yml                             # MkDocs configuration file
├── LICENSE                                # MIT License
├── .python-version                        # Python version file (Poetry)
├── poetry.lock                            # Locked dependency versions (Poetry)
├── pyproject.toml                         # Main project config and dependencies
│
├── models/                                # (ignored) Folder for trained models
├── results/                               # (ignored) Folder for training checkpoints
└── site/                                  # (ignored) MkDocs files for documentation
```



Download files


Source Distribution

educhateval-0.1.8.tar.gz (41.4 kB)

Uploaded Source

Built Distribution


educhateval-0.1.8-py3-none-any.whl (48.7 kB)

Uploaded Python 3

File details

Details for the file educhateval-0.1.8.tar.gz.

File metadata

  • Download URL: educhateval-0.1.8.tar.gz
  • Upload date:
  • Size: 41.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for educhateval-0.1.8.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 909f1979f9220aaa0177958282b5cbe8b96acfd5377abc7f1c2d4010edbb4224 |
| MD5 | 45f58a552fbd64cbe09600b800167370 |
| BLAKE2b-256 | c66a7b10941c389cd25a2bd1a7f77e654b0843f7a94cc214f7ca2faaaf28b49a |
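The published digests let you confirm that a downloaded archive is intact before installing it. A minimal sketch with `hashlib`, streaming the file in chunks so large archives are not read into memory at once; the helper names `sha256_of` and `verify` are illustrative, and pip's own `--require-hashes` mode offers the same check at install time.

```python
import hashlib
from pathlib import Path

# SHA256 published above for educhateval-0.1.8.tar.gz
EXPECTED_SHA256 = "909f1979f9220aaa0177958282b5cbe8b96acfd5377abc7f1c2d4010edbb4224"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's digest matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("educhateval-0.1.8.tar.gz"))` returns `True` only for an unmodified copy of the source distribution.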


File details

Details for the file educhateval-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: educhateval-0.1.8-py3-none-any.whl
  • Upload date:
  • Size: 48.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for educhateval-0.1.8-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | d8d3f480e637128dd9aeb12199c27f567e37c1e4e83a3eb5a8ac776c2325dc65 |
| MD5 | 23fc6a4ca5084d525e93a4bcd2ed51e9 |
| BLAKE2b-256 | 07f3c28e74a1d60296f2ce9a175ac82090370a2dc29a2e342009dc9e22769ced |

