A library of algorithms for reproducing Knowledge Tracing, Cognitive Diagnosis, and Exercise Recommendation models.
PyEdmine
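Documentation | Dataset Information | Educational Data Mining Paper List | Model Leaderboard
PyEdmine is a code library for educational data mining, aimed at researchers and designed to make model development and reproduction easy. It currently implements 15 knowledge tracing models, 7 cognitive diagnosis models, and 3 exercise recommendation models. We define a unified, easy-to-use data file format and already support 14 benchmark datasets. In addition, we designed a unified experimental setting under which knowledge tracing and cognitive diagnosis models can be evaluated on the exercise recommendation task.
Figure: PyEdmine experiment workflow diagram
Installation
Install from pip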
pip install edmine
Install from source
git clone git@github.com:ZhijieXiong/pyedmine.git && cd pyedmine
pip install -e .
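Quick Start
If you downloaded the PyEdmine source code from GitHub, you can use the scripts provided in examples for data preprocessing, dataset splitting, model training, and model evaluation:
Configure the data and model directories
Create a settings.json file in the examples directory and set the data directory and model directory in it, using the following format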
{
"FILE_MANAGER_ROOT": "/path/to/save/data",
"MODELS_DIR": "/path/to/save/model"
}
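If you prefer not to edit the file by hand, the same configuration can be written from Python. This is a minimal sketch: `write_settings` is a hypothetical helper that is not part of PyEdmine, and only the two keys shown above are taken from this README.

```python
import json
from pathlib import Path


def write_settings(examples_dir, data_root, models_dir):
    """Write examples/settings.json with the two keys shown above.

    Hypothetical helper for illustration; not part of PyEdmine.
    """
    settings = {
        "FILE_MANAGER_ROOT": str(data_root),
        "MODELS_DIR": str(models_dir),
    }
    path = Path(examples_dir) / "settings.json"
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return path
```

For example, `write_settings("examples", "/path/to/save/data", "/path/to/save/model")` would produce the file shown above.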
Then run the script
python examples/set_up.py
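This automatically creates, for every dataset that has built-in processing code, a directory for the raw files and a directory for the uniformly processed files. The raw-file directories (located under /path/to/save/data/dataset_raw) are laid out as follows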
.
├── SLP
│   ├── family.csv
│   ├── psycho.csv
│   ├── school.csv
│   ├── student.csv
│   ├── term-bio.csv
│   ├── term-chi.csv
│   ├── term-eng.csv
│   ├── term-geo.csv
│   ├── term-his.csv
│   ├── term-mat.csv
│   ├── term-phy.csv
│   ├── unit-bio.csv
│   ├── unit-chi.csv
│   ├── unit-eng.csv
│   ├── unit-geo.csv
│   ├── unit-his.csv
│   ├── unit-mat.csv
│   └── unit-phy.csv
├── assist2009
│   └── skill_builder_data.csv
├── assist2009-full
│   └── assistments_2009_2010.csv
├── assist2012
│   └── 2012-2013-data-with-predictions-4-final.csv
├── assist2015
│   └── 2015_100_skill_builders_main_problems.csv
├── assist2017
│   └── anonymized_full_release_competition_dataset.csv
├── edi2020
│   ├── images
│   ├── metadata
│   │   ├── answer_metadata_task_1_2.csv
│   │   ├── answer_metadata_task_3_4.csv
│   │   ├── question_metadata_task_1_2.csv
│   │   ├── question_metadata_task_3_4.csv
│   │   ├── student_metadata_task_1_2.csv
│   │   ├── student_metadata_task_3_4.csv
│   │   └── subject_metadata.csv
│   ├── test_data
│   │   ├── quality_response_remapped_private.csv
│   │   ├── quality_response_remapped_public.csv
│   │   ├── test_private_answers_task_1.csv
│   │   ├── test_private_answers_task_2.csv
│   │   ├── test_private_task_4.csv
│   │   ├── test_private_task_4_more_splits.csv
│   │   ├── test_public_answers_task_1.csv
│   │   ├── test_public_answers_task_2.csv
│   │   └── test_public_task_4_more_splits.csv
│   └── train_data
│       ├── train_task_1_2.csv
│       └── train_task_3_4.csv
├── junyi2015
│   ├── junyi_Exercise_table.csv
│   ├── junyi_ProblemLog_original.csv
│   ├── relationship_annotation_testing.csv
│   └── relationship_annotation_training.csv
├── moocradar
│   ├── problem.json
│   ├── student-problem-coarse.json
│   ├── student-problem-fine.json
│   └── student-problem-middle.json
├── poj
│   └── poj_log.csv
├── slepemapy-anatomy
│   └── answers.csv
├── statics2011
│   └── AllData_student_step_2011F.csv
└── xes3g5m
    ├── kc_level
    │   ├── test.csv
    │   └── train_valid_sequences.csv
    ├── metadata
    │   ├── kc_routes_map.json
    │   └── questions.json
    └── question_level
        ├── test_quelevel.csv
        └── train_valid_sequences_quelevel.csv
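Before preprocessing, it can help to confirm that the raw files sit where the tree above expects them. The following is a rough sketch, not a PyEdmine utility: `find_missing_raw_files` and `EXPECTED_RAW_FILES` are hypothetical names, and the mapping lists only a few entries copied from the tree.

```python
from pathlib import Path

# A few entries copied from the directory tree above; extend as needed.
EXPECTED_RAW_FILES = {
    "assist2009": ["skill_builder_data.csv"],
    "assist2012": ["2012-2013-data-with-predictions-4-final.csv"],
    "statics2011": ["AllData_student_step_2011F.csv"],
    "xes3g5m": ["kc_level/test.csv", "question_level/test_quelevel.csv"],
}


def find_missing_raw_files(raw_root, expected=EXPECTED_RAW_FILES):
    """Return {dataset: [missing relative paths]} for files absent under raw_root.

    Hypothetical helper for illustration; not part of PyEdmine.
    """
    raw_root = Path(raw_root)
    missing = {}
    for dataset, rel_paths in expected.items():
        absent = [rel for rel in rel_paths if not (raw_root / dataset / rel).is_file()]
        if absent:
            missing[dataset] = absent
    return missing
```

Calling `find_missing_raw_files("/path/to/save/data/dataset_raw")` returns an empty dict when every listed file is in place.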
Data Preprocessing
You can use our dataset preprocessing script
python data_preprocess/kt_data.py
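This script generates the files obtained after converting each dataset to the unified format (located under /path/to/save/data/dataset_preprocessed)
Note: the raw Ednet-kt1 dataset consists of too many files, so you must first run examples/data_preprocess/generate_ednet_raw.py to aggregate the user data in groups of 5000 users. Because the dataset is also very large, preprocessing by default uses only the data of 5000 randomly sampled users
Alternatively, you can directly download the preprocessed dataset files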
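The two figures in the Ednet-kt1 note (aggregation in groups of 5000 users, and a default random sample of 5000 users) can be illustrated with a small sketch. It is not the code of generate_ednet_raw.py; `chunk_users`, `sample_users`, and the `seed` parameter are hypothetical names used only for illustration.

```python
import random


def chunk_users(user_ids, chunk_size=5000):
    """Split user IDs into consecutive groups of at most chunk_size users."""
    user_ids = list(user_ids)
    return [user_ids[i:i + chunk_size] for i in range(0, len(user_ids), chunk_size)]


def sample_users(user_ids, num_users=5000, seed=0):
    """Pick up to num_users distinct users at random, reproducibly for a given seed."""
    user_ids = list(user_ids)
    rng = random.Random(seed)
    return rng.sample(user_ids, min(num_users, len(user_ids)))
```

Fixing the seed keeps the sampled subset identical across runs, which matters when results are compared between models.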
Dataset Splitting
You can use the dataset splitting scripts we provide; the split dataset files are stored under /path/to/save/data/settings/setting_name
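python examples/knowledge_tracing/prepare_dataset/pykt_setting.py # knowledge tracing
python examples/cognitive_diagnosis/prepare_dataset/ncd_setting.py # cognitive diagnosis
python examples/exercise_recommendation/prepare_dataset/offline_setting.py # exercise recommendation
You can also directly download the split dataset files (KT, CD, ER, CD4ER) and place them in the corresponding directories. Alternatively, you can design your own experimental pipeline by following the dataset splitting scripts we provide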
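If you design your own split, one common starting point is a user-level split, so that no student appears in more than one subset. The sketch below is a generic illustration under assumed ratios; it does not reproduce pykt_setting.py, ncd_setting.py, or offline_setting.py, and `split_users` is a hypothetical name.

```python
import random


def split_users(user_ids, valid_ratio=0.1, test_ratio=0.2, seed=0):
    """Split distinct user IDs into disjoint train/valid/test lists.

    The ratios are illustrative assumptions, not PyEdmine defaults.
    """
    ids = sorted(set(user_ids))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_ratio)
    n_valid = int(len(ids) * valid_ratio)
    test = ids[:n_test]
    valid = ids[n_test:n_test + n_valid]
    train = ids[n_test + n_valid:]
    return train, valid, test
```

Splitting by user rather than by interaction avoids leaking a student's later answers into the training data.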
Model Training
For models that do not need any additional information generated in advance, simply run the training code, e.g.
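python examples/knowledge_tracing/train/dkt.py # train the DKT model with default parameters
python examples/cognitive_diagnosis/train/ncd.py # train the NCD model with default parameters
For models that need additional information generated in advance (for example, DIMKT needs precomputed difficulty information and HyperCD needs a pre-built knowledge-concept hypergraph), first run the model's corresponding generation script, e.g.
python examples/knowledge_tracing/dimkt/get_difficulty.py # generate the difficulty information required by DIMKT
python examples/cognitive_diagnosis/hyper_cd/construct_hyper_graph.py # generate the graph information required by HyperCD
During training you will see output similar to the following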
2025-03-06 02:12:35 epoch 1 , valid performances are main metric: 0.7186 , AUC: 0.7186 , ACC: 0.64765 , MAE: 0.41924 , RMSE: 0.46919 , train loss is predict loss: 0.588902 , current best epoch is 1
2025-03-06 02:12:37 epoch 2 , valid performances are main metric: 0.72457 , AUC: 0.72457 , ACC: 0.63797 , MAE: 0.42329 , RMSE: 0.47456 , train loss is predict loss: 0.556672 , current best epoch is 2
2025-03-06 02:12:39 epoch 3 , valid performances are main metric: 0.72014 , AUC: 0.72014 , ACC: 0.63143 , MAE: 0.43218 , RMSE: 0.47536 , train loss is predict loss: 0.551513 , current best epoch is 2
2025-03-06 02:12:40 epoch 4 , valid performances are main metric: 0.71843 , AUC: 0.71843 , ACC: 0.65182 , MAE: 0.41843 , RMSE: 0.46837 , train loss is predict loss: 0.548907 , current best epoch is 2
2025-03-06 02:12:42 epoch 5 , valid performances are main metric: 0.72453 , AUC: 0.72453 , ACC: 0.65276 , MAE: 0.41841 , RMSE: 0.46684 , train loss is predict loss: 0.547639 , current best epoch is 2
...
2025-03-06 02:13:44 epoch 31 , valid performances are main metric: 0.72589 , AUC: 0.72589 , ACC: 0.65867 , MAE: 0.40794 , RMSE: 0.46316 , train loss is predict loss: 0.532516 , current best epoch is 16
2025-03-06 02:13:47 epoch 32 , valid performances are main metric: 0.72573 , AUC: 0.72573 , ACC: 0.65426 , MAE: 0.41602 , RMSE: 0.46415 , train loss is predict loss: 0.532863 , current best epoch is 16
2025-03-06 02:13:49 epoch 33 , valid performances are main metric: 0.72509 , AUC: 0.72509 , ACC: 0.6179 , MAE: 0.43133 , RMSE: 0.48417 , train loss is predict loss: 0.532187 , current best epoch is 16
2025-03-06 02:13:52 epoch 34 , valid performances are main metric: 0.72809 , AUC: 0.72809 , ACC: 0.63938 , MAE: 0.41994 , RMSE: 0.47377 , train loss is predict loss: 0.533765 , current best epoch is 16
2025-03-06 02:13:54 epoch 35 , valid performances are main metric: 0.72523 , AUC: 0.72523 , ACC: 0.63852 , MAE: 0.42142 , RMSE: 0.47327 , train loss is predict loss: 0.531101 , current best epoch is 16
2025-03-06 02:13:57 epoch 36 , valid performances are main metric: 0.72838 , AUC: 0.72838 , ACC: 0.61986 , MAE: 0.43105 , RMSE: 0.48364 , train loss is predict loss: 0.532342 , current best epoch is 16
best valid epoch: 16 , train performances in best epoch by valid are main metric: 0.74893 , AUC: 0.74893 , ACC: 0.72948 , MAE: 0.34608 , RMSE: 0.42706 , main_metric: 0.74893 ,
valid performances in best epoch by valid are main metric: 0.72902 , AUC: 0.72902 , ACC: 0.59389 , MAE: 0.43936 , RMSE: 0.49301 , main_metric: 0.72902 ,
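To plot or compare runs, the per-epoch lines of this log can be turned into records. The sketch below is an assumption-laden example: the regular expressions are written against the sample output shown here only, and `parse_epoch_line` is a hypothetical helper, not a PyEdmine function.

```python
import re

# Patterns written against the sample log lines above.
EPOCH_RE = re.compile(r"epoch\s+(\d+)\s*,")
METRIC_RE = re.compile(r"\b(AUC|ACC|MAE|RMSE):\s*([0-9.]+)")
BEST_RE = re.compile(r"current best epoch is\s+(\d+)")


def parse_epoch_line(line):
    """Parse one per-epoch log line into a dict, or return None for other lines."""
    epoch = EPOCH_RE.search(line)
    best = BEST_RE.search(line)
    if epoch is None or best is None:
        return None
    record = {"epoch": int(epoch.group(1)), "best_epoch": int(best.group(1))}
    for name, value in METRIC_RE.findall(line):
        record[name] = float(value)
    return record
```

Summary lines such as the final "best valid epoch" line do not match the per-epoch pattern and are skipped by returning `None`.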
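If the use_wandb parameter is True during training, you can view the model's loss and metric curves on wandb
Model Evaluation
If the save_model parameter is enabled during training, the model parameter file is saved under /path/to/save/model, and you can then evaluate the model on the test set, e.g.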
python examples/knowledge_tracing/evaluate/sequential_dlkt.py --model_dir_name [model_dir_name] --dataset_name [dataset_name] --test_file_name [test_file_name]
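Besides the standard metrics, knowledge tracing and cognitive diagnosis models also support several fine-grained evaluations, such as cold-start evaluation and multi-step prediction for knowledge tracing. Each of these is enabled by setting the corresponding parameter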
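As an illustration of what a fine-grained evaluation can look like, the sketch below scores only the first few interactions of each student, which is one common reading of "cold start". This definition, the function name `cold_start_metrics`, and its parameters are assumptions for illustration; the evaluation scripts may define cold start differently.

```python
import math


def cold_start_metrics(sequences, num_first=5, threshold=0.5):
    """Compute ACC, MAE and RMSE over only the first num_first steps of each sequence.

    sequences: list of (labels, predictions) pairs, one pair per student;
    labels are 0/1 and predictions are probabilities in [0, 1].
    """
    labels, preds = [], []
    for seq_labels, seq_preds in sequences:
        labels.extend(seq_labels[:num_first])
        preds.extend(seq_preds[:num_first])
    if not labels:
        raise ValueError("no interactions to evaluate")
    n = len(labels)
    # A prediction counts as correct when thresholding it reproduces the label.
    acc = sum(int(p >= threshold) == y for y, p in zip(labels, preds)) / n
    mae = sum(abs(p - y) for y, p in zip(labels, preds)) / n
    rmse = math.sqrt(sum((p - y) ** 2 for y, p in zip(labels, preds)) / n)
    return {"ACC": acc, "MAE": mae, "RMSE": rmse}
```

Comparing these values with the metrics over full sequences shows how much a model depends on long interaction histories.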
Automatic Hyperparameter Tuning
PyEdmine also supports automatic hyperparameter tuning based on Bayesian optimization, e.g.
python examples/cognitive_diagnosis/train/ncd_search_params.py
The script defines its search space through the parameters_space variable in the code
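To make the idea of a search space concrete, here is a small sketch. Its assumptions are loud: the dict-of-candidate-lists shape of `parameters_space` is a guess at a plausible structure, not the format ncd_search_params.py actually uses, and the search itself is plain random search, used as a simple stand-in rather than the tuning method the library implements.

```python
import random


def sample_params(parameters_space, rng):
    """Draw one configuration by picking a candidate value for every parameter."""
    return {name: rng.choice(choices) for name, choices in parameters_space.items()}


def random_search(objective, parameters_space, num_trials=20, seed=0):
    """Return the best (params, score) found; higher scores are better.

    Random search stand-in for illustration; not PyEdmine's tuning method.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        params = sample_params(parameters_space, rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice `objective` would train a model with the sampled parameters and return its validation metric; the parameter names passed in are up to the caller.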
PyEdmine Major Releases
| Releases | Date |
|---|---|
| v0.1.1 | 3/31/2025 |
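Contributing
If you run into a bug or have any suggestions, please report them through an Issue.
We welcome any contribution that fixes bugs or adds new features.
If you would like to contribute code, please open an issue first and then submit a PR.
Disclaimer
PyEdmine is developed under the MIT License. All data and code in this project may be used for academic purposes only.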
File details
Details for the file edmine-0.0.9.tar.gz.
File metadata
- Download URL: edmine-0.0.9.tar.gz
- Upload date:
- Size: 96.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.9.20 Darwin/23.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e09a396cd21cce1d309b0b5d4ecfa04e9f34aaa383f9a60fd1be88bc0a256222 |
| MD5 | 8448cf0710f8ad20a5b73d8f57f9306b |
| BLAKE2b-256 | c13f7ee45cb41b8719930067e0dd271006c9f14812d986401b60c3f2d6f643aa |
File details
Details for the file edmine-0.0.9-py3-none-any.whl.
File metadata
- Download URL: edmine-0.0.9-py3-none-any.whl
- Upload date:
- Size: 132.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.9.20 Darwin/23.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b4c3017c0e7dc2f4c95e02853849031c2efcf601e016db5a69f3c869e2dfc436 |
| MD5 | 3d91ad910c44878a2ef514e00058f525 |
| BLAKE2b-256 | b36d01337514eea716e0219f691d62ce6d9b1c56478ca389017166deca21964c |
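A downloaded distribution file can be checked against the SHA256 digests listed above. The sketch below uses only the Python standard library; `sha256_of_file` and `matches_digest` are hypothetical helper names, and the file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_digest("edmine-0.0.9.tar.gz", "e09a396cd21cce1d309b0b5d4ecfa04e9f34aaa383f9a60fd1be88bc0a256222")` should be true for an intact copy of the source distribution.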