Library to simulate knowledge tracing datasets
ktdg (Knowledge tracing data generator)
Library used to create synthetic knowledge tracing data.
Example configs can be found in `config`.
Usage | Setup | Documentation
Usage
To create a new config or complete an existing one:
$ ktdg create --help
Usage: ktdg create [OPTIONS] CONFIG
(c) Creates a config or completes it, saving it to the given file.
Arguments:
CONFIG Path of the config to complete or create [required]
Options:
-h, --help Show this message and exit.
To generate the synthetic data from the config:
$ ktdg generate --help
Usage: ktdg generate [OPTIONS] CONFIG
(g) Generates the data for the given config, saving it as a json file named
"data.json".
Arguments:
CONFIG Configuration file to use [required]
Options:
-h, --help Show this message and exit.
Setup
- Install poetry
- `poetry config virtualenvs.in-project true`
- `poetry install`
- `source .venv/bin/activate`
Documentation
Generation
Skills
Skills are generated with the following parameters:
$n^K$ / n
: number of skills to generate
difficulty (float)
: factor by which to scale the difficulty of questions that need this skill, sampled from a distribution
seed (int)
: random seed to use when generating the skills
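As a rough illustration of the parameters above, skill generation could look like the following minimal sketch. It assumes the difficulty scale is drawn from a normal distribution and clipped at zero; the function name `generate_skills`, the `difficulty_mu` / `difficulty_sigma` arguments, and the returned dict layout are hypothetical and are not taken from ktdg itself.

```python
import random


def generate_skills(n: int, difficulty_mu: float, difficulty_sigma: float,
                    seed: int) -> list[dict]:
    """Generate n skills, each with a sampled difficulty scaling factor.

    Hypothetical helper: the normal distribution and the dict layout are
    assumptions made for illustration only.
    """
    rng = random.Random(seed)  # a fixed seed makes the skills reproducible
    skills = []
    for k in range(n):
        # Clip at zero so the scaling factor is never negative (assumption).
        scale = max(0.0, rng.gauss(difficulty_mu, difficulty_sigma))
        skills.append({"id": k, "difficulty": scale})
    return skills


skills = generate_skills(n=4, difficulty_mu=1.0, difficulty_sigma=0.2, seed=7)
```

Calling the function twice with the same `seed` yields the same skills, which is the purpose of the seed parameter.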
Students
Students are generated with the following parameters:
n
: number of students to generate
$n_i \sim N^S, n_i \in \{0,...,n^K\}$ / n_skills (int)
: number of skills per student sampled from a distribution
$s_{ik} \sim M^S, s_{ik} \in [0,1]$ / skill_mastery (float)
: mastery for a given student and skill sampled from a distribution
$s_i^S \sim S^S, s_i^S \in [0,1]$ / slip (float)
: slip rate for a given student sampled from a distribution
$g_i^S \sim G^S, g_i^S \in [0,1]$ / guess (float)
: guess rate for a given student sampled from a distribution
$l_i^S \sim L^S, l_i^S \in [0,1]$ / learning_rate (float)
: rate of learning for a given student sampled from a distribution
$f_i^S \sim F^S, f_i^S \in [0,1]$ / forget_rate (float)
: rate of forgetting for a given student sampled from a distribution
binary_learning (bool)
: whether a skill is treated as either known ($=1$) or unknown ($=0$) instead of taking a continuous value between 0 and 1
seed (int)
: random seed to use when generating the students
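The student parameters can be pictured with a small sketch like the one below. It is not ktdg's implementation: the `Student` dataclass, the `generate_student` helper, the choice of normal distributions, and their means and standard deviations are all assumptions. The number of skills is passed in already sampled, and every rate is clipped to $[0,1]$.

```python
import random
from dataclasses import dataclass


def clamp01(x: float) -> float:
    """Clip a sample to the [0, 1] range required by the parameters above."""
    return min(1.0, max(0.0, x))


@dataclass
class Student:  # hypothetical container, not a ktdg class
    skill_mastery: list[float]  # one entry per skill
    slip: float
    guess: float
    learning_rate: float
    forget_rate: float


def generate_student(n_total_skills: int, n_skills: int, rng: random.Random,
                     binary_learning: bool = False) -> Student:
    """Sample one student who starts with mastery in n_skills skills."""
    known = set(rng.sample(range(n_total_skills), n_skills))
    mastery = []
    for k in range(n_total_skills):
        if k not in known:
            mastery.append(0.0)
        elif binary_learning:
            mastery.append(1.0)  # skill is simply "known"
        else:
            mastery.append(clamp01(rng.gauss(0.5, 0.2)))  # assumed distribution
    return Student(
        skill_mastery=mastery,
        slip=clamp01(rng.gauss(0.1, 0.05)),
        guess=clamp01(rng.gauss(0.2, 0.05)),
        learning_rate=clamp01(rng.gauss(0.1, 0.02)),
        forget_rate=clamp01(rng.gauss(0.95, 0.02)),
    )


student = generate_student(n_total_skills=5, n_skills=2,
                           rng=random.Random(0), binary_learning=True)
```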
Questions
Questions are generated with the following parameters:
n
: number of questions to generate
$n_j \sim N^Q, n_j \in \{0,...,n^K\}$ / n_skills (int)
: number of skills per question sampled from a distribution
$q_{jk} \sim M^Q, q_{jk} \in [0,1]$ / skill_mastery (float)
: mastery for a given question and skill sampled from a distribution
$d_j^Q \sim D^Q, d_j^Q \in [0,1]$ / difficulty (float)
: difficulty for a given question sampled from a distribution
$s_j^Q \sim S^Q, s_j^Q \in [0,1]$ / slip (float)
: slip rate for a given question sampled from a distribution
$g_j^Q \sim G^Q, g_j^Q \in [0,1]$ / guess (float)
: guess rate for a given question sampled from a distribution
seed (int)
: random seed to use when generating the questions
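A question can be sketched in the same spirit. The example below assumes the number of skills per question follows a binomial distribution, which keeps it within $\{0,...,n^K\}$ by construction, and that the remaining values are normal samples clipped to $[0,1]$. The helper `generate_question`, the `p_skill` argument, and all distribution parameters are hypothetical.

```python
import random


def clamp01(x: float) -> float:
    """Clip a sample to the [0, 1] range."""
    return min(1.0, max(0.0, x))


def generate_question(n_total_skills: int, rng: random.Random,
                      p_skill: float = 0.3) -> dict:
    """Sample one question with a per-skill mastery vector."""
    # Binomial draw as a sum of Bernoulli trials: always in {0, ..., n_total_skills}.
    n_j = sum(rng.random() < p_skill for _ in range(n_total_skills))
    needed = set(rng.sample(range(n_total_skills), n_j))
    # Skills the question does not need get a zero entry.
    skill_mastery = [
        clamp01(rng.gauss(0.6, 0.2)) if k in needed else 0.0
        for k in range(n_total_skills)
    ]
    return {
        "skill_mastery": skill_mastery,
        "difficulty": clamp01(rng.gauss(0.5, 0.15)),
        "slip": clamp01(rng.gauss(0.1, 0.05)),
        "guess": clamp01(rng.gauss(0.2, 0.05)),
    }


question = generate_question(n_total_skills=5, rng=random.Random(3))
```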
Answers
Answers are generated using the following formulas:
$$\boldsymbol{q}_j = \left(q_{jk}\right)_{k=1,...,n^K}$$
$$s_{ij} = 1 - \sqrt{(1 - s_i) \cdot (1 - s_j)}$$
$$g_{ij} = 1 - \sqrt{(1 - g_i) \cdot (1 - g_j)}$$
$$\boldsymbol{s}_i^0 = \left(s_{ik}\right)_{k=1,...,n^K}$$
$$\boldsymbol{s}_i^t = \underbrace{f_i \cdot \boldsymbol{s}_i^{t-1}}_{\text{skill forgetting}} + l_i \cdot \underbrace{(1 - g_a) \cdot (1 - g_{ij})}_{\text{adjustment for guessing}} \cdot \underbrace{(0.5 + d_j)}_{\text{adjustment for difficulty}} \cdot \underbrace{(1 - w_a \cdot (1 - a_i^t))}_{\text{adjustment for correctness}} \cdot \boldsymbol{q}_j$$
$$a_i^t = g_{ij} + (1 - s_{ij}) \cdot \frac{m_{ij}}{1 + m_{ij}}$$
$$m_{ij} = \exp\left(m_a \cdot (\boldsymbol{q}_j^T\boldsymbol{s}_i^t - d_j)\right)$$
for question $j$ asked at time $t$ and with the following parameters:
$n_i^A \sim N^A, n_i^A \in \mathbb{N}$ / n_per_student (int)
: number of questions asked per student sampled from a distribution
$w_a \in \mathbb{R}^+$ / wrong_answer_adjustment (float)
: factor by which learning is scaled down after a wrong answer
$g_a \in \mathbb{R}^+$ / guess_adjustment (float)
: factor by which learning is scaled in proportion to the guess parameter
$m_a \in \mathbb{R}^+$ / mastery_importance (float)
: factor by which the mastery term inside the exponential is scaled
max_repetitions (int)
: maximum number of repetitions of a given question allowed per student
can_repeat_correct (bool)
: whether a question that was answered correctly can be repeated
seed (int)
: random seed to use when generating the answers
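The formulas in this section translate fairly directly into code. The sketch below is an illustration under stated assumptions, not ktdg's implementation: the function names are hypothetical, the answer probability is computed from the skill state before the update, and the correctness term `a` passed to the update is assumed to be the binary outcome (1 for correct, 0 for wrong).

```python
import math


def combine(student_rate: float, question_rate: float) -> float:
    """Combined slip or guess rate: 1 - sqrt((1 - r_i) * (1 - r_j))."""
    return 1.0 - math.sqrt((1.0 - student_rate) * (1.0 - question_rate))


def answer_probability(q: list[float], s: list[float], d_j: float,
                       s_ij: float, g_ij: float, m_a: float) -> float:
    """a = g_ij + (1 - s_ij) * m_ij / (1 + m_ij), with m_ij = exp(m_a * (q.s - d_j))."""
    dot = sum(qk * sk for qk, sk in zip(q, s))
    m_ij = math.exp(m_a * (dot - d_j))
    # Implemented exactly as written above; the result is not clamped to 1.
    return g_ij + (1.0 - s_ij) * m_ij / (1.0 + m_ij)


def update_skills(s_prev: list[float], q: list[float], f_i: float, l_i: float,
                  g_a: float, g_ij: float, d_j: float, w_a: float,
                  a: float) -> list[float]:
    """One skill update: forgetting term plus the learning gain along q."""
    gain = (l_i * (1.0 - g_a) * (1.0 - g_ij) * (0.5 + d_j)
            * (1.0 - w_a * (1.0 - a)))
    return [f_i * sk + gain * qk for sk, qk in zip(s_prev, q)]


# Example: one question needing only the first of two skills.
q_j, s_i = [1.0, 0.0], [0.5, 0.2]
p = answer_probability(q_j, s_i, d_j=0.5, s_ij=0.1, g_ij=0.2, m_a=2.0)
s_next = update_skills(s_i, q_j, f_i=1.0, l_i=0.1, g_a=0.0, g_ij=0.0,
                       d_j=0.5, w_a=0.5, a=1.0)
```

In this example the student's mastery along the question's skills equals the difficulty, so $m_{ij} = 1$ and the probability reduces to $g_{ij} + (1 - s_{ij}) / 2$.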
Distributions
constant: All samples have the same value `value`.
normal: Samples are taken from a normal distribution with mean `mu` and standard deviation `sigma`.
binomial: Samples are taken from a binomial distribution with number of trials `n` and probability of success `p`.
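To make the three distributions concrete, here is a minimal sampler using only the Python standard library. The parameter names `value`, `mu`, `sigma`, `n` and `p` come from the descriptions above; the `type` key and the dict-based layout are assumptions for illustration and may not match the actual config schema.

```python
import random


def sample(dist: dict, rng: random.Random) -> float:
    """Draw one sample from a distribution described by a dict.

    The "type" key is a hypothetical way to select the distribution.
    """
    kind = dist["type"]
    if kind == "constant":
        return dist["value"]
    if kind == "normal":
        return rng.gauss(dist["mu"], dist["sigma"])
    if kind == "binomial":
        # Number of successes in n independent trials with success probability p.
        return sum(rng.random() < dist["p"] for _ in range(dist["n"]))
    raise ValueError(f"unknown distribution type: {kind!r}")


rng = random.Random(42)
slip = sample({"type": "constant", "value": 0.1}, rng)
mastery = sample({"type": "normal", "mu": 0.5, "sigma": 0.1}, rng)
n_skills = sample({"type": "binomial", "n": 5, "p": 0.4}, rng)
```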
File details
Details for the file ktdg-0.1.18.tar.gz.
File metadata
- Download URL: ktdg-0.1.18.tar.gz
- Upload date:
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.11.2 Linux/5.4.109+
File hashes
Algorithm | Hash digest
---|---
SHA256 | 748cacc68b7f4ad6d827ffcad9b3be74b45f65674e169a4b1ec1302fb85ce67a
MD5 | bd35a7de055ff78573bd7378a0373cd1
BLAKE2b-256 | b960bd23877b0a5e0a27b052296e4b4c61cb287eb8b15fd39afc5bdccb8607ee
File details
Details for the file ktdg-0.1.18-py3-none-any.whl.
File metadata
- Download URL: ktdg-0.1.18-py3-none-any.whl
- Upload date:
- Size: 14.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.11.2 Linux/5.4.109+
File hashes
Algorithm | Hash digest
---|---
SHA256 | dc3b39a1e3e0a95833d1ab251bbc000dc6a0efa98f4181b6af7e1cb08f398df7
MD5 | 0f0fae595b00ea727b962fe382ab1681
BLAKE2b-256 | a72bec84250d05d61cae79d6e0c1307f6e48da725644ca6d0635ed86a86ee571