Library to simulate knowledge tracing datasets


ktdg (Knowledge tracing data generator)

Library used to create synthetic knowledge tracing data. Example configs can be found in config.

Usage | Setup | Documentation

Usage

To create a new config or complete an existing one:

$ ktdg create --help
Usage: ktdg create [OPTIONS] CONFIG

  (c) Creates a config or completes it, saving it to the given file.

Arguments:
  CONFIG  Path of the config to complete or create  [required]

Options:
  -h, --help  Show this message and exit.

To generate the synthetic data from the config:

$ ktdg generate --help
Usage: ktdg generate [OPTIONS] CONFIG

  (g) Generates the data for the given config, saving it as a json file named
  "data.json".

Arguments:
  CONFIG  Configuration file to use  [required]

Options:
  -h, --help  Show this message and exit.
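
The two commands above are meant to be run in order. A minimal sketch of driving them from Python follows; the helper names `ktdg_commands` and `run_pipeline` and the file name `config.json` are hypothetical, and only the `create` and `generate` subcommands with their `CONFIG` argument come from the help output above.

```python
import shutil
import subprocess


def ktdg_commands(config_path):
    """Return the two CLI invocations, in order, as argv lists."""
    return [
        ["ktdg", "create", config_path],    # create or complete the config
        ["ktdg", "generate", config_path],  # generate the data from the config
    ]


def run_pipeline(config_path):
    """Run both steps, stopping at the first failure."""
    if shutil.which("ktdg") is None:
        raise RuntimeError("ktdg executable not found on PATH")
    for argv in ktdg_commands(config_path):
        subprocess.run(argv, check=True)
```

Calling `run_pipeline("config.json")` would then leave the generated data in `data.json`, as described in the `generate` help text.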

Setup

  1. Install poetry

  2. poetry config virtualenvs.in-project true

  3. poetry install

  4. source .venv/bin/activate

Documentation

Generation

Skills

Skills are generated with the following parameters:

$n^K$ / n: number of skills to generate

difficulty (float): factor, sampled from a distribution, by which to scale the difficulty of questions that need this skill

seed (int): random seed to use when generating the skills

Students

Students are generated with the following parameters:

n: number of students to generate

$n_i \sim N^S, n_i \in \{0,...,n^K\}$ / n_skills (int): number of skills per student sampled from a distribution

$s_{ik} \sim M^S, s_{ik} \in [0,1]$ / skill_mastery (float): mastery for a given student and skill sampled from a distribution

$s_i^S \sim S^S, s_i^S \in [0,1]$ / slip (float): slip rate for a given student sampled from a distribution

$g_i^S \sim G^S, g_i^S \in [0,1]$ / guess (float): guess rate for a given student sampled from a distribution

$l_i^S \sim L^S, l_i^S \in [0,1]$ / learning_rate (float): rate of learning for a given student sampled from a distribution

$f_i^S \sim F^S, f_i^S \in [0,1]$ / forget_rate (float): rate of forgetting for a given student sampled from a distribution

binary_learning (bool): whether a skill is treated as binary, either known ($=1$) or not known ($=0$), instead of taking continuous values between 0 and 1

seed (int): random seed to use when generating the students

Questions

Questions are generated with the following parameters:

n: number of questions to generate

$n_j \sim N^Q, n_j \in \{0,...,n^K\}$ / n_skills (int): number of skills per question sampled from a distribution

$q_{jk} \sim M^Q, q_{jk} \in [0,1]$ / skill_mastery (float): mastery for a given question and skill sampled from a distribution

$d_j^Q \sim D^Q, d_j^Q \in [0,1]$ / difficulty (float): difficulty for a given question sampled from a distribution

$s_j^Q \sim S^Q, s_j^Q \in [0,1]$ / slip (float): slip rate for a given question sampled from a distribution

$g_j^Q \sim G^Q, g_j^Q \in [0,1]$ / guess (float): guess rate for a given question sampled from a distribution

seed (int): random seed to use when generating the questions
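
To illustrate how n_skills and skill_mastery could combine into the per-question skill vector $\boldsymbol{q}_j$, here is a minimal sketch. It is not the library's implementation: the function name `make_skill_vector`, the uniform choice of which skills a question needs, the uniform mastery draw standing in for $M^Q$, and the zero entries for unneeded skills are all assumptions.

```python
import random


def make_skill_vector(n_total, n_needed, rng):
    """Build q_j: a mastery in [0, 1] for each needed skill, 0.0 elsewhere."""
    n_needed = max(0, min(n_needed, n_total))  # keep n_j within {0, ..., n^K}
    # Assumption: the needed skills are chosen uniformly without replacement.
    needed = rng.sample(range(n_total), n_needed)
    vector = [0.0] * n_total
    for k in needed:
        # Assumption: a uniform draw stands in for the configured distribution.
        vector[k] = rng.random()
    return vector


rng = random.Random(0)  # fixed seed, mirroring the seed parameter
q_j = make_skill_vector(n_total=5, n_needed=2, rng=rng)
```

Seeding the generator makes the vector reproducible, which is the role the seed parameter plays in each generation step.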

Answers

Answers are generated using the following formulas:

$$\boldsymbol{q}_j = \left(q_{jk}\right)_{k=1,...,n^K}$$

$$s_{ij} = 1 - \sqrt{(1 - s_i) \cdot (1 - s_j)}$$

$$g_{ij} = 1 - \sqrt{(1 - g_i) \cdot (1 - g_j)}$$

$$\boldsymbol{s}_i^0 = \left(s_{ik}\right)_{k=1,...,n^K}$$

$$\boldsymbol{s}_i^t = \underbrace{f_i \cdot \boldsymbol{s}_i^{t-1}}_{\text{skill forgetting}} + l_i \cdot \underbrace{(1 - g_a) \cdot (1 - g_{ij})}_{\text{adjustment for guessing}} \cdot \underbrace{(0.5 + d_j)}_{\text{adjustment for difficulty}} \cdot \underbrace{(1 - w_a \cdot (1 - a_i^t))}_{\text{adjustment for correctness}} \cdot \boldsymbol{q}_j$$

$$a_i^t = g_{ij} + (1 - s_{ij}) \cdot \frac{m_{ij}}{1 + m_{ij}}$$

$$m_{ij} = \exp\left(m_a \cdot (\boldsymbol{q}_j^T\boldsymbol{s}_i^t - d_j)\right)$$

for student $i$ answering question $j$ at time $t$, with the following parameters:

$n_i^A \sim N^A, n_i^A \in \mathbb{N}$ / n_per_student (int): number of questions asked per student sampled from a distribution

$w_a \in \mathbb{R}^+$ / wrong_answer_adjustment (float): factor by which learning is scaled after a wrong answer

$g_a \in \mathbb{R}^+$ / guess_adjustment (float): factor by which learning is scaled to account for guessing

$m_a \in \mathbb{R}^+$ / mastery_importance (float): scaling factor applied to the mastery term inside the exponential

max_repetitions (int): maximum number of repetitions of a given question allowed per student

can_repeat_correct (bool): whether a question that was answered correctly can be repeated

seed (int): random seed to use when generating the answers
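
The formulas above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, the probability is computed from whatever skill state the caller passes in (the formulas do not spell out whether that is the state before or after the update), and `answer` may be either the probability $a_i^t$ or a sampled 0/1 outcome.

```python
import math


def combine(student_rate, question_rate):
    """Combined slip or guess rate: 1 - sqrt((1 - a) * (1 - b))."""
    return 1.0 - math.sqrt((1.0 - student_rate) * (1.0 - question_rate))


def correct_probability(skills, q_j, d_j, s_ij, g_ij, m_a):
    """a = g_ij + (1 - s_ij) * m / (1 + m), m = exp(m_a * (q_j . s_i - d_j))."""
    dot = sum(q * s for q, s in zip(q_j, skills))
    m_ij = math.exp(m_a * (dot - d_j))
    return g_ij + (1.0 - s_ij) * m_ij / (1.0 + m_ij)


def update_skills(skills, q_j, d_j, g_ij, answer, f_i, l_i, g_a, w_a):
    """One step of the skill-state update, applied element-wise over skills."""
    gain = (
        l_i
        * (1.0 - g_a) * (1.0 - g_ij)        # adjustment for guessing
        * (0.5 + d_j)                       # adjustment for difficulty
        * (1.0 - w_a * (1.0 - answer))      # adjustment for correctness
    )
    # Forgetting term plus the learning gain spread over the question's skills.
    return [f_i * s + gain * q for s, q in zip(skills, q_j)]
```

As written, the probability formula can exceed 1 when $g_{ij} > s_{ij}$; the sketch leaves the value unclamped, so a caller that samples an answer from it may want to clamp it to $[0,1]$ first.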

Distributions

constant: All samples take the same value, given by the parameter value.

normal: Samples are taken from a normal distribution with mean mu and standard deviation sigma.

binomial: Samples are taken from a binomial distribution with number of possible successes n and probability of success p.
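
As a rough sketch of how these three distributions could be sampled with the standard library alone, consider the snippet below. The parameter names value, mu, sigma, n and p come from the descriptions above; the dict layout, the `type` key, the function names and the clipping helper are assumptions and may not match the real config schema.

```python
import random


def sample(spec, rng):
    """Draw one sample from a distribution described by a dict."""
    kind = spec["type"]  # hypothetical key; the real config schema may differ
    if kind == "constant":
        return spec["value"]
    if kind == "normal":
        return rng.gauss(spec["mu"], spec["sigma"])
    if kind == "binomial":
        # Count successes over n independent trials with success probability p.
        return sum(rng.random() < spec["p"] for _ in range(spec["n"]))
    raise ValueError(f"unknown distribution: {kind}")


def clip(x, low=0.0, high=1.0):
    """Assumption: out-of-range samples are clipped into [low, high]."""
    return max(low, min(high, x))


rng = random.Random(42)
slip = clip(sample({"type": "normal", "mu": 0.1, "sigma": 0.05}, rng))
```

Clipping is one simple way to keep a normal sample inside a bounded range such as $[0,1]$; resampling until the value is in range would be another.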
