Combining Adaptivity with Progression Ordering for Intelligent Tutoring Systems
Problems with current LAS systems:
- LAS systems rely on an expert instructor or team of instructors to create high quality content and order such content into an effective learning sequence for the students.
Bayesian Knowledge Tracing (BKT)
Recent work on automatically providing personalized student advancement through such a curriculum graph using concepts of a zone of proximal development and multi-armed bandits.
Organizing Vocabulary Knowledge
A sentence s1 is harder than another sentence s2, indicated as s1 > s2 if s1 covers all the vocabulary words in s2. A sentence s1 is directly harder than s2 if s1 > s2 and there does not exist a third sentence such that s1 > s3 > s2.
Create a graph in which each node represents a sentence in our corpus and each directed edge represents a directly harder than relation between two nodes.
Progessing Students Using Multi-Armed Bandits
ZPDES algorithm proposed by Clement et al. for using multi-armed bandits for problem selection.
Given a curriculum graph, for each node in the graph, the algorithm keeps track of a belief state of student mastery, mastered or unmastered.
At each timestep, the algorithm selects a problem from within the set of problems on the boundary of the student's knowledge, which is defined as the Zone of Proximal Development (ZPD).
The algorithm selects the problem from this set that it predicts will give the most reward, which is measured in terms of student learning progress.
ZPD Algo
The prerequisites of a node is the set of all nodes that have a directed edge towards that node. On initialization, all problems start in the non-learned knowledge state and start with an initial un-normalized weight w_a = w_i. The belief ZPD is initialized to the set of problems that have no prerequisites.
To select the next problem, the weights \( w_a \) of the problems in the ZPD are normalized \( w_{a, n}\) to ensure a proper probability distribution \( p_a \) over the problems:
$$ w_{a, n} = \frac{w_a}{\sum_{a' \in ZPD} w_{a'}}$$
Once a problem, problem a, is selected and presented to the student, the correctness of the student answer is recorded as \(C_{a, i}\) where \(i\) represents it is the \( i^{th}\) time problem \( a\) has been presented. The reward (r_{a, i}) of problem a at this timestep is calculated by the approximated gradient of the performance of the student on that problem and this