Arbitrage Theorem

Background reading for modern asset pricing theory

Pricing models for financial derivatives require, by their very nature, the use of continuous-time stochastic processes. Three major steps in this theoretical revolution led to the use of advanced mathematical methods:
    • The arbitrage theorem gives the formal conditions under which "arbitrage profits" can or cannot exist. It is known that if asset prices satisfy a simple condition, then arbitrage cannot exist. This was a major development that eventually permitted the calculation of the arbitrage-free price of any "new" derivative product.
    • The Black-Scholes model used the method of arbitrage-free pricing. The paper was also influential because of the technical steps introduced in obtaining a closed-form formula for option prices (a sketch of this formula appears at the end of this section).
    • Equivalent martingale measures were developed later. This method dramatically simplified and generalized the original approach of Black and Scholes. With these tools, a general method could be used to price any derivative product.
    • The value of a derivative often depends only on the value of the underlying asset, some interest rates, and a few parameters to be estimated. It is significantly easier to model such an instrument mathematically than, say, to model stocks. Some other books:
      • Hull (1993)
      • Jarrow and Turnbull (1996)
      • Ingersoll (1987) and Duffie (1996)
    Derivative securities are financial contracts that "derive" their value from cash market instruments such as stocks, bonds, currencies, and commodities. A financial contract is a derivative security, or a contingent claim, if its value at expiration date T is determined exactly by the market price of the underlying cash instrument at time T.
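    As a minimal sketch of the closed-form pricing mentioned above: the European call is a contingent claim with payoff max(S_T - K, 0), and its Black-Scholes price can be computed in a few lines. The function and parameter names below (spot, strike, rate, vol, maturity) are illustrative choices, and the example numbers are made up.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(spot: float, strike: float, rate: float,
                       vol: float, maturity: float) -> float:
    """Closed-form Black-Scholes price of a European call.

    The call is a contingent claim: its value at expiration T is
    max(S_T - K, 0), determined exactly by the underlying price S_T.
    """
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)

# Illustrative example: S0 = 100, K = 100, r = 5%, sigma = 20%, T = 1 year.
print(black_scholes_call(100.0, 100.0, 0.05, 0.20, 1.0))  # roughly 10.45
```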

    Adaptivity and Progression Ordering for Tutoring

    Combining Adaptivity with Progression Ordering for Intelligent Tutoring Systems
    Problems with current LAS systems:
    • LAS systems rely on an expert instructor or team of instructors to create high-quality content and to order that content into an effective learning sequence for students.
    • Bayesian Knowledge Tracing (BKT) maintains a probabilistic belief about whether the student has mastered each skill and updates it after every observed response (a minimal update sketch follows below).
    • Recent work automatically provides personalized student advancement through a curriculum graph using the concepts of a zone of proximal development and multi-armed bandits.
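    A minimal sketch of the standard BKT update, assuming the usual four parameters (initial mastery, transit, slip, guess); the default values below are illustrative, not taken from any particular system.

```python
def bkt_update(p_mastery: float, correct: bool,
               p_slip: float = 0.1, p_guess: float = 0.2,
               p_transit: float = 0.15) -> float:
    """One Bayesian Knowledge Tracing update of P(mastered).

    First condition the mastery belief on the observed response, then
    account for the chance of learning the skill on this opportunity.
    """
    if correct:
        evidence = p_mastery * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_mastery) * p_guess)
    else:
        evidence = p_mastery * p_slip
        posterior = evidence / (evidence + (1 - p_mastery) * (1 - p_guess))
    return posterior + (1 - posterior) * p_transit

# Example: belief after a correct then an incorrect answer, starting at 0.3.
p = 0.3
p = bkt_update(p, correct=True)
p = bkt_update(p, correct=False)
print(round(p, 3))
```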

    Organizing Vocabulary Knowledge

    A sentence \( s_1 \) is harder than another sentence \( s_2 \), denoted \( s_1 > s_2 \), if \( s_1 \) covers all the vocabulary words in \( s_2 \). A sentence \( s_1 \) is directly harder than \( s_2 \) if \( s_1 > s_2 \) and there does not exist a third sentence \( s_3 \) such that \( s_1 > s_3 > s_2 \). We create a graph in which each node represents a sentence in our corpus and each directed edge represents a directly-harder-than relation between two nodes, as sketched below.
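    A minimal sketch of this construction, assuming each sentence is represented simply by its set of vocabulary words; the toy corpus below is made up for illustration.

```python
# Toy corpus: each sentence is represented by its set of vocabulary words.
sentences = {
    "s1": {"the", "cat"},
    "s2": {"the", "cat", "sat"},
    "s3": {"the", "cat", "sat", "down"},
    "s4": {"the", "dog"},
}

def harder(a: str, b: str) -> bool:
    """a > b: sentence a covers all vocabulary words of sentence b."""
    return sentences[a] > sentences[b]   # proper superset of vocabulary sets

# Directly harder: a > b with no intermediate sentence c such that a > c > b.
edges = []
for a in sentences:
    for b in sentences:
        if harder(a, b) and not any(
            harder(a, c) and harder(c, b) for c in sentences if c not in (a, b)
        ):
            edges.append((a, b))

print(edges)   # [('s2', 's1'), ('s3', 's2')]
```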

    Progressing Students Using Multi-Armed Bandits

    The ZPDES algorithm, proposed by Clement et al., uses multi-armed bandits for problem selection.

    Given a curriculum graph, for each node in the graph, the algorithm keeps track of a belief state of student mastery, mastered or unmastered. At each timestep, the algorithm selects a problem from within the set of problems on the boundary of the student's knowledge, which is defined as the Zone of Proximal Development (ZPD). The algorithm selects the problem from this set that it predicts will give the most reward, which is measured in terms of student learning progress.
    The ZPDES Algorithm
    The prerequisites of a node are the set of all nodes that have a directed edge towards that node. On initialization, all problems start in the non-learned knowledge state with an initial unnormalized weight \( w_a = w_i \). The belief ZPD is initialized to the set of problems that have no prerequisites. To select the next problem, the weights \( w_a \) of the problems in the ZPD are normalized to \( w_{a, n}\) to ensure a proper probability distribution \( p_a \) over the problems: $$ w_{a, n} = \frac{w_a}{\sum_{a' \in ZPD} w_{a'}}$$ Once a problem \( a \) is selected and presented to the student, the correctness of the student's answer is recorded as \(C_{a, i}\), where \(i\) indicates it is the \( i^{th}\) time problem \( a \) has been presented. The reward \( r_{a, i} \) of problem \( a \) at this timestep is calculated from the approximated gradient of the student's performance on that problem, and this reward is then used to update the weight \( w_a \). A condensed sketch of this loop is given below.
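    The sketch below condenses the selection-and-update loop just described. It is not the exact ZPDES formulation: the reward is approximated as the difference between the student's recent and earlier success rates on the problem, and the constants (initial weight, learning rate, history window) and the mastery/ZPD-expansion rule are illustrative assumptions.

```python
import random

class ZPDBandit:
    """Sketch of a ZPDES-style multi-armed bandit over a curriculum graph.

    `graph` maps each problem to its set of prerequisite problems.
    Constants and the mastery rule are illustrative, not the exact
    values or criteria used by Clement et al.
    """

    def __init__(self, graph, w_init=1.0, lr=0.5, window=6):
        self.graph = graph
        self.weights = {a: w_init for a in graph}
        self.history = {a: [] for a in graph}        # recorded correctness C_{a,i}
        self.mastered = set()
        self.lr, self.window = lr, window
        # Belief ZPD starts as the problems with no prerequisites.
        self.zpd = {a for a, prereqs in graph.items() if not prereqs}

    def select(self):
        """Sample a problem from the ZPD proportionally to its normalized weight."""
        total = sum(self.weights[a] for a in self.zpd)
        problems = sorted(self.zpd)
        return random.choices(problems, [self.weights[a] / total for a in problems])[0]

    def update(self, a, correct):
        """Record correctness, compute a learning-progress reward, update the weight."""
        self.history[a].append(1.0 if correct else 0.0)
        h = self.history[a][-self.window:]
        recent, older = h[len(h) // 2:], h[:len(h) // 2]
        reward = sum(recent) / max(len(recent), 1) - sum(older) / max(len(older), 1)
        self.weights[a] = max(self.weights[a] + self.lr * reward, 1e-3)
        # Crude mastery rule (assumption): after three correct answers in a row,
        # mark the problem mastered and grow the ZPD with unlocked successors.
        # (The real algorithm also retires mastered problems from the ZPD.)
        if len(h) >= 3 and sum(h[-3:]) == 3:
            self.mastered.add(a)
            for b, prereqs in self.graph.items():
                if prereqs <= self.mastered:
                    self.zpd.add(b)

# Example usage with a two-problem curriculum:
# bandit = ZPDBandit({"p1": set(), "p2": {"p1"}})
# a = bandit.select(); bandit.update(a, correct=True)
```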

    MDPs

    Markov Decision Processes (MDPs) have been extensively studied and form the basis of a wide range of reinforcement learning algorithms.

    Imagine you are extremely sleepy early in the morning at work and you need to figure out how to stay awake. Further, imagine that you (the agent) don't have prior knowledge of what kinds of foods or beverages you can consume to stay awake.

    So, we can be in either a sleepy state or a wide-awake state. Let this be denoted as \( S = \{\text{sleepy}, \text{awake}\}\). Further, we can take the following consumption actions: \( A = \{\text{Walnuts}, \text{Turkey}, \text{Rice}, \text{Coffee}\}\). The chance of moving to state \( s' \) when we take action \( a \) in state \( s \) is the transition probability \( P(s' \mid s, a) \), where \( a \in A \) and \( s, s' \in S \). Essentially, we need to learn a policy that tells us, given our sleepy state, which action to take in order to stay awake (i.e., to collect reward). Here, the policy is nothing but the probability of taking a particular action in a given state, i.e., \( \pi(s, a)\) is the probability of taking action \(a\) in state \( s \).
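    To make the running example concrete, here is one way to write this toy sleepy/awake MDP down in code. All transition probabilities and rewards below are made-up illustrative numbers, as is the uniform random policy.

```python
# Toy sleepy/awake MDP (all numbers are illustrative).
STATES = ["sleepy", "awake"]
ACTIONS = ["Walnuts", "Turkey", "Rice", "Coffee"]

# P[s][a][s'] = probability of moving to s' after taking action a in state s.
P = {
    "sleepy": {
        "Walnuts": {"sleepy": 0.7, "awake": 0.3},
        "Turkey":  {"sleepy": 0.9, "awake": 0.1},
        "Rice":    {"sleepy": 0.8, "awake": 0.2},
        "Coffee":  {"sleepy": 0.1, "awake": 0.9},
    },
    "awake": {a: {"sleepy": 0.2, "awake": 0.8} for a in ACTIONS},
}

# R[s][a] = expected immediate reward for taking action a in state s.
R = {
    "sleepy": {"Walnuts": 0.3, "Turkey": 0.1, "Rice": 0.2, "Coffee": 0.9},
    "awake":  {a: 1.0 for a in ACTIONS},
}

# A uniform random policy pi(s, a): probability of taking a in s.
policy = {s: {a: 1.0 / len(ACTIONS) for a in ACTIONS} for s in STATES}
```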

    Now, we can define the value function as follows: $$ V^{\pi}(s) = \mathbb{E}\{r_{t+1} + \gamma r_{t+2} + \gamma^{2}r_{t+3} + \dots \mid s_t = s, \pi\}$$ The intuition behind the above equation is that if we follow a particular policy \(\pi\) starting from state \(s\), the expected cumulative discounted reward is \(V^{\pi}(s)\). \(V^{\pi}(s)\) is also known as the value of \(s\) under the policy \( \pi \).

    Now, we have the tools to define the problem statement: find the optimal policy \( \pi^{*}\) such that \(V^{\pi^{*}}(s)\) is maximized \(\forall s \in S\).

    Let's try to decompose the value function a little bit: $$ V^{\pi}(s) = \mathbb{E}\Big[\sum_{k=0}^{\infty} \gamma^{k}r_{t+k+1} \,\Big|\, s_t = s, \pi\Big]$$ This can be broken down into a one-step look-ahead plus the discounted return from the next state onward: $$ V^{\pi}(s) = \mathbb{E}\Big[r_{t+1} + \sum_{k=1}^{\infty} \gamma^{k}r_{t+k+1} \,\Big|\, s_t = s, \pi\Big]$$ Taking the expectation over the action chosen by \(\pi\) and the next state drawn from \(P\), we get: $$ V^{\pi}(s) = \sum_{a \in A} \pi(s, a)\Big[R(s, a) + \gamma \sum_{s' \in S}P(s'|s, a)V^{\pi}(s')\Big]$$ The above is the famous Bellman equation.

    Further, just the way we assigned a value to a state, we can assign a value to a state-action pair \( (s, a)\). This value is defined as follows: $$ Q^{\pi}(s, a) = \mathbb{E}\{r_{t+1} + \gamma r_{t+2} + \dots \mid s_{t} = s, a_{t} = a, \pi\}$$

    Now, we can do a similar expansion for the optimal action-value function \( Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a) \) to get the Bellman optimality equation: $$ Q^{*}(s, a) = R(s, a) + \gamma \sum_{s' \in S}P(s'|s, a) \max_{a' \in A_{s'}}Q^{*}(s', a')$$

    Value Function Approximation: As an example DP algorithm, we consider value iteration. This iterative update is also known as a backup because we are approximating the value of the current state by backing up information from possible successor states: $$V_{k+1}(s) = \max_{a \in A_s}\Big[R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) V_{k}(s')\Big]$$ Similarly, we can approximate the Q value function by using iterative backup updates: $$Q_{k+1}(s, a) = R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) \max_{a' \in A_{s'}}Q_{k}(s', a')$$
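    A minimal sketch of the value-iteration backup above, written against the toy P and R dictionaries from the earlier MDP sketch; the discount factor and stopping tolerance are arbitrary choices.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Iteratively apply V_{k+1}(s) = max_a [R(s,a) + gamma * sum_s' P(s'|s,a) V_k(s')]."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            backup = max(
                R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(backup - V[s]))
            V[s] = backup
        if delta < tol:
            return V

# Using the toy sleepy/awake MDP from the earlier sketch:
# print(value_iteration(P, R))
```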

    The following snippet sketches policy evaluation using the aforementioned iterative backup method on the toy sleepy/awake MDP defined earlier; the uniform random policy and the discount factor are illustrative choices.
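```python
def policy_evaluation(P, R, policy, gamma=0.9, tol=1e-6):
    """Iteratively apply the Bellman backup for a fixed policy:
    V_{k+1}(s) = sum_a pi(s,a) [R(s,a) + gamma * sum_s' P(s'|s,a) V_k(s')].
    """
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            backup = sum(
                policy[s][a] * (
                    R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in P[s][a])
                )
                for a in P[s]
            )
            delta = max(delta, abs(backup - V[s]))
            V[s] = backup
        if delta < tol:
            return V

# Evaluate the uniform random policy on the toy sleepy/awake MDP:
# print(policy_evaluation(P, R, policy))
```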