In this post I will prove that the strategy that is presented in the previous post
is indeed an optimal strategy in the sense that it minimize the expected number
of button-pressing for reaching a certain song.
One of the ways to show this is using the model of decision tree. Every algorithm/strategy that has a final number of states (in our case the states represented by the number of the song) and a final number of decisions to make (in our case there are three decisions in every state - pressing 'prev'/'next'/'rdm') can be presented as a rooted tree.
'tree' means a connected graph with no cycles.
'rooted' means that there is a special node (vertex) in the tree that we consider to be the common ancestor of all other nodes. Rooting a tree induces a natural direction on the tree edges: from the root to its neighbors, from its neighbors to the rest of their neighbors and so on... (see pic 1).
For example: consider the case where there are only three songs, we start from song no. 1 and want to reach song no. 3. We decide to press 'rmd' one time and then to 'prev'/'next' until we reach the song.
then the desired decision tree will look like :
Pic 2. Simple Decision Tree. |
I have decided to color random decision with red and deterministic decision with blue;
Also note that random-decision nodes have 3 descendants (representing all the nodes that the random pressing might lead to) while deterministic-decision nodes have only one descendant.
Before I start the proof I need to present a few definitions:
Definition: A decision tree that represent a successful strategy for reaching a song (will reach the song with a finite expected number of decisions) shall be called Finite Pressing Tree (FPT).
Definition: The average height (AH) is defined in the following recursive manner:
The average height of a leaf (node with no descendants) is 0.
The average height of a FPT is the 1+ the average of the average height of its root's descendants.
For example the average height of the tree in Pic. 2 is \(AH = 1 +\cfrac{1}{3}\cdot(2+2+1)=\cfrac{8}{3}\).
Observation: The average height of FPT is simply the expected number of pressing of the strategy it represents.
Definition: The set of FPTs that their initial state is \(i\) and end state \(j\) shall be denoted with \(T(i,j)\).
Definition: For any \(i\) and \(j\), \(OptT(i,j)\) is the set of all \(T\in T(i,j)\) such that for any \(U\in T(i,j)\) \(AH(T)\leq AH(U)\).
Theorem: For any \(i\) and \(j\), \(OptT(i,j)\) is not empty, that is there exists a optimal strategy.
I actually don't need this theorem, but it makes the proof much more easier. One way for proving this theorem is to define a certain topology on \(T(i,j)\)- such that this topology would be compact and the AH function would be continuous- Existence of a minima is immediate.
We have two important lemmas in to order prove our goal. The first one is the observation that you should not use 'deterministic' buttons before random buttons:
Lemma 1: Suppose that \(T\in T(i,j)\) and throughout the algorithm that \(T\) represents we press a deterministic 'prev/next' button that is represented by a node \(d\) of the tree. Suppose further that one of the descendants of this node is a random pressing ('rdm') node, then \(T\notin OptT(i,j)\).
proof: We can construct another tree \(U\in T(i,j)\) such that \(AH(U) < AH(T)\). The construction is the following: follow the route of node \(d\) untill we reach a 'rdm' node \(r\) (remember that all the buttons are deterministic, there is only one path) - To generate the new tree simply replace the node \(d\) with the node \(r\). (i.e throw away the deterministic path). Note that this new graph is still in \(T(i,j)\). Moreover, we only reduced one of its branches' height so the overall average height is decreased (see Pic 3).
Pic 3. Every decision tree that uses deterministic buttons before non-deterministic buttons could be reduced to a tree with a strictly smaller average height.
We conclude that with optimal strategy if we press a deterministic button then we must press deterministic buttons all the way till we reach our goal.
Lemma 2: Suppose that \(T\in T(i,j)\) and throughout the algorithm that \(T\) represents we reach a node \(v\) with state \(k\), then the sub-tree \(K\) that its root is \(v\) is a FPT and \(K\in OptT(k,j)\).
proof: The proof is straightforward. The average height of \(T\) should be some convex combination that one of its terms is the average height of \(K\) - if \(K\) is not optimal then \(T\) is not optimal.
Now we can glue everything together to prove the main goal:
Let \(T\in OptT(i,j)\) . We know from the previous post that pressing 'rdm' all the time is not an optimal strategy. Therefore there must be a node of the tree \(v\) where we press a deterministic button; Suppose that the state represented by the node is \(k\) - we know from Lemma 1 that all the decision following this node are deterministic all the wall until \(j\). We search for other occurrences of \(k\) in the decision tree. From Lemma 2 we know that if there is an occurrence then its sub-decision tree should be optimal, note that the pressing deterministic buttons all the way is also an optimal strategy! Therefore, for every vertex of \(T\) that represents state \(k\) (regardless of its branch) we can replaces its sub-tree with a deterministic branch and \(T\) should still be an optimal FPT.
Altogether we have that an optimal strategy should be of the form :
A. Press 'rdm' untill \( Distance(i,j) < f(n) \).
B. Press 'prev/next' to reach \(j\).
How should \(f(n)\) look like? The expected value of number of button-pressing \(N\) should be (see previous post) \(\mathbb{E}N = \cfrac{n}{2f(n)}+\cfrac{f(n)}{2}\).
Minimizing the above formula yields \(f(n) = \sqrt{n} \).
\(\square\)