Friday, December 12, 2014

The mathematics of listening to songs - Part 2

In this post I will prove that the strategy presented in the previous post
is indeed optimal, in the sense that it minimizes the expected number
of button presses needed to reach a given song.

One of the ways to show this is to use the model of a decision tree. Every algorithm/strategy that has a finite number of states (in our case a state is represented by the number of the current song) and a finite number of decisions to make in every state (in our case there are three decisions - pressing 'prev'/'next'/'rdm') can be represented as a rooted tree.

Pic 1. Directed Rooted Tree



A 'tree' is a connected graph with no cycles.
A 'rooted' tree has a special node (vertex) that we consider to be the common ancestor of all other nodes. Rooting a tree induces a natural direction on its edges: from the root to its neighbors, from those neighbors to the rest of their neighbors, and so on (see Pic 1).

For example, consider the case where there are only three songs, we start from song no. 1 and want to reach song no. 3, and we decide to press 'rdm' one time and then 'prev'/'next' until we reach the song.
The corresponding decision tree looks like this:
Pic 2. Simple Decision Tree.
I have decided to color random decisions red and deterministic decisions blue.
Also note that random-decision nodes have 3 descendants (representing all the songs that the random press might lead to), while deterministic-decision nodes have only one descendant.

Before I start the proof I need to present a few definitions:

Definition: A decision tree that represents a successful strategy for reaching a song (i.e. one that reaches the song with a finite expected number of decisions) shall be called a Finite Pressing Tree (FPT).

Definition: The average height (AH) is defined in the following recursive manner:
The average height of a leaf (a node with no descendants) is 0.
The average height of an FPT is 1 plus the average of the average heights of the subtrees rooted at its root's descendants.

For example, the average height of the tree in Pic 2 is \(AH = 1 +\cfrac{1}{3}\cdot(2+2+1)=\cfrac{8}{3}\).

Observation: The average height of an FPT is simply the expected number of button presses of the strategy it represents.
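For concreteness, here is a minimal Python sketch of this recursive definition. The tree below is a hypothetical one whose three branches have heights 2, 2 and 1, matching the numbers in the example above; it is not meant to reproduce Pic 2 exactly.

```python
# A minimal sketch (not from the post): a node is represented simply as the
# list of its descendants, and a leaf is the empty list.

def average_height(node):
    # AH of a leaf is 0; otherwise 1 + the mean AH of the root's descendants.
    if not node:
        return 0.0
    return 1.0 + sum(average_height(child) for child in node) / len(node)

# A hypothetical random root whose three descendants are deterministic chains
# of heights 2, 2 and 1, as in the example above.
chain_1 = [[]]          # one deterministic press left: height 1
chain_2 = [[[]]]        # two deterministic presses left: height 2
tree = [chain_2, chain_2, chain_1]
print(average_height(tree))   # 2.666... = 8/3
```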

Definition: The set of FPTs whose initial state is \(i\) and whose end state is \(j\) shall be denoted by \(T(i,j)\).

Definition: For any \(i\) and \(j\), \(OptT(i,j)\) is the set of all \(T\in T(i,j)\) such that for any \(U\in T(i,j)\), \(AH(T)\leq AH(U)\).

Theorem: For any \(i\) and \(j\), \(OptT(i,j)\) is not empty; that is, there exists an optimal strategy.

I don't actually need this theorem, but it makes the proof much easier. One way to prove it is to define a suitable topology on \(T(i,j)\) under which the space is compact and the AH function is continuous; the existence of a minimum is then immediate.

We need two important lemmas in order to prove our goal. The first is the observation that you should never use the 'deterministic' buttons before the random button:

Lemma 1: Suppose that \(T\in T(i,j)\) and that during the algorithm that \(T\) represents we press a deterministic 'prev/next' button, represented by a node \(d\) of the tree. Suppose further that one of the descendants of this node is a random-press ('rdm') node. Then \(T\notin OptT(i,j)\).

Proof: We can construct another tree \(U\in T(i,j)\) such that \(AH(U) < AH(T)\). The construction is the following: follow the path from node \(d\) until we reach a 'rdm' node \(r\) (remember that all the buttons along the way are deterministic, so there is only one such path). To generate the new tree, simply replace the node \(d\) with the node \(r\), i.e. throw away the deterministic prefix. Note that this new tree is still in \(T(i,j)\): a 'rdm' press lands on a uniformly random song regardless of the current song, so the subtree rooted at \(r\) remains valid. Moreover, we strictly reduced the height of one of the branches, so the overall average height decreases (see Pic 3).



                         
Pic 3. Every decision tree that uses deterministic buttons before non-deterministic buttons can be reduced to a tree with a strictly smaller average height.


We conclude that in an optimal strategy, once we press a deterministic button we must keep pressing deterministic buttons all the way until we reach our goal.
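As a quick numeric illustration of this surgery in the simplest case (the deterministic node sitting right at the root, with the 'rdm' node as its only descendant), reusing the average_height sketch from above on hypothetical trees:

```python
# Reusing the average_height sketch from above.
def average_height(node):
    if not node:
        return 0.0
    return 1.0 + sum(average_height(child) for child in node) / len(node)

rdm_node = [[[[]]], [[[]]], [[]]]   # a random node with branches of heights 2, 2, 1
t = [rdm_node]                      # the same node with a deterministic press in front of it
print(average_height(t))            # 3.666...
print(average_height(rdm_node))     # 2.666... -- pruning the deterministic prefix lowers the AH
```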

Lemma 2: Suppose that \(T\in OptT(i,j)\) and that during the algorithm that \(T\) represents we reach a node \(v\) with state \(k\). Then the sub-tree \(K\) whose root is \(v\) is an FPT and \(K\in OptT(k,j)\).

Proof: The proof is straightforward. The average height of \(T\) is a convex combination, one of whose terms is the average height of \(K\); if \(K\) were not optimal we could replace it with a better tree from \(T(k,j)\) and strictly decrease \(AH(T)\), so \(T\) would not be optimal.

Now we can glue everything together to prove the main goal:
Let \(T\in OptT(i,j)\). We know from the previous post that pressing 'rdm' all the time is not an optimal strategy. Therefore there must be a node \(v\) of the tree where we press a deterministic button; suppose that the state represented by this node is \(k\). We know from Lemma 1 that all the decisions following this node are deterministic, all the way until \(j\). Now we search for other occurrences of the state \(k\) in the decision tree. From Lemma 2 we know that if there is such an occurrence then its sub-decision tree must be optimal, and note that pressing deterministic buttons all the way is itself an optimal strategy from \(k\) (it is exactly the sub-tree rooted at \(v\)). Therefore, for every vertex of \(T\) that represents state \(k\) (regardless of its branch) we can replace its sub-tree with the deterministic branch, and \(T\) will still be an optimal FPT.

Altogether we have that an optimal strategy should be of the form:
A. Press 'rdm' until \( Distance(i,j) < f(n) \).
B. Press 'prev/next' to reach \(j\).

What should \(f(n)\) look like? The expected number of button presses \(N\) is (see the previous post)  \(\mathbb{E}N = \cfrac{n}{2f(n)}+\cfrac{f(n)}{2}\).
Minimizing the above formula yields \(f(n) = \sqrt{n} \).
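For completeness, treating \(f\) as a continuous variable, \(\cfrac{d}{df}\left(\cfrac{n}{2f}+\cfrac{f}{2}\right)=-\cfrac{n}{2f^{2}}+\cfrac{1}{2}\), which vanishes exactly when \(f^{2}=n\), i.e. \(f(n)=\sqrt{n}\); the second derivative \(\cfrac{n}{f^{3}}\) is positive there, so this is indeed a minimum, and it gives \(\mathbb{E}N=\sqrt{n}\).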
\(\square\)


Thursday, December 11, 2014

The mathematics of listening to songs - Part 1

"What is mathematics?"  The Professor asked us in the first class of Set Theory. None of us could find a satisfying answer. Finally he replied : "Mathematics is the study of patterns". 

What is particularly nice about this definition is that it suggests that one can find mathematics in many areas, not only in science and engineering but also in our everyday life, if only we listen and pay attention to the right clues. I would like to post here an example of how an ordinary, boring car trip can be mathematically fascinating.

Suppose you are driving your car alone at night, listening to your favorite songs on your mp3 player. You are now listening to song number 32 ('Ramona' by Beck Hansen, from the Scott Pilgrim soundtrack). The song is over and you are in the mood for song number 103 ('39 by Queen).

A conventional mp3 player has three buttons for controlling the song flow:
a 'prev' button, a 'next' button and a 'rdm' (random) button (see pic).



Assuming that you have a circular list of 300 songs on the mp3 player, how should you press the buttons to reach the song?

The obvious way is to press the 'next' button 71 times, but remember that you are a safe driver and want to avoid accidents. Pressing the same button so many times might be dangerous (and tedious). Is there a strategy that allows you to press buttons as little as possible?


Close inspection of two strategies

In order to understand what I mean by 'as little as possible', let us analyze two strategies:

Suppose that we have a circular list of \(n\) songs. Assume that we are now in song \(i\) and that we want to reach song \(j\).

Strategy 1 - Pressing 'prev/next' to reach \(j\).
Since the list is circular there are two paths between any two songs, and the maximal distance is \(\cfrac{n}{2}\). Now, assuming that \(i\) and \(j\) are uniformly distributed (i.e. are arbitrary), their distance is uniformly distributed over \(1,...,\cfrac{n}{2}\) (not counting the case where \(i=j\)). This gives an expected number of roughly \(\cfrac{n}{4}\) presses.
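Explicitly, if the distance is uniform on \(\{1,...,\cfrac{n}{2}\}\) (and \(n\) is even), then \(\mathbb{E}\,Distance(i,j)=\cfrac{2}{n}\sum_{d=1}^{n/2}d=\cfrac{1}{2}\left(\cfrac{n}{2}+1\right)\approx\cfrac{n}{4}\).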

Strategy 2 - Pressing 'rdm' until the destination.
This strategy is also easy to analyse: on each press, the probability of reaching the desired song is \(p=\cfrac{1}{n}\). The number of button presses is therefore a geometric random variable with parameter \(p\), whose expected value is known to be \(\cfrac{1}{p}=n\).
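(For the 300-song player from the introduction this means an expected 300 presses, versus roughly \(300/4=75\) presses for strategy 1.)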

Note that the expected number of button presses in both strategies is linear in \(n\), that is, it belongs to the class \(O(n)\).

Now we can state the problem: does there exist a strategy whose expected number of button presses is asymptotically better than \(O(n)\)?
.
.
.
The answer to this problem is yes!


Optimal Strategy - 
A. Press 'rdm' until \(Distance(i,j)<\sqrt{n}\).
B. Press 'prev/next' to reach \(j\).

Let us calculate the expected number of presses for this strategy. Suppose \(T\) is the number of button presses needed to reach song \(j\). Note that \(T\) is a random variable and that we can decompose it into two random variables, \(T=R+D\), where:
\(R\) is the number of 'rdm' presses needed to reach the set \(A=\{k\in\{1,...,n\}\,|\,Distance(k,j)\leq\sqrt{n}\}\)  (note: \(|A|\approx2\sqrt{n}\)), and
\(D\) is the number of 'prev/next' presses.

Each time we press the 'rdm' button, the probability of landing in the set \(A\) is \(p=\cfrac{|A|}{n}=\cfrac{2}{\sqrt{n}}\); hence \(R\) is a geometric random variable and \(\mathbb{E}R=\cfrac{1}{p}=\cfrac{\sqrt{n}}{2}\).

After reaching the set \(A\) we can be at any one of the songs \(j-\sqrt{n},...,j+\sqrt{n}\), each with equal probability, so \(D\) is (approximately) uniformly distributed over the set \(\{1,...,\sqrt{n}\}\) and, as in the case of strategy 1, \(\mathbb{E}D=\cfrac{\sqrt{n}}{2}\).
Altogether \(\mathbb{E}T=\mathbb{E}R+\mathbb{E}D=\sqrt{n}\), which is \(O(\sqrt{n})\).
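Here is a quick Monte Carlo sanity check of this calculation (my own sketch; the numbering of the songs, the distance convention and the rounding of \(\sqrt{n}\) to an integer threshold are arbitrary choices):

```python
import random
from math import isqrt, sqrt

def circ_distance(i, j, n):
    # Circular distance between songs i and j on a playlist of n songs.
    d = abs(i - j) % n
    return min(d, n - d)

def presses_optimal(i, j, n):
    # Press 'rdm' until we are within sqrt(n) of j, then walk there with 'prev'/'next'.
    threshold = isqrt(n)                   # integer rounding of sqrt(n)
    presses = 0
    cur = i
    while circ_distance(cur, j, n) >= threshold:
        cur = random.randrange(1, n + 1)   # one 'rdm' press: a uniformly random song
        presses += 1
    return presses + circ_distance(cur, j, n)   # the remaining 'prev'/'next' presses

n, trials = 300, 100_000
mean = sum(presses_optimal(random.randrange(1, n + 1), random.randrange(1, n + 1), n)
           for _ in range(trials)) / trials
print(mean, sqrt(n))   # the empirical mean is of the same order as sqrt(300) ~ 17.3
```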


Proving that this strategy is optimal is quite hard and technical. I will discuss it in my next post.









Saturday, December 6, 2014

A Sign of Life

So I have returned to writing here, out of boredom.

I have made a few small changes here because I want my posts that deal with mathematics to also be accessible to English-speaking readers.

A quick check revealed that the name 'ravens and writing desks' is not so original: there are at least four different blogs carrying this name or some variation of it. On the one hand it is encouraging that Lewis Carroll has so many fans; on the other hand it is a shame to lose the uniqueness. I considered changing the name, but then I realized I simply cannot think of another one that fits better.

This reminded me that about six years ago I adopted a dog named 'Sandy' from 'Let the Animals Live'. When the moment came to rename her, I really wanted to call her 'Coset', for a few reasons:
A. A coset is a mathematical object from group theory.
B. Coset (Cosette) is also a woman's name in French. I know this because it is the name of one of the heroines of Victor Hugo's famous novel 'Les Misérables'.
In the end I am glad that my friends stopped me (that is, laughed at me enough) from choosing that name, and I stayed with the standard one, 'Sandy'.

So here I am, writing here again as a kind of therapy, in a blog whose total number of readers probably does not exceed the number of fingers on Bart Simpson's hand. Wish me luck.

Monday, December 1, 2014

Picking marbles from a jar - rigorous proof

Hurray! I have returned :)


Let's start with an easy one:
The following is a well-known puzzle; I have encountered it several times in my life.

(cited from:
http://www.techinterview.org/post/526363745/red-marbles-blue-marbles/)

"You have two jars, 50 red marbles, 50 blue marbles. you need to place all the marbles into the jars such that when you blindly pick one marble out of one jar, you maximize the chances that it will be red."


Solution:
The solution to this puzzle is to put one red marble in one of the jars (say, w.l.o.g., jar A) and all the other marbles (49 red and 50 blue) in the other jar (jar B). If we do so, the probability of picking a red marble is:
 
\(P(picking\, a\, red\, marble)=\)
\(P(picking\, a\, red\, marble|choosing\, jar\, A)\cdot P(choosing\, jar\, A)+\)
\(P(picking\, a\, red\, marble|choosing\, jar\, B)\cdot P(choosing\, jar\, B)=\)
\(1\cdot\frac{1}{2}+\frac{49}{99}\cdot\frac{1}{2}\approx0.747\)
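As a quick sanity check (my own brute force, not part of the original solution), one can enumerate every way of splitting the 50 red and 50 blue marbles between the two jars and confirm that this arrangement is indeed the best:

```python
from fractions import Fraction

best = None
# r, b = number of red and blue marbles placed in jar A; the rest go to jar B.
for r in range(51):
    for b in range(51):
        in_a, in_b = r + b, (50 - r) + (50 - b)
        if in_a == 0 or in_b == 0:   # both jars must contain at least one marble
            continue
        p = Fraction(1, 2) * Fraction(r, in_a) + Fraction(1, 2) * Fraction(50 - r, in_b)
        if best is None or p > best[0]:
            best = (p, r, b)

p, r, b = best
print(r, b, float(p))   # 1 red and 0 blue in jar A -> probability 74/99 ~ 0.747
```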

This solution is also well known and easy to come up with; however, most of the places and websites that present this puzzle fail to give a convincing proof of it. Indeed, finding a rigorous explanation might seem a little tedious, but it is essential: in my experience, facts that seem 'obviously true' can easily fall apart under close inspection.


Proof of Solution
The method of the proof is the following:
We start with an arbitrary ordering of the marbles in the jars (we call it 'ordering 1') and show that the probability of drawing a red marble in this ordering is always less than or equal to the probability of drawing a red marble with the ordering of the solution (we call this ordering 'ordering 2').

We use the following notation:
(All of these are with respect to ordering 1.)
The number of blue marbles in jar A - \(\textrm{N}_{blue}^{A}\)

The number of red marbles in jar A - \(\textrm{N}_{red}^{A}\)

The number of blue marbles in jar B - \(\textrm{N}_{blue}^{B}\)

The number of red marbles in jar B - \(\textrm{N}_{red}^{B}\)

W.l.o.g. we assume that \(\textrm{N}_{red}^{A}\leq \textrm{N}_{red}^{B}\).
We also need to assume that \(0<\textrm{N}_{red}^{A}\), because if one of the jars contains no red marbles then the probability of getting a red marble is \(\frac{1}{2}\cdot\frac{50}{50+\textrm{N}_{blue}^{B}}\leq\frac{1}{2}\), which is smaller than the probability achieved by our solution.

Now let us move all the blue marbles from jar A to jar B. We call this new ordering - 'ordering 3'.

Lemma 1
First I will show that :
\(\frac{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{A}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}+\textrm{N}_{blue}^{A}}\geq\frac{\textrm{N}_{red}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}}\)   (1)

Proof of Lemma 1
This inequality is valid if and only if the complements to 1 of the l.h.s. and the r.h.s. satisfy the opposite inequality:

\(\frac{\textrm{N}_{blue}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}+\textrm{N}_{blue}^{A}}\leq\frac{\textrm{N}_{blue}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}}\).

Note that the numerators are equal and the denominator of the l.h.s. is greater than the denominator of the r.h.s., hence this inequality holds.



We will show that:

\(P(picking\, a\, red\, marble\, with\, ordering\, 1)\leq P(picking\, a\, red\, marble\, with\, ordering\, 3)\)
Proof:


\(P(picking\, a\, red\, marble\, with\, ordering\,1)=P_{1}=\)
\(\frac{1}{2}\cdot\frac{\textrm{N}_{red}^{A}}{\textrm{N}_{red}^{A}+\textrm{N}_{blue}^{A}}+\frac{1}{2}\cdot\frac{\textrm{N}_{red}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}}\)

\(P(picking\, a\, red\, marble\, with\, ordering\,3)=P_{3}=\) 
\(\frac{1}{2}\cdot1+\frac{1}{2}\cdot\frac{\textrm{N}_{red}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}+\textrm{N}_{blue}^{A}}\) 

We take the difference between \(P_{3}\) and \(P_{1}\) and get :
\(P_{3}-P_{1}=\)
\(\frac{1}{2}\cdot\left(1-\frac{\textrm{N}_{red}^{A}}{\textrm{N}_{red}^{A}+\textrm{N}_{blue}^{A}}\right)+\frac{1}{2}\cdot\left(\frac{\textrm{N}_{red}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}+\textrm{N}_{blue}^{A}}-\frac{\textrm{N}_{red}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}}\right)\)
\(=\frac{1}{2}\cdot\left(\frac{\textrm{N}_{red}^{A}+\textrm{N}_{blue}^{A}}{\textrm{N}_{red}^{A}+\textrm{N}_{blue}^{A}}-\frac{\textrm{N}_{red}^{A}}{\textrm{N}_{red}^{A}+\textrm{N}_{blue}^{A}}\right)+\frac{1}{2}\cdot\left(\frac{\textrm{N}_{red}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}+\textrm{N}_{blue}^{A}}-\frac{\textrm{N}_{red}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}}\right)\)
\(=\frac{1}{2}\cdot\left(\frac{\textrm{N}_{blue}^{A}}{\textrm{N}_{red}^{A}+\textrm{N}_{blue}^{A}}\right)+\frac{1}{2}\cdot\left(\frac{\textrm{N}_{red}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}+\textrm{N}_{blue}^{A}}-\frac{\textrm{N}_{red}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}}\right)\)
\(\geq\frac{1}{2}\cdot\left(\frac{\textrm{N}_{blue}^{A}}{\textrm{N}_{red}^{A}+\textrm{N}_{blue}^{A}}\right)+\frac{1}{2}\cdot\left(\frac{\textrm{N}_{red}^{B}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}+\textrm{N}_{blue}^{A}}-\frac{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{A}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}+\textrm{N}_{blue}^{A}}\right)\)
Here the last inequality holds because of Lemma 1 (we replaced the subtracted term with a larger quantity). Carrying on:
\(=\frac{1}{2}\cdot\left(\frac{\textrm{N}_{blue}^{A}}{\textrm{N}_{red}^{A}+\textrm{N}_{blue}^{A}}\right)-\frac{1}{2}\cdot\left(\frac{\textrm{N}_{blue}^{A}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}+\textrm{N}_{blue}^{A}}\right)\).
Now all we need to show is that \(\left(\frac{\textrm{N}_{blue}^{A}}{\textrm{N}_{red}^{A}+\textrm{N}_{blue}^{A}}\right)\geq\left(\frac{\textrm{N}_{blue}^{A}}{\textrm{N}_{red}^{B}+\textrm{N}_{blue}^{B}+\textrm{N}_{blue}^{A}}\right)\).
This inequality holds since \(\textrm{N}_{blue}^{B}+\textrm{N}_{blue}^{A}\geq\textrm{N}_{blue}^{A}\) and, by our assumption, \(\textrm{N}_{red}^{A}\leq\textrm{N}_{red}^{B}\); therefore the denominator of the l.h.s. is at most the denominator of the r.h.s. (while the numerators are equal), so \(P_{3}-P_{1}\geq0\).
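As an independent sanity check of this inequality (my own brute force, under the same assumptions \(0<\textrm{N}_{red}^{A}\leq \textrm{N}_{red}^{B}\)), one can verify numerically that moving all the blue marbles from jar A to jar B never decreases the probability:

```python
from fractions import Fraction

def p_red(red_a, blue_a, red_b, blue_b):
    # Pick one of the two jars fairly, then a marble uniformly from it.
    return (Fraction(1, 2) * Fraction(red_a, red_a + blue_a)
            + Fraction(1, 2) * Fraction(red_b, red_b + blue_b))

# Every ordering 1 with 0 < N_red^A <= N_red^B (the assumptions made above).
for red_a in range(1, 26):
    red_b = 50 - red_a
    for blue_a in range(51):
        blue_b = 50 - blue_a
        p1 = p_red(red_a, blue_a, red_b, blue_b)
        p3 = p_red(red_a, 0, red_b, blue_b + blue_a)   # ordering 3: all blues in jar B
        assert p3 >= p1
print("ordering 3 is never worse than ordering 1")
```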


We have proven the 'hard part'. We are left only with the relatively easy part: proving that moving from ordering 3 to ordering 2 can only increase the odds of picking a red marble. I leave this proof to the reader.