## Morris: World of Britain 2: Proof and Paradox

In working out the proof for World of Britain I came across a paradox.  Maybe smarter Math Factorites can help me out?  My sanity could depend on it.

In the puzzle you have five different tasks.  On each day one of these tasks is given at random.  How long do you expect it to take to get all five tasks?

First consider a simple case.  Suppose some event has a probability, p, of happening on any one day.  Let’s say that E(p) is the expected number of days we have to wait for the event to happen.  For example if p=1 then the event is guaranteed to happen every day and so E(p)=1.

How can we calculate E(p)?

Andy does an experiment.  He will wait for the event to happen and record how many days it took.  He will do this several times, for long enough to ensure that he gets an answer that is as accurate as he needs.  He will keep going for N days in total.  Afterwards he will take the average of all of his wait times to get an estimate for E(p).

The average he calculates is the total of all the wait times divided by the number of occurrences of the event.  But we can estimate both of these values and therefore estimate his value for E(p).  The total of all of the wait times is going to be about N.  Since the event has a probability of p of occurring on any particular day the number of occurrences will be about p times N.

So Andy’s average will be about N/(pN) which will be about 1/p.

We can make N as large as we like to make this result as accurate as we like.  So we can confidently say that E(p) = 1/p.

Let’s get back to our puzzle about tasks.  We need to wait for the first task, then the second, then the third and so on.  When there are t tasks left then the chance of getting a new one is t/5.

So the total waiting time is

E(5/5) + E(4/5) + E(3/5) + E(2/5) + E(1/5) = 5/5 + 5/4 + 5/3 + 5/2 + 5/1

= 5(1/5 + 1/4 + 1/3 + 1/2 + 1) = 11 5/12 = 11.4166666… days

You may recognise the harmonic series!

So that’s the proof, now where’s the paradox?

Bob visits Andy when he is able and waits with him until the event occurs.  Of course Andy may well have been waiting for some time.

If Bob turns up just after the event has happened then he will wait for the same time as Andy.  If he turns up just before the event he will wait for one day.  On average he will wait for about half the time that Andy does.

When the event occurs Bob disappears and comes back when he is next able to.

At the end of the experiment Bob averages all of his wait times to get an estimate for E(p).  He gets an answer which is half of Andy’s!

Now Carol thinks she understands what is going on here.  The problem is that Andy is distorting his results by always starting the clock straight after an event has occurred.  That guarantees him to get the longest possible wait times.

She thinks the only way to get an accurate answer is to look at the wait time starting from each of the N days and then average these N wait times.

To calculate this she needs to make an assumption.  She knows that the event occurs every 1/p days on average.  She assumes they happen regularly every 1/p days.

Let’s say they happen every n days where n = 1/p.  Remember Andy’s value for E(p) is 1/p = n.  Bob’s value is half that, so about n/2.

The wait time will vary between 1 and n.  The average wait time will be n(n+1)/2n = (n+1)/2.

So Carols estimate for E(p) is about n/2 whereas Andy’s was 1/p = n.  So Carol is agreeing with Bob.

I can tell you that Andy had the right value and that the logic I used for Bob and Carol was flawed.

Can you see where?

1. ### Sean said,

August 17, 2009 at 5:55 pm

You assume that Andy and Bob are counting the same events.  Bob might miss some of Andy’s events.  For example, let’s say Bob shows up for the first time on day 3.  Andy witnessed event A on day 1 and event B on day 2 when Bob wasn’t there.   Event A occurs again on day 3 in the presence of both Andy and Bob.  This is the first occurrence of event A for Bob, but the second for Andy.  Therefore it is a non-event for Andy.  Andy ignores it and keeps waiting.  Bob on the other hand counts this new event and then leaves.  When Andy is done counting all five unique events, Bob may still have 2 or 3 or even 5 to go.
If you stop Bob’s clock when he is not present you can see that from his perspective, the calculation of his average wait time is the same as Andy’s.
There is also a second mistake in your reasoning.  For the moment, forget that Bob and Andy may be counting different events, let’s just count Andy’s events.  It is not necessarily true that over the long-run, Bob will on average appear mid-way between Andy’s observation of events.   It depends on how you define the process that generates Bob’s random appearances.  It is not enough to say that Bob appears at random times – you have to be more specific.  For example, if Bob decides to reappear every time an atom of Uranium-238 decays somewhere in the universe, he will always reappear almost immediately after he leaves.  So his average wait time will be very close to Andy’s.   (He will also be very unlikely to miss any of Andy’s events).
If on the other hand, the random process causes Bob to reappear a little less frequently, like every time that a boy is born and named Robert in California (about once every 1.5 days), he will be more likely to appear somewhere near the mid-point between consecutive observances as you assumed, though he might also miss some of Andy’s events.
If the random process is very long like say watching a particular atom of Uranium-238, then he will probably miss all of Andy’s events.

2. ### Sean said,

August 17, 2009 at 6:10 pm

I used a bad example.  The name Robert occurs every 1.5 days so Bob would still be showing up fairly soon after he left as compared with Andy’s average wait time of over 11 days.
Let’s change the name in my second example to Ernest.  There’s one born about every 11 days in California which is closer to Bob’s average wait time.

3. ### Stephen Morris said,

August 18, 2009 at 3:29 pm

I like your thinking Sean.  Have we come up with a new way of detecting nuclear tests I wonder?  I never knew the true importance of being Ernest!

You’re right, I should clarify exactly what Andy and Bob are up to.

In that example there is only one event, not the five in the main problem.  The event can only happen once each day, lets say it always happens at noon.  Bob always turns up at 10am.  Andy waits for the full duration of the test regardless of how many times the event happens.

How does Bob decide when to turn up?  Okay, let’s go with your idea that he turns up when an Ernest is born in California.  Are people born Ernest or are they named later?  I was going to go with christenings but that excludes people who aren’t christened.  Let’s say Bob checks the births section in the Californian press each morning and visits Andy when he reads the anouncement of a brand new Ernest.

The answer to the puzzle is the paradox is that Bob is more likely to turn up during one of Andy’s long waits than during one of his shorter ones.  On average Bob will wait half as long as Andy for any event that Bob sees, but he will miss a lot of events that Andy sees which follow on quickly from the previous event.  I have done the maths and this does work out.  Unlikely as it seems Andy and Bob will have the same average wait time.

4. ### Sean said,

August 19, 2009 at 12:25 am

Thanks.  The clarification is helpful, and it seems like you have worked out the right explanation for the apparent paradox.  But there are still a couple of points you made that probably should be discussed further.

“The event can only happen once each day, let’s say it always happens at noon.  Bob always turns up at 10 a.m.”

This throws me off a little.  So Bob’s appearances and the events are each randomly occurring on 24-hour cycles, but slightly shifted.  This is the same as if Bob’s random appearances always occur at noon (i.e., in sync with the random events) but then we always add two hours to his wait time.  So that causes a problem right there.  Bob’s average wait time is going to be 2 hours longer than Andy’s.  What if Bob has a brother who always shows up at 9:30?  Then he will observe exactly the same events that Bob observes since those can occur only at 12:00 but his wait time will be 1/2 hour longer than Bob’s every time.  I think you have to make the assumption that the time of day is the same for both Bob’s appearances and the events, or alternatively, that the probabilities of Bob appearing or an event occurring are equal at all times throughout the day.

“…Bob is more likely to turn up during one of Andy’s long waits than during one of his shorter ones.”

This seems obvious, but might be a little trickier than it seems.  Yes, a longer wait for Andy means a larger window of opportunity for Bob, but if there are more short intervals than long intervals Bob might be more likely to show up during a short period.  For example, let’s say that for the first 100 days there are 20 intervals of 5 days each.  And say the next 100 days is just one long interval.  Bob may be just as likely to show up during several short intervals as he is during the one long interval.  But, you say, that is just because I constructed this artificially skewed distribution of events.  The thing is, waiting times for independent events are usually said to follow what is known as a Poisson distribution which is inherently skewed.  This means that if you pick an arbitrary regular period of time, like months or years, and count the random events in each period, you will observe that more periods will contain an above-average number of events, and fewer have a below-average number of events.  In other words, there will be more short intervals than long intervals.  So technically I think your statement may not be true but your reasoning was basically sound.  It’s not that Bob hits more long intervals than short intervals.  It’s that Bob misses more short intervals than he does long intervals.  Hope the distinction I’m trying to make is not too confusing.

“On average Bob will wait half as long as Andy for any event that Bob sees…”

Maybe.  This is true over the long run if the rate for Bob’s random generator is as long or longer than the rate for the event random generator, which I think is what you intended.  If Bob showed up more frequently he would usually show up somewhere in the first half of Andy’s wait, like in my first example from yesterday.  You were clear about using the Ernest random number generator for Bob, and I think you meant for that to correspond with the average rate of occurrence for the events also.

5. ### Stephen Morris said,

August 19, 2009 at 12:09 pm

Thanks Sean,

Gosh, I can see I’ll have to be much more careful in my future puzzles.

Andy and Bob are counting whole days.  Really they are counting the number of ‘slots’ in which the event could occur.  So if Bob turns up at 10am and the event occurs at noon the same day then he would count 1 day.  If it helps we can have Bob turn up just after noon so his waiting time will actually be whole days.

This puzzle is based on ‘World of Warcraft’.  In that game there are ‘quest givers’ who have five different quests to give out.  The quest is chosen at the start of each day, i.e. at midnight, and is available throughout that day.  A player can take the quest at any time during the day but can only take one quest each day.  I found myself wondering how long it would take me to get all the quests, and that turned out to be more interesting than actually playing the game!

A different puzzle would be to have the event occur at any time and measure the actual time Andy and Bob wait.  In this case we could have multiple events in a single day.  As you say this is modelled by the poisson distribution.

You are right that a short wait will occur more often.  I used the phrase ‘one of’ to get round this.

As you point out the actual distribution of Andy’s wait times (the skew) makes a big difference.  Independant events will have a particular distribution.  Any other distribution indicates some connection between the events and in this case Bob will get a different answer to Andy.

I have calculated the distribution of independent events and put them in a spreadsheet which you can access here.

I assume we do the experiment over 10,000 days with the event occuring every five days on average.

In the third tab, ‘Calculating the Values’, r is the number of days that Andy waits.  Column D is the number of times that Andy will wait for this many days.

In the first tab, ‘Large Example’, I use these values to calculate Andy and Bobs’ expected wait times.  For Bob I calculate the probability of him appearing during Andy’s wait and what his wait is likely to be.  Note that I am adding up ‘expected wait times’ for Bob, not actual wait times.

At the top I have Andy and Bobs’ wait times which are 4.975 and 4.913 respectively.  They are slightly out partly becuase of rounding and partly because I only go up to an Andy wait time of 30 days.  To be truly accurate I should go up to infinity.

On the third point:

Yes, I am assuming that Bob turns up much less often than the event.  It is important each time Bob turns up is clearly not connected to the last time he turned up.