## Morris: The Kate Bush Conjecture

Thanks to my second favourite math radio programme, More or Less, for this wonderful new insight into music and maths.

In her song ‘Pi’, Kate Bush sings the first hundred and fifty or so digits of the celebrity number.

This is what she sang:

3.14159265358979323846264338327950288419716939937510582319749

44592307816406286208 8214808651328230664709384460955058223

You don’t need me to point out the wrong digits do you?  Good.  Then we can move on.
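For anyone who does want them pointed out: a short Python sketch can find the first slip by generating the true digits with Gibbons’ streaming spigot algorithm (a standard method; the sung string below is transcribed from the lyrics above with the decimal point and gaps dropped):

```python
def pi_digits(n):
    """Return the first n decimal digits of pi ("314159...") via Gibbons' spigot."""
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    out = []
    while len(out) < n:
        if 4 * q + r - t < m * t:
            # the next digit is now certain: emit it and rescale
            out.append(m)
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:
            # consume one more term of the series to narrow the interval
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                (q * (7 * k + 2) + r * x) // (t * x), x + 2)
    return "".join(map(str, out))

# The digits as sung, transcribed from above
sung = ("14159265358979323846264338327950288419716939937510582319749"
        "44592307816406286208"
        "8214808651328230664709384460955058223")
true = pi_digits(len(sung) + 1)[1:]  # drop the leading 3
first_slip = next(i for i, (a, b) in enumerate(zip(sung, true)) if a != b)
```

Running this puts the first disagreement at the 54th decimal place (index 53), after which the transcription wanders off on its own path.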

This has led to the Kate Bush Conjecture.  Since Pi contains an infinite sequence of digits which never repeats, surely the sequence Kate sings must occur somewhere!  After all, she never says she is starting at the beginning.

The Weak Kate Bush Conjecture says:

The sequence Kate sings exists somewhere in the decimal expansion of Pi.

The Strong Kate Bush Conjecture says:

Kate could have sung any finite sequence of digits and it would exist somewhere in the decimal expansion of Pi.

If Pi were a random sequence of digits then both conjectures would be true (with probability one).  But Pi isn’t random; it is a well-defined number, so we can’t make any assumptions.  Instinctively I think it must be true, but instinct isn’t good mathematics: we need a proof!

For example, the following number has an infinite, non-repeating decimal expansion, yet it satisfies neither conjecture:  0.01001100011100001111…
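To see this concretely, here is a Python sketch of that number’s digit stream (the function name is mine, purely for illustration). The digits never settle into a repeating cycle because the runs keep growing, yet even a tiny pattern like 101 can never occur, since every run of 1’s is followed by at least two 0’s:

```python
def runs_digits(blocks):
    """Digits of 0.010011000111...: n zeros then n ones, for n = 1, 2, 3, ..."""
    return "".join("0" * n + "1" * n for n in range(1, blocks + 1))

digs = runs_digits(50)
# non-repeating, yet "101" (a lone 0 between two 1's) never appears,
# and of course no digit other than 0 and 1 ever does
```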

If the strong conjecture is true then every finite sequence exists in Pi.  And each of them occurs an infinite number of times, since each can occur inside infinitely many longer sequences.  Think about that: an infinite number of sequences, each occurring an infinite number of times.

Everything that can be encoded digitally would occur within Pi.  That would include the complete works of Shakespeare, naturally, and also the note you left for the milkman last Tuesday, and those poems you wrote when you were five.

Every religious book, all canons and all translations, both forwards and backwards.  Every prayer, every satanic chant and every children’s song.

That picture of the cosmic microwave background, the observations of Tycho Brahe and all of Kepler’s notes; and the results from CERN that will confirm the Higgs boson (come on, guys!)

It would include everything on your iPod, every episode of Math Factor and an alternative Math Factor with Groucho Marx.

Every album by Bob Dylan, or Kate Bush.  And everything you’ve sung in the shower.

It would include this post and all the comments you will make, or think about making.

It would include every thought and idea you have ever had, or ever will have, or ever could have.  (Gödel may have something to say about that)

And you thought Kate Bush was just a singer.

And you thought Pi was just a number.

1. ### strauss said,

May 26, 2009 at 5:35 pm

Hi folks, very amusing posts lately!

I thought the digits were worse than they are, until I realized my browser is hiding lots of them behind the sidebar!

Gosh, now I’d really like to know something about that idea; there certainly are some deep subtleties — like, what is randomness anyway, and what does _appearing_ to be random mean?

It can’t just be non-repeating-ness: the Fibonacci sequence Edmund recently wrote about never quite repeats, but is highly structured.

---

One very interesting idea of ‘randomness’ is the concept of ‘program’ complexity (Kolmogorov-Chaitin complexity, or perhaps Leibniz complexity). I have a great 90-minute interview with Greg Chaitin that I haven’t used, simply because it doesn’t fit into the Math Factor format. But in it, we discuss, at length, this idea:

Given a string, we can ask: what is its absolute shortest _description_, in some formalized language? So for example, a million 1’s is a very long string, but look at that, I just gave a very pithy description of it! Not very random, that.

But a scrambley, white noisy sort of string of 0’s and 1’s is not going to have a precise description that is much shorter than the string itself.
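A crude way to play with this idea in code is to use an off-the-shelf compressor as a stand-in for ‘shortest description’; this only ever gives an upper bound on the true program-length complexity (which is uncomputable), but it separates the two cases nicely:

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed form: a rough proxy for description length."""
    return len(zlib.compress(data, 9))

ones = b"1" * 1_000_000                                         # very structured
noise = bytes(random.getrandbits(8) for _ in range(1_000_000))  # scrambley

# the structured string shrinks to a tiny fraction of its length,
# while the noisy one barely shrinks at all
```

The catch, which Chaitin’s perspective makes precise, is that compression is only an upper bound: pi’s digits would look incompressible to zlib even though a short program generates them.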

And indeed! This is the notion of randomness Chaitin et al adopt. (Chaitin himself gives priority to none other than the 17th-century philosopher and mathematician Gottfried Leibniz.)

BUT then, what do we make of a sequence of digits like the decimal expansion of pi? Oddities of Chaitin’s perspective are that one can’t really _measure_ randomness, that the decimal expansion of pi has a low complexity (a short program can generate the digits), and that we can’t really say much about this problem.

Quite stimulating.

---

One last point: Conway has remarked that one can’t _actually_ compute the digits of pi with much certainty!! When we calculate a constant like pi, we are actually making successive approximations, more and more finely, obtaining more and more digits. But what if, as could really occur in principle, we see a long sequence of …99999’s? At that moment, we just won’t know whether the balance will tip and the 9’s will roll over into 0’s!! We won’t know what the digits of pi really are until we get past that point.

There’s a remarkable run of six 9’s beginning at the 762nd decimal place, but of course we know what comes next: the series doesn’t roll over.

2. ### Jonathan Lundell said,

May 26, 2009 at 7:01 pm

The Chaitin interview might not fit the broadcast format, but why not put it up as a podcast?

“Given a string, we can ask: what is its absolute shortest _description_, in some formalized language? … But a scrambley, white noisy sort of string of 0’s and 1’s is not going to have a precise description that is much shorter than the string itself.”

There are, though, a whole lot of noisy-looking strings that do have compact descriptions, say the output of some high-quality pseudo-random number generator. What would it take, I wonder, to prove that some number does not have a compact representation, for some definition of “compact”?

3. ### Stephen Morris said,

May 27, 2009 at 6:36 am

On randomness:

I once heard that six shuffles gives a very random shuffle but five doesn’t.  I wondered how you would measure how random a deck of cards is.  I think the answer is that if the shuffling process gives an even probability distribution over all possible arrangements of the cards, then the shuffling process is very random.  Of course a random shuffle will have a 1/52! chance of arranging all the cards in order of rank and suit, and that arrangement would then be random.
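That notion of ‘even probability over all arrangements’ can be watched directly in a small simulation. The sketch below riffles a toy 4-card deck using the Gilbert-Shannon-Reeds model (a standard idealization of a human riffle; the function names are mine):

```python
import random
from collections import Counter

def riffle(deck, rng):
    """One Gilbert-Shannon-Reeds riffle: binomial cut, proportional interleave."""
    cut = sum(rng.random() < 0.5 for _ in range(len(deck)))
    left, right = list(deck[:cut]), list(deck[cut:])
    out = []
    while left or right:
        # drop a card from each half with probability proportional to its size
        if rng.random() < len(left) / (len(left) + len(right)):
            out.append(left.pop(0))
        else:
            out.append(right.pop(0))
    return out

def arrangement_counts(k, trials=20_000, seed=0):
    """How often each arrangement of a 4-card deck shows up after k riffles."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        deck = [0, 1, 2, 3]
        for _ in range(k):
            deck = riffle(deck, rng)
        counts[tuple(deck)] += 1
    return counts

after_1 = arrangement_counts(1)  # lumpy: many arrangements are impossible
after_5 = arrangement_counts(5)  # all 24 arrangements, roughly equally often
```

After one riffle the fully reversed deck (3, 2, 1, 0) can never appear, since each half keeps its internal order; after five riffles every arrangement shows up at close to the uniform 1/24 rate, which is the sense in which the shuffle has become ‘very random’.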

By extension you could say that Pi is ‘random’ if picking a string of digits of a certain length at an arbitrary starting position gave every possible sequence an equal chance of occurring.

On the radio they mentioned normal numbers and said Pi was normal in base 10.  I’ve just googled normal numbers.  This seems to be a similar idea, rather better thought through.  A normal number is one where, in the limit over longer and longer sections, every sequence of a given length occurs with the same frequency: a particular sequence of length k has limiting frequency 1/10^k in base 10, i.e. one over the number of possible sequences of that length.
It’s easy to show that there are lots of normal numbers (in fact almost all numbers are normal, in the measure sense), but difficult to show that any particular number is normal.
As far as I can tell no one has proved that Pi is normal, but it is believed to be.  The same is true for sqrt(2) and e.
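One number that has been proven normal in base 10 is Champernowne’s constant, 0.123456789101112…, made by concatenating the counting numbers. A quick sketch shows its digit frequencies pulling toward 1/10, though convergence is slow and at this depth they are still visibly uneven:

```python
from collections import Counter

def champernowne_digits(n):
    """First n digits of 0.123456789101112... (Champernowne's constant)."""
    parts, total, k = [], 0, 1
    while total < n:
        s = str(k)
        parts.append(s)
        total += len(s)
        k += 1
    return "".join(parts)[:n]

digs = champernowne_digits(100_000)
freq = {d: c / len(digs) for d, c in Counter(digs).items()}
# every digit's frequency is already in the rough vicinity of 0.1
```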

A weaker condition is being disjunctive, which requires every possible sequence to appear but not necessarily with equal frequency.  The Strong Kate Bush Conjecture says precisely that Pi is disjunctive.

4. ### jyoak said,

May 27, 2009 at 12:24 pm

I suspect that the claims about six shuffles really just amount to saying that patterns easily recognized by humans unassisted by computers are effectively eliminated.  I’ve heard the claim in the form that the deck is random after six “perfect shuffles,” meaning that the deck is broken exactly in half and then re-combined with perfect alternation.  Obviously this isn’t meaningful in any mathematical sense of randomness.  You could argue that the entropy in the shuffle is introduced through pressure being applied unevenly, so that one or two or three or however many cards are taken from each half during the recombining stage, and then study how many shuffles of that sort are required to reach a uniform distribution; but if someone has done either such an empirical study or the analysis, I haven’t heard of it.

This has an interesting ramification for tournament duplicate bridge.  In duplicate bridge, the major element of luck is removed like this:  You have a number of tables, let’s say ten, with 20 partnerships.  10 sit “east-west” and the others sit “north-south.”  When everyone sits down, about four decks of cards are shuffled and put into trays called “boards” that keep the hands distinct.  Those hands are played and scored, but at the culmination of every hand, the cards are returned to the board with the hands retained.  After all tables are done, the trays are passed to the next (let’s say) lower-numbered table and the east-west pair stands and goes to the next higher-numbered table.  (Sometimes frills are necessary to make this work out perfectly, obviously.)

The idea is that each east-west pair plays all or nearly all of the north-south pairs and each partnership plays all, or nearly all, of the boards.

Instead of tallying the raw scores obtained in various hands, you are scored in comparison to other players holding the same cards as you at different times.  You get one point for every player you out-did and a half point for every one that you tied.  Thus if each board was played 9 times, and you did the very best of any partnership when holding those cards, you’d get an 8.  Average is a 4, etc.  You then get a winner east-west and a winner north-south, and even an overall winner in the sense that one partnership beat their peers by the largest margin.

OK… all that for this:  This works very well with 40 or so folks.  But at large tournaments, they want to compare thousands of people.  Starting in the 80’s they did this by generating random hands on a computer.  Then instead of shuffling, racking boards and playing at the first table, the players would make the hands what a printout said they should be, rack the boards and immediately pass the boards to the next table.  At that point, they’d start playing.  This way, though you would only face 10 or so partnerships, you could be scored against an arbitrarily large group of similar pairs.

The problem emerged that the pseudo-random hands from the computer tend to have dramatically different characteristics from manually shuffled hands.  The most easily noticed is that suits are less evenly distributed within each hand: shapes like 4-3-3-3 and 4-4-3-2 turned up less frequently, while 6- and 7-card suits turned up more often.  Since the nature of your hand is communicated to your partner by bidding conventions which are very old, the bidding practices were suddenly slightly less optimized for communication.  Differentiating between 3- and 4- and 5-card suits is well handled in standard bidding systems; less so for 6, and almost never for 7.

More naive players believed the computer hands to be non-random or gimmicked to make “interesting” hands.  Less naive players came to appreciate how ineffective manual shuffling was at generating randomness, at least as practiced.

I was a competitive player back then and haven’t played at all in almost 20 years.  I wonder if bidding practices have changed slightly to adjust to more random distributions.

5. ### jyoak said,

May 27, 2009 at 12:27 pm

About the strings of nines: I recall a story that Richard Feynman memorized pi out to a sufficient length to reach a string of nines (maybe four?) so that he could recite it and end with “nine-nine-nine-nine and so on.”  :-)

6. ### jyoak said,

May 27, 2009 at 12:45 pm

I guess it is silly to be making these all separate comments as this isn’t threaded anyway, but old habits die hard… and it trades my looking long-winded for being prolific, which might be an improvement.  :-)

Stated informally, what we’re usually after with random numbers in computing is something that generates numbers in a range with uniform distribution, obtained from a source of entropy which is both arbitrary with respect to what we’re doing and impossible (we hope!) for someone else to guess (particularly when cryptography is the application, which it often is).

All sorts of sources have been used for this.  One of the common ones is mouse movements.  When you need a strong, large random number while generating a cryptographic key, you’ll often be invited to wiggle your mouse about.  Particulars of timing and direction are retained, and formulae are then applied to this data to smooth it into a properly uniform distribution.  Because these algorithms are so chaotic, and the exact details of your mouse movement would be so impossible to guess, the result is a good random number.  Another common source is measuring exact voltage levels on dirty power, which are impossible to guess.
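A toy version of that ‘smoothing’ step can be sketched in Python: messy measurements are folded through a cryptographic hash, whose output looks uniform regardless of how lumpy the input was. (The function name is mine, and real systems, such as OS entropy pools, do this far more carefully.)

```python
import hashlib
import os
import time

def entropy_seed(samples=32):
    """Fold timing jitter (plus some OS randomness) through SHA-256 to get a seed."""
    h = hashlib.sha256()
    for _ in range(samples):
        # timing jitter stands in for mouse wiggles or voltage readings
        h.update(time.perf_counter_ns().to_bytes(8, "little"))
        h.update(os.urandom(8))  # in practice you would just trust the OS pool
    return int.from_bytes(h.digest()[:8], "little")
```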

Perhaps the most interesting application of all of this that I know about is the online casinos.  Sites like PartyPoker and PokerStars “shuffle” the deck many millions of times per day.  This has to be extremely random, and at very high volume.  They maintain physical devices that measure levels of background radiation and use this as an entropy source to be flattened out and made uniform.  This is probably overkill, but because their customer base is very concerned about it, and their many-billion-dollar businesses depend on faith in that randomness, they do it.  It would be very hard, even if you knew how they make the distribution uniform, to guess the exact levels of radiation in their server locations on the Isle of Man.  :-)

7. ### Andy said,

May 28, 2009 at 2:12 pm

The more interesting point with PartyPoker et al is that they can’t really prove to you that it’s being done randomly. It would be easy for them to benefit some select player just a little bit, such that it would be almost impossible to notice (since it’s too hard to gather large amounts of data).

I wonder why this doesn’t bother more people?

8. ### Sue VanHattum said,

September 4, 2009 at 9:58 am

Thanks for an intriguing discussion of random number generators. I’ve used the random number generators in Excel and on my TI calculator. It always bothers me when I see the same sequence show up on all my students’ calculators.

For example, on my TI-83 Plus: clear all RAM, and rand gives .9435 then .9083. Repeat, and you get the same two numbers. Every computer (including calculators) has a clock chip in it, right? Why can’t they use the clock chip somehow to get a random starting point? I don’t know enough about either random number generation or programming at this level to understand why that would be difficult.

9. ### Patrick Stein said,

September 21, 2009 at 10:15 am

Sue, the random number generators used in most computer applications are Pseudo-Random Number Generators.  They start with some fixed “seed” and algorithmically generate each next random number from there.  Some systems start with a more variable seed, like the number of seconds since midnight (UTC) on January 1st, 1970 (the Unix epoch), or the number of milliseconds since the computer last rebooted, or information gathered from network events, keystrokes, and disk accesses.  Generally, though, they tend to start with a fixed seed, but give you the option of setting it initially yourself.
This is actually very convenient for debugging purposes.  One could argue that it would be better if the default were for it to get you started fairly randomly and let you reset it if you need some repeatable sequence.  But…
It looks like on the TI-83, you can assign any number you like to the “rand” variable.  (See page 93 of this manual: http://education.ti.com/downloads/guidebooks/graphing/83p/83m$book-eng.pdf.)  I would recommend having your students seed it with their birthday or something if you want them to have different sequences.
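The fixed-seed behaviour Patrick describes is easy to demonstrate in any language with a seedable generator; a minimal Python sketch (42 is an arbitrary seed):

```python
import random
import time

# same fixed seed: the same sequence every time, like a TI-83 after a RAM clear
a = random.Random(42)
b = random.Random(42)
run_a = [a.random() for _ in range(3)]
run_b = [b.random() for _ in range(3)]   # identical to run_a

# clock-seeded: a different starting point on every run
c = random.Random(time.time_ns())
```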

10. ### Sue VanHattum said,

October 10, 2009 at 11:14 pm

Thanks, Patrick.
>One could argue that it would be better if the default were for it to get you started fairly randomly and let you reset it if you need some repeatable sequence.
That’s the way I think it ought to be. That way, the folks who know least will get a random sequence when they ask for one.