Science Forums

# The Starburst Challenge

## Recommended Posts

In 1968, in the SF story “Gold at the Starbow's End" (which was expanded in 1983 to become the novel “Starburst”), the late Fred Pohl makes an interesting mathematical conjecture.

With its fictional context parenthesized, here is what I term “the Starburst Conjecture”:

1) A lengthy, meaningful text is written in everyday, understandable English. (the text, presumable several tens of pages long, explains how to solve many pressing engineering problems, including controlled, small-scale fusion power)

2) This text is encoded into a large integer using an easily decodable scheme (“godelization” – see below)

3) This integer is written as an expression using ordinary arithmetic operators. The expression is much shorter than the original text. ( The “Starburst number”, (3.875*12^26)! + 1973^854 +331^852 + 17^2008 + 3^9606 + 2^88 – 78, where “!” is the factoral unary operator, e.g: 5! = 1*2*3*4*5 = 120)

According to accepted information theory, this may be impossible. Roughly stated, information theory says that an arbitrary random integer cannot be “compressed” into an shorter integer enough that the length of the compressed integer + the length of the program required to decompress it back into the original integer are less than the original integer. This is commonly called “the Compression Challenge”, and has withstood decades of attempts to refute it. It’s important to note that the Compression Challenge doesn’t say that no integer can be compressed, just that one with a lot of randomness – and thus, a lot of information, can’t. For example, the integer 10^(10^100), a huge integer, can obviously be described by this very short expression, and, like the Starburst Number, decompressed using a program consisting of just the rules of arithmetic. The integer in the Starburst Conjecture is not an arbitrary random number – the author can carefully compose and tweak the text and encoding scheme that generates it - but neither is it as “easy” number like 10^(10^100).

Here’s my challenge to number-crunching-inclined Hypographers: prove the Starburst Conjecture by providing an example of any meaningful English text, encoding scheme, and expression satisfying the rules. Strange spacing, punctuation, capitalization, and spelling is OK. All that’s required to win is that the final expression be many times shorter than the original text. Of course, a formal proof or disproof wins, too.

PS:

1) Nobody claims that Pohl’s “Starburst number” can really be expanded into a document containing the secret of controlled fusion, or any readable text – it’s fictional, intended for illustration purposes only. This is not a challenge to discover any hidden information in the number (3.875*12^26)! + 1973^854 +331^852 + 17^2008 + 3^9606 + 2^88 - 78.

2) “Godelization” is a symbol-to-integer encoding scheme. A strength is that it can be used with any number of symbols, which you don’t have to know in advance. A weakness is that it quickly generates very large integers. I neither recommend not don’t recommend using it for this challenge, but include it in this PS to provide background on Pohl’s original example.

It works as follows:

1. Beginning with 1, assign a number to each symbol used. For example, 1=SPACE, 2=A … 27=Z, 28=PERIOD is a valid assignment. Any order can be used, and new symbols may be added as needed.

2. Initialize G = 1

3. For the first symbol, multiply G by the first prime raised to the power of the symbol (G=G*2^symbol).

4. Repeat for the 2nd symbol and prime, and so on, until all symbols in the message have been encoded.

Example: “HELLO WORLD” encodes to 2^9*3^6*5^13*7^13*11^16*13*17^24*19^16*23^19*29^13*31^5 = 16201674618135353117026999186798460736594220862673259904985261019983866762874187928744427378031051569864204341340130271834489331890198819946635625000000000

##### Share on other sites

___The challenge is dealt & I meet the small blind. Now I need to firm up a few points before the flop. (Forgive the obscure poke references; it's late).

___First if we may, I call for a set message length in characters, rather than the general "tens of pages". A double space page is rougly 250 words & say an average word is 5 letters = 1250 characters. Since spaces count, add 250 for 1500 characters per page. Now let's use 20 pages for a message length of 30,000 characters for encryption.

___Sooooo tired...Stopping here for the flop.

##### Share on other sites
Here’s my challenge to number-crunching-inclined Hypographers: prove the Starburst Conjecture by providing an example of any meaningful English text, encoding scheme, and expression satisfying the rules. Strange spacing, punctuation, capitalization, and spelling is OK. All that’s required to win is that the final expression be many times shorter than the original text. Of course, a formal proof or disproof wins, too.
First if we may, I call for a set message length in characters, rather than the general "tens of pages".
Please feel free to use a starting text of any length, the only restriction being that it is something that a reasonable English-speaking reader would recognize as meaningful. Let’s take that to imply a minimum length of about 250 characters.

Let “the final expression be many times shorter than the original text” be taken to mean “the final expression be less than 1/5 the length of the original text”, but let’s be flexible with this requirement. I’d be excited to see any result where the final expression is even about the same size as the starting text.

An important freedom is that the starting text may be altered – spaces or nonsense characters inserted, words strangely capitalized, slightly misspelled, etc. – to yield an encoded integer better suited for reduction to a shorter expression.

I suspect this is a very hard challenge, so let’s impose a minimum of rules.

##### Share on other sites

___ :eek: Just rying to get my shell around it. Do I properly understand that besides the 1/5 which is the compressed message, you then add the characters in the encryption program too?

___Another question I have; how important is the difference here between the information in the message & the message itself?

___I seem to recall an anecdote of similar circumstance to this; a professor assigned a hard problem in compression to a bunch of young students who didn't know it was a hard problem. As it turned out, one of these "naive" young guys produced a solution by using a binary probability tree, thus solving a long standing problem.

___I'm all in! :eek:

##### Share on other sites

Don’t worry about the size of the encoder/decoder program – just assume it’s trivially small compared to the original text. Just don’t cheat by having a “decoder” that can produce the starting message with little or no input.

The difference between the common-sense information in the starting message and its formal information (entropy) is unimportant. The message can contain lots of extra “noise” information, as long as a reasonable reader can distinguish the noise from the “signal”. For example: “Mary had a little lamb”; “mARy haD a LiTtLe lamB”; and “mary … had …… a litle … lamb” are all legitimate.

##### Share on other sites

___Acknowledged. To the bat cave! Er... to the turtle cave. :eek:

##### Share on other sites

___In rereading the original description, "It’s important to note that the Compression Challenge doesn’t say that no integer can be compressed, just that one with a lot of randomness – and thus, a lot of information, can’t"; I have to say I agree.

___For example, I believe it's impossible to compress anomalous Strange Numbers & their ilk. :eek:

___In any case the written numerals , which we casually refer to as numbers, themselves constitute a compression algorithm for representing amount. Staying with position based systems using zero, the higher the base you use, the fewer positions required to encode the amount. Conversley, the lower the base, the more positions required.

___Just thinking out loud :eek: :eek:

##### Share on other sites
• 6 months later...
In 1968, in the SF story “Gold at the Starbow's End" (which was expanded in 1983 to become the novel “Starburst”), the late Fred Pohl makes an interesting mathematical conjecture.

Here’s my challenge to number-crunching-inclined Hypographers: prove the Starburst Conjecture by providing an example of any meaningful English text, encoding scheme, and expression satisfying the rules. Strange spacing, punctuation, capitalization, and spelling is OK. All that’s required to win is that the final expression be many times shorter than the original text. Of course, a formal proof or disproof wins, too.

___I saw this thread mentioned elsewhere & thought to revisit it. I have in mind something Katabatak. Cogitating. :hihi:

##### Share on other sites
• 11 months later...

Something new in this perhaps, by way of a fractal compression scheme developed -as best I can tell - by Prof. Ian Stewert. I heard it described in a recent PBS show entitled Colours of Infinity, hosted by Arthur C. Clarke. In short, a given image (your information) is subjected to an analysis that gives a fractal equation specific to that image. Then, the equation is given back the initial conditions and it generates missing detail. For a spacecraft for example, this means less data is required without loss of detail and so image compression. Or so they said...:hihi: :cup:

Here is a link to Ian's homepage, and a Google of 'Prof. Ian Stewart' turns up many resources to his work.

##### Share on other sites
• 3 years later...

I'm interested in giving this one a go, but I'd rather not spend a lot of time reinventing the wheel. The OP is five years old - has anyone worked their way down blind alleys they can help me avoid?

information theory says that an arbitrary random integer cannot be “compressed” into an shorter integer enough that the length of the compressed integer + the length of the program required to decompress it back into the original integer are less than the original integer.

The important word here being "arbitrary". A million-digit number can be represented as p1^p2^p3, where p1, p2, p3 are each only a hundred or so digits long. This could be done with many different million-digit numbers, but the vast majority wouldn't work.

Pohl's Starburst number would have been generated by some sort of artificial intelligence, which would be able to see immediately that changing "and" into "&", or similar rewording, would make the problem tractable and the result easy.

I can't do it that way, so what I'm proposing to do is this:

Convert the original text into a numeric, probably base-60.

Calculate and test various "candidate" values, to see which one comes closest.

Subtract the result from the starting number (or subtract the starting number from the result, if closer).

Record the expression so far and loop back to calculate the next one.

I've come up with three candidate sets: polygonal, factorial, exponential, represented as nΔ, n!, n#. We're familiar with n!, and it's used in the OP. It's a good idea: a single number and an operator - just two characters giving a wide range of values. The two I'm planning to add go like this:

nΔ would be the set (3-triangular, 4-square, 5-pentagonal, 6-hexagonal...) (6, 16, 35, 66...). A set that climbs slower than factorial, for fine control.

n# is n^n. This one climbs very fast, and would be the "coarse" values.

If I'm working in base-60, that gives me 180 candidate values without going into double digits. Would double digits be better? Or would I be better off taking each candidate number and multiplying it - 2*n!, 3*n!, 4*n! That gives 10,800 possibilities. Or cross-multiply for 180*180=32,400? Or both, giving 1,944,000? That's quite a lot of possibilities off six characters of calculation. But it needs to be. If a 2-character expression doesn't chip 2 characters off the total, it isn't worth doing. A 4-character expression needs to be accurate to 4 characters. And so on.

It's only a thought experiment at the moment - something to help me wake up as I stroll to work in the mornings. I suspect that Craig is right, that most chunks of text can't be compressed into a simple equation, but it'd be worth finding out for sure.

If anyone has had a go and retired hurt, I'd be interested to find out where the stumbling blocks are. Or if anyone wants to lend a hand (ideas for more candidate sets, for example) I'd be fascinated!

##### Share on other sites
...

If I'm working in base-60, that gives me 180 candidate values without going into double digits.

mmm...that doesn't sound right to me. i think in base 60 you will have 60 single characters, including a zero. :(

...

...If anyone has had a go and retired hurt, I'd be interested to find out where the stumbling blocks are. Or if anyone wants to lend a hand (ideas for more candidate sets, for example) I'd be fascinated!

this is rather a donkey-before-the-cart thingy for me & so a bit out of my lane. :hyper: :hihi: i do however think that last bit i found on using fractals looked promising. :( :smile:

##### Share on other sites
mmm...that doesn't sound right to me. i think in base 60 you will have 60 single characters, including a zero. :(

picky-picky! Ok.... 59 single characters in 3 forms: polygonal, factorial, exponential, total 177, not 180. And 1Δ = 1! = 1# = 1, so 174 candidate values. Better? :(

##### Share on other sites
picky-picky! Ok.... 59 single characters in 3 forms: polygonal, factorial, exponential, total 177, not 180. And 1Δ = 1! = 1# = 1, so 174 candidate values. Better? ;)

much better! i'm deliciously satisfied. :hyper: not so much that you are drawing me in though. :hihi: :smile: ok, so part of the decryption/decompression would be saying which set/expression is used? or do you plan to use all three together? we could add a single digit at the front, or wherever, (base 60 or whatever) that indicates how many padding digits we added to the message, & that way we could narrow "candidate values" to fit our expressions rather than simply trying to find a fit. :(

over... & up...:ud: :(

##### Share on other sites
I'm interested in giving this one a go ...

I’m delighted!

This is one of the first threads I smatarted at hypography, and one of my favorite mathematical puzzles. My hope was, and remains, that from somewhere on the vast internet, someone would come forth with an ingenious or lucky flash of brilliant intuition to give even just one successful response to the challenge, or even a plodding, methodical, computationally monstrous approach to it. So far, nobody has – nor has anyone even sketched a proof that a successful response is impossible.

Welcome to the hunt!

... but I'd rather not spend a lot of time reinventing the wheel. The OP is five years old - has anyone worked their way down blind alleys they can help me avoid?

The problem has been, for me and everyone I’ve know who approached it, intractable – every alley explored has proven a dead end, but perhaps, to continue the metaphor, because we’ve all overlooked hidden doorways.

I can, however, outline one of the key difficult of the problem:

As I outlined previously, there are 2 obvious ways to convert a string of characters from a limited character set:

positional (in this case, base 27) – ie: “HELLO WORLD” $\to 8 +(5 +(12 +(12 +(15 +(0 +(23 +(15 +(18 +(12 +4 \cdot 27) \cdot 27) \cdot 27) \cdot 27) \cdot 27) \cdot 27) \cdot 27) \cdot 27) \cdot 27) \cdot 27 = 920321254041092 = 2^2 \cdot 23 \cdot 37 \cdot 241 \cdot 1451\ cdot 773153$

or

Godelized (which works with even an unlimited character sets) – ie: “HELLO WORLD” $\to 2^9*3^6*5^13*7^13*11^16*13*17^24*19^16*23^19*29^13 *31^5 =$ 16201674618135353117026999186798460736594220862673259904985261019983866762874187928744427378031051569864204341340130271834489331890198819946635625000000000

As Donk notes, obvious short expressions that evaluate to large numbers include factorials (n!), and exponents (eg: $n^n$). However, factoral have prime factorizations of mostly small primes, while exponents have prime factorizations with n occurrences of each of the prime factors of n. Neither is well suited for either kind numeric representation of the example “HELLO WORLD”, or other strings I’ve tried.

To “hit” one of these numbers, some “distance” must be added to the short expressions. In my explorations, however, and as information theory predicts, I was unable to find a distance that could itself be written as a short expression. In nearly all the cases I’ve tried, the “distance” term ends up being as long or longer than the starting string.

I’ve not yet tried polygonal numbers (eg: $n \Delta$). Since polygonal number consist of the sum of 2 terms ($\left(\frac{s}2-1 \right)n^2 - \left(\frac{s}2-2 \right)n$), they’re promising – but until one of us has won this challenge – which is only a first step toward a more general, significant challenge – if it’s a good path or a blind alley remains to be seen.

The important word here being "arbitrary"

As I think Donk has surmised, that the source string is not arbitrary, but can be carefully selected from a large collection, is, I think, key to the challenge.

Thus, to Donk’s general program

Convert the original text into a numeric, probably base-60.

Calculate and test various "candidate" values, to see which one comes closest.

Subtract the result from the starting number (or subtract the starting number from the result, if closer).

Record the expression so far and loop back to calculate the next one.

I think an important step is Change and reconvert the original text.

“Meaningful English text” (from the challenge rules in post #1) is a large and seemingly almost unlimitedly extensible domain, because one can insert a lot of nonsense/noise into an English text without losing its meaning (eg: “1 cnn i*&^ser\$%t alota nonscents/noyes in2 a Englsh text wo losing ixs mng”).

So, to all, good luck, and get crunching and intuiting :) :thumbs_up

##### Share on other sites

I've yet to wrap my head around exactly what you're asking for, but it seems like you're trying to compress arbitrary sequences of words? The problem is that an effective encoding hasn't been found?

Seems very much like the Hutter Prize or vice versa. I suppose you mean this compression challenge or this one The Million Random Number Challenge.

Enjoy,

Ian

##### Share on other sites
I've yet to wrap my head around exactly what you're asking for, but it seems like you're trying to compress arbitrary sequences of words? The problem is that an effective encoding hasn't been found?

Seems very much like the Hutter Prize or vice versa. I suppose you mean this compression challenge or this one The Million Random Number Challenge.

Here's the theory I'm working on:

• Any piece of text can be converted into a number.
• Many numbers can be expressed as an equation (e.g. Craig's "Starburst number", (3.875*12^26)! + 1973^854 +331^852 + 17^2008 + 3^9606 + 2^88 – 78)
• Most numbers can't be converted into such an expression.
• BUT, you can insert a "null" character anywhere in the original string. It would then be a different number, which might convert better. If not, move the null to a different point and try again. Once all insertion points have been tried, do it again with two nulls in all possible combinations. And so on, to infinity and beyond...

It's theoretically do-able, though it probably depends on luck. If I'm lucky, the original expression might convert well. If not, and if the unconvertibles outweigh the convertibles by a huge factor, it might take a few million years to get to the answer. But I won't know until I try, will I? :)

##### Share on other sites
• 2 weeks later...

I haven't forgotten this one, but I'm having to play with numbers somewhat larger than I'm used to, like this 466-digit monster:

1,285,358,318,908,397,602,470,209,924,419,166,156,055,239,252,558,516,010,395,938,369,998,774,343,996,685,742,089,786,962,579,490,563,061,063,114,787,872,069,073,122,365,424,326,389,621,775,746,128,049,046,153,839,237,062,032,514,459,805,690,410,043,909,675,534,270,085,037,917,881,294,453,916,469,817,665,512,662,265,859,373,410,344,277,271,083,270,342,984,665,495,951,942,329,084,741,660,338,602,493,311,358,287,585,323,043,560,115,686,211,301,391,913,612,921,345,823,949,790,682,479,902,366,475,309,148,602,877,694,603,617,973,626,389,141,815,122,513,335,959,623,763,382,619,898,720,804,485,492,408,275,604,248,046,875 :):eek_big::)

Are there any mad mathematicians around who can confirm that 296,395^85 multiplies out to the above? My calculator won't go that high <grin>

It's currently the largest of my "candidate" numbers, and if I have this one right, the others probably are as well.

## Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

×   Pasted as rich text.   Paste as plain text instead

Only 75 emoji are allowed.

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.