Statistical Time Server

Boerseun · January 25, 2008

Had myself a funny thought just now, and want to know what you guys and gals make of it:

Let's say that nobody in the world knows what time it is.

Ignoring for a second critical connection issues such as time servers, let's say that all internet users are still connected, and they've set their computer clocks to whatever time they think it is.

The only information at their disposal is what they see out their windows: It's either night, the sun's coming up, it's day, or the sun's setting. They have no way at all to whittle the time down to anything closer than that. Also, let's assume there are no jokers out there who'll confuse daytime and nighttime.

So, if there was a server somewhere in the world which took their computer time as they logged on and averaged it all, wouldn't the average be pretty much on the nose for the actual time, right down to the second? (Keeping in mind time zones, etc.) And by "right down to the second", I mean very, very damn accurate!

Statistics are pretty accurate, with only a few thousand samples being representative of millions. Now, with millions of internet users spanning the globe and knowing only "Day, dusk, night, dawn", you should be able to get damn close to the actual time, no?

Your samples will look like this:

...

#12,345,123: DAY

#12,345,124: NIGHT

#12,345,125: DAWN

...

And your average will be: 9:23am

(...or something to that effect.)

But how accurate?

I reckon very, veeeerrrrrryyy accurate.

Buffy · January 25, 2008

Do they get to know what the date is too? Or do they have to guess that based on how much snow there is outside?

We know that polls are just a collection of statistics that reflect what people are thinking in 'reality.' And reality has a well known liberal bias, :confused:

Buffy

freeztar · January 25, 2008

That's an interesting question Boerseun.

I'm not quite sure it would be as accurate as you claim, but I do think it would be fairly accurate.

I can't think of any good way of computing this off the top of my head, so I'll leave it to speculation right now. :shrug:

Boerseun · January 28, 2008

So, any Stats nerds out there to help me?

Qfwfq · January 28, 2008

I don't get how your result follows from the sample but I'd say you would need to have the longitude of each user that is reporting DAY/NIGHT. Latitude would help too, if the time of year is also known.

Boerseun · January 28, 2008

From the IP address, you can deduce the country. Knowing the country, you'll know which time zones are possible for that particular user. If he says "night", then you compare that to other countries.

Let's say that you've got millions of US IPs all telling you "Night", and lots of European IPs telling you "Night" as well. Then, the first Eastern European IP tells you "Dawn". Then you know that the time falls between certain limits: It can't be any later than X, because day is only now breaking in Eastern Europe. It can't be any earlier than Y, because the biggest part of Western Europe is still reporting "Night". And so on. You'll be able to get closer to an actual time by comparing world-wide times in the same manner.

The more IPs reporting what time they "think" it is based on very crude observations (night, dawn, day & dusk), and the more IP ranges reporting in, the closer you'll get to what I reckon will be a very good approximation of the actual time.

Seasons can also be deduced, and don't have to be known. If a place like, say, Israel (GMT+2) reports "Night", and many more IPs in South Africa (also GMT+2) report "Day", then it's likely to be summer in the Southern hemisphere. Taking note of the daily change in the rate at which the reports come in from different hemispheres in the same time zone, you should be able to pin down the date pretty closely, too. It will take a bit of time to get comparable lists, though.

Qfwfq · January 28, 2008

I can see what you mean, but if you only know the country of each log item you would also need to have the distribution of participant density weighted by probability of participation. Knowing time zones is less relevant; you are sampling their local solar time anyway.

Buffy · January 28, 2008

Assuming your sample is large enough, your error is going to be proportional to the inaccuracies of some of the key measurements:

If you're using IP addresses, then you've got to deal with the fact that these are highly inaccurate indicators of location. How close is your farm to your ISP?

If people were reporting based on some GPS input, then you could have remarkably accurate results. But of course if you had GPS, you'd already have accurate time down to milliseconds, so we'll take that one out of the game as "unfair."

If the input data really is simply "Day", "Night", "Dawn", "Dusk" you're probably only going to be able to get any accuracy out of paying attention to the "Dawn" and "Dusk" reports, which probably have a "logical range"--the time of day that people use those terms--of plus or minus half an hour. Through large samples, you might get that down to a few minutes of accuracy at best.
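As a rough sketch of the "large samples" argument: if every "Dawn" or "Dusk" report were off by an independent error spread evenly over plus or minus half an hour, the scatter of the average would shrink with the square root of the number of reports. Anything that shifts all reports the same way (bad IP locations, what people are willing to call "dawn") does not average out. The uniform-error model, the function names and the `shared_bias_minutes` parameter are assumptions made up for illustration, not something from the thread.

```python
import math


def standard_error_minutes(half_range_minutes, sample_count):
    """Scatter of the average of `sample_count` independent reports.

    Assumes each report's error is uniform on [-half_range, +half_range],
    whose standard deviation is half_range / sqrt(3).
    """
    sigma = half_range_minutes / math.sqrt(3)
    return sigma / math.sqrt(sample_count)


def total_error_minutes(half_range_minutes, sample_count, shared_bias_minutes):
    """Combine the shrinking random scatter with a bias that never shrinks."""
    scatter = standard_error_minutes(half_range_minutes, sample_count)
    return math.hypot(scatter, shared_bias_minutes)


if __name__ == "__main__":
    for n in (1, 300, 30_000, 3_000_000):
        print(n, round(standard_error_minutes(30, n), 3),
              round(total_error_minutes(30, n, 3.0), 3))
```

With a hypothetical 3-minute shared bias, the total error flattens out near 3 minutes however many reports come in, which is one way to read the "few minutes at best" estimate.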

But I think you should take a hint from these "prediction markets" that are now floating around (check out http://ppx.popsci.com): If your input was "what is the time right now?" with the response being down to exact minute guesses with a large enough sample size, I think you could get your error range down to seconds!
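One practical wrinkle when averaging clock guesses: times wrap around at midnight, so a plain arithmetic mean of 23:50 and 00:10 gives noon. A minimal sketch of one common workaround, a circular mean, is below. It assumes the guesses have already been converted to minutes after midnight in a single shared time zone; the function name is made up for illustration.

```python
import math

MINUTES_PER_DAY = 24 * 60


def circular_mean_minutes(guesses_in_minutes):
    """Average clock guesses by treating each one as an angle on a 24-hour dial."""
    angles = [2 * math.pi * g / MINUTES_PER_DAY for g in guesses_in_minutes]
    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)
    # atan2 gives the direction of the summed unit vectors; wrap into [0, 2*pi).
    mean_angle = math.atan2(y, x) % (2 * math.pi)
    return mean_angle * MINUTES_PER_DAY / (2 * math.pi)


if __name__ == "__main__":
    guesses = [9 * 60 + 20, 9 * 60 + 26]  # 09:20 and 09:26
    print(circular_mean_minutes(guesses))
```

The result is undefined when the guesses cancel out exactly (for example 06:00 and 18:00), so a real version would need to handle that case.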

Don't just do something, stand there, :hihi:

Buffy

Boerseun · May 14, 2008

Bit of a bump, but what the hey...

Let's say you've got two IPs reporting in. The first is from the Namibia range, and he reports night.

So, with only 1 sample, you know that it can't be any earlier than 18:00 and no later than 6:00 in Windhoek.

So, with a single IP, you've whittled 12 hours off the clock, towards your final ideal of figuring out the time.

Next IP comes in, from, let's say the Mozambique IP range. He's the first to report "Dawn". This now means that it cannot be any earlier than 4:00 in Windhoek, because Namibia is 2 hours behind Mozambique. It can't, however, be 6:00 yet, because Dawn is only now being reported 2 hours ahead of Windhoek. So, time is somewhere between 04:00 and 06:00. From only two samples.
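The two-sample reasoning above can be sketched as an intersection of time windows on a 24-hour clock. This is only a sketch under the post's own simplifications: night is taken as 18:00 to 06:00 local, a first "Dawn" report is read as "local time is 06:00 or later, but still daytime", and Windhoek is taken to be 2 hours behind Mozambique as stated in the post. All function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def to_minutes(hhmm):
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


def window(start, end):
    """Set of minutes-of-day in [start, end), wrapping past midnight if needed."""
    first = to_minutes(start)
    length = (to_minutes(end) - first) % MINUTES_PER_DAY
    return {(first + i) % MINUTES_PER_DAY for i in range(length)}


def shift(minutes, hours):
    """Re-express a window in a clock that is `hours` ahead (negative = behind)."""
    return {(m + hours * 60) % MINUTES_PER_DAY for m in minutes}


def bounds(minutes):
    """Start and (exclusive) end of a single contiguous run, as HH:MM strings."""
    start = next(m for m in minutes if (m - 1) % MINUTES_PER_DAY not in minutes)
    end = (start + len(minutes)) % MINUTES_PER_DAY
    return "%02d:%02d" % divmod(start, 60), "%02d:%02d" % divmod(end, 60)


# Sample 1: Namibia reports "Night" (assumed 18:00 to 06:00 Windhoek time).
windhoek_night = window("18:00", "06:00")
# Sample 2: Mozambique reports "Dawn" (assumed 06:00 or later local time),
# shifted onto the Windhoek clock, taken as 2 hours behind per the post.
mozambique_daylight_in_windhoek = shift(window("06:00", "18:00"), -2)

possible = windhoek_night & mozambique_daylight_in_windhoek

if __name__ == "__main__":
    print(bounds(possible))
```

Each further report from another IP range would just be one more window to intersect with `possible`.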

But now, a separate mechanism can be introduced:

To whittle the time down even further, let's say dawn is at six, and let's say all users in a specific IP range are sitting in one time zone. Then the first to report Dawn should be in the eastern part of the zone, and the last to report should be in the western part. Your percentages should, of course, be adjusted for population distribution through every state and time zone... So, take the 60 minutes between 6:00 and 7:00 and map them onto the percentage of users in that particular zone who have reported Dawn. When 50% of users in a zone say "Night" and the other 50% report "Dawn", you could mark it down as 6:30 (or, if 75% of users in that particular zone live in the eastern part and only 25% in the western part, then when 50% have reported "Dawn" you mark it as something earlier, say 6:15). Then, by comparing these zone reports and the time differences between them, you should be able to get it pretty close, I guess.
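A minimal sketch of that percentage trick, under some loud assumptions: dawn sweeps evenly from the eastern to the western edge of the zone between 6:00 and 7:00, the zone is cut into equal-width east-to-west slices, and people are spread evenly inside each slice. The function name and the slice model are made up for illustration.

```python
def minutes_past_six(fraction_reporting_dawn, shares_east_to_west):
    """Estimate minutes after 06:00 from the fraction of users reporting Dawn.

    `shares_east_to_west` lists the population share of equal-width slices
    of the zone, eastern slice first (hypothetical input).
    """
    total = float(sum(shares_east_to_west))
    slice_minutes = 60.0 / len(shares_east_to_west)
    target = fraction_reporting_dawn * total
    seen = 0.0
    for index, share in enumerate(shares_east_to_west):
        if share > 0 and seen + share >= target:
            # Dawn has crossed `index` full slices plus part of this one.
            return (index + (target - seen) / share) * slice_minutes
        seen += share
    return 60.0


if __name__ == "__main__":
    print(minutes_past_six(0.5, [50, 50]))  # evenly spread population
    print(minutes_past_six(0.5, [75, 25]))  # 75% in the eastern half
```

With an even spread, 50% reporting Dawn comes out at 30 minutes past six. With 75% of users in the eastern half, this particular model puts the 50% mark at about 20 minutes past six; the exact figure depends on how people are spread inside each part, so other spreads would give other values.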

Why you'd want to do it, however, is anybody's guess...:)
