[personal profile] eclectic_boy
You already know about googlewhacking: finding pairs of words that bring up only a single hit when searched for together on Google. I decided that's not as good a measure as I'd like of how uncorrelated two words really are, because it doesn't take into account the base rarity of the individual words. It's not really that odd that two words would return only a single Google hit if each of them by itself appears on only a small number of pages. What's more intriguing, to me anyway, are words that by themselves are quite common, but which nevertheless rarely appear together.

So I'm defining a number I call the Google Correlation of two words:

GC(x,y) = #hits for x y / ( (#hits for x) * (#hits for y) )

Most pairs of words will have quite small GCs (except for stormy petrels, I suppose!), but the question is, how small can you get? What's the smallest nonzero GC you can find?

Any googlewhack will have a numerator of 1, but its GC won't necessarily be that tiny, because its denominator may also be small. For example, one of the recent finds at googlewhack.com, Jotted cruddiness, has a GC of 1 / (795000 * 2740) = 4.59073589 * 10^-10.

But my best so far, Autofocus Creole, is down at 685 / (14700000 * 5570000) = 8.36600349 * 10^-12.
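If you'd rather not do the division by hand, here's a minimal Python sketch of the formula. The function name `google_correlation` is just something made up for illustration, and it assumes you read the three hit counts off Google yourself and type them in; it doesn't query any search API.

```python
def google_correlation(hits_xy: int, hits_x: int, hits_y: int) -> float:
    """GC(x, y) = (#hits for x y) / ((#hits for x) * (#hits for y))."""
    # A word with zero hits would make the denominator zero, so reject it.
    if hits_x <= 0 or hits_y <= 0:
        raise ValueError("individual hit counts must be positive")
    return hits_xy / (hits_x * hits_y)


# Hit counts as quoted in this post (entered by hand, not fetched).
jotted_cruddiness = google_correlation(1, 795_000, 2_740)
autofocus_creole = google_correlation(685, 14_700_000, 5_570_000)

print(f"Jotted cruddiness: {jotted_cruddiness:.8e}")
print(f"Autofocus Creole:  {autofocus_creole:.8e}")
print("Autofocus Creole is smaller:", autofocus_creole < jotted_cruddiness)
```

With the counts above, this should reproduce the two GC values quoted in the post and confirm that the non-googlewhack pair is the lower of the two.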

So have at it, and share your finds! Anyone who wants to help me make a webpage for people to report these, I'll share all the credit with you once it becomes a fad :^)

Date: 2007-01-13 05:04 am (UTC)
From: [identity profile] carnap.livejournal.com
Now that "Autofocus Creole" seems to be the name of the game, you need to start tracking the Google Correlation of those two terms to see how popular the game is...
