eclectic_boy: (Default)
[personal profile] eclectic_boy
You already know about googlewhacking, finding pairs of words that only bring up a single hit when looked for together on google. I decided that's not as good a measure as I'd like of how uncorrelated two words really are, because it doesn't take into account the base rarity of the individual words -- it's not really that odd that two words would only return a single google hit if each of them by themselves are only on a small number of pages. What's more intriguing, to me anyway, are words that by themselves are quite common, but which nevertheless rarely appear together.

So I'm defining a number I call the Google Correlation of two words:

GC(x,y) = #hits for x y / ( (#hits for x) * (#hits for y) )

Most pairs of words will have quite small GCs (except for stormy petrals, I suppose!), but the question is, how small can you get? What's the smallest nonzero GC you can find?

Any googlewhack will have a numerator of 1, but won't necessarily be that tiny because its denominator may also be small. For example, one of the recent finds at googlewhack.com, Jotted cruddiness,
has a GC of 1 / (795000 * 2740) = 4.59073589 * 10^-10.

But my best so far, Autofocus Creole, is down at 685 / (14700000 * 5570000) = 8.36600349 * 10^-12.

So have at it, and share your finds! Anyone who wants to help me make a webpage for people to report these, I'll share all the credit with you once it becomes a fad :^)
This account has disabled anonymous posting.
If you don't have an account you can create one now.
HTML doesn't work in the subject.
More info about formatting

February 2014

S M T W T F S
      1
2345678
9101112131415
16171819202122
2324 25262728 

Style Credit

Expand Cut Tags

No cut tags
Page generated Jan. 24th, 2026 02:10 pm
Powered by Dreamwidth Studios