Re: Biggest difference between Google estimates and actual returns?



Paul Blay wrote:
> I've come across a search term that returns a fairly impressive
> 87,400 estimated hits but only returned 429 actual pages
> including duplicates. Anybody spotted a larger discrepancy?
>
> It was going to be my first 'insta-submission' word based on the
> 87,400 - it's lucky I spotted the mismatch.
>
> (In case anybody is wondering - here's the search link
> http://www.google.co.uk/search?q=%22%E3%82%A8%E3%83%83%E3%83%81%E3%81%8F&hl=en&lr=&safe=off&start=990&sa=N&filter=0
> ) YMMV

I tried this with both Google and Yahoo ... Google estimated 4,130 but
returned only 633; Yahoo estimated 4,130 and returned 807. (Presumably
both would have returned more if the search had been specified to also
return similar pages.)
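
To put the gap in numbers, here is a minimal Python sketch that uses only the figures quoted in this thread. The function name and the labels are made up for illustration, and it says nothing about how either engine actually produces its estimates.

```python
def returned_fraction(estimated, returned):
    """Share of the estimated hit count that was actually returned."""
    return returned / estimated


# Figures quoted in this thread: (estimated hits, pages actually returned).
# The labels are only for display.
counts = {
    "Google (Paul's search)": (87_400, 429),
    "Google (my search)": (4_130, 633),
    "Yahoo (my search)": (4_130, 807),
}

for label, (estimated, returned) in counts.items():
    share = returned_fraction(estimated, returned)
    # Print the share as a percentage alongside the raw counts.
    print(f"{label}: {returned} of {estimated} returned ({share:.1%})")
```

By this measure Paul's search returned well under 1% of the estimate, while my two searches returned somewhere between 15% and 20%.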

What's going on here? Quite a few things. And I can only speak in
generalities because I happen to work for one of those companies.

You might have noticed that Yahoo and Google have stopped the "my index
is bigger than your index" boasting. One problem (besides the massive
parallelism that makes any count a bit fuzzy): do you count mirrors or
almost-copies when you're counting results? (In the example quoted
above, both Google and Yahoo decided that 3/4 or so of the query
results were duplicates.)

Having said this, you can amuse yourself by querying for "the" and "a"
and seeing how many results you get.

And then there's web spam. There are people who try to tweak things to
get themselves higher on the search results page. The good guys are
called SEOs (search engine optimizers) and concentrate on making their
pages look good so that the search engines will rank them well. The bad
guys use whatever tricks they can, such as "link farms" and "directory
pages". If you do a query and find that most of your results are either
irrelevant or are pages that contain lists of words or lists of ads,
you've stumbled into some web spam. (Yes, both Google and Yahoo are
fighting these guys, but it's an arms race.)

See also http://www.theregister.co.uk/2005/08/16/google_yahoo_junk/ for
a bit more on this (the debunking starts about half-way through the
article).

So, when you're using google-hits to decide which phrase is more
common, just remember that you might have instead wandered into some
piles of 塵 (garbage), not properly sorted into burnable, plastics,
and other.

And you might want to also try yahoo-hits for a second opinion (it also
has advanced search for restricting by language, site, etc.).

- peter
