Re: Google counts
jim_breen_at_hotmail.com
Date: 03/16/05
- Next message: Paul Blay: "Re: Google counts"
- Previous message: B. Ito: "Re: ...koiten janee"
- In reply to: Paul Blay: "Re: Google counts"
- Next in thread: Paul Blay: "Re: Google counts"
- Reply: Paul Blay: "Re: Google counts"
Date: 16 Mar 2005 11:08:54 GMT
Paul Blay <ranma@saotome.demon.co.uk> dixit:
>Unfortunately most of that does not apply to Japanese text searches.
The underlying assumptions do, though.
>As discussed in various threads before the significant factors in Japanese
>text searching are
>1. Either including hl=ja in the search url or a kana の to make sure
>it is Japanese pages that are included.
>2. Including " marks around dubious / rare words / phrases - or they
>will often be split into two words for the search.
>3. Being aware that _if_ you have hl=ja set then certain search terms
>will be 'merged' as recognized alternative spellings (for example
>ダイヤモンド vs ダイアモンド)
Yes, a page in language-X is parsed and segmented by a language-X parser
before indexing. AFAIK for Japanese Google uses a parser from Basis
Technology, which in turn uses a parse dictionary from Jack
Halpern's company. For some reason I am not surprised to find
ダイヤモンド & ダイアモンド combined.
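
For anyone wanting to script such comparisons, points 1 and 2 of the quoted list can be sketched in a few lines of Python. This is only an illustration: the base URL and the helper name build_search_url are assumptions of mine, not anything stated in the thread. Only the hl=ja parameter and the double-quoting come from the list above.

```python
from urllib.parse import urlencode


def build_search_url(phrase, lang="ja", base="https://www.google.com/search"):
    """Build a search URL for one exact phrase (hypothetical helper).

    `base` is an assumed endpoint; adjust it for the engine you query.
    """
    # Point 2: wrap the phrase in double quotes so it is searched as a
    # unit rather than being split into two words.
    query = f'"{phrase}"'
    # Point 1: pass hl=ja alongside the query. urlencode percent-encodes
    # the quotes and the UTF-8 bytes of the Japanese text.
    return f"{base}?{urlencode({'q': query, 'hl': lang})}"


# The two spellings discussed above, as separate phrase queries.
url_a = build_search_url("ダイヤモンド")
url_b = build_search_url("ダイアモンド")
print(url_a)
print(url_b)
```

Per point 3, with hl=ja set the two URLs may well come back with merged results, so differing URLs do not guarantee independent counts.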
>I don't say that Altavista is a better search engine - but it seems to
>be more reliable for relative word counts.
I found that a while ago when doing some research on the spread of
丁寧語, 敬語, etc. across the .jp 2nd-level domains. Google may be
very good at getting the most relevant pages to the front, but its
page-hit stats are pretty flakey.
-- 
Jim Breen http://www.csse.monash.edu.au/~jwb/
Computer Science & Software Engineering,
Monash University, VIC 3800, Australia
ジム・ブリーン@モナシュ大学