Re: kanjidic parser in Perl?
- From: Gabor Farkas <gabor@xxxxxxxxxxxxxx>
- Date: Mon, 15 Aug 2005 22:06:56 +0200
Ben Bullock wrote:
> "Gabor Farkas" <gabor@xxxxxxxxxxxxxx> wrote in message news:1f516$42ffa6b0$55d8898b$6438@xxxxxxxxxxxxxxxxxxxxxxxxxxx
>> Ben Bullock wrote:
>>> David Alexander Ranvig wrote:
>>>> Ben Bullock <usenet@xxxxxxxxxx> writes:
>>>>> Does anyone know of a parser for Jim Breen's kanjidic written in Perl?
>>>> <URL: http://search.cpan.org> is a nice tool for finding all things perl. Maybe you can use the module Lingua::JP::Kanjidic by Simon Cozens?
>>> Thanks for the tip. I had a look at it, and it seems to need some work. Doesn't parse all the fields in the dictionary yet, unfortunately. I'll try editing it up a bit.
>> sorry, but what are you trying to achieve? i mean, what advanced features do you need from a kanjidic parser?
>> for me it seems that 2-3 lines of perl (mostly splits) would completely parse the kanjidic database for you...
> Thanks for your input.
very funny ;))
the problem is that i can read perl relatively well, but i'm not good enough to be able to write it (i use python to solve text processing problems)
but i know that perl can split a string by whitespace, and that should be enough for your needs (if you know perl).
in python, the code would look approximately like:
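d = {}
for line in open('kanjidic.txt', encoding='euc_jp'):
    fields = line.split()
    if not fields or fields[0].startswith('#'):
        continue  # skip blank lines and the header comment
    kanji = fields[0]
    english = []
    readings = []
    for item in fields[1:]:
        # items starting with an ascii letter or digit are info codes, skip them
        if not (item[0].isascii() and item[0].isalnum()):
            if item[0] == '{':
                # note: assumes a one-word {meaning}; multi-word ones get cut up
                english.append(item[1:-1])
            else:
                readings.append(item)
    d[kanji] = (readings, english)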
i'm pretty sure that this can be translated line-by-line to perl...
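one catch with a plain whitespace split: the {meaning} fields in kanjidic can contain spaces, so they are better pulled out before splitting. a rough sketch of that idea follows. the function names parse_kanjidic_line and load_kanjidic are made up for illustration, the euc_jp encoding is an assumption about the file you have, and the sample line is hand-written and abbreviated, not copied from the real dictionary.

```python
import re

# {meaning} fields may contain spaces, so extract them before splitting
MEANING = re.compile(r'\{([^}]*)\}')

def parse_kanjidic_line(line):
    """Return (kanji, readings, meanings) for one line, or None to skip it."""
    if not line.strip() or line.startswith('#'):
        return None  # blank line or the header comment
    meanings = MEANING.findall(line)
    fields = MEANING.sub(' ', line).split()
    kanji = fields[0]
    # the JIS code and info codes (U4e9c, G8, ...) start with an ascii
    # letter or digit; everything else is treated as a reading
    readings = [f for f in fields[1:]
                if not (f[0].isascii() and f[0].isalnum())]
    return kanji, readings, meanings

def load_kanjidic(path, encoding='euc_jp'):
    """Build {kanji: (readings, meanings)}; the encoding is an assumption."""
    d = {}
    with open(path, encoding=encoding) as fh:
        for line in fh:
            parsed = parse_kanjidic_line(line)
            if parsed:
                kanji, readings, meanings = parsed
                d[kanji] = (readings, meanings)
    return d

# hand-written sample in the style of the file, for illustration only
sample = '亜 3021 U4e9c B1 G8 S7 ア つ.ぐ T1 や {Asia} {rank next} {come after}'
print(parse_kanjidic_line(sample))
```

this lumps name readings and radical names (the ones after T1/T2) in with the ordinary readings; if that distinction matters, the T markers would need their own handling.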
gabor