Re: kanjidic parser in Perl?



Ben Bullock wrote:

"Gabor Farkas" <gabor@xxxxxxxxxxxxxx> wrote in message news:1f516$42ffa6b0$55d8898b$6438@xxxxxxxxxxxxxxxxxxxxxxxxxxx


Ben Bullock wrote:

David Alexander Ranvig wrote:

Ben Bullock <usenet@xxxxxxxxxx> writes:

| Does anyone know of a parser for Jim Breen's kanjidic written in
| Perl?

<URL: http://search.cpan.org> is a nice tool for finding all things
perl. Maybe you can use the module Lingua::JP::Kanjidic by Simon
Cozens?



Thanks for the tip. I had a look at it, and it seems to need some work. Unfortunately, it doesn't parse all the fields in the dictionary yet. I'll try editing it a bit.



sorry, but what are you trying to achieve? i mean, what advanced features do you need from a kanjidic parser?

to me it seems that 2-3 lines of perl (mostly splits) would completely parse the kanjidic database for you...


Thanks for your input.


very funny ;))

the problem is that i can read perl relatively well, but i'm not good enough to be able to write it (i use python to solve text processing problems)

but i know that perl can split a string on whitespace, and that should be enough for your needs (if you know perl).

in python, the code would look approximately like:

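import re

d = {}

# kanjidic is EUC-JP encoded
for line in open('kanjidic.txt', encoding='euc-jp'):
	if line.startswith('#'):
		continue  # header comment line
	# meanings sit in {braces} and may contain spaces, so take them out before splitting
	english = re.findall(r'\{([^}]*)\}', line)
	line = re.sub(r'\{[^}]*\}', ' ', line).split()
	if not line:
		continue
	kanji = line[0]
	readings = []
	for item in line[1:]:
		# info fields start with an ASCII letter or digit; the rest are readings
		if not (item[0].isascii() and item[0].isalnum()):
			readings.append(item)
	d[kanji] = (readings,english)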
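
The same idea can be tried on a single line without having the dictionary file at hand. This is only a rough sketch: the function name parse_kanjidic_line and the abbreviated sample entry are made up for illustration, and it assumes the usual kanjidic layout (kanji first, info fields starting with an ASCII letter or digit, readings in kana, meanings in {braces}).

```python
import re

def parse_kanjidic_line(line):
    """Split one kanjidic-style line into (kanji, readings, meanings).

    Hypothetical helper; returns None for blank or '#' comment lines.
    """
    if not line.strip() or line.startswith('#'):
        return None
    # Meanings are wrapped in {braces} and may contain spaces,
    # so collect them before splitting on whitespace.
    meanings = re.findall(r'\{([^}]*)\}', line)
    fields = re.sub(r'\{[^}]*\}', ' ', line).split()
    kanji = fields[0]
    # Info fields (codes, indexes) begin with an ASCII letter or digit;
    # anything else is treated as a reading.
    readings = [f for f in fields[1:]
                if not (f[0].isascii() and f[0].isalnum())]
    return kanji, readings, meanings

# Abbreviated sample entry, written by hand in the kanjidic layout.
sample = "亜 3021 U4e9c B1 G8 S7 ア つ.ぐ {Asia} {rank next}"
print(parse_kanjidic_line(sample))
```

Everything that is neither a reading nor a meaning is simply dropped here; keeping the info fields would mean looking at their leading letter codes as well.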


i'm pretty sure that this can be translated line-by-line to perl...

gabor