Re: kanjidic parser in Perl?
- From: "Ben Bullock" <usenet@xxxxxxxxxx>
- Date: Tue, 16 Aug 2005 18:54:43 +0900
"John J. Chew, III" <jjchew@xxxxxxxxxxxxxxxx> wrote in message news:eYidnUIo__qk-pzeRVn-qA@xxxxxxxxxxxxx
Ben Bullock wrote:
package kdic;
Once you say this you can lose the subsequent "kdic::"s.
Well, I tried it, but I got lots of error messages.
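For reference, here is a minimal sketch of what dropping the prefixes looks like. It assumes the script runs under `use strict`, which the post does not say. Under strict, a package variable needs an `our` declaration before it can be used unqualified. A `my` variable such as `my (%codes)` is lexical and does not live in the `kdic` package at all, so it cannot be reached as `%kdic::codes`. The helper `code_name` is hypothetical.

```perl
use strict;
use warnings;

package kdic;

# Under "use strict", "our" declares %codes as the package variable
# %kdic::codes, so the short name %codes works inside this package.
our %codes = ('W' => 'KOREAN');

# Defined in package kdic, so its full name is kdic::code_name.
# (code_name is a hypothetical helper, not from the original script.)
sub code_name {
    my ($code) = @_;
    return $codes{$code};
}

# Both spellings refer to the same variable and the same sub.
print code_name('W'), "\n";          # KOREAN
print $kdic::codes{'W'}, "\n";       # KOREAN
```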
%kdic::codes =
my (%codes) =
'W', 'KOREAN',
Better style to write:
'W' => 'KOREAN',
That's a good tip. I didn't know about that, but it seems like a good way to avoid errors as well.
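A small sketch of the `=>` ("fat comma") style. The second pair, `U => 'UNICODE'`, is an illustrative addition and not from the post.

```perl
use strict;
use warnings;

# "=>" works like a comma but also quotes a bareword on its left,
# so each key sits visibly next to its value.
# ('U' => 'UNICODE' is an illustrative second pair, not from the post.)
my %codes = (
    W => 'KOREAN',
    U => 'UNICODE',
);

# The same hash written with plain commas: it works, but a single
# missing element would shift every later key/value pair out of step.
my %same = ('W', 'KOREAN', 'U', 'UNICODE');

print "$codes{W} $same{U}\n";    # KOREAN UNICODE
```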
sub kdic::parse_entry
sub parse_entry ($) {
Always use prototypes.
I have no idea how to do so.
I tried a web search and found a page about Perl prototypes, but it was totally incomprehensible to me.
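Here is a small sketch of what the `($)` in `sub parse_entry ($)` does. The sub name `one_scalar` is made up for illustration. A `($)` prototype means "exactly one argument, evaluated in scalar context", so a call with two arguments becomes a compile-time error. The sketch also shows that the `&name(...)` call style, as in `&kdic::parse_entry (...)`, bypasses prototype checking entirely.

```perl
use strict;
use warnings;

# The ($) prototype says "exactly one argument, evaluated in scalar
# context". The sub must be declared before the calls below are compiled
# for the prototype to take effect.
sub one_scalar ($) {
    my ($arg) = @_;
    return $arg;
}

my @list = ('a', 'b', 'c');

# With the prototype, @list is evaluated in scalar context: its length.
my $normal = one_scalar(@list);      # 3

# Calling with a leading "&" bypasses the prototype entirely, so the
# whole list is passed and $arg is just the first element.
my $ampersand = &one_scalar(@list);  # 'a'

print "$normal $ampersand\n";        # 3 a
```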
while ($input =~ m/(\{[^\}]+\})/) {
    # print "$input, $1";
    push (@english, $1);
    $input =~ s/\{[^\}]+\}//;
}
More simply:
push(@english, $1) while $input =~ s/({.*?})//;
Sorry, I don't know this regular expression syntax with the question mark; why does this not match all of {abc}{def} as one string? Also, don't I need to escape { and }? Obviously the original is meant to get {abc} and {def} as two strings. Also, your regular expression contains { and }, the quoting characters for the English "meanings" in kanjidic, but mine doesn't, so the expressions aren't equivalent.
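This sketch shows greedy `.*` next to non-greedy `.*?` on a made-up string. Non-greedy matching takes as little as possible, so each match stops at the first closing brace. The braces are escaped here, since `\{` always means a literal brace and cannot be read as part of a `{n,m}` quantifier. One small difference from `[^\}]+` is that `.*?` also accepts an empty `{}`.

```perl
use strict;
use warnings;

my $text = 'KANJI {abc}{def} rest';

# Greedy: ".*" grabs as much as possible, so one match spans
# from the first "{" to the last "}".
my @greedy = $text =~ /(\{.*\})/g;     # ('{abc}{def}')

# Non-greedy: ".*?" grabs as little as possible, so each match
# stops at the first "}" it can.
my @lazy = $text =~ /(\{.*?\})/g;      # ('{abc}', '{def}')

# The suggested loop: strip one {...} group at a time.
my $input = $text;
my @english;
push(@english, $1) while $input =~ s/(\{.*?\})//;
# @english is ('{abc}', '{def}'); $input is 'KANJI  rest'
```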
(my $kanji, my $jiscode, my @entries) = split (" ", $input);
More simply:
my ($kanji, $jiscode, @entries) = split(' ', $input);
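The list assignment from `split` can be checked on a made-up line. The sample is ASCII-only and hypothetical. Real kanjidic lines are EUC-JP encoded and begin with the kanji itself.

```perl
use strict;
use warnings;

# A made-up, ASCII-only line in the rough shape of a kanjidic entry:
# kanji, JIS code, then the remaining fields. For illustration only.
my $input = '  K 3021 U4e9c N43 {Asia}';

# split ' ' is a special case: it splits on runs of whitespace and
# ignores leading whitespace, like awk.
my ($kanji, $jiscode, @entries) = split(' ', $input);

# The first two fields go to the scalars; everything else lands in
# @entries, because an array in a list assignment takes all remaining values.
print "$kanji | $jiscode | @entries\n";   # K | 3021 | U4e9c N43 {Asia}
```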
foreach my $entry (@entries) {
    my $found = 0;
    if ($entry =~ m/(^[A-Z]+)(.*)/ ) {
        if ($kdic::codes{$1}) {
            $values{$1} = $2;
            $found = 1;
        }
    } elsif ($entry =~ m/([\x80-\xFF]+)/) {
        push (@japanese, $1);
        $found = 1;
    }
    if ($found == 0) {
        print "Mystery entry \"$entry\"\n";
    }
}
More simply:
for my $entry (@entries) {
    if ($entry =~ /^([A-Z]+)(.*)/ && exists $codes{$1}) {
        $values{$1} = $2;
    } elsif ($entry =~ /([\x80-\xFF]+)/) {
        push(@japanese, $1);
    } else {
        print "Mystery entry \"$entry\"\n";
    }
}
It is simpler, but the original code was edited down from something more complex, hence the $found.
though I suspect the elsif clause should read
elsif ($entry =~ /^[\x80-\xFF]+$/) { push(@japanese, $entry); }
Maybe. I'm not sure whether any strings in kanjidic mix kana with other things. Actually, I'd probably raise an error at that point just to see what they were.
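Below is a sketch of the classification loop using the stricter `elsif` test. Instead of printing, it collects unrecognised fields so they can be inspected afterwards. The code table, the sub name `classify_entries` and the sample fields are all hypothetical. `"\xA4\xA2"` is the EUC-JP byte pair for the hiragana あ. The mixed field `"a\xA4\xA2"` ends up in the mystery list, whereas the looser unanchored pattern would have pulled the kana bytes out of it.

```perl
use strict;
use warnings;

# Hypothetical code table; the real script defines its own %codes.
my %codes = (U => 'UNICODE', N => 'NELSON');

# Sort whitespace-separated fields into known codes, Japanese
# readings (fields made only of bytes 0x80-0xFF, e.g. EUC-JP), and leftovers.
sub classify_entries {
    my @entries = @_;
    my (%values, @japanese, @mystery);
    for my $entry (@entries) {
        if ($entry =~ /^([A-Z]+)(.*)/ && exists $codes{$1}) {
            $values{$1} = $2;
        }
        elsif ($entry =~ /^[\x80-\xFF]+$/) {
            # The stricter test: the whole field must be high bytes.
            push(@japanese, $entry);
        }
        else {
            # Kept for inspection instead of being printed.
            push(@mystery, $entry);
        }
    }
    return (\%values, \@japanese, \@mystery);
}

my ($values, $japanese, $mystery) =
    classify_entries('U4e9c', 'N43', "\xA4\xA2", 'X1', "a\xA4\xA2");
```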
my $kanjidic = "kanjidic";
Normally, before this you would put a
package main;
to switch back to the default namespace.
Thanks for the tip.
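This sketch shows what `package main;` switches back. `__PACKAGE__` reports the package currently being compiled. After the switch, the kdic sub has to be called by its full name. `parse_entry` here is a hypothetical stand-in for the real parser. Lexical `my` variables are not affected by `package`, so `$inside` is still visible after the switch.

```perl
use strict;
use warnings;

package kdic;

# Hypothetical stand-in for the real parser.
sub parse_entry {
    my ($line) = @_;
    return "parsed $line";
}

# __PACKAGE__ is the name of the package currently being compiled.
my $inside = __PACKAGE__;            # "kdic"

package main;

# Back in the default namespace: unqualified names now mean main::...,
# so the kdic sub has to be called by its full name.
my $outside = __PACKAGE__;           # "main"
my $result  = kdic::parse_entry('abc');

print "$inside $outside $result\n";  # kdic main parsed abc
```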
my $KANJIDIC;
my $count = 0;
open ($KANJIDIC, $kanjidic) || die;
You can just say
open (my $KANJIDIC, "<$kanjidic") or die;
in modern versions of Perl.
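Here is a sketch of the reading loop with a lexical filehandle and the three-argument form of `open`, both available from Perl 5.6 on. The sub name `read_kanjidic_lines` is hypothetical. It skips lines starting with `#`, as the original loop does, and `$!` in the `die` message gives the operating system's reason for a failed open.

```perl
use strict;
use warnings;

# Hypothetical helper: read a kanjidic-style file, skip comment lines,
# and return the remaining lines without their newlines.
sub read_kanjidic_lines {
    my ($path) = @_;
    # Three-argument open with a lexical filehandle; $! holds the
    # operating system's reason if the open fails.
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;    # skip comment lines, as in the original loop
        chomp $line;
        push(@lines, $line);
    }
    close($fh);
    return @lines;
}
```

For a quick test without a file on disk, three-argument `open` also accepts a reference to a string and reads from it as an in-memory file, e.g. `read_kanjidic_lines(\$data)`.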
if ( m/^\#/ ) { next; }
&kdic::parse_entry ("$_");
Or just
kdic::parse_entry $_ unless /^#/;
If you think about it, that's an extremely bad idea.
Thanks for the mental exercise,
Since you like mental exercise so much, I'll leave it to you to figure out why I don't agree with your last comment.