Re: kanjidic parser in Perl?
Code critique time, yay!
Ben Bullock wrote:
package kdic;
Once you've declared the package, you can drop the subsequent "kdic::" prefixes.
%kdic::codes =
my (%codes) =
'W', 'KOREAN',
Better style to write:
'W' => 'KOREAN',
sub kdic::parse_entry
sub parse_entry ($) {
Always use prototypes.
while ($input =~ m/(\{[^\}]+\})/)
{
# print "$input, $1";
push (@english, $1);
$input =~ s/\{[^\}]+\}//;
}
More simply:
push(@english, $1)
while $input =~ s/(\{.*?\})//;
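To see the statement-modifier form in action, here is a minimal standalone sketch. The sample line is made up for illustration and is not copied from the real kanjidic file.

```perl
use strict;
use warnings;

# Hypothetical sample line (illustrative only, not real dictionary data).
my $input = 'X 3021 B1 G8 {first meaning} {second meaning}';
my @english;

# The condition runs first; each successful substitution deletes one
# {...} group from $input and leaves the deleted text in $1.
push(@english, $1) while $input =~ s/(\{.*?\})//;

print "$_\n" for @english;
```

The loop stops as soon as the substitution fails, i.e. when no braced group is left in $input.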
(my $kanji, my $jiscode, my @entries) = split (" ", $input);
More simply:
my ($kanji, $jiscode, @entries) = split(' ', $input);
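A quick sketch of what the single-space pattern buys you: split(' ', ...) is the special "awk mode" that splits on runs of whitespace and ignores leading whitespace. The input string here is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical input with leading and doubled whitespace plus a newline.
my $input = "  X 3021  B1 G8\n";

# ' ' (a single space string) skips leading whitespace and splits on
# any run of whitespace, so no empty fields appear.
my ($kanji, $jiscode, @entries) = split(' ', $input);

print "kanji=$kanji jis=$jiscode rest=@entries\n";
```

The first two fields land in the scalars and everything else is slurped into @entries.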
foreach my $entry (@entries)
{
my $found = 0;
if ($entry =~ m/(^[A-Z]+)(.*)/ )
{
if ($kdic::codes{$1})
{
$values{$1} = $2;
$found = 1;
}
}
elsif ($entry =~ m/([\x80-\xFF]+)/)
{
push (@japanese, $1);
$found = 1;
}
if ($found == 0)
{
print "Mystery entry \"$entry\"\n";
}
}
More simply:
for my $entry (@entries) {
if ($entry =~ /^([A-Z]+)(.*)/ && exists $codes{$1})
{ $values{$1} = $2; }
elsif ($entry =~ /([\x80-\xFF]+)/)
{ push(@japanese, $1); }
else
{ print "Mystery entry \"$entry\"\n"; }
}
though I suspect the elsif clause should read
elsif ($entry =~ /^[\x80-\xFF]+$/)
{ push(@japanese, $entry); }
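Putting the loop and the anchored elsif together, a self-contained sketch might look like this. Only 'W' => 'KOREAN' comes from the quoted code; the 'B' entry, the sample @entries, and the @mystery array (used here instead of printing) are assumptions for the sake of a runnable example.

```perl
use strict;
use warnings;

# 'W' is from the original table; 'B' is an assumed extra entry.
my %codes = ('W' => 'KOREAN', 'B' => 'RADICAL');

# Made-up fields: two coded entries, one run of high-bit bytes, one unknown.
my @entries = ('B1', 'Wa', "\xB0\xA2", 'Q9?');

my (%values, @japanese, @mystery);
for my $entry (@entries) {
    if ($entry =~ /^([A-Z]+)(.*)/ && exists $codes{$1}) {
        $values{$1} = $2;           # known code letter(s) plus value
    }
    elsif ($entry =~ /^[\x80-\xFF]+$/) {
        push(@japanese, $entry);    # entry consists only of high-bit bytes
    }
    else {
        push(@mystery, $entry);     # collected rather than printed
    }
}

print "Mystery entry \"$_\"\n" for @mystery;
```

Note that an entry starting with an unknown capital letter now falls through to the else branch, so it is still reported as a mystery.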
my $kanjidic = "kanjidic";
Normally, before this you would put a
package main;
to switch back to the default namespace.
my $KANJIDIC;
my $count = 0;
open ($KANJIDIC, $kanjidic) || die;
You can just say
open (my $KANJIDIC, '<', $kanjidic) or die "Can't open $kanjidic: $!";
in modern versions of Perl.
if ( m/^\#/ )
{
next;
}
&kdic::parse_entry ("$_");
Or just
kdic::parse_entry $_ unless /^#/;
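For completeness, here is a rough sketch of the read loop with a lexical filehandle and the comment check as a statement modifier. To keep it runnable without the real file, it opens an in-memory string instead of "kanjidic", and it collects lines in a hypothetical @lines array where the real script would call the parser.

```perl
use strict;
use warnings;

# Stand-in for the dictionary file; the contents are made up.
my $text = "# comment line\nX 3021 B1 {meaning}\n";

# Three-argument open; a reference to a scalar opens it as an in-memory file.
open(my $KANJIDIC, '<', \$text) or die "Can't open in-memory data: $!";

my @lines;
while (<$KANJIDIC>) {
    next if /^#/;        # skip comment lines
    chomp;
    push(@lines, $_);    # the real script would parse the entry here
}
close($KANJIDIC);

print scalar(@lines), " entry line(s) read\n";
```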
Thanks for the mental exercise,
John
--
John Chew (poslfit on MD) * jjchew@xxxxxxxxxxxxxxxx * http://www.poslfit.com