Re: kanjidic parser in Perl?
- From: "John J. Chew, III" <jjchew@xxxxxxxxxxxxxxxx>
- Date: Tue, 16 Aug 2005 16:47:20 -0400
Ben Bullock wrote:
"John J. Chew, III" <jjchew@xxxxxxxxxxxxxxxx> wrote in message news:eYidnUIo__qk-pzeRVn-qA@xxxxxxxxxxxxx
Ben Bullock wrote:
Well, I tried it, but I got lots of error messages.
package kdic;
Once you say this you can lose the subsequent "kdic::"s.
Could you post one?
sub parse_entry ($) {
Always use prototypes.
Adding the ($) after parse_entry declares that parse_entry takes one scalar argument and enables what the Prototypes section of perlsub(1) describes correctly as "a very limited kind of compile-time argument checking". Using prototypes (and not using the &sub(@args) syntax for subroutine calls, which overrides prototypes) can catch (1) when you change the calling syntax for a sub but forget to change the parameters in one of its invocations, and (2) coding errors based on a misunderstanding of Perl's operator precedence.
When using prototypes, it's also a good idea to declare all of your subs at the beginning of your file:
sub parse_entry($);
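A minimal sketch of what the prototype buys you, assuming a stand-in body for parse_entry (the real one is not shown in this thread). The offending call is compiled through a string eval only so that the compile-time error can be caught and printed instead of aborting the script:

```perl
use strict;
use warnings;

sub parse_entry($);    # forward declaration: exactly one scalar argument

# Hypothetical body, for illustration only: returns the length of its argument.
sub parse_entry($) {
    my ($entry) = @_;
    return length $entry;
}

# Two arguments violate the ($) prototype, so this code fails to compile.
my $compiled = eval q{ parse_entry("a", "b"); 1 };
my $error = $compiled ? '' : $@;

# The &sub(...) form skips the prototype check, so the same mistake goes unnoticed.
my $bypassed = &parse_entry("a", "b");

print "prototype check said: $error";
print "ampersand call returned: $bypassed\n";
```

The first call is rejected with a "Too many arguments" error, while the ampersand call silently ignores the extra argument.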
while ($input =~ m/(\{[^\}]+\})/) {
    # print "$input, $1";
    push (@english, $1);
    $input =~ s/\{[^\}]+\}//;
}
More simply:
push(@english, $1) while $input =~ s/({.*?})//;
Sorry, I don't know this regular expression syntax with the question mark; why does this not match all of
{abc}{def}
as one string?
The ? changes the * from being greedy to being lazy, making it match the first } it finds rather than the last one.
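To make the greedy/lazy difference concrete, here is a small sketch run against the {abc}{def} string from the question. The variable names are mine, and the braces are escaped only so that the snippet behaves the same on newer Perl versions, which are stricter about unescaped braces:

```perl
use strict;
use warnings;

my $input = '{abc}{def}';

# Greedy: .* runs to the LAST closing brace.
my ($greedy) = $input =~ /(\{.*\})/;

# Lazy: .*? stops at the FIRST closing brace.
my ($lazy) = $input =~ /(\{.*?\})/;

# The one-line loop: each substitution removes one {...} group and
# leaves it in $1, so the groups are collected one at a time.
my @english;
my $rest = $input;
push(@english, $1) while $rest =~ s/(\{.*?\})//;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "loop:   @english\n";
```

The greedy pattern captures the whole of {abc}{def}; the lazy one captures {abc}, and the loop yields {abc} and {def} as two separate strings.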
> Also, don't I need to escape { and }?
It's not necessary here because {} have a special meaning only when their contents look like a repetition count or range. It's arguably better form to escape them anyway, to prevent gotchas if you happened to edit the contents later on, but I find it more legible this way.
> Obviously the original is meant to get {abc} and {def} as two strings. Also, your regular expression contains { and }, the quoting characters for the English "meanings" in kanjidic, but mine doesn't, so the expressions aren't equivalent.
The {} are in fact inside the () in your expression as well. If you don't want to extract the {}, then you could of course write
push(@english, $1) while $input =~ s/{(.*?)}//;
It is simpler but the original code was edited down from something more complex, hence the $found.
Fair enough. It's usually possible to replace variables like $found either by rearranging the conditions or using the "next" command, and it always strikes me as being wasteful to dedicate a variable to the cause of testing a condition a second time, but sometimes it's necessary.
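As a rough illustration of the point about flag variables: the original code using $found was not posted in full, so the loop below is a made-up stand-in with hypothetical input lines, showing only the general shape of the rewrite.

```perl
use strict;
use warnings;

# Hypothetical input lines, not real kanjidic data.
my @lines = ('# comment', 'alpha {one}', 'beta');

# Flag-based version: the condition is tested once, remembered in $found,
# and then tested a second time.
my @with_flag;
for my $line (@lines) {
    my $found = 0;
    if ($line =~ /^#/) {
        $found = 1;
    }
    if (!$found) {
        push @with_flag, $line;
    }
}

# Same behaviour with "next": the condition is tested once and no flag is needed.
my @with_next;
for my $line (@lines) {
    next if $line =~ /^#/;
    push @with_next, $line;
}

print "flag: @with_flag\n";
print "next: @with_next\n";
```

Both loops keep the same two lines; the second simply leaves the iteration early instead of carrying the result of the test around.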
Maybe, I'm not sure if there are any strings in kanjidic with mixed kana and other things. Actually I'd probably raise an error at that point just to see what they were.
Right, which is why you should use my pattern (which will raise the error) rather than yours (which won't).
if ( m/^\#/ ) { next; }
&kdic::parse_entry ("$_");
Or just
kdic::parse_entry $_ unless /^#/;
If you think about it, that's an extremely bad idea.
I'm guessing because you're doing things other than calling parse_entry, and have omitted them from your message. I would still write
next if /^#/;
kdic::parse_entry $_;
You don't need to escape the '#'. You don't need to clutter your screen with the most verbose form of the if statement. You shouldn't use &() because it bypasses the argument checking that prototypes provide. You don't need to put double-quotes around variables.
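Putting those four points together, a self-contained sketch might look like the following. The body of parse_entry and the sample lines are assumptions for illustration only (the real parser and real kanjidic entries are not shown here), and the braces are escaped for the benefit of newer Perl versions:

```perl
use strict;
use warnings;

package kdic;

sub parse_entry($);    # predeclared, so it can be called without parentheses

# Hypothetical body: collects the text of every {...} group in one entry.
sub parse_entry($) {
    my ($entry) = @_;
    my @english;
    push(@english, $1) while $entry =~ s/\{(.*?)\}//;
    return @english;
}

package main;

# Made-up sample lines, not real kanjidic data.
my @input = ('# header comment', 'X {one} {two}');

my @meanings;
for (@input) {
    next if /^#/;                             # no need to escape '#'
    push @meanings, kdic::parse_entry $_;     # no '&', no quotes around $_
}

print "@meanings\n";
```

The comment line is skipped and the remaining entry contributes "one" and "two" to @meanings.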
John
--
John Chew <jjchew@xxxxxxxxxxxxxxxx> http://www.poslfit.com