Re: kanjidic parser in Perl?



Ben Bullock wrote:
> "John J. Chew, III" <jjchew@xxxxxxxxxxxxxxxx> wrote in message news:eYidnUIo__qk-pzeRVn-qA@xxxxxxxxxxxxx
>> Ben Bullock wrote:
>>> package kdic;
>> Once you say this you can lose the subsequent "kdic::"s.
> Well, I tried it, but I got lots of error messages.

Could you post one?
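For reference, here is a minimal sketch of what dropping the "kdic::" prefix looks like. It assumes everything after "package kdic;" belongs to that package until the next "package" statement. The helper "trim" is made up for the illustration and is not part of your code. Note that only code inside kdic can use the short name; code in main still needs the qualified one.

```perl
use strict;
use warnings;

package kdic;

# Hypothetical helper, used only to show an unqualified call.
sub trim($) {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

sub parse_entry($) {
    my ($line) = @_;
    # We are inside package kdic, so trim() resolves to kdic::trim.
    return trim($line);
}

package main;

# From another package the qualified name is still needed.
print kdic::parse_entry("  A {one}  "), "\n";   # prints "A {one}"
```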

>>> sub parse_entry ($) {
>>
>> Always use prototypes.

Adding the ($) after parse_entry declares that parse_entry takes one scalar argument and enables what the Prototypes section of perlsub(1) describes correctly as "a very limited kind of compile-time argument checking". Using prototypes (and not using the &sub(@args) syntax for subroutine calls, which overrides prototypes) can catch (1) when you change the calling syntax for a sub but forget to change the parameters in one of its invocations, and (2) coding errors based on a misunderstanding of Perl's operator precedence.

When using prototypes, it's also a good idea to declare all
of your subs at the beginning of your file, since a prototype
only affects calls that Perl compiles after it has seen the
declaration:

  sub parse_entry($);
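A small self-contained sketch of the checking described above. The body of parse_entry here is a placeholder, not your parser, and the eval q{...} is there only so that the compile-time error can be shown without stopping the script.

```perl
use strict;
use warnings;

# Forward declaration: calls compiled after this line are checked
# against the ($) prototype.
sub parse_entry($);

# Hypothetical body that just reports what it received.
sub parse_entry($) {
    my ($entry) = @_;
    return "got: $entry";
}

# Two arguments: rejected when the string is compiled, before it runs.
my $ok = eval q{ parse_entry("a", "b"); 1 };
print "rejected: $@" unless $ok;

# The & form bypasses the prototype, so this compiles and runs,
# silently ignoring the extra argument.
print &parse_entry("a", "b"), "\n";     # prints "got: a"

# ($) also forces scalar context: an array argument becomes its length.
my @fields = ("x", "y", "z");
print parse_entry(@fields), "\n";       # prints "got: 3"
```

The last call is the kind of mistake the prototype will not flag: it silently changes what gets passed, which is worth knowing before relying on it.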

>>>     while ($input =~ m/(\{[^\}]+\})/)
>>>     {
>>> #        print "$input, $1";
>>>         push (@english, $1);
>>>         $input =~ s/\{[^\}]+\}//;
>>>     }

>> More simply:
>>
>>   push(@english, $1)
>>     while $input =~ s/({.*?})//;


> Sorry, I don't know this regular expression syntax with the question mark; why does this not match all of
>
> {abc}{def}
>
> as one string?

The ? changes the * from being greedy to being lazy, making it match the first } it finds rather than the last one.
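To see the difference on your example string (the braces are escaped here purely as a precaution; the greedy/lazy behaviour is the same either way):

```perl
use strict;
use warnings;

my $s = "{abc}{def}";

my ($greedy) = $s =~ /(\{.*\})/;    # .*  runs on to the LAST "}"
my ($lazy)   = $s =~ /(\{.*?\})/;   # .*? stops at the FIRST "}"

print "greedy: $greedy\n";   # greedy: {abc}{def}
print "lazy:   $lazy\n";     # lazy:   {abc}
```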

> Also, don't I need to escape { and }?

It's not necessary here because {} have a special meaning only
when their contents look like a repetition count or range.
It's arguably better form to escape them anyway, to prevent
gotchas if you happened to edit the contents later on, but I
find it more legible this way.
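A quick illustration of the "repetition count" case next to the escaped form, which is the one to reach for if the braces must stay literal whatever ends up between them. Later perl releases also deprecated, and in some positions forbade, unescaped literal "{", so escaping is the more portable habit. The test strings are arbitrary.

```perl
use strict;
use warnings;

# After an atom, {2} is a repetition count: "exactly two a's".
my $as_count = ("aa" =~ /^a{2}$/) ? 1 : 0;

# Escaped, the same characters are matched literally.
my $as_literal = ("a{2}" =~ /^a\{2\}$/) ? 1 : 0;

# The unescaped count pattern does not match the literal text.
my $mismatch = ("a{2}" =~ /^a{2}$/) ? 1 : 0;

print "count: $as_count, literal: $as_literal, mismatch: $mismatch\n";
# count: 1, literal: 1, mismatch: 0
```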

> Obviously the original is meant to get {abc} and {def} as two strings. Also, your regular expression contains { and }, the quoting characters for the English "meanings" in kanjidic, but mine doesn't, so the expressions aren't equivalent.

The {} are in fact inside the () in your expression as well. If you don't want to extract the {}, then you could of course write

  push(@english, $1)
    while $input =~ s/{(.*?)}//;
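Here is a runnable version of the extraction loop that keeps only the text between the braces. The input line is a hypothetical, simplified kanjidic-style entry (real entries begin with the kanji itself and carry many more fields), and the braces are escaped only as a precaution.

```perl
use strict;
use warnings;

# Hypothetical, simplified kanjidic-style line.
my $input = 'X 3021 U4e9c {Asia} {rank next} {come after}';

my @english;
# Each pass deletes the first {...} group; $1 holds the text inside it.
push(@english, $1)
    while $input =~ s/\{(.*?)\}//;

print join("|", @english), "\n";   # Asia|rank next|come after
```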

> It is simpler, but the original code was edited down from something more complex, hence the $found.

Fair enough. It's usually possible to replace variables like $found either by rearranging the conditions or using the "next" command, and it always strikes me as being wasteful to dedicate a variable to the cause of testing a condition a second time, but sometimes it's necessary.
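Since the original $found code isn't quoted here, this is a hypothetical loop written both ways, just to show the kind of rewrite meant. The input lines are made up.

```perl
use strict;
use warnings;

# Hypothetical input: which lines carry at least one {meaning}?
my @lines = ("# comment", "A {one}", "B plain", "C {two}");

# With a flag: the test result is stored and then tested a second time.
my @with_flag;
for my $line (@lines) {
    my $found = 0;
    $found = 1 if $line =~ /\{/;
    push @with_flag, $line if $found;
}

# With next: give up on the line as soon as the condition fails.
my @with_next;
for my $line (@lines) {
    next unless $line =~ /\{/;
    push @with_next, $line;
}

print "@with_next\n";   # A {one} C {two}
```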

> Maybe. I'm not sure if there are any strings in kanjidic with mixed kana and other things. Actually I'd probably raise an error at that point just to see what they were.

Right, which is why you should use my pattern (which will raise the error) rather than yours (which won't).

>>>     if ( m/^\#/ )
>>>     {
>>>         next;
>>>     }
>>>     &kdic::parse_entry ("$_");

>> Or just
>>
>> kdic::parse_entry $_ unless /^#/;

> If you think about it, that's an extremely bad idea.

I'm guessing that's because you're doing things other than calling parse_entry and have omitted them from your message. I would still write

  next if /^#/;
  kdic::parse_entry $_;

You don't need to escape the '#'. You don't need to clutter
your screen with the most verbose form of the if statement.
You shouldn't call subs with & (as in &kdic::parse_entry(...))
because it bypasses prototype-based argument checking. You
don't need to put double-quotes around variables.
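Put together, a sketch of the reading loop in that style. The in-memory file contents are made up, and the parse_entry body is a hypothetical stub that just collects lines, since the real parser isn't shown here.

```perl
use strict;
use warnings;

package kdic;

sub parse_entry($);    # declared up front so later calls are checked

my @parsed;            # collects entries; the stub below is hypothetical
sub parse_entry($) {
    my ($line) = @_;
    push @parsed, $line;
}

package main;

# In-memory stand-in for the kanjidic file; the contents are made up.
my $data = "# header comment\nA {one}\n# another comment\nB {two}\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

while (<$fh>) {
    chomp;
    next if /^#/;            # no backslash needed before '#'
    kdic::parse_entry $_;    # no &, no quotes: the ($) prototype applies
}
close $fh;

print scalar(@parsed), " entries parsed\n";   # 2 entries parsed
```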

John
--
John Chew <jjchew@xxxxxxxxxxxxxxxx> http://www.poslfit.com