class Grammar

Formal grammar made up of named regexes

class Grammar is Match {}

Every type declared with grammar that does not explicitly state its superclass becomes a subclass of Grammar.

grammar Identifier {
    token TOP       { <initial> <rest>* }
    token initial   { <+myletter +[_]> }
    token rest      { <+myletter +mynumber +[_]> }
    token myletter  { <[A..Za..z]> }
    token mynumber  { <[0..9]> }
}
 
say Identifier.isa(Grammar);                # OUTPUT: «True␤» 
my $match = Identifier.parse('W4anD0eR96');
say ~$match;                                # OUTPUT: «W4anD0eR96␤»
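A grammar that does name a superclass still ends up under Grammar when that superclass is itself a grammar. The following is a minimal sketch; the names Word and UpperWord are hypothetical and chosen only for illustration.

```raku
# Hypothetical grammars, for illustration only.
grammar Word {
    token TOP  { <word> }
    token word { \w+ }
}

# UpperWord states its superclass explicitly and overrides one token.
grammar UpperWord is Word {
    token word { <[A..Z]>+ }
}

say UpperWord.isa(Word);             # OUTPUT: «True␤»
say UpperWord.isa(Grammar);          # OUTPUT: «True␤»
say UpperWord.parse('HELLO').Str;    # OUTPUT: «HELLO␤»
say so UpperWord.parse('hello');     # OUTPUT: «False␤»
```

The inherited TOP calls the overridden word token, because named regexes are dispatched like ordinary methods.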

More documentation on grammars is available.

Methods

method parse

method parse($target, :$rule = 'TOP',  Capture() :$args = \(), Mu :$actions = Mu, *%opt)

Parses the $target, which will be coerced to Str if it isn't one, using $rule as the starting rule. Additional $args will be passed to the starting rule if provided.

grammar RepeatChar {
    token start($character) { $character+ }
}
 
say RepeatChar.parse('aaaaaa', :rule('start'), :args(\('a')));
say RepeatChar.parse('bbbbbb', :rule('start'), :args(\('b')));
 
# OUTPUT: 
# ｢aaaaaa｣ 
# ｢bbbbbb｣

If the actions named argument is provided, it is used as an actions object: for each successful regex match, the method of the same name, if it exists, is called on the actions object, with the match object as the sole positional argument.

my $actions = class { method TOP($/) { say "7" } };
grammar { token TOP { a { say "42" } b } }.parse('ab', :$actions);
# OUTPUT: «42␤7␤»
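Action methods are commonly used to attach a computed value to each match with make and to read it back with made. The sketch below assumes a hypothetical grammar Sum and actions class SumActions; neither is part of the Grammar class.

```raku
# Hypothetical grammar: two integers joined by a plus sign.
grammar Sum {
    token TOP    { <number> '+' <number> }
    token number { \d+ }
}

class SumActions {
    # Runs after each successful match of the token "number".
    method number($/) { make +$/ }
    # Runs after TOP matches; $<number> holds both number matches.
    method TOP($/)    { make $<number>».made.sum }
}

my $result = Sum.parse('12+30', :actions(SumActions.new));
say $result.made;    # OUTPUT: «42␤»
```

Passing an instance rather than a type object lets the actions object keep state between calls, should that be needed.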

Additional named arguments are used as options for matching. For example, :pos(4) starts parsing at the fifth character (:pos is zero-based). All matching adverbs are accepted, but not all of them take effect. Some adverbs, such as :s and :i, apply at compile time; passing them to .parse does nothing, because the regexes have already been compiled. Adverbs that affect runtime behavior, such as :pos and :continue, do take effect.

say RepeatChar.parse('bbbbbb', :rule('start'), :args(\('b')), :pos(4)).Str;
# OUTPUT: «bb␤»

Method parse only succeeds if the cursor has arrived at the end of the target string when the match is over. Use method subparse if you want to be able to stop in the middle.

The top regex in the grammar will be allowed to backtrack.

Returns a Match object on success, and Nil on failure.
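Because a failed parse yields Nil, which is undefined, the result can be tested with with and else. This is a minimal sketch using a hypothetical grammar Digits.

```raku
# Hypothetical grammar: one or more digits and nothing else.
grammar Digits {
    token TOP { \d+ }
}

for '2024', '20x4' -> $input {
    # The "with" block runs only when parse returned a defined Match.
    with Digits.parse($input) -> $match {
        say "matched: $match";
    }
    else {
        say "no match for '$input'";
    }
}

# OUTPUT:
# matched: 2024
# no match for '20x4'
```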

method subparse

method subparse($target, :$rule = 'TOP', Capture() :$args = \(),  Mu :$actions = Mu, *%opt)

Does exactly the same as method parse, except that the cursor doesn't have to reach the end of the string to succeed. That is, it doesn't have to match the whole string.

Note that unlike method parse, subparse always returns a Match object, which will be a failed match (and thus falsy) if the grammar failed to match.

grammar RepeatChar {
    token start($character) { $character+ }
}
 
say RepeatChar.subparse('bbbabb', :rule('start'), :args(\('b')));
say RepeatChar.parse(   'bbbabb', :rule('start'), :args(\('b')));
say RepeatChar.subparse('bbbabb', :rule('start'), :args(\('a')));
say RepeatChar.subparse('bbbabb', :rule('start'), :args(\('a')), :pos(3));
 
 
# OUTPUT: 
# ｢bbb｣ 
# Nil 
# #<failed match> 
# ｢a｣
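Since subparse may stop before the end of the input, it is often useful to check where the match ended. The sketch below assumes a hypothetical grammar LeadingDigits and uses the to method of the returned Match to find the unparsed remainder.

```raku
# Hypothetical grammar: digits at the start of the input.
grammar LeadingDigits {
    token TOP { \d+ }
}

my $input = '123abc';
my $match = LeadingDigits.subparse($input);

if $match {
    # .to is the position just past the last matched character.
    say "matched '$match', stopped at position $match.to()";
    say "unparsed remainder: ", $input.substr($match.to);
}

# OUTPUT:
# matched '123', stopped at position 3
# unparsed remainder: abc
```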

method parsefile

method parsefile(Str(Cool) $filename, :$enc, *%opts)

Reads the file $filename using the encoding $enc, and parses its contents. All named arguments are passed on to method parse.

grammar Identifiers {
    token TOP        { [<identifier><.ws>]+ }
    token identifier { <initial> <rest>* }
    token initial    { <+myletter +[_]> }
    token rest       { <+myletter +mynumber +[_]> }
    token myletter   { <[A..Za..z]> }
    token mynumber   { <[0..9]> }
}
 
say Identifiers.parsefile('users.txt', :enc('UTF-8'))
    .Str.trim.subst(/\n/, ',', :g);
 
# users.txt : 
# TimToady 
# lizmat 
# jnthn 
# moritz 
# zoffixznet 
# MasterDuke17 
 
# OUTPUT: «TimToady,lizmat,jnthn,moritz,zoffixznet,MasterDuke17␤»
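For a self-contained check that does not depend on an existing users.txt, the sketch below writes a temporary file, parses it with parsefile, and compares the result with parsing the slurped contents directly. The grammar Lines and the file name are hypothetical.

```raku
# Hypothetical grammar: one or more non-empty lines, each ending in a newline.
grammar Lines {
    token TOP { [ \N+ \n ]+ }
}

# Create a throwaway input file in the system's temporary directory.
my $path = $*TMPDIR.add('grammar-parsefile-demo.txt');
$path.spurt("alpha\nbeta\n");

my $from-file   = Lines.parsefile($path.Str);
my $from-string = Lines.parse($path.slurp);

say $from-file.Str eq $from-string.Str;    # OUTPUT: «True␤»

$path.unlink;    # clean up the temporary file
```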