Skip to content

Will Xu's Blog

Parsing with crate LALRPOP

Parser, Rust1 min read

How to write a parser using rust and lalrpop crate?

In this article, we are going to walk through the implementation of parsing a string like this:

1This is a {dog, cat}.

And we want to expand this string into the following two strings:

1This is a dog.
2This is a cat.

As we can see, these strings in the brackets separated by comma (,) are expected to be expanded into different pieces and concatenated separately with the rest of the template as the final string.

And what's more, we would expect the brackets could be nested, like this:

1This is a {{blue , yellow , colorful }{apple, banana}, green {dog{, gy}, cat}}.

So this complicated string could be expanded into the following strings according to our rule:

1This is a blue apple.
2This is a blue banana.
3This is a yellow apple.
4This is a yellow banana.
5This is a colorful apple.
6This is a colorful banana.
7This is a green dog.
8This is a green doggy.
9This is a green cat.

How to implement?

These rules work like a little computer programming language. So we can use the common techniques that are used to implement a computer programming language like c, rust, javascript, etc.

We need to parse the string into an Abstract Syntax Tree, then we generate the string results by walking on the Abstract Syntax Tree.

The crate lalrpop is created for this purpose. It can be used to generate a parser by using a rule definition file provided by us.

Here it is:

  • Abstract Syntax Tree definition file: ast.rs
1///
2/// Expr
3/// = ( <Item::Word> | '{' (<Expr>,)* <Expr> '}' )*
4///
5pub struct Expr(pub Vec<Box<ExprItem>>);
6
7/// ExprItem
8pub enum ExprItem {
9 /// Word
10 Word(String),
11
12 /// Candidates = '{' (<Expr>,)* <Expr> '}'
13 Candidates(Vec<Box<Expr>>),
14}
  • Rule definition file: parser.lalrpop
1use std::str::FromStr;
2use crate::ast::{Expr, ExprItem};
3use lalrpop_util::ErrorRecovery;
4
5grammar<'err>(
6 errors: &'err mut Vec<
7 ErrorRecovery<usize, Token<'input>, &'static str>
8 >
9);
10
11pub Expr: Box<Expr> = {
12 <v: (<ExprItem>)*> => Box::new(Expr(v)),
13};
14
15ExprItem: Box<ExprItem> = {
16 <s:r"[^{},]*"> => Box::new(ExprItem::Word(s.to_string())),
17 "{" <v:Comma<Expr>> "}" => Box::new(ExprItem::Candidates(v)),
18};
19
20Comma<T>: Vec<T> = {
21 <v:(<T> ",")*> <e:T> => {
22 let mut v = v;
23 v.push(e);
24 v
25 },
26};

Next, we can write a function to expand the Expr structure.

1use crate::ast;
2use crate::parser;
3
4/// expand a string expression
5pub fn expand_string<'input, S: AsRef<str>>(
6 expression: &'input S,
7) -> Result<
8 Vec<String>,
9 lalrpop_util::ParseError<usize, parser::Token<'input>, &'static str>
10> {
11 let mut errors = Vec::new();
12 let expr_parser = parser::ExprParser::new();
13 let expr_str = expression.as_ref();
14 let expr = expr_parser.parse(&mut errors, expr_str)?;
15
16 let result = expand_expr(&expr);
17
18 Ok(result)
19}
20
21/// expand an ast::Expr expression
22pub fn expand_expr(expr: &ast::Expr) -> Vec<String> {
23 let mut result_list = Vec::<String>::new();
24 let mut stack = Vec::new();
25 expand_expr_impl(expr, 0, &mut stack, &mut |s: &mut Vec<String>| {
26 result_list.push(s.join(""));
27 });
28 result_list
29}
30
31fn expand_expr_impl(
32 expr: &ast::Expr,
33 idx: usize,
34 stack: &mut Vec<String>,
35 callback: &mut dyn FnMut(&mut Vec<String>),
36) {
37 // end of current `expr`
38 if idx == expr.0.len() {
39 callback(stack);
40 return;
41 }
42
43 let item: &ast::ExprItem = &expr.0[idx];
44 match *item {
45 ast::ExprItem::Word(ref val) => {
46 stack.push(val.to_owned());
47 expand_expr_impl(expr, idx + 1, stack, callback);
48 stack.pop();
49 }
50 ast::ExprItem::Candidates(ref candidates) => {
51 if candidates.is_empty() {
52 expand_expr_impl(expr, idx + 1, stack, callback);
53 } else {
54 for candidate in candidates {
55 expand_expr_impl(&candidate, 0, stack, &mut |res| {
56 // continue to expand next item in `expr`
57 // if current `candidate` ends
58 expand_expr_impl(expr, idx + 1, res, callback);
59 });
60 }
61 }
62 }
63 }
64}

Finally, we test our expand_expr function like this:

1fn main() {
2 let expressions = vec![
3 "this is a {dog, cat}.",
4 "this is a {{blue , yellow , colorful }{apple, banana}, green {dog{, gy}, cat}}."
5 ];
6
7 for expr_str in expressions.iter() {
8 println!(">> expanding string: {}", expr_str);
9 match expand::expand_string(expr_str) {
10 Err(err) => {
11 println!("expand failure: {}", err);
12 }
13 Ok(ref result_list) => {
14 for candidate in result_list {
15 println!("{}", candidate);
16 }
17 }
18 }
19 }
20}

And it will output:

1>> expanding string: this is a {dog, cat}.
2this is a dog.
3this is a cat.
4>> expanding string: this is a {{blue , yellow , colorful }{apple, banana}, green {dog{, gy}, cat}}.
5this is a blue apple.
6this is a blue banana.
7this is a yellow apple.
8this is a yellow banana.
9this is a colorful apple.
10this is a colorful banana.
11this is a green dog.
12this is a green doggy.
13this is a green cat.

Done.