How to write a parser using rust
and lalrpop
crate?
In this article, we are going to walk through the implementation of parsing a string like this:
1This is a {dog, cat}.
And we want to expand this string into the following two strings:
1This is a dog.2This is a cat.
As we can see, these strings in the brackets separated by comma (,
) are
expected to be expanded into different pieces and concatenated separately with
the rest of the template as the final string.
And what's more, we would expect the brackets could be nested, like this:
1This is a {{blue , yellow , colorful }{apple, banana}, green {dog{, gy}, cat}}.
So this complicated string could be expanded into the following strings according to our rule:
1This is a blue apple.2This is a blue banana.3This is a yellow apple.4This is a yellow banana.5This is a colorful apple.6This is a colorful banana.7This is a green dog.8This is a green doggy.9This is a green cat.
How to implement?
These rules work like a little computer programming language. So we can use the common techniques that
are used to implement a computer programming language like c
, rust
, javascript
, etc.
We need to parse the string into an Abstract Syntax Tree
, then we generate the string
results by walking on the Abstract Syntax Tree
.
The crate lalrpop
is created for this purpose. It can be used to generate a parser
by using a rule definition file provided by us.
Here it is:
ast.rs
1///2/// Expr3/// = ( <Item::Word> | '{' (<Expr>,)* <Expr> '}' )*4///5pub struct Expr(pub Vec<Box<ExprItem>>);67/// ExprItem8pub enum ExprItem {9 /// Word10 Word(String),1112 /// Candidates = '{' (<Expr>,)* <Expr> '}'13 Candidates(Vec<Box<Expr>>),14}
parser.lalrpop
1use std::str::FromStr;2use crate::ast::{Expr, ExprItem};3use lalrpop_util::ErrorRecovery;45grammar<'err>(6 errors: &'err mut Vec<7 ErrorRecovery<usize, Token<'input>, &'static str>8 >9);1011pub Expr: Box<Expr> = {12 <v: (<ExprItem>)*> => Box::new(Expr(v)),13};1415ExprItem: Box<ExprItem> = {16 <s:r"[^{},]*"> => Box::new(ExprItem::Word(s.to_string())),17 "{" <v:Comma<Expr>> "}" => Box::new(ExprItem::Candidates(v)),18};1920Comma<T>: Vec<T> = {21 <v:(<T> ",")*> <e:T> => {22 let mut v = v;23 v.push(e);24 v25 },26};
Next, we can write a function to expand the Expr
structure.
1use crate::ast;2use crate::parser;34/// expand a string expression5pub fn expand_string<'input, S: AsRef<str>>(6 expression: &'input S,7) -> Result<8 Vec<String>,9 lalrpop_util::ParseError<usize, parser::Token<'input>, &'static str>10> {11 let mut errors = Vec::new();12 let expr_parser = parser::ExprParser::new();13 let expr_str = expression.as_ref();14 let expr = expr_parser.parse(&mut errors, expr_str)?;1516 let result = expand_expr(&expr);1718 Ok(result)19}2021/// expand an ast::Expr expression22pub fn expand_expr(expr: &ast::Expr) -> Vec<String> {23 let mut result_list = Vec::<String>::new();24 let mut stack = Vec::new();25 expand_expr_impl(expr, 0, &mut stack, &mut |s: &mut Vec<String>| {26 result_list.push(s.join(""));27 });28 result_list29}3031fn expand_expr_impl(32 expr: &ast::Expr,33 idx: usize,34 stack: &mut Vec<String>,35 callback: &mut dyn FnMut(&mut Vec<String>),36) {37 // end of current `expr`38 if idx == expr.0.len() {39 callback(stack);40 return;41 }4243 let item: &ast::ExprItem = &expr.0[idx];44 match *item {45 ast::ExprItem::Word(ref val) => {46 stack.push(val.to_owned());47 expand_expr_impl(expr, idx + 1, stack, callback);48 stack.pop();49 }50 ast::ExprItem::Candidates(ref candidates) => {51 if candidates.is_empty() {52 expand_expr_impl(expr, idx + 1, stack, callback);53 } else {54 for candidate in candidates {55 expand_expr_impl(&candidate, 0, stack, &mut |res| {56 // continue to expand next item in `expr`57 // if current `candidate` ends58 expand_expr_impl(expr, idx + 1, res, callback);59 });60 }61 }62 }63 }64}
Finally, we test our expand_expr
function like this:
1fn main() {2 let expressions = vec![3 "this is a {dog, cat}.",4 "this is a {{blue , yellow , colorful }{apple, banana}, green {dog{, gy}, cat}}."5 ];67 for expr_str in expressions.iter() {8 println!(">> expanding string: {}", expr_str);9 match expand::expand_string(expr_str) {10 Err(err) => {11 println!("expand failure: {}", err);12 }13 Ok(ref result_list) => {14 for candidate in result_list {15 println!("{}", candidate);16 }17 }18 }19 }20}
And it will output:
1>> expanding string: this is a {dog, cat}.2this is a dog.3this is a cat.4>> expanding string: this is a {{blue , yellow , colorful }{apple, banana}, green {dog{, gy}, cat}}.5this is a blue apple.6this is a blue banana.7this is a yellow apple.8this is a yellow banana.9this is a colorful apple.10this is a colorful banana.11this is a green dog.12this is a green doggy.13this is a green cat.
Done.