YARD (Yet Another Recursive Descent Parser)

by Christopher Diggins
part of OOTL.org

Table Of Contents

  1. Introduction
  2. Status and Downloads
  3. A Little Language Theory
  4. BNF - Backus Naur Form
  5. R-D Parsers
  6. Further Reading

Introduction

The YARD parser is an open-source library for constructing high-performance recursive-descent parsers in C++ at compile-time using a syntax approximating EBNF (Extended Backus-Naur Form). The distinguishing features of YARD are:

  • no intermediate code generation step
  • fast compilation
  • fast execution
  • powerful extended BNF syntax
  • small code footprint
  • no third party library dependencies
  • easily extended
  • generic (parses arbitrary data)
  • automated parse tree generation
  • header only

In plain english, the YARD parser is a pattern matching library, plain and simple, but as anyone who has done pattern matching knows, pattern matching is anything but simple.

The YARD library comes with several pre-built pattern matchers, but what is truly interesting is the meta-language it provides for expressing patterns as types. For more on that see the section on grammars, but first I want to provide a bit of theoretical background.

Motivation

Heron was motivated by the need to come up with a parsing engine for a non-trivial language, which was efficient, with a small footprint, and easily modified. I needed the parsing rules to reflect the grammar production rules as closely as possible, and I wanted the generation of templates to be visible to the user. This ruled out expression-template based parsers such as Boost Spirit.

Status and Downloads

The latest source-code of the YARD parser is available as part of the Heron Download. I provide no technical support for YARD, but I have been using it for two years to develop parsers and compilers. I

YARD can only be compiled on highly confroming compilers such as the Visual C++ 7.1 and GCC 3.4 compilers. YARD uses advanced template meta-programming techniques.

A Little Language Theory

For those interested here is some background in patterns and language theory.

Pattern matching is fundamental to the study of computer science, as every algorithmic problem can be expressed in terms of a pattern matching problem. In computer science theory, what I have been casually calling so far a pattern, is known as a formal language, or just language.

A formal language is a set of strings of symbols from an alphabet of symbols.
There are several theoretical classes of languages, some of the more studied are:
  • regular languages
  • context-free languages
  • context-sensitive languages
  • recursively enumerable languages
These classifications of languages are known collectively as the Chomsky hierarchy, named after the acclaimed linguist Noam Chomsky.

Languages are commonly defined as a set of productions (also known as substitution rules or generation rules), known as a grammar.

BNF (Backus-Naur Form) and EBNF

A BNF (Backus-Naur Form) is a formal method of specifying context free grammars (CFG). A BNF is expressed as a set of production rules of one of the three forms:

  X ::== A* // kleene star, repeatedly match A 0 or more times
  Y ::== A | B // union, match A or B
  Z ::== AB // concatenation, match A then B

EBNF standads for extended BNF, and represents an informal set of extensions to BNF, which are designed to make BNF more convenient to work with.

Recursive Descent (R-D) Parsers

The distinguishing characteristic of a R-D parser, is that it employs a recursive brute-force algorithm to attempt to match the various grammar production rules. If a match fails, the parser backs up to the position it was when it attempted to match the failing rule.

Tradtionally R-D parsers were shunned due to efficiency concerns, but that is a misnomer. Their performance can be very good as evidenced by the performance of parsers built using YARD and other successful R-D parsing libraries.

The primary advantage of R-D parsers are their simplicity and elegance.

Further Reading

Related Projects

  • The Biscuit Parser Library - Biscuit is an object-oriented recursive-descent parser generator framework implemented using class templates. Inspired by YARD.
  • Boost Spirit - Spirit is an object oriented recursive descent parser framework implemented using template meta-programming techniques and is part of Boost.

Tutorials

Books


last modified 2005-10-26
by Christopher Diggins
back to http://www.ootl.org