Saturday, July 25, 2009

C♯ Parser


It's taken a long time and a lot of research but I'm finally making headway on the C♯ parser I've been writing (well writing the parser compiler to write...).

Previously, I decided to attempt a backtrack-less, left-most derivation, recursive descent parser. In the end, all the research told me: it's not worth it. Other LL(*) implementations, such as ANTLR, that use recursive descent and error recovery features to handle back-tracking, typically involve large amounts of spaghetti code that make reading or discovering the original language description nigh impossible.

So I thought: there has to be a better way. So far the research into that has been a struggle, there's little information, from what I can tell, on non-recursive descent parsing techniques that are top-down left-most derivation in nature, especially if you're interested in LL(*) vs LL(k).

Another issue with Recursive Descent is Left recursion, in three different varieties: Direct, indirect and hidden. Hidden is the most irksome case, but important nonetheless. To alleviate this entire issue, I shifted gears and started down a deterministic model that would work similar to a DFA for Regular Expressions, except this one would accept recursion.

Instead of pre-compiling this information into the application, due to cyclic model issues, I decided that it would be more appropriate to focus on embedding the logic to compute this directly in the resultant application. With this, the application can delay large computational operations related to look-ahead/follow sets until the information is absolutely necessary (as in, what's next right now?)

Presently I'm hand-writing the logic, using the C# parser as a basis for data, and once I'm done I can bake the logic associated into the parser builder. I'll post more once there's more to be said.