Sunday, March 22, 2009

Look ahead exploration - LL(k)

Well about a week later, and a few dozen failures, I've learned a thing or two.

The initial attempt at a 'syntax tree' yielded a system wherein I had a true tree of the syntax, but the problem was, it didn't do a damn bit of good for a Recursive Descent parser. Extracting the information about which rule was related to what look-ahead was difficult and hackish, at best a guess. Which doesn't work if you don't want a backtracking RD parser.

So I decided to use the Syntax tree in a limited form: It'll express each rule in their flatform with no branching from that point. Next I obtain the full look-ahead for the rule at a given point in the rule, this does depth exploration and tells me the full scope of tokens that are available for a given look, but it doesn't tell me which it unifies on and so on. So I took it a step further, and created a look-ahead path tree. This yields a dataset which informs me of which tokens are used, and what path is necessary to get to them (ie. a stateless transition table that only explores rules relative to a given token). After obtaining the full look-ahead, I condensed the results so that common paths for the same token aren't duplicated (this ensures that the logic build for this set is minimal, and not redundant).

The next step is to utilize the above data with the rule flat-forms and build a more accurate tree of what's necessary, it'll contain the look-ahead information as well as everything else, which should assist in the overall logic construction. I'll have to add in a special case for direct and indirect left recursion, but it should be fine, I hope.

Once this nonsense is done, and I have it recognizing the language, I can focus on integrating the previously structured data sets. If it doesn't work, I'll just rewrite that part until it does.

No comments: