Wednesday, January 20, 2016

Debugger Type Proxies are Awesome

In OILexer, you can define a rule as an alternation of other rules, such as:

A ::=> /* The '>' represents a point of collapse. */
   B |
   C |
   D ;

I was working on OILexer's code gen process, and I've always been a little miffed by how deep I have to dive into object graphs for left-recursive representations of mathematical operator precedence (* before +, and so on).

Sometimes, just to see that the user typed 5, I would have to go n levels deep, where n is the number of precedence levels defined in the language.

Instead of making a type that represents 'A' and giving it a single child that is either a B, C, or D, the logical solution was to use inheritance: if you have a 'D' in place of that 'A', you get a derivative of D which is also an A.
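To make the difference concrete, here's a rough model of the two shapes in Python. This is only an illustration, not what OILexer generates (its output is .NET types), and every name in it (A, D, AWrapper, DAsA, depth_to_content) is made up for the example.

```python
# Illustrative Python model only; OILexer itself generates .NET types.
# All names here (A, D, AWrapper, DAsA, depth_to_content) are hypothetical.

class A:
    """Marker base for the collapsed rule A ::=> B | C | D."""

class D:
    """A concrete rule that carries real content."""
    def __init__(self, text):
        self.text = text

class AWrapper:
    """Wrapper approach: an A node that holds exactly one child."""
    def __init__(self, child):
        self.child = child

class DAsA(D, A):
    """Collapsed approach: a derivative of D that is also an A."""

def depth_to_content(node):
    """Count how many wrapper levels sit above the node with real content."""
    depth = 0
    while isinstance(node, AWrapper):
        node = node.child
        depth += 1
    return depth

# Three precedence levels modeled as three nested wrappers around one D.
wrapped = AWrapper(AWrapper(AWrapper(D("5"))))
# The collapsed form is the content node itself, usable wherever an A is expected.
collapsed = DAsA("5")
```

With the wrapper shape you click through one node per precedence level before reaching the 5; with the collapsed shape the node you're handed already is the D.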

This is all well and good, but when you're trying to debug an app, it can create some developer bottlenecks (time-wise).

Here's an example of what debugging was like before the automated DebuggerTypeProxy instances:

After automating the debugger type proxies:

You'll quickly notice that getting to the meat of the definition is nearly immediate by comparison.

The example, in both screenshots, represents the opening concept, just expanded out a bit further.

This does create some nasty object hierarchies but the trade-off of generated types for the simplicity of browsing the object hierarchy makes it worth it in the end.

OILexer - Time for Errors

The next step with OILexer is error handling. I have very basic error detection logic in place, but as of right now it doesn't handle errors at all; it just kind of ignores them. On left-recursive rules, it unwinds back to a 'valid' point and continues on.


On non-left-recursive rules, it marks the rule as having an error and goes from there.


What I'll likely end up doing is keeping track of how far into the stream a particular parse gets, and enumerating the error collection if the EOF marker hasn't been hit. Then, based on how far the 'bad' rule got, I can isolate what was valid at that point and provide a logical error for the results.
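A minimal sketch of that 'furthest point reached' idea, in Python for brevity. None of this is OILexer code: the tracker, the token shapes, and the tiny NUM ('+' NUM)* grammar are all assumptions made for the example.

```python
# Hypothetical sketch: ErrorTracker, expect, parse_sum and report are
# illustrative names and not part of OILexer.

class ErrorTracker:
    """Remembers the furthest token position at which a match failed."""
    def __init__(self):
        self.furthest = -1
        self.expected = set()

    def fail(self, pos, what):
        if pos > self.furthest:
            self.furthest, self.expected = pos, {what}
        elif pos == self.furthest:
            self.expected.add(what)

def expect(tokens, pos, kind, tracker):
    """Return pos + 1 if tokens[pos] has the given kind, else record a failure."""
    if pos < len(tokens) and tokens[pos][0] == kind:
        return pos + 1
    tracker.fail(pos, kind)
    return None

def parse_sum(tokens, tracker):
    """Parse NUM ('+' NUM)* and return the position where parsing stopped."""
    pos = expect(tokens, 0, "NUM", tracker)
    if pos is None:
        return 0
    while True:
        after_plus = expect(tokens, pos, "PLUS", tracker)
        if after_plus is None:
            return pos
        after_num = expect(tokens, after_plus, "NUM", tracker)
        if after_num is None:
            return pos  # unwind to the last valid point
        pos = after_num

def report(tokens):
    """Enumerate the recorded errors only if the parse stopped before EOF."""
    tracker = ErrorTracker()
    stopped = parse_sum(tokens, tracker)
    if stopped == len(tokens):
        return None  # reached EOF: recorded failures were false positives
    return "at token %d: expected %s" % (
        tracker.furthest, " or ".join(sorted(tracker.expected)))
```

Note that even a successful parse records a failure (the loop has to fail to find another '+' in order to stop), which is exactly the kind of false positive that gets thrown away once the end of the stream is reached.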


I might have to do occasional polling of the error stream as the parse continues and eliminate the false positives.


Once error handling is in place, my next step is to work out how to detect recursively infinite systems, or more accurately, how to reduce them into something that can complete. The analysis of certain mainstream languages, like C#, doesn't complete; it just spins around the same sets of rules indefinitely.


This is largely due to rules that share common token patterns which are recursively enumerable in a way where they replicate the same pattern and just get longer and longer in their depth. An example is relational comparison expressions and generic types. Both use Identifier, Less Than, Identifier, ad nauseam, and this results in something that 'does not complete'. The web of rule identities and the logical path to how it got to each is a spectacular mess; trying to put some logic in place that isolates these cases will be a challenge.


Once these are figured out, the likely solution will be to 'parse one, then try the other', since logically one will complete with success and the other will likely not. The fun part is that it can sometimes lead to two valid paths, in which case the 'longest' one wins.
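Here's a toy version of 'parse one, then try the other, longest wins', again in Python and again purely illustrative. The two rules are drastically simplified stand-ins (no nested generics, tokens are plain strings in a list), and the function names are hypothetical.

```python
# Hypothetical sketch; the rule names and token kinds are illustrative only.
# Tokens are passed as a list of strings such as ["ID", "<", "ID", ">"].

def parse_relational(tokens):
    """ID ('<' ID)* : return how many tokens were consumed, or None."""
    if not tokens or tokens[0] != "ID":
        return None
    pos = 1
    while tokens[pos:pos + 2] == ["<", "ID"]:
        pos += 2
    return pos

def parse_generic(tokens):
    """ID '<' ID '>' : return how many tokens were consumed, or None."""
    return 4 if tokens[:4] == ["ID", "<", "ID", ">"] else None

def longest_match(tokens):
    """Try each alternative in turn and keep the one that consumed the most."""
    best_name, best_len = None, -1
    for name, rule in (("relational", parse_relational),
                       ("generic", parse_generic)):
        consumed = rule(tokens)
        if consumed is not None and consumed > best_len:
            best_name, best_len = name, consumed
    return best_name, best_len
```

For ["ID", "<", "ID", ">"] both rules succeed, but the relational rule stops after three tokens while the generic rule consumes all four, so the generic reading wins.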


After all this is said and done, I should have enough in place to bootstrap the real OILexer, an amalgamation of C#-esque syntax with OILexer's syntax mixed in.


Pattern matching is an integral part of a large number of domains, and simplifying the means by which it can be used is one of my goals.


To that end, I'll need to do a lot of 'preprocessing' work with OILexer's data once that phase is hit, as it's extremely slow at doing its thing. This is largely due to the complex set analysis work it does. Once I have the Metadata Library writer, though, I could essentially write out the tables that represent the OILexer code, pack them into the resulting assembly, and reuse them whenever they are as new as or newer than the source files that make up that portion of the assembly. There'll need to be patch-up work, but the result will effectively be like the .obj files that C++-style compilers use.
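The 'as new as or newer than the sources' check could look roughly like this. It's a Python sketch that compares file modification times on disk; the real thing would be looking at tables packed into an assembly, so treat the function name and the file-based approach as assumptions.

```python
import os

def cache_is_fresh(cache_path, source_paths):
    """True if the cache exists and is as new as or newer than every source."""
    if not os.path.exists(cache_path):
        return False
    cache_time = os.path.getmtime(cache_path)
    # Any source modified after the cache was written makes the cache stale.
    return all(os.path.getmtime(src) <= cache_time for src in source_paths)
```

If this returns False, the slow set analysis runs again and the cached tables are rewritten; otherwise they can be loaded as-is.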