The next step with OILexer is error handling. I have very (basic) error detection logic in place, but as of right now, it doesn't handle errors at all, it just kind of ignores them. On left-recursive rules it unwinds back to a 'valid' point and continues on.
On non-left recursive rules it marks the rule as having an error and goes from there.
What I'll likely end up doing is keeping track of how far into the stream a particular parse gets and the error collection will be enumerated if the EOF marker hasn't been hit. Then, based on how far the 'bad' rule got, isolate what was valid at that point, and provide a logical error for the results.
I might have to do occasional polling of the error stream as the parse continues and eliminate the false positives.
Once error handling is in place, my next step is to isolate how to detect recursively infinite systems, or more accurately how to reduce them into something that can complete. Certain normal languages, like C#, don't complete, they just spin around the same sets of rules indefinitely.
This is largely due to the rules that share common token patterns that are recursively enumerable in a way where they replicate the same pattern and just get longer, and longer in their depth. An example is relational comparison expressions and generic types. Both use Identifier, Less Than, Identifier, ad nauseum and this results in something that 'does not complete'. The web of rule identities and the logical path to how it got to each is a spectacular mess, trying to put some logic in place that isolates these cases will be a challenge.
Once these are figured out, the likely solution will be to 'parse one, then try the other' Since logically one will complete with success and the other will likely not. The fun part is it can sometimes lead to two valid paths, and the 'longest' winner takes home the prize.
After all this is said and done, I should have enough in place to 'boot strap' the real OILexer, an amalgamation between C#-esque syntax with OILexer's syntax mixed in.
Pattern matching is in itself an integral part of a large number of domains, and simplifying the means to which it can be used is one of my goals.
To that end, I'll need to do a lot of 'preprocessing' work with OILexer's data once that phase is hit, as it's extremely slow at doing its thing. This is largely due to the complex set analysis work it does, but once I have the Metadata Library writer, I could essentially write the tables that represent the OILexer code and pack that into the resulting assembly if it's as new or newer than the source files that make up that portion of the assembly. There'll need to be patch-up work, but it'll effectively be .obj files that C++-style compilers use.