Well, after lots of work, I think the predictive logic is to the point where I can finally start the major work of creating the object model, lexer, parser and so on.
As of right now, computing the language's predictions takes 14 minutes and 105 million comparisons across Rule NFA and DFA Generation, State->State Ambiguity Prediction resolution, and Follow-State ambiguity resolution.
The dangling-else issue is resolved by only considering situations where what follows an edge state matches what the calling rule requires. So the typical dangling else isn't a concern, because the parent context doesn't require that information.
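Here's a minimal sketch of one way to read that rule. It's a hypothetical model, not OILexer's code: `FollowItem`, `resolve_edge`, and the `required` flag are made up for illustration. A token at an edge state is only treated as ambiguous when the calling rule *requires* that same token next. A continuation the caller merely allows, like an outer `else`, is left to the greedy local path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FollowItem:
    """One token the calling rule may expect after the callee returns (hypothetical model)."""
    token: str
    required: bool  # True if the caller cannot complete without this token here


def resolve_edge(local_tokens, caller_follow):
    """Decide, per token at an edge state, whether to keep consuming locally.

    local_tokens: tokens the current rule can still consume at its edge state.
    caller_follow: FollowItem entries describing what the calling rule expects next.
    Returns a dict mapping each local token to "consume" or "ambiguous".
    """
    required_by_caller = {f.token for f in caller_follow if f.required}
    decisions = {}
    for tok in local_tokens:
        # Only a token the caller *requires* makes returning a real alternative;
        # an optional continuation is resolved greedily in favor of the local rule.
        decisions[tok] = "ambiguous" if tok in required_by_caller else "consume"
    return decisions


# Dangling else: the inner 'if' can take 'else'; the outer 'if' only *allows* one.
outer_if_follow = [FollowItem("else", required=False), FollowItem(";", required=False)]
print(resolve_edge({"else"}, outer_if_follow))  # {'else': 'consume'}

# Contrast: a caller that requires ')' right after the callee returns.
print(resolve_edge({")"}, [FollowItem(")", required=True)]))  # {')': 'ambiguous'}
```

The design choice here is that "ambiguous" just flags the edge for deeper look-ahead. It doesn't decide which path wins.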
Currently OILexer takes three minutes for the general-case look-ahead prediction but 11 minutes for 'Object Model Construction'. That phase also handles the follow-edge ambiguities, so I need to break the process into finer steps to time each one properly.
Behind the scenes, that phase also identifies key named elements: if you call out a rule or token by name, it creates a capture bucket for it. I've yet to bring this all together, but the scaffolding is there.
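To make the capture-bucket idea concrete, here's a rough sketch under assumed semantics. It isn't OILexer's object model: `Element`, `capture_buckets`, and the "single"/"list" distinction are hypothetical. Every rule or token reference called out by name (or given an explicit label) gets a bucket, literals get none, and a name that occurs more than once needs a list-valued bucket.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Element(NamedTuple):
    kind: str                    # "rule", "token", or "literal" (hypothetical categories)
    text: str                    # rule/token name, or the literal text
    label: Optional[str] = None  # optional explicit capture name


def capture_buckets(elements):
    """Return {bucket name: "single" | "list"} for elements called out by name."""
    counts = Counter()
    for el in elements:
        if el.kind in ("rule", "token"):  # literals don't get a capture bucket
            counts[el.label or el.text] += 1
    # A name referenced more than once must be able to hold several captures.
    return {name: "list" if n > 1 else "single" for name, n in counts.items()}


# if '(' Condition:Expression ')' Statement else Statement
# (optional groups are flattened for brevity)
if_statement = [
    Element("literal", "if"),
    Element("literal", "("),
    Element("rule", "Expression", label="Condition"),
    Element("literal", ")"),
    Element("rule", "Statement"),
    Element("literal", "else"),
    Element("rule", "Statement"),
]
print(capture_buckets(if_statement))  # {'Condition': 'single', 'Statement': 'list'}
```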
I would upload and link the file that denotes the transitional points of the language, but the traversable HTML it generates is 585 MB. Suffice it to say, it can't currently process the language in x86 mode; I have to compile for x64 to give it more legroom.
I'll post back as I make more progress. I just know something is going to bite me in the arse, but time will tell.
This is just the parser phase; the compiler aspects are handled by Abstraction itself, which is a static compiler framework. I'm looking forward to the road ahead!
Tuesday, February 24, 2015
T*y♯ - C♯-esque language with OILexer mixed in.
Saturday, February 21, 2015
LL(*) Again
It seems the further I get, the farther I am from my goal.
I decided to further expand the pseudocode generated by the system and found gobs of bugs and shortcuts that don't make sense from the standpoint of actually figuring out where to go next (a->b->c, parse b->b->c??).
C'est la vie; you learn by making mistakes. I thought it was going through a C♯-esque language too quickly, and it turns out it was!
I've noticed a few things: my grammar was ambiguous in multiple areas, leading to an indeterminate state explosion that was planning on eating all of my system's resources (all 12 GB).
The path sets that result from constructing just the statements and expressions for the language yield a 133 MB file of disambiguation pseudocode. Most of it simply tracks the paths responsible for reaching each decision point.
My next step is to dig into the T*y♯ language to find out why it's failing assertions on certain predictive logic.