Monday, January 26, 2015

Revived! -- LL(*) prediction.

I'm finally making progress in this project.

Working on predictive LL(*) style parsers which will provide a prediction mechanism any time there is a non LL(1) compliant choice to be made.

Let's say you have a grammar which describes a class member declaration:

ClassMemberDeclaration                   ::=>
    ConstantDeclaration                     |
    FieldDeclaration                        |
    MethodDeclaration                       |
    LanguageDeclaration                     |
    PropertyDeclaration                     |
    EventDeclaration                        |
    IndexerDeclaration                      |
    OperatorDeclaration                     |
    ConstructorDeclaration                  |
    DestructorDeclaration                   |
    StaticConstructorDeclaration            |
    TypeDeclaration                         ;

All of these members are allowed 'Attributes' which begin with a '['.  Typically this would cause an LL(1) parser generator to halt, but here OILexer does the work to guess which path is valid:

// State 298 requires prediction: Predict298OnClassMemberDeclaration.
Within this method, when it encounters the '[' for the attributes it knows that all paths point to 'Attributes' so instead of trying to guess all of the tokens for attributes and then what's after that will disambiguate, it will simply call the ParseAttributes() and disambiguate further from there.

The look-ahead table is symbolic in nature, so be it a token or a rule, it'll go into the stream.

The results of the 'predict' method are the target state that the deterministic machine generated for a given rule would have gone into could it guess properly in one token.  So it not only tells it which rule to call, but where to go next.

The state machines are heavily reduced for now, but as I learn the viability of the process, it'll be better decidable.  There is a caveat, there might be possible incorrect guesses for now, and here's why:
  • Edge states which overlap the owning rule's tokens, the prediction might guess incorrectly.  
I'll be looking into fixing this, but for now I want to construct a parser from this point, and anything beyond that is a bonus.