Newer Older
1 2 3 4
This file contains a series of ideas and remarks that could be in
the TODO list -- except I do not intend to do anything about them,
for now.

POTTIER Francois's avatar
POTTIER Francois committed
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
* Incompatibility with ocamlyacc/yacc/bison: these tools guarantee
  to perform a default reduction without looking ahead at the next
  token, whereas Menhir does not.
  (See messages by Tiphaine Turpin from 30/08/2011 on.)
  - Changing this behavior would involve changing both back-ends.
  - Changing this behavior could break existing Menhir parsers.
    (Make it a command line option.)
  - This affects only people who are doing lexical feedback.
  - Suggestion by Frédéric Bour: allow annotating a production with %default
    to indicate that it should always be a default reduction. (Not sure why
    this helps, though.)
  - Think about end-of-stream conflicts, too.
    If there is a default reduction, there is no end-of-stream conflict.
      (Do we report one at the moment?)
    If there is a conflict, why do we arbitrarily solve it in favor of
    looking ahead (eliminating the reduction on #)? What does ocamlyacc do?

* Idea following our work with Jacques-Henri: allow asserting that the
  distinction between certain tokens (say, {A, B, C}) cannot influence
  which action is chosen. Or (maybe) asserting that no reduction can take
  place when the lookahead symbol is one of {A, B, C}. The goal is to ensure
  that a lexer hack is robust, i.e., even if the lexer cannot reliably
  distinguish between A, B, C, this cannot cause the parser to become

30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
* Incompatibility with ocamlyacc: Menhir discards the lookahead token when
  entering error mode, whereas ocamlyacc doesn't. (Reported by Frédéric Bour.)

* Look into the format of Bison's tables, and see if we could produce such
  tables upon demand. This could hopefully / possibly be implemented outside
  Menhir using the .cmly API.

* Implement $0, $-1, etc.
  (Along the lines of Bison. They are a potentially dangerous feature, as
   they allow peeking into the stack, which requires the shape of the stack
   to be known / guessed by the programmer.)
  Propose a named syntax, perhaps <x> foo: ...
    where the name x is bound to the value in the topmost stack cell.
  Ensure that the mechanism is type-safe.
    (Is --infer required? I don't think so. Just issue type constraints.)
    (Requires an analysis so that the shape of the stack is known. The
     existing analysis in Invariant may be sufficient.)
  Implement it in both back-ends.
  On top of this mechanism, it is easy to implement mid-rule actions (à la Bison).
  On top of that, it should be easy to implement inherited attributes (à la BtYacc).
  However, my impression so far is that the whole thing is of interest
  mainly to people who are doing lexer hacks. Otherwise, it is much easier
  to just parse, produce a tree, and transform the tree afterwards.
POTTIER Francois's avatar
POTTIER Francois committed
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68

* Instead of directly invoking ocamlc (in --infer mode), Menhir could
  dump an file,
  expect the build system to invoke "ocamlc -i" and create .actions.mli,
  and continue (in a second invocation) from there.
  (Suggestion by Fabrice Le Fessant.)

* Try generating a canonical automaton, resolving conflicts, and THEN
  minimizing the automaton. Would the result be close to IELR(1)?
  (Unfortunately, producing a canonical automaton remains a bit costly:
  8 seconds for OCaml's grammar.)

* Why does --canonical --table consume a huge amount of time on a large grammar?
  (3m57s for ocaml.mly, versus 16s without --table)
  Display how much time is spent compressing tables.
POTTIER Francois's avatar
POTTIER Francois committed
69 70 71 72 73 74 75

* Explain end-of-stream conflicts, too.

* If one wishes to assign a priority level to a token, without choosing
  an associativity status (%left, %right, %nonassoc) one should be allowed
  to declare %neutral and obtain an unspecified associativity status
  (causing an error if this status is ever consulted).
POTTIER Francois's avatar
POTTIER Francois committed
76 77 78 79 80 81 82 83

* Since the table back-end does not use the module Invariant,
  it should be possible to save time, in --table mode,
  by not running the analysis in Invariant.
  That would require making Invariant a functor,
  and calling it inside CodeBackend, CoqBackend, and Interpret (complicated).
  Or perhaps just running the computation on demand (using lazy)
  but that makes timing more difficult.