Is anyone looking at this? At first glance it seems straightforward.
Marpa-driven combinator parsing and tackle it.
Post by RockyPost by Jeffrey Kegler@Rocky: This is interesting to me. I've printed out your article and
will read it.
Lukas is right that natively Marpa (and just about any other parser) has
trouble with the "reverse Hollerith code" for lists. It actually makes the
grammar context-sensitive. But I think I might have some interesting ideas
about how to deal with this using Marpa,
Ah yes,... I had forgotten about Hollerith codes in FORTRAN and how it has
long been used as an example of something that is not context free. .
Post by Jeffrey KeglerI'm taking advantage of one of the beautiful things about PEG grammars
- that they can be created and run at run time and therefore tailored to
specific use cases.
Marpa can do this and is actually much better at it. Any grammar you
enter you get is parsed correctly, and any LR-regular grammar is parsed in
linear time.
One thing weird about my particular use case is that because I'm handling
basically 20 different grammars in a hierarchically OO way (to reduce
duplication), many of which are only slightly different, the parsing tool I
have also has a way of *removing *grammar rules. And it has a way of
profiling grammar coverage.
How does Marpa do here? (Providing grammar profiling is in fact pretty
trivial to add).
Post by Jeffrey KeglerI get a good deal of heat (and I suspect many reddit downvotes) for
pointing out that PEG only correctly implements a grammar if it is LL(1).
Go beyond that, and you get a subset of the language that the BNF describes.
Congratulations btw on seeing the power of 2nd order languages. Perhaps
you'll try them out with Marpa and will find them much easier and safer.
Actually this reminds me of a big favor I'd like to ask of someone who
uses Marpa.
Just a little background first to motivate this....
I'll be giving a talk at the Glasgow YAPC 2018, really to promote this
idea of decompiling at runtime to provide better position information.
I somewhat arbitrarily broke the topic up into two parts: describing B::DeparseTree
<https://metacpan.org/pod/B::DeparseTree>and showing how it can be used
in a gdb-like debugger I wrote for Perl Devel::Trepan
<https://metacpan.org/pod/Devel::Trepan>.
As part of an incentive to have a talk presented, I often beef up the code
for whatever I'm going to talk about. And for Devel::Trepan I thought that
a cool addition would be to
replace a lot of hacky code I have certain debugger commands and use Marpa
to simplify some of that. But given the amount of time I have spent and
will have to spend on B::DeparseTree see Rewriting B:Deparse and
Reintroducing B::DeparseTree and (part 1)
<http://blogs.perl.org/users/rockyb/2018/06/introducing-bdeparsetree-and-rewriting-bdeparse-part-1.html>,
I don't think I'll have time to work much on Devel::Trepan. Here's what I
had been
planning to do if I had more time, and I hope someone might help me out
here..
A gdb debugger like Devel::Trepan has (or should have) a notion of a
"location"; these are used in gdb "list" and "breakpoint" commands. But
building on that, are the concepts of a list range (in the "list"
command), and a breakpoint location (in the "break" command - it adds an
optional condition). These are all well defined in gdb.
In the corresponding Python gdb-like debugger, I have already done this.
So I already have grammars for these
<https://github.com/rocky/python3-trepan/blob/master/trepan/processor/parse/parser.py#L104-L146>
.
There is one deviation I make from gdb parsing. Rather than use some sort
of semantic lookup to distinguish between a function name and a file path,
I require the former be suffixed with "()". Having used this for a while it
really isn't a burden to add the () when needed and it makes things simpler
and cleaner looking.
I'd be eternally grateful if someone would volunteer to write the Marpa
equivalent grammar and enough of the semantics to capture the result
information in some sort of structure. To make things easier you don't have
to integrate this into Devel::Trepan if you don't want, it can just be a
standalone program. But if you want to add it it that's fine too ;-)
Having this done would cement Marpa and Devel::Trepan together, and I can
make a plug for it at YAPC 2018. (I was going to make a plug for it, anyway
as something I had intended to do. Of course it is better show that it's
been done rather than mention it as a todo item.)
Thanks everyone.
Post by Jeffrey KeglerOn Fri, Jun 22, 2018 at 8:30 AM, Lukas Atkinson <
Post by Lukas AtkinsonHi Rocky,
I don't think there's a way to avoid all those parser states.
However, trying to parse a BUILD_LIST_1000 rule seems very unusual to
me, since you'd have to create a BUILD_LIST<N> rule for all N.
I first scan opcodes for those that list the number of operands they
have, then custom grammar rules are
created for those.
It's clearly not workable to spell out all those rules in the grammar.
Post by Lukas AtkinsonBut you can't generalize to a pseudo-BNF rule like build_list(N) ::=
<expr>^N "BUILD_LIST_"<N> because that is not a context-free grammar.
You're thinking about this probably from a different but historically
more prevalent perspective. I'm building a custom grammar based on the
input that happens to appear. I'm taking advantage of one of the beautiful
things about PEG grammars - that they can be created and run at run time
and therefore tailored to specific use cases.
Post by Lukas AtkinsonTrying to directly parse an assembly-like notation back into an
AST-like structure is not a normal use for a parser, and you will have
difficulties with things like jumps, loops, and dup instructions.
Although I need to redo how control flow is done, in fact there has been
a decompiler for Python for a decade and half which works off of an Earley
algorithm parser. The deparser that I've taken over currently works for
python from verison 1.3 circa 1995 to the present 3.7. If one were always
to stay within "normal use", progress would be serioulsy impeded. I'm
imagining someone telling Benjamin Franklin that flying a kit in a
thunderstorm is not the "normal use" of Electricity.
At any rate you can read about this novel use of deparsing in
http://rocky.github.io/Deparsing-Paper.pdf
This is a case were not using Marpa and hand-rolling a program will be
Post by Lukas Atkinsonmuch easier (possibly using Marpa to parse the disassembled opcodes into a
data structure, but that sounds like overkill). This program would mostly
manipulate AST fragments on a stack, and can easily deal with
variable-arity operations like BUILD_LIST_N (by popping the top n
expressions from the output stack).
Actually, that's what the competing deparsers do. They are not as good
though :-)
Post by Lukas AtkinsonPost by RockyTo be clear on one thing: it's not simply that there are 1,000 states
at before the reduction, but that at each scan step, all of them get
updated. So we have quadratic behaviour: after seeing second expr we update
1 prior state, then on seeing another expr we update 2 prior states, and
then 3 prior states and so on. So in general we'll have n*(n-2)/2 updates.
ick.
Post by RockyHere's a parsing odditiy that came up in a doing this Python
decompiler, and I wonder if there's a good solution for it.
Python bytecode like a number of other bytecode is polish prefix so
the operator comes after all of the operands are seen.
To handle long lists of data in Python you can write
x = [1, 2, 3, 10, 20, ... ]
and the bytecode in some versions of Python might be
LOAD CONST 1
LOAD CONST 2
LOAD_CONST 10
LOAD_CONST 20
...
BUILD_LIST_1000
In pathelogical situations these lists can get arbitrarily long.
The build list has a count of how many operands it expects which
above I fold into the nonterminal. There is a grammar rule that gets
expr ::= build_list_1000
build_list_1000 ::= expr expr expr ... BUILD_LIST_1000
Finally I get to the problem: In an early algorithm parser, to
recognize this there would be 1000 states just for each place where the dot
could be in parsing build_list_1000. How would reduce the number of
intermediate states needed in parsing?
In theory there's always a solution using procedural parsing, but
this can get quite involved I think. Above the expressions were constants
simple constants but in practice and expression could be an aribitrary
expression which is somehting grammars are good at capturing and that I'd
not want to have to redo writing procedural code.
.
--
You received this message because you are subscribed to the Google
Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the
Google Groups "marpa parser" group.
To unsubscribe from this topic, visit https://groups.google.com/d/to
pic/marpa-parser/CExOShs6khU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google
Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an
For more options, visit https://groups.google.com/d/optout.