Discussion:
Having fun with Clojure and Monads ...
Meikel Brandmeyer
2008-07-15 20:14:36 UTC
Permalink
Dear Clojurians,

... well fun, but no success. I seeked to conquer the world and failed
badly. What Haskell does with Hindley-Milner and type classes, blocks
the way in Clojure. We need a tag to dispatch on in a bind multimethod.
Clojure's metadata seemed to be perfect, but then.. they are not
supported by all types. And it seems I'm not alone as Marijn Haverbeke
states in his post [1]. He basically encountered the same issues in
Common Lisp.

So I decided to shift one gear down and start with something specific.
And because I'm a lucky guy I found a paper on Parsec [2], which was
exactly what I was looking for.

So back to the fun: parsers combinators for Clojure.

What are parser combinators actually? Compared to eg. yacc and friends,
which need a separate generation phase, such parsers are implemented
directly in the language. As objects of first-class they can be combined
with functions acting on parsers to create new parsers.

The basic building blocks are 'result' and 'bind'. '(result x)' simply
creates a parser, which returns always 'x' w/o consuming any input.

'(bind p f)' returns a parser, which matches the parser 'p' against the
input. On success it calls the function 'f' with the result of 'p'. 'f'
should return again a parser which is in turn applied to the input.
Whatever this parser returns is returned by bind.

This sounds a little complicated. Why the heck would we want to do
something like this? Well. One application is chaining several parsers
together. Assume 'letter' is parser, which matches an alphabetical
character like eg. \A. Then define a new parser 'two-letters'.

(def two-letters
(bind letter
(fn [c1]
(bind letter
(fn [c2]
(result (str c1 c2)))))))

What happens when the new 'two-letter' parser is applied to some input?
First the 'letter' parser is applied. It returns its result, let's call
it c1. This result is passed on to the first fn. Note the inner 'bind'
also returns a parser and so does the fn. So our requirement is
satisfied. This parser is applied the remaining input. In particular
that means, that the second 'letter' is applied to the input. We get the
result c2, which is again passed on to the second fn. And here something
strange happens. With the two results c1 and c2 we construct on the fly
a new parser with 'result', which returns the string consisting of c1
and c2, no matter to which input it is applied and without consuming
anything of this input. Remember: this parser is immediately applied to
the remaining input. Hence two-letters returns the string consisting of
c1 and c2.

Magic, but ugly. And complicated. Where has all the fun gone? Well.
Probably to Haskell. There it has the 'do' notation. With that, one can
write the above as follows. (Don't nail me down on the actual
correctness, but it should come close.)

do { c1 <- letter
; c2 <- letter
; result [c1] ++ [c2]
}

Clear, concise, nice. Ah, sugar sweet sugar. But wait a minute. Clojure
has a lisp-like language. It has macros. So one rather simple macro
later, we get a construct which fits nicely into the usual Clojure
appearance: 'let-bind'.

(let-bind [c1 letter
c2 letter
(result (str c1 c2)))

Clear, concise, nice. But this time in Clojure.

This construct already gives us some nice possibilities. Consider the
usual tag pair example.

(let-bind [tag open-tag
tc tag-content
_ (close-tag tag)]
(result [tag tc]))

Please read this carefully. Do you see the nice feature? Yep,
'close-tag' constructs a parser on the fly using the previously found
tag from 'open-tag'. So we get matching the _right_ end tag for free.
Short and elegant.

I'm excited to explore this area further.

I'm sorry for this long post about a topic probably no one is interested
in, but implementing this was so much fun. Finding out that the Clojure
solution is in no way more verbose than the Haskell way and that it fits
perfectly into the language was very exciting. Especially, since there
were some road blocks in the beginning. I just had to write something
about it.

Sincerely
Meikel

[1] Marijn Haverbeke:
"Why monads have not take the Common Lisp world by storm"
http://marijn.haverbeke.nl/monad.html

[2] Daan Leijen:
"Parsec - Direct Style Monadic Parser Combinators for the Real World"
http://research.microsoft.com/users/daan/parsec.html
--
|\ _,,,---,,_
/,`.-'`' -. ;-;;,_
|,4- ) )-,_..;\ ( `'-'
'---(_/--' `-'\_) fL http://kotka.de
Matt Revelle
2008-07-16 05:59:29 UTC
Permalink
Good stuff. Not nearly as fun as doing it yourself but just found
jParsec, a parser combinator library for Java.

http://jparsec.codehaus.org/
Post by Meikel Brandmeyer
Dear Clojurians,
... well fun, but no success. I seeked to conquer the world and failed
badly. What Haskell does with Hindley-Milner and type classes, blocks
the way in Clojure. We need a tag to dispatch on in a bind
multimethod.
Clojure's metadata seemed to be perfect, but then.. they are not
supported by all types. And it seems I'm not alone as Marijn Haverbeke
states in his post [1]. He basically encountered the same issues in
Common Lisp.
So I decided to shift one gear down and start with something specific.
And because I'm a lucky guy I found a paper on Parsec [2], which was
exactly what I was looking for.
So back to the fun: parsers combinators for Clojure.
What are parser combinators actually? Compared to eg. yacc and
friends,
which need a separate generation phase, such parsers are implemented
directly in the language. As objects of first-class they can be combined
with functions acting on parsers to create new parsers.
The basic building blocks are 'result' and 'bind'. '(result x)' simply
creates a parser, which returns always 'x' w/o consuming any input.
'(bind p f)' returns a parser, which matches the parser 'p' against the
input. On success it calls the function 'f' with the result of 'p'. 'f'
should return again a parser which is in turn applied to the input.
Whatever this parser returns is returned by bind.
This sounds a little complicated. Why the heck would we want to do
something like this? Well. One application is chaining several parsers
together. Assume 'letter' is parser, which matches an alphabetical
character like eg. \A. Then define a new parser 'two-letters'.
(def two-letters
(bind letter
(fn [c1]
(bind letter
(fn [c2]
(result (str c1 c2)))))))
What happens when the new 'two-letter' parser is applied to some input?
First the 'letter' parser is applied. It returns its result, let's call
it c1. This result is passed on to the first fn. Note the inner 'bind'
also returns a parser and so does the fn. So our requirement is
satisfied. This parser is applied the remaining input. In particular
that means, that the second 'letter' is applied to the input. We get the
result c2, which is again passed on to the second fn. And here
something
strange happens. With the two results c1 and c2 we construct on the fly
a new parser with 'result', which returns the string consisting of c1
and c2, no matter to which input it is applied and without consuming
anything of this input. Remember: this parser is immediately applied to
the remaining input. Hence two-letters returns the string consisting of
c1 and c2.
Magic, but ugly. And complicated. Where has all the fun gone? Well.
Probably to Haskell. There it has the 'do' notation. With that, one can
write the above as follows. (Don't nail me down on the actual
correctness, but it should come close.)
do { c1 <- letter
; c2 <- letter
; result [c1] ++ [c2]
}
Clear, concise, nice. Ah, sugar sweet sugar. But wait a minute. Clojure
has a lisp-like language. It has macros. So one rather simple macro
later, we get a construct which fits nicely into the usual Clojure
appearance: 'let-bind'.
(let-bind [c1 letter
c2 letter
(result (str c1 c2)))
Clear, concise, nice. But this time in Clojure.
This construct already gives us some nice possibilities. Consider the
usual tag pair example.
(let-bind [tag open-tag
tc tag-content
_ (close-tag tag)]
(result [tag tc]))
Please read this carefully. Do you see the nice feature? Yep,
'close-tag' constructs a parser on the fly using the previously found
tag from 'open-tag'. So we get matching the _right_ end tag for free.
Short and elegant.
I'm excited to explore this area further.
I'm sorry for this long post about a topic probably no one is
interested
in, but implementing this was so much fun. Finding out that the Clojure
solution is in no way more verbose than the Haskell way and that it fits
perfectly into the language was very exciting. Especially, since there
were some road blocks in the beginning. I just had to write something
about it.
Sincerely
Meikel
"Why monads have not take the Common Lisp world by storm"
http://marijn.haverbeke.nl/monad.html
"Parsec - Direct Style Monadic Parser Combinators for the Real World"
http://research.microsoft.com/users/daan/parsec.html
--
|\ _,,,---,,_
/,`.-'`' -. ;-;;,_
|,4- ) )-,_..;\ ( `'-'
'---(_/--' `-'\_) fL http://kotka.de
Michael Reid
2008-07-16 12:52:52 UTC
Permalink
Meikel,

Far from being something that nobody is interested in, I think this is
great work. I'm not expert, but I have seen a lot of interesting and
elegant ideas floating around in the Haskell community. Personally, I
have found the type-system to be too high a challenge for me to really
"get" Haskell fully, so bringing some of these things into the Clojure
space is both interesting and I think worthwhile.

Parser combinators have always seemed much more appealing to me than
the yacc codegen approach.

/mike.
Post by Matt Revelle
Good stuff. Not nearly as fun as doing it yourself but just found
jParsec, a parser combinator library for Java.
http://jparsec.codehaus.org/
Post by Meikel Brandmeyer
Dear Clojurians,
... well fun, but no success. I seeked to conquer the world and failed
badly. What Haskell does with Hindley-Milner and type classes, blocks
the way in Clojure. We need a tag to dispatch on in a bind
multimethod.
Clojure's metadata seemed to be perfect, but then.. they are not
supported by all types. And it seems I'm not alone as Marijn Haverbeke
states in his post [1]. He basically encountered the same issues in
Common Lisp.
So I decided to shift one gear down and start with something specific.
And because I'm a lucky guy I found a paper on Parsec [2], which was
exactly what I was looking for.
So back to the fun: parsers combinators for Clojure.
What are parser combinators actually? Compared to eg. yacc and friends,
which need a separate generation phase, such parsers are implemented
directly in the language. As objects of first-class they can be combined
with functions acting on parsers to create new parsers.
The basic building blocks are 'result' and 'bind'. '(result x)' simply
creates a parser, which returns always 'x' w/o consuming any input.
'(bind p f)' returns a parser, which matches the parser 'p' against the
input. On success it calls the function 'f' with the result of 'p'. 'f'
should return again a parser which is in turn applied to the input.
Whatever this parser returns is returned by bind.
This sounds a little complicated. Why the heck would we want to do
something like this? Well. One application is chaining several parsers
together. Assume 'letter' is parser, which matches an alphabetical
character like eg. \A. Then define a new parser 'two-letters'.
(def two-letters
(bind letter
(fn [c1]
(bind letter
(fn [c2]
(result (str c1 c2)))))))
What happens when the new 'two-letter' parser is applied to some input?
First the 'letter' parser is applied. It returns its result, let's call
it c1. This result is passed on to the first fn. Note the inner 'bind'
also returns a parser and so does the fn. So our requirement is
satisfied. This parser is applied the remaining input. In particular
that means, that the second 'letter' is applied to the input. We get the
result c2, which is again passed on to the second fn. And here something
strange happens. With the two results c1 and c2 we construct on the fly
a new parser with 'result', which returns the string consisting of c1
and c2, no matter to which input it is applied and without consuming
anything of this input. Remember: this parser is immediately applied to
the remaining input. Hence two-letters returns the string consisting of
c1 and c2.
Magic, but ugly. And complicated. Where has all the fun gone? Well.
Probably to Haskell. There it has the 'do' notation. With that, one can
write the above as follows. (Don't nail me down on the actual
correctness, but it should come close.)
do { c1 <- letter
; c2 <- letter
; result [c1] ++ [c2]
}
Clear, concise, nice. Ah, sugar sweet sugar. But wait a minute. Clojure
has a lisp-like language. It has macros. So one rather simple macro
later, we get a construct which fits nicely into the usual Clojure
appearance: 'let-bind'.
(let-bind [c1 letter
c2 letter
(result (str c1 c2)))
Clear, concise, nice. But this time in Clojure.
This construct already gives us some nice possibilities. Consider the
usual tag pair example.
(let-bind [tag open-tag
tc tag-content
_ (close-tag tag)]
(result [tag tc]))
Please read this carefully. Do you see the nice feature? Yep,
'close-tag' constructs a parser on the fly using the previously found
tag from 'open-tag'. So we get matching the _right_ end tag for free.
Short and elegant.
I'm excited to explore this area further.
I'm sorry for this long post about a topic probably no one is
interested
in, but implementing this was so much fun. Finding out that the Clojure
solution is in no way more verbose than the Haskell way and that it fits
perfectly into the language was very exciting. Especially, since there
were some road blocks in the beginning. I just had to write something
about it.
Sincerely
Meikel
"Why monads have not take the Common Lisp world by storm"
http://marijn.haverbeke.nl/monad.html
"Parsec - Direct Style Monadic Parser Combinators for the Real World"
http://research.microsoft.com/users/daan/parsec.html
--
|\ _,,,---,,_
/,`.-'`' -. ;-;;,_
|,4- ) )-,_..;\ ( `'-'
'---(_/--' `-'\_) fL http://kotka.de
Meikel Brandmeyer
2008-07-17 21:40:27 UTC
Permalink
Hello,

On Wed, 16 Jul 2008 08:52:52 -0400
Post by Michael Reid
Far from being something that nobody is interested in, I think this is
great work. I'm not expert, but I have seen a lot of interesting and
elegant ideas floating around in the Haskell community. Personally, I
have found the type-system to be too high a challenge for me to really
"get" Haskell fully, so bringing some of these things into the Clojure
space is both interesting and I think worthwhile.
I have not much experience with Haskell (although claiming to know a
bit, but that's already quite pushing the facts). OCaml I know a bit
better, but not for sufficiently large projects. So take this with a
grain of salt:

The type system helps you. Yes. It really does. The inferencing of
Haskell and OCaml helps you. If the compiler tells you something about
the types of your function, which you don't expect. Then you made a
mistake. (Yes. The compiler might also be wrong, but face it: blame
yourself first.) Only in edge cases like eg. optional and named
arguments, one sees that this is not handled very well by the
inferencing system.

It is different than say Clojure, but also quite interesting. Pattern
matching, currying, Functors (resp. Type Classes). You should really try
again. ... Hmmm.. Writing this on the Clojure list... Hmmm... Ok. Sorry,
Rich. ;)
Post by Michael Reid
Parser combinators have always seemed much more appealing to me than
the yacc codegen approach.
I only knew the yacc approach. It kind of started annoying me at work.
When I started to read about monads, I found out about parser
combinators and liked the idea immediately.

Sincerely
Meikel
Stuart Sierra
2008-07-17 14:36:58 UTC
Permalink
Really cool, Meikel. I never knew there was another way to do parsers
besides grammars & generators. Neat stuff.

A question: Would it be correct to say that the Parsec approach
produces both a lexer and a parser together?
-Stuart
Post by Meikel Brandmeyer
Dear Clojurians,
... well fun, but no success. I seeked to conquer the world and failed
badly. What Haskell does with Hindley-Milner and type classes, blocks
the way in Clojure. We need a tag to dispatch on in a bind multimethod.
Clojure's metadata seemed to be perfect, but then.. they are not
supported by all types. And it seems I'm not alone as Marijn Haverbeke
states in his post [1]. He basically encountered the same issues in
Common Lisp.
So I decided to shift one gear down and start with something specific.
And because I'm a lucky guy I found a paper on Parsec [2], which was
exactly what I was looking for.
So back to the fun: parsers combinators for Clojure.
What are parser combinators actually? Compared to eg. yacc and friends,
which need a separate generation phase, such parsers are implemented
directly in the language. As objects of first-class they can be combined
with functions acting on parsers to create new parsers.
The basic building blocks are 'result' and 'bind'. '(result x)' simply
creates a parser, which returns always 'x' w/o consuming any input.
'(bind p f)' returns a parser, which matches the parser 'p' against the
input. On success it calls the function 'f' with the result of 'p'. 'f'
should return again a parser which is in turn applied to the input.
Whatever this parser returns is returned by bind.
This sounds a little complicated. Why the heck would we want to do
something like this? Well. One application is chaining several parsers
together. Assume 'letter' is parser, which matches an alphabetical
character like eg. \A. Then define a new parser 'two-letters'.
    (def two-letters
      (bind letter
            (fn [c1]
              (bind letter
                    (fn [c2]
                      (result (str c1 c2)))))))
What happens when the new 'two-letter' parser is applied to some input?
First the 'letter' parser is applied. It returns its result, let's call
it c1. This result is passed on to the first fn. Note the inner 'bind'
also returns a parser and so does the fn. So our requirement is
satisfied. This parser is applied the remaining input. In particular
that means, that the second 'letter' is applied to the input. We get the
result c2, which is again passed on to the second fn. And here something
strange happens. With the two results c1 and c2 we construct on the fly
a new parser with 'result', which returns the string consisting of c1
and c2, no matter to which input it is applied and without consuming
anything of this input. Remember: this parser is immediately applied to
the remaining input. Hence two-letters returns the string consisting of
c1 and c2.
Magic, but ugly. And complicated. Where has all the fun gone? Well.
Probably to Haskell. There it has the 'do' notation. With that, one can
write the above as follows. (Don't nail me down on the actual
correctness, but it should come close.)
    do { c1 <- letter
       ; c2 <- letter
       ; result [c1] ++ [c2]
       }
Clear, concise, nice. Ah, sugar sweet sugar. But wait a minute. Clojure
has a lisp-like language. It has macros. So one rather simple macro
later, we get a construct which fits nicely into the usual Clojure
appearance: 'let-bind'.
    (let-bind [c1 letter
               c2 letter
      (result (str c1 c2)))
Clear, concise, nice. But this time in Clojure.
This construct already gives us some nice possibilities. Consider the
usual tag pair example.
    (let-bind [tag open-tag
               tc  tag-content
               _   (close-tag tag)]
      (result [tag tc]))
Please read this carefully. Do you see the nice feature? Yep,
'close-tag' constructs a parser on the fly using the previously found
tag from 'open-tag'. So we get matching the _right_ end tag for free.
Short and elegant.
I'm excited to explore this area further.
I'm sorry for this long post about a topic probably no one is interested
in, but implementing this was so much fun. Finding out that the Clojure
solution is in no way more verbose than the Haskell way and that it fits
perfectly into the language was very exciting. Especially, since there
were some road blocks in the beginning. I just had to write something
about it.
Sincerely
Meikel
    "Why monads have not take the Common Lisp world by storm"
   http://marijn.haverbeke.nl/monad.html
    "Parsec - Direct Style Monadic Parser Combinators for the Real World"
   http://research.microsoft.com/users/daan/parsec.html
--
  |\      _,,,---,,_
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---(_/--'  `-'\_)  fL                                http://kotka.de
Meikel Brandmeyer
2008-07-17 21:38:10 UTC
Permalink
Hello,

On Thu, 17 Jul 2008 07:36:58 -0700 (PDT)
Post by Stuart Sierra
Really cool, Meikel. I never knew there was another way to do parsers
besides grammars & generators. Neat stuff.
I also have mainly experience with yacc & co. But considering the
combinator libraries for Haskell, Java (as Matt pointed out), Ruby,
Python, ... (and hopefully soon Clojure), there seems to be something
with the idea.
Post by Stuart Sierra
A question: Would it be correct to say that the Parsec approach
produces both a lexer and a parser together?
Hmm.. Since I have no formal CS education this is an interesting
question for me. (Whether it is really interesting, I cannot judge.)

So what is actually the difference between a lexer and a parser?
Stressing the wikipedia servers a bit, gives:

Lexer: reads a stream of characters and produces a token stream
Parser: reads a stream of tokens and produces a syntax tree (amongst others)
Blob: reads a stream of Xs and produces a Y

The grammar is composed of different rules:
expr ::= num "+" num

The tokens are composed of different ... well ... rules:
num ::= digits "." digits

Ok. Obviously I'm too stupid to understand the difference between a
Lexer and a Parser.

... Re-reading the lexer entry in the german wikipedia I found the
following sentence (freely translated): "A scanner is a special *parser*
and ..." Hmmm.. There is maybe no difference?

But I also read in some post on the Net (don't remember where), that it
is obviously of advantage to split the two steps. Why this is "obvious"
remained trivial or easy to see or something of that type.

The more I think about it, the more I think that this distinction is
somewhat artificial. So for me the question is less whether Parsec
combines Lexer and Parser, but more whether there is something to
combine or whether there was only one thing to begin with.

But as I said: I have no formal CS education whatsoever. So this might
be totally wrong.

Sincerely
Meikel
lpetit
2008-07-18 10:44:54 UTC
Permalink
Hello,
[...]
Lexer:  reads a stream of characters and produces a token stream
Parser: reads a stream of tokens     and produces a syntax tree (amongst others)
[...]
A lexer sees the characters in sequence, for example for a stream of
characters "(defn toto..." it will consume the left paren, the letter
d, the letter e, ..., the space, the letter t, ... and it will read
all the source code this way, producing as an output a sequence of
tokens : the token LEFT_PAREN, a token of type symbol and value
"defn" , a token of type symbol and value "toto") ...

The parser will consume the output of the lexer and constructing a
hierarchy : a left paren creates a new subform, ... => the output will
be a hierarchical tree of forms, with token atoms as their leeves.
This is also known as an "Abstract syntax tree".

A difference I think I see between a two-pass process : lexer, then
parser, and a parser "a la" Parsec, is that you can not contextualise
the lexer. It must be pre-defined, and can not adapt to the context.

When you do lexing/parsing in parallel, you can at certain points of
the parsing change the rules of the lexer.

My 0.02€,

--
Laurent

Continue reading on narkive:
Loading...