Discussion:
On the importance of recognizing and using maps
Rich Hickey
2009-03-08 17:53:34 UTC
It's great to see all of the Clojure libraries springing up as people
pull Clojure towards their application domains, and shape it for the
styles of programming they prefer.

In looking at some of the libraries, I am a bit concerned that maps
are not being used when the logical entity is in fact a map.

If you like Clojure, I would hope one of the things you like about it
is that 'everything works with everything' to as great an extent as
possible. This falls out of the fact that Clojure very much takes to
heart this Alan Perlis quote:

"It is better to have 100 functions operate on one data structure than
10 functions on 10 data structures."

And recasts it as:

It is better to have 100 functions operate on one data abstraction
than 10 functions on 10 data structures.

Recently it was said in a thread:

"You can do a lot of that in Clojure, too, but, unless I'm mistaken,
there are some arbitrary limits as things stand right now. ... you can
have any kind of structure you want, as long as it's a map, set, or
sequence."

I know people usually think of collections when they see
vector/map/set, and they think classes and types define something
else. However, the vast majority of class and type instances in various
languages are actually maps, and what the class/type defines is a
specification of what should be in the map. Many of the languages don't
expose the instances as maps as such, and in failing to do so they
greatly hinder users of the language from writing generic,
interoperable code.

Classes and types usually create desert islands. When you say:

//Java
class Foo {int x; int y; int z;}

--Haskell
data Foo = Foo {x :: Int, y :: Int, z :: Int}

you end up with types with a dearth of functionality. Sure, you might
get hashCode and equals for free, or some other free stuff by deriving
from Eq or Show, but the bottom line is you are basically starting
from scratch every time. No existing user code can do anything useful
with your instances.

Logically, instances of these classes/types are similar:

{:class Foo :x _ :y _ :z _}
{:data Foo :constructor Foo :x _ :y _ :z _}

i.e., they are maps. Note that what seems to be special,
type/constructor/class, is merely a privileged attribute. If we choose
to represent similar things as maps in Clojure, we get a lot of
benefits - all existing map-based functionality will work with our
instances right out of the box.
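
As a minimal sketch of that point (the map contents and the var names below are illustrative assumptions, and :Foo is written as a keyword only so the form evaluates), ordinary map functions work on such an instance with no type-specific code:

```clojure
;; Hypothetical instance of the Foo "type", represented as a plain map.
(def foo {:class :Foo :x 1 :y 2 :z 3})

;; Generic map functions work on it unchanged.
(def moved  (assoc foo :x 10))                      ; "set a field"
(def coords (select-keys foo [:x :y]))              ; project a subset of fields
(def total  (reduce + (vals (dissoc foo :class))))  ; fold over the field values
```

None of these functions knows anything about Foo; they only rely on the instance being a map.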

So, I want to look at 3 of the contrib libraries, not to single them
out, nor to criticize them, but to warn about reduced utility and
interoperability due to choosing not to use maps.

First up is contrib.sql, where insert-rows and insert-values both take
a vector of column names followed by vectors of unlabeled values that
must be in the same order as the corresponding columns. I would hope
never to have such fragile things as those vectors in my programs.
OTOH, the query and update APIs take and return maps.
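
To illustrate the difference, here is a small sketch that turns positional rows into self-describing maps. The column names and row values are made up for the example; only zipmap and map from clojure.core are used:

```clojure
;; Hypothetical column names and positional rows, in the shape the
;; text describes for insert-rows / insert-values.
(def columns [:name :appearance :cost])
(def rows [["Pomegranate" "fresh" 585]
           ["Kiwifruit"   "firm"  93]])

;; zipmap pairs each column name with the value in the same position.
(def records (map #(zipmap columns %) rows))
```

Once the rows are maps, reordering the columns can no longer silently put a value in the wrong field.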

Next is contrib.datalog, where add-tuples looks like this:

(add-tuples db-base
  [:employee :id 1 :name "Bob" :position :boss]
  [:employee :id 2 :name "Mary" :position :chief-accountant]
  ...)

I can't help thinking that making the relation (:employee) special is
an onion from prior systems that we shouldn't propagate. I don't want
to have such vectors in my program. Clojure.contrib Datalog does the
right thing in getting away from positional components - go all the
way and recognize relation as just another attribute and a tuple as a
map:

{:relation :employee :id 1 :name "Bob" :position :boss}

Making relation non-special is the key to making the logic system
readily useful to those who weren't thinking about designing
specifically for it.
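
A rough sketch of the conversion, assuming tuples have the shape shown above (relation first, then alternating keys and values). The helper name tuple->map is hypothetical and is not part of contrib.datalog:

```clojure
(defn tuple->map
  "Converts [relation k1 v1 k2 v2 ...] into a map with a :relation key.
  Hypothetical helper, for illustration only."
  [[relation & kvs]]
  (assoc (apply hash-map kvs) :relation relation))

(def bob (tuple->map [:employee :id 1 :name "Bob" :position :boss]))

;; Any map-based code can now work with the tuple, e.g. filtering on an
;; attribute without knowing the value came from a Datalog relation.
(def bosses (filter #(= :boss (:position %)) [bob]))
```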

Finally we have contrib.types, an algebraic data type system where
constructors generate vectors of unnamed components as instances.
Again, unnamed positional components are an onion of some algebraic
data type systems; there's no need to repeat that.

I'd very much like to see these libraries be interoperable, e.g. to
store ADTs in a database or query them with Datalog, and I know that
would be possible if they were all using maps consistently.

I guess I want to advocate - don't merely replicate the things with
which you are familiar. Try to do things in the Clojure way. If your
logical structure is a mapping of names to values, please use a map.
Positional data is fragile, non-self-descriptive and unmanageable
after a certain length - look at function argument lists. Note that
using maps doesn't preclude also having positional constructors, nor
does it dictate a space cost for repeating key names - e.g. structmaps
provide positional constructors and shared key storage.
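
For example, a short sketch using defstruct, struct, and struct-map from clojure.core (the point struct and its keys are illustrative assumptions):

```clojure
;; A struct basis with two keys; the names are hypothetical.
(defstruct point :x :y)

(def p (struct point 1 2))       ; positional constructor
(def q (struct-map point :y 5))  ; construction by key; :x is left nil
(def r (assoc p :z 3))           ; extra keys can still be added
```

The results are maps, so they compare equal to plain maps with the same entries and work with any map function.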

I really appreciate the work everyone is doing, just trying to
maintain 'everything works with everything' with a nudge towards more
consistent use of maps. Don't build your API on an island.

Thanks,

Rich
Stuart Sierra
2009-03-08 19:02:08 UTC
Post by Rich Hickey
I really appreciate the work everyone is doing, just trying to
maintain 'everything works with everything' with a nudge towards more
consistent use of maps. Don't build your API on an island.
And a good nudge it is! This reminds me of Steve Yegge's article on
the "Properties Pattern":
http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html

And, not to toot my own horn (honk!), two pieces I wrote:
http://stuartsierra.com/2009/01/10/data-are-data
http://stuartsierra.com/2006/07/05/the-only-data-structures-youll-ever-need

I don't even use struct-maps in Clojure, just plain 'ol maps, sets,
vectors, and lists. They really do provide everything you need.

-Stuart Sierra
Dan
2009-03-08 19:47:39 UTC
Post by Rich Hickey
I guess I want to advocate - don't merely replicate the things with
which you are familiar. Try to do things in the Clojure way. If your
logical structure is a mapping of names to values, please use a map.
I tend to replace every instance of creating classes with creating
structs which, if I understood correctly, are maps too. Good habit or
should structs not be abused?
David Nolen
2009-03-08 20:03:01 UTC
Structs are maps with shared keys and positional constructors as Rich
mentions in the original post. I think Rich is saying that maps should
indeed "be abused" ;) By building all "higher level" structures on top of
them, consumers are guaranteed not only your custom functionality, but all
the functionality guaranteed by the language itself.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to ***@googlegroups.com
To unsubscribe from this group, send email to clojure+***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---
Phil Hagelberg
2009-03-08 20:13:45 UTC
Post by Dan
Post by Rich Hickey
I guess I want to advocate - don't merely replicate the things with
which you are familiar. Try to do things in the Clojure way. If your
logical structure is a mapping of names to values, please use a map.
I tend to replace every instance of creating classes with creating
structs which, if I understood correctly, are maps too. Good habit or
should structs not be abused?
I'm pretty sure structs are only appropriate for when you need to eke
the absolute last iota of performance out of a collection, in which case
they can provide greater speed than maps. But since the list of keys is
fixed, it means it's more effort to add or rename a key than it is with
a map.

You shouldn't trade that flexibility for speed until (0) you are pretty
sure the keys are not going to change soon and (1) you know you can't
get the speed you need from maps. Neither of these is true when you're
just starting out on a piece of code.

-Phil
Dan
2009-03-08 22:13:19 UTC
Post by Phil Hagelberg
I'm pretty sure structs are only appropriate for when you need to eke
the absolute last iota of performance out of a collection, in which case
they can provide greater speed than maps. But since the list of keys is
fixed, it means it's more effort to add or rename a key than it is with
a map.
Not really, I can assoc and dissoc as I wish and leave values blank as
I wish. Any function can treat it as a map.
Post by Phil Hagelberg
You shouldn't trade that flexibility for speed until (0) you are pretty
sure the keys are not going to change soon and (1) you know you can't
get the speed you need from maps. Neither of these are true when you're
just starting out on a piece of code.
I don't use it for performance reason but for semantic ones. For
instance, in my code, I have:

(defstruct polygon :points :color)

This line tells me when I reread that polygon is a significant concept
and that its attributes should be points and color. I'm relatively
confident this isn't going to change soon and if it does, I'll just
have to change the defstruct and the places that create polygons. Not
a significant burden.

Even if there were no performance implications, I'd use structs.
However, that might be the wrong thing to do so that's why I'm asking.
Shawn Hoover
2009-03-09 01:13:44 UTC
Post by Dan
Post by Phil Hagelberg
I'm pretty sure structs are only appropriate for when you need to eke
the absolute last iota of performance out of a collection, in which case
they can provide greater speed than maps. But since the list of keys is
fixed, it means it's more effort to add or rename a key than it is with
a map.
Not really, I can assoc and dissoc as I wish and leave values blank as
I wish. Any function can treat it as a map.
Close... you can assoc new keys into a struct instance, but you can't dissoc
any of the basis keys.

Stephen C. Gilardi
2009-03-09 02:10:58 UTC
Post by Shawn Hoover
Close... you can assoc new keys into a struct instance, but you
can't dissoc any of the basis keys.
That's right.

Given:
user=> (defstruct foo :a :b)
#'user/foo
user=> (def t (struct foo 3))
#'user/t

dissoc of a basis key throws an exception:

user=> (dissoc t :a)
java.lang.Exception: Can't remove struct key (NO_SOURCE_FILE:0)

I wonder if it's important to throw in this case or if it would be
more in keeping with the description:

"struct maps act just like maps, except they store their basis keys
efficiently"

if dissoc would associate nil ("nothing") with the key instead:

user=> (dissoc t :a)
{:a nil, :b nil}

Doing so would make the value associated with :a become the same as if
it had never been initialized, just like :b in this case.

Perhaps the choice between an exception and assoc'ing "nothing" comes
down to the distinction between:

"dissoc means remove this key from this map"
(where throwing an exception is clearly correct), and

"dissoc means remove any value associated with this key from this map"
(where assoc'ing nil might be preferable).

--Steve
Rich Hickey
2009-03-09 02:24:01 UTC
Post by Stephen C. Gilardi
Post by Shawn Hoover
Close... you can assoc new keys into a struct instance, but you
can't dissoc any of the basis keys.
That's right.
Given:
user=> (defstruct foo :a :b)
#'user/foo
user=> (def t (struct foo 3))
#'user/t
dissoc of a basis key throws an exception:
user=> (dissoc t :a)
java.lang.Exception: Can't remove struct key (NO_SOURCE_FILE:0)
I wonder if it's important to throw in this case or if it would be
more in keeping with the description:
"struct maps act just like maps, except they store their basis keys
efficiently"
if dissoc would associate nil ("nothing") with the key instead:
user=> (dissoc t :a)
{:a nil, :b nil}
Doing so would make the value associated with :a become the same as if
it had never been initialized, just like :b in this case.
Perhaps the choice between an exception and assoc'ing "nothing" comes
down to the distinction between:
"dissoc means remove this key from this map"
(where throwing an exception is clearly correct), and
"dissoc means remove any value associated with this key from this map"
(where assoc'ing nil might be preferable).
dissoc definitely means the former. A mapping to nil is not no
mapping.

(contains? (dissoc m k) k) -> false
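
The distinction can be seen with a plain map. This is a small sketch with made-up keys and var names; contains?, dissoc, assoc, and get are from clojure.core:

```clojure
(def m {:a 1 :b 2})

(def removed (dissoc m :a))     ; no mapping for :a at all
(def nilled  (assoc m :a nil))  ; :a is still mapped, to nil

(contains? removed :a)  ; false
(contains? nilled :a)   ; true

;; get cannot tell the two apart, which is why contains? matters here.
(get removed :a)  ; nil
(get nilled :a)   ; nil
```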

Rich
Mark Allerton
2009-03-08 20:36:23 UTC
I'm kind of a newb to these parts, but I disagree somewhat with Phil
that structmaps are only useful as a performance optimization.

It also seems to me that because they make it convenient to create
positional constructors for map structures, they make it much easier
to build data structures concisely in a declarative style, without
needlessly repeating keys in collection structures - something that
will always be somewhat error prone. Some of the "DB rowset" examples
in Rich's post are great applications for a structmap - why write
:name on every row?

That said, there are obviously limits beyond which positional
constructors cause more problems than they solve, but I don't think
that's a reason to recommend against positional constructors (and
structmaps) wholesale.

..Mark..
Stephen C. Gilardi
2009-03-09 01:05:22 UTC
Post by Rich Hickey
First up is contrib.sql, where insert-rows and insert-values both take
a vector of column names followed by vectors of unlabeled values that
must be in the same order as the corresponding columns. I would hope
never to have such fragile things as those vectors in my programs.
For large data sets with a regular structure, insert-rows and
insert-values use the JDBC interface very efficiently. They are also
convenient building blocks for other functions to use.
Post by Rich Hickey
OTOH, the query and update APIs take and return maps.
clojure.contrib.sql now includes:

(defn insert-records
  "Inserts records into a table. records are maps from strings or
  keywords (identifying columns) to values."
  [table & records]
  (doseq [record records]
    (insert-values table (keys record) (vals record))))

Here's an example from clojure.contrib.sql.test:

(defn insert-records-fruit
  "Insert records, maps from keys specifying columns to values"
  []
  (sql/insert-records
   :fruit
   {:name "Pomegranate" :appearance "fresh" :cost 585}
   {:name "Kiwifruit" :grade 93}))

--Steve
Rich Hickey
2009-03-09 01:38:01 UTC
Post by Stephen C. Gilardi
Post by Rich Hickey
First up is contrib.sql, where insert-rows and insert-values both take
a vector of column names followed by vectors of unlabeled values that
must be in the same order as the corresponding columns. I would hope
never to have such fragile things as those vectors in my programs.
For large data sets with a regular structure, insert-rows and
insert-values use the JDBC interface very efficiently.
Do you have a case where the map-unpacking dominates the I/O time? Or
is this just a speculative optimization?

I want to be clear, just because things come in maps doesn't mean you
can't have a higher-performance insert-uniform-records that takes maps
with identical sets of keys.
Post by Stephen C. Gilardi
They are also
convenient building blocks for other functions to use.
Hmm...
Post by Stephen C. Gilardi
Post by Rich Hickey
OTOH, the query and update APIs take and return maps.
(defn insert-records
"Inserts records into a table. records are maps from strings or
keywords (identifying columns) to values."
[table & records]
(doseq [record records]
(insert-values table (keys record) (vals record))))
(defn insert-records-fruit
"Insert records, maps from keys specifying columns to values"
[]
(sql/insert-records
:fruit
{:name "Pomegranate" :appearance "fresh" :cost 585}
{:name "Kiwifruit" :grade 93}))
Thanks!

Rich
Stephen C. Gilardi
2009-03-09 02:37:55 UTC
Post by Rich Hickey
Do you have a case where the map-unpacking dominates the I/O time? Or
is this just a speculative optimization?
I was talking about the distinction between sending N value sets
across the JDBC interface in one call vs. in N calls. Unpacking maps
on the Clojure side and making the same one call is a good idea. Thanks.
Post by Rich Hickey
I want to be clear, just because things come in maps doesn't mean you
can't have a higher-performance insert-uniform-records that takes maps
with identical sets of keys.
Good point.

To offer the most efficiency in unpacking, the API could include:

insert-records
  each record treated independently

insert-uniform-records
  all subsequent records contain at least all the keys of the first
  unpack with select-keys

insert-structs
  all records are structs with the same basis
  unpack with vals

My current thinking is that insert-structs doesn't offer enough
benefit over insert-uniform-records to be worth including.
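
A rough sketch of the unpacking step for the uniform case follows. The function name uniform-records->rows is hypothetical and is not part of clojure.contrib.sql. It also looks each column up with get rather than select-keys, an assumption made so that every row lists its values in the same column order:

```clojure
(defn uniform-records->rows
  "Returns [columns rows]. columns are the keys of the first record;
  each row holds that record's values in the same column order.
  Hypothetical helper, for illustration only."
  [records]
  (let [columns (vec (keys (first records)))]
    [columns
     (mapv (fn [record] (mapv #(get record %) columns)) records)]))

(def result
  (uniform-records->rows
   [{:name "Pomegranate" :cost 585}
    {:cost 93 :name "Kiwifruit"}]))
```

The column vector and the rows could then be handed to a positional insert in a single call. A record missing one of the first record's keys would yield nil in that position.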

--Steve
Brian Carper
2009-03-09 01:59:05 UTC
Post by Rich Hickey
In looking at some of the libraries, I am a bit concerned that maps
are not being used when the logical entity is in fact a map.
One time I find myself abusing vectors where maps would be better is
in a situation where I have to retrieve key/value pairs in the order
they were inserted.

For example I'm working on a DSL for cascading stylesheets, and maps
should be perfect for specifying property declarations (right down to
curly braces for maps coincidentally and nicely matching the curly
braces in literal CSS).

But (css [:div {:padding "1px" :padding-left "5px"}]) won't work
because the order is lost. Depending on the order "padding" and
"padding-left" show up in the final CSS, the meaning changes. Not
sure what the best Clojure idiom would be in this case. Vectors that
look like maps are the best I've come up with.

So "ordered maps" would be useful. They have been endlessly
re-invented in the Ruby community, for example. I think in upcoming Ruby
1.9, maps that preserve their insertion order are the default, in
spite of obviously worse performance compared to true hashed maps.
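
As an illustration of the "vectors that look like maps" approach, here is a minimal sketch that renders declarations from a vector of key/value pairs, so the order is exactly the order written. The function name render-declarations and the output format are assumptions for the example, not the actual DSL:

```clojure
(require '[clojure.string :as str])

(defn render-declarations
  "Renders a sequence of [property value] pairs as CSS declarations,
  preserving the order of the pairs. Hypothetical helper."
  [pairs]
  (str/join " " (for [[k v] pairs]
                  (str (name k) ": " v ";"))))

(def css-text
  (render-declarations [[:padding "1px"] [:padding-left "5px"]]))
;; css-text is "padding: 1px; padding-left: 5px;"
```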
Paul Stadig
2009-03-09 11:05:41 UTC
Post by Brian Carper
Post by Rich Hickey
In looking at some of the libraries, I am a bit concerned that maps
are not being used when the logical entity is in fact a map.
One time I find myself abusing vectors where maps would be better is
in a situation where I have to retrieve key/value pairs in the order
they were inserted.
This is probably a good example (ordered pairs) of when the logical
entity is in fact not a map. I don't think Rich is advocating that
everything is a nail, because we have this great hammer.

I think using maps gives you much more flexibility. I don't disagree,
but I have trouble imagining exactly how it works sometimes. I guess I
just have too much of a Ruby/Java mindset. I keep thinking of a type
hierarchy and multimethods. How should you write the dispatch
function? I guess you could add a tag to the maps and use that to
dispatch. Or should you use a set of keys to dispatch (i.e. if the map
has :center, and :radius it is a circle, if it has :length and :width
it is a rectangle)? That can get messy for something slightly more
complicated. I guess you could just write a predicate for each type,
or a get-type function that does the check, but it still seems more
complicated than just declaring some classes. It's just a different
way of thinking for me that I have to get used to.

When reading Stu's book I found it interesting that you could declare
arbitrary type hierarchies using 'derive, so I know there are corners
of Clojure that I have not explored.
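
One possible sketch of the "add a tag and dispatch on it" option, combined with derive. The :shape key and the shape names are assumptions chosen for the example; defmulti, defmethod, and derive are from clojure.core:

```clojure
;; Dispatch on the value of the :shape key in the map.
(defmulti area :shape)

(defmethod area ::circle [{:keys [radius]}]
  (* Math/PI radius radius))

(defmethod area ::rectangle [{:keys [length width]}]
  (* length width))

;; derive sets up an ad hoc hierarchy: a square is treated as a rectangle,
;; so it reuses the ::rectangle method without a class being declared.
(derive ::square ::rectangle)

(def square-area (area {:shape ::square :length 2 :width 2}))
```

Multimethod dispatch uses isa?, which is why the ::square map finds the ::rectangle method.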


Paul
Steve Fisher
2009-03-09 07:04:35 UTC
Post by Brian Carper
Post by Rich Hickey
In looking at some of the libraries, I am a bit concerned that maps
are not being used when the logical entity is in fact a map.
One time I find myself abusing vectors where maps would be better is
in a situation where I have to retrieve key/value pairs in the order
they were inserted.
For example I'm working on a DSL for cascading stylesheets, and maps
should be perfect for specifying property declarations (right down to
curly braces for maps coincidentally and nicely matching the curly
braces in literal CSS).
But (css [:div {:padding "1px" :padding-left "5px"}]) won't work
because the order is lost.  Depending on the order "padding" and
"padding-left" show up in the final CSS, the meaning changes.  Not
sure what the best Clojure idiom would be in this case.  Vectors that
look like maps are the best I've come up with.
How about an ArrayMap?

user=> (def x (array-map :a 11 :b 22 :c 33 :d 44 :e 55))
#'user/x
user=> x
{:a 11, :b 22, :c 33, :d 44, :e 55}
user=> (keys x)
(:a :b :c :d :e)
user=> (vals x)
(11 22 33 44 55)
mikel
2009-03-09 05:44:16 UTC
Post by Rich Hickey
"You can do a lot of that in Clojure, too, but, unless I'm mistaken,
there are some arbitrary limits as things stand right now. ... you can
have any kind of structure you want, as long as it's a map, set, or
sequence."
I said that.
From my point of view there are two different topics here that might
tend to get confused. One is a sort of academic musing about protocol
versus structure, how languages like Smalltalk and C++ conflate the
two, and how it is possible, in languages like Haskell and, to a
lesser extent, Clojure, to completely separate them. On that spectrum,
I prefer the Haskell/Clojure end over the Smalltalk/C++ end, and have
preferred it for at least 15 years--since before there were clear
examples of language support for that approach.

The other topic is about the role of user-defined datatypes in Clojure
code. Maps and sequences are great and I use them all over the place.
Sometimes I want a convenient way to say that a datum will have these,
and *only* these, fields in it. I can do that in Clojure; the code
for any particular case isn't very hard to write. On the
other hand, if two or three such cases arise in a project, then that
code becomes boilerplate. The boilerplate ends up getting factored out
into common functions or macros, and those functions or macros then
amount to an ad hoc typesystem. So, one way or another, Clojure
projects end up with user-defined types, whether Clojure provides the
mechanisms for defining them or the users do.

Clojure doesn't have an analog of defclass or of keyword arguments in
functions. I read and understood your reasons for not including those
features. I have no quibble with your decisions in those areas, as
long as Clojure doesn't actively prevent me from implementing things
like them.

I end up reinventing things with the flavor of defclass because I want
to explicitly spell out the layout of a set of structured values. I
want to be able to say, "exactly these fields, and no others", because
I'm describing values that some other code is going to be using, and I
want myself and other programmers to be able to look at the code later
and see those structures explicitly spelled out. When we have to
change some of them, I want code that depends on them to be able to
say, "hey! fix me!"
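
A minimal sketch of the kind of check described here, "exactly these fields, and no others", written against plain maps. The name checked-map and the choice of exception are assumptions for illustration:

```clojure
(require '[clojure.set :as set])

(defn checked-map
  "Returns m unchanged if all of its keys are in the allowed set;
  otherwise throws. Hypothetical helper, for illustration only."
  [allowed m]
  (let [extra (set/difference (set (keys m)) allowed)]
    (when (seq extra)
      (throw (IllegalArgumentException.
              (str "Unexpected keys: " extra))))
    m))

(def person-keys #{:name :age :weight})
(def fred (checked-map person-keys {:name "Fred"}))
```

The value that comes back is still an ordinary map, so the check adds a constraint without taking the data off the common abstraction.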

Similarly, I end up reinventing keyword arguments because I want to be
able to spell out explicitly in the source that a function will accept
any of a specified set of arguments and no others. I want to be able
to straightforwardly and definitely answer the question, "hey, what
was the allowed set of inputs to this function again?"

Clojure doesn't have to provide these facilities (though I wouldn't
mind if it did); it just needs to stay out of my way when I decide I
need to add them.
Mark Engelberg
2009-03-09 06:19:28 UTC
Post by mikel
Clojure doesn't have to provide these facilities (though I wouldn't
mind if it did); it just needs to stay out of my way when I decide I
need to add them.
Yeah, as much as I like maps, I feel like there are several common
use cases for maps that require more work in Clojure than other
languages. The most obvious example is tagged structs. In Clojure,
you need to do a defstruct, and then make your own custom constructor
that adds the tag, possibly another custom constructor that emulates
struct-map but adds the tag, and possibly a predicate that tests for
the tag.
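
The boilerplate being described might look roughly like this. The struct, the :tag key, and the function names are all hypothetical; the sketch only shows the shape of the repeated work:

```clojure
;; The struct basis includes a slot for the tag.
(defstruct circle :tag :center :radius)

(defn make-circle
  "Custom positional constructor that fills in the tag."
  [center radius]
  (struct circle ::circle center radius))

(defn circle?
  "Predicate that tests for the tag."
  [x]
  (= ::circle (:tag x)))

(def unit-circle (make-circle [0 0] 1))
```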

I particularly like the way the Mozart language makes tagged structs
(they call them records) one of the core data structures that the
language is built around.
See section 3.6 at http://www.mozart-oz.org/home/doc/tutorial/node3.html

I can see why the existing system is more flexible (you can have more
than one tag, or no tag, or make the tag part of the metadata, or use
different labels for the tag other than :tag or :type), but I keep
feeling like 90% of the time I'd be happy to just use a standard
tagged struct. The good news is that it's easy to write a macro to do
all the boilerplate. The bad news is that everyone will write
different macros that tag these structures in different ways, and it's
not clear to me how well code written based on different tagging
standards will coexist.
mikel
2009-03-09 06:43:29 UTC
Post by Mark Engelberg
Post by mikel
Clojure doesn't have to provide these facilities (though I wouldn't
mind if it did); it just needs to stay out of my way when I decide I
need to add them.
Yeah, as much as I like maps, I feel like there are several common
use cases for maps that require more work in Clojure than other
languages.  The most obvious example is tagged structs.  In Clojure,
you need to do a defstruct, and then make your own custom constructor
that adds the tag, possibly another custom constructor that emulates
struct-map but adds the tag, and possibly a predicate that tests for
the tag.
That reminds me; struct-maps don't work with *print-dup*-style
serialization/deserialization; is that intentional for some reason, or
should I report it as an issue?
Post by Mark Engelberg
I particularly like the way the Mozart language makes tagged structs
(they call them records) one of the core data structures that the
language is built around.
See section 3.6 at http://www.mozart-oz.org/home/doc/tutorial/node3.html
I can see why the existing system is more flexible (you can have more
than one tag, or no tag, or make the tag part of the metadata, or use
different labels for the tag other than :tag or :type), but I keep
feeling like 90% of the time I'd be happy to just use a standard
tagged struct.  The good news is that it's easy to write a macro to do
all the boilerplate.  The bad news is that everyone will write
different macros that tag these structures in different ways, and it's
not clear to me how well code written based on different tagging
standards will coexist.
Yup.
Rich Hickey
2009-03-09 12:54:26 UTC
Post by mikel
Post by Mark Engelberg
Post by mikel
Clojure doesn't have to provide these facilities (though I wouldn't
mind if it did); it just needs to stay out of my way when I decide I
need to add them.
Yeah, as much as I like maps, I feel like there are several common
use cases for maps that require more work in Clojure than other
languages. The most obvious example is tagged structs. In Clojure,
you need to do a defstruct, and then make your own custom constructor
that adds the tag, possibly another custom constructor that emulates
struct-map but adds the tag, and possibly a predicate that tests for
the tag.
That reminds me; struct-maps don't work with *print-dup*-style
serialization/deserialization; is that intentional for some reason, or
should I report it as an issue?
No, it's a known thing. In order to support efficient use as constants
struct bases need to be implemented in an AOT-friendly way akin to how
fns and proxies are now.

Rich
Rich Hickey
2009-03-09 12:46:44 UTC
Post by Mark Engelberg
Post by mikel
Clojure doesn't have to provide these facilities (though I wouldn't
mind if it did); it just needs to stay out of my way when I decide I
need to add them.
Yeah, as much as I like maps, I feel like there are several common
use cases for maps that require more work in Clojure than other
languages. The most obvious example is tagged structs. In Clojure,
you need to do a defstruct, and then make your own custom constructor
that adds the tag, possibly another custom constructor that emulates
struct-map but adds the tag, and possibly a predicate that tests for
the tag.
I particularly like the way the Mozart language makes tagged structs
(they call them records) one of the core data structures that the
language is built around.
See section 3.6 at http://www.mozart-oz.org/home/doc/tutorial/node3.html
I can see why the existing system is more flexible (you can have more
than one tag, or no tag, or make the tag part of the metadata, or use
different labels for the tag other than :tag or :type), but I keep
feeling like 90% of the time I'd be happy to just use a standard
tagged struct. The good news is that it's easy to write a macro to do
all the boilerplate. The bad news is that everyone will write
different macros that tag these structures in different ways, and it's
not clear to me how well code written based on different tagging
standards will coexist.
I understand. I've pushed hard not to have the language dictate an
object system, and especially not to tie important capabilities like
polymorphic dispatch and hierarchy to a specific type system. That
said, I don't want to make the common cases hard or confusing.

There is now a supported convention for tagging, the :type metadata,
and the type function which returns either the :type meta or the Class
if none. This should provide a common way to handle the basic case - a
single type tag. These :type tags can participate in the hierarchy
system and thus provide a lot of flexibility.
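
A short sketch of that convention, using with-meta, type, derive, and a multimethod from clojure.core. The ::person and ::agent tags and the describe function are made-up names for the example:

```clojure
;; Tag a plain map through :type metadata.
(def fred (with-meta {:name "Fred"} {:type ::person}))

(type fred)             ; the :type metadata, ::person
(type {:name "Fred"})   ; no :type metadata, so the value's Java class

;; The tag can take part in the hierarchy system.
(derive ::person ::agent)

;; Dispatching on type picks the method through the hierarchy.
(defmulti describe type)
(defmethod describe ::agent [x] (str "agent: " (:name x)))

(def description (describe fred))
```

Because the tag lives in metadata, the map itself still compares equal to an untagged map with the same entries.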

I'd like to see what people do with that and drive any further common
utilities from shared needs.

Rich
mikel
2009-03-09 13:52:38 UTC
Post by Rich Hickey
Post by Mark Engelberg
Post by mikel
Clojure doesn't have to provide these facilities (though I wouldn't
mind if it did); it just needs to stay out of my way when I decide I
need to add them.
Yeah, as much as I like maps, I feel like there are several common
use cases for maps that require more work in Clojure than other
languages.  The most obvious example is tagged structs.  In Clojure,
you need to do a defstruct, and then make your own custom constructor
that adds the tag, possibly another custom constructor that emulates
struct-map but adds the tag, and possibly a predicate that tests for
the tag.
I particularly like the way the Mozart language makes tagged structs
(they call them records) one of the core data structures that the
language is built around.
See section 3.6 at http://www.mozart-oz.org/home/doc/tutorial/node3.html
I can see why the existing system is more flexible (you can have more
than one tag, or no tag, or make the tag part of the metadata, or use
different labels for the tag other than :tag or :type), but I keep
feeling like 90% of the time I'd be happy to just use a standard
tagged struct.  The good news is that it's easy to write a macro to do
all the boilerplate.  The bad news is that everyone will write
different macros that tag these structures in different ways, and it's
not clear to me how well code written based on different tagging
standards will coexist.
I understand. I've pushed hard not to have the language dictate an
object system, and especially not to tie important capabilities like
polymorphic dispatch and hierarchy to a specific type system. That
said, I don't want to make the common cases hard or confusing.
There is now a supported convention for tagging, the :type metadata,
and the type function which returns either the :type meta or the Class
if none. This should provide a common way to handle the basic case - a
single type tag. These :type tags can participate in the hierarchy
system and thus provide a lot of flexibility.
I'd like to see what people do with that and drive any further common
utilities from shared needs.
What I'm presently doing about my wish for explicit declaration of
data layouts looks like this:

(define-model ::named-thing :required-keys [:name])

(define-model ::aged-thing :required-keys [:age])

(define-model ::person :extends [::named-thing ::aged-thing]
  :required-keys [:weight]
  :policy :strict)

user> (with-model ::person :name "Fred")
{:age nil, :weight nil, :name "Fred"}

user> (model (with-model ::person :name "Fred"))
:user/person

user> (with-model ::person :name "Fred" :eyes :blue)
java.lang.Exception: Model :user/person allows only these keys:
#{:age :weight :name}, but found these: #{:eyes} (NO_SOURCE_FILE:0)


with-model creates a map from the named model. model returns the model
used to create the map (the model name is in the map's metadata).

The :policy argument optionally tells define-model that no keys except
those listed (and those collected from the models in the :extends
clause) are permitted in maps created by with-model.
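
For readers who want to experiment with the idea, here is a rough sketch of how such a facility could be approximated with plain functions. It is not mikel's implementation: the models registry, register-model!, make-instance, and the :model metadata key are all hypothetical names invented for this example.

```clojure
(require '[clojure.set :as set])

;; Hypothetical registry and helpers; NOT mikel's define-model/with-model.
(def models (atom {}))

(defn register-model!
  "Records the allowed keys for model-name, including keys inherited
  from any already-registered models listed under :extends."
  [model-name & {:keys [extends required-keys policy]}]
  (let [inherited (mapcat #(:allowed (get @models %)) extends)]
    (swap! models assoc model-name
           {:allowed (set (concat inherited required-keys))
            :policy  policy})))

(defn make-instance
  "Builds a map with every allowed key (nil unless supplied) and records
  the model name in metadata. Under :strict policy, unknown keys throw."
  [model-name & {:as kvs}]
  (let [{:keys [allowed policy]} (get @models model-name)
        extra (set/difference (set (keys kvs)) allowed)]
    (when (and (= policy :strict) (seq extra))
      (throw (ex-info (str "Model " model-name " allows only " allowed
                           ", but found " extra)
                      {:model model-name :extra extra})))
    (with-meta (merge (zipmap allowed (repeat nil)) kvs)
      {:model model-name})))

(register-model! ::named-thing :required-keys [:name])
(register-model! ::aged-thing :required-keys [:age])
(register-model! ::person
                 :extends [::named-thing ::aged-thing]
                 :required-keys [:weight]
                 :policy :strict)

(make-instance ::person :name "Fred")
;; a map with :name "Fred" and nil for :age and :weight
```

Keeping the model name in metadata rather than in the map means two instances with the same keys and values still compare equal, whichever model produced them.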

Models completely ignore dispatching and provide no support for
inheritance. You can of course create whatever hierarchy you want
using derive with the model names.

Having tried both the built-in multifunctions and my own
implementation of CLOS-style generic functions, I tend to want to use
my own. I can more flexibly and easily customize dispatching with
them, and they know how to choose the most specific method without
user intervention in situations that cause MultiFn dispatch to demand
hand-tweaking with prefer-method. (I wouldn't be surprised if gf
dispatch is substantially slower than MultiFn dispatch right now,
though.)
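
To make the prefer-method point concrete, here is a small sketch of one situation where MultiFn dispatch cannot pick a method on its own. The hierarchy and the describe multimethod are hypothetical, and this may not be the exact case mikel has in mind.

```clojure
;; Hypothetical hierarchy: ::square derives from two unrelated parents.
(derive ::square ::rect)
(derive ::square ::rhombus)

(defmulti describe :kind)
(defmethod describe ::rect [_] "a rectangle")
(defmethod describe ::rhombus [_] "a rhombus")

;; Both methods match ::square and neither is more specific,
;; so the call throws until a preference is declared.
(def ambiguous?
  (try (describe {:kind ::square})
       false
       (catch IllegalArgumentException _ true)))

;; Hand-tweak: prefer the ::rect method whenever both match.
(prefer-method describe ::rect ::rhombus)
(describe {:kind ::square}) ; "a rectangle"
```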

Models are completely orthogonal to generic functions; either facility
can be used without the other, or they can be used together.
Konrad Hinsen
2009-03-11 17:00:31 UTC
Post by Rich Hickey
I know people usually think of collections when they see vector/map/
set, and they think classes and types define something else. However,
the vast majority of class and type instances in various languages are
actually maps, and what the class/type defines is a specification of
what should be in the map. Many of the languages don't expose the
instances as maps as such and in failing to do so greatly deprive the
users of the language from writing generic interoperable code.
My own experience is mostly with Python. Python objects are indeed
essentially maps (Python calls them dictionaries). But even though it
is easy to obtain the map equivalent of any object (object.__dict__),
I hardly see this being done. Python programmers tend to use maps and
objects in very different ways, and that includes experienced
programmers who are very well aware that objects are just maps plus a
type tag plus a set of methods.

One reason why generic everything-is-a-map code is not very common is
that the majority of object definitions include specific constraints
on the maps, most of them of the form "the map must have a key :x
whose value is an integer". The object's methods don't make sense for
maps that don't satisfy the constraints, and most generic map
operations don't make sense on most objects because they are unaware
of the constraints and in particular don't satisfy them in their
return values.

The one area where I have seen uses of object.__dict__ is low-level
data massaging protocols, like serialization or storage in databases.
And that is indeed a good reason to use just a few fundamental data
structures to represent everything.
Post by Rich Hickey
Finally we have contrib.types, an algebraic data type system where
constructors generate vectors of unnamed components as instances.
Again, unnamed positional components are an onion of some algebraic
data type systems, there's no need to repeat that.
Positional arguments do have their uses. In particular when there is
only one argument, it would be an unnecessary pain to have to name
it. On the other hand, I do see your point of having a uniform
internal representation in the form of maps.
Post by Rich Hickey
I'd very much like to see these libraries be interoperable, e.g. to
store ADTs in a database or query them with Datalog, and I know that
would be possible if they were all using maps consistently.
One problem I see with storing ADTs (or anything with a type tag) in
a database is that the metadata and thus the type tag would be lost
after a storage-retrieval cycle.
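
A small sketch of that concern, using printing and reading as a stand-in for a real database (an assumption; the thread does not name a storage layer). The ::point tag and the var names are hypothetical.

```clojure
(require '[clojure.edn :as edn])

;; Tag carried in metadata.
(def p (with-meta {:x 1 :y 2} {:type ::point}))

;; Round trip through text: metadata is not printed by default.
(def restored (edn/read-string (pr-str p)))

(= p restored)           ; true, metadata plays no part in equality
(:type (meta p))         ; ::point
(:type (meta restored))  ; nil, the tag did not survive

;; A tag stored as an ordinary key is part of the data and survives.
(def p2 {:x 1 :y 2 :type ::point})
(:type (edn/read-string (pr-str p2))) ; ::point
```

The trade-off is that a tag stored in the map takes part in equality and shows up in generic map operations, whereas a metadata tag does neither.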

Konrad.
Cosmin Stejerean
2009-03-11 17:54:02 UTC
On Wed, Mar 11, 2009 at 12:00 PM, Konrad Hinsen
Post by Konrad Hinsen
Post by Rich Hickey
I know people usually think of collections when they see vector/map/
set, and they think classes and types define something else. However,
the vast majority of class and type instances in various languages are
actually maps, and what the class/type defines is a specification of
what should be in the map. Many of the languages don't expose the
instances as maps as such and in failing to do so greatly deprive the
users of the language from writing generic interoperable code.
My own experience is mostly with Python. Python objects are indeed
essentially maps (Python calls them dictionaries). But even though it
is easy to obtain the map equivalent of any object (object.__dict__),
I hardly see this being done. Python programmers tend to use maps and
objects in very different ways, and that includes experienced
programmers who are very well aware that objects are just maps plus a
type tag plus a set of methods.
IMHO a big reason Python programmers don't typically treat objects like
maps/dictionaries is that the set of things found in the map (dictionary)
for that object (__dict__) is just a small subset of the interesting
attributes of the object. In Python things like class-level attributes,
properties, descriptors and multiple inheritance all add a lot of
flexibility to defining and using objects that would take some effort to
replicate on top of simple maps.
The flexibility of Python however does allow you to treat even complex
objects as dictionaries (by implementing __getitem__) or dictionaries as
objects (by overriding __getattr__ or __getattribute__). I've used these
techniques in places where I need to treat an object like a dictionary for
interop, or places where I wanted to use a dictionary but with the nicer
syntax for attribute access on objects (a.foo instead of a['foo'] saves 3
keystrokes).
--
Cosmin Stejerean
http://offbytwo.com

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to ***@googlegroups.com
To unsubscribe from this group, send email to clojure+***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---
Chouser
2009-03-11 22:34:57 UTC
Post by Cosmin Stejerean
IMHO a big reason Python programmers don't typically treat objects like
maps/dictionaries is that the set of things found in the map (dictionary)
for that object (__dict__) is just a small subset of the interesting
attributes of the object. In Python things like class-level attributes,
properties, descriptors and multiple inheritance all add a lot of
flexibility to defining and using objects that would take some effort to
replicate on top of simple maps.
It's interesting to compare a Python class with a dict inside to a
Clojure map with metadata "outside".

Interacting directly with a class dict feels a little dirty, because
you could be circumventing the API provided by the class methods,
making it easy to get the object into a bad state. Clojure's maps
being immutable reduces the amount of trouble you can cause by dealing
directly with the map.

Defining an instance method for a Python class allows you to connect
some code to your data, which internally uses a type pointer from the
instance to the class. In Clojure you can put functions directly in
the metadata (as clojure.zip does), or put a type tag in the map or in
the metadata, and use a multimethod dispatching on that to connect
code to your data.
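
The first option, functions stored directly in metadata, can be sketched in a few lines. This is only an illustration of the idea, not how clojure.zip is actually written; counter, step, and the :step key are hypothetical names.

```clojure
;; Hypothetical example: the map is pure data; its "method" lives in metadata.
(def counter
  (with-meta {:n 0}
    {:step (fn [m] (update m :n inc))}))

(defn step
  "Looks up the :step function in m's metadata and applies it to m."
  [m]
  ((:step (meta m)) m))

(step counter)         ; {:n 1}
(step (step counter))  ; {:n 2}, since update keeps the metadata
```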

Similarly, any inheritance in Clojure would normally be defined on a
keyword (or symbol or collection of either) that is in the map or the
map's metadata. In Python, the object knows its class, and the class
knows about the hierarchy.

I don't know if that leads to any particular conclusion. I suppose it
does suggest a trivial program (or a trivial part of a program) in
Clojure will likely have less code for setting up classes than the
Python equivalent -- you start with the data you actually need, and
can add "methods", polymorphism, etc. if needed later.

--Chouser
Cosmin Stejerean
2009-03-12 01:11:04 UTC
On Wed, Mar 11, 2009 at 5:34 PM, Chouser <***@gmail.com> wrote:
[...]
Post by Chouser
Defining an instance method for a Python class allows you to connect
some code to your data, which internally uses a type pointer from the
instance to the class. In Clojure you can put functions directly in
the metadata (as clojure.zip does), or put a type tag in the map or in
the metadata, and use a multimethod dispatching on that to connect
code to your data.
Similarly, any inheritance in Clojure would normally be defined on a
keyword (or symbol or collection of either) that is in the map or the
map's metadata. In Python, the object knows its class, and the class
knows about the hierarchy.
I don't know if that leads to any particular conclusion. I suppose it
does suggest a trivial program (or a trivial part of a program) in
Clojure will likely have less code for setting up classes than the
Python equivalent -- you start with the data you actually need, and
can add "methods", polymorphism, etc. if needed later.
I think it's largely possible to abuse Python to achieve some of the
possibilities you mentioned. At runtime you can add new methods to a class,
you can add new methods directly to an object (hint: use
new.instancemethod), you can change the __bases__ of a given class to inject
behavior, and you can change the class of an object by assigning to
__class__.

I included a small example of using the above techniques that makes it easy
(I think) to separate code and data in Python by composing instances that
provide data with classes that provide behavior at runtime.

http://gist.github.com/77848
--
Cosmin Stejerean
http://offbytwo.com

Konrad Hinsen
2009-03-12 08:49:11 UTC
Post by Chouser
Interacting directly with a class dict feels a little dirty, because
you could be circumventing the API provided by the class methods,
making it easy to get the object into a bad state. Clojure's maps
being immutable reduces the amount of trouble you can cause by dealing
directly with the map.
Not really. Most map operations, such as assoc and dissoc, return a
map with the same metadata as the input map. The result thus appears
to be of a specific type, even if dissoc just removed a key that
is important for that type's semantics.
Post by Chouser
I don't know if that leads to any particular conclusion.
My main conclusion is that Clojure's system is a lot more flexible
but also a lot more fragile. Any function can modify data of any
"type" (as defined by metadata), even without being aware of this.
Any function can at any time modify the global inheritance hierarchy
in any way it wants. Any module can add an implementation for any
type to any multimethod. That opens the way to many interesting
strategies for data handling, but also to errors that will probably
be hard to track down.

Konrad.
Jeff Rose
2009-03-12 09:59:36 UTC
Post by Konrad Hinsen
Post by Chouser
Interacting directly with a class dict feels a little dirty, because
you could be circumventing the API provided by the class methods,
making it easy to get the object into a bad state. Clojure's maps
being immutable reduces the amount of trouble you can cause by dealing
directly with the map.
Not really. Most map operations, such as assoc and dissoc, return a
map with the same metadata as the input map. The result thus appears
to be of a specific type, even if dissoc just removed a key that
is important for that type's semantics.
Post by Chouser
I don't know if that leads to any particular conclusion.
My main conclusion is that Clojure's system is a lot more flexible
but also a lot more fragile. Any function can modify data of any
"type" (as defined by metadata), even without being aware of this.
Modifying type tags without being aware of it? That sounds like FUD to
me. Using metadata is relatively atypical in the first place, and
modifying the :type tag without being aware of it sounds like an
extremely minimal risk.
Post by Konrad Hinsen
Any function can at any time modify the global inheritance hierarchy
in any way it wants. Any module can add an implementation for any
type to any multimethod. That opens the way to many interesting
strategies for data handling, but also to errors that will probably
be hard to track down.
Konrad.
We've heard this line of reasoning before when moving from static to
dynamic languages. If having the power to do what you want with the
language scares you, then maybe Java is a better choice. All these
"hard to track down" bugs people worry about when having more
flexibility in the language don't seem to crop up often enough to drive
people away though.

Adding an implementation for a new type to a multimethod is equivalent
to adding an interface implementing method to a class you defined. So
for example you could add to-string or to-xml or to-bytes or whatever to
your own objects to make them interoperate with some existing library.
Having libraries built on top of abstract interfaces like this is
exactly what makes them interesting.

In Ruby you can open any built-in class you want, like String, and add
or modify any methods you want. In practice it happens rarely and
almost never causes problems.

-Jeff
Konrad Hinsen
2009-03-12 10:35:02 UTC
Post by Jeff Rose
Post by Konrad Hinsen
My main conclusion is that Clojure's system is a lot more flexible
but also a lot more fragile. Any function can modify data of any
"type" (as defined by metadata), even without being aware of this.
Modifying type tags without being aware of it?
Not modifying type tags, but modifying data that has a type tag
without being aware of the fact that the data has a type tag, and
thus perhaps specific constraints on its contents. The most basic
example is calling dissoc on a map to remove a key that is required
by the semantics of the type implemented as a map. dissoc is agnostic
about type tags, so it won't complain.

In this specific case, struct maps can be used to prevent a key from
being removed, but that's a solution only for this specific case, and
not necessarily a simple one to implement.
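
Both halves of this point fit in a short sketch. The ::point tag and the var names are hypothetical; defstruct, struct, and dissoc are the core functions under discussion.

```clojure
;; A plain map tagged through metadata.
(def p (with-meta {:x 1 :y 2} {:type ::point}))

;; dissoc knows nothing about the tag: the key goes, the tag stays.
(def broken (dissoc p :x))
(type broken)          ; still ::point
(contains? broken :x)  ; false

;; A struct map refuses to give up one of its declared keys.
(defstruct point-struct :x :y)
(def sp (struct point-struct 1 2))

(def removal-refused?
  (try (dissoc sp :x)
       false
       (catch RuntimeException _ true)))
```

The struct map only guards the presence of its declared keys; it does not check, for example, that :x still holds a number after an assoc.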
Post by Jeff Rose
We've heard this line of reasoning before when moving from static to
dynamic languages. If having the power to do what you want with the
language scares you, then maybe Java is a better choice. All these
It doesn't scare me, otherwise I wouldn't be using Clojure. And I
wouldn't be using Python as my main language either. However, I think
it is important to be aware of the risks in order to watch out for them.
Post by Jeff Rose
Adding an implementation for a new type to a multimethod is equivalent
to adding an interface implementing method to a class you defined. So
for example you could add to-string or to-xml or to-bytes or whatever to
your own objects to make them interoperate with some existing library.
Having libraries built on top of abstract interfaces like this is
exactly what makes them interesting.
I agree, of course. And yet, it is important to be aware of the
consequences. For example, don't ever try to memoize the dispatching
function of a multimethod - its result may well change after
importing another library module.
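
One way to read this warning, sketched with hypothetical names (render, ::report): a method looked up once and held onto can go stale as soon as some other code adds a more specific method.

```clojure
(defmulti render :kind)
(defmethod render :default [_] "generic")

;; Look up the method for ::report now and keep it, as a cache would.
(def cached (get-method render ::report))

;; Later, some other library registers a specific method.
(defmethod render ::report [_] "report")

(cached {:kind ::report})  ; "generic", the stale lookup
(render {:kind ::report})  ; "report", a fresh dispatch sees the new method
```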

Konrad.
Jeff Rose
2009-03-12 15:08:26 UTC
Post by Konrad Hinsen
Post by Jeff Rose
Post by Konrad Hinsen
My main conclusion is that Clojure's system is a lot more flexible
but also a lot more fragile. Any function can modify data of any
"type" (as defined by metadata), even without being aware of this.
Modifying type tags without being aware of it?
Not modifying type tags, but modifying data that has a type tag
without being aware of the fact that the data has a type tag, and
thus perhaps specific constraints on its contents. The most basic
example is calling dissoc on a map to remove a key that is required
by the semantics of the type implemented as a map. dissoc is agnostic
about type tags, so it won't complain.
Ahh, I see what you were getting at, and it is a more interesting point
than I originally realized. I guess to achieve this level of safety
while still being map-compatible you would need to implement the
Associative interface and maintain the constraints using getter/setters
or something of the sort.
Post by Konrad Hinsen
Post by Jeff Rose
We've heard this line of reasoning before when moving from static to
dynamic languages. If having the power to do what you want with the
language scares you, then maybe Java is a better choice. All these
It doesn't scare me, otherwise I wouldn't be using Clojure. And I
wouldn't be using Python as my main language either. However, I think
it is important to be aware of the risks in order to watch out for them.
Post by Jeff Rose
Adding an implementation for a new type to a multimethod is equivalent
to adding an interface implementing method to a class you defined. So
for example you could add to-string or to-xml or to-bytes or whatever to
your own objects to make them interoperate with some existing library.
Having libraries built on top of abstract interfaces like this is
exactly what makes them interesting.
I agree, of course. And yet, it is important to be aware of the
consequences. For example, don't ever try to memoize the dispatching
function of a multimethod - its result may well change after
importing another library module.
Konrad.
True. Sorry if I came out swinging in the last message. I've gotten
sick of static typers spreading FUD about languages like Clojure not
being usable for "real" or large pieces of software. Your points are
well taken. Although I don't find them to be great risks, it is
worthwhile to understand them.

-Jeff
