On the importance of recognizing and using maps

Discussion:

Rich Hickey

2009-03-08 17:53:34 UTC

It's great to see all of the Clojure libraries springing up as people
pull Clojure towards their application domains, and shape it for the
styles of programming they prefer.

In looking at some of the libraries, I am a bit concerned that maps
are not being used when the logical entity is in fact a map.

If you like Clojure, I would hope one of the things you like about it
is that 'everything works with everything' to as great an extent as
possible. This falls out of the fact that Clojure very much takes to
heart this Alan Perlis quote:

"It is better to have 100 functions operate on one data structure than
10 functions on 10 data structures."

And recasts it as:

It is better to have 100 functions operate on one data abstraction
than 10 functions on 10 data structures.

Recently it was said in a thread:

"You can do a lot of that in Clojure, too, but, unless I'm mistaken,
there are some arbitrary limits as things stand right now. ... you can
have any kind of structure you want, as long as it's a map, set, or
sequence."

I know people usually think of collections when they see
vector/map/set, and they think classes and types define something
else. However, the vast majority of class and type instances in
various languages are actually maps, and what the class/type defines
is a specification of what should be in the map. Many of these
languages don't expose instances as maps, and in failing to do so
they largely prevent their users from writing generic, interoperable
code.

Classes and types usually create desert islands. When you say:

//Java
class Foo {int x; int y; int z;}

--Haskell
data Foo = Foo {x :: Int, y :: Int, z :: Int}

you end up with types with a dearth of functionality. Sure, you might
get hashCode and equals for free, or some other free stuff by deriving
from Eq or Show, but the bottom line is you are basically starting
from scratch every time. No existing user code can do anything useful
with your instances.

Logically, instances of these classes/types are similar:

{:class Foo :x _ :y _ :z _}
{:data Foo :constructor Foo :x _ :y _ :z _}

i.e., they are maps. Note that what seems to be special,
type/constructor/class, is merely a privileged attribute. If we choose
to represent similar things as maps in Clojure, we get a lot of
benefits - all existing map-based functionality will work with our
instances right out of the box.
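
As a rough illustration of "right out of the box", here is a minimal
sketch that treats a Foo-like instance as a plain map. The name foo,
the :class key and the values are made up for the example; only
functions from clojure.core are used.

```clojure
;; A hypothetical Foo instance as a plain map; :class is just another key.
(def foo {:class :Foo :x 1 :y 2 :z 3})

;; Generic map functions work with no Foo-specific code.
(def moved (assoc foo :x 10))                      ; "set" a field
(def coords (select-keys foo [:x :y]))             ; project some fields
(def total (reduce + (vals (dissoc foo :class))))  ; sum the numeric fields
(def field-names (set (keys foo)))                 ; ask what fields exist
```

None of these functions had to know about Foo in advance, which is the
point being made above.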

So, I want to look at 3 of the contrib libraries, not to single them
out, nor to criticize them, but to warn about reduced utility and
interoperability due to choosing not to use maps.

First up is contrib.sql, where insert-rows and insert-values both take
a vector of column names followed by vectors of unlabeled values that
must be in the same order as the corresponding columns. I would hope
never to have such fragile things as those vectors in my programs.
OTOH, the query and update APIs take and return maps.
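
To make the fragility concrete, here is a small sketch of the same data
in both shapes. It does not call contrib.sql at all; the column names
and rows are invented, and the conversion uses only clojure.core.

```clojure
;; Hypothetical column names and positional rows, in the shape described above.
(def columns [:name :species :count])
(def rows [["apple" "fruit" 3]
           ["leek" "vegetable" 7]])

;; zipmap pairs each column name with the value at the same position.
(def row-maps (mapv #(zipmap columns %) rows))

;; Once labeled, a value is looked up by name rather than by index,
;; so reordering the columns can no longer silently change its meaning.
(def counts-by-name (into {} (map (juxt :name :count) row-maps)))
```

With the vectors, swapping two entries in columns without also swapping
every row corrupts the data silently; with the maps there is no order
to get wrong.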

Next is contrib.datalog, where add-tuples looks like this:

(add-tuples db-base
  [:employee :id 1 :name "Bob" :position :boss]
  [:employee :id 2 :name "Mary" :position :chief-accountant]
  ...)

I can't help thinking that making the relation (:employee) special is
an onion from prior systems that we shouldn't propagate. I don't want
to have such vectors in my program. Clojure.contrib Datalog does the
right thing in getting away from positional components - go all the
way and recognize relation as just another attribute and a tuple as a
map:

{:relation :employee :id 1 :name "Bob" :position :boss}

Making relation non-special is the key to making the logic system
readily useful to those who weren't thinking about designing
specifically for it.
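
A possible sketch of what that buys you, assuming a helper named
tuple->map (hypothetical, not part of contrib.datalog) that turns the
vector form into the map form. Everything else is plain clojure.core.

```clojure
;; Hypothetical helper: [:employee :id 1 ...] becomes a map in which
;; the relation is stored under :relation like any other attribute.
(defn tuple->map [[relation & kvs]]
  (assoc (apply hash-map kvs) :relation relation))

(def tuples
  [[:employee :id 1 :name "Bob" :position :boss]
   [:employee :id 2 :name "Mary" :position :chief-accountant]])

(def facts (mapv tuple->map tuples))

;; Ordinary sequence functions can now treat the relation as data.
(def bosses (filter #(= :boss (:position %)) facts))
(def by-relation (group-by :relation facts))
```

Code that has never heard of Datalog can filter, group or merge these
facts, because nothing about them is special.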

Finally we have contrib.types, an algebraic data type system where
constructors generate vectors of unnamed components as instances.
Again, unnamed positional components are an onion of some algebraic
data type systems; there's no need to repeat that.

I'd very much like to see these libraries be interoperable, e.g. to
store ADTs in a database or query them with Datalog, and I know that
would be possible if they were all using maps consistently.

I guess I want to advocate - don't merely replicate the things with
which you are familiar. Try to do things in the Clojure way. If your
logical structure is a mapping of names to values, please use a map.
Positional data is fragile, non-self-descriptive and unmanageable
after a certain length - look at function argument lists. Note that
using maps doesn't preclude also having positional constructors, nor
does it dictate a space cost for repeating key names - e.g. structmaps
provide positional constructors and shared key storage.
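
For anyone who hasn't used them, a minimal sketch of struct maps using
defstruct, struct and struct-map from clojure.core. The struct name
point and its keys are just an example.

```clojure
;; defstruct declares a fixed set of keys whose storage is shared
;; by all instances of the struct.
(defstruct point :x :y :z)

;; Positional constructor: values follow the key order given to defstruct.
(def p (struct point 1 2 3))

;; Named constructor for the same struct.
(def q (struct-map point :x 1 :y 2 :z 3))

;; The result is still a map, so ordinary map functions apply,
;; including adding keys the struct did not declare.
(def p2 (assoc p :label "a point"))
```

So the convenience of positional construction is available without
giving up the map interface for everyone downstream.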

I really appreciate the work everyone is doing, just trying to
maintain 'everything works with everything' with a nudge towards more
consistent use of maps. Don't build your API on an island.

Thanks,

Rich

Stuart Sierra

2009-03-08 19:02:08 UTC