2024‐03: Implementing Uniplates and Biplates with Structure Preserving Trees

Niklas Dewally - 2024-03-21

Recap: Uniplates and Biplates
Definition of Children
Storing The Structure of Children
Uniplates and Biplates Using Structure Preserving Trees

Comments and discussion about this document can be found in issue #261.

In #180 and #259 we implemented the basic Uniplate interface and a derive macro to automatically implement it for enum types. However, our current implementation is incorrect and problematic: we have defined children wrong; and our children list does not preserve structure, making more complex nested structures hard to recreate in the context function.

Recap: Uniplates and Biplates

A quick recap of what we are trying to do - for details, see the Uniplate Haskell docs, our Github Issues, and the paper.

Uniplates are an easy way to perform recursion over complex recursive data types.

It is typically used with enums containing multiple variants, many of which may be different types.

data T = A T [Integer]
       | B T T
       | C T U
       | D [T]

With Uniplate, one can perform a map over this with very little or no boilerplate.

This is implemented with a single function uniplate that returns a list of children of the current object and a context function to rebuild the object from a list of children.

Biplates provide recursion over instances of type T within some other type U. They are implemented with a single function biplate of similar type to uniplate (returning children and a context function).

These functions are able to be implemented on any data type through macros making Uniplate operations boilerplate free.

Definition of Children

We currently take children of T to mean direct descendants of T inside T. However, the correct definition of children is maximal substructures of type T.

For example, consider these Haskell types:

data T = A T [Integer]
       | B T T
       | C T U
       | D [T]

data U = E T T
       | F Integer
       | G [T] T

Our current implementation would define children as the following:

children (A t1 _) = [t1]
children (B t1 t2) = [t1, t2]
children (C t1 u) = [t1]
children (D ts) = ts

However, the proper definition should support transitive children - i.e. T contains a U which contains a T:

children (A t1 _) = [t1]
children (B t1 t2) = [t1, t2]
children (C t1 (E t2 t3)) = [t1,t2,t3]
children (C t1 (F _)) = [t1]
children (C t1 (G ts t3)) = [t1] ++ ts ++ [t3]
children (D ts) = ts

While a seemingly small difference, this complicates how we reconstruct a node from its children: we now need to create a context that takes a list of children and creates an arbitrarily deeply nested data structure containing multiple different types. In particular, the challenge is keeping track of which elements of the children list go into which fields of the enum we are creating.

We already had this problem dealing with children from lists, but a more general approach is needed.

Storing The Structure of Children

How do we know which children go into which fields of the enum variant?

For example, consider this complicated instance of T:

data T = A T [Integer]
       | B T T
       | C T U 
       | D [T]

data U = E T T
       | F Integer
       | G [T] T

myT = C(D([t0,...,t5),G([t6,...,t12],t13))

Its list of children is: [t0,...,t13].

Try creating a function to recreate myT based on a new list of children. Hint: it is difficult, and even more so in full generality!

If we instead consider children to be a tree, where each field in the original enum variant is its own branch, we get:

data Tree A = Zero | One A | Many [(Tree A)]

myTChildren = 
  Many [(Many [One(t0),...,One(t5)]),      -- C([XXX], _) field 1 of C
        (Many [                            -- C([_],XXX)  field 2 of C
          (Many [One(t6),...,One(t12)]),   -- G([XXX],_)  field 2.1 of C
          (One  t13))]                     -- G([_],XXX)) field 2.2 of C

This captures, in a type-independent way, the following structural information:

myT has two fields.
The first field is a container of Ts.
The second field contains two inner fields: one is a container of Ts, one is a single T.

Representing non-recursive and primitive types

Often primitive and non-recursive types are found in the middle of enum variants. How do we say “we don’t care about this field, it does not contain a T” with our Tree type? Zero can be used for this.

For example:

data V = V1 V Integer W
       | V2 Integer

data W = W1 V V

myV = (V1 
        (V2 1) 
        1 
        (W1 
          (V2 2) 
          (V2 3)))

myVChildren = 
  (Many [
    (One v1),  -- V1(XXX,_,_)
    Zero,      -- V1(_,XXX,_) 
    (Many [    -- V1(_,_,XXX)
      (One v2),
      (One v3)
   ]))

This additionally encodes the information that the second field of myV has no Ts at all.

While this information isn’t too useful for finding or traversing over the target type, it means that the tree structure defines the structure of the target type completely: there are always the same number of tree branches as there are fields in the enum variant.

Uniplates and Biplates Using Structure Preserving Trees

Recall that the uniplate function returns all children of a data structure alongside a context function that rebuilds the data structure using some new children.

To implement this, for each field in an enum variant, we need to ask the questions: “how do we get the type Ts in the type U”, and how can we rebuild this U based on some new children of type T?.

This is just a Biplate<U,T>, which gives us the type T children within a type U as well as a context function to reconstruct the U from the T.

Therefore:

We build the children tree by combining the children trees of all fields of the enum (such that each field’s children is a branch on the tree).
We build the context function by calling each fields context function on each branch of the input tree.

This is effectively a recursive invocation of Biplate.

What happens when we reach a T? Under the normal definition, Biplate<T,T> would return its children. This is wrong - we want to return T here, not its children!

When T == U we need to change the definition of a Biplate: Biplate<T,T> operates over the input structure, not its children.

For our example AST:

data T = A T [Integer]
       | B T T
       | C T U 
       | D [T]

data U = E T T
       | F Integer
       | G [T] T

Here are some Biplate and Uniplates and their resultant children trees:

Biplate<Integer,T> 1 is Zero.
Uniplate<T> (A t [1,2,3]) is (Many [Biplate<T,T> t ,Zero]).

This evaluates to (Many [(One t),Zero])
Biplate<T,T> t is One t.
Biplate<T,U> (G ts t1) is (Many [Biplate<T,[T]> ts , Biplate<T,T> t ]).

This evaluates to (Many [(Many [(One t0),...(One tn)]),(One t)])
Uniplate<T,T> (C t (G ts t1)) is (Many [Biplate<T,T> t, Biplate<U,T> (G ts t1))

This evaluates to (Many [(One t), (Many [(Many [(One t0),...,(One tn)]), (One t)]))

The contents of this repository are licensed under the Mozilla Public Licence.

Home

Contributor's Guide

Design Documents

CI/CD

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2024‐03: Implementing Uniplates and Biplates with Structure Preserving Trees

Recap: Uniplates and Biplates

Definition of Children

Storing The Structure of Children

Uniplates and Biplates Using Structure Preserving Trees

Clone this wiki locally