Study Guide: Trees


This is a study guide with links to past lectures, assignments, and handouts, as well as additional practice problems to assist you in learning the concepts.


Important: For solutions to these assignments once they have been released, see the main website






A tree is an abstract model of a hierarchical structure. It consists of nodes with a parent-child relationship. Its name comes from the fact that when drawn, it resembles an upside-down tree: the root of the tree is at the top and the leaves are at the bottom.

A tree is a recursive data structure; each node of the tree contains a label value and a list of branches, each of which are also trees. The label can be any data type, while the branches are represented as a list of trees. The lecture slides provide a good overview of tree terminology and a visual to help demonstrate.

Hierarchical Data

We learned in lecture about lists, containers which hold sequences of data. These sequences can also be arbitrarily nested to create nested lists.

>>> l = [1, 2, 3]
>>> [l, [l]]
[[1, 2, 3], [[1, 2, 3]]]

This results in a hierarchy that lets us model, organize, and represent complex systems.

Trees build upon this idea of hierarchy by constraining the structure of the tree. We are only allowed to build trees in a certain way, and they take on a certain visual representation, but underlying all of the nodes and their labels and branches is the same story of a nested, list hierarchy. What the tree data abstraction allows us to do is think of trees as larger units and not worry about manipulating individual lists.

In this class, we have two different views of trees which we often switch between depending on the situation.

  1. We can view trees as individual nodes. In Python, our tree data abstraction is compound type. This means that we reference it with a pointer or arrow, just like a list. But this means that, in our list of branches, each tree is pointed to with a pointer too! You can imagine this can get complicated quickly, so when we view trees as individual nodes, we consider what we can do with an individual node of a tree. Python sees trees this same way: not as a whole, but as an individual node in a larger structure that happens to be connected.
  2. We can also view trees as larger, connected structures. Just like functional abstraction allows us to reason about the behavior of a program in terms of its domain, range, and behavior, thinking abstractly about tree structures allow us to reason about the behavior of programs at a larger scale. We can say, for example, that calling replace_thor_at_leaf on a branch of a tree will return a branch replacing all of its leaves' label values.

Tree Traversal

Hierarchy and structure helps us organize data, but at the cost of making that very same data harder to find. Now that we've organized data into a tree, we'd like to be able to use and manipulate that data.

Tree traversal refers to the process of visiting each node in a tree data abstraction, exactly once. Just like how we saw that we needed to come up with an organized way to explore the space of possible solutions in tree recursion, we also need an organized way to traverse trees to avoid repeating work or getting stuck in an infinite loop.

Each node of a tree has a predefined structure: a label value and a list of branches. This structure lets us explore tree structures in a very specific pattern where we visit each node of the tree one-by-one until we've completed the traversal and solved the problem.

  1. Visit the current node and check or otherwise process the node. Our base case often falls under this category: we process the node by checking if the node is a leaf, or some other stopping condition.
  2. Iterate across the branches of the tree and recurse down. We often write a for loop or list comprehension to iterate across the branches of the tree and recurse on the branch to explore down the tree.
  3. Combine the result obtained from the recursive call to determine the final result. To find the sum of all the label values in a tree, for example, it's not sufficient to stop at making a recursive call: we also need to determine what to do with the number returned from the sum of the values in the branch.

Tree problems tend toward recursive solutions. This also makes it convenient for understanding the structure of tree problems by first examining the expected behavior by coming up with small example cases. How should our program behave when we call it on a leaf, for example? How about if we scale up to a larger tree of 2 or 3 nodes? And so forth.

Practice Problems


Q1: Tree Map

Define the function tree_map, which takes in a tree and a one-argument function as arguments and returns a new tree which is the result of mapping the function over the entries of the input tree.

def tree_map(fn, t):
    """Maps the function fn over the entries of tree and returns the
    result in a new tree.

    >>> numbers = tree(1,
    ...                [tree(2,
    ...                      [tree(3),
    ...                       tree(4)]),
    ...                 tree(5,
    ...                      [tree(6,
    ...                            [tree(7)]),
    ...                       tree(8)])])
    >>> print_tree(tree_map(lambda x: 2**x, numbers))
"*** YOUR CODE HERE ***"
if branches(t) == []: return tree(fn(label(t)), []) mapped_subtrees = [] for subtree in branches(t): mapped_subtrees += [ tree_map(fn, subtree) ] return tree(fn(label(t)), mapped_subtrees) # Alternate solution def tree_map(fn, t): return tree(fn(label(t)), [tree_map(fn, t) for t in branches(t)])

Q2: Same shape

Define a function same_shape that, given two trees, t1 and t2, returns True if the two trees have the same shape (but not necessarily the same data in each node) and False otherwise.

def same_shape(t1, t2):
    """Return True if t1 is indentical in shape to t2.

    >>> test_tree1 = tree(1, [tree(2), tree(3)])
    >>> test_tree2 = tree(4, [tree(5), tree(6)])
    >>> test_tree3 = tree(1,
    ...                   [tree(2,
    ...                         [tree(3)])])
    >>> test_tree4 = tree(4,
    ...                   [tree(5,
    ...                         [tree(6)])])
    >>> same_shape(test_tree1, test_tree2)
    >>> same_shape(test_tree3, test_tree4)
    >>> same_shape(test_tree2, test_tree4)
"*** YOUR CODE HERE ***"
s1, s2 = branches(t1), branches(t2) if len(s1) != len(s2): return False for index in range(len(s1)): if not same_shape(s1[index], s2[index]): return False return True # Alternate solution def same_shape(t1, t2): children_same = all(map(same_shape, branches(t1), branches(t2))) return len(branches(t1)) == len(branches(t2)) and children_same