Doubly-linked balanced tree with GhostCell — observations and benchmarks

Introduction#

The data structure#

Abstractly, I needed a list labeling data structure to use in Async UI for providing each item in a UI list with its up-to-date index in face of insertions/deletions elsewhere in the list. The API shoud look like

fn insert(&mut self, index: usize, data: T) -> ItemId;
fn remove(&mut self, index: usize);
fn id_to_index(&self, id: &ItemId) -> usize;
// and len, get, get_mut, ...

This is implemented roughly as a doubly-linked B-Tree with order statistics. But for this blog post, we only need to care that the tree is doubly-linked (each node holds weak reference to its parent in addition to parents holding strong references to children) and each ItemId holds weak reference to a node.

Simple Implementation#

A simple way to make doubly-linked trees in Rust is to use Rc, Weak, and RefCell.

Diagram of simple tree design. Each node is Rc<RefCell<Node>>. Each node has Weak reference to its parent. ItemIds are outside the tree and hold references the leave nodes corresponding to the items

Here, we need RefCell's interior mutability because our nodes are wrapped in Rc, which only allow immutable access. Rc can only give out immutable/non-exclusive access because data behind Rc can have multiple owners.

But in our case, there is an exclusive owner: the tree itself. The nodes are all inside the tree — no one outside can own them. Any operation on the nodes have to be done through the API provided by the tree.

Avoiding RefCell#

Since the tree exclusively own all its nodes, anyone with mutable/exclusive access to the tree should have mutable/exclusive access to all nodes too. Tree methods such as insert(&mut self, ...) should be able to mutate the nodes freely. The runtime checks of RefCell shouldn't be needed.

This is where GhostCell comes in. Separate permission from data! Let the tree hold the key/token to all its nodes. That the nodes' data is wrapped in Rc or Weak doesn't matter; the tree has the key so the tree owns the data.

Switched from RefCell to GhostCell. The key is stored as a field in the Tree struct

Now, methods like insert(&mut self, ...) can use their mutable/exclusive access to self.key to get exclusive access to the data inside GhostCells at zero runtime cost!

Observations#

Compile-time check feels like Rust#

My initial implementation with RefCell had a few panics due to borrowing RefCell at the wrong time. With GhostCell, this problem is completely eliminated. I encountered compile errors while switching, but once it compiles, it works.

Read-only borrows are easy#

GhostCell#

The insert method needs a way to prove to the nodes that it has exclusive access to the tree. This is where GhostCell comes in. Each GhostCell can only be unlocked with the single GhostToken key associated with it. To exclusively/mutably borrow into a cell, we need to show &mut GhostToken to prove that we have exclusive/mutable access to the key. Likewise, to immutably borrow into a cell, we need to show &GhostToken.

GhostCell infographics

We have many GhostCell-wrapped data nodes in our tree. All of them are associated with the same key. This key should be stored at the root of the data structure, so that methods could use it to unlock nodes and do work.

Each GhostCell is associated with a token or key. Multiple cells can be associated to the same key, but no cell can be associated to more than one key. To access data inside a GhostCell, you need the key. A non-exclusive access to the key gives you immutable access to the cell's data. An exclusive access to the key gives you mutable access to the cell's data.

Using GhostCell for doubly-linked tree#

For our tree, we can hold the key at the root of the data structure. Any method that operate on the tree can borrow this key, using it to unlock nodes.

For example, insert has exclusive reference to the tree, so it gets exclusive reference to the key. With that, it can mutably borrow the content of any node in the tree.