RGA Algorithm Improvements
Proposed RGA algorithm improvements
- Avoid duplicate merge (this is a generic optimisation, to do for all CRDTs):
- An RGA instance memorises the highest timestamp used so far.
- This timestamp is the first field of the serialised RGA.
- When merging in a remote RGA, remember its highest timestamp.
- On next merge, if the highest timestamp of the remote RGA has not changed, the merge is a duplicate. Skip the rest of deserialisation, skip the slow merge algorithm, and retain the local version only.
-
Replace copying with iterators
-
Accelerate search for siblings.
- In memory, a node maintains a reference to its closest right-side sibling. This reference is not serialised.
- Thus, siblings form a list, sorted in decreasing timestamp order.
- In the merge algorithm, when comparing a new node
- This reference is volatile and is not in the serialised representation.
- Accelerate UID lookup.
- In memory, maintain a map (hash table) of UID to the corresponding node.
- When creating a node, add it to the map.
- Look up a node by UID by looking it up in the map.
- Tombstoning a node has no effect on the map.
- This map is volatile and is not in the serialised representation.
- Prefer node reference to index
- Alternative versions of insertAt(), removeAt(), getAt(), toRealIndex() whose argument is a node reference, rather than an index.
- Avoids the costly index-to-node lookup.