The CRDT literature sometimes leaves room for mathematical ambiguity. Maybe because the bulk of the work tends to be targeted at systems researchers and developers, like a lot of work on eventual consistency.
The discussion below untangles three subtle but important ideas in CRDT design, which turn out to be interrelated:
- Determinism vs Convergence guarantees
- Early
read
s: lower bounds or not? - Algebraic property requirements for
update
functions.
The CRDT guarantee of strong eventual consistency does not guarantee determinism!
If you want your CRDTs to be deterministic, or you want to treat CRDT read
s as lower bounds, then your update
functions must be inflationary.
This is the 4th post in a series I'm doing on CRDTs. This one is a bit more technical and narrow than my previous CRDT posts, but practitioners should still know about the main conclusions. This post also contains a small formal result that seems novel -- Strong Eventual Consistency does not guarantee determinism! I'd be curious to hear about prior work that makes this point.
Definitions First
Before we get into the details, let's be a bit careful with our terminology. I'll assume you've seen CRDTs before, but I'll review the key points while trying to keep this breezy.
Join Semi-Lattices
You likely remember that CRDTs are related to lattices in abstract algebra. Let's review that.
Let’s suppose we have a set of possible states , equipped with a join operator . The join is required to satisfy three properties:
- Idempotent:
- Commutative:
- Associative:
If you have a set of states and a binary operator satisfying these properties, you have a join-semilattice.
From this operator, we can define a partial order:
This means that applying join never forgets anything. It only moves “upward” in the order induced by join.
From Join Semi-Lattices to CRDTs
A CRDT is essentially a replicated join semi-lattice.
To the core mathematical definition we add one more API, an update
function that mutates the state of the CRDT (while of course staying within the state space ). As we'll see below, these functions can be the source of nondeterminism depending on their algebraic properties. Also, the CRDT literature tends to refer to the join operator as merge
. (That works for me, as it disambiguates the CRDT term from the unrelated relational join operator from databases.)
Next, we assume an asynchronous network dissemination service that periodically sends a snapshot of one node's state to another. When state arrives at a destination node, its local CRDT state is mutated to reflect the merge
of the current state with the message.
We'll assume that every node eventually merge
s data from every other node directly or indirectly (transitively).
That's all we need, at the level of detail we'll pursue here. However, the literature on CRDTs typically highlights a subclass of CRDTs called "op-based" CRDTs. You can read about those in an earlier post and I'll have more to say about them below.
Given this background, let’s move on to define a key property of functions on the lattice domain.
Inflationary
We say that a function over an ordered domain is inflationary if its output is never smaller than its input:
For CRDT discussion, we assume the domain is the same as that of our semi-lattice, and the partial order is the one from our semi-lattice definition. Inflationary functions never take us "down" in the lattice order.
Let’s start with something reassuring: the join operator is always inflationary in each of its arguments.
Why? because is an upper bound for .
This is one reason why many CRDT papers don’t emphasize inflationarity: they tend to focus on merge
operations, and merge
() satisfies the property automatically.
Our issue is going to be the update
function each node may run between merge
s. More on that shortly.
But first, let's define the standard guarantee that CRDTs are designed to provide.
Strong Eventual Consistency (SEC)
The canonical correctness guarantee for CRDTs is Strong Eventual Consistency (SEC), as defined by Shapiro et al. It requires that, eventually, all replicas converge to the same state, provided they have seen the same set of update
s.
The SEC condition includes three properties:
- Eventual delivery: All
update
s eventually reach all replicas. - Termination: All operations eventually complete.
- Strong convergence: If two replicas have received the same set of
update
s, they will reach the same state.
Note that SEC says nothing about ordering of operations; in particular it does not constrain the interleaving of update
and merge
. Given that dissemination is assumed to be asynchronous, a replica might send its current state before or after applying an update
. We’ll see how this matters next.
SEC Does Not Guarantee Determinism
We tend to think of convergence as a property that avoids non-determinism: after all, in a convergent replica system, all the replicas agree on the outcome. But that doesn't actually mean that they agree on a deterministic outcome!
As defined, Strong Eventual Consistency only says that replicas converge within a run. It says nothing about whether different executions — with the same set of updates
and nodes — produce the same final result.
Consider the following function:
The ("top") symbol is the standard lattice notation for topmost state in a finite lattice (i.e. the unique join of all the states).
is not inflationary: it "drops" from to a lower value . If a user applies it once at node A
, the final result depends on whether A
sends its state to B
before or after applying the update
.
Example: Two Possible Outcomes
Let , with .
- Node A starts at
- Node B starts at
- A single user applies once on node A
Run 1: send first, then update
- A sends to B
- B merges:
- A applies
- B sends to A
- A merges:
- Final state: A = , B =
Run 2: update
first, then send
- A applies
- A sends to B
- B merges:
- Final state: A = , B =
Same updates. Same nodes. Same messages. Different orders. Two different converged states.
Inflationarity is Needed for Deterministic Convergence
As we've seen, non-inflationary functions like may non-deterministically drive the CRDT into one of many possible converged states, which depend on the interleaving of update
s and messages.
If all update
s are inflationary, then once a state has moved "up" the lattice, it never moves back down — and that ensures:
- Each replica's state moves up the lattice.
- Merging always moves replicas up in the lattice.
- The final state is the least upper bound of all
update
s.
That last point ensures determinism: the final result depends only on the set of update
s, not their timing.
Inflationary update
s are required for deterministic convergence in CRDTs.
Non-inflationary update
s will still converge to some common state, but the choice of converged state will be non-deterministic.
Non-Inflationary Updates Spoil Lower Bounds
In the previous post I talked about how it's unsafe to read
the naked state of a CRDT, but at least
a "well-behaved" CRDT provides lower bounds. Well, in addition to providing non-deterministic outcomes,
non-inflationary update
s are "ill-behaved" and do not provide lower bounds.
Imagine you're writing application logic that reads from a CRDT value during execution.
If the update
function is inflationary, you have a guarantee: the current state always provides a lower bound of the final state. That makes it safe to make assumptions like:
- "This user will never lose this permission."
- "This counter will only increase from here."
But if update
s are not inflationary, those assumptions no longer hold. You might see a high value now — and a lower one later — because your node applied a non-inflationary update
.
That means your application can't treat CRDT reads as stable observations in any sense. It must treat every intermediate value as potentially temporary.
When using non-inflationary update
functions, CRDT read
values can be completely arbitrary. Inflationary update
ensures that CRDT read
is a lower bound on the final converged state, which is a common assumption in the CRDT literature.
Ensuring Inflationarity
There is a simple hack to ensure inflationarity, by changing the implementation of the update
API, or removing it entirely.
The change is as follows. When a user invokes update(x)
, we do not directly mutate the CRDT state to x
; rather, we mutate the state to merge(update(x))
. Because merge
is inflationary (as discussed above), the effect of updates in this implementation is always inflationary.
The result is determinism and guaranteed lower bounds, at the cost of a slightly surprising instant/"local" effect from update
.
If you want the contract with the CRDT programmer to be more explicit, you can simply remove update
from the API and force developers to invoke merge
locally. This is nearly the same as the above design, except for type signatures:
merge
update