Concurrent reads/writes and isolation levels

I’m trying to understand the implications of the different isolation levels Fauna provides and how they relate to the handling of unique constraints on indexes. I took a look at this thread on idempotent document creation and would like to leverage something similar. However, the isolation levels documentation states:

Strict serializability adds a simple extra constraint on top of serializability. If transaction Y starts after transaction X completes, which means that X and Y are not (by definition) concurrent, then a system that guarantees strict serializability guarantees that both:
1. the final state is equivalent to processing transactions in a serial order, and
2. X must be before Y in that serial order.
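To make the second condition concrete, here is a minimal Python sketch that checks only the real-time ordering rule. The transaction names, intervals, and the function `respects_real_time_order` are made up for illustration. It does not check condition 1 (equivalence to a serial execution), and it says nothing about how Fauna implements this.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each transaction has a real-time
# (start, end) interval. Names and times are invented for illustration.
Interval = Tuple[float, float]


def respects_real_time_order(intervals: Dict[str, Interval],
                             serial_order: List[str]) -> bool:
    """Return False if the serial order places Y before X even though
    X completed before Y started; return True otherwise."""
    position = {name: i for i, name in enumerate(serial_order)}
    for x, (_, x_end) in intervals.items():
        for y, (y_start, _) in intervals.items():
            # X and Y are non-concurrent when X ends before Y starts.
            if x != y and x_end < y_start and position[x] > position[y]:
                return False
    return True


# X finishes before Y starts; Z overlaps both, so Z is concurrent with each.
intervals = {"X": (0.0, 1.0), "Y": (2.0, 3.0), "Z": (0.5, 2.5)}
print(respects_real_time_order(intervals, ["X", "Z", "Y"]))  # True
print(respects_real_time_order(intervals, ["Y", "X", "Z"]))  # False
```

Note that Z may land anywhere in the order, because it is concurrent with both X and Y. That is the case the definition leaves open.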

It appears that the documentation explicitly calls out non-concurrent transactions. I’m trying to better understand Fauna’s behavior in the concurrent transaction case.

The FQL pseudocode from the thread on idempotent document creation is as follows:

If(Exists(Match(Index("unique_id"), "id")), false, Create(...))

My understanding is that this checks whether some unique identifier already exists, using an index with a unique constraint on the id, and if not, creates a new document with that id. Assuming the id doesn’t already exist, what happens when this query is run concurrently with input that has the same id?

All Fauna transactions are ordered.

For non-contended transactions, those that operate on disparate documents, the actual order isn’t terribly important: the effects of one transaction do not affect another.

Contended transactions are automatically retried several times. If a previous transaction completes during this interval such that the current transaction can complete, it does. Otherwise, the transaction fails, and the client application needs to execute the query again.
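For the case where the transaction ultimately fails and the client has to run the query again, a client-side wrapper could look roughly like the Python sketch below. `ContentionError`, `run_with_retry`, and `flaky_query` are hypothetical names, not part of any Fauna driver; substitute whatever error your driver actually raises on contention. The attempt count and backoff delays are also assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ContentionError(Exception):
    """Hypothetical stand-in for the error a driver raises on contention."""


def run_with_retry(run_query: Callable[[], T],
                   attempts: int = 3,
                   base_delay: float = 0.05) -> T:
    """Call run_query, retrying with exponential backoff on ContentionError."""
    for attempt in range(attempts):
        try:
            return run_query()
        except ContentionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Simulated query that hits contention twice before succeeding.
calls = {"count": 0}


def flaky_query() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ContentionError("simulated contention")
    return "created"


print(run_with_retry(flaky_query, base_delay=0.001))  # created
```

Retrying like this is only safe when the query is idempotent, which is the point of the existence check in the query above.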

Your example query is a decent example of defensive query writing. If you’re not sure about the state of the database, it is always best to check before attempting document creation, updates, or deletes.

Note that Match can return a set of zero or more items. When you use Exists on a set, only the first item in the set is actually evaluated. Depending on the result you expect, you could just use an index with unique: true, and that setting ensures that only one document can exist with the specific terms and values defined. A transaction that tries to create a duplicate would fail, but that behavior is easier to reason about.

Also note that Create returns the document created. If you capture the result of your query, it would likely be better to replace false with Get(Match(Index("unique_id"), "id")), so that both branches return the document.

Side note: The docs on isolation levels say

Reads can opt-in to strict serializability by using the linearized endpoint, or by including a no-op write in the transaction.

The linearized endpoint can be used by appending /linearized to the request URL. For example, if your request normally goes to https://db.fauna.com, then the linearized endpoint is https://db.fauna.com/linearized
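As a small illustration of the URL rule quoted above, a helper like the following Python sketch would build the linearized endpoint from a base URL. The function name `linearized_endpoint` is hypothetical; only the /linearized suffix comes from the docs quote.

```python
def linearized_endpoint(base_url: str) -> str:
    """Append /linearized to a request URL, avoiding a doubled slash."""
    return base_url.rstrip("/") + "/linearized"


print(linearized_endpoint("https://db.fauna.com"))  # https://db.fauna.com/linearized
```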

Hello,

Just so I am clear. Which of the following is accurate?

Assuming there is an index on the collection that enforces a unique constraint on a data field within the document, and assuming a worst-case scenario where 30 requests come in, distributed across geographically separated Fauna nodes, to get-or-create documents that contain the same value for that unique data field, then:

  • Fauna will order the geographically distributed transactions so that one transaction will fall into the “else” case and create the document, while all others will find the document already exists?

OR

  • Fauna will ensure only one transaction creates the document, and the other 29 transactions will transparently retry until they either find the document exists and return, or the retry limit is exceeded and an error will be thrown back to the calling client?

I’m trying to ensure I capture the behavior in this edge case. The documentation might lead one to believe that a get-or-create query such as If(Exists(Match(Index("unique_id"), "id")), false, Create(...)) will always succeed, returning yes or no, versus there being a third option, where a failure is returned to the client indicating that the transaction should be retried rather than treated as some other type of fatal error.

Your second alternative is the closest to reality.

There are three possible alternatives for any of the transactions: the transaction creates the document, the transaction notices that the document already exists, or the transaction fails with a contention error.

The worst-case flow looks like:

  • 30 transactions are submitted and executed concurrently.
  • No transaction sees an existing document, so they all follow the Create branch.
  • Since all 30 transactions are attempting to write, they all get sequenced in the transaction log.
  • 1 transaction “wins” and creates the document.
  • The other 29 transactions fail internally with a contention error, but they are retried.
  • The 29 retried transactions re-read the index, see that the document now exists, and then follow the false branch.
  • Since none at this point are attempting any writes, they all return false.
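The flow above can be sketched as a toy Python model. This is not Fauna's implementation: the retry budget `MAX_ATTEMPTS`, the idea that log order equals submission order, and the function `simulate` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List

MAX_ATTEMPTS = 3  # assumed retry budget; the thread only says "several times"


def simulate(n_transactions: int, doc_id: str = "id") -> List[str]:
    """Toy model of concurrent If(Exists(...), false, Create(...)) queries."""
    committed: Dict[str, dict] = {}   # committed documents, keyed by unique term
    outcomes: Dict[int, str] = {}
    pending = list(range(n_transactions))

    for _ in range(MAX_ATTEMPTS):
        # Every pending transaction reads the same snapshot.
        snapshot_has_doc = doc_id in committed
        writers: List[int] = []
        for txn in pending:
            if snapshot_has_doc:
                outcomes[txn] = "false"   # read-only branch, nothing to sequence
            else:
                writers.append(txn)       # Create branch: sequenced in the log

        # Writers are applied in log order (assumed: submission order).
        retry: List[int] = []
        for txn in writers:
            if doc_id in committed:
                retry.append(txn)         # stale read: internal contention, retry
            else:
                committed[doc_id] = {"created_by": txn}
                outcomes[txn] = "created"

        pending = retry
        if not pending:
            break

    for txn in pending:                   # retry budget exhausted
        outcomes[txn] = "contention error"
    return [outcomes[i] for i in range(n_transactions)]


counts = Counter(simulate(30))
print(counts["created"], counts["false"], counts["contention error"])  # 1 29 0
```

In this simple model nothing ever exhausts the retry budget, because every retried transaction becomes read-only on its second attempt.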

If the contention involved a more complicated situation than document existence, one or more transactions could end in contention errors.

Fantastic. Thanks for clarifying! This was the detailed information I was looking for.
