Concurrency and isolation levels

I am trying to further understand Fauna’s isolation levels and concurrency, but I am not sure I am thinking about it correctly.

Say I have an application that supports 2 concurrent users, both reading from and writing to the same document in a collection.

This simple starting document is enough to illustrate the example:

{
  selections: []
}

Now let’s look at some FQL that is responsible for handling user actions. In summary, this FQL does the following:

  1. Reads the document
  2. Performs some complex logic
  3. Creates a new version of the selections array with an item added
  4. Updates the document
Query(
  Lambda(["docId", "userId", "selection"],
    Let(
      {
        docRef: Ref(Collection("MyDocuments"), Var("docId")),
        doc: Get(Var("docRef")),
        selections: Select(["data", "selections"], Var("doc")),
        // Complex logic
        newSelections: Append(
          [{ userId: Var("userId"), selection: Var("selection") }],
          Var("selections")
        )
      },
      Update(
        Var("docRef"),
        {
          data: {
            selections: Var("newSelections")
          }
        }
      )
    )
  )
)

Given that FQL, let’s dive into a scenario where the 2 users make selections simultaneously. Is this order of events possible?

  1. Assuming the code in the Let portion of the FQL can run concurrently, the queries for user 1 and user 2 both read the selections array from the document at the same time, with this line:
      selections: Select(["data", "selections"], Var("doc")),

Since the array was initialized as empty, both “threads” (for lack of a better term) will then build newSelections by appending their own selection to that empty array.

So thread 1 from User 1 might store this in newSelections:

[{ userId: 1, selection: "User 1 selection" }]

and thread 2 from User 2 might store this in newSelections:

[{ userId: 2, selection: "User 2 selection" }]

This would result in data loss, where selections only contains the entry from one of the two users. But I really want it to include both, and the order doesn’t actually matter as long as each entry is there.
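As a rough illustration of the worry above, here is a plain JavaScript model of what would happen if the two read-modify-write sequences ran with no transactional isolation at all. The names (`readSelections`, `appendSelection`, `writeSelections`) are hypothetical, and this is not how Fauna executes queries; it only reproduces the lost update being described.

```javascript
// Hypothetical model of the race above WITHOUT transactional isolation:
// each user reads the array, computes, then blindly overwrites it.
// Fauna does not behave like this; it only shows what the worry is.

const doc = { selections: [] };

// Read step: take a copy of the current array (like Select on a Get result).
function readSelections() {
  return [...doc.selections];
}

// Compute step: build a new array from whatever snapshot was read.
function appendSelection(snapshot, userId, selection) {
  return [...snapshot, { userId, selection }];
}

// Write step: replace the whole array; the last writer wins.
function writeSelections(newSelections) {
  doc.selections = newSelections;
}

// Interleaving from the question: both reads happen before either write.
const user1Snapshot = readSelections();
const user2Snapshot = readSelections();
writeSelections(appendSelection(user1Snapshot, 1, "User 1 selection"));
writeSelections(appendSelection(user2Snapshot, 2, "User 2 selection"));

// Only user 2's entry survives: user 1's update was lost.
console.log(doc.selections);
```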

So what I am wondering is if I need to push the Update up further in the FQL, to essentially “lock” the document earlier in execution to prevent this concurrency issue.

Query(
  Lambda(["docId", "userId", "selection"],
    Let(
      {
        docRef: Ref(Collection("MyDocuments"), Var("docId"))
      },
      Update(
        Var("docRef"),
        Let(
          {
            doc: Get(Var("docRef")),
            selections: Select(["data", "selections"], Var("doc")),
            // Complex logic
            newSelections: Append(
              [{ userId: Var("userId"), selection: Var("selection") }],
              Var("selections")
            )
          },
          {
            data: {
              selections: Var("newSelections")
            }
          }
        )
      )
    )
  )
)

Logically, it would make sense that as soon as the first Update begins, the document can no longer be written to until the first Update ends, but I am not sure if this is how Fauna works.

So I guess my question is whether or not moving the Update further up changes anything or if I’m thinking about concurrency and isolation levels all wrong.

Thank you!

“Transactions” would be the term. It may help to look at this article to understand how Fauna does optimistic calculations: Consistent Backends and UX: How Do New Algorithms Help? (CSS-Tricks). Towards the end there is a diagram.

In your case, if I’m not mistaken:

  • transactions will be ordered and optimistically calculated, so let’s say they are ordered nicely and t1 is before t2.
  • when applying transactions on all nodes, each node will come to the same conclusion: t1 goes through since its data has not been modified in the meantime. t2 does not go through since its data has been modified; t2 is added to the ordered log again and is therefore recalculated and rescheduled.
  • All nodes now accept t2 since its data has not been modified in the meantime.

That’s not correct: since the second transaction will be recalculated because selections changed, it will contain the correct data and there will be two appends.

That’s what locking is. Fauna does not do locking; rather, it optimistically calculates and then verifies whether it was too optimistic when each node applies the calculated value. This is possible since everything is deterministic. The end result is the same, though; you could call it ‘optimistic locking’.
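The retry behaviour described above can be sketched as a toy optimistic concurrency control loop in plain JavaScript. This is an assumption-laden model, not Fauna's implementation: a single in-memory document carries a version number, each transaction records the version it read, and a commit is rejected (and the logic recalculated) if the version has moved on. All names are hypothetical.

```javascript
// Toy optimistic concurrency control: compute against a snapshot, then at
// commit time check that the document was not changed in the meantime; if it
// was, recompute from fresh data and try again (up to a retry limit).
// All names are hypothetical; Fauna's real protocol is far more involved.

class VersionedDoc {
  constructor(data) {
    this.data = data;
    this.version = 0; // bumped on every committed write
  }
}

// Read phase: remember the version we saw and compute the new data.
function begin(doc, compute) {
  const snapshot = JSON.parse(JSON.stringify(doc.data)); // private copy
  return { readVersion: doc.version, newData: compute(snapshot), compute };
}

// Validation phase: commit only if nobody wrote since our read.
function tryCommit(doc, txn) {
  if (doc.version !== txn.readVersion) return false;
  doc.data = txn.newData;
  doc.version += 1;
  return true;
}

// Commit, recalculating on conflict; returns how many retries were needed.
function commitWithRetry(doc, txn, maxRetries = 5) {
  let current = txn;
  for (let retries = 0; retries <= maxRetries; retries++) {
    if (tryCommit(doc, current)) return retries;
    current = begin(doc, current.compute); // re-run the logic on new data
  }
  throw new Error("toy retry limit reached: too much concurrent modification");
}

// The "complex logic" from the question: append one selection.
const addSelection = (userId, selection) => (data) => ({
  ...data,
  selections: [...data.selections, { userId, selection }],
});

const doc = new VersionedDoc({ selections: [] });

// Both transactions start from the same empty array, as in the question.
const t1 = begin(doc, addSelection(1, "User 1 selection"));
const t2 = begin(doc, addSelection(2, "User 2 selection"));

const t1Retries = commitWithRetry(doc, t1); // validates and commits first
const t2Retries = commitWithRetry(doc, t2); // conflicts once, then recomputes

console.log(doc.data.selections, t1Retries, t2Retries);
```

Both selections survive: the second transaction is recalculated against the array that already contains user 1's entry, which is the outcome described in this reply.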

You don’t have to; your logic should be correct. However, it might not be an ideal way to model your problem.

There is a limit to the number of times a transaction will be retried, and if you hit that limit you will get an error along the lines of: Transaction was aborted due to detection of concurrent modification. The way you have modelled it has two downsides:

  • you are replacing the complete array of selections each time.
  • when there are multiple concurrent writes, some of them will conflict and have to be recalculated, which might trigger that error if it happens too often. You could, of course, always retry yourself.
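If you do want to retry yourself, a common pattern is a small wrapper with exponential backoff. This is a sketch under assumptions: `withRetries` and its options are hypothetical names, and the `isRetryable` predicate is left to the caller because the exact shape of the contention error depends on the driver.

```javascript
// Client-side retry with exponential backoff and jitter. The isRetryable
// predicate is an assumption: how a contention error shows up depends on
// the driver you use, so the caller decides what counts as retryable.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetries(runQuery, options = {}) {
  const {
    maxAttempts = 5,  // total attempts, including the first one
    baseDelayMs = 50, // delay before the first retry
    isRetryable = () => false,
  } = options;

  for (let attempt = 1; ; attempt++) {
    try {
      return await runQuery();
    } catch (err) {
      // Give up on other errors, or once the attempt budget is spent.
      if (!isRetryable(err) || attempt >= maxAttempts) throw err;
      // Backoff doubles each time (base, 2x, 4x, ...); random jitter keeps
      // competing clients from retrying in lockstep.
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await sleep(delay + Math.random() * delay);
    }
  }
}
```

With the faunadb JavaScript driver you might wrap a call such as `client.query(expr)` in it, with a predicate that looks for the "concurrent modification" wording quoted above; that matching rule is an assumption, so check what your driver actually throws.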

The positive point is:

  • querying your document will be fast since your selections are readily available in the document; you don’t need to fetch referenced documents or use an index.

Therefore, it’s a tradeoff: ask yourself whether you want to optimize for writes or reads (or both?). There is some Fauna modelling advice that might interest you in this three-part series: Modernizing from PostgreSQL to Serverless with Fauna Part 1

@databrecht both your response and the resources you linked are helpful, thank you.

I have one follow up question, to make sure I understand you correctly.

Given your explanation, pushing the Update further up in the FQL shouldn’t make a difference at all. Fauna’s optimistic calculation will operate the same regardless of where the Update is.

But you mentioned “it might not be an ideal way to model your problem”, and I’m not sure whether you mean there is still a trade-off between these two choices (placement of the Update) or whether there is another alternative I should consider.

I struggle to understand what you mean by pushing the Update further up. If it means starting with the Update statement instead of the Let, no, that doesn’t make a difference at all.

I’ve linked the Postgres document since it talks a lot about such tradeoffs. Currently you are using an array to store ‘things’: either nested documents or an array of references to some other documents (iirc you are storing user ids). If your application is write-heavy, a better solution is to store the references to this new document in the user. That’s only possible when it’s a one-to-many relation. If it’s a many-to-many relationship, you need an association collection (explained in the Postgres doc; besides that, you could also take a look at how our GraphQL collections are generated if you add a many-to-many relation).
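To make the modelling tradeoff concrete, the toy below counts how many transactions would have to be recalculated when several users add a selection at the same moment, under the same kind of optimistic check as before. It compares the single shared array with one document per selection (as in an association collection). Document ids and the function name are hypothetical, and the model ignores everything except read/write overlap.

```javascript
// Toy comparison of the two data models, assuming an optimistic check:
// a transaction must be recalculated if a document it READ was written by
// someone else after it read it. Document ids are hypothetical.

function countRecalculations(model, userIds) {
  const versions = new Map(); // docId -> number of committed writes
  const versionOf = (id) => versions.get(id) ?? 0;

  // Worst case: every user reads before anyone commits.
  const txns = userIds.map((userId) => {
    // "array": everyone reads and rewrites the one shared document.
    // "records": each selection is a new document (e.g. in an association
    // collection), so a writer reads nothing that the others write.
    const reads = model === "array" ? ["myDocument"] : [];
    const write = model === "array" ? "myDocument" : `selection-${userId}`;
    return { seen: reads.map((id) => [id, versionOf(id)]), write };
  });

  // Commit in order; count transactions whose reads went stale.
  let recalculations = 0;
  for (const txn of txns) {
    if (txn.seen.some(([id, v]) => versionOf(id) !== v)) recalculations += 1;
    versions.set(txn.write, versionOf(txn.write) + 1); // commit the write
  }
  return recalculations;
}

console.log(countRecalculations("array", [1, 2, 3]));   // shared array
console.log(countRecalculations("records", [1, 2, 3])); // one doc per selection
```

In the shared-array model every writer after the first has read a document that changed underneath it, while separate documents never overlap. The price, as noted earlier in the thread, is on the read side: collecting all selections then needs an index rather than a single document read.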

Okay, I think I’ve got it now. Kind of amazing how Fauna keeps even concurrency pretty simple. I’m glad I understand better now, but I think in the end I could have been fine if I just trusted Fauna to do the right thing 😉

For my part, I just put all the logic into a single Do FQL function and send it as a single call.
Is that the way to make sure the whole complex logic flow runs as one transaction, processed one transaction at a time?

Example:

  // Assumes `student` and `course` hold documents fetched earlier with Get(),
  // e.g. bound in an enclosing Let().
  Do(
    If(
      GTE(
        Select(["data", "credit"], Var("student")), // Make sure student has enough credit
        Select(["data", "price"], Var("course"))
      ),
      Var("student"),
      Abort("Not enough credit")
    ),
    If(
      GT(Select(["data", "seats"], Var("course")), 0), // Make sure course still has a seat
      Var("course"),
      Abort("Course sold out!")
    ),
    Update(Select("ref", Var("course")), { // Subtract one seat from the course
      data: { seats: Subtract(Select(["data", "seats"], Var("course")), 1) }
    }),
    Update(Select("ref", Var("student")), { // Subtract the price from the credit
      data: {
        credit: Subtract(
          Select(["data", "credit"], Var("student")),
          Select(["data", "price"], Var("course"))
        )
      }
    }),
    // Var("course") and Var("student") still hold the pre-update values,
    // so re-read the documents to double-check the updated fields.
    If(
      GTE(Select(["data", "seats"], Get(Select("ref", Var("course")))), 0), // Double-check seats >= 0
      Var("course"),
      Abort("Course sold out! stage 2")
    ),
    If(
      GTE(Select(["data", "credit"], Get(Select("ref", Var("student")))), 0), // Double-check credit >= 0
      Var("student"),
      Abort("Not enough credit! stage 2")
    ),
    Create(Collection("course_students"), { // Create the enrollment record
      data: {
        courseID: Select("ref", Var("course")),
        studentID: Select("ref", Var("student"))
      }
    })
  )

@dominwong4 all Fauna queries work like transactions; that is, the whole thing succeeds, or the whole thing fails and none of it takes effect.

That’s slightly different from concurrency and considering “serialization” or “strict serialization”.

The isolation levels page and Brecht’s responses explain that read-only queries meet “serialization” standards but might not happen strictly in the order received (due to Fauna’s globally distributed nature). But once there is a write to a document, a serialized index, or a unique index anywhere in a query, extra steps are taken to ensure “strict serialization”, i.e. multiple transactions will be processed in the order submitted, and all DB shards will agree about it deterministically.
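The all-or-nothing behaviour described in this reply can be modelled in a few lines of JavaScript, loosely following the enrollment example: writes are buffered while the query runs and applied only if it finishes without an abort, and reads inside the query see its own earlier writes. It is a toy with hypothetical names (`runQuery`, `enroll`, `AbortError`), not Fauna's implementation, and it says nothing about the ordering guarantees discussed above.

```javascript
// Toy model of all-or-nothing query execution: writes are buffered and
// applied only if the query body finishes without an abort. Reads inside the
// query see its own earlier writes. Hypothetical names throughout.

class AbortError extends Error {}

function runQuery(db, body) {
  const pending = new Map(); // docId -> new data, applied only on success
  const tx = {
    get: (id) => (pending.has(id) ? pending.get(id) : db.get(id)),
    update: (id, patch) => pending.set(id, { ...tx.get(id), ...patch }),
    abort: (message) => { throw new AbortError(message); },
  };
  try {
    const result = body(tx);
    for (const [id, data] of pending) db.set(id, data); // commit everything
    return { ok: true, result };
  } catch (err) {
    if (!(err instanceof AbortError)) throw err;
    return { ok: false, error: err.message }; // pending writes are discarded
  }
}

// Enrollment in the spirit of the Do() example: both updates or neither.
function enroll(tx, studentId, courseId) {
  const course = tx.get(courseId);
  const student = tx.get(studentId);
  tx.update(courseId, { seats: course.seats - 1 });
  tx.update(studentId, { credit: student.credit - course.price });
  // Checks placed after the writes still undo them when they fail.
  if (tx.get(courseId).seats < 0) tx.abort("Course sold out!");
  if (tx.get(studentId).credit < 0) tx.abort("Not enough credit");
  return "enrolled";
}

const db = new Map([
  ["course", { seats: 1, price: 30 }],
  ["student", { credit: 20 }],
]);

const outcome = runQuery(db, (tx) => enroll(tx, "student", "course"));
console.log(outcome, db.get("course").seats, db.get("student").credit);
```

Here the student cannot afford the course, so the query aborts and the seat decrement written earlier in the same query is discarded along with the credit change.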
