Practical difference between strict serializability and serializability for reads

The isolation levels documentation indicates that read-only transactions operating only on documents or on serialized indexes are serializable, rather than strictly serializable. However, it also points out that a user can opt into strict serializability by using the “linearized” endpoint or adding a no-op write.

My question is: what is the practical difference between a strictly serialized read and a serialized read? Is there a good example that demonstrates the behaviors of one over the other? I’d also be interested in knowing the performance differences, if there are any. Thanks in advance!

Hi @rcausey,

My understanding is that, by default, a read sees everything that has completed up to the point when the read is issued. Transactions that have not yet completed are not included (the read does not wait). That’s ‘serializability’.

Regarding ‘Strict Serializability’: I think the read will wait for any transactions that have already started to finish, but will exclude any transactions started after the read is issued. If you have a write-heavy database, using ‘Strict Serializability’ is going to slow reads down.

I do not work for Fauna, and I don’t know how the system is architected, so I could be wrong. It is probably best to wait for someone from Fauna to weigh in; still, I hope this helps.

@Polecat7 has got it.

Read-write transactions have to go through Fauna’s transaction pipeline, which makes sure that each transaction takes into account the effects of the transactions before it.

Read-only, non-index reads are still “Serialized” even if they start after other read-write transactions but before any writes are actually applied. This is one limitation explained in the linked docs page:

The second, but related, limitation of serializability is that the picked order of transactions doesn’t have to be at all related to the order of transactions that were submitted to the system. A transaction Y that was submitted after transaction X may be processed in an equivalent serial order with Y before X.

Note that what you will never encounter is a transaction Y, submitted after transaction X, reading some partial effects of transaction X. That is, serializability ensures that whatever the order is (either X → Y or Y → X), the result is the same as running one transaction completely and then running the next (i.e. running them “in series”). Importantly, when discussing a distributed database, serializability also guarantees the same result across replicas.
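
To make the reordering concrete, here is a minimal sketch with the JavaScript driver (the collection name, document id, and secret are placeholders, not anything from this thread): two queries are submitted back to back, and under serializability alone the read-only Get may legitimately be ordered before the Update and return the old value.

```ts
import faunadb, { query as q } from "faunadb";

const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET! });
const ref = q.Ref(q.Collection("scores"), "1"); // hypothetical document

async function demo() {
  // Transaction X: a read-write transaction, sequenced through the pipeline.
  const write = client.query(q.Update(ref, { data: { value: 42 } }));

  // Transaction Y: a read-only document read submitted after X was sent.
  // Serializability allows either order (X -> Y or Y -> X), so this Get may
  // still return the value from before X's update.
  const read = client.query(q.Get(ref));

  console.log(await Promise.all([write, read]));
}

demo().catch(console.error);
```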

When you use the linearized endpoint, you require that all requests go through our transaction pipeline, guaranteeing that what you read hasn’t changed between the moment you read it and the moment the transaction is sequenced. That is to say, what @Polecat7 said 🙂
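
For completeness, here is a hedged sketch of the “no-op write” opt-in mentioned in the docs: adding any write expression makes the whole query a read-write transaction, so it is sequenced through the pipeline and the read is strictly serialized. The dedicated throwaway document and collection names below are assumptions for illustration, not a Fauna convention, and the extra Update is still a real write.

```ts
import faunadb, { query as q } from "faunadb";

const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET! });

// Wrapping the read in Do() together with a trivial Update turns the whole
// query into a read-write transaction, so it is sequenced like any other write.
const strictRead = q.Do(
  q.Update(q.Ref(q.Collection("noop_writes"), "1"), {}), // pre-created throwaway doc
  q.Get(q.Ref(q.Collection("scores"), "1"))               // the read you care about
);

client.query(strictRead).then(console.log).catch(console.error);
```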

Good reads

Definitely check out this blog if you can. It goes through the life of a transaction step by step.

Side note on “Reading your own writes”

All of our official drivers (as of this post) support reading-your-own-writes. The client tracks the “last seen transaction” time and forwards that along with your requests, which the database will use to guarantee the read happens after that last-seen timestamp.
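
For a rough picture of the bookkeeping (this is not the actual driver source; the endpoint URL, auth scheme, and response header name here are simplified assumptions, while the X-Last-Seen-Txn request header is the one mentioned later in this thread), it looks something like this:

```ts
// Remember the newest transaction time observed so far and forward it with
// every request, so later reads are ordered after your own writes.
let lastSeenTxn = 0;

async function queryFauna(wireFormatQuery: unknown, secret: string): Promise<unknown> {
  const res = await fetch("https://db.fauna.com", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${secret}`,
      ...(lastSeenTxn > 0 ? { "X-Last-Seen-Txn": String(lastSeenTxn) } : {}),
    },
    body: JSON.stringify(wireFormatQuery),
  });

  // Update the high-water mark from the response (assumed header name).
  const txnTime = Number(res.headers.get("X-Txn-Time") ?? 0);
  if (txnTime > lastSeenTxn) lastSeenTxn = txnTime;

  return res.json();
}
```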

Thank you for your detailed reply @ptpaterson and the link. Very useful to know about “Reading your own writes”.

@ptpaterson thanks again for that link to Consistency without Clocks. I just read through it, and I could be wrong, but I think there is a mistake in the image directly under the line “In the case of T11, the original reads are different from the reads of the correct snapshot:”. It shows Washington DC with “T10’s Buffered Writes”, but I think it is supposed to be “T11’s Buffered Writes”?

(Just some thoughts on things - not important to read if you are busy…)

Learning about the distributed transaction log reminded me of the solution I came up with to make sure clients got the most recent data, in a PHP/MySQL application I wrote 15 years ago (but which has been constantly updated with better data syncing).

At the beginning I just used a timestamp of the last update, but it was unreliable: reads could skip writes that were not yet complete at the time of the read. I then updated the server code to fetch data from the timestamp minus 5 seconds, which sometimes resulted in duplication, so the server side had to keep a cache of what it had seen before for each client (though this cache was itself written to the DB and read back again on each reconnect) and only pass new entries back to the client. But even this could result in occasional missed writes. It was not serialisable.

What I then devised was to write every update to anything in the DB to a log table and, rather than use a timestamp, use the primary key of that table as a means of tracking what had and had not been read. Since primary keys (in SQL) are issued in sequence, it became easy to tell that a write had not yet been read, so the server could wait a bit and re-read the log until it had a complete block of in-sequence primary keys (see the sketch below). Once it had a chunk of log entries that were in sequence, it could read the data they related to and pass that back to the client. Thus, time was no longer important. Of course, this didn’t necessarily pass the serialisability test, but it was much better.
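
For anyone curious what that looks like in code, here is a rough TypeScript rendering of the idea (the original was PHP/MySQL; the table, column, and interface names are made up): only the prefix of log rows with consecutive ids is handed to the client, and a gap means the server waits and retries.

```ts
// Assumed shapes; the real application used PHP and MySQL.
interface ChangeRow { id: number; entity: string; payload: string; }
interface Db { query(sql: string, params: unknown[]): Promise<ChangeRow[]>; }

// Return only log entries whose ids form an unbroken sequence after
// lastSeenId; a gap means an earlier write has not committed yet, so the
// caller should wait a bit and try again later.
async function fetchNewChanges(db: Db, lastSeenId: number): Promise<ChangeRow[]> {
  const rows = await db.query(
    "SELECT id, entity, payload FROM change_log WHERE id > ? ORDER BY id",
    [lastSeenId]
  );

  const contiguous: ChangeRow[] = [];
  let expected = lastSeenId + 1;
  for (const row of rows) {
    if (row.id !== expected) break; // gap: an in-flight write is still missing
    contiguous.push(row);
    expected += 1;
  }
  return contiguous;
}
```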

The latest design I had (but never put into production, because I came across Fauna) was to use Redis for the log, so the load on the DB server would be reduced (web servers would only query the DB if they saw new log entries in Redis), and if different DB servers in the replication setup were not quite in sync, the log would help with that.

Fauna is literally one of the most amazing database systems ever put together: transactionally ACID compliant, serialisable, distributed, non-blocking reads, and it’s fast. I had fully designed a new system with Apache, PHP, and SQL (the usual) and was about to start building it when I came across Svelte (and the demo Rich Harris made a few years back), which led me to Cloudflare, and then to Fauna. My world changed forever.

I am so curious how Fauna has been implemented on the hardware and software level…

This is an aside, but are you going to set up servers in Asia at some point? I’m in Japan, and the service works very well, but I hope that one day you’ll get servers set up here too. Thanks!

Does that include the GraphQL driver?

Excellent point @wallslide!!

We have an open issue to pass through other headers, such as those for query metrics. I’ve added a note that we should also pass through the X-Last-Seen-Txn header.
Fauna doesn’t provide a GraphQL client, but most GraphQL clients have some way of hooking into the actual fetch requests. You would have to extend your client to persist a value, update it on every response, and send it out with every request (see the sketch below). Once the API is updated, it may be appropriate to try out a small package in FaunaLabs, for example an Apollo Link.
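
For illustration only, here is the kind of Apollo Link that could do that bookkeeping, assuming (hypothetically) that the GraphQL API exposed the value in an X-Last-Seen-Txn response header once it is passed through, which it does not yet do:

```ts
import { ApolloLink } from "@apollo/client";

let lastSeenTxn: string | null = null;

export const lastSeenTxnLink = new ApolloLink((operation, forward) => {
  // Attach the most recent value we have seen to every outgoing request.
  operation.setContext(({ headers = {} }) => ({
    headers: lastSeenTxn ? { ...headers, "X-Last-Seen-Txn": lastSeenTxn } : headers,
  }));

  return forward(operation).map((result) => {
    // HttpLink exposes the raw fetch Response on the operation context.
    const { response } = operation.getContext();
    const seen = response?.headers?.get("X-Last-Seen-Txn");
    if (seen) lastSeenTxn = seen;
    return result;
  });
});
```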

Update regarding GraphQL:

The GraphQL service already leverages the drivers to keep the last-seen-transaction time up to date. Since the service proxies requests for many users, you can consider that the last-seen time will be at least as recent as your own requests. So there is no need to provide it yourself!
