Question about read transactions and isolation

Hi! I’m considering using Fauna in my project and I have one question before starting to experiment.

I’m building a web app that needs to load all data related to a certain project from the database when a user starts working on that project. The client app will then write to the database using small transactions. It’s very important that the initial read of the whole project is done with serializable isolation guaranteed. I understood that Fauna makes this possible without any limitations, so if I need to load 10,000 documents initially, no other client can mess up the data with a concurrent write?

I considered DynamoDB first, but was surprised to learn that its read transactions are limited to a maximum of 25 items, which makes it practically unusable for my project. Then I started looking at PostgreSQL, which handles isolation very well, but I’d like to get the benefits of a document-based approach. Some googling led me to Fauna.

You can definitely read multiple documents at once. The Paginate function has a default of 64 documents per page, but can be adjusted to return up to 100,000 documents per page.

At that limit, and with large documents, your queries might time out or bump into the transaction size limit. So you may need to execute a number of Paginate queries that return subsequent pages of results.
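
For example, a minimal sketch with the JavaScript driver (the collection name 'projects', the page size of 1,000, and the placeholder secret are assumptions for illustration):

const faunadb = require('faunadb');
const q = faunadb.query;
const client = new faunadb.Client({ secret: 'YOUR_FAUNA_SECRET' });

// Inside an async function: fetch the first page of up to 1,000 refs.
const page = await client.query(
    q.Paginate(q.Documents(q.Collection('projects')), { size: 1000 })
);
// page.data  -> array of document refs
// page.after -> cursor to pass as { size: 1000, after } in the next
//               Paginate call; absent on the last page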

To guarantee result stability when paging over a large result set, you can use Now() to fetch the current timestamp, and then, in subsequent queries, use At(<timestamp>, Paginate(...)) so that creations, updates, or deletions that happen in the meantime don’t affect the pages you’re working through.
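
Putting the two together, here is a sketch that captures a timestamp once and then walks every page at that snapshot. It reuses `client` and `q` from the previous snippet, and the collection name and page size are again assumptions:

async function loadProjectSnapshot() {
    // Capture the timestamp once; all pages below read as of this instant.
    const snapshot = await client.query(q.Now());
    const docs = [];
    let after;
    do {
        const opts = after ? { size: 1000, after } : { size: 1000 };
        const page = await client.query(
            q.At(snapshot,
                q.Map(
                    q.Paginate(q.Documents(q.Collection('projects')), opts),
                    q.Lambda('ref', q.Get(q.Var('ref')))
                )
            )
        );
        docs.push(...page.data);
        after = page.after; // undefined once the last page is reached
    } while (after);
    return docs;
}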

Ok, thanks! Makes sense. I’ll go ahead and experiment.

In my app, a realistic estimate of the document count for one project is 5,000-10,000, and the documents are quite small. Could I read them all in one query? I did this:

// clc holds the collection's name
const result = await client.query(
    q.Map(
        q.Paginate(q.Documents(q.Collection(clc)), { size: 100000 }),
        q.Lambda('ref',
            q.Get(q.Var('ref'))
        )
    )
)

Is that ok, or would it be better to use just the Documents function in the first query to get all the document refs, then make a new query for each of those refs to fetch the actual document data, using a timestamp there to ensure the document data is the same as it was during the first Documents query?

If you are looking for consistency when fetching many documents, using the fewest transactions possible helps: a single operation is easier to reason about and reduces query overhead.

You can definitely return just the references and then fetch documents as you need them. However, fetching all of the documents individually (over multiple queries) would be slower, and you might have to handle some queries being slower than others (resulting in out-of-order responses).
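
If you do go the refs-first route, each follow-up read can be pinned to the same timestamp. A sketch, reusing `client`, `q`, and the `snapshot` value from the earlier snippet (`someRef` is hypothetical, standing in for one ref returned by the Documents query):

// Fetch a single document as of the snapshot, so its data matches the
// state the original Documents() query saw.
const doc = await client.query(
    q.At(snapshot, q.Get(someRef))
);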
