Fauna strange/erroneous byteReadOps and storageBytesRead values

Oliver_Weller-Davies · June 2, 2022, 9:48pm

Hey,

So far I’m loving FaunaDB (!) and would love to continue using it for my application, which I hope to launch in the next few months.

I was recently trying to understand how many byteReadOps I was using. I wanted to reduce this amount as I found the number displayed in the dashboard to be strangely high (almost 200k).

After playing around in the shell, I noticed some behaviour that I thought was odd, and it would be good to get some clarification on this.

Here’s what I did, in order:

1. I first created a document with a small nested array of strings/numbers.
2. I queried for this document in the shell; it cost 1 byteReadOp and 293 storageBytesRead, as expected.
3. I then added more elements to the array and queried for the document as before. This time it cost 2 byteReadOps and an appropriate number of storageBytesRead (just under 8kb). Again, this was expected.
4. However, I then removed some entries from the document and queried again. To my surprise, the number of byteReadOps increased to 3, and storageBytesRead increased to 11kb, even though the size of the document decreased (?).
5. To investigate further, I then removed all the extra data so that the document was the same as it was in step 1. I queried again, and the number of byteReadOps remained at 3 (?), although storageBytesRead did decrease slightly to just over 8kb.

So reading the same document with the same data is now requiring 3x the number of byteReadOps.

As a final sanity check, I then copied the data in this document into a new document. Again, I queried from the shell, and it cost 1 byteReadOp, as expected.

Here is a screenshot from the shell showing this behaviour. (I’m limited to posting only one screenshot as I’m new.)

A simple document, retrieved with a simple Get(Ref(Collection("Test"), "333389064662155855")) operation, requiring 3 byteReadOps.

I really like Fauna, and I hope to use it in production soon, but it would be nice if I could understand this further.

Is this a bug or am I misunderstanding something?

All the best,
Oliver

Oliver_Weller-Davies · June 2, 2022, 10:23pm

Update: after waiting some time (a couple of hours), the cost of the query has decreased to 2 byteReadOps and just over 5kb of storageBytesRead. (I haven’t changed the data.)

This is surely a bug?

ptpaterson · June 3, 2022, 12:48pm

Hi @Oliver_Weller-Davies and welcome!

EDIT: There may be some more nuance affecting the Reads here, Oliver. So I will redact my previous comment until we finish investigating some more.

ptpaterson · June 3, 2022, 1:12pm

@Oliver_Weller-Davies, it sounds like you were testing and investigating costs originally because of the 200k Read Ops that you found surprising. I’d be happy to dig into that more with you, too.

My first thought is to ask how many of those ops the dashboard attributes to the dashboard itself. We have an article that walks through how the dashboard can incur costs.

For the rest of the read cost:

Do you have many Documents that contain large arrays? Are you Indexing on those arrays? Are you updating the documents frequently, or are your updates mutating fields that affect your Indexes?

Are there other specific queries whose cost you need help explaining?

Oliver_Weller-Davies · June 3, 2022, 3:51pm

Hey, thanks for such a prompt reply! I wasn’t expecting to hear back so soon; it helps bolster my confidence that choosing Fauna was a good decision.

I see what you’re saying now about the log-structured merge-tree (from the redacted comment). It’s now feasible to me that reads can cost more after expanding and then shrinking the document size.

Just for clarity (and posterity), it may help to add some documentation about this (or what you’ll follow up with) somewhere. Fauna’s implementation details can be pretty opaque, especially for people who aren’t that familiar with database internals; however, pricing is something that all clients should be able to reason about fairly easily.

I’m guessing this still doesn’t explain why shrinking the document size may actually increase the number of byteReadOps (perhaps this is why the comment was redacted), but maybe I’m wrong.

Anyway, thanks again for the prompt reply — and sorry for being overly keen to call out the behaviour as a bug. I just found it slightly unintuitive.

I look forward to hearing the follow up.

All the best,
Oliver

Oliver_Weller-Davies · June 3, 2022, 4:56pm

Thanks a lot for offering to help.

The 200k reads seemed very high at first glance. However, thinking about it for a moment, there’s likely an obvious explanation.

Whenever I save my code during development, it causes a ‘fast refresh’ of the app. Every time this happens, we make a rather expensive set of queries to Fauna.

I’m spamming the save key all day as I rely on it for code formatting. I also probably consume far too much caffeine, which makes me attack the keyboard with the same APM as a Starcraft pro (unfortunately this is not correlated with productivity). I guess it all adds up.

I do plan to return to this with a more rigorous analysis. However, if you don’t mind, I’d like to understand a bit more the way that indexes and TTL affect read/write costs.

For indexes, looking at Billing - Fauna Documentation, it says:

One write operation is counted when up to 1k of document + index data is written. When a 1KB document is modified, and every field is indexed, two write operations are counted.

Suppose I have an index that has a single Number term, presumably 8 bytes, and it points to a single Ref value, which (let’s say) is 32 bytes. Does the following then apply:

If I add a new 0.5kb document to the collection, will I incur roughly 0.5kb + (8 + 32) = 0.54kb of storageByteWrites (and therefore only one byteWriteOp)?
Similarly, if I have N identical indexes like this for the same collection, would I incur 0.5kb + N * (8 + 32) of storageByteWrites?
If the above is true and N = 15, would I then incur 0.5kb + 15 * (8 + 32) = 1.1kb of storageByteWrites, and therefore 2 byteWriteOps?
If I have N indexes, again all indexing the same term, and I modify the document, would I be charged a similar amount to the above (i.e. one byteWriteOp for small N)?
However, if every field in the document happened to be indexed, would I always incur at least two write ops? Billing - Fauna Documentation says “When a 1KB document is modified, and every field is indexed, two write operations are counted.”

Sorry if this is a lot of text. I’m assuming I’ve got at least one thing wrong, so it would be nice to get some clarification on my mental model.

I also wanted to check the following: unlike writes, are read prices uncorrelated with indexes when you’re not using the index for the read query? In other words, if I have N indexes for a collection, will I be charged the same amount for a simple Get(Ref(Collection("X"), id)) query whether N = 15 or N = 0?

I also wanted to understand the costs associated with temporality a bit more. I vaguely remember reading that you should lower your TTL if you can. However, looking at Billing - Fauna Documentation, the only costs associated with TTL are with respect to storage. Are there other ways in which TTL can affect read/write costs?

I understand this is a large set of questions, which are somewhat unrelated to the initial post, so I’d be very appreciative of any help you can give me here.

Thanks a lot.

All the best,
Oliver

ptpaterson · June 3, 2022, 6:18pm

I’ll answer this one first (whether a simple Get costs the same with N = 15 indexes as with N = 0): yes.

Per the docs: “One read operation is counted when up to 4KB of any document is read from storage.” Gets only fetch the latest version of your document. What we were discussing above is how that “read from storage” part works, but it’s still just the one Document. The cost only reflects what it takes to fetch the one Document and is not affected by any Indexes.
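
As a rough sanity check of that 4KB rule against the numbers reported earlier in this thread, here is a minimal Python sketch. It is not Fauna's billing code: it assumes "4KB" means 4096 bytes and that every read costs at least one op. Only the 293-byte figure is quoted exactly in the thread; the other byte counts are illustrative stand-ins for "just under 8kb", "11kb", and so on.

```python
import math

# Assumption: "4KB" in the billing docs means 4096 bytes.
READ_OP_BYTES = 4 * 1024


def read_ops(storage_bytes_read: int) -> int:
    """Estimate read ops for a number of bytes read from storage (minimum 1)."""
    return max(1, math.ceil(storage_bytes_read / READ_OP_BYTES))


# 293 is the exact figure from the thread; the rest are illustrative
# stand-ins for the approximate sizes that were reported.
observations = {
    "initial small document": 293,
    "after growing the array (just under 8kb)": 8_000,
    "after removing some entries (11kb)": 11 * 1024,
    "after restoring the original data (just over 8kb)": 8_300,
    "a couple of hours later (just over 5kb)": 5_200,
}

for label, size in observations.items():
    print(f"{label}: {read_ops(size)} read op(s)")
```

Under those assumptions the estimates come out as 1, 2, 3, 3 and 2 read ops, which lines up with the byteReadOps reported at each step. So the op counts follow from storageBytesRead; the open question is why storageBytesRead itself changes.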

The docs are giving one example in which a Write Op due to Index writes could be counted. Specifically, in the case that a single Document is about 1kB, it follows that the data would be spread out across the Document’s fields. So if you Index each and every field, you will most assuredly write at least an accumulated 1kB of data to those Indexes.

Index entries are tuples of data, based on whatever values you define. If the sum of all Index tuples created in a single transaction exceeds 1kB, then that will cost another Write Op. So the example in the docs presents a situation where this will definitely happen.

Regarding the math, for the most part we can ignore the exact values of a single Index tuple, but for a better frame of reference:

Numbers are stored as 64-bit integers or floats.
References are not just a simple, thin pointer. IDs are themselves stored as 64-bit integers. A Reference is a special type that contains the Document’s ID and a Reference to a Collection, which contains its own ID and a reference to the Database… ~~So, bear in mind that a Ref is about an order of magnitude larger than in your example.~~

EDIT: I thought you said “32-bit” but you said “32-byte” for Refs. That’s a better approximation! Also, 8 bytes for an integer = 64 bits. Sorry for patronizing.

Your equation is a good approximation. Indeed, if you have multiple indexes on the same document, the writes for each Index are aggregated and then you get size/1kb Write Ops.
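
To make that approximation concrete, here is a small Python sketch of the same arithmetic. It is only an estimate under stated assumptions, not Fauna's actual accounting: `write_ops` is a hypothetical helper, "1kb" is taken to be 1024 bytes, a 0.5kb document is taken to be 512 bytes, and each index entry is assumed to be just an 8-byte Number term plus a 32-byte Ref, as in the question above.

```python
import math

# Assumption: "1kb" in the billing docs means 1024 bytes.
WRITE_OP_BYTES = 1024


def write_ops(doc_bytes: int, n_indexes: int,
              term_bytes: int = 8, ref_bytes: int = 32) -> tuple[int, int]:
    """Return (total bytes written, estimated write ops) for one document write.

    Document and index bytes are summed first, then divided into 1kb units.
    The per-entry sizes are the rough figures from the question, not
    measured values.
    """
    index_bytes = n_indexes * (term_bytes + ref_bytes)
    total = doc_bytes + index_bytes
    return total, max(1, math.ceil(total / WRITE_OP_BYTES))


for n in (1, 15):
    total, ops = write_ops(doc_bytes=512, n_indexes=n)
    print(f"N = {n}: {total} bytes written, {ops} write op(s)")
```

With these assumptions, one index gives 552 bytes and 1 write op, while fifteen indexes give 1112 bytes and 2 write ops, matching the reasoning in the question. Real index entries may be larger than 40 bytes, so treat the result as a lower bound.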

Oliver_Weller-Davies · June 3, 2022, 6:41pm

Hey, thanks for the reply.

This helps a lot: for write costs, we should just think in terms of total bytes written across the document and indexes combined, and there’s no special case when every field is indexed (I think I slightly misread the docs there). This makes a lot of sense, thank you.

Thanks again for your time.

All the best,
Oliver

ptpaterson · June 3, 2022, 6:49pm

I updated my response above. I had thought you were comparing Get to paginating all 15 Indexes – which was wrong to think.

A Get call will cost the same no matter how many Indexes there are covering that Document.

Oliver_Weller-Davies · June 19, 2022, 1:46am

Hey, I saw you changed your documentation on read op costs, indicating that the history of the document is also taken into account.

As illustrated in this forum post: Help understanding read ops

I think this answers the question, so I’ll mark this as the solution.

system · June 21, 2022, 1:47am

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.
