Help understanding read ops

Hello,

I have a simple FQL query:

Paginate(Match(Index('index_name'), false))

which is returning 49 tuples.
It costs 'x-byte-read-ops': '1438', which seems really high. I'm reading the billing doc here: Billing :: Fauna Documentation

The size of the returned documents, when stored as text, is about 7 KB.
So I would expect 2 read ops for the index read (2 × 4 KB), plus a few read ops for the additional partitions.

Get(Index("index_name")) gives me the number of index partitions : 1

So I would expect something like 3 read ops. Not 1438… There is something I don’t understand.

Fauna is a versioned database. Each document update adds a new version of the keys+values for that document. Indexes also store index entries for each version of covered documents.

When Fauna performs reads, it does so using “pages”, which are 4k blocks of stored document data. When reads occur, from documents or indexes, some amount of history is also read in order to determine the correct version to use for the query at hand.

For a document with only one version, the math is simple. For a document with history, and for a query involving temporality, multiple 4k reads might be required to access the full document within that history.
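
To see this concretely, you can list a document's stored events with the Events function. Here is a minimal sketch with the Python driver (the collection name, document id, and secret are placeholders for illustration):

from faunadb import query as q
from faunadb.client import FaunaClient

client = FaunaClient(secret="YOUR_SECRET")  # placeholder secret

# Every create/update/delete event on the document is retained history,
# and reads may need to scan through it to find the right version.
print(client.query(
    q.paginate(q.events(q.ref(q.collection("things"), "1234")))
))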

We recently discovered that this was not properly described in our documentation, and have since published an update that includes:

One read operation is counted when up to 4KB of document history
is read from storage. A 0.5KB document with more than 32KB of
historical versions requires 8 read operations.

One way to mitigate the impact of history, assuming that you don't need it for your workflows, is to set history_days on a collection to zero (the default is 30). Doing so means that only the current version of a document is retained.
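
For example, a minimal sketch with the Python driver (the collection name and secret are placeholders):

from faunadb import query as q
from faunadb.client import FaunaClient

client = FaunaClient(secret="YOUR_SECRET")  # placeholder secret

# Retain only the current version of documents in this collection.
client.query(q.update(q.collection("things"), {"history_days": 0}))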

You could also identify documents that accrue history rapidly, and then use the Remove function to delete historical events from those documents. That would be especially useful for any “counter” documents that may have thousands or millions of versions.
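
A sketch of that with the Python driver (the ref and timestamp are placeholders; Remove deletes the event recorded at the given transaction timestamp):

from faunadb import query as q
from faunadb.client import FaunaClient

client = FaunaClient(secret="YOUR_SECRET")  # placeholder secret

# Delete the historical "update" event stored at a specific transaction
# timestamp (in microseconds) for a counter document.
client.query(
    q.remove(q.ref(q.collection("counters"), "1234"), 1601514514000000, "update")
)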

Thank you.
It is more or less what I thought it was.

I changed my implementation. I wrote a UDF that does the same thing as Replace, but also manipulates the history with Insert and Remove, so I have a flag attribute that is always true for the most up-to-date version, and false for all other historical versions. (It could be done for Update too, but I only needed Replace.)

With this new flag attribute, I am able to add a term to my index for this flag. When I match this index, I add True for this term. This reduced my read ops dramatically (while raising the write ops a bit, because they are needed for Insert and Remove).

Before the fix, for one full day:
48.9 read ops and 2.0 write ops (and the read ops were rising every day)
After the fix:
3.5 read ops and 4.3 write ops (and this is more stable).

While preserving full history.

My function, if someone wants to do something similar:


from faunadb import query as q

def replace_and_update_history(ref, data):
    # Replace a document, keeping is_last=True only on the newest version.
    return q.let(
        {
            "ref": ref,
            "last": q.get(q.var("ref")),  # current (soon-to-be previous) version
            "last_ts": q.select("ts", q.var("last")),
        },
        q.do(
            # Write the new version with the flag set.
            q.replace(q.var("ref"), q.merge(data, {"data": q.merge(q.select("data", data), {"is_last": True})})),
            # Rewrite the previous version in history with the flag cleared.
            q.remove(q.var("ref"), q.var("last_ts"), "update"),
            q.insert(q.var("ref"), q.var("last_ts"), "update", {"data": q.merge(q.select("data", q.var("last")), {"is_last": False})}),
        ),
    )

(Here it is written in Python, but you could adapt it to your language of choice easily.)
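
For example, a call might look like this (the collection name, document id, payload, and secret are made up):

from faunadb import query as q
from faunadb.client import FaunaClient

client = FaunaClient(secret="YOUR_SECRET")  # placeholder secret

client.query(
    replace_and_update_history(
        q.ref(q.collection("things"), "1234"),
        {"data": {"name": "example", "score": 42}},
    )
)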
Then you use is_last in your index as a term, and match on is_last=True. The history no longer costs you anything on reads 🙂 (but you do pay more on writes).
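
To make that concrete, here is a sketch of such an index and the cheap match (the index, collection, and secret names are made up):

from faunadb import query as q
from faunadb.client import FaunaClient

client = FaunaClient(secret="YOUR_SECRET")  # placeholder secret

# With is_last as a term, historical versions (is_last=False) land under a
# different term value, so matching on True never touches them.
client.query(q.create_index({
    "name": "things_current",
    "source": q.collection("things"),
    "terms": [{"field": ["data", "is_last"]}],
}))

client.query(q.paginate(q.match(q.index("things_current"), True)))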

PS: I think this mechanism should be transparent for the user (with the flag not necessarily visible to the user). When not inside an "At" function, this could save a lot of read ops. When inside an "At", it would not be used, and would change nothing.