Index with empty resultset -> lot of read ops?

Hello,

I’m struggling with a weird behavior: paginating over a match-index with empty result set costs hundreds of read ops, while doing the same over a non-empty result set costs very few ones.

E.g.

The following query costs 340 byteReadOps even if result is 0 (there’s no record for “byStatus” index):

        Count(
          Union(
            Map(
              ["value"],
              Lambda(
                "x",
                Intersection(
                  Match(Index("byX"), Var("x")),
                  Match(Index("byStatus"), "BLOCKED")
                )
              )
            )
          )
        )

The following query costs 5 byteReadOps even if result is 5 (there ARE records for “byStatus” index):

        Count(
          Union(
            Map(
              ["value"],
              Lambda(
                "x",
                Intersection(
                  Match(Index("byX"), Var("x")),
                  Match(Index("byStatus"), "CLOSED")
                )
              )
            )
          )
        )

Moreover, I noticed that if I try to set a Union on 2 indexes with empty result sets, the query goes often in timeout… which is probably a consequence of the huge amount of read ops those indexes are causing.

What am I missing? How to avoid this behavior?

Thank you so much!

Claudio

Just did another test: it seems the index is acting weirdly only with some matching parameters.

Following query returns 13 records using 346 byteReadOps:
Paginate(Match(Index(“Slot_by_status”), “WAITING”), { size: 1024 })

The following one returns more than 50 records using 2 byteReadOps:
Paginate(Match(Index(“Slot_by_status”), “CLOSED”), { size: 1024 })

The following one returns 0 records using 346 byteReadOps:
Paginate(Match(Index(“Slot_by_status”), “BLOCKED”), { size: 1024 })

So, it doesn’t seem to be related to an empty or not-empty result set… it seems some matching parameters are causing a lot of reads.
And as I wrote in previous post, if I do an Union with “WAITING” and “BLOCKED” status, the query goes often in timeout.

Why the index is OK while searching for “CLOSED” and very inefficient while searching for “WAITING” or “BLOCKED”?

Need to add more details, because this is getting more and more weird.

First of all, I started troubleshooting read and write ops due to a very high number of operations shown up in the dashboard: I got an average of 60k ops (out of 100k of the free tier) in the last weeks.

I got ops data from EVERY query used in the code, and a part the strange byteReadOps described above, readOps and writeOps are always very low (with the requests I got on the website I should have ops 2-3 magnitudes lower).

Moreover, this is from my dashboard:

I more or less didn’t use the shell at all in the last days… 20k operations in the last 7 days with almost no usage???

Is it possible that the dashboard is showing wrong data? But also costs are wrong in this case…

Hi @tiaccadi and welcome!

Thanks for all of the details! I believe what is happening here is that you are accumulating history in your Indexes. There’s a few other things, going on in your question, so I hope I can touch on all of them.

Read-Ops vs. Byte-Read-Ops

Quick clarification: the read-ops and write-ops metrics are deprecated in favor of the byte-read-ops and byte-write-ops. byte-read-ops and byte-write-ops take into account the amount of data that is read/written, either directly to Documents or to Indexes (I’ll talk about this more below).

The former headers are still sent for backwards-compatibility, but should not be relied upon. ONLY byte-read-ops and byte-write-ops are used for billing and displayed on the Dashboard. Please see the Docs Billing page for additional details.

Dashboard operations

There is a delay of several hours for the Dashboard to update your current operations, but otherwise the Dashboard will be accurate.

What I note is that the 20k read ops you circled are “Dashboard/Shell/Playground” ops. This covers all of your activity in the Dashboard. Please be aware that all of your navigating around the Dashboard accrues costs.

Here is a helpdesk article that walks through how operations from the Dashboard are calculated.

Reading Index history counts for Read Ops

This indicates that you have a lot of history built up for these indexes.

Immediate recommendation

I’ll explain more, but I strongly recommend that you set the history_days setting for the Indexes’ source Collection to zero, or the lowest that your application will allow. The default is 30 days.

Fauna’s temporality feature means that you can query an Index at any time. That means that when you read an Index, it reads the history too.

Possible explanation of your query costs

From the examples you’ve shared, I guess that your Documents have a status field that changes over time. When you query Paginate(Match(Index(“Slot_by_status”), “BLOCKED”), { size: 1024 }) for example, you might get back 0 results because there are currently zero Documents with the BLOCKED state. However, you likely have A LOT of history. The number of Ops suggests you are reading over a Megabyte of data from your Index to get the result you need (346 ops * 4kB/op). That’s possibly over 10000 Index entries, but actual number could vary wildly depending on your data – point though is there is a lot of history.

If the status field for your Documents are updated frequently – for example, if they toggle back and forth between WAITING and BLOCKED many times before reaching CLOSED – then the history can build up very quickly.

2 Likes

Hey @ptpaterson ,

thanks for all the details and clarifications! Those are enlightning!

I just set history_days of all the collections to 0: I don’t need to query back in time at all.
By the way, my records don’t change very often (usually those records just go from WAITING to BLOCKED to CLOSED… that’s it)… so, it’s still a bit weird.

How much time the change will require to reflect to the queries? Just tested the same query immediatly after the change and the byteReadOps are the same (347).

Again, thank you so much!

You are welcome.

It can take some time yet to take effect. Removal is handled by a background task, so once the history “expires” (which is immediately since you set history_days to 0), it could still possibly be hours or days.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.