Boosting FaunaDB Query Performance: Seeking Expert Advice

I’m encountering performance bottlenecks while querying a growing user database and could really use your expertise. I’m specifically focused on efficiently retrieving and paginating user records based on their status.

My current approach involves using a Paginate query with an index on the ‘status’ field:

Code snippet

Paginate(
  Match(Index("school_query_reverse"), "active"),
  { size: 10 }
)

While this works for smaller datasets, query performance is deteriorating as the database expands. I’ve experimented with pagination size adjustments and index refinements but haven’t achieved the desired performance gains.
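For reference, this is roughly how I walk through subsequent pages: I pass the `after` cursor returned by each response back into the next Paginate call (the collection name and ref below are illustrative placeholders from my schema):

```fql
// First page: 10 documents from the "active" set
Paginate(
  Match(Index("school_query_reverse"), "active"),
  { size: 10 }
)

// Next page: pass the `after` cursor returned by the previous call
Paginate(
  Match(Index("school_query_reverse"), "active"),
  { size: 10, after: [Ref(Collection("schools"), "101")] }
)
```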

I’m eager to learn from the community about:

  • Index Optimization: proven methods for creating and fine-tuning indexes to maximize query efficiency.
  • Pagination Best Practices: strategies for handling large datasets while keeping pagination fast.
  • Common Performance Pitfalls: mistakes that can hinder query speed, and how to avoid them.

I have tried suggestions from a few resources and articles, including the Fauna forum posts “How to paginate index which has terms and value” and “Does anyone know how to fix this issue?”, but the problem persists.

Any insights, tips, or recommended resources would be immensely helpful. I’m committed to unlocking FaunaDB’s full potential and ensuring my queries are both fast and scalable.

Thank you for your time and assistance.

Hi @ridabaig! I have some initial questions. We’ll need more information to give you specific advice for your application. Can you provide the following?

  • What is the index definition? (output of Get(Index("school_query_reverse")))
  • What is history_days set to for the source collection?
  • Can you share an example document that would match the index?
  • How often do you need to paginate through the entire Set?
  • How many distinct values are there for status, and how often do they change?

In general, I can offer a few pointers for paginating a single index that you notice is degrading in performance:

  • Limit the history of your collections as much as possible.
    • Index reads include historical versions. If you have history_days set to > 0 and have any significant amount of changes, those historical versions will increase the read and latency cost of your queries.
    • Setting history_days: null actually means infinite history retention, which will steadily bog down index reads if there are ongoing mutations to your index. If you want zero history, set history_days: 0. Zero days is now the default, but since you are using FQL v4, you may have an older collection that was created with the old default of 30 days.
  • Low-cardinality indexes should be avoided when possible.
    • Booleans and fields that have only two or three values do not perform as well as indexes with higher cardinality. They can cause “hot spots” where a single database node is doing all the work to fetch results.
    • You can mitigate by adding additional terms in order to increase cardinality. This reduces the resulting Set size and spreads the data out more evenly, but does mean that you may need to combine the results of multiple index reads.
  • You don’t always need indexes when filtering works efficiently.
    • If your collection contains a high percentage of documents with the active status, then it may actually be more helpful to filter the primary index.
    • e.g. (FQL v10) schools.where(.status == "active")
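To make the history point concrete: in FQL v4 you can inspect and cap a collection’s retention like this (the collection name schools is assumed from the v10 example above):

```fql
// Inspect the current history setting
Get(Collection("schools"))

// Retain no history: index reads no longer scan historical versions
Update(Collection("schools"), { history_days: 0 })
```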
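As a sketch of the extra-terms idea: if your documents also carry a higher-cardinality field, say region (hypothetical here), a compound index spreads the few status values across more partitions:

```fql
CreateIndex({
  name: "schools_by_status_and_region",
  source: Collection("schools"),
  terms: [
    { field: ["data", "status"] },
    { field: ["data", "region"] }
  ]
})

// Each read now targets a narrower, more evenly distributed set
Paginate(
  Match(Index("schools_by_status_and_region"), ["active", "us-east"]),
  { size: 10 }
)
```

The trade-off mentioned above: to cover every region you issue one Match per region, or combine them with Union(Match(...), Match(...), ...).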
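In FQL v4 terms, that same filter-instead-of-index approach looks roughly like this (note that the page size applies before the filter, so pages can come back short):

```fql
// Page through the primary index and keep only "active" documents
Filter(
  Paginate(Documents(Collection("schools")), { size: 100 }),
  Lambda(
    "ref",
    Equals(Select(["data", "status"], Get(Var("ref"))), "active")
  )
)
```

This reads every document in the page, so it only wins when the matching fraction is high enough that a dedicated low-cardinality index would cost more than it saves.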