Is it normal for FaunaDB's latency to be sporadically over 5 seconds?
I have been casually monitoring the round-trip time of requests to FaunaDB via the GraphQL endpoint, and I've been noticing response times of 5-12 seconds, but only sporadically throughout the day.
My serverless functions have a timeout of 10 seconds, and as a result I am seeing timeout errors on my serverless API endpoints daily (this only really started being a daily nuisance this week).
The exact same request will usually have a round-trip time of 300ms, so I am pretty sure it's not an issue of requesting too much data.
(I am using Netlify Functions, a wrapper around AWS Lambda.)
Is this a normal issue with FaunaDB? Should I take the time to implement a client-side retry when I get a serverless timeout? Should I increase the timeout past 10 seconds? Or is there something I might be missing about how to properly query Fauna? How is everyone else handling this?
@askrzypczak The sporadic nature of the issue has made it really challenging to trace and troubleshoot. We have identified a couple of places in the GraphQL API layer which could cause this latency. Will keep you posted as we continue to troubleshoot. @bez just curious, are you seeing similar issues?
@Jay-Fauna thank you for the update! Would it be fair to assume that this issue can be worked around by switching my queries to FQL instead of using the GQL endpoint?
In case anyone is interested in my current workarounds:
After discussing with Jay, I split my queries into multiple smaller ones (I had some queries that would join over 4+ collections using many-to-one relations). This reduced the frequency at which the latency reached over 10s, but did not fully resolve the issue.
On the browser side, I implemented a simple one-time retry of any request that failed after more than 10s had passed since the request was sent. A single retry was sufficient to avoid any errors on the browser side, as the second request almost always has a latency of < 300ms.
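For anyone who wants to try the same browser-side workaround, here is a minimal sketch of that one-time retry. It is not the exact code from my app: the helper name `retryOnceIfSlow` and the `slowMs` parameter are made up for illustration, and it assumes the request is wrapped in a function that returns a promise and rejects on failure.

```typescript
// Hypothetical helper (name and parameters are illustrative, not from any library).
// Sends the request; if it fails after at least `slowMs` milliseconds have passed,
// sends it exactly one more time. Fast failures are treated as real errors.
async function retryOnceIfSlow<T>(
  send: () => Promise<T>,
  slowMs: number = 10_000,
): Promise<T> {
  const startedAt = Date.now();
  try {
    return await send();
  } catch (err) {
    const elapsedMs = Date.now() - startedAt;
    if (elapsedMs < slowMs) {
      // Failed quickly: probably not the serverless timeout, so do not retry.
      throw err;
    }
    // Failed slowly: most likely the function timed out, so retry once.
    return send();
  }
}
```

Note that `fetch` only rejects on network errors, so the `send` function passed in would need to throw on a non-OK response itself for a serverless timeout to trigger the retry.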
@askrzypczak The varying latency is primarily due to the core architecture of the product with respect to remote reads and GC functionality, which means you would see it through all the drivers and APIs. With the drivers, one reasonable way to handle it gracefully is to set a user-specified timeout and retry, which gives a better experience. Unfortunately, this is missing in the GraphQL API.
We are working to improve it, but it is a long-term effort.
To add to this, the GraphQL API prechecks every GraphQL query for schema enforcement, which adds to those varying latencies. The next couple of GraphQL API releases will include multiple enhancements (refactoring, client-specified timeout, better schema import experience, error handling, etc.) which should address these.
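Until a client-specified timeout lands in the GraphQL API, the timeout-and-retry approach mentioned above can be approximated inside the serverless function itself. The following is only a rough sketch under assumptions: `withTimeout` and `queryWithRetry` are hypothetical helpers, `runQuery` stands in for whatever call sends the request to Fauna, and the default of 3 attempts at 3 seconds each was picked only so the total stays under a 10-second function timeout.

```typescript
// Hypothetical helper: rejects if `attempt` has not settled within `timeoutMs`.
// The abandoned attempt is not cancelled; its eventual result is just ignored.
function withTimeout<T>(attempt: () => Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`no response after ${timeoutMs} ms`)),
      timeoutMs,
    );
    attempt().then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Hypothetical helper: runs `runQuery` up to `maxAttempts` times, giving each
// attempt `timeoutMs` to finish. Rethrows the last error if every attempt fails.
async function queryWithRetry<T>(
  runQuery: () => Promise<T>,
  timeoutMs: number = 3_000,
  maxAttempts: number = 3,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attemptNo = 1; attemptNo <= maxAttempts; attemptNo++) {
    try {
      return await withTimeout(runQuery, timeoutMs);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Because a timed-out attempt may still complete on the server, this pattern is safest for read queries or for writes that are idempotent.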