Expected read latency from California on Classic Region Group?

Hi there, I’m on the Classic region group, which is supposed to have two replicas in the US and one in the EU, according to this article: Region Groups :: Fauna Documentation

It doesn’t say which cities the replicas are located in, however, so I’m trying to figure out whether the latency I’m seeing makes sense. From my laptop in California, I’m making this super minimal query that fetches a tiny document (815 bytes according to Fauna’s query stats) by ref:

q.Get(q.Ref(q.Collection('apps'), '327798305001046612'))

I’m seeing latencies of 120ms+, with frequent spikes up to 200ms+.

Does this latency make sense given the locations of the replicas? If there’s a replica anywhere near the west coast, these latencies feel rather slow to me, given that the internal read latency in Fauna is supposed to be sub-10ms in the Classic region group: Fauna Status

Hi @lewisl. The x-query-time response header/metric will tell you the internal latency of your queries. You can subtract this from your total latency to calculate the external component.

Note that the status page is reporting the 50th percentile of Read/Write times. I.e. half of all queries should be expected to take less time and half more time. Regardless, the internal latency will be reflected in the x-query-time response header/metric.
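For example, something along these lines (a minimal sketch using the JS driver’s observer callback, with your existing fauna client and q import) gives you both numbers and the difference:

const start = performance.now()
let internalMs = 0

await fauna.query(q.Get(q.Ref(q.Collection('apps'), '327798305001046612')), {
  observer: (response) => {
    // Server-side (internal) query time, in milliseconds.
    internalMs = parseInt(response.responseHeaders['x-query-time'], 10)
  },
})

const totalMs = performance.now() - start
console.log({ totalMs, internalMs, externalMs: totalMs - internalMs })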

It is difficult to assess your total latency since there are so many factors that can affect it. What code is running between the points where you record the start and end times to calculate the total latency? What other activity is your machine performing at the same time? What’s going on in your local network and on the internet between you, your ISP, and Fauna?

Thanks for the follow up @ptpaterson. Here’s the code I’m using to test:

  const start = performance.now()
  await fauna.query(q.Get(q.Ref(q.Collection('apps'), '327798305001046612')), {
    observer: (response) => logger.info(response.responseHeaders),
  })
  logger.info({ duration: performance.now() - start })

As you can see, there’s no code between the measurement points other than the Fauna client’s query execution itself.

In the typical case, this logs something like the following:

[api] [1655764121198] INFO:
[api]     :status: 200
[api]     x-txn-time: "1655764121170429"
[api]     x-compute-ops: "1"
[api]     x-read-ops: "1"
[api]     x-byte-read-ops: "1"
[api]     x-byte-write-ops: "0"
[api]     x-write-ops: "0"
[api]     x-query-time: "2"
[api]     x-query-bytes-in: "63"
[api]     x-query-bytes-out: "313"
[api]     x-storage-bytes-read: "815"
[api]     x-storage-bytes-write: "0"
[api]     x-txn-retries: "0"
[api]     x-faunadb-build: "220615.003119-e99a2f0"
[api]     content-length: "313"
[api]     content-type: "application/json;charset=utf-8"
[api] [1655764121198] INFO:
[api]     duration: 144.39999997615814

According to the headers, the internal latency is 2ms, but the total roundtrip time took 144ms.

My internet connection shouldn’t be contributing too much inherent latency, judging by the results I’m seeing on my end.

I’m really just looking to figure out whether this latency is in the expected ballpark given the physical locations of the Fauna Classic region group replicas, or whether there’s room for further optimization that I’m missing. From the latency numbers alone, it doesn’t feel like I’m hitting a server on the west coast.

Hi @lewisl, sorry for the late response. I’ve been out of office, then sick, and then taking some time to run tests of my own on this latency question.

Can you run your performance test again, but this time run multiple requests, serially, in the same script? Then compare the duration for the first request to the subsequent ones.

For me, in Ohio, initial requests take about 250-400ms longer than subsequent requests.

TLS handshakes can add 100ms or more to network latency

This is not unique to Fauna; it is common to any connection over HTTPS. A TLS handshake introduces a couple of additional round trips across the network, plus cryptographic work, before you and the server even agree to exchange data.

The overhead scales with network latency, because so many extra round trips are required.
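If you want to see how much the connection setup costs on its own, here’s a rough sketch in plain Node that times the socket’s secureConnect event against the full request (db.fauna.com is the driver’s default endpoint; the split is illustrative, not a Fauna-specific API):

const https = require('https')
const { performance } = require('perf_hooks')

// Time DNS + TCP + TLS setup separately from the complete request.
const start = performance.now()

const req = https.request(
  { host: 'db.fauna.com', path: '/', method: 'HEAD' },
  (res) => {
    res.resume()
    res.on('end', () => {
      console.log('full request (ms):', performance.now() - start)
    })
  }
)

req.on('socket', (socket) => {
  // For HTTPS this is a TLSSocket; 'secureConnect' fires when the handshake completes.
  socket.on('secureConnect', () => {
    console.log('connection setup incl. TLS handshake (ms):', performance.now() - start)
  })
})

req.end()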

The drivers reuse connections whenever possible

Subsequent requests may show lower latency, because the clients can reuse the TLS connection once established.

Test script

Here is the test script that I ran:

// The script assumes the standard faunadb-js client; the secret comes from an
// environment variable here, purely for illustration.
const faunadb = require('faunadb')
const { performance } = require('perf_hooks')

const q = faunadb.query
const fauna = new faunadb.Client({ secret: process.env.FAUNA_SECRET })

// Small helpers for the summary stats printed at the end.
const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length
const median = (values) => {
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

const run = async () => {
  const execute = async (i) => {
    const start = performance.now()
    let queryTime = null

    await fauna.query(
      q.Get(q.Ref(q.Collection("users"), "319254125213646912")),
      {
        observer: (response) =>
          (queryTime = parseInt(
            response.responseHeaders["x-query-time"]
          )),
      }
    )

    return {
      queryTime,
      duration: performance.now() - start,
    }
  }

  try {
    const initial = await execute()
    console.log('Initial request', {
      ...initial,
      externalLatency: initial.duration - initial.queryTime,
    })

    const NUM_REQUESTS = 30
    const queryTimes = []
    const durations = []
    const externalLatencies = []
    for (let i = 0; i < NUM_REQUESTS; i++) {
      const subsequent = await execute(i)
      queryTimes.push(subsequent.queryTime)
      durations.push(subsequent.duration)
      externalLatencies.push(
        subsequent.duration - subsequent.queryTime
      )
    }

    console.log("subsequent averages", {
      queryTimes: mean(queryTimes),
      durations: mean(durations),
      externalLatencies: mean(externalLatencies),
    })

    console.log("subsequent median", {
      queryTimes: median(queryTimes),
      durations: median(durations),
      externalLatencies: median(externalLatencies),
    })
  } catch (e) {
    console.error(e)
    throw e
  }
}

run()

Output:

Initial request {
  queryTime: 4,
  duration: 368.5752210021019,
  externalLatency: 364.5752210021019
}
subsequent averages {
  queryTimes: 2.033333333333333,
  durations: 36.156075533231096,
  externalLatencies: 34.122742199897765
}
subsequent median {
  queryTimes: 2,
  durations: 36.27919600903988,
  externalLatencies: 34.2124989926815
}

Ah nice, looks like that’s the culprit!

When I ran your script verbatim (aside from the ref in the query itself), I was getting similar results: subsequent latencies in the ~30ms range.

To give you a fuller picture of how I was testing before, it looked something like this:

fastify.get('/fauna/test', async () => {
  const start = performance.now()
  await fauna.query(q.Get(q.Ref(q.Collection('apps'), '327798305001046612')), {
    observer: (response) => logger.info(response.responseHeaders),
  })
  logger.info({ duration: performance.now() - start })

  return ''
})

I was actually visiting the /fauna/test API endpoint from the browser and refreshing in order to trigger this repeatedly (not very scientific, I know).

When trying again with the TLS handshake overhead in mind, I began refreshing in quick succession, and indeed latency dropped to ~30ms after the initial request. However, the timeout on TLS connection reuse looks rather aggressive, somewhere on the order of ~1 second: if I leave more than about a second between refreshes, latency immediately climbs back up to 100+ms.

I then checked the docs to see if the timeout is configurable, and indeed it is: fauna/faunadb-js: Javascript driver for FaunaDB (github.com)

Looks like the default is 500ms and the maximum we can set is 5000ms, which should be a bit more reasonable for my use case: I don’t yet have enough traffic to keep the connection in constant use, but I still want consecutive requests triggered by a human within the same session to stay fast.
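For anyone else landing here, this is roughly what I ended up with (a sketch assuming the option is the http2SessionIdleTime client setting described in the faunadb-js README; double-check the exact name and limits against the current docs):

const faunadb = require('faunadb')

const fauna = new faunadb.Client({
  // Secret via environment variable, just for illustration.
  secret: process.env.FAUNA_SECRET,
  // Keep the idle HTTP/2 session open longer (milliseconds) so human-paced
  // requests within the same session can keep reusing the connection.
  http2SessionIdleTime: 5000,
})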

Thank you so much for looking into this with me!
