Scaling queries with counters

With the help of @ptpaterson I was able to get my upsert & delete queries to work.

The next part of my journey is:

  • Read the data, during a redirect
  • Select the key with the least amount of hits
  • Update the hits for that key
  • Redirect user

The high-level goal is: evenly redirect users for A/B testing purposes. Eventual consistency is not enough; the distribution has to be as even as possible, even at low volumes.
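Roughly, this is what I had in mind, written against the faunadb JS driver (just a sketch: client is an initialized faunadb.Client, the index name comes from my schema below, and the least-hits pick happens in plain JS):

const q = require('faunadb').query;

async function redirectHandler(uuid) {
  // 1. Read the current hit counts for this landing page
  const doc = await client.query(
    q.Get(q.Match(q.Index('landing_page_redirects_search_by_uuid'), uuid))
  );

  // 2. Select the key with the least hits (plain JS)
  const hits = doc.data.value;
  const [leastHitKey] = Object.entries(hits).sort(([, a], [, b]) => a - b)[0];

  // 3. Update the hits for that key (this read-then-write of the same
  //    document is the part that turns out to be contention-prone)
  await client.query(
    q.Update(doc.ref, {
      data: { value: { ...hits, [leastHitKey]: hits[leastHitKey] + 1 } },
    })
  );

  // 4. Redirect the user based on leastHitKey
  return leastHitKey;
}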

Now, @ptpaterson already let me know that this pattern will lead to contention problems, even at relatively low loads…

I would love some help on how I can make this work without having to worry about failing redirects. (The users we redirect come from paid ad campaigns, so it would be pretty bad for me and my job if at some point redirects started failing because of contention, haha.)

My schema currently looks like this:

const faunadb = require('faunadb');
const { CreateCollection, CreateIndex, Collection } = faunadb.query;

// client is an initialized faunadb.Client
client
  .query(
    CreateCollection({
      name: "landing_page_redirects",
      history_days: 0,
      ttl_days: null,
    })
  )
  .then(() => {
    // return the promise so the chain waits for this step
    return client.query(
      CreateCollection({
        name: "offer_redirects",
        history_days: 0,
        ttl_days: null,
      })
    );
  })
  .then(() => {
    return client.query(
      CreateIndex({
        name: "landing_page_redirects_search_by_uuid",
        source: Collection("landing_page_redirects"),
        terms: [{ field: ["data", "key"] }],
        unique: true,
      })
    );
  })
  .then(() => {
    return client.query(
      CreateIndex({
        name: "offer_redirects_search_by_uuid",
        source: Collection("offer_redirects"),
        terms: [{ field: ["data", "key"] }],
        unique: true,
      })
    );
  });

The data for landing_page_redirects looks like this:

{
  "ref": Ref(Collection("landing_page_redirects"), "325031451757118028"),
  "ts": 1646233296003000,
  "data": {
    "key": "f90b38d9-77d7-47f1-920c-79b2b4bb2c8d",
    "value": {
      "542e2364-5911-4b7e-b5f8-e29a5461cca0": 0
    }
  }
}

For offer_redirects it looks like this:

{
  "ref": Ref(Collection("offer_redirects"), "325031764682605131"),
  "ts": 1646233296000000,
  "data": {
    "key": "542e2364-5911-4b7e-b5f8-e29a5461cca0",
    "value": [
      {
        "key": "542e2364-5911-4b7e-b5f8-e29a5461cca1",
        "value": {
          "95c2cca9-b086-4e36-bda3-50096fb95711": 0
        }
      },
      {
        "key": "542e2364-5911-4b7e-b5f8-e29a5461cca2",
        "value": {
          "95c2cca9-b086-4e36-bda3-50096fb95711": 0
        }
      },
      {
        "key": "542e2364-5911-4b7e-b5f8-e29a5461cca3",
        "value": {
          "95c2cca9-b086-4e36-bda3-50096fb95711": 0
        }
      },
      {
        "key": "542e2364-5911-4b7e-b5f8-e29a5461cca4",
        "value": {
          "95c2cca9-b086-4e36-bda3-50096fb95711": 0
        }
      },
      {
        "key": "542e2364-5911-4b7e-b5f8-e29a5461cca5",
        "value": {
          "95c2cca9-b086-4e36-bda3-50096fb95711": 0
        }
      },
      {
        "key": "542e2364-5911-4b7e-b5f8-e29a5461cca6",
        "value": {
          "95c2cca9-b086-4e36-bda3-50096fb95711": 0
        }
      },
      {
        "key": "542e2364-5911-4b7e-b5f8-e29a5461cca7",
        "value": {
          "95c2cca9-b086-4e36-bda3-50096fb95711": 0
        }
      }
    ]
  }
}

The int values in my value properties were supposed to hold the number of hits.
That was, until @ptpaterson explained what will happen if I do this.

The question now is: how can I achieve the goal mentioned at the beginning of this post without running into contention issues & broken redirects?

Thank You! :smiley:

PS: I’ve read the best practices, as well as the post on scaling queries. I must confess, I’m not a DB guy and mostly understood, well, nothing really. Haha.

A hands-on example would be highly appreciated.

Actually, I think I now understand after reading How to update post count and vote count counters - #4 by Bryan_Fauna.

I can see how this could work in my case, by adding a hitLog collection and an aggregatedHits collection.

The hitLog would just receive an insert during a redirect, right after I query the current key/value from my first post.

And a scheduled Netlify function could populate the aggregatedHits collection at specific intervals (as often as possible).
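Something like this is what I picture for the scheduled function (a sketch only: the hitLog/aggregatedHits collections, the hitLog_by_day index, and the document shape are all my assumptions):

const q = require('faunadb').query;

async function aggregateHits(day) {
  // page through the raw hit log entries for one day
  // (a real version would also follow the `after` cursor for more pages)
  const page = await client.query(
    q.Map(
      q.Paginate(q.Match(q.Index('hitLog_by_day'), day), { size: 1000 }),
      q.Lambda('ref', q.Get(q.Var('ref')))
    )
  );

  // tally hits per variant in plain JS
  const totals = {};
  for (const doc of page.data) {
    const variant = doc.data.variant;
    totals[variant] = (totals[variant] || 0) + 1;
  }

  // one snapshot write per interval: few writes, spread out over time
  return client.query(
    q.Create(q.Collection('aggregatedHits'), { data: { day, totals } })
  );
}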

Questions and concerns I have now:

  • Should those two collections also have indexes?
  • Can the hitLog & aggregatedHits be built by day?
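For the by-day idea, I imagine an index like this could work (a sketch; the hitLog collection and a data.day field are just my assumptions):

const q = require('faunadb').query;

client.query(
  q.CreateIndex({
    name: 'hitLog_by_day',
    source: q.Collection('hitLog'),
    // each hitLog document would carry a day string like "2022-03-02"
    terms: [{ field: ['data', 'day'] }],
  })
);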

EDIT: As Netlify's scheduled functions aren't production-ready, maybe a solution could be to not have an aggregatedHits collection, only a hitLog.

The flow could then look like:

  • User clicks link
  • Query current data, join with hitlog data from today
  • Select key with least hits
  • Write new document to hitlog
  • Do redirect

Then set documents in hitlog to have a TTL of ~48 hours.
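Creating a hitlog entry with a built-in expiry might look like this (a sketch: Fauna's document-level ttl field handles the cleanup, while uuid/chosenKey/today stand in for values from the earlier steps and the data shape is my assumption):

const q = require('faunadb').query;

client.query(
  q.Create(q.Collection('hitLog'), {
    data: { key: uuid, variant: chosenKey, day: today },
    // Fauna removes the document automatically ~48 hours after creation
    ttl: q.TimeAdd(q.Now(), 48, 'hours'),
  })
);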

Does that make any sense, and would it allow me to avoid contention entirely?

An alternative idea:
Don’t have a hitlog either, only the two collections I have now.

  • User clicks link
  • Get the latest document for the uuid that was requested
  • Grab the key with the least hits
  • Increment the value in JS
  • Create a new Document with incremented values
  • Redirect user

I hope this makes sense? I'm unsure if this would solve the contention issues at all, though. And maybe another process could delete documents older than 48 hours.

It might be easier to first describe why this won’t solve the contention issue, to help you think about it.

You want to read the latest document for a given uuid. This will require an Index read. Specifically it will require reading a particular “Set”, the Match on the Index with the uuid as a term. If you create a new Document with the same uuid, then you will be writing to the Index, which means updating the same Set from which you were reading. It’s this reading and writing to the same Set that will cause contention in this case.
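To make that concrete, here is what both halves of such a transaction would look like touching the same Set (a sketch only, with the actual incrementing omitted):

const q = require('faunadb').query;

client.query(
  q.Let(
    {
      // READ: resolves the Set for this uuid
      latest: q.Get(q.Match(q.Index('landing_page_redirects_search_by_uuid'), uuid)),
    },
    // WRITE: a new document with the same key is indexed into the very Set
    // that was just read, so one transaction reads and writes one Set
    q.Create(q.Collection('landing_page_redirects'), {
      data: {
        key: uuid,
        value: q.Select(['data', 'value'], q.Var('latest')), // increment omitted
      },
    })
  )
);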

The solution that ensures you avoid contention is one where the reads ONLY perform reads, and the writes are consolidated in some way to spread them out and avoid contention.

The basic template for the Event Sourcing pattern provides reads through the snapshot/aggregated collection + most recent logs. Reads are contention-free because they are READ ONLY, and read-only queries by definition do not cause contention. Writes to the log collection are contention-free because nothing is trying to read AND write to them in the same transactions. Writes to the snapshot/aggregation collection are contention-free because there are few of them, spread out over time.
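For illustration, the read path might look like this (purely a sketch, with assumed index names like aggregatedHits_by_day and hitLog_by_day_and_variant):

const q = require('faunadb').query;

function readCounts(snapshotDay, today, variant) {
  // nothing in this query writes, so it cannot contend with anything
  return client.query(
    q.Let(
      {
        // the last aggregated snapshot
        snapshot: q.Get(q.Match(q.Index('aggregatedHits_by_day'), snapshotDay)),
        // raw log entries recorded since that snapshot
        recent: q.Count(q.Match(q.Index('hitLog_by_day_and_variant'), [today, variant])),
      },
      { snapshot: q.Var('snapshot'), recent: q.Var('recent') }
    )
  );
}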

NOTE: I said writes to the logs are contention-free because nothing is reading and writing in the same transaction. To be more specific, the pattern avoids any read-write queries involving a particular resource, regardless of which transactions are reading and which are writing that resource. For example, contention can still occur if you read the contended resource while writing to something different, at the same time another transaction is writing to the subject resource. Avoiding that ensures that the writes in your queries meet strict-serializability guarantees.

Unfortunately, you might be in a bit of a pickle, because it seems the semantics that you need are: “Each time you read a document, update the document”. Which is by definition unscalable due to contention issues.

In this flow, you are still reading a Set and updating that Set in the same transaction – in this case, the latest hitlogs. You can avoid contention by writing a new Document to the hitlog in a separate transaction, but that would mean eventually correct values – between reading one redirect and updating the hitlog, another request might come in and have already read the same counter. I think that in this scenario the final tally will be correct, so at some point things will average out. However, you indicate that that may be unacceptable.
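That decoupled version would look roughly like this (a sketch; pickLeastHit is a hypothetical helper and the hitLog shape is assumed):

const q = require('faunadb').query;

async function redirectDecoupled(uuid) {
  // Transaction 1: READ ONLY, so it cannot contend
  const doc = await client.query(
    q.Get(q.Match(q.Index('landing_page_redirects_search_by_uuid'), uuid))
  );
  const chosen = pickLeastHit(doc.data.value); // hypothetical JS helper

  // Transaction 2: WRITE ONLY, fired off without blocking the redirect.
  // Another request may read the same counts in between; that gap is the
  // "eventually correct" trade-off described above.
  client
    .query(q.Create(q.Collection('hitLog'), { data: { key: uuid, variant: chosen } }))
    .catch(console.error);

  return chosen;
}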

EDIT:

To clarify the pickle: we're trying to figure out how to do this in a stateless way, which is limiting options. What I mean is that the redirects are being issued in a Lambda (Netlify Functions), and the lambda has no shared state from one call to the next.

If you had a service with some state, then the service could rotate through the known redirects and create a log when one is issued. Or you could create a queue service which is prefilled with redirects, and create a log as redirects are popped off the queue. You can run the aggregation/snapshot task to calculate the initial state for such a service. Both of these options, I think, would rely on a single entry point, which has its own scalability issues, but would get you WAAAY past the ~10 requests per second where contention would otherwise kick in.
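A sketch of the rotating-service idea, assuming the variant list is loaded once at startup (e.g. from the aggregation/snapshot query):

// in-memory state of the long-lived service
let cursor = 0;

function nextRedirect(variants) {
  // round-robin: perfectly even by construction, no per-request reads
  const chosen = variants[cursor % variants.length];
  cursor += 1;
  // a hitLog entry could still be written asynchronously, as sketched above
  return chosen;
}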

I imagine, though, that the whole reason you are working on this redirection scheme is to leverage the lambdas available in Netlify, so I don't know how helpful this will be.

Hi @madsem! How is this going? Is this helping or do you have any additional questions?
