How to update post count and vote count counters

I wanted your opinion on the best way to increment the post-count and vote-count counters for users and posts.

The only way I could think of was to move post and vote document creation into a UDF, and then add logic there to update the counters on the user and post documents whenever a post is created or a vote is updated.

Is that the only way to do this?

Got the answer

function LikeFweet(fweetRef) {
  return Let(
    {
      account: Get(CurrentIdentity()),
      userRef: Select(['data', 'user'], Var('account')),
      fweetStatisticsRef: Match(Index('fweetstats_by_user_and_fweet'), Var('userRef'), fweetRef),
      fweet: Get(fweetRef),
      authorRef: Select(['data', 'author'], Var('fweet')),
      followerStatisticsRef: Match(Index('followerstats_by_author_and_follower'), Var('authorRef'), Var('userRef')),
      newLikeStatus: If(
        Exists(Var('fweetStatisticsRef')),
        Not(Select(['data', 'like'], Get(Var('fweetStatisticsRef')))),
        true
      ),
      popularityGain: If(Var('newLikeStatus'), 1, -1)
    },
    Do(
      // Update the fweet so we have an idea of the total likes
      If(
        Var('newLikeStatus'),
        Update(fweetRef, {
          data: {
            likes: Add(Select(['data', 'likes'], Var('fweet')), 1)
          }
        }),
        Update(fweetRef, {
          data: {
            likes: Subtract(Select(['data', 'likes'], Var('fweet')), 1)
          }
        })
      ),
      // Update fweet stats so we know who liked what
      If(
        Exists(Var('fweetStatisticsRef')),
        // Getting the same element twice has no impact on reads, the query will only get it once.
        Update(Select(['ref'], Get(Var('fweetStatisticsRef'))), {
          data: {
            like: Var('newLikeStatus')
          }
        }),
        Create(Collection('fweetstats'), {
          data: {
            user: Var('userRef'),
            fweet: fweetRef,
            like: Var('newLikeStatus'),
            refweet: false,
            comment: false,
            created: Now()
          }
        })
      ),
      // Update the follower stats so we can raise his popularity, this
      // has an impact on the feed that the user will see, every post he likes or retweets from an author he follows will
      // raise the popularity of the author for this particular user (this is kept in the followerstats collection)
      // return the new fweet with its stats.
      If(
        Exists(Var('followerStatisticsRef')),
        Update(Select(['ref'], Get(Var('followerStatisticsRef'))), {
          data: {
            postlikes: Add(Select(['data', 'postlikes'], Get(Var('followerStatisticsRef'))), Var('popularityGain'))
          }
        }),
        // We don't keep stats for people we don't follow (we could but opted not to)
        true
      ),
      GetFweetsWithUsersMapGetGeneric([fweetRef])
    )
  )
}

Hi @rahulrakesh.

We want to highlight an important caveat about how the fwitter example performs updates to statistics fields, such as the number of likes, and therefore about how you might implement your own application.

This is an anti-pattern: it is not scalable, and will quickly run into write-contention errors.

In such a query, you are both reading and writing the “fweet” Document. That means that any other query that writes to the same Document (or even reads that Document while writing to something else) can cause contention with this one. Contention can easily happen with only a handful of transactions per second, which is clearly not scalable.

All requests in Fauna are transactions. Fauna’s strict serialization rules ensure that transactions are handled in the correct order, but it also limits the rate at which concurrent updates can be made. Here’s a recent discussion about how multiple requests to update the same document can lead to contention errors. Transaction was aborted due to detection of concurrent modification - #11 by ptpaterson
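The schema redesign described in the replies below is the real fix, but while a low-volume application still uses this pattern, a client-side retry with exponential backoff can soften transient contention errors. Here is a minimal, driver-agnostic sketch in plain JavaScript; the `isContention` predicate is a placeholder (an assumption, not a Fauna API), which you would adapt to however your driver reports contention (for example, by inspecting the error's HTTP status code):

```javascript
// Sketch: generic retry helper with exponential backoff for contended
// transactions. `isContention` is a placeholder predicate -- adapt it to
// how your driver surfaces contention errors (e.g. an HTTP 409 status).
async function withRetry(fn, { retries = 3, baseDelayMs = 50, isContention = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isContention(err)) throw err;
      // Back off exponentially before retrying: 50ms, 100ms, 200ms, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```

You would wrap the contended call, e.g. `withRetry(() => client.query(LikeFweet(ref)))`. Note that retries only mask the symptom; under sustained load the insert-only design below is what actually removes the contention.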

High-volume, concurrent updates to a single record classically lead to high contention in all databases. Fortunately, there are schema design techniques that avoid this problem. Rather than actively incrementing a “metrics” document or field as part of a mutation, the safest route is to aggregate in “chunks” outside of your document, thus avoiding contended writes.

This comes up a lot, so we are pulling together a more complete example for how to do this at scale. I wanted to be sure to update your post now, to give a word of warning. We’ll update this ticket with additional information when we can.


Paul’s advice is very good. This pattern will lead to contention when the volume of writes increases. In cases where our users need to perform aggregations or counts on high-volume workloads, we recommend the following schema design pattern, which fully avoids write contention. This schema design is applicable to all databases, and is based on two proven design patterns, Event Sourcing and Command Query Responsibility Segregation (CQRS), developed for use in high-performance banking applications.

This schema design:

  • Fully avoids contention caused by rapid writes to a single record.
  • Is a “write-once, read-many” pattern that avoids updates to indexes (and in Fauna’s case, unbounded history growth).

Rather than incrementing a User’s “votes” field repeatedly (causing contention), your application should instead record each individual vote event as its own document in a separate collection, which we’ll call the voteLog collection. This avoids write contention because we are only ever inserting new documents, nothing more. Here’s an example of what such a document would look like:

{
  ref: Ref(Collection("voteLog"), 1),
  ts: 1630107062470000,
  data: {
    user: Ref(Collection("Users"), 1),
    votes: 5
  }
}

In this example, this individual document records the votes earned by a User at a particular moment. In this document, the User has scored 5 votes. If in the next moment the user earns another 7 votes, we’d record those votes in a new document, where the votes field has a value of 7. Each document inserted into the database records the number of votes a user earns in each “round of play” in the contest. These documents can be thought of as single entries in an event log. We can determine the total votes the user has earned any time we wish by simply reading through all the documents we’ve inserted into the voteLog, and sum the votes we see in each document. Consider this stylized representation of the User’s points we’ve recorded in the voteLog collection.

In this illustration, a “scorekeeper” process is responsible for recording the User1’s score to the voteLog collection as a series of documents, each of which record the votes earned at each round of play. User1’s total score can be calculated at any time during the game by reading through the log and summing the points earned in each document. In this example, User1 has earned a total of 26 points.
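The read side of this log can be sketched in plain JavaScript. This is an in-memory simulation of reading through the voteLog, not a Fauna query; the per-round values are illustrative, chosen to sum to the 26 points in the example above:

```javascript
// Sketch: computing one user's total from an insert-only event log.
// Each entry mirrors a voteLog document's data field; the round values
// here are illustrative and sum to 26, as in the example above.
const voteLog = [
  { user: 'User1', votes: 5 },
  { user: 'User1', votes: 7 },
  { user: 'User1', votes: 6 },
  { user: 'User1', votes: 8 },
];

function totalVotes(log, user) {
  // Read through every event for this user and sum the votes field.
  return log
    .filter((entry) => entry.user === user)
    .reduce((sum, entry) => sum + entry.votes, 0);
}

console.log(totalVotes(voteLog, 'User1')); // 26
```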

If we are running a multiplayer contest, the votes of many users may be interleaved together in the voteLog collection. We can determine the current vote count for all users by reading through the log, grouping the entries by user, and summing the votes within each group.

In this example, User1 has earned 9 total votes, User2 has earned 15 total votes, and User3 has earned 19 votes. We can determine any user’s score at any time we wish simply by reading through the log and summing their points.
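The multi-user tally can be sketched the same way, as an in-memory simulation in plain JavaScript (not a Fauna query); the interleaved values below are illustrative, chosen to match the 9/15/19 totals in the example:

```javascript
// Sketch: grouping an interleaved insert-only log by user and summing
// each user's votes. Values are illustrative, matching the example
// totals of User1 = 9, User2 = 15, User3 = 19.
const interleavedLog = [
  { user: 'User1', votes: 4 },
  { user: 'User2', votes: 7 },
  { user: 'User3', votes: 10 },
  { user: 'User1', votes: 5 },
  { user: 'User2', votes: 8 },
  { user: 'User3', votes: 9 },
];

function tallyByUser(log) {
  const totals = {};
  for (const { user, votes } of log) {
    // Accumulate each user's votes as we read through the log.
    totals[user] = (totals[user] || 0) + votes;
  }
  return totals;
}

console.log(tallyByUser(interleavedLog)); // { User1: 9, User2: 15, User3: 19 }
```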

The data recorded in the voteLog collection contains all the data needed to present a leaderboard to users and spectators. All that is needed to determine the position on the leaderboard is to simply sort the users by the scores we calculated in the previous step.

This example illustrates the “write-once, read-many” strategy we are using here. The only writes we make to the database are inserts; documents are never changed after their initial insertion. There is no possibility of database contention, and therefore the write throughput we can achieve is extremely high.

However, you may already have detected a problem with the read workload as described so far. As the voteLog collection grows, the number of documents we must read through to get each user’s score and rank in the leaderboard grows with it. At some point, reading through the entire log becomes prohibitively expensive in both time and operations. There’s a better way:

Recommendation: Implement another process that reads through the records (the first run will have to read through all of them at once, but subsequent runs won’t) and stores the aggregations as a set of incremental snapshots. Each snapshot stores the aggregate values at a specific point in time. The data in a snapshot would look like this:

{
  "ref": Ref(Collection("snapshots"), "2"),
  "ts": 1630107063800010,
  "scores": [
    { "User": Ref(Collection("Users"), 1), "totalVotes": 1163 },
    { "User": Ref(Collection("Users"), 2), "totalVotes": 897 },
    { "User": Ref(Collection("Users"), 3), "totalVotes": 1307 }
  ]
}
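The snapshot generator itself is just a fold over the events recorded since the previous snapshot. A minimal sketch in plain JavaScript (an in-memory simulation, not a Fauna query; the `totals` field name is illustrative):

```javascript
// Sketch: producing an incremental snapshot by starting from the
// previous snapshot's totals and folding in only the new events.
// Field names are illustrative, not a prescribed schema.
function createSnapshot(previousSnapshot, newEvents) {
  // Start from the previous snapshot's totals (or empty on the first run,
  // which must fold over the entire log).
  const totals = { ...(previousSnapshot ? previousSnapshot.totals : {}) };
  for (const { user, votes } of newEvents) {
    totals[user] = (totals[user] || 0) + votes;
  }
  return { ts: Date.now(), totals };
}
```

Because each run only reads the events inserted since the last snapshot, the cost of snapshot generation stays roughly constant even as the full log grows without bound.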

Any time a consistent, current view of each user’s total vote tally is required, the application first queries for the latest snapshot, and then reads all voteLog documents that were inserted after that snapshot was created. One suggestion is to schedule a cron job (or equivalent) to create new snapshots at a set interval.

In this illustration, Snap1 and Snap2 store the total scores for each user at the time each snapshot was created. To determine each player’s current vote tally and position in the leaderboard, the Leaderboard Generator simply loads the data stored in the last snapshot (in this case, Snap2) and replays the newest events that have been recorded since the snapshot was created. The Leaderboard Generator then has each user’s score in memory, and performs an in-memory sort to determine the current position/ranking of each player.
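The Leaderboard Generator's read path, snapshot plus replay plus sort, can be sketched in plain JavaScript (again an in-memory simulation, not a Fauna query; field names are illustrative):

```javascript
// Sketch: computing a leaderboard from the latest snapshot plus the
// events recorded since that snapshot was created.
function leaderboard(snapshot, eventsSinceSnapshot) {
  // Replay only the newest events on top of the snapshot's totals.
  const totals = { ...snapshot.totals };
  for (const { user, votes } of eventsSinceSnapshot) {
    totals[user] = (totals[user] || 0) + votes;
  }
  // In-memory sort, descending by total, to get leaderboard positions.
  return Object.entries(totals)
    .map(([user, total]) => ({ user, total }))
    .sort((a, b) => b.total - a.total);
}
```

For example, a snapshot of `{ User1: 20, User2: 25 }` replayed with one new 10-vote event for User1 yields User1 first with 30 and User2 second with 25.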

We recommend that snapshots be persisted in their own collection, separate from the voteLog collection.

Bottom line: Since the Workers don’t have to replay the entire log, getting the latest aggregate is much faster.

Summary

This solution comprises 3 separate functions that are responsible for executing their own, very specific database task:

  • One Function/Worker that inserts new logs and does nothing else
  • One Function/Worker that runs on a cron task and only generates snapshots, nothing more
  • One Function/Worker that gets the latest state by loading the latest snap and reading the newest logs.

Additional Benefits of this Pattern:

A great aspect of CQRS is that you can add new functionality without risk of introducing bugs in existing code. Consider the case where you want to aggregate on a new field. Say you want to track the rate at which votes are being cast. Rather than modifying the code of your existing snapshot generator or leaderboard generator, you simply add additional functions/workers that perform this specific aggregation. TLDR: Don’t modify any existing functions. It’s safer to implement new aggregation functionality in new Functions/Workers, as they won’t have an effect on any existing Functions/Workers currently running.
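As a concrete illustration of adding a new aggregation without touching existing code, here is a hypothetical vote-rate reader sketched in plain JavaScript. It is an independent consumer of the same log; the `ts` field mirrors Fauna's microsecond timestamps, and the function name and window parameters are illustrative, not part of the original design:

```javascript
// Sketch: a new, independent aggregation over the same insert-only log.
// It computes votes per second within a time window, without modifying
// the snapshot or leaderboard code. Timestamps are in microseconds,
// mirroring Fauna's ts field.
function voteRate(log, windowStartTs, windowEndTs) {
  const inWindow = log.filter((e) => e.ts >= windowStartTs && e.ts <= windowEndTs);
  const total = inWindow.reduce((sum, e) => sum + e.votes, 0);
  const seconds = (windowEndTs - windowStartTs) / 1e6; // microseconds -> seconds
  return total / seconds;
}
```

Because the log is append-only and this reader never writes to documents the other workers touch, deploying it carries no risk to the existing snapshot or leaderboard workers.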


Thanks for the detailed responses, you both. :nerd_face: Is there a way to run a scheduled cron task on Fauna natively, or will I have to trigger it from outside, like a scheduled Lambda trigger on AWS?

I am anyway planning to use Firebase Auth with Firebase functions or AWS API Gateway/Lambda for user auth. Will start work on that in January.

After thinking a bit more on this, implementing it purely on Fauna with scalability in mind can get a little complicated. It would be far more scalable and resilient if we used AWS API Gateway, Lambda, and SQS.

The idea being that we can offload the counter transactions into a queue, where they can be processed at a configurable rate without running into contention issues. We can still survive with single documents as in the Fwitter example.

I think I will go with the Fwitter approach for now for the alpha stage of the application and do a rewrite with AWS services before the beta stage.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.