Paul’s advice is very good. This pattern will lead to contention when the volume of write increases. In cases where our users need to perform aggregations or counts on high volume workloads, we recommend use of the following a schema design pattern that fully avoids write contention. This schema design is applicable to all databases, and is based on two proven design patterns, called Event Sourcing and Command Query Responsibility Segregation, developed for use in high performance banking applications.
This schema design:
- Fully avoids contention caused by rapid writes to a single record.
- Is a “write-once, read-many” pattern that avoids updates to indexes (and in Fauna’s case, unbounded history growth).
Rather than incrementing a User’s “votes” field repeatedly (causing contention), your application should instead individual vote events as a single document in a separate collection, which we’ll call the voteLog collection. This avoids write contention as we’re only inserting new records all the time and nothing more. Here’s an example of what such a document would look like:
{
ref: Ref(Collection("voteLog"), 1 ),
ts: 1630107062470000,
data: {
user: Ref(Collection("Users"), 1 ),
votes: 5
}
}
In this example, this individual document records the votes earned by a User at a particular moment. In this document, the User has scored 5 votes. If in the next moment the user earns another 7 votes, we’d record those votes in a new document, where the votes field has a value of 7. Each document inserted into the database records the number of votes a user earns in each “round of play” in the contest. These documents can be thought of as single entries in an event log. We can determine the total votes the user has earned any time we wish by simply reading through all the documents we’ve inserted into the voteLog, and sum the votes we see in each document. Consider this stylized representation of the User’s points we’ve recorded in the voteLog collection.
In this illustration, a “scorekeeper” process is responsible for recording the User1’s score to the voteLog collection as a series of documents, each of which record the votes earned at each round of play. User1’s total score can be calculated at any time during the game by reading through the log and summing the points earned in each document. In this example, User1 has earned a total of 26 points.
If we are running a multiplayer contest, the votes of many users may be interleaved together in the voteLog collection. We can determine the current vote count for all user by simply reading through the log and summing the points and grouping them by user.
In this example, User1 has earned 9 total votes, User2 has earned 15 total votes, and User3 has earned 19 votes. In this example, we can determine what any user’s score is at any time we wish by simply reading through the log and summing points.
The data recorded in the voteLog collection contains all the data needed to present a leaderboard to users and spectators. All that is needed to determine the position on the leaderboard is to simply sort the users by the scores we calculated in the previous step.
This example illustrates the “write-once, read-many” strategy we are using here. The only type of writes we do in the database are inserts, documents are never changed after their initial insertion, so there is no possibility for database contention, and therefore the write throughput we can achieve is extremely high.
However, you may already detect a problem with the read workload as it has been explained up to this point. If the number of documents in the voteLog collection becomes bigger and bigger, the number of documents we must read through to get each user’s score and rank in the leaderboard becomes bigger and bigger. At some point, the number of documents in the log becomes prohibitively expensive to read through in both time and operations. There’s a better way:
Recommendation: Implement another process that reads through the records (the first job will have to read through all at once but subsequent ones won’t have to), and stores the aggregations as a set of incremental snapshots. Each snapshot stores the aggregate value at specific, incremental points in time. An example of the data in a snapshot would look like this:
{
"ref": Ref(Collection("snapshots"), "2"),
"ts": 1630107063800010,
"scores": [
{
"User": Ref(Collection("Users"), 1),
"totalVotes": 1163
},
{
"User": Ref(Collection("Users"), 2),
"totalVotes": 897
},
{
"User": Ref(Collection("Users"), 3),
"totalVotes": 1307
},
]
}
Any time a consistent, current view of each user’s total vote tally is required, it first queries for the latest snapshot, and then reads all voteLogs that have occurred since that snapshot was made. One suggestion is to schedule a cron job (or equivalent) to create new snapshots on a set interval
In this illustration, Snap1 and Snap2 store the total scores for each user, at the time the snapshot was created. To determine each player’s current vote tally and position in the leaderboard, the Leaderboard Generator simply loads the data stored in the last snapshot (in this case, Snap2 ) and replay the newest events that have been recorded since the snapshot was created. The Leaderboard Generator process will have each user’s score in memory, and then it needs to perform an in-memory sort to determine the current position / ranking of each player.
We recommend that snapshots be persisted in their own collection, separate from the voteLog collection.
Bottom line: Since the Workers don’t have to replay the entire log, getting the latest aggregate is much faster.
Summary
This solution comprises 3 separate functions that are responsible for executing their own, very specific database task:
- One Function/Worker that inserts new logs and does nothing else
- One Function/Worker that runs on a cron task and only generates snapshots, nothing more
- One Function/Worker that gets the latest state by loading the latest snap and reading the newest logs.
Additional Benefits of this Pattern:
A great aspect of CQRS is that you can add new functionality without risk of introducing bugs to existing code. Consider the case that you want to add aggregate on a new field. Say, you want to track the rate at which votes are being cast. Rather than modify the code of your existing snapshot generator or leaderboard generator, you simply add additional functions/workers that perform this specific aggregation. TLDR: Don’t modify any existing functions. It’s safer to implement new aggregation functionality in new Function/Workers, as they won’t have an effect on any existing Functions/Workers currently running.