Social media/Fwitter user view checker extension. Index on document exists/Indexing empty documents. Speeding up queries which check if a relationship exists

Hello @Hunter_K!

Definitely! Difference is great for small, bounded Sets. It has to fully read each of the input Sets to execute, so it can indeed become a problem for Sets that keep growing as you scale. So let’s dig into other ways we might accomplish your search.

First, I want to mention a scalability problem in the Fwitter example, because I think reviewing it will also add perspective to your question. The issue is with how it manages stats on the Fweet: each action a user takes updates the fweet and fweetstats documents. This is fine when there is only an occasional impression on a single fweet. However, once you have spikes of even a handful of requests trying to modify the same document, you will run into aborted transactions due to contention.

The typical solution is not easily covered at the same time as demonstrating all of the other sophisticated things you can do with Indexes in Fauna, which the fwitter example focuses on. I am referring to the “Event Sourcing Pattern” which we describe in detail here:

The TL;DR is that you cannot frequently write to a single document from multiple sources. Instead, the various sources should write to a kind of log Collection, while a background process reads the logs and acts as the single writer of the aggregated values.
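To make the shape of that pattern concrete, here is a minimal in-memory sketch in plain Python rather than FQL. It is only an illustration: the names `event_log`, `fweet_stats`, `record_event` and `aggregate` are hypothetical, and a list and a dict stand in for the log Collection and the stats documents.

```python
from collections import Counter

# Stand-in for an append-only log Collection: every writer only appends,
# so no two writers ever modify the same document.
event_log = []

# Stand-in for the aggregated stats documents, keyed by fweet id.
fweet_stats = {}


def record_event(fweet_id, kind):
    """Called by any number of sources; appends one log entry."""
    event_log.append({"fweet": fweet_id, "kind": kind})


def aggregate():
    """Run by a single background worker: drain the log, fold it into stats.

    Returns the number of log entries that were processed.
    """
    pending = event_log[:]
    del event_log[:len(pending)]
    for event in pending:
        stats = fweet_stats.setdefault(event["fweet"], Counter())
        stats[event["kind"]] += 1
    return len(pending)
```

The point of the split is that `record_event` never touches a shared stats document, so bursts of views or likes cannot contend with each other; only the single `aggregate` worker writes the totals.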

So what does this have to do with filtering tweets already seen by a user? I have a few thoughts:

  • Doing pure Index operations (bindings plus things like Difference) on unbounded data is not possible in a scalable, efficient way. It is better to narrow your search as much as possible and then do some simple/fast computation on those results.
  • You can use the event sourcing pattern to update a single document that caches a list of things to filter (already viewed, blocked users, etc.). Then you can use that list to filter results more efficiently than fetching the list with an index every time.
  • Use some cheap monotonic metric (e.g. an always-increasing number) that tells you the user has not seen a fweet, or at least makes that more likely. For example, if you only want to show fweets with a rank higher than the most recent fweet in the feed, you can pre-filter the new fweets (and those in the deny/filter-list) by rank first to narrow down the list of possible ones to show.
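
As a rough illustration of how the last two ideas combine, here is a small Python sketch (again not FQL). Everything in it is an assumption made for the example: the `Fweet` shape, the `rank` field, and the `seen_ids` and `blocked_authors` sets, which stand in for the cached filter document.

```python
from typing import NamedTuple


class Fweet(NamedTuple):
    id: str
    author: str
    rank: int  # assumed to only ever increase, e.g. a creation timestamp


def unseen_fweets(candidates, last_seen_rank, seen_ids, blocked_authors):
    """Return candidates newer than last_seen_rank that pass the cached filters."""
    # Step 1: cheap pre-filter on the monotonic rank. In a database this
    # would typically be a range read on an index rather than a full scan.
    newer = [f for f in candidates if f.rank > last_seen_rank]
    # Step 2: simple set lookups against the cached filter lists, applied
    # only to the already-narrowed results.
    return [
        f for f in newer
        if f.id not in seen_ids and f.author not in blocked_authors
    ]
```

For example, with fweets ranked 1, 5, 7 and 9 and `last_seen_rank=4`, only the last three are ever compared against the filter sets, so the expensive part of the work scales with what is new and not with the user's whole history.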

That’s not a concrete solution for you, but I hope it helps guide you in the right direction!

