Using Fauna to build a notification feed (mass insert, multiple updates, order, limit...)

As part of research in moving part of our traffic to Fauna, I am looking into our notification feed.
Just to make sure we are on the same page: a notification feed is the screen in many apps that shows a list of the top X events that happened, with a seen/unseen indicator on each one.

Our current implementation using Postgres is a table like this one:

[Screenshot of the current notification_feed table, with columns row_id, user_id, notification_id, and is_seen]

The notification_id column is a foreign key to a table containing the text and other info about the notification. So, for example, to display the notification feed for user_id 19456, we select from the table where user_id=19456, order by row_id desc, limit 15, and join with the notifications table on notification_id.

The notification feed supports these main features:

  • GetFeed(userId) - Gets the top 15 notifications for a specific user id. To start with, we support fetching only the last 15. Fetching more won’t be possible.

  • GetCountUnSeen(userId) - Gets the number of unseen notifications, counted across all the notifications, not just the top 15.

  • SetSeen(userId, timestamp) - Sets is_seen=true for all the rows of the userId that are older than the given timestamp.

  • FanoutNotification(notificationId, [userIds]) - Inserts new rows of (notificationId, userId) for each user in the array. The userIds array can be large, for example 400,000 entries.

Can those operations be built with Fauna and perform well? GetFeed() should return in around 500 ms since it’s called from the UI. FanoutNotification() can be slower; it’s no problem if it takes 10 minutes to fan out a notification.

Any help on the subject will be much appreciated.

Seems to me that this is a great use case for FaunaDB: joins + OLTP-style fast updates + big (or slightly bigger) data.

Make an index with values to sort on and terms to filter on the user id.

Make sure you also put the notification ref in the values of the aforementioned index, then Map() over the results of that index and use Get() to fetch the notifications. You could also just put the ‘row_id’ (which would simply be the ref of the document in FaunaDB) in the index values, Get() that to fetch the whole document, then Select() the notification reference and Get() that (this will be slightly pricier). Tip: use Let() to structure your query (see the examples here) for such joins; you can easily do complex joins in one query.
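Sketching that in FQL (untested; the index name ‘feed_by_user’ and the ‘createdAt’ field are assumptions based on this thread), a GetFeed(userId) join could look like:

    Map(
      Paginate(
        Match(Index('feed_by_user'), Ref(Collection('users'), '19456')),
        { size: 15 }
      ),
      Lambda(
        ['createdAt', 'feedRef'],
        Let(
          { feedDoc: Get(Var('feedRef')) },
          {
            feed: Var('feedDoc'),
            notification: Get(Select(['data', 'notificationId'], Var('feedDoc')))
          }
        )
      )
    )

This assumes an index with terms on the user ref and values [createdAt (reverse: true), ref], so the first page of 15 is already the newest-first feed.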

Make an index on the terms ‘userId’ and ‘unseen’, and use Count() on it.
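For example (untested sketch; ‘unseen_by_user’ is a hypothetical index with terms on the user ref and the isSeen field):

    Count(
      Match(Index('unseen_by_user'), Ref(Collection('users'), '19456'), false)
    )

This counts all unseen rows for the user, not just the top 15.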

Complex transactions are also possible in FaunaDB. One thing you might run into is pagination: if the number of ‘seen’ rows to update is greater than 100,000 (the biggest page size), you’ll have to check whether there is an after cursor and update subsequent pages. I don’t consider this a limitation personally; it’s generally a good idea not to let users run gigantic update statements, as this might block other transactions.
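A minimal sketch of a follow-up page (untested; the index name and cursor ref are placeholders): each page’s result contains an after cursor, which the client passes into the next Paginate() until no cursor is returned.

    Map(
      Paginate(
        Match(Index('notifications_by_user_and_seen'), 'someuser', false),
        { size: 100000, after: [Ref(Collection('notifications_feed'), '270127554066645511')] }
      ),
      Lambda(['notificationRef'], Update(Var('notificationRef'), { data: { isSeen: true } }))
    )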

Just take care of the request size limit, which is 1 MB (see here); you might have to break up a large array into a few requests.
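For FanoutNotification(), one request per chunk could look like this (untested sketch; the small batch below stands in for one client-side slice of the full userIds array, and the field names follow the document shape shown later in this thread):

    Let(
      { batch: ['3', '7', '42'] },  // one chunk of the 400,000 userIds, split client-side
      Foreach(
        Var('batch'),
        Lambda(
          'userId',
          Create(Collection('notifications_feed'), {
            data: {
              notificationId: Ref(Collection('notifications'), '112233'),
              userId: Ref(Collection('users'), Var('userId')),
              isSeen: false,
              createdAt: Now()
            }
          })
        )
      )
    )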

@databrecht thanks for the detailed answer! Regarding SetSeen(): if a user has 100,000 rows that belong to it in the notification_feed table (most of them already seen=true) and we call SetSeen(), won’t it incur 100,000 updates, when all we need is for the last couple of rows to be set to true?

Indexes are your friend (disclaimer: I didn’t test the queries since I have to leave my desk in about 5 minutes :slight_smile: ). Given that you have an index with two terms (user and seen), something in the vein of:

    Map(
      Paginate(Match(Index('notifications_by_user_and_seen'), 'someuser', false)),
      Lambda(['notificationRef'], Update(Var('notificationRef'), { data: { isSeen: true } }))
    )

You could also update the info that is linked to the notification by first mapping over the notification refs, getting the documents, and updating whatever you want. It’s much like how you would do it in a regular programming language, since FQL is procedural, not declarative. FQL is literally made for these kinds of more complex transactions (hence why it is procedural). Forget the idea that a database, because it’s scalable, has to be very limited in the way you query; FaunaDB’s FQL is quite powerful.

If you only want those within a certain range (e.g. the time the notification came in), add the creation time to the values of your index and wrap Range() around the Match() (values are used for three things: range queries, sorting, and the index’s result values).

    Map(
      Paginate(
        Range(
          Match(Index('notifications_by_user_and_seen'), 'someuser', false),
          Time(...), Time(...)
        )
      ),
      Lambda(['createdAt', 'notificationRef'], Update(Var('notificationRef'), { data: { isSeen: true } }))
    )

(Note that the Lambda now takes two arguments, since the creation time was added to the index values.)

Of course, I can’t write it for you :upside_down_face: so I’ll only explain it in pseudo FQL code for now (I might have missed a bracket here and there). If you do get blocked on how to do something, let us know.

So you probably want to read up on these specifically:

  • Indexes
  • Match/Paginate
  • Range
  • Map/Lambda + Get (Let to structure)
  • Update

to get started :slight_smile:


I’ll try it. Thanks!

@databrecht I am having a hard time figuring out how to create the index for GetFeed(userId). I need an index that lets me get the last 15 notifications for a user, ordered by their creation time descending.
So “terms” should include the userId, but how should I choose “values” so that I can sort by creation date descending? Should I add a new field “createdAt” to the document? (The existing ts property can change when updating a row, and I would not want to sort on it.)

Just for reference, this is how a document from the notifications_feed collection looks:

  {
    "ref": Ref(Collection("notifications_feed"), "270127554066645511"),
    "ts": 1593872560470000,
    "data": {
      "notificationId": Ref(Collection("notifications"), "112233"),
      "userId": Ref(Collection("users"), "3"),
      "isSeen": false,
      "isClicked": false,
      "isClickedExternal": false
    }
  }

Yes, if you want to order by creation time you have to add a createdAt field that you set to Now(), since ts is effectively the last-updated time.
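A possible index definition for that (untested sketch; the name ‘feed_by_user’ is a placeholder):

    CreateIndex({
      name: 'feed_by_user',
      source: Collection('notifications_feed'),
      terms: [{ field: ['data', 'userId'] }],
      values: [
        { field: ['data', 'createdAt'], reverse: true },  // newest first
        { field: ['ref'] }
      ]
    })

With reverse: true on createdAt, Paginate(Match(...), { size: 15 }) returns the 15 newest entries directly.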