I’m a big fan of Fauna. However, I haven’t been fully comfortable relying on it as an app’s primary database.
I’ve experienced rare but repeated instances where transactions have failed because of a service outage. A 15-minute outage can cause hard-to-fix problems, particularly when later events rely on the transactions that failed during the outage.
If I were to rely on Fauna as my primary database going forward, what would be the best thing to do in cases like this (short of taking down my entire service)?
I already wrap write queries in a retry function to work around “socket hang up” errors, potentially related to node-fetch. So, a solution I’ve been mulling over is to extend this so that transactions that have failed X times get sent somewhere (not Fauna) to be stored, so I can later insert them when Fauna is back up.
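The retry-then-park idea could be sketched roughly like this. Everything here is hypothetical (the `retryWrite` and `deadLetters` names are made up, and the dead-letter store would need to be something durable rather than an in-memory array), not part of any Fauna client API:

```javascript
// Sketch only: retry a write, and once retries are exhausted park the
// failed intent in a "dead letter" store outside Fauna for later replay.
const deadLetters = [];

async function retryWrite(runQuery, { retries = 3, delayMs = 100 } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await runQuery();
    } catch (err) {
      if (attempt === retries) {
        // Out of retries: persist the intent somewhere that is NOT Fauna,
        // so it can be replayed once the service recovers.
        deadLetters.push({ failedAt: Date.now(), error: String(err) });
        throw err;
      }
      // Simple linear backoff between attempts.
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
}
```

A replay job would then read `deadLetters` (or its durable equivalent) and re-run the stored writes when the status page is green again.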
Hey Ewan. No, just the ones that correlate with the outages on the status page. Last month’s outage caused a few problems, so I’m just thinking about how I could minimise any side effects of similar events in the future.
You could address those issues by treating writes as eventually applied rather than applied in real time.
Basically, what you describe isn’t Fauna-specific; any online service has such issues, AWS included.
What people usually do is to use an event queue instead of hitting services (api, etc.) directly. Everything goes to the queue and services consume the queue.
This way, in the case of an outage, the queue acts as the source of truth for every service.
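To make the pattern concrete, here’s a minimal in-memory sketch of that idea. In production the queue would be a real broker (SQS, Kafka, etc.) and the names here (`WriteQueue`, `applyWrite`, `drain`) are all hypothetical:

```javascript
// Sketch only: an in-memory event queue. Services enqueue write intents
// instead of hitting the database directly; a consumer drains the queue
// and applies each write, keeping anything that still fails for a later pass.
class WriteQueue {
  constructor(applyWrite) {
    this.applyWrite = applyWrite; // e.g. a function that runs the Fauna query
    this.pending = [];
  }

  enqueue(event) {
    // The queue, not the database, holds the source of truth for this write.
    this.pending.push(event);
  }

  // Attempt every pending write; failed events stay queued for the next drain.
  async drain() {
    const remaining = [];
    for (const event of this.pending) {
      try {
        await this.applyWrite(event);
      } catch {
        remaining.push(event);
      }
    }
    this.pending = remaining;
  }
}
```

During an outage the consumer’s drain pass simply fails and the events sit in the queue; once the database is back, the next drain applies them in order, which is exactly the “eventually applied” behaviour described above.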
This is more of a design (architectural) problem; it’s quite common with serverless architectures.
I don’t have any hands-on experience implementing a service queue myself.
I know AWS SQS (Amazon Simple Queue Service) is famous for this. I’m not aware of anything tied to FaunaDB itself; these queues are usually pretty agnostic. SQS is probably a good choice, but you’ll need to dig in deeper and see for yourself whether it fits your needs.