Hi everyone,
I am looking for advice to see if my architectural decisions could be improved.
I have an application with several customers. I want to give each customer an isolated child-database, due to the benefits of true data isolation, better performance and less complex queries that child-databases provide vs. having all the customers in the same DB.
Here’s a simplified version of my planned architecture:
In the users_collection under the MainDatabase I am planning to store each customer’s login credentials, current tier details and a reference to a child database (referenced via an admin key). So each child-database is tied to a user document in the users_collection. I need the users_collection in the MainDatabase in order to share the same UI layer with all the customers.
When a user logs in, I would return the associated admin key instead of the authentication token retrieved by the Login function.
What’s the recommended way to handle authentication in this scenario? Is storing the child-databases’ admin keys alongside the users_collection data the only way? Is there a better way or a better strategy?
I just wanted to point out that in terms of performance, the difference might be negligible. A collection scales and should not become slower as the amount of data in it increases. The other two reasons could be good motivators. The data separation is indeed very clean, but there are also limitations to this approach:
You need to deal with two secrets: a token for the master user database and a key for the child database. That could also be perceived as an advantage in some scenarios.
You won’t have transactions/joins over multiple databases.
You can’t take advantage of Identity() in ABAC since you can’t transfer the Identity() of your user token to another database.
If any of these are a requirement, it might be easier to link your documents to a user reference and keep them in one database. You could also opt to put each ‘customer’ in a different collection and programmatically determine which collection the query is going to address. That is a less clean separation, but you could do transactions over these collections. If none of them are a requirement, multiple databases would be a fine choice.
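To make the collection-per-customer idea a bit more concrete, here is a minimal sketch of how a backend could derive the collection name from the customer. It is plain Python and does not call FaunaDB at all; the `collection_for` helper, the `ALLOWED_KINDS` set and the `kind__customer` naming scheme are assumptions made up for illustration.

```python
import re

# Hypothetical set of document kinds the backend knows about (an assumption).
ALLOWED_KINDS = {"orders", "invoices"}

# Customer ids are restricted to letters and digits so that the "__"
# separator below can never appear inside an id.
_CUSTOMER_ID = re.compile(r"[A-Za-z0-9]+")


def collection_for(customer_id: str, kind: str) -> str:
    """Return the name of the collection holding `kind` documents for one customer."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown document kind: {kind!r}")
    if not _CUSTOMER_ID.fullmatch(customer_id):
        raise ValueError(f"invalid customer id: {customer_id!r}")
    return f"{kind}__{customer_id}"


if __name__ == "__main__":
    print(collection_for("acme42", "orders"))
```

The important part is that the customer id comes from the authenticated session in the backend, never from a value the client can choose freely, otherwise the separation is only cosmetic.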
I can’t say there is one recommended way; each application is different. I did want to let you know that you do not necessarily have to store the actual key (not sure if that was the idea). You could just as well store the database references, which you can obtain with Database("your database"), and get a key on the fly in your backend via an Admin key on the parent database, and/or reclaim that key via Delete once you ‘log out’ that specific user. The disadvantage is that you need a powerful key in your backend; the advantage is that you don’t need to store keys again that FaunaDB already stores in a secure way. Storing them as plain text in a document is not ideal.
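The flow described above (store only the database reference, issue a key at login, reclaim it at logout) could be modelled roughly like this. This is an in-memory sketch in plain Python, not FaunaDB driver code: `KeyBroker`, `User`, `login` and `logout` are hypothetical names, and the broker only stands in for what the backend would do with its admin key on the parent database.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class KeyBroker:
    """In-memory stand-in for key management on the parent database (assumption)."""
    # Maps each issued secret to the child database it grants access to.
    _active: Dict[str, str] = field(default_factory=dict)

    def issue(self, database_name: str) -> str:
        """Create a fresh secret for one child database."""
        secret = secrets.token_urlsafe(32)
        self._active[secret] = database_name
        return secret

    def revoke(self, secret: str) -> bool:
        """Delete a secret; returns False if it was not active."""
        return self._active.pop(secret, None) is not None

    def database_for(self, secret: str) -> Optional[str]:
        """Return the child database a secret is valid for, or None."""
        return self._active.get(secret)


@dataclass
class User:
    email: str
    database_name: str  # only the reference is stored, never a key


def login(broker: KeyBroker, user: User) -> str:
    # Credentials are assumed to have been verified before this point.
    return broker.issue(user.database_name)


def logout(broker: KeyBroker, secret: str) -> None:
    broker.revoke(secret)
```

With this shape, the user document never contains a secret, and every session gets its own short-lived key that disappears on logout.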
In terms of performance, I was referring to the extra computational work needed to check permissions on the fly and make sure each user is creating, writing and deleting only their own data. I am not sure how expensive this overhead is, but surely it is better if it is not present. Let me know if this shouldn’t be a concern.
Also, keeping all the customers in the same DB forces me to have:
A strong ABAC configuration (this could be hard to maintain as complexity grows)
A customerRef field on the documents to specify the owner of the data
Extra configuration on indexes as we need to specify the owner of the data here as well
It would be great to know how everyone else keeps this separation using a different strategy.
I’ve noticed that in my particular case I am using Identity() mostly to keep that data separation, so this won’t be an issue in my case (perhaps it is a win). However, I’ll consider the other limitations you mention (they could become a future problem for the business goals).
Great insight, thanks. I was planning to store the admin key in the users collection, so this recommendation is gonna help a lot.
Thanks for your detailed reply. I now have a better understanding of the trade-offs of this approach. I’ll be looking forward to the article.
This is absolutely true. You’ll have to be clever when you write your ABAC rules since you might end up writing a role that has a performance impact.
There is an alternative to the ABAC configuration solution, though, in case you’re afraid of making a mistake or are not sure about the performance implications. You could use User Defined Functions and use indexes to filter the data you need by user (which you could use Identity() for). Since the logic is encapsulated in a UDF, the user can’t get around it, even when calling these UDFs directly from the frontend, and the overhead should be minimal (since it is index-backed). You are also right that you would have to define an index and have this logic for each data type, which is indeed more work; it’s definitely a trade-off.
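As a rough illustration of why this works, here is a small in-memory model in plain Python. It is not FQL and not how FaunaDB implements UDFs: `Session` stands in for the logged-in identity, the `_by_owner` dict stands in for an index with the owner as its term, and `list_mine` plays the role of the UDF. All of these names are assumptions for the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Session:
    # Set by the login step; stands in for what Identity() would return.
    identity: str


class DocumentStore:
    def __init__(self) -> None:
        # Stand-in for an index keyed on the owner: owner -> documents.
        self._by_owner: Dict[str, List[dict]] = defaultdict(list)

    def add(self, owner: str, data: dict) -> None:
        # The owner field is written last so the payload cannot override it.
        self._by_owner[owner].append({**data, "owner": owner})

    def list_mine(self, session: Session) -> List[dict]:
        """Stand-in for the UDF: the caller cannot pass an owner, only a session."""
        # A single keyed lookup, so the cost does not depend on other customers' data.
        return [dict(doc) for doc in self._by_owner.get(session.identity, [])]
```

The point of the design is that the only entry point takes no owner argument, so a caller has nothing to tamper with; the filter always comes from the session.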
Note that UDFs are of course not necessary if all queries pass through the backend and you control those queries.