Add Source to existing index?

Good morning,

Upfront, I’m sorry if there is any obvious answer here that I failed to find.

The question: Given an existing index, is it possible to add more collections to it later?

What I found:
I know one could do a wildcard index, so by definition, this should work?

Target Scenario:
I have two collections with users. I have an index to query against both collections. In the future, I have a new type of user and add a new collection. Is it now possible to amend the index to pull data from the new collection as well?

  • Using wildcards would be inefficient as there are a ton of other collections as well in this DB.
  • Deleting and re-creating would be resource-intensive and cause the index to be unavailable during the cache-timeout…

Any help on the subject would be appreciated!

Interesting question. I think we currently don’t have a way to dynamically specify a number of collections. It’s either all collections, a fixed set of collections or one collection.
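For reference, the "fixed set of collections" case can be sketched in FQL v4 roughly as below. The collection names `customers` and `admins` and the index name `all_users` are made up for illustration; the only point is that `source` takes an array of collection refs, and that list is given when the index is created.

```fql
// Hypothetical names: "customers", "admins", "all_users".
// One index reading from a fixed set of two collections.
CreateIndex({
  name: "all_users",
  source: [Collection("customers"), Collection("admins")]
})
```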

If you specify the wildcard ‘_’ then, if I’m not mistaken, it takes the collections that were present at the time you created the index. This also makes sense: if a collection gets added later on, the index has to bulk update to take into account all documents from the new collection. Indexes are consistent from the moment they are online. Adding a collection would suddenly require them to ingest a big set of data, which would not be feasible while remaining consistent.

I can see the appeal though if you use multiple collections for … multi-tenancy I assume?


Hi Julien, I’d like to understand more about the new user type you are introducing into your schema. Given that Fauna has a flexible schema, you could place multiple types of user in the same collection, and not have to declare or alter your existing indexes. As I understand your use case, you already have two user types, which happen to be in separate collections. If you consider each of these user types to be related to one another, and if they share at least one common field, you could consider these user types as related sub-types of a parent user object. If you put all of your user types into the same collection, you’d be able to future-proof your schema design, and eliminate the need to change your index declaration each time you need to introduce a new user into your taxonomy. Would this work for you?


Hi Bryan!
Thank you so much for your insight on the subject! To clarify, I called it users in the example to make it slightly easier. To give you a better picture of the situation:

This powers our internal project management software. Each project has its own collection, and within each collection, we have many tasks and many sub-tasks. Tasks and sub-tasks are stored directly within the project (collection) and referenced by ref. Each task has a property of isOpen, which is a bool.

Now, what we wanted to do, was to be able to retrieve all the currently open tasks. The logical choice forward here was to create an index for each collection which would take isOpen as the term and then to run a join across all the indexes. However, this wasn’t ideal, as each time a new project is started, we have to create an index, and then amend the join.
Which is why being able to add a collection to an index would have been ideal :)
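For readers following along, the per-project setup described above might look roughly like this in FQL v4. The collection names `project_a` and `project_b` and the index names are placeholders, not the real ones; the sketch only shows why every new project means one more index and one more argument in the join.

```fql
// Placeholder names throughout. One index per project collection,
// each keyed on the task's isOpen flag.
CreateIndex({
  name: "project_a_tasks_by_isOpen",
  source: Collection("project_a"),
  terms: [{ field: ["data", "isOpen"] }]
})

CreateIndex({
  name: "project_b_tasks_by_isOpen",
  source: Collection("project_b"),
  terms: [{ field: ["data", "isOpen"] }]
})

// All open tasks: combine the per-project matches.
// A third project needs a third index and a third Match here.
Paginate(
  Union(
    Match(Index("project_a_tasks_by_isOpen"), true),
    Match(Index("project_b_tasks_by_isOpen"), true)
  )
)
```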

Thanks for the clarification. I think Bryan’s advice is still valid: placing those projects in one collection would make it much easier for you to work with them. Maybe I’m still missing something; is there a specific reason why it was modelled as different collections? Maybe a limitation you tried to work around, or a specific feature this brings?
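As a rough sketch of the single-collection model in FQL v4, assuming all tasks live in one hypothetical `tasks` collection and keep the `isOpen` field from the original design (the index name is also made up), a single index would then cover every current and future project:

```fql
// Assumed: a single "tasks" collection holding the tasks of all projects.
CreateIndex({
  name: "tasks_by_isOpen",
  source: Collection("tasks"),
  terms: [{ field: ["data", "isOpen"] }]
})

// Fetch the documents of all open tasks, whatever project they belong to.
Map(
  Paginate(Match(Index("tasks_by_isOpen"), true)),
  Lambda("taskRef", Get(Var("taskRef")))
)
```

Starting a new project then only means writing new task documents; no index has to be created or changed.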


Hi databrecht,

To be completely honest with you I am not sure as I wasn’t the one who made that decision. If I recall correctly, this was done just for clarity on the back end to keep projects separated and isolated, and so that if something went wrong with one project, it wouldn’t affect the rest of the projects? I think, again, not completely sure here.

I definitely see the appeal of putting everything into one collection; it would also drastically simplify the other indexes that each collection has to have for itself. The only other reasons I can think of for keeping them in separate collections are that data wouldn’t accidentally be leaked, or that if a project needed purging, its collection could simply be deleted.

Isolation is often the reason why Fauna users model like that, but it definitely is a trade-off. Purging a project is something I didn’t think of. It would indeed be incredibly easy like that and is definitely much harder when all documents are in one collection (in that case I would have one project document, soft-delete that and then have an async process clean up all related project documents).

I personally would advise you to move to the other model in case you often have new projects and need to index over all projects. However, you currently have this model so here are a few things you could do:

  • Create an index per collection and write your query as a union of these different index results. I expect it’ll cost more reads though, if I’m not mistaken.
  • For retrieval of data over multiple collections based on specific metadata of a project, have a separate collection with a unique document per project that contains metadata and refers to the specific collection that contains the actual project data (collections are documents, you can store a reference to a collection as well). Index over that, then retrieve the data of the actual collection with a map get.
  • The index being unavailable can be worked around by creating a new index, waiting for it to come online, and then updating your FQL code (e.g. your UDFs) to use the new index. This could even be done automatically with some advanced FQL composition: immediately update the UDFs with logic that falls back to the old index while the new index does not exist yet or is not online (you can check that with an if test) and starts using the new index once it is online. I don’t think this is a great approach long-term though: once you have many projects, index builds will take long and might add many writes to the bill.
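The if test from the last bullet could look something like the FQL v4 sketch below. The index names `open_tasks_v1` (old) and `open_tasks_v2` (new, built over the extended set of collections) are hypothetical, and the sketch assumes the `active` field on the index document is what reports whether the build has finished.

```fql
// Hypothetical index names: "open_tasks_v1" (old), "open_tasks_v2" (new).
Let(
  {
    // true only if the new index exists AND has finished building;
    // If evaluates just one branch, so Get never runs on a missing index.
    newIndexReady: If(
      Exists(Index("open_tasks_v2")),
      Select("active", Get(Index("open_tasks_v2"))),
      false
    )
  },
  If(
    Var("newIndexReady"),
    Paginate(Match(Index("open_tasks_v2"), true)),
    // Fall back to the old index until the new one is online.
    Paginate(Match(Index("open_tasks_v1"), true))
  )
)
```

Placed inside a UDF, this would let callers keep using the same function while the new index builds; until it is online, results only cover the collections of the old index.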