Performance on large collections and indexes

I’m starting to work on a metrics system and wondering if Fauna would be a good fit for our use case.

Anyone here using Fauna to store large numbers of documents in a single collection?

At what point does index performance start to degrade? Millions, hundreds of millions?

Hi @pier,

More than the number of documents in a collection, what you want to be careful of is the TTL for the collection. Setting it for too long a period can result in long read times from the associated indexes, because the indexes need to account for all of that history. See the section on collections in the docs for more details.

Also, keep in mind that pagination maxes out at 100,000 documents per page. So if you have millions of documents in the collection, you’ll need to iterate through 10 or more pages to retrieve them all. Depending on the metrics being tracked there may be more efficient ways of doing that.
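To make the page arithmetic concrete, here is a minimal sketch in Python. It assumes only the 100,000-documents-per-page limit mentioned above. The function name `pages_needed` and the example totals are hypothetical and for illustration only; this is plain arithmetic, not a call into any Fauna driver.

```python
# Per-page limit quoted in the reply above.
MAX_PAGE_SIZE = 100_000


def pages_needed(total_documents: int, page_size: int = MAX_PAGE_SIZE) -> int:
    """Return how many pages a full scan of the collection would need."""
    if total_documents < 0:
        raise ValueError("total_documents must be non-negative")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("page_size must be between 1 and MAX_PAGE_SIZE")
    # Integer ceiling division: a partially filled last page still counts.
    return (total_documents + page_size - 1) // page_size


if __name__ == "__main__":
    # Illustrative collection sizes, not measurements.
    for total in (1_000_000, 100_000_000):
        print(f"{total:>11,} documents -> {pages_needed(total):,} pages")
```

At the maximum page size, one million documents take 10 pages and one hundred million take 1,000, so a full scan gets expensive quickly and a narrower range query over an index is likely the better route.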

Can you share more details about your use case? What kind(s) of metrics are you tracking? How much history for those metrics is required?

Cory

Hi @Cory_Fauna,

Yes, I’m familiar with Fauna indexes.

I’d like to retain data for at least 30 days. The documents would be very lightweight, and the collection would not exceed, say, 100M documents at any time.

The idea would be to store each entry as a new document which would make it easier to index. Most likely I’d be using an index with values and use that for filtering and range queries.

Do you think Fauna would be a good fit for this use case?

Hi @pier,

Fauna is a general-purpose database, so I think it’s a good fit for just about anything. There are other databases that would be a better fit for certain niche cases (Redis, for instance, is a better choice for a caching layer with lots of heavy reads of small amounts of data that change frequently). But there’s nothing unusual about storing a large number of documents in a single collection and/or database.

So yes, I think Fauna would be a good fit for this. There are some things you’ll want to take advantage of to make things faster and more economical, though:

  • If you have any fields that are changed frequently, those are a poor choice to use for terms on an index, as each update to those terms requires an update to the associated index(es). It sounds like you’ll just be Create()ing new documents, not updating existing ones, so this shouldn’t be much of a concern in this case.
  • Be sure to use minimal temporality on the collection. The more history that’s stored for an individual document, the longer it’ll take to read that document later, and the longer it takes to update any indexes referencing it (since their history has to be updated, as well). But again, this is more of a concern when documents are changed, not if you’re just creating new ones constantly.

Cory

Thanks @Cory_Fauna.
