Need advice on handling a huge amount of data

adinjesuha · April 20, 2021, 5:31pm

Hi guys!!

I’ve been working on a project for a while, and the amount of data is very large. I have been testing Fauna, and with small collections it was manageable. But with a collection of 30k or more documents, how can I handle full-text search, filtering, and sorting without hurting performance or operational costs?

Any advice??

ptpaterson · June 29, 2021, 7:05pm

Hi @adinjesuha! Are you still working on the same project, and have you worked anything out?

I am curious what the shape of your documents is and what kind of search/filtering/sorting you are attempting.

Fauna Indexes, especially when considering bindings, are very powerful and flexible for crafting efficient queries. Admittedly, it can get complex very quickly, particularly when you start to combine them.

adinjesuha · June 29, 2021, 8:03pm

Hi @ptpaterson, yes, I’m still working on the same project, but I have not progressed as planned, since my development skills are mostly on the front-end side and this is my first full-stack client.

For filtering, sorting, and pagination, combined with libraries like SWR to fetch and cache data, my indexes work like a breeze, and thanks to the forums most of my doubts have been cleared up.

For full-text search, my approach was to give my index a binding computed from the value I want to search, like this:

First, the function that creates the word parts:

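const faunadb = require('faunadb')
const q = faunadb.query

// Builds an FQL expression that returns the lowercase prefixes of Word,
// one prefix per length in `lengths` that fits inside the word.
function WordPartGenerator(Word) {
  return q.Let(
    {
      original: Word,
      // Prefix lengths to index (the list is elided here)
      lengths: [1, 2, 3, 4, 5, 6, ...],
      // Keep only the lengths that do not exceed the word's length
      lengthsFiltered: q.Filter(
        q.Var('lengths'),
        q.Lambda(
          'l',
          q.LTE(q.Var('l'), q.Length(q.Var('original')))
        )
      )
    },
    // Take the first `l` characters of the lowercased word for each length
    q.Map(
      q.Var('lengthsFiltered'),
      q.Lambda(
        'l',
        q.SubString(q.LowerCase(q.Var('original')), 0, q.Var('l'))
      )
    )
  )
}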
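
To see what the binding would store for a given value, here is a minimal plain-JavaScript sketch that mirrors the FQL logic above outside of Fauna. The function name `wordParts`, the sample value `'Toro'`, and the short fixed `lengths` list are assumptions for illustration only; the list in the post itself is elided.

```javascript
// Plain-JS mirror of WordPartGenerator, for illustration only (not FQL).
// `lengths` is a hypothetical short list; the post elides its own.
function wordParts(word, lengths = [1, 2, 3, 4, 5, 6]) {
  const lower = word.toLowerCase();
  return lengths
    .filter((l) => l <= word.length)     // same role as q.Filter + q.LTE
    .map((l) => lower.substring(0, l));  // same role as q.SubString(..., 0, l)
}

console.log(wordParts('Toro'));
// → [ 't', 'to', 'tor', 'toro' ]
```

Each element of the returned array would become one term in the index, so a search for `to` or `tor` matches the document, while a search for `oro` does not.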

Then used in the index binding:

q.CreateIndex({
  name: 'ejemplar_by_wordparts',
  source: [
    {
      collection: q.Collection("Ejemplares"),
      fields: {
        wordparts: q.Query(
          q.Lambda(
            'ejemplar',
            WordPartGenerator(q.ToString(q.Select(["data", "ejemplar"], q.Var("ejemplar"))))
          )
        )
      }
    }
  ],
  terms: [{ binding: 'wordparts' }],
  values: [
    { field: ["data", "ejemplar"] },
    { field: ["data", "nombre"] },
    { field: ["data", "especie"] },
    { field: ["data", "raza"] },
    { field: ["data", "sexo"] },
    { field: ["data", "encaste"] },
    { field: ["data", "padre"] },
    { field: ["data", "madre"] },
    { field: ["data", "propietari"] },
    { field: ["ref"] }
  ],
  serialized: false
})

Most of my indexes return the data values directly, to avoid using Get or Map, since the project has more than 30k documents that need to be searchable with full-text search.

What do you think of this approach? Can it be optimized further?
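
The idea of answering a search from the index values alone can be sketched with an in-memory stand-in. This is not how Fauna is implemented; `prefixes`, `buildIndex`, `search`, the sample documents, and the maximum prefix length of 6 are all hypothetical, and only two of the data fields are carried to keep the sketch short.

```javascript
// In-memory stand-in for a term index that stores values (illustration only).
// Sample documents are made up for this sketch.
const docs = [
  { ref: '101', data: { ejemplar: 'Bravo', nombre: 'Relampago' } },
  { ref: '102', data: { ejemplar: 'Brisa', nombre: 'Luna' } },
  { ref: '103', data: { ejemplar: 'Centella', nombre: 'Sol' } },
];

// Lowercase prefixes of length 1..maxLen (assumed maxLen of 6).
function prefixes(value, maxLen = 6) {
  const lower = String(value).toLowerCase();
  const out = [];
  for (let l = 1; l <= Math.min(maxLen, lower.length); l++) {
    out.push(lower.slice(0, l));
  }
  return out;
}

// One entry per (prefix, document); each entry carries the values to return.
function buildIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    const entry = [doc.data.ejemplar, doc.data.nombre, doc.ref];
    for (const term of prefixes(doc.data.ejemplar)) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(entry);
    }
  }
  return index;
}

// A lookup is a single exact match on the lowercased search text;
// the stored values come back without touching the documents again.
function search(index, text) {
  return index.get(text.toLowerCase()) || [];
}

const index = buildIndex(docs);
console.log(search(index, 'Br'));
// → [ [ 'Bravo', 'Relampago', '101' ], [ 'Brisa', 'Luna', '102' ] ]
```

With the real index, the corresponding read would presumably be a `q.Paginate` over `q.Match(q.Index('ejemplar_by_wordparts'), ...)` with the lowercased search text, returning the listed values page by page.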

ptpaterson · June 29, 2021, 9:28pm

Very cool! I think it depends on how you want the search to work. You’ve got it so that it can search for an exact match of 1 to n characters. If that’s what you want then I think you’ve got it!

Obviously, the fewer entries you create, the better the index will perform in terms of build time, storage, etc. So you could consider matching only on exact matches of 3 to n characters.

lengths: [3, 4, 5, 6, ...],
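
A rough way to gauge the effect of dropping the shortest prefixes is to count how many entries each value generates. The sketch below does that in plain JavaScript; `countEntries`, the sample values, and the maximum length of 6 are assumptions, and real storage cost depends on more than the entry count.

```javascript
// Counts the prefix entries generated for a list of values when indexing
// prefixes of length minLen..maxLen (illustration only).
function countEntries(values, minLen, maxLen) {
  let total = 0;
  for (const v of values) {
    const upper = Math.min(maxLen, v.length);
    if (upper >= minLen) total += upper - minLen + 1;
  }
  return total;
}

const sample = ['Bravo', 'Brisa', 'Centella', 'Sol']; // made-up values

console.log(countEntries(sample, 1, 6)); // → 19
console.log(countEntries(sample, 3, 6)); // → 11
```

Starting at length 3 removes two entries per value here, at the price of requiring at least three typed characters before a search can match.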

But it’s up to you how you want your application to work.

Including many values in your index is also good, as you have found, if you can almost always rely on just those values. It uses more storage to duplicate the data, but it is much easier on Read Ops.

adinjesuha · June 29, 2021, 9:55pm

Great advice @ptpaterson

lengths: [3, 4, 5, 6, ...]

But some of the values the index binding is computed from are longer than 20 characters. If my array of lengths stops short of that, are those values left out of the index?
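
One way to reason about this question is to run the same filter-and-substring logic in plain JavaScript. The sketch below assumes a `lengths` list of 3 through 20 and made-up values; `prefixParts` is a hypothetical stand-in for the FQL function. It suggests that a long value still produces parts, just none longer than the largest listed length, while a value shorter than the smallest length produces none at all. How Fauna treats a document whose binding returns an empty array is worth checking against the documentation.

```javascript
// Plain-JS stand-in for the binding logic (illustration only, not FQL).
function prefixParts(word, lengths) {
  const lower = word.toLowerCase();
  return lengths
    .filter((l) => l <= word.length)
    .map((l) => lower.substring(0, l));
}

// Assumed list of lengths: 3, 4, ..., 20.
const lengths = Array.from({ length: 18 }, (_, i) => i + 3);

const longValue = 'abcdefghijklmnopqrstuvwxy'; // 25 characters
const parts = prefixParts(longValue, lengths);

console.log(parts.length);                           // → 18
console.log(parts[parts.length - 1]);                // → abcdefghijklmnopqrst
console.log(parts.includes(longValue.slice(0, 22))); // → false
console.log(prefixParts('ab', lengths));             // → []
```

Under these assumptions, a 25-character value is findable by any prefix of 3 to 20 characters, but a search string of 21 or more characters would need to be truncated to 20 before matching.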

ptpaterson · July 13, 2021, 9:56pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
