Need advice on handling a huge amount of data

Hi guys!!

I’ve been working on a project for a while, and the amount of data is very large. I have been testing Fauna for a while, and with small collections it was manageable, but with a collection of 30k or more documents, how do I handle full-text search, filtering, and sorting without hurting performance or operational costs?

Any advice??

Hi @adinjesuha! Are you still working on the same project, and have you worked anything out?

I am curious what the shape of your documents is and what kind of searching/filtering/sorting you are attempting.

Fauna Indexes, especially when considering bindings, are very powerful and flexible for crafting efficient queries. Admittedly, it can get complex very quickly, particularly when you start to combine them.
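For example, combining two term-based indexes with set operations looks something like this (a hypothetical sketch; the index names and terms here are made up, not from your project):

// Documents matching BOTH terms: intersect the two index matches.
q.Paginate(
  q.Intersection(
    q.Match(q.Index('products_by_category'), 'books'),
    q.Match(q.Index('products_by_in_stock'), true)
  )
)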

Hi @ptpaterson, yes, I’m actually working on the same project, and I have not advanced according to plan, since my development skills are mostly on the front-end side and this is my first full-stack client.

For filtering/sorting and pagination, combining my indexes with libraries like SWR to fetch and cache data, everything works like a breeze, and thanks to the forums most of my doubts have been cleared up.
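Roughly, that setup looks like the sketch below (the hook, the index name ejemplares_sorted, and the secret variable are placeholders, not my exact code):

import useSWR from 'swr'
import faunadb from 'faunadb'

const q = faunadb.query
const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET })

// Fetch one page of index values; Paginate reads them straight from
// the index, so no extra Get/Map round-trips are needed.
const fetchPage = (indexName, size) =>
  client.query(q.Paginate(q.Match(q.Index(indexName)), { size }))

// SWR caches the page under the key and revalidates in the background.
function useEjemplares(size = 25) {
  return useSWR(`ejemplares?size=${size}`, () =>
    fetchPage('ejemplares_sorted', size)
  )
}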

For full-text search, my approach was to create an index binding on the value I want to search, like this:

First, the function that creates the word parts:

function WordPartGenerator(Word) {
  return q.Let(
    {
      original: Word,
      // Prefix lengths to generate; the array continues up to the
      // longest prefix you want to be able to match on.
      lengths: [1, 2, 3, 4, 5, 6, ...],
      // Keep only the lengths that fit within the word itself.
      lengthsFiltered: q.Filter(
        q.Var('lengths'),
        q.Lambda(
          'l',
          q.LTE(q.Var('l'), q.Length(q.Var('original')))
        )
      )
    },
    // Emit one lowercased prefix per remaining length.
    q.Map(
      q.Var('lengthsFiltered'),
      q.Lambda(
        'l',
        q.SubString(q.LowerCase(q.Var('original')), 0, q.Var('l'))
      )
    )
  )
}
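As a sanity check, evaluating the binding expression directly returns the prefixes (a hand-worked example, using the client from the snippet above):

client.query(WordPartGenerator('Camila'))
  .then(console.log)
// => ['c', 'ca', 'cam', 'cami', 'camil', 'camila']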

Then it is used in the index binding:

q.CreateIndex({
  name: 'ejemplar_by_wordparts',
  source: [
    {
      collection: q.Collection('Ejemplares'),
      fields: {
        wordparts: q.Query(
          q.Lambda(
            'ejemplar',
            WordPartGenerator(q.ToString(q.Select(['data', 'ejemplar'], q.Var('ejemplar'))))
          )
        )
      }
    }
  ],
  terms: [{ binding: 'wordparts' }],
  values: [
    { field: ['data', 'ejemplar'] },
    { field: ['data', 'nombre'] },
    { field: ['data', 'especie'] },
    { field: ['data', 'raza'] },
    { field: ['data', 'sexo'] },
    { field: ['data', 'encaste'] },
    { field: ['data', 'padre'] },
    { field: ['data', 'madre'] },
    { field: ['data', 'propietari'] },
    { field: ['ref'] }
  ],
  serialized: false
})
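At query time I use it roughly like this (a sketch; the search term, page size, and the client from the earlier snippet are just examples):

// Lowercase the user's input so it matches the lowercased prefixes
// produced by the binding, then read the document fields straight
// from the index values, with no Get or Map.
const search = (term, size = 25) =>
  client.query(
    q.Paginate(
      q.Match(q.Index('ejemplar_by_wordparts'), term.toLowerCase()),
      { size }
    )
  )

search('cam').then(page => {
  // page.data is an array of value tuples, in the order declared above:
  // [ejemplar, nombre, especie, raza, sexo, encaste, padre, madre, propietari, ref]
  console.log(page.data)
})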

Most of my indexes return data values, to avoid using Get or Map, since the project has more than 30k documents that need to be consulted in a full-text search.

What do you think of this approach? Can it be optimized further?

Very cool! I think it depends on how you want the search to work. You’ve got it so that it can search for an exact prefix match of 1 to n characters. If that’s what you want, then I think you’ve got it!

Obviously, the fewer entries you create, the more performant the index will be in terms of build time, storage, etc. So maybe you could consider only matching on an exact match of 3 to n characters.

lengths: [3, 4, 5, 6, ...],

But it’s up to you how you want your application to work.

Including many values in your index is also good, as you have found, if you can almost always rely on just those values. More storage is used to duplicate the data, but it is much easier on the read ops.
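For contrast, here is what you would otherwise be paying for (a hypothetical sketch, assuming an index defined without values, so matching it yields refs):

// Without values on the index, each result needs its own Get,
// which costs one extra read op per document.
q.Map(
  q.Paginate(q.Match(q.Index('ejemplar_by_wordparts_no_values'), 'cam')),
  q.Lambda('ref', q.Get(q.Var('ref')))
)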

Great advice @ptpaterson

lengths: [3, 4, 5, 6, ...]

But some of the values the index binding covers are more than 20 characters long. If my array of lengths is shorter than that, does it leave them out of the index?
