Querying and filtering best practice

Hello,
I just started using Fauna and looks awesome!
I’m struggling to understand when to use indexes and when not to use them for filtering.

For example, let’s assume we have a collection with hundreds of thousands of documents of clients with 3 fields, name, monthly_salary, and age.

How can I search for documents that monthly_salary is higher than 2000 and age higher than 30?

In SQL this would be straightforward but with Fauna, I´m struggling to understand the best approach because terms of Index only work with an exact match. I see in docs that I can use the Filter function but I would need to get all documents in advance so it looks a bit counterintuitive and not performant.

Below is an example of how I can achieve it, but not sure if it’s the best approach, especially if it contains a lot of records.

Map(
  Filter(
    Paginate(Documents(Collection('clients'))),
    Lambda(
      'client',
      And(
        GT(Select(['data', 'monthly_salary'], Get(Var('client'))), 2000),
        GT(Select(['data', 'age'], Get(Var('client'))), 30),
        )
      )
    ),
    Lambda(
      'filteredClients',
      Get(Var('filteredClients'))
      )
  
  )

  
  

Is this correct or I´m missing some fundamental concepts? can anyone help?

Thanks in advance,
Jorge

Querying like this in Fauna is very different from SQL, so let’s go through it bit by bit.

First, we should have a couple of indexes

CreateIndex({
  name: 'client_by_sal',
  // No terms
  values: [ { field: [ 'data', 'monthly_salary' ] }, { field: [ 'ref' ] } ]
})

CreateIndex({
  name: 'client_by_age',
  // No terms
  values: [ { field: [ 'data', 'age' ] }, { field: [ 'ref' ] } ]
})

We don’t need Terms on these indexes, but we do want some Values since that’s how indexes are sorted. Here, we’re conveniently sorting by salary/age.

Take a look at Fauna for SQL, it does list how to index with one condition

But that doesn’t quite help us with multiple conditions. In order to do this, we’ll have to make use of Range. We can do something like:

Paginate(
  Intersection(
    Range(Match(Index("client_by_sal")), [2000], []),
    Range(Match(Index("employee_by_age")), [30], [])
  )
)

but this will give us an empty set due to Intersection only comparing the first value. We actually need to intersect the refs, so first we need another index:

CreateIndex({
  name: 'client_by_ref',
  terms: [ { field: [ 'ref' ] }]
})

Now we can join them while intersecting

Paginate(
    Intersection(
      Join(
        Range(Match(Index('client_by_sal')), [2000], []),
        Lambda(['value', 'ref'], Match(Index('client_by_ref'), Var('ref')))
      )),
      Join(
        Range(Match(Index('client_by_age')), [20], []),
        Lambda(['value', 'ref'], Match(Index('client_by_ref'), Var('ref')))
      ))
    )
)

That should give you what you need.

Note: There’s a much longer and better explanation given here, it might be worth reading over the whole thing.

1 Like

Hi @perestrelo and welcome to the forums.

Excellent answer, @TriangularCube!

Equality and Inequality

You can generally differentiate the way indexes are used by considering if you want to search for equality or for inequality.

  • searching for equality (e.g. WHERE A = B):
    • use Index terms to match exactly
  • searching for inequality (e.g. WHERE A < B or WHERE A >= B, etc.):
    • use Index values and the Range function

Use Singleton to change the shape of a Set

You saw that using different values for different Indexes means that they cannot be used directly by Intersection.

@TriangularCube showed the use of an Index takes a ref and returns a ref. This does work, but it can be simplified a bit with the use of Singleton. Doing so is recommended because it will reduce Read Ops and perform better. Writes and storage may also be cheaper without the additional index.

Paginate(
    Intersection(
      Join(
        Range(Match(Index('client_by_sal')), [2000], []),
        Lambda(['value', 'ref'], Singleton(Var('ref')))
      )),
      Join(
        Range(Match(Index('client_by_age')), [20], []),
        Lambda(['value', 'ref'], Singleton(Var('ref')))
      ))
    )
)
1 Like

Thanks a lot for both of the answers,
But isn’t that a lot of boilerplate just to do something as simple as SELECT from clients where salary > 2000 and age >30 ?

I don’t wanna sound disrespectful and I understand the benefit of using Fauna, special in a sense of the JAMstack world, and that’s really what I´m looking for, but isn’t a lot of work for a simple query like this?

I just gave a simple example above, but what I´m building is something that allows the user to create tables and fields from the frontend, so I need the structure of the database to be done dinamicly. And after that, he would be able to construct filters dynamically as well from the frontend which in turn would create the dynamic queries on the database.

According to your explanations, it would be extremely hard to do this with fauna, or am I wrong?

Thanks @ptpaterson @TriangularCube

This is a little difficult to answer since I don’t know exactly what you’re planning, but such is the price for not having rigidly defined schemas. And unlike Mongo which supports queries like SQL, FQL’s premise of being imperative means we have to explicitly define all this.

That’s not to say it would be that difficult, I mean you can dynamically create indexes for whatever fields the user creates, and you simply have to intersect them in the above manner.

1 Like