I’d like to use the above partial matching pattern within a Union, so that the returned items are either a partial match of the search term on title or an exact match on the searchItemsByTag index, whose indexed field is an array (actually, I’d like to do partial matching on the tags too, but one thing at a time):
q.Map(
q.Paginate(
q.Union(
q.Match(q.Index("searchItemsByTag"), searchTerm),
// Use above partial matching query here. But how?!
)
),
q.Lambda((ref) => q.Get(ref))
)
I’m getting hung up on where to put Filter and so far, after many attempts, I’ve not been able to figure this out.
The simplest thing would probably be to fetch all of the items by tag, and then filter them. This will fetch SIZE documents and then filter and return those that match. The Filter will only be run on at most SIZE documents, so you can easily predict performance and cost.
q.Filter(
q.Map(
q.Paginate(q.Match(q.Index("searchItemsByTag"), TAG), { size: SIZE }),
ref => q.Get(ref)
),
doc => q.ContainsStr(
// I recommend Casefold on both the stored data and your search term
q.Casefold(q.Select(["data", "title"], doc)),
q.Casefold(PARTIAL_TITLE)
)
)
Filter then Paginate
You can put the Filter inside of Paginate, but in this case, Filter will continue to run against as many documents in the Set as necessary to fulfill SIZE, potentially the entire Set. You get a more predictable number of results, but less predictable performance and cost. In the worst case, a query with just a couple of results could cost as many read ops as there are documents in the collection. But this may be fine in the case that the Set you are filtering is small (in this case, the Set is q.Match(q.Index("searchItemsByTag"), TAG)).
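For concreteness, the inverted nesting looks roughly like this (a sketch only; TAG, SIZE, and PARTIAL_TITLE are placeholders as before):

```javascript
q.Map(
  q.Paginate(
    q.Filter(
      q.Match(q.Index("searchItemsByTag"), TAG),
      (ref) => q.ContainsStr(
        // Each predicate call reads the document, so cost scales with
        // how many entries Paginate has to walk to fill the page.
        q.Casefold(q.Select(["data", "title"], q.Get(ref))),
        q.Casefold(PARTIAL_TITLE)
      )
    ),
    { size: SIZE }
  ),
  (ref) => q.Get(ref)
)
```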
Thanks, @ptpaterson, for that detailed and thoughtful reply.
It seems that my initial approach touches upon the issue of potentially excessive operations, in that performing a search across a collection with a lot of documents results in a “scan” of the entire collection.
And it seems that this would be the case even without using union to combine searches.
For example, if I only wanted to do a partial search on title, it follows that every document would need to be read to see whether or not there is a title match.
The thing is that this is pretty much what I want: to search the entire collection.
Let me setup a scenario:
The collection is Recipes with the following fields: id, title, cuisines, tags. id and title are strings, while cuisines and tags are arrays of strings.
I can’t imagine there ever being more than a few thousand documents in Recipes.
As a user of the app, I’d like to enter a search term and find all recipes that contain the search term in the title or the cuisines array or the tags array. No fuzzy logic, just partial matching.
So if I enter “pot”, for example, I could get recipes that match the title, like “Potato Galette” or “Pot Brownies”, or ones that match the tags, like “Instant Pot.”
That’s the goal, anyhow.
So, if I understand correctly, a single (non Union) search on an index of “searchRecipesByTitle”, when the collection has 5000 documents, would result in 5000 reads?
Further, if I did a Union that also included searching searchRecipesByCuisine and searchRecipesByTag, that would result in 15,000 reads?
Or do I have that all wrong?
As a user, I don’t want to enter more than one search term, so reducing by tag first doesn’t really work in this scenario, I think.
Further, if I did a Union that also included searching searchRecipesByCuisine and searchRecipesByTag, that would result in 15,000 reads?
A Union of three sets with 5000 entries each would indeed be a new set with 15,000 entries.
But the expectation of using an Index is that it only returns what matches. If there are zero matches, for example, then you pay the cost of reading the index, but have nothing to iterate over so that’s the only cost. Best case, reading a Union of three sets where none have any entries costs 3 read ops, regardless of how many documents are in the indexes’ source collections.
If you cannot use an index, you can stick to just scanning the collection once, and combine the predicates in Filter.
Regarding indexes with bindings, consider if whole word search is okay. Or perhaps your app can try a more restrictive Index, and fall back on the full scan.
Try exact match (indexes, no bindings)
if not enough matches, try whole word search (indexes with bindings)
if not enough matches, start paginating through the whole collection and filtering
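Client-side, that escalation might be sketched like this in plain JavaScript, with an in-memory array of titles standing in for the three kinds of index queries (all names here are illustrative, not Fauna APIs):

```javascript
// Simulated three-tier search: exact, then whole-word, then substring.
// In the real app each tier would be a separate, increasingly costly
// Fauna query; here an array stands in for the indexed data.
function tieredSearch(titles, term, minResults) {
  const t = term.toLowerCase();

  // Tier 1: exact match (cheapest; an index with the field as a term)
  let hits = titles.filter((title) => title.toLowerCase() === t);
  if (hits.length >= minResults) return hits;

  // Tier 2: whole-word match (an index with a word-splitting binding)
  hits = titles.filter((title) =>
    title.toLowerCase().split(/\s+/).includes(t)
  );
  if (hits.length >= minResults) return hits;

  // Tier 3: substring match (the full scan + Filter fallback)
  return titles.filter((title) => title.toLowerCase().includes(t));
}
```

The point is that most searches are satisfied by the cheap tiers, and the expensive scan only runs when they come up short.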
You might also optimize the filtering by using an Index that provides the search fields as values. This will avoid the cost of reading each document as you filter through them.
Note in the last example, the Get call is after Filter.
As a user, I don’t want to enter more than one search term, so reducing by tag first doesn’t really work in this scenario, I think.
Right! The example I gave was more for something like tag = "pot" AND title LIKE "something else". The examples above should do a better job of something like
tag LIKE "pot" OR title LIKE "pot" OR cuisine LIKE "pot"
While the above works, I’m still a bit fuzzy on whether or not using a collection index is any more performant/economical than querying the collection. My guess is that it’s not, but I’d like your thoughts on that, if you’d be so kind.
That said, I like the second example you gave, with a new index that uses the search fields (and the ref) as values.
What I see this does is create an index entry for the various combinations of values based on existing data in the collection.
What I can’t see is if/when the index gets rebuilt when new data is added or existing data is updated. The docs seem to indicate that indexes do not get rebuilt. This means that I’d need to delete and re-create the index on any new cuisine, tag, or recipe, and any related mutations (e.g. adding/removing a tag to/from a recipe). This seems infeasible, especially when the app is built around the idea of creating recipes. Any explanation or examples of how people deal with this would be much appreciated.
Finally, ignoring feasibility, while the above works and seems “snappier”, it does result in dupes. For example if title contains “pot” and tag contains “pot”.
I have tried placing Distinct in various spots in the query, but I can’t seem to find the right spot.
Indexes are updated transactionally along with their source collection(s). If a document is successfully written to, any affected index is guaranteed to be written.
Please let us know if there is something particular in the docs that gives the wrong impression, so we can review and update as necessary.
Collections are only accessible through an Index. When you use Documents(Collection("some_collection")) you are using a built-in Index. There is no difference between reading the built-in index, and reading your own “all_” index. Note that creating your own uses storage and adds an additional index that must be written to every time a document is created or deleted.
Oh, ok. That does complicate things. If you set tags and cuisines to be the index’s values, then you will get an entry for each item in the array. In fact, if you include both values, you will get a cartesian product of both arrays. This can quickly get out of hand, so be mindful that writing to this index can take more time and cost more if the arrays contain many values.
In order to get the one, distinct Ref from the results, you need to get an array or a set that is just the Ref, otherwise the other values make each entry distinct, even though the Ref is the same for each one.
q.Map(
q.Distinct(
// same query as before, but don't Get the document, yet
q.Map(
q.Filter(
q.Paginate(q.Match(q.Index("all_recipes__tag_asc_title_asc_cuisine_asc")), { size: SIZE }),
(tag, title, cuisine, ref) => q.Or(
q.ContainsStr(
q.Casefold(tag),
q.Casefold(PARTIAL_TAG)
),
q.ContainsStr(
q.Casefold(title),
q.Casefold(PARTIAL_TITLE)
),
q.ContainsStr(
q.Casefold(cuisine),
q.Casefold(PARTIAL_CUISINE)
),
)
),
(tag, title, cuisine, ref) => ref // don't Get yet
)
),
ref => q.Get(ref)
)
side note
You cannot console.log in FQL. Using the Javascript arrow functions is a convenience for using the Lambda FQL function. Fauna does not actually execute javascript code.
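To illustrate, the driver rewrites an arrow function into the equivalent Lambda expression before sending the query, so these two lines (where page is any set or page of refs) describe the same FQL:

```javascript
q.Map(page, (ref) => q.Get(ref))
// ...is shorthand for...
q.Map(page, q.Lambda("ref", q.Get(q.Var("ref"))))
```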
Regarding the documentation on indexes mentioned above, there’s a section titled Modifying Index Documents here.
Being new to Fauna, and reading way too many documents, I totally misread that. Now I understand that while index properties can’t be modified, except for the name, the index does get updated as the data in the collection is modified.
On the occasions where the values do need to get updated, deleting and re-creating the index is trivial compared what I was imagining.
Since schema names can be modified, you can build a new index with a temporary name, and then swap the names so your application can start using the new one sooner. Due to our caching mechanism, you still need to wait about a minute to reuse a name. But consider if you have gigabytes of data, or complex bindings to evaluate, it could take hours to build a new index.
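A sketch of that swap, assuming the rebuilt index was created under the temporary name search_recipes_v2 (both names are illustrative):

```javascript
// 1. Rename the old index out of the way.
q.Update(q.Index("search_recipes"), { name: "search_recipes_old" })
// 2. After the ~1 minute name-cache window, promote the new index.
q.Update(q.Index("search_recipes_v2"), { name: "search_recipes" })
// 3. Once nothing references it, remove the old index.
q.Delete(q.Index("search_recipes_old"))
```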
It’s taking me a while to get this work to finality, as I’m trying to figure out how I can add sorting, plus another bugbear (trying to find a match where any of an array of search terms is contained in an index value field; the docs have me looking back at Union, but…), so stay tuned.
Or, if you prefer, we can close this, and I can get back to you on a new thread.
If you are still trying to filter by partial tag & title etc., then I think here would be a good place since the partial string matches are what continue to make this so complicated.
If you have questions on the technique in the cookbook that are not related to the previous discussion, a new ticket would be appropriate. But just use your best judgment.
As I understand from the docs, the results will be sorted based on the values provided in the index, so the data should be sorted title ascending plus the rest of the values in order of their place in the index.
Now here’s where I’m stuck. If you notice, I’m passing an options object, which has the page size and the after cursor that’s been converted to an expression using new Expr(). This works as expected for the Get Recipes query. Things paginate, and I can load next/previous with the after/before cursors.
However, when I run the Search Recipes query, the results are less than expected.
I know this is related to page size, as I can hardcode that to 100 and see the desired 10 results using specific search criteria; however, if I leave this at 5, for example, I get 2 results for the same search criteria, not 5.
I have scoured the docs and the mentioned cookbooks, but couldn’t find anything that helped with this specific scenario, but I am still a noob with Fauna so… I’m back here.
I think I understand what’s going on: 2 of the items of the page (with a size of 5) match the search criteria. The other 3 don’t because of how the index is built with values.
I’m looking for a bit of advice here, as I still can’t get pagination to work as I want it to for certain queries. I’m writing this here, as it’s related to the original content of this thread.
In some scenarios, I’m using the resultant data in an “infinite scroll” or a list where there’s a “Show More” button, if there is an “after”, and this works well when size is set to a high value; however, when size is set to a low value, like 10 here, the query results can be less than 10, because some of the docs returned do not match the conditionals (searchChefId, isPublished).
At least, I believe that’s what’s happening. Regardless, this makes the UI a little wonky. For example, 10 is a good size for some pages. I’m actually doing dynamic sizes based on screen height, but that’s beside the point. For now, let’s just say that 10 is the right amount of recipes to fill out the user’s view nicely. On infinite scroll pages, the observed element that triggers the next query is below the fold.
When the query returns only 3 items, however, those three recipes are displayed, and then the infinite scroll is triggered (as the observable is above the fold). On pages where the “Show More” button is used, there are just three recipes shown with the “Show More” button and then a bunch of blank space.
I thought about doing some recursive logic to make the data result consistent (i.e. keep querying while there’s an after, until the size is met), but I thought I’d ask an expert, who knows his stuff, before I do anything stupid.
You are using Filter on the results from Paginate, so there can be 0 or more results that match the criteria.
If you change the nesting to Paginate the results of Filter, then you should get full pages until the end of the set of matching documents is reached. However, calling Filter might exceed the transaction limits if you have too many matching documents.
It would be better to construct an index that met the Filter conditions.
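As a sketch, the change is to move Paginate outside Filter; the index, value names, and searchChefId here are assumptions based on the discussion:

```javascript
// Before: q.Filter(q.Paginate(...), pred) — pages can come back short.
// After: Paginate walks the filtered set until it fills the page.
q.Map(
  q.Paginate(
    q.Filter(
      q.Match(q.Index("search_recipes")),
      (title, chefId, cuisines, tags, ref) =>
        q.Equals(chefId, searchChefId)
    ),
    { size: 10 }
  ),
  (title, chefId, cuisines, tags, ref) => q.Get(ref)
)
```

Note the trade-off described earlier in the thread: Filter may now walk far more of the set in order to fill a page.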
I’m not sure what you mean by exceeding transaction limits. Do you mean that I could get a 429 (i.e. too many requests error), because of multiple queries in succession as it tries to get to 10? Would you be so kind to elaborate?
I’m not sure the above makes sense in this scenario, it seems like I’d need to create an index for each user (is that what you meant?).
That said, I would love to be able to create indexes, for other scenarios, that were “pre-populated” with data matching certain conditions. For example, an “all_published_recipes” index, that populates with any recipe (from the Recipe collection) where the value of data:isPublished is true.
I was scouring the docs for such a thing, and I came up short, so I didn’t know this was possible. I can’t seem to find where you can add conditional logic to an index. Can you please point me somewhere in the docs where I can read up on this?
I’m not sure what you mean by exceeding transaction limits. Do you mean that I could get a 429 (i.e. too many requests error), because of multiple queries in succession as it tries to get to 10? Would you be so kind to elaborate?
If you have millions of index entries, Filter would have to evaluate each of them. Doing so could exceed the available memory or time limit for the transaction, and an HTTP 429 error would occur and you would get no results.
I’m not sure the above makes sense in this scenario, it seems like I’d need to create an index for each user (is that what you meant?).
No.
Your query, as written, appears to perform the equivalent of a table scan to find chefId values that match some searchChefId value, and any index entries where isPublished is true. You could adjust your index definition to include chefId as a term, and let the index perform that matching for you.
As you can see, using the “table scan” strategy involves 13x read ops and 81x compute ops (in 9x the time) to get the same answer as using the index to do the searching for you.
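For example, an index along these lines would let Match do the equality work (field names assumed from the discussion; the isPublished term is optional):

```javascript
{
  name: "recipes_by_chef",  // illustrative name
  source: "Recipe",
  terms: [
    { field: ["data", "chefId"] },
    { field: ["data", "isPublished"] }
  ],
  values: [
    { field: ["data", "title"] },
    { field: ["ref"] }
  ]
}
```

A query would then start from q.Match(q.Index("recipes_by_chef"), searchChefId, true) and only Filter on the remaining, non-equality conditions.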
I read up on the transactions limits you shared, and I can’t foresee exceeding them toward a 429, but I’ll keep this in mind.
Regarding the index, this thread is kinda old and @ptpaterson and I already had a discussion way up top, where he gave me some great insight on performance, which is in line with your kind direction. I should have included that for clarity, mea culpa.
The index for this scenario is below, based on what I’ve learned toward avoiding full scans. All my indexes are built like this in hopes that I’m doing what’s most performant and cost efficient.
{
name: "search_recipes", // <= I know this should have a better name 😊
unique: false,
serialized: true,
source: "Recipe",
values: [
{
field: ["data", "title"]
},
{
field: ["data", "chefId"]
},
{
field: ["data", "cuisines"]
},
{
field: ["data", "tags"]
},
{
field: ["ref"]
}
]
}
Now I’m laughing because I imagine that you’re sharpening the pins for a voodoo doll at this point, but I gotta ask this…
I’ve managed to “fix” all my queries so that I’ve got pagination in the right spot, based on your helpful advice; however, the query that I run on the above index is a bit more tricky:
Your “more tricky” query continues to use the “table scan” strategy, which is necessary when you need to perform more than simple equivalence matching. But you could use an index to handle all of the direct equivalence matching, and then apply Filter for all of the “contains” matches. By doing that, you’re using indexes to reduce the number of index entries to consider. That would make your query more performant and cost less to execute.
The “contains” matches might be better modelled as many-to-many relations by using a “join” collection that identifies which chefs belong to which chef networks, and then you could use indexes to find those matches too. The same would apply to the cuisines.
For your query involving Distinct, it would be better if the Distinct call was inside the Paginate call, since you’d potentially be removing dupes only in a single page, rather than duplicates throughout the resulting set.
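Schematically, the difference is (with someRefSet standing in for the set of bare Refs built earlier):

```javascript
// Dedupe only within each page — dupes can still appear across pages:
q.Distinct(q.Paginate(someRefSet, { size: SIZE }))

// Dedupe the whole set, then page — each Ref appears at most once:
q.Paginate(q.Distinct(someRefSet), { size: SIZE })
```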