Can you perform a query against an index without a full scan?

It’s very efficient to look up something with a query like this:

Get(Ref(Collection("Recipe"), id))

A problem arises when using NextJS with dynamic routing.

For example, here’s a URL: https://www.myapp/recipes/tomahto/oven-cooked-bacon

Note that there is no Ref in the URL, and as such I’ve got no way to use the super efficient query above.

I could make the URL look like this, though I’m trying to avoid it: https://www.myapp/recipes/123456789098765432/oven-cooked-bacon, with that number being the document Ref. That would allow me to use the super efficient query.

For the record, I’m trying to avoid it purely for cosmetic reasons.

What I’ve been doing is using Match in a query against an index, where the values “data.author” and “data.slug” are compared against “tomahto” and “oven-cooked-bacon” from the above URL.

Map(
    Filter(
        Paginate(Match(Index("recipe_by_author_asc_slug_asc")), {
            size: 500,
        }),
        (author, slug, ref) =>
            And(
                Equals(
                    Casefold(author),
                    Casefold(searchAuthor)
                ),
                Equals(Casefold(slug), Casefold(searchSlug))
            )
    ),
    (author, slug, ref) => Get(ref)
)

I had thought this to be efficient, but I realize now that a scan is being performed. This crystallized when the number of index entries exceeded the pagination size option (the default is 64, I believe) and I got zero results, because the document I was searching for was not in the first page of data.
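The zero-results behavior described above can be reproduced with a toy model in plain JavaScript. This is only a sketch of the idea, not Fauna's implementation: `filterFirstPage`, `entries`, and `isTarget` are hypothetical names, and an array slice stands in for a single page returned by Paginate.

```javascript
// Toy model: "paginate" keeps only the first `size` entries,
// then the filter runs on that single page and nothing else.
function filterFirstPage(entries, size, predicate) {
  const page = entries.slice(0, size);
  return page.filter(predicate);
}

// 100 made-up index entries; the target sits at position 80.
const entries = Array.from({ length: 100 }, (_, i) => ({
  author: 'a' + i,
  slug: 's' + i,
}));
const isTarget = (e) => e.author === 'a80' && e.slug === 's80';

// With a page of 64 the target is never seen, so the result is empty.
const found64 = filterFirstPage(entries, 64, isTarget);
// With a page large enough to cover every entry, the target is found.
const found100 = filterFirstPage(entries, 100, isTarget);
```

In this model the filter only ever sees what the page contains, which is why raising the page size hides the problem without removing the scan.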

So here’s my question:

Is there any way to perform a query (i.e. search) against an index that does not involve a scan of the entire index (which is pretty much the entire collection)? Or, without the Ref, will I always be performing a scan?

Thanks in advance for any advice on this.

You need to define the terms you are going to search on when you create the index. Then you can match on those terms to filter the index results.

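// Create an index whose terms are the fields you will search on.
// The collection name must match yours exactly ("Recipe" in the question).
client.query(
  q.CreateIndex({
    name: 'recipe_by_author_slug',
    source: q.Collection('Recipe'),
    terms: [
      { field: ['data', 'author'] },
      { field: ['data', 'slug'] },
    ],
  })
)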

client.query(
  q.Paginate(
    q.Match(
      q.Index('recipe_by_author_slug'),
      ['author_name_here', 'slug_here']
    )
  )
)
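For the Next.js URL in the question, the two terms have to come from the path. Here is a minimal sketch of that step, assuming a `/recipes/<author>/<slug>` path shape. `termsFromPath` is a hypothetical helper, not part of any library, and the lowercasing assumes the stored author and slug values are lowercase (the original query used Casefold for a case-insensitive comparison, which `toLowerCase` only approximates).

```javascript
// Hypothetical helper: turn "/recipes/tomahto/oven-cooked-bacon"
// into the [author, slug] array that would be passed to Match as terms.
function termsFromPath(pathname) {
  // Drop empty segments produced by leading or trailing slashes.
  const parts = pathname.split('/').filter(Boolean);
  if (parts.length !== 3 || parts[0] !== 'recipes') {
    return null; // not a recipe URL
  }
  const [, author, slug] = parts;
  // Assumption: stored values are lowercase, so normalize the URL parts.
  return [
    decodeURIComponent(author).toLowerCase(),
    decodeURIComponent(slug).toLowerCase(),
  ];
}
```

The returned array would take the place of `['author_name_here', 'slug_here']` in the query above.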

Thanks for your thoughtful response @rcausey.

It seems like (and I’m probably beating a dead horse at this point) if the document I was looking for were the 1000th in a collection of 1000 documents, the query would scan until it hit that last doc.

I’m assuming that’s how Match works, but I’m uncertain.

I guess my real question is: is there anything more efficient than:
Get(Ref(Collection("Recipe"), id))?

This is not the case. The index allows for efficient lookup based on the terms. Quoting this documentation:

Indexes act as a lookup table that improves the performance of finding documents: instead of reading every single document to find the one(s) that you are interested in, you query an index to find those documents.

So querying using a Match statement on an index performs an efficient lookup that does not require a full scan.
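The difference between a scan and a lookup table can be sketched in plain JavaScript. This is a conceptual toy only: a `Map` keyed by the terms stands in for the index, which says nothing about how Fauna stores indexes internally, and every name here (`recipes`, `scanLookup`, `buildIndex`, `indexLookup`) is made up for illustration.

```javascript
// 1000 made-up documents; the one we want is the last.
const recipes = [];
for (let i = 0; i < 1000; i++) {
  recipes.push({ ref: String(i), author: 'author' + i, slug: 'slug' + i });
}

// Scan: visit documents one by one until the fields match.
function scanLookup(docs, author, slug) {
  let visited = 0;
  for (const doc of docs) {
    visited++;
    if (doc.author === author && doc.slug === slug) {
      return { ref: doc.ref, visited };
    }
  }
  return { ref: null, visited };
}

// Lookup table: key each ref by its [author, slug] terms once, up front.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    index.set(JSON.stringify([doc.author, doc.slug]), doc.ref);
  }
  return index;
}

// Lookup: go straight to the entry for the given terms.
function indexLookup(index, author, slug) {
  const key = JSON.stringify([author, slug]);
  return index.has(key) ? index.get(key) : null;
}

const scanned = scanLookup(recipes, 'author999', 'slug999');
const looked = indexLookup(buildIndex(recipes), 'author999', 'slug999');
```

In this toy, the scan visits all 1000 documents to reach the last one, while the keyed lookup does not depend on where the document sits in the collection.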


Excellent, @rcausey!

Thanks much for that clarification!

Just to cover some additional detail, the query should probably be wrapped in a Map with a Lambda, like this:

Map(
    Paginate(
        Match(Index("find_recipe_by_author_slug"), [
            author,
            slug,
        ])
    ),
    Lambda((ref) => Get(ref))
)