Feedback wanted! Advanced dynamic query generation

I am working on developing an advanced pattern for generating FQL, based on a dynamic list of search criteria.

I put the following together, but I’d love to hear feedback on this and any suggestions other requirements or ways to extend this. Maybe enabling “OR” elements, or a mechanism to pick the right index with different covered values.

And are there other kinds of solutions you have found to work well? Let us know!

Cheers!

Problem

You have an application that needs to search through a Collection with any number of search terms. These terms may be checking for equality, inequality, the existence of a relationship, etc.

There is a documentation page all about working with multiple Sets, but how can you scale that up to work with some unknown, dynamic combination of search criteria?

Some examples with FQL

Naive intersections works well on small sets (see also FQL v4 Intersection)! But we should avoid it with large or unbounded Sets.

You can build an intersection by simply checking if each value in the first index’s results is included in the next index’s results, then repeat with the next. Here is an example using the demo data available when you create a database in the dashboard.

Product.byName("limes")
  .where(doc => Product.byCategory(Category.byName("produce").first()!).includes(doc))
  .where(doc => Product.sortedByPriceLowToHigh({ to: 500 }).includes(doc))

This can be efficient if the primary index does not include covered values for the search AND the subsequent indexes are small. This is one way you can avoid reading individual documents, but it can mean reading a lot more index data, and performing a lot more compute, if the subsequent indexes are large.

Another way to do this is filtering directly

Product.byName("limes")
  .where(doc => doc.category == Category.byName("produce").first()!)
  .where(doc => doc.price < 500 )

This will always work better than the .includes() method IF the filtered values are covered by the index, but it can also perform terribly if no values are covered (every document will be read). You might not be able to guarantee values are covered for ALL search criteria, so which method you choose might depend on what covered values are available, the cardinality of the index, and how large the index results are expected to be.

Product.byName("limes")
  .where(doc => Product.byCategory(Category.byName("produce").first()!).includes(doc))
  .where(doc => doc.price < 500 )

Building with the driver

So how do we build these kinds of queries dynamically? I’ve used the Javascript driver here, but the idea of composing FQL snippets together is applicable to any language or driver.

In the JS driver you can use the fql`` tagged template function to save a snippet and reuse later.

// JS

let query = fql`Product.byName("limes")`

const filters = [
  fql`.where(doc => doc.category == Category.byName("produce").first()!)`,
  fql`.where(doc => doc.price < 500 )`
]

filters.forEach(filter => {
  // append each filter to the query
  query = fql`${query}${filter}`
})

There are a lot of ways that you can use FQL snippets and interpolation to dynamically compose queries.

Advanced example

Here is a much more complex example, using typescript, of a function that can build arbitrary queries based on a list of search terms.

/**
 * A JS object with sorted list of indexes or filters
 *
 * Javascript maintains key order for objects! Sort items in this map by most
 * selective to least selective.
 */
type QueryMap = Record<string, (...args: any[]) => Query>

/** An object to respresent a search argument
 *
 * `type` is either "by" or "range" to represent whether it is intented to
 * filter by a specific value or range of values.
 */
type SearchTerm = {
  name: string
  args: any[]
}

/**
 * Constructs an optimal query by prioritizing the most selective index and then
 * applying necessary filters.
 *
 * @param default_query - The initial query to which indices and filters are applied.
 * @param index_map - A map of index names to functions that generate query components.
 * @param filter_map - A map of filter names to functions that generate query components.
 * @param search_terms - An array of search terms that specify the type and arguments
 *                       for constructing the query.
 * @returns The constructed query after applying all relevant indices and filters.
 */

const build_search = (
  default_query: Query,
  index_map: QueryMap,
  filter_map: QueryMap,
  search_terms: SearchTerm[]
): Query => {
  // local copy so we don't mutate the original
  const _search_terms = [...search_terms]

  // initialize a default query in case no other indexes are available to be used.
  let query: Query = default_query

  // iterate through the index map, from most specific to least specific
  build_index_query: for (const index_name of Object.keys(
    index_map
  )) {
    // iterate through each search term to check if it matches the highest priority index
    for (const search_term of _search_terms) {
      // If a match is found, update the query, remove the search term from the
      // list, and break out of the loop.
      if (index_name === search_term.name) {
        query = index_map[search_term.name](...search_term.args)
        _search_terms.splice(_search_terms.indexOf(search_term), 1)
        break build_index_query
      }
    }
  }

  // iterate through the filter map, from most specific to least specific
  for (const filter_name of Object.keys(filter_map)) {
    // iterate through each search term to check if it matches the highest priority filter
    for (const search_term of _search_terms) {
      // If a match is found, update the query and remove the search term from
      // the list.
      if (filter_name === search_term.name) {
        const filter = filter_map[search_term.name](...search_term.args)
        query = fql`${query}${filter}`
        _search_terms.splice(_search_terms.indexOf(search_term), 1)
      }
    }
  }

  // if there are remaining search terms, we weren't able to build the full query
  if (_search_terms.length > 0) {
    throw new Error("Unable to build query")
  }

  return query
}

This function lets you provide a javascript object as a priority map full of indexes and filters (which can use indexes) to generate your query. It doesn’t matter what order in which you provide arguments: it will use the highest priority index available and then order the priority of filters as well.

Here’s an example, replicating the demo data queries again.

  // Javascript maintains key order for objects! Sort items in this map by most
  // selective to least selective.
  const product_index_priority_map: QueryMap = {
    by_order: (id: string) =>
      fql`Order.byId(${id})!.items.map(.product!)`,
    by_name: (name: string) => fql`Product.byName(${name})`,
    by_category: (category: string) =>
      fql`Product.byCategory(Category.byName(${category}).first()!)`,
    range_price: (range: { from?: number; to?: number }) =>
      fql`Product.sortedByPriceLowToHigh(${range})`,
  }

  const product_filter_map: QueryMap = {
    by_name: (name: string) => fql`.where(.name == ${name})`,
    by_category: (category: string) =>
      fql`.where(.category == Category.byName(${category}).first()!)`,
    range_price: ({ from, to }: { from?: number; to?: number }) => {
      if (from && to) {
        return fql`.where(.price >= ${from} && .price <= ${to})`
      } else if (from) {
        return fql`.where(.price >= ${from})`
      } else if (to) {
        return fql`.where(.price <= ${to})`
      }
      return fql``
    },
  }

  const product_filter_with_indexes_map: QueryMap = {
    by_name: (name: string) =>
      fql`.where(doc => Product.byName(${name}).includes(doc))`,
    by_category: (category: string) =>
      fql`.where(doc => Product.byCategory(Category.byName(${category}).first()!).includes(doc))`,
    range_price: (range: { from?: number; to?: number }) =>
      fql`.where(doc => Product.sortedByPriceLowToHigh(${range}).includes(doc))`,
  }

  const order_id = (await client.query(fql`Order.all().first()!`))
    .data.id

  const query = build_search(
    fql`Product.all()`,
    product_index_priority_map,
    product_filter_with_indexes_map,
    [
      // { name: "by_name", args: ["limes"] },
      // { name: "by_category", args: ["produce"] },
      { name: "price_range", args: [{ to: 1000 }] },
      { name: "by_order", args: [order_id] },
    ]
  )

  const res = await client.query(query)
3 Likes

I like the idea and the approach, in one of our implementations of graphql + fql, we’re essentially following a pattern where we have the entire logic built into the query and then have all the arguments added at the top. So it looks like this:

let args = ${args ?? null}
let set = DataSet.all()

let set = if (args.filter) { set.where(...) } else { set }
set

It feels pretty flexible because there’s no need to set up maps. I think the main thing I’d like to see with some sort of query builder is being able to create a query without having to build a lot for the query engine to function optimally. I feel like usually with query builders, the main thing we’re trying to achieve is converting a JSON set of filters, sorting, and fields (sometimes with fields containing arguments themselves) into something the DB will spit out. Thinking through this, it’d be pretty awesome to see a similar approach but focused on the transition from JSON to FQL. So maybe something where the resulting developer interface looks like this:

buildQuery(fql`Products.all()`, { price: { $gt: 2 } }, ['sortedByPriceLowToHigh'], fql`{ id, name, ...otherFields }`)

The buildQuery function could internally try to identify indexes for the collection and calculate which indexes to use, the sort could literally just be a direct index, and the fields could be just an optional element. I think this still doesn’t land on the use case where nested fields contain arguments (like they could in graphql) but it’s a pretty good start.

Thanks @lepozepo! Very interesting. I know I left out the part where you transform arguments – maybe query parameters, request body, or a GraphQL selection – into that already specialized array of “search terms” that have a one-to-one relationship with the FQL that will be used. You could start with something like { price: { $gt: 2 } } and process that to select price_range from the map with { from: 2 } as args.

I really like this idea of generating the choices for indexes based on the schema definition. In that way, my index and filter maps are more like an internal implementation detail, rather than something you need to hand craft. Certainly for defaults! Then you could provide custom ones or overrides for cases you want to be more intentional about.

Extending the GraphQL example, default resolvers are provided for you based on the schema, and several frameworks generate resolvers based on the data model, but you can always provide your own.

You wouldn’t want to compute available indexing options available on every query. You’d either want to load that at startup with some sort of introspection query, or generate in advance to commit to your code. I still imagine an internal data structure to represent the available index options and advanced filters (those that rely on more indexes).

I mapped out basic filters, but in retrospect it makes sense those should be more easily generated on the fly. For example { price: { $gt: 2 } }.price > 2 without having to store that in a data structure. So instead of throwing where I did, if a special filter was not found you can generate one that makes sense for the search criteria.