I would like to have variable multi-sort of documents. Like, sort by manufacturer, price etc. It seems like there are only two options. Create indices for every possible combination of sorting order (multiplied by every possible combination of term filtering) or create filter indices and do all the sorting manually, but slow, especially if there’s a lot of documents in the collection. Is there no options for performant and fully-generic multi-sort?
The only way to get performant sorted queries is by using Indexes. To sort without documents requires fetching the entire collection.
To be clear, this is how all database have to operate. Even if a database is accessed through SQL or other language that provides a simple, declarative syntax for sorting, the only way to make those performant is still to use Indexes. If a query planner for another database receives a request for sorted data, it has to make the decision to generate the index for you, or do an exhaustive query and sort manually.
Fauna does not make that decision for you, since the consequence of making that decision for you on a SaaS service could mean a high and/or unpredictable bill. Granted, it can be a pain to set up in advance.
There are tricks you can do in FQL to programmatically select the right index to use for sorting. One generic pattern that can be used is:
Let(
{
sorted_by: "price_asc", // could come from UDF argument, for example
match_results: /* Some query */
sorting_indexes: {
manufacturer_asc: Index("things_by_ref__manufacturer_asc"),
manufacturer_dec: Index("things_by_ref__manufacturer_dec"),
price_asc: Index("things_by_ref__price_asc"),
price_dec: Index("things_by_ref__price_dec"),
manufacturer_asc_price_asc: Index("things_by_ref__manufacturer_asc_price_asc")
// etc...
},
sorted_results: Join(
Var("match_results"),
// choose the index to sort from
Select(Var("sorted_by"), Var("sorting_indexes"))
)
},
Paginate(Var("sorted_results"))
)
Where the sorting Indexes have a form like this:
CreateIndex({
name: "things_by_ref__price_asc",
source: Collection("things"),
terms: [{ field: "ref" }],
values: [
{ field: ["data", "price"] },
{ field: "ref" }
]
})
And make sure that "match_results"
is a Set with only Ref values. For example, comes from an Index (or Intersection, Union, etc. of Indexes) with values: [{ field: "ref" }]
Another way to choose the appropriate Index could be by naming convention and checking for existence.
Let(
{
// could come from UDF argument, for example
// this time, an array of values
sorted_by: ["manufacturer_asc", "price_asc", "release_date"],
match_results: /* Some query */
sorting_index_name: Concat(Append(Var("sorted_by"), ["things_by_ref_"]),"_")
sorting_index: If(
Exists(Index(Var("sorting_index_name"))),
Index(Var("sorting_index_name")),
Abort(Format("No index found for sorting arguments: %@", [Var("sorted_by")]))
),
sorted_results: Join(
Var("match_results"),
// choose the index to sort from
Var("sorting_index")
)
},
Paginate(Var("sorted_results"))
)
To further dive into how cool (IMO ) FQL can be, you can actually generate these indexes using FQL, given the list of values.
I am going to enumerate the possible names and values by hand here, but it would be possible to further automate this with a UDF to create all of the permutations of a list. I’ll leave that up to you and/or another forums topic.
I tested this on a new database with the demo data. Here is a query to generate all of the sorting Indexes:
Let(
{
// could start with a list like this and
collection_name: "products",
sort_fields: ["name", "price"],
// convert those into values fields and Index name suffixes
values_permutations: [
["products_by_ref__name_asc", [{ field: ["data", "name"] }]],
["products_by_ref__name_dec", [{ field: ["data", "name"], reverse: true }]],
["products_by_ref__price_asc", [{ field: ["data", "price"] }]],
["products_by_ref__price_dec", [{ field: ["data", "price"], reverse: true }]],
// then add the permutations of those fields together
[
"products_by_ref__name_asc_price_asc",
[{ field: ["data", "name"] }, { field: ["data", "price"] }],
],
[
"products_by_ref__name_dec_price_asc",
[
{ field: ["data", "name"], reverse: true },
{ field: ["data", "price"] },
],
],
[
"products_by_ref__name_asc_price_dec",
[
{ field: ["data", "name"] },
{ field: ["data", "price"], reverse: true },
],
],
[
"products_by_ref__price_asc_name_asc",
[{ field: ["data", "price"] }, { field: ["data", "name"] }],
],
[
"products_by_ref__price_dec_name_asc",
[
{ field: ["data", "price"], reverse: true },
{ field: ["data", "name"] },
],
],
[
"products_by_ref__price_asc_name_dec",
[
{ field: ["data", "price"] },
{ field: ["data", "name"], reverse: true },
],
],
],
// filter out the indexes that have already been created
values_filtered: Filter(Var("values_permutations"), (name, _) =>
Not(Exists(Index(name)))
),
},
// return the result of creating all of the remaining Indexes
Map(Var("values_permutations"), (name,
values) =>
CreateIndex({
name,
source: Collection(Var("collection_name")),
terms: [{ field: "ref" }],
values: Append({ field: "ref" }, values),
})
)
)
I used a specific naming convention for the Indexes, so I can create a UDF to pick the right one, programmatically from a list of fields.
CreateFunction({
name: "get_sorting_index",
body: Query(Lambda(
["collection_name", "fields"],
Let(
{
sorting_index_name: Concat(Append(Var("fields"), [Var("collection_name"),"by_ref_"]),"_")
},
If(
Exists(Index(Var("sorting_index_name"))),
Index(Var("sorting_index_name")),
Abort(Format("No index found for sorting arguments: %@", [Var("fields")]))
)
)
)
)
})
Here’s an example of using it
Awesome, thank you! I wasn’t aware of Join. Is there any performance penalty for using? A separate index for sorting via Join?
Join does add extra Read cost; what it does is pipe the output of one Set into another Index, which means reading multiple indexes.
Additional indexes also mean:
- More storage - have to save all of that index data
- Higher Write cost - new documents and updates have to be written to all of those indexes
So there are tradeoffs to be made. Join can be used in a couple of different ways. And of course you can start with a complex query using Intersection, Union, etc. The best recommendation we can make regarding cost, is to test out the performance of your queries by inspecting the results.
In addition to the important details for how Fauna charges various operations, the Billing pages of the docs also show how you can inspect your query responses for their cost. So, you can try out some different things and determine for yourself if the tradeoffs are worth it.
This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.