From my understanding of read ops and due to the use of the index, this will only read the documents that fit the criteria in Range, making it efficient.
Range
, by itself, finds a subset within the provided set of values that fit within the start (inclusive) and end (exclusive) points. The set, at that point, is still abstract/hypothetical. To materialize the set, Fauna requires you to use the Paginate
function.
Materializing an index into a page of results based on the indexâs entries does not, in fact, read the associated documents. Read ops do occur to fetch the matching index entries but the documents themselves are not read. So index-only queries can potentially provide a huge savings in read ops, especially for large average document sizes, and including only small fields within the index.
With that background covered, on to your goal. Index construction is key. If your index is defined like so:
CreateIndex({
name: "car_stats_by_type",
source: Collection("cars"),
terms: [
{field: ["data", "type"]}
],
values: [
{field: ["data", "scores", "total"]}
]
})
Then your Range
query should succeed, but the only value returned for matching entries is the value in the total
. You canât fetch documents based on some field value; you need the documentâs reference.
However, if you include the documentâs reference, like so:
CreateIndex({
name: "car_stats_by_type_for_total",
source: Collection("cars"),
terms: [
{field: ["data", "type"]}
],
values: [
{field: ["data", "scores", "total"]},
{field: ["ref"]}
]
})
Then Range
will return both the total
value and the documentâs reference.
That reference can help you combine the two conditions you require. You donât just want the totals and the means, you want the documents that have those properties, either the total AND the mean, or perhaps the total OR the mean. Since Faunaâs indexes provide sets, you can use the set functions Union
(for OR), Intersection
(for AND), or Difference
(for NOT).
So, based on the criteria in your example, the first group of documents to gather is the set of type 2 cars where the total is above a certain value. Thatâs this clause:
Range(Match(Index("car_stats_by_type"), "2"), 15, [])
I chose 15
as the âcertain valueâ. The empty array means âto the end of the setâ, so you get all totals from 15 and up. Note that documents without a total
field are not included in the set.
The second set of documents are cars of type 1 with a mean
smaller than a certain value. Thatâs this clause:
Range(Match(Index("car_status_by_type_for_mean"), "1"), [], 7.5)
Here, I chose 7.5
as the âcertain valueâ. Since the empty array is in the Range
callâs start
position, this time it means âfrom the beginning of the setâ, so you get all means up to (but not including) 7.5.
That would require an index like this:
CreateIndex({
name: "car_stats_by_type_for_mean",
source: Collection("cars"),
terms: [
{field: ["data", "type"]}
],
values: [
{field: ["data", "scores", "mean"]},
{field: ["ref"]}
]
})
If we want to verify that clause 2 is working (based on the sample documents that you provide):
> Paginate(
Range(Match(Index("car_stats_by_type_for_mean"), "1"), [], 7.5)
)
{ data: [ [ 5, Ref(Collection("cars"), "330787989428896290") ] ] }
Now, to combine the clauses into a single set. How we do that depends on the data. Since your document structure can vary, there may be documents (as you described) that have total
fields but not mean
fields, those documents can only possibly exist in the set returned for clause 1 OR clause 2, but can never be in both sets. That rules out using Intersection
, unless you specifically care about documents that have both. Otherwise, youâd have to Union
the sets, to get documents with total
OR mean
fields.
Finally, as you can see in the test query, the results are tuples that include the total
or mean
value. We canât simply Union
or Intersection
these sets (regardless of goal), because the numerical values add an additional, unwanted, axis of comparison. When we combine the sets, we want only the document references.
To do that, we need to employ Join
and Singleton
:
Paginate(
Join(
Range(Match(Index("car_stats_by_type_for_mean"), "1"), [], 7.5),
Lambda(
["number", "ref"],
Singleton(Var("ref"))
)
)
)
This query repeats the Range
clause of the test query, but includes Join
which matches up the results from the Range
with a Lambda
function that forms a single-value set based on the document reference, effectively ignoring the numerical value.
If we do that with both clauses, our query would look like:
Paginate(
Union(
Join(
Range(Match(Index("car_stats_by_type_for_total"), "2"), 15, []),
Lambda(
["number", "ref"],
Singleton(Var("ref"))
)
),
Join(
Range(Match(Index("car_stats_by_type_for_mean"), "1"), [], 7.5),
Lambda(
["number", "ref"],
Singleton(Var("ref"))
)
)
)
)
The result is:
{
data: [
Ref(Collection("cars"), "330787947282432546"),
Ref(Collection("cars"), "330787989428896290")
]
}
Thatâs both of the documents that you provided, and they both match the first clause OR the second clause, because of Union
. Had we used Intersection
, the result would be empty because no document has a total
and mean
.
Since the resulting set only contains document references, we can confirm that what these documents contain with:
> Map(
Paginate(
Union(
Join(
Range(Match(Index("car_stats_by_type_for_total"), "2"), 15, []),
Lambda(
["number", "ref"],
Singleton(Var("ref"))
)
),
Join(
Range(Match(Index("car_stats_by_type_for_mean"), "1"), [], 7.5),
Lambda(
["number", "ref"],
Singleton(Var("ref"))
)
)
)
),
Lambda("ref", Get(Var("ref")))
)
{
data: [
{
ref: Ref(Collection("cars"), "330787947282432546"),
ts: 1651722819570000,
data: { car: 'car1', type: '2', scores: { total: 15, min: 0.5 } }
},
{
ref: Ref(Collection("cars"), "330787989428896290"),
ts: 1651722859760000,
data: { car: 'car1', type: '1', scores: { mean: 5, stddev: 1 } }
}
]
}
After you get this working, then you can work out how the two subordinate Range
clauses should be combined with the actual data you have in your collection.