But anyway, this pattern would return the entire document that contains what I’m looking for, instead of just the desired dataset, correct?
When you use an index for searching, your index definition specifies what is returned. By default, only the indexed document’s reference is returned, but you can specify a values definition that identifies which document fields should be returned for matching index entries.
For example:
> CreateCollection({ name: "matches" })
{
  ref: Collection("matches"),
  ts: 1666816661170000,
  history_days: 30,
  name: 'matches'
}
> CreateIndex({
  name: "matches_by_date_start",
  source: Collection("matches"),
  terms: [
    { field: ["data", "matches", "date_start"] }
  ]
})
{
  ref: Index("matches_by_date_start"),
  ts: 1666816837140000,
  active: true,
  serialized: true,
  name: 'matches_by_date_start',
  source: Collection("matches"),
  terms: [ { field: [ 'data', 'matches', 'date_start' ] } ],
  partitions: 1
}
> Create(
  Collection("matches"),
  {
    data: {
      season_id: 4,
      leg_id: 5,
      matches: [
        {
          match_id: 7,
          date_start: ToDate(TimeSubtract(Now(), 3, "days"))
        }
      ]
    }
  }
)
{
  ref: Ref(Collection("matches"), "346615296387187200"),
  ts: 1666816955840000,
  data: {
    season_id: 4,
    leg_id: 5,
    matches: [ { match_id: 7, date_start: Date("2022-10-23") } ]
  }
}
> Create(
  Collection("matches"),
  {
    data: {
      season_id: 5,
      leg_id: 6,
      matches: [
        {
          match_id: 8,
          date_start: ToDate(TimeSubtract(Now(), 2, "days"))
        },
        {
          match_id: 9,
          date_start: ToDate(TimeSubtract(Now(), 1, "days"))
        }
      ]
    }
  }
)
{
  ref: Ref(Collection("matches"), "346615380211401216"),
  ts: 1666817035800000,
  data: {
    season_id: 5,
    leg_id: 6,
    matches: [
      { match_id: 8, date_start: Date("2022-10-24") },
      { match_id: 9, date_start: Date("2022-10-25") }
    ]
  }
}
> Paginate(
  Match(
    Index("matches_by_date_start"),
    ToDate(TimeSubtract(Now(), 1, "days"))
  )
)
{ data: [ Ref(Collection("matches"), "346615380211401216") ] }
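As a rough sketch of the values definition mentioned above: the index below is hypothetical (the name matches_by_date_start_with_ids and the choice of returned fields are my assumptions, not something taken from your schema). It keeps the same terms but returns season_id, leg_id, and the ref instead of the ref alone:

```fql
// Hypothetical index: same terms as matches_by_date_start, plus a values definition.
CreateIndex({
  name: "matches_by_date_start_with_ids",
  source: Collection("matches"),
  terms: [
    { field: ["data", "matches", "date_start"] }
  ],
  values: [
    { field: ["data", "season_id"] },
    { field: ["data", "leg_id"] },
    { field: ["ref"] }
  ]
})

// Each result should be a [season_id, leg_id, ref] tuple rather than a bare ref.
Paginate(
  Match(
    Index("matches_by_date_start_with_ids"),
    ToDate(TimeSubtract(Now(), 1, "days"))
  )
)
```

I picked top-level fields on purpose. Fields inside the matches array are indexed per array element, so returning several of them at once may produce more entries than you expect.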
If you instead want to perform a range query (rather than searching for a specific date_start), the index could be:
> CreateIndex({
  name: "matches_with_date_start",
  source: Collection("matches"),
  values: [
    { field: ["data", "matches", "date_start"] },
    { field: ["ref"] }
  ]
})
> Paginate(
  Range(
    Match(Index("matches_with_date_start")),
    ToDate(TimeSubtract(Now(), 10, "days")),
    ToDate(TimeSubtract(Now(), 2, "days"))
  )
)
{
  data: [
    [
      Date("2022-10-23"),
      Ref(Collection("matches"), "346615296387187200")
    ],
    [
      Date("2022-10-24"),
      Ref(Collection("matches"), "346615380211401216")
    ]
  ]
}
You could adjust that query to fetch the full document for matching entries:
> Map(
  Paginate(
    Range(
      Match(Index("matches_with_date_start")),
      ToDate(TimeSubtract(Now(), 10, "days")),
      ToDate(TimeSubtract(Now(), 2, "days"))
    )
  ),
  Lambda(
    ["date", "ref"],
    Get(Var("ref"))
  )
)
{
  data: [
    {
      ref: Ref(Collection("matches"), "346615296387187200"),
      ts: 1666816955840000,
      data: {
        season_id: 4,
        leg_id: 5,
        matches: [ { match_id: 7, date_start: Date("2022-10-23") } ]
      }
    },
    {
      ref: Ref(Collection("matches"), "346615380211401216"),
      ts: 1666817035800000,
      data: {
        season_id: 5,
        leg_id: 6,
        matches: [
          { match_id: 8, date_start: Date("2022-10-24") },
          { match_id: 9, date_start: Date("2022-10-25") }
        ]
      }
    }
  ]
}
But when it comes to standardization and performance, the first approach I was using is probably better. What do you think?
That depends on what the overall workload involving these documents looks like. Normalization can be used to reduce redundancy and improve data integrity. If you commonly need access to the entire document, that makes sense. However, if the majority of your queries for documents in the collection need only a few fields, it might make sense to adjust the data model, including the index definition, accordingly.
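For instance, if most of your queries only need the date, season_id, and leg_id, a variation of the range index could return those fields directly so that no Get call is needed. This is only a sketch: the index name matches_with_date_start_and_ids and the field selection are assumptions for illustration, so adjust them to whatever your queries actually read.

```fql
// Hypothetical index: like matches_with_date_start, but it also returns two
// top-level fields, so the query can be answered from the index entries alone.
CreateIndex({
  name: "matches_with_date_start_and_ids",
  source: Collection("matches"),
  values: [
    { field: ["data", "matches", "date_start"] },
    { field: ["data", "season_id"] },
    { field: ["data", "leg_id"] },
    { field: ["ref"] }
  ]
})

// Same date range as before; the Lambda reshapes each index entry into an
// object instead of fetching the full document with Get.
Map(
  Paginate(
    Range(
      Match(Index("matches_with_date_start_and_ids")),
      ToDate(TimeSubtract(Now(), 10, "days")),
      ToDate(TimeSubtract(Now(), 2, "days"))
    )
  ),
  Lambda(
    ["date", "season_id", "leg_id", "ref"],
    {
      date: Var("date"),
      season_id: Var("season_id"),
      leg_id: Var("leg_id")
    }
  )
)
```

The trade-off is that every field listed in values is stored in the index, so this is worth doing only for fields that your common queries really need.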