Search by date within document data array

Hello, I’m new to Fauna and I’m having trouble getting the expected result for a search in FQL. I tried a lot, and looked around here, before giving up and asking for help.

I would like to return data from documents that fall within a specific date.

The structure of my document is like this:

{
  "ref": Ref(Collection("matches"), "..."),
  "ts": ...,
  "data": {
    "season_id": 4,
    "leg_id": 5,
    "matches": [
      {
        "match_id": 7,
        "date_start": "2022-08-22T20:00:00.000-03:00",
        ...
      }
    ]
  }
}

And I would like to search using only a date in the format “YYYY-MM-DD”, returning only the records that match that date.

thanks for the help

Hi Roco!

Fauna indexes can capture scalar values, not arrays or objects. If you try to index an array, Fauna creates one index entry per array item. Since your array items are objects, your current data structure is not amenable to easy searching.

To search via an index, the index must specify one or more terms. Each term identifies a specific field within the document that should be included within the index, so that quick lookups are possible. You could specify a term such as:

{ field: ["data", "matches", 0, "date_start"] }

That would let you search on the date_start field for the first match entry. However, subsequent items in the matches array would be difficult to include because there is no way to know the array size when you define the index.

Have you considered creating one document per “match”? For example, this document structure would make searching easier:

{
  season_id: 4,
  leg_id: 5,
  match_id: 7,
  date_start: "2022-08-22T20:00:00.000-03:00",
  ...
}

Then the term required would be:

  { field: ["data", "date_start"] }

And every date_start field would be included in the index.
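With that one-document-per-match structure, the index definition and lookup could look something like this (a sketch; the index name is an example, and it assumes date_start is stored as an FQL Date value rather than a timestamp string):

> CreateIndex({
  name: "match_by_date_start",
  source: Collection("matches"),
  terms: [
    { field: ["data", "date_start"] }
  ]
})

> Paginate(Match(Index("match_by_date_start"), Date("2022-08-22")))

Note that if date_start is stored as a full timestamp string, the term value would have to match that string exactly; storing a Date makes day-level lookups straightforward.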


I see, so the best approach in this case would be to put each item of my data in its own document, rather than multiple items in each document.
I was thinking that my previous approach would be better for reducing redundancy and for taking advantage of this data format being different from a SQL database. I didn’t know this was impossible.
Thanks.

I didn’t know this was impossible.

I didn’t say it was impossible.

Fauna lets you store documents of arbitrary complexity, up to 8 MB per document. However, not all structures are easy to search. Performance and usability are common concerns when establishing a data model.

I’m happy to discuss further if you want to share more about the data you have and the goals you want to achieve.

One of my colleagues pointed out that you can index on the date_start field using this path:

{ field: ["data", "matches", "date_start"] }

This field path works because Fauna creates a separate index entry for each item in the array. The ["data", "matches"] portion of the field path reaches the array, and then ["date_start"] applies to the array item during indexing.

Using field paths this way is new to me and is not covered in the documentation. I’ve filed an issue (DOCS-2254) to get this information added to the index documentation.

But anyway, this pattern would return the entire document instead of just the desired dataset, correct? That is, the document containing what I’m looking for.

Or would only the data inside “matches” be returned?

I’ve already undone the structure and put each group of info in its own document. When I can, I will test with the previous structure.

But when it comes to standardization and performance, the first way I was using is probably better. What do you think?

But anyway, this pattern would return the entire document instead of just the desired dataset, correct? That is, the document containing what I’m looking for.

When you use an index for searching, your index definition specifies what is returned. By default, only the indexed document’s reference is returned, but you can specify a values definition that identifies which document fields should be returned for matching index entries.

For example:

> CreateCollection({ name: "matches" })
{
  ref: Collection("matches"),
  ts: 1666816661170000,
  history_days: 30,
  name: 'matches'
}

> CreateIndex({
  name: "matches_by_date_start",
  source: Collection("matches"),
  terms: [
    { field: ["data", "matches", "date_start"] }
  ]
})
{
  ref: Index("matches_by_date_start"),
  ts: 1666816837140000,
  active: true,
  serialized: true,
  name: 'matches_by_date_start',
  source: Collection("matches"),
  terms: [ { field: [ 'data', 'matches', 'date_start' ] } ],
  partitions: 1
}

> Create(
  Collection("matches"),
  {
    data: {
      season_id: 4,
      leg_id: 5,
      matches: [
        {
          match_id: 7,
          date_start: ToDate(TimeSubtract(Now(), 3, "days"))
        }
      ]
    }
  }
)
{
  ref: Ref(Collection("matches"), "346615296387187200"),
  ts: 1666816955840000,
  data: {
    season_id: 4,
    leg_id: 5,
    matches: [ { match_id: 7, date_start: Date("2022-10-23") } ]
  }
}

> Create(
  Collection("matches"),
  {
    data: {
      season_id: 5,
      leg_id: 6,
      matches: [
        {
          match_id: 8,
          date_start: ToDate(TimeSubtract(Now(), 2, "days"))
        },
        {
          match_id: 9,
          date_start: ToDate(TimeSubtract(Now(), 1, "days"))
        }
      ]
    }
  }
)
{
  ref: Ref(Collection("matches"), "346615380211401216"),
  ts: 1666817035800000,
  data: {
    season_id: 5,
    leg_id: 6,
    matches: [
      { match_id: 8, date_start: Date("2022-10-24") },
      { match_id: 9, date_start: Date("2022-10-25") }
    ]
  }
}

> Paginate(Match(Index("matches_by_date_start"), ToDate(TimeSubtract(Now(), 1, "days"))))
{ data: [ Ref(Collection("matches"), "346615380211401216") ] }

If you want to instead perform a range query (instead of searching for a specific date_start), the index could be:

> CreateIndex({
  name: "matches_with_date_start",
  source: Collection("matches"),
  values: [
    { field: ["data", "matches", "date_start"] },
    { field: ["ref"] }
  ]
})

> Paginate(
  Range(
    Match(Index("matches_with_date_start")),
    ToDate(TimeSubtract(Now(), 10, "days")),
    ToDate(TimeSubtract(Now(), 2, "days"))
  )
)
{
  data: [
    [
      Date("2022-10-23"),
      Ref(Collection("matches"), "346615296387187200")
    ],
    [
      Date("2022-10-24"),
      Ref(Collection("matches"), "346615380211401216")
    ]
  ]
}

You could adjust that query to fetch the full document for matching entries:

> Map(
  Paginate(
    Range(
      Match(Index("matches_with_date_start")),
      ToDate(TimeSubtract(Now(), 10, "days")),
      ToDate(TimeSubtract(Now(), 2, "days"))
    )
  ),
  Lambda(
    ["date", "ref"],
    Get(Var("ref"))
  )
)
{
  data: [
    {
      ref: Ref(Collection("matches"), "346615296387187200"),
      ts: 1666816955840000,
      data: {
        season_id: 4,
        leg_id: 5,
        matches: [ { match_id: 7, date_start: Date("2022-10-23") } ]
      }
    },
    {
      ref: Ref(Collection("matches"), "346615380211401216"),
      ts: 1666817035800000,
      data: {
        season_id: 5,
        leg_id: 6,
        matches: [
          { match_id: 8, date_start: Date("2022-10-24") },
          { match_id: 9, date_start: Date("2022-10-25") }
        ]
      }
    }
  ]
}

But when it comes to standardization and performance, the first way I was using is probably better. What do you think?

That depends on what the overall workload involving these documents looks like. Normalization can reduce redundancy and improve data integrity. If you commonly need access to the entire document, keeping the nested structure makes sense. However, if the majority of your queries against the collection need only a few fields, it might make sense to adjust the data model, including the index definition, accordingly.
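For example, if each match is its own document and your queries mostly need the match id rather than the whole document, the index itself can return those values, avoiding a Get per result (a sketch; the index name and field layout are assumptions):

> CreateIndex({
  name: "match_ids_by_date",
  source: Collection("matches"),
  terms: [
    { field: ["data", "date_start"] }
  ],
  values: [
    { field: ["data", "match_id"] }
  ]
})

> Paginate(Match(Index("match_ids_by_date"), Date("2022-08-22")))

Each index entry would then return the match_id directly, with no per-document read required.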


This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.