Arrays in indexes produce unexpected results

Hi there,
Have discovered something unexpected with arrays in documents referenced in indexes, and the results that indexes produce.
While you cannot place a field that is an array into the value section of an index where the term of the index only points to the document reference (you will get null / empty for the field - expected behaviour) you can of course place a field that points to an array in the term section.

Example Document in TestArray collection

{
  "name": "Array One",
	"options": [
		Ref(Collection("Options"), "351011260955886162"),
		Ref(Collection("Options"), "351011279983346265"),
		Ref(Collection("Options"), "351011349946434132")
	]
}

Index 1
The first index looks like this:

CreateIndex({
	name: "TestArray_by_ref_1",
	unique: false,
	serialized: false,
	source: Collection("TestArray"),
	terms: [
		{ field: ["ref"] },
		{ field: ["data", "options"] }
	],
	values: [
		{ field: ["ref"] }
	]
})

Query 1
The purpose is to test if an option is present in a document.
Let’s check:

Paginate(
  Match(
    Index("TestArray_by_ref_1"),
    Ref(Collection("TestArray"), "351003461841584722"),
    Ref(Collection("Options"), "351011260955886162")
  )
)

As Expected
Which will return ONE match, because there is of course only one match:

{
  data: [Ref(Collection("TestArray"), "351003461841584722")]
}

The Question
What if we have an array of options as input, and we want to find out which ones of those the document has, and output them as an array?
With the current index we can of course create a query that goes through the input array, repeatedly queries the index for each one, keeping track of the option queried for each time, and then build a new array of those options that match one by one (this is something I have not actually done. but I seem to remember reading something somewhere where someone did manage this).
On the other hand, if we could just grab the option reference directly from the index’s output, and further simply wrap everything in a Paginate to create a simpler query that produces an array that can be consumed by something else - that would be cool.

So, we update the index to contain the array in the value as well - even though it is not expected to work on its own, since the index knows the value pulled from the array as it is in TERMS, perhaps it would be kind enough to copy that to the VALUES as well?
Thought I would give it a try:

CreateIndex({
	name: "TestArray_by_ref_2",
	unique: false,
	serialized: false,
	source: Collection("TestArray"),
	terms: [
		{ field: ["ref"] },
		{ field: ["data", "options"] }
	],

	values: [
		{ field: ["data", "options"] }
		{ field: ["ref"] },
	]
})

So this time when we do a query, maybe, just maybe the options field will contain the item filtered to:

Paginate(
  Match(
    Index("TestArray_by_ref_2"),
    Ref(Collection("TestArray"), "351003461841584722"),
    Ref(Collection("Options"), "351011260955886162")
  )
)

It would be very useful if it were to produce this:

{
  data: [
    Ref(Collection("Options"), "351011260955886162"),
    Ref(Collection("TestArray"), "351003461841584722")
}

However, surprisingly it produces the following:

{
  data: [
    [
      Ref(Collection("Options"), "351011260955886162")
      Ref(Collection("TestArray"), "351003461841584722")
    ],
    [
      Ref(Collection("Options"), "351011279983346265"),
      Ref(Collection("TestArray"), "351003461841584722")
    ],
    [
      Ref(Collection("Options"), "351011349946434132"),
      Ref(Collection("TestArray"), "351003461841584722")
    ]
  ]
}

I realise this is an edge case, and is not documented behaviour.
At present arrays are not meant to be in the values section of indexes, however I thought it was interesting that this happened, and I wondered if there are any plans to tweak the index system to be able to support arrays if they were used in this fashion - such that the value that matches in the terms section is made available to the values section to support simpler queries.

Thanks for taking the time to read this.

One quick clarification: You can provide an array field as a value. Per our docs:

When a values definition points to a document field or index binding result that contains an Array, one index entry per array item is created.

You brought up a particular case of the term being the document’s ref, so I made sure to test it myself.

What you are encountering is exactly what the documentation is saying: one index entry per array item is created. There are 3 options in the array, so you get back 3 results.

It looks like you might have a many-to-many relationship between TestArray and Options. I’m not sure how many unique Options there are, but one thing you might consider is using an intermediate “link table” Collection to join the two. Each document only contains one TestArray and one Option. Then you would check relationships with an index on the link collection.

If you still prefer storing the array of Options (totally valid), and you want to associate the term you used with the results, then Map is the appropriate method here.

Let(
  {
    test_ref: Ref(Collection("TestArray"), "351003461841584722"),
    option: Ref(Collection("Options"), "351011260955886162")
  },
  Map(
    Paginate(
      Match(
        Index("TestArray_by_ref_1"),
        Var("test_ref"),
        Var("option")
      )
    ),
    Lambda(
      ["ref"],
      [Var("ref"), Var("option")]
    )
  )
)

Depending on what you are doing you might not need an index at all. You already have the Ref. So you can just check if the option you have is in the documents list of options.

Let(
  {
    test_ref: Ref(Collection("TestArray"), "351003461841584722"),
    option: Ref(Collection("Options"), "351011260955886162")
    ref_options: Select(["data", "options"])
  },
  If(
    ContainsValue(Var("option"), Var("ref_options")),
    [Var("test_ref"), Var("option")],
    []
  )
)

@ptpaterson Thank you very much for your quick response.

Thank you for pointing out that array fields can be a value in an index. I read the documentation earlier this year (all of it), and somehow conflagrated objects with arrays!
I thought I had tested this, and received an empty array, now I wonder if perhaps I had a failed test. Sorry to have bothered you with this non-issue, but I thank you for your help.

The link you posted did not work. I searched the text you posted and found it here Index Values docs

After reading your note, and the documentation again, it makes perfect sense that there would be multiple results when looking for a single option.

For anyone else reading this: a document with a field array (in this case called options) containing three items will create three index entries in an index formed like this:

CreateIndex({
	name: "TestArray_by_ref_1",
	unique: false,
	serialized: false,
	source: Collection("TestArray"),
	terms: [
		{ field: ["ref"] },
		{ field: ["data", "options"] }
	],
	values: [
		{ field: ["ref"] }
	]
})

but it will create 9 entries for an index formed like this

CreateIndex({
	name: "TestArray_by_ref_2",
	unique: false,
	serialized: false,
	source: Collection("TestArray"),
	terms: [
		{ field: ["ref"] },
		{ field: ["data", "options"] }
	],
	values: [
		{ field: ["ref"] },
		{ field: ["data", "options"] }
	]
})

because as @ptpaterson points out in the documentation:

When a values definition points to a document field or index binding result that contains an Array, one index entry per array item is created.

So when you query a single document and option ref, you end up with three results, not one result (in this case).

Many to many

It looks like you might have a many-to-many relationship between TestArray and Options. I’m not sure how many unique Options there are, but one thing you might consider is using an intermediate “link table” Collection to join the two. Each document only contains one TestArray and one Option. Then you would check relationships with an index on the link collection.

Yes, it is a many-many relationship; and indeed using a look up table (link table) is how I would normally do things in SQL. Thank you for mentioning this, and also for the sample code for solving the issue I had with multiple results. Very succinct.

As an aside :-
The purpose of putting the options in an array like this is that hundreds to potentially thousands of documents may be loaded at one time.
The number of options per document will be kept under ten, some of them will be quite common amongst the documents; there will be thousands of options to chose from as time goes on, but only around ten will be attached to each document.
While options are used to filter the documents, documents that are found to match will require their full options to be loaded - as they are needed for further processing.

While the options could be mapped in a link table, and then read and matched from an index of the link table; after a document is loaded, the full options of the document would need to be merged and attached to it. While I can think of many ways of doing this, this additional processing feels ‘wrong’ somehow.
It seems logically faster and simpler if they are loaded with the document by storing the options as a field array in the document itself.

In terms of how Fauna works (on the backend), does this approach make sense? Am I over thinking things?

Thanks again!

Keeping the options in an array is a valid strategy. It might make it more complicated to return the exact payload you want from an index, but we determined you can get that using other methods. And maybe it can save you a lot of read and write costs elsewhere. You’ll have to determine where to draw the line for your application.

1 Like

Thanks @ptpaterson, appreciate your thoughts.

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.