Empty index array not considered unique in multi-term index

I have created an index which helps me find documents by tag. Each document may have multiple tags, and I’m using an index binding to create indexes automatically from the document. Some documents may not have any tags (that’s okay, we can retrieve them using a different index). Each tag only corresponds to a single document, so I have set the index to be unique.

So far, so straightforward. The only further complication is that each document also belongs to a user, and the tags are only unique per-user. No problem, I’ll use two terms in my index (one for the user, and one for the tags). This works fine until I try to create two documents with an empty tag array:

CreateCollection({name: "foos"})
CreateIndex({
 name: 'foos_by_tag',
 source: {
   collection: Collection("foos"),
   fields: {
     tags: Query(
       Lambda((doc) =>
           Select(['data', 'tags'], doc)
       )
     ),
   },
 },
 terms: [
   {
     field: ['data', 'user'],
   },
   {
     binding: 'tags',
   },
 ],
 unique: true,
 serialized: true,
})

Create("foos", { data: { user: "john", tags: [] } } )
Create("foos", { data: { user: "john", tags: [] } } ) // error: document is not unique

Note that removing the ['data', 'user'] term from the index makes this problem go away, so I guess this is a bug in how compound indexes are built. Possibly it’s creating an index entry using just the user when there are no tags, and that entry then turns out to be non-unique. In this case I would expect no index entry to be created at all (1 x 0 = 0).

Hi @cjol,

May I ask you if there is any reason why you concatenate tags?

Luigi

Hi Luigi, sorry, I didn’t mean to leave that section in the sample code (even commented out). I’ll remove it now as it’s not necessary for the example (I was originally concatenating because my actual use-case is slightly more complicated).

Hi @cjol,

If you want to explain your use-case, I’m happy to take a look and, if needed, provide you with some suggestions to optimize your data model.
Feel free to contact me in private, if you prefer to not share your data here.

Luigi

Hi Luigi,

No privacy concerns, I just simplified the example to make it easier to follow. Did you understand the bug I described in my original post? (or maybe you think it’s not a bug!). You should be able to paste it into a Fauna Shell to replicate.

For completeness, I’m happy to also post my original code. The only difference is that the “tags” in the OP example are now arrays of the form ["Animal", "Mammal", "Cat"] (and there may still be 0-N such arrays for each document). As I understand it I can’t index on the array itself, so I’m concatenating to make strings like “Animal#Mammal#Cat” instead.

const CreateCategoryIndex = q.CreateIndex({
  name: 'spaces_by_user_category',
  source: {
    collection: spaceCollection,
    fields: {
      catPrefixes: q.Query(
        q.Lambda((doc) =>
          q.Map(
            q.Select(['data', 'categories'], doc),
            q.Lambda((cats) => q.Concat(cats, '#'))
          )
        )
      ),
    },
  },
  terms: [
    {
      field: ['data', 'user'],
    },
    {
      binding: 'catPrefixes',
    },
  ],
  unique: true,
  serialized: true,
})
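For readers following along without a database, the binding above can be approximated in plain JavaScript. This is only a sketch of what the Map/Concat combination computes, assuming `categories` is an array of string arrays; the helper name `catPrefixes` and the sample document are made up for illustration.

```javascript
// Plain-JavaScript analogue of the catPrefixes binding (illustrative only).
// Assumes doc.data.categories is an array of arrays of strings.
function catPrefixes(doc) {
  // Join each category path with '#', as Concat(cats, '#') does in the binding.
  return doc.data.categories.map((cats) => cats.join("#"));
}

// Hypothetical document with two category paths.
const sample = { data: { categories: [["Animal", "Mammal", "Cat"], ["Pet"]] } };
console.log(catPrefixes(sample)); // [ 'Animal#Mammal#Cat', 'Pet' ]
```

A document with an empty `categories` array yields an empty list here, which is the case the rest of this thread is about.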

Hi @cjol,

Well, that’s not quite true: you can index an array, and each element of the array is indexed.
You can probably change your index this way:

CreateIndex(
  {
  name:'spaces_by_user_category',
  source:Collection("spaceCollection"),
  terms:[
    {field: ['data', 'user']},
    {field: ['data', 'categories']}]
  }
)

and then query:

Paginate(
  Intersection(
    Match('spaces_by_user_category',['user1','Animal']),
    Match('spaces_by_user_category',['user1','Mammal']),
    Match('spaces_by_user_category',['user1','Cat'])
  )
)

Hope this helps.

Luigi

Thanks, but this doesn’t guarantee uniqueness of the index, correct? I might end up with multiple documents with the same set of tags. Intuitively it also feels like running three indexes would be less performant than the one with the index bindings, wouldn’t it? This is a read-dominated collection, where I would expect each index to be read a few times a day, but only written to less than once a month.

Perhaps you already saw it, but I was musing on the best pattern for my use-case over in this other thread: FQL Query for Prefix of Search term

I’ve opened this separate thread because I think there’s a bug in the case of a unique index which uses an array as a secondary term. Here’s an even simpler example that doesn’t use bindings:

CreateIndex({
name: 'foos_by_tag',
source: Collection("foos"),
terms: [
  { field: ['data', 'user'] },
  { field: ['data', 'tags'] }
],
unique: true,
serialized: true,
})
Create("foos", { data: { user: "john", tags: [] } } )
Create("foos", { data: { user: "john", tags: [] } } ) // error: instance not unique

I believe that the error is a bug, and that I should be able to create a second document using this index. Do you disagree?

Hi @cjol,

I don’t think this is an error or a bug. You declared a unique index with 2 terms and are trying to create the same document twice.
Why do you think Fauna should not raise an error here?

Luigi

Because the tags array is empty, I don’t think the document should be indexed at all. Notice, for example, that this does not trigger an error:

CreateIndex({
name: 'foos_by_tag_only',
source: Collection("foos"),
terms: [
  {
    field: ['data', 'tags'],
  },
],
unique: true,
serialized: true,
})
Create("foos", { data: { user: "john", tags: [] } } )
Create("foos", { data: { user: "john", tags: [] } } )

The empty array is equivalent to null, and in that case no index entry is created. That’s why Fauna doesn’t raise any error.

Luigi
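The behaviour described in the last few posts can be reproduced with a small plain-JavaScript model. This is a simplified sketch, not Fauna's actual implementation. It assumes that an empty or missing term value is treated as null, that array terms expand to one entry per element, and that a document whose terms are all null gets no entry. The function names are hypothetical.

```javascript
// Simplified model of index-entry generation; NOT Fauna's actual implementation.
// Assumption: an empty array or missing value counts as a single null term.
function termValues(value) {
  if (value === undefined || value === null) return [null];
  if (Array.isArray(value)) return value.length === 0 ? [null] : value;
  return [value];
}

function indexEntries(doc, termFields) {
  // Build the cross product of the expanded values of each term.
  let entries = [[]];
  for (const field of termFields) {
    const values = termValues(doc[field]);
    entries = entries.flatMap((prefix) => values.map((v) => [...prefix, v]));
  }
  // Assumption: an entry in which every term is null is not stored.
  return entries.filter((entry) => entry.some((v) => v !== null));
}

const doc = { user: "john", tags: [] };
console.log(indexEntries(doc, ["tags"]));         // []
console.log(indexEntries(doc, ["user", "tags"])); // [ [ 'john', null ] ]
```

Under these assumptions the single-term index stores nothing for an empty tags array, while the two-term index stores one entry per user, which is what collides on the second Create.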

That behaviour feels inconsistent then, don’t you think?
With a single term, an array of length N creates N index entries (so 0 entries if the array is empty).
With multiple terms, an array of length N creates N index entries EXCEPT if N = 0, in which case one entry is created.

To summarise, I can’t find a way to make a multi-term index create no entries for a document. To work around this in my code, I’m currently doing this:

CreateIndex({
name: 'foos_by_tag_uniqued',
source:  {
  collection: Collection("foos"),
  fields: {
    tags: Query(
      Lambda((doc) =>
          Append(Select(['ref', 'id'], doc), Select(['data', 'tags'], doc))
      )
    ),
  },
},
terms: [
  { field: ['data', 'user'] },
  { binding: 'tags' }
],
unique: true,
serialized: true,
})

Create("foos", { data: { user: "john", tags: [] } })
Create("foos", { data: { user: "john", tags: [] } }) // allowed: correct
Create("foos", { data: { user: "john", tags: ["fauna"] } })
Create("foos", { data: { user: "john", tags: ["fauna"] } }) // error: correct

The basic idea is to create an additional index entry for every document, as if it were tagged with its own ID. This is obviously pretty hacky, because the ref ID is not actually a tag, but it works around the issue by ensuring the indexed array is never empty. When tags is empty, the entry is guaranteed to be unique because the ref ID is unique.
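
The effect of the workaround can be checked with a toy uniqueness model in plain JavaScript. This is a sketch only, not how Fauna enforces uniqueness. It assumes a unique index rejects a document when any of its (user, tag) entries already exists, and that an empty tags array produces the entry (user, null). The names `makeUniqueIndex`, `plain` and `padded`, and the document IDs, are made up for illustration.

```javascript
// Toy uniqueness check; a sketch of the idea, not Fauna's implementation.
// tagsOf decides which tag values are indexed for a document.
function makeUniqueIndex(tagsOf) {
  const seen = new Set();
  return function insert(doc) {
    const tags = tagsOf(doc);
    // Assumption: an empty tags array still yields one entry, (user, null).
    const keys = (tags.length === 0 ? [null] : tags).map((t) =>
      JSON.stringify([doc.user, t])
    );
    if (keys.some((k) => seen.has(k))) return false; // would violate uniqueness
    keys.forEach((k) => seen.add(k));
    return true;
  };
}

// Plain index: tags are used as-is.
const plain = makeUniqueIndex((doc) => doc.tags);
// Workaround: the document's own id is always included, so the array is never empty.
const padded = makeUniqueIndex((doc) => [...doc.tags, doc.id]);

console.log(plain({ id: "1", user: "john", tags: [] }));         // true
console.log(plain({ id: "2", user: "john", tags: [] }));         // false
console.log(padded({ id: "1", user: "john", tags: [] }));        // true
console.log(padded({ id: "2", user: "john", tags: [] }));        // true
console.log(padded({ id: "3", user: "john", tags: ["fauna"] })); // true
console.log(padded({ id: "4", user: "john", tags: ["fauna"] })); // false
```

In this model the padded index accepts any number of untagged documents per user while still rejecting a repeated real tag, matching the four Create results above.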