Wrong Ref returned using Reverse with Union

I’ve just run into what I believe is a bug when using Reverse with Union. This query has been running fine for a while now, and the issue was only spotted today. I’m using Union on indexes generated by a GraphQL schema, and I’ve narrowed the issue down to wrong refs being returned when Reverse is applied.

All of these indexes should return a property ref, but with Reverse applied I’m now also getting refs for property_guests, property_editors, and property_members. If I remove Reverse, the correct property refs are returned.

Query:

Map(
  Paginate(
    Reverse(
      Intersection(
        Distinct(
          Union(
            Match(Index("property_owner_by_user"), Ref(Collection("users"), "xxxxx")),
            Match(Index("property_editors_by_user"), Ref(Collection("users"), "xxxxx")),
            Match(Index("property_guests_by_user"), Ref(Collection("users"), "xxxxx")),
            Match(Index("property_members_by_user"), Ref(Collection("users"), "xxxxx"))
          )
        )
      )
    ),
    { size: 10 }
  ),
  Lambda("ref", Var("ref"))
)

Result:

{
  after: [
    Ref(Collection("properties"), "xxxxx"),
    Ref(Collection("properties"), "xxxxx")
  ],
  data: [
    Ref(Collection("properties"), "xxxxx"),
    Ref(Collection("property_guests"), "xxxxx"),
    Ref(Collection("properties"), "xxxxx"),
    Ref(Collection("properties"), "xxxxx"),
    Ref(Collection("properties"), "xxxxx"),
    Ref(Collection("properties"), "xxxxx"),
    Ref(Collection("properties"), "xxxxx"),
    Ref(Collection("properties"), "xxxxx"),
    Ref(Collection("properties"), "xxxxx"),
    Ref(Collection("properties"), "xxxxx")
  ]
}

Hi @lindy, and welcome!

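When you don’t use Reverse, you might only see property refs because the page size is 10: documents from other collections could appear anywhere after the 10th entry in the set, so they never make it onto the first page. Reverse puts the end of the set first, which could bring those entries onto the first page.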
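A toy Python model of that hypothesis (not Fauna’s implementation; the entries are hypothetical (collection, id) pairs rather than real Refs) shows how a single unexpected entry at the end of a set stays off a forward first page but lands on a reversed one:

```python
# Toy model: why Reverse can surface entries that a forward first page hides.
# Entries are hypothetical (collection, id) pairs, not real Fauna Refs.

def paginate(entries, size):
    """Return the first `size` entries, like the first page of Paginate."""
    return entries[:size]

def reverse(entries):
    """Return the entries in reverse order, like Reverse on a set."""
    return list(reversed(entries))

# 14 entries: 13 from "properties" plus one unexpected entry at the very end.
entries = [("properties", str(i)) for i in range(13)]
entries.append(("property_guests", "g1"))

forward_page = paginate(entries, 10)
reversed_page = paginate(reverse(entries), 10)

print(any(c != "properties" for c, _ in forward_page))   # False
print(any(c != "properties" for c, _ in reversed_page))  # True
```

In this model nothing is "wrong" with either call; the page window simply covers different parts of the same set.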

Can you share the definitions for each of the 4 indexes involved in your query? This would help us determine how the set functions are behaving:

[
  Get(Index("property_owner_by_user")),
  Get(Index("property_editors_by_user")),
  Get(Index("property_guests_by_user")),
  Get(Index("property_members_by_user"))
]

Note that the Intersection function isn’t doing anything useful here. Intersection returns the items that appear in every set you pass it, but you only pass one set (the result of calling Distinct), so it returns that set unchanged.
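As a minimal sketch of that behavior (a toy Python function, not Fauna’s implementation, with hypothetical ref strings), an ordered intersection over a single set has nothing to filter against:

```python
def intersection(*sets):
    """Toy Intersection: keep items of the first set that appear in every set.

    Output order follows the first set. Requires at least one set.
    """
    first, *rest = sets
    others = [set(s) for s in rest]
    # all() over an empty list is True, so with one set every item is kept.
    return [item for item in first if all(item in o for o in others)]

distinct_refs = ["p1", "p2", "p3"]

print(intersection(distinct_refs))                      # ['p1', 'p2', 'p3']
print(intersection(distinct_refs, ["p2", "p3", "p9"]))  # ['p2', 'p3']
```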

Thanks @ewan!

This is just a small sample of the overall query where I could replicate my issue. In my full code, I’m using Intersection on a Union directly after Distinct, so I left it in case it was relevant.

When I use a page size of 100000 without Reverse, I can see the correct property ref in the position where the property_guests ref otherwise appears.

I also noticed that if I include only Match(Index("property_guests_by_user"), Ref(Collection("users"), "xxxxx")) inside the Union, with Reverse applied, it shows the correct property ref.

Here are the index definitions:

[
  {
    ref: Index("property_owner_by_user"),
    ts: 1669760534270000,
    active: true,
    serialized: true,
    name: "property_owner_by_user",
    source: Collection("properties"),
    terms: [
      {
        field: ["data", "owner"]
      }
    ],
    unique: false,
    partitions: 1,
    data: {
      gql: {
        ts: Time("2022-11-29T22:22:13.553137Z")
      }
    }
  },
  {
    ref: Index("property_editors_by_user"),
    ts: 1669760534270000,
    active: true,
    serialized: true,
    name: "property_editors_by_user",
    source: Collection("property_editors"),
    values: [
      {
        field: ["data", "propertyID"]
      }
    ],
    terms: [
      {
        field: ["data", "userID"]
      }
    ],
    unique: false,
    partitions: 1,
    data: {
      gql: {
        ts: Time("2022-11-29T22:22:13.553137Z")
      }
    }
  },
  {
    ref: Index("property_guests_by_user"),
    ts: 1669760534270000,
    active: true,
    serialized: true,
    name: "property_guests_by_user",
    source: Collection("property_guests"),
    values: [
      {
        field: ["data", "propertyID"]
      }
    ],
    terms: [
      {
        field: ["data", "userID"]
      }
    ],
    unique: false,
    partitions: 1,
    data: {
      gql: {
        ts: Time("2022-11-29T22:22:13.553137Z")
      }
    }
  },
  {
    ref: Index("property_members_by_user"),
    ts: 1669760534270000,
    active: true,
    serialized: true,
    name: "property_members_by_user",
    source: Collection("property_members"),
    values: [
      {
        field: ["data", "propertyID"]
      }
    ],
    terms: [
      {
        field: ["data", "userID"]
      }
    ],
    unique: false,
    partitions: 1,
    data: {
      gql: {
        ts: Time("2022-11-29T22:22:13.553137Z")
      }
    }
  }
]

Thanks for the index definitions!

The main problem is that the index definitions return incompatible results. The property_owner_by_user index has no values definition, so it returns references to the matching properties documents. The other indexes all return only the propertyID from matching documents.

I’m not entirely sure why you see references from the other collections. Behind the scenes, each matching index entry always contains a reference to the covered document so that pagination functions correctly. But three of your indexes don’t return those in a “visible” way. There might be some interplay with that combination of set functions that I haven’t seen before.
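To make the "visible" versus hidden distinction concrete, here is a toy Python model of index entries (an illustration under stated assumptions, not Fauna’s storage format; the ids are hypothetical). An index with no values definition shows the covered document’s ref, while a single-value index shows just that value. The covered ref of a property_guests_by_user entry is a property_guests ref, which matches the kind of ref appearing in the wrong results:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Ref = Tuple[str, str]  # (collection, id): a stand-in for a Fauna Ref

@dataclass(frozen=True)
class IndexEntry:
    covered: Ref                      # hidden reference to the indexed document
    values: Optional[Tuple] = None    # None models "no values definition"

    def visible(self):
        """What a page of results would show for this entry in this toy model."""
        if self.values is None:
            return self.covered       # no values: the document's own ref
        if len(self.values) == 1:
            return self.values[0]     # one value: shown on its own
        return self.values

# property_owner_by_user: source is properties, no values definition.
owner_entry = IndexEntry(covered=("properties", "p1"))

# property_guests_by_user: source is property_guests, values = data.propertyID.
guest_entry = IndexEntry(covered=("property_guests", "g1"),
                         values=(("properties", "p2"),))

print(owner_entry.visible())  # ('properties', 'p1')
print(guest_entry.visible())  # ('properties', 'p2')
print(guest_entry.covered)    # ('property_guests', 'g1')
```

Both visible results are properties refs; only the hidden covered ref differs in collection.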

You could verify the difference by executing this query in the Dashboard shell (after manually replacing the xxxxx entries with the appropriate user document ID):

[
  Paginate(Match(Index("property_owner_by_user"), Ref(Collection("users"), "xxxxx")), { size: 2 }),
  Paginate(Match(Index("property_editors_by_user"), Ref(Collection("users"), "xxxxx")), { size: 2 }),
  Paginate(Match(Index("property_guests_by_user"), Ref(Collection("users"), "xxxxx")), { size: 2 }),
  Paginate(Match(Index("property_members_by_user"), Ref(Collection("users"), "xxxxx")), { size: 2 })
]

That should give you up to 2 matching entries from each index, which makes the outputs easy to compare. I expect it would show the mismatched results that make your query appear to misbehave.

To get the expected results, you’d need to update the property_owner_by_user index to have the same values definition as the other indexes. Since you are using GraphQL, that’s best done by adjusting your schema and re-uploading it. From FQL, you’d have to delete the index and recreate it.

When I run Paginate on each index like that, I get properties refs for all of them; everything comes through okay. I do think you’re onto something with the property_owner_by_user index, though. If I take that one out of the Union, with Reverse applied, I get only properties refs, as expected.

Paginate(
  Reverse(
    Union(
      // Match(
      //   Index("property_owner_by_user"),
      //   Ref(Collection("users"), "xxxxx")
      // ),
      Match(
        Index("property_editors_by_user"),
        Ref(Collection("users"), "xxxxx")
      ),
      Match(
        Index("property_guests_by_user"),
        Ref(Collection("users"), "xxxxx")
      ),
      Match(
        Index("property_members_by_user"),
        Ref(Collection("users"), "xxxxx")
      )
    )
  ),
  { size: 10 }
)

I’m not sure how to adjust the GraphQL schema; as far as I can tell, these should all be returning properties refs.

This is the part of the schema that creates the indexes:

type Property @collection(name: "properties") {
  editors: [User!]! @relation(name: "property_editors")
  guests: [User!]! @relation(name: "property_guests")
  members: [User!]! @relation(name: "property_members")
  owner: User! @relation(name: "property_owner")
}

type User @collection(name: "users") {
  propertiesAsEditor: [Property!]! @relation(name: "property_editors")
  propertiesAsGuest: [Property!]! @relation(name: "property_guests")
  propertiesAsMember: [Property!]! @relation(name: "property_members")
  propertiesAsOwner: [Property!]! @relation(name: "property_owner")
}

The weird thing is that this has been running as expected since June 16th, and it only started giving us this issue today. We noticed because it broke parts of the front end that were working fine before. The GraphQL query doesn’t throw an error; it returns the property_editors, property_guests, and property_members documents as if they were properties documents, just with all attributes null.

Thanks for the help!

> When I run a Pagination on each Index like that I get properties refs for all of them, everything comes through okay.

That’s only possible if the propertyID field contains a reference to a properties document. If that’s the case, the rest of your query makes sense.

The schema snippet you shared doesn’t include propertyID, so it doesn’t seem to be the schema that you’re actually using.

We’re getting far enough into the weeds that I’d likely need to see sample documents and the actual schema to advise further. Feel free to DM those documents if you’d rather not share here.

Correct: it contains a properties reference, and that is the schema I’m using. The propertyID field is created by GraphQL when a User is added to one of those lists.

Sorry, I didn’t include this bit. When the schema is uploaded to Fauna, it generates the indexes and also creates the collections for the many-to-many relationships. The property_owner relation is one-to-many, so no collection is created for it.

The property_guests, property_editors and property_members collections look like this:

{
  "ref": Ref(Collection("property_guests"), "xxxxx"),
  "ts": 1666029191600000,
  "data": {
    "propertyID": Ref(Collection("properties"), "xxxxx"),
    "userID": Ref(Collection("users"), "xxxxx")
  }
}

In the cases where the wrong ref is returned with Reverse, it returns the ref of the property_guests document itself instead of the propertyID value it should return.

Tomorrow I can work on putting together an example with data for you.

I shared this thread with our engineering team, and it appears that you have encountered a bug related to a recent storage engine update, which occurs under the specific conditions your query uses. Those conditions include using Union where the first set has no covered values.

A fix is being worked on. In the meantime, you might try moving the Match(Index("property_owner_by_user"), Ref(Collection("users"), "xxxxx")) expression so that it is not the first set inside the Union.
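The workaround is safe because a union’s membership does not depend on the order of its arguments. A toy Python model (not Fauna’s implementation; the refs are hypothetical (collection, id) pairs, and the output is sorted only to give a deterministic order) shows that moving the owner set to the end leaves the result unchanged:

```python
def union(*sets):
    """Toy Union: every distinct item from any input set.

    Sorted only so that the output order is deterministic in this model.
    """
    return sorted(set().union(*sets))

owner = [("properties", "p1")]
editors = [("properties", "p2")]
guests = [("properties", "p3"), ("properties", "p1")]

original_order = union(owner, editors, guests)
workaround_order = union(editors, guests, owner)  # owner set moved to the end

print(original_order == workaround_order)  # True
```

So reordering should not change which refs come back; per the engineering team, it only avoids the condition (first set with no covered values) that triggers the bug.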

That did the trick! I moved property_owner_by_user to the bottom of the list, and everything is shipshape now.

Thanks a bunch @ewan!
