Bug when using Paginate with an after cursor on the result of a Join with an index that has reverse: true in the first object of the values array


Hi everyone.
I asked on Slack if the bug was known and Jay kindly asked me to post a detailed thread here, anticipating a possible solution using Union.

Here I will simplify my code while retaining the same structure of the query I need for the logic of the app.

CreateIndex(
  {
    name:
      "posts_TERMS_ref_VALUES_created_likes_ref",
    source: Collection("posts"),
    terms: [{ field: ["ref"] }],
    values: [
      { field: ["data", "created"], reverse: true },
      { field: ["data", "likes"], reverse: true },
      { field: ["ref"] },
    ],
    serialized: true,
  }
)
CreateIndex({
  name: "posts_TERMS_author_VALUES_ref",
  source: Collection("posts"),
  terms: [
    {
      field: ["data", "author"],
    },
  ],
  values: [{ field: ["ref"] }],
  serialized: true,
})
    Let(
      {
        j: Join(
          Match(Index("posts_TERMS_author_VALUES_ref"), Var("author")),
          Lambda(
            ["postRef"],
            Match(
              Index(
                "posts_TERMS_ref_VALUES_created_likes_ref"
              ),
              Var("postRef")
            )
          )
        ),
        page: If(
          Equals(Var("before"), null),
          If(
            Equals(Var("after"), null),
            Paginate(Var("j"), { size: 1 }),
            Paginate(Var("j"), {
              after: Var("after"),
              size: 1,
            })
          ),
          Paginate(Var("j"), {
            before: Var("before"),
            size: 1,
          })
        )
      },
      Var("page")
    )
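
For context, a fragment like this normally runs inside a UDF where author, before, and after are bound as parameters. Here is a minimal sketch of such a wrapper; the function name and parameter order are my assumptions, not from the app:

CreateFunction({
  name: "posts_by_author_paginated",
  body: Query(
    Lambda(
      ["author", "size", "before", "after"],
      Let(
        {
          j: Join(
            Match(Index("posts_TERMS_author_VALUES_ref"), Var("author")),
            Lambda(
              ["postRef"],
              Match(
                Index("posts_TERMS_ref_VALUES_created_likes_ref"),
                Var("postRef")
              )
            )
          )
        },
        If(
          Equals(Var("before"), null),
          If(
            Equals(Var("after"), null),
            Paginate(Var("j"), { size: Var("size") }),
            Paginate(Var("j"), { size: Var("size"), after: Var("after") })
          ),
          Paginate(Var("j"), { size: Var("size"), before: Var("before") })
        )
      )
    )
  )
})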

Possible scenarios:

  1. You don’t specify size and you don’t specify after --> page returns the posts ordered correctly.
  2. You specify size: 1 and you specify after --> page behaves erratically: sometimes data is the same with or without after; sometimes data is empty. In any case, whatever you pass as after comes back in the before cursor returned by Paginate.
  3. You specify size: 1 and you specify after, BUT you remove reverse: true --> everything works again.

Thanks for your help.

P.S. I’ve already consulted a list of related resources (several forum threads proposing Join as the solution for sorting/filtering).

@alfrenerd Thanks for reposting it here. We ask all our users to post their questions here rather than on Slack for quicker attention.

I took your index definitions, populated some data, reproduced the issue, and provide a workaround in the steps below. Please be aware that it is a long post, but I wanted to give keystroke-level examples.

CreateCollection({ name: "posts"});

CreateIndex(
  {
    name:
      "posts_TERMS_ref_VALUES_created_likes_ref",
    source: Collection("posts"),
    terms: [{ field: ["ref"] }],
    values: [
      { field: ["data", "created"], reverse: true },
      { field: ["data", "likes"], reverse: true },
      { field: ["ref"] },
    ],
    serialized: true,
  }
)

CreateIndex({
  name: "posts_TERMS_author_VALUES_ref",
  source: Collection("posts"),
  terms: [
    {
      field: ["data", "author"],
    },
  ],
  values: [{ field: ["ref"] }],
  serialized: true,
})

Create some sample documents in the collection (all with author = "jay" and created between "2020-11-01" and "2020-11-05"; note that likes is stored as a string here, so the index orders it lexicographically):

Map(
  [
    { author: "jay", created: "2020-11-01", likes: "100" },
    { author: "jay", created: "2020-11-01", likes: "300" },
    { author: "jay", created: "2020-11-02", likes: "1000" },
    { author: "jay", created: "2020-11-03", likes: "256" },
    { author: "jay", created: "2020-11-03", likes: "108" },
    { author: "jay", created: "2020-11-04", likes: "30000" },
    { author: "jay", created: "2020-11-05", likes: "1" }
  ],
  Lambda("x", Create(Collection("posts"), { data: Var("x") }))
)
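
As a quick sanity check (my addition, not in the original walkthrough), counting the author’s index entries should confirm that all seven documents were created:

Count(Match(Index("posts_TERMS_author_VALUES_ref"), "jay"))
// should return 7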

Use the Join (the same one you posted) to get all the posts where author = "jay", sorted by created desc, likes desc:

Paginate(Join(
  Match(Index("posts_TERMS_author_VALUES_ref"), "jay"),
  Lambda(
    ["postRef"],
    Match(
      Index(
        "posts_TERMS_ref_VALUES_created_likes_ref"
      ),
      Var("postRef")
    )
  )
))

{
  data: [
    ["2020-11-05", "1", Ref(Collection("posts"), "281937619999264258")],
    ["2020-11-04", "30000", Ref(Collection("posts"), "281937619999262210")],
    ["2020-11-03", "256", Ref(Collection("posts"), "281937619999265282")],
    ["2020-11-03", "108", Ref(Collection("posts"), "281937619999263234")],
    ["2020-11-02", "1000", Ref(Collection("posts"), "281937619999267330")],
    ["2020-11-01", "300", Ref(Collection("posts"), "281937619999266306")],
    ["2020-11-01", "100", Ref(Collection("posts"), "281937619999261186")]
  ]
}
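
As a side note (my addition, not part of the walkthrough): each entry is a [created, likes, ref] tuple matching the index’s values, so you can expand a page into full documents by destructuring the tuple and mapping Get over the ref. This adds one read per document on the page, the same Map(Get) cost discussed later in the thread.

Map(
  Paginate(Join(
    Match(Index("posts_TERMS_author_VALUES_ref"), "jay"),
    Lambda(
      ["postRef"],
      Match(Index("posts_TERMS_ref_VALUES_created_likes_ref"), Var("postRef"))
    )
  )),
  // each page entry is [created, likes, ref]
  Lambda(["created", "likes", "ref"], Get(Var("ref")))
)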

So far so good. Now let’s limit the page size to 2 to play around with the after cursor:

Paginate(Join(
  Match(Index("posts_TERMS_author_VALUES_ref"), "jay"),
  Lambda(
    ["postRef"],
    Match(
      Index(
        "posts_TERMS_ref_VALUES_created_likes_ref"
      ),
      Var("postRef")
    )
  )
), {size:2})

{
  after: [
    "2020-11-03",
    "256",
    Ref(Collection("posts"), "281937619999265282"),
    Ref(Collection("posts"), "281937619999265282")
  ],
  data: [
    ["2020-11-05", "1", Ref(Collection("posts"), "281937619999264258")],
    ["2020-11-04", "30000", Ref(Collection("posts"), "281937619999262210")]
  ]
}

Now let’s get the next page with size: 2, using the after cursor from the previous result:

Paginate(Join(
  Match(Index("posts_TERMS_author_VALUES_ref"), "jay"),
  Lambda(
    ["postRef"],
    Match(
      Index(
        "posts_TERMS_ref_VALUES_created_likes_ref"
      ),
      Var("postRef")
    )
  )
), {
  size: 2,
  after: [
    "2020-11-03",
    "256",
    Ref(Collection("posts"), "281937619999265282"),
    Ref(Collection("posts"), "281937619999265282")
  ]
})

{
  before: [
    "2020-11-03",
    "256",
    Ref(Collection("posts"), "281937619999265282"),
    Ref(Collection("posts"), "281937619999265282")
  ],
  after: [
    "2020-11-03",
    "256",
    Ref(Collection("posts"), "281937619999265282"),
    Ref(Collection("posts"), "281937619999265282")
  ],
  data: [
    ["2020-11-05", "1", Ref(Collection("posts"), "281937619999264258")],
    ["2020-11-04", "30000", Ref(Collection("posts"), "281937619999262210")]
  ]
}

Issue: we still get the first page’s data instead of the second page. Notice also that the returned before cursor is identical to the after cursor we passed in, matching the symptom described in the original post.

  data: [
    ["2020-11-05", "1", Ref(Collection("posts"), "281937619999264258")],
    ["2020-11-04", "30000", Ref(Collection("posts"), "281937619999262210")]
  ]

Workaround: instead of Join, materialize the matching refs with an inner Paginate, map each ref to its sorted leaf set, and merge the leaf sets with Union, using the same indexes. The merged set preserves the reverse sort order, and its cursors behave correctly.

First let us get all the posts.

Paginate(
  Let(
    {
      author_set: Match(Index("posts_TERMS_author_VALUES_ref"), "jay"),
      page: Select("data", Paginate(Var("author_set"), { size: 100000 })),
      leaf_sets: Map(
        Var("page"),
        Lambda("ref", Match(Index("posts_TERMS_ref_VALUES_created_likes_ref"), Var("ref")))
      )
    },
    Union(Var("leaf_sets"))
  )
)

{
  data: [
    ["2020-11-05", "1", Ref(Collection("posts"), "281937619999264258")],
    ["2020-11-04", "30000", Ref(Collection("posts"), "281937619999262210")],
    ["2020-11-03", "256", Ref(Collection("posts"), "281937619999265282")],
    ["2020-11-03", "108", Ref(Collection("posts"), "281937619999263234")],
    ["2020-11-02", "1000", Ref(Collection("posts"), "281937619999267330")],
    ["2020-11-01", "300", Ref(Collection("posts"), "281937619999266306")],
    ["2020-11-01", "100", Ref(Collection("posts"), "281937619999261186")]
  ]
}

Get the first page with size: 2:

Paginate(
  Let(
    {
      author_set: Match(Index("posts_TERMS_author_VALUES_ref"), "jay"),
      page: Select("data", Paginate(Var("author_set"), { size: 100000 })),
      leaf_sets: Map(
        Var("page"),
        Lambda("ref", Match(Index("posts_TERMS_ref_VALUES_created_likes_ref"), Var("ref")))
      )
    },
    Union(Var("leaf_sets"))
  ),
  { size: 2 }
)

{
  after: [
    "2020-11-03",
    "256",
    Ref(Collection("posts"), "281937619999265282"),
    Ref(Collection("posts"), "281937619999265282")
  ],
  data: [
    ["2020-11-05", "1", Ref(Collection("posts"), "281937619999264258")],
    ["2020-11-04", "30000", Ref(Collection("posts"), "281937619999262210")]
  ]
}

Second page with size: 2; notice that data now contains the next 2 documents:

Paginate(
  Let(
    {
      author_set: Match(Index("posts_TERMS_author_VALUES_ref"), "jay"),
      page: Select("data", Paginate(Var("author_set"), { size: 100000 })),
      leaf_sets: Map(
        Var("page"),
        Lambda("ref", Match(Index("posts_TERMS_ref_VALUES_created_likes_ref"), Var("ref")))
      )
    },
    Union(Var("leaf_sets"))
  ),
  {
    size: 2,
    after: [
      "2020-11-03",
      "256",
      Ref(Collection("posts"), "281937619999265282"),
      Ref(Collection("posts"), "281937619999265282")
    ]
  }
)

{
  before: [
    "2020-11-03",
    "256",
    Ref(Collection("posts"), "281937619999265282"),
    Ref(Collection("posts"), "281937619999265282")
  ],
  after: [
    "2020-11-02",
    "1000",
    Ref(Collection("posts"), "281937619999267330"),
    Ref(Collection("posts"), "281937619999267330")
  ],
  data: [
    ["2020-11-03", "256", Ref(Collection("posts"), "281937619999265282")],
    ["2020-11-03", "108", Ref(Collection("posts"), "281937619999263234")]
  ]
}

Third page, and so on:

Paginate(
  Let(
    {
      author_set: Match(Index("posts_TERMS_author_VALUES_ref"), "jay"),
      page: Select("data", Paginate(Var("author_set"), { size: 100000 })),
      leaf_sets: Map(
        Var("page"),
        Lambda("ref", Match(Index("posts_TERMS_ref_VALUES_created_likes_ref"), Var("ref")))
      )
    },
    Union(Var("leaf_sets"))
  ),
  {
    size: 2,
    after: [
      "2020-11-02",
      "1000",
      Ref(Collection("posts"), "281937619999267330"),
      Ref(Collection("posts"), "281937619999267330")
    ]
  }
)

{
  before: [
    "2020-11-02",
    "1000",
    Ref(Collection("posts"), "281937619999267330"),
    Ref(Collection("posts"), "281937619999267330")
  ],
  after: [
    "2020-11-01",
    "100",
    Ref(Collection("posts"), "281937619999261186"),
    Ref(Collection("posts"), "281937619999261186")
  ],
  data: [
    ["2020-11-02", "1000", Ref(Collection("posts"), "281937619999267330")],
    ["2020-11-01", "300", Ref(Collection("posts"), "281937619999266306")]
  ]
}
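
Rather than pasting cursors by hand for each page, the whole workaround can be wrapped in a UDF. Below is a sketch under the same assumptions; the function name and the null-cursor handling are mine:

CreateFunction({
  name: "posts_by_author_sorted",
  body: Query(
    Lambda(
      ["author", "size", "after"],
      Let(
        {
          author_set: Match(Index("posts_TERMS_author_VALUES_ref"), Var("author")),
          // materialize the refs; 100000 is the current max page size
          page: Select("data", Paginate(Var("author_set"), { size: 100000 })),
          leaf_sets: Map(
            Var("page"),
            Lambda("ref", Match(Index("posts_TERMS_ref_VALUES_created_likes_ref"), Var("ref")))
          ),
          merged: Union(Var("leaf_sets"))
        },
        If(
          Equals(Var("after"), null),
          Paginate(Var("merged"), { size: Var("size") }),
          Paginate(Var("merged"), { size: Var("size"), after: Var("after") })
        )
      )
    )
  )
})

The first page would then be Call(Function("posts_by_author_sorted"), ["jay", 2, null]), and subsequent pages pass the returned after cursor as the third argument.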

Hope this helps.

Thanks for your answer, Jay. I still have a couple of observations, if you don’t mind.

  1. Paginating over the Union costs one read operation for every post by that user. If it weren’t for the reverse: true bug, there would be just one read operation in total.
  2. Size: 100000 may be enough for the posts of a single user, but that may not hold for a hypothetical index like “all_posts_VALUES_ref”. Based on your example, changing Select("data", Paginate(Var("author_set"), { size: 100000 })) to Select("data", Paginate(Var("author_set"), { size: 2, after: ["2020-11-03", "256", Ref(Collection("posts"), "281937619999265282")] })) breaks the query (written out in full after this list).
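
For clarity, here is that failing variant written out in full, reconstructed from the inline snippet above with balanced parentheses (the cursor values are the ones from Jay’s example):

Paginate(
  Let(
    {
      author_set: Match(Index("posts_TERMS_author_VALUES_ref"), "jay"),
      // this cursor has the [created, likes, ref] shape of the sorted index,
      // while posts_TERMS_author_VALUES_ref emits only [ref] -- presumably
      // part of why paginating the inner set this way breaks
      page: Select(
        "data",
        Paginate(Var("author_set"), {
          size: 2,
          after: ["2020-11-03", "256", Ref(Collection("posts"), "281937619999265282")]
        })
      ),
      leaf_sets: Map(
        Var("page"),
        Lambda("ref", Match(Index("posts_TERMS_ref_VALUES_created_likes_ref"), Var("ref")))
      )
    },
    Union(Var("leaf_sets"))
  ),
  { size: 2 }
)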

Please correct me if I’m wrong; I’m just a beginner with FaunaDB. I really enjoy working with it, and I hadn’t hit any major roadblock until this happened. I’ve been wrapping my head around it for the last week, and I was very surprised to find out that it was in fact reverse: true that was breaking all my logic.

@alfrenerd thanks for bringing up these excellent questions. Sorry that you were stuck on this for a week; the pain is totally understood. We are still working on a process for surfacing known issues to users to avoid these situations. We will get there :slight_smile:
Also, please note that the above solution is a workaround until we fix the original Join + after cursor issue, which turned out to be a huge LOE (level of effort) to fix and has therefore taken a back seat.

I am still researching if this is expected. Will update the post soon with the findings.

You should set the inner Paginate’s page size to the current maximum page size limit of 100000; otherwise it breaks, as you mentioned. Since the index is sorted in reverse, all the entries of interest will be within those 100000.
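
If hard-coding 100000 feels brittle, one alternative (my suggestion, not from the thread) is to size the inner page from the set itself:

Paginate(
  Let(
    {
      author_set: Match(Index("posts_TERMS_author_VALUES_ref"), "jay"),
      // Count reads the whole set, so this adds cost per query; and since
      // Paginate's size is still capped at 100000, it only helps below that limit
      n: Count(Var("author_set")),
      page: Select("data", Paginate(Var("author_set"), { size: Max(Var("n"), 1) })),
      leaf_sets: Map(
        Var("page"),
        Lambda("ref", Match(Index("posts_TERMS_ref_VALUES_created_likes_ref"), Var("ref")))
      )
    },
    Union(Var("leaf_sets"))
  ),
  { size: 2 }
)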

Thank you @Jay-Fauna for your answer. I think the time spent learning FQL was absolutely worth it. The flexibility and composability of the language allow developers to reason about and solve problems in multiple ways.

As what I think could be valuable feedback: I stumbled onto this problem because many, many answers related to sorting/filtering propose Join as the solution. In my first post I linked a few such threads as examples. Because of this, I expect similar threads will keep being opened.

I hope a fix will be issued, but I understand that it could take a while.

Keep up the awesome work you are doing!

Hi there. This bug hit me badly, and the workaround works but hurts even more, because it makes query cost grow like crazy as the number of indexed items grows (and growing data is the whole point of a database, surely?). I really like FaunaDB’s innovations, and FQL is a work of genius. I even like the complicated but transparent pricing, since it lets me think hard about the optimal price/performance logic for my app. But bugs are not so pleasant, and a bug in core logic that “…came out to be huge LOE to fix and thus taking back seat” is just awful. How can I trust a system-critical cloud database that can’t fix its bugs for years? I am really upset now and am considering a move to another solution, however painful that would be, given how much effort went into learning and building on top of Fauna.

I understand that building such a system is hard, and bugs happen. But I don’t understand bugs “taking a back seat”.

Hi @TimPchelintsev,

I want to reassure you that, while we might prioritize some improvements as being worked on prior to others, we don’t put any of that work into the back seat, as it were. That was unfortunate phrasing and I apologize on behalf of Fauna that it left a sour taste for you.

While we can’t say when a particular improvement will be available, we are actively working on this, along with numerous other improvements. We appreciate all of the work you’ve put into using Fauna and very much don’t want that work to be wasted. Please continue to use the workaround while we finish the effort on this issue.

If you have any follow up questions please feel free to DM or email me directly. You can reach me at cory dot waddingham at fauna dot com.

Thanks,
Cory


Thanks for your kind words, @Cory_Fauna!

Well, I firmly believe that with realtime already shipped (cool feature, by the way) and with possible future offerings in full-text search, geo-search, triggers, and so on, FaunaDB will be unstoppable. Just don’t forget to fix bugs, guys, and you will be golden.

I am now working hard to optimize queries to go around this problematic use case.

After a long journey through Fauna tutorials, demo apps, and this forum, I eventually reached what felt like ‘developer nirvana’: a universal approach to writing filtering queries on top of simple atomic ‘collection_by_term’ and ‘collection_by_ref_sortBy_value’ indexes, so I could easily incorporate more fields and sorting logic while experimenting with features. All this with pagination, counts, and population of nested items by id: something I used to get by default with previous databases like MongoDB.

So I really like the power of FQL. I feel that this approach is not very scalable in terms of pricing, though, so now that I understand my query requirements much better, I’ll try to reimplement the core queries to use highly optimized indexes, which seems to be the core philosophy of FaunaDB anyway.


Cool. So, did you find a solution that doesn’t involve the proposed (and very costly) workarounds?
Fauna is amazing; such a shame that crucial bugs like this are still unresolved.

Hi @alfrenerd!

Well, after a week-long sprint I can say that the problem is generally solved for me.

What I did to move away from the previous approach (a combination of atomic one-field indexes):

  1. Implemented logic to measure query pricing both in tests and in production, so I can clearly see what contributes to that huge bill at the end of the month (partially a joke).
  2. Identified the exact filtering and sorting requirements my app actually uses on every screen. This would have been really hard to do at the beginning, but by now it was possible.
  3. Wrote exact indexes to match those queries for each subset of terms that occur together (like items_TERMS_locale_type_collections_VALUES_-popularity_-created for a query where I need to filter by locale, type, and collections, and sort by popularity and created, reversed); a sketch of such an index appears after this list.
  4. Then, in a UDF like ‘get_items_optimized’, I check the input arguments, identify the possible sets of provided arguments, and route each to the right index.
  5. Experimented with putting all needed values into the index to avoid Map(Get), but concluded that this feature has too much value for me, so I continue to use it. In any case, the number of additional reads and compute operations needed for Map(Get) depends on the page size, not the collection size, so for small page sizes like 10 or 20 its cost is fixed and thus not a huge problem.
  6. What remains a potential problem is Count operations, as their price grows with collection size. In my case it would be possible to precalculate those counts too, but for now I decided to leave that as the next optimization step.
  7. Implemented caching logic in my app both client- and server-side (Next.js ISR on the backend + react-query on the frontend), so this caching layer lets me trade freshness of data for usage costs.
  8. The last thing was search. For now I use the not-very-scalable Paginate(Filter(ContainsStr)) approach (sketched after this list). Currently, for 10k search requests I would pay about $5, i.e. $0.50 per 1k requests. Algolia charges at least $1 per 1k requests, so once my costs for native Fauna search exceed Algolia’s, I’ll need to either optimize search somehow or switch to Algolia.
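
For illustration, the kind of composite index described in item 3 might be defined like this (a sketch: the name and field list come from the post, the rest is assumed):

CreateIndex({
  name: "items_TERMS_locale_type_collections_VALUES_-popularity_-created",
  source: Collection("items"),
  terms: [
    { field: ["data", "locale"] },
    { field: ["data", "type"] },
    { field: ["data", "collections"] }
  ],
  values: [
    { field: ["data", "popularity"], reverse: true },
    { field: ["data", "created"], reverse: true },
    { field: ["ref"] }
  ]
})

And the Paginate(Filter(ContainsStr)) search from item 8, sketched with an assumed title field; it reads every document in the set, which is why its cost grows with collection size:

Paginate(
  Filter(
    Documents(Collection("items")),
    Lambda(
      "ref",
      ContainsStr(
        LowerCase(Select(["data", "title"], Get(Var("ref")), "")),
        "search term"
      )
    )
  )
)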

In conclusion, the solution I ended up with is exactly what all the Fauna documentation I’ve read advertises, so it’s really a question of the mental model for building queries. Maybe we are just too “spoiled” by the generosity of previous databases, and Fauna’s approach feels strict. But their promise is right there in the logo: use optimized things and you will fly (or your usage costs will, heh).


I just wanted to confirm that the bug still exists, as this repro demonstrates.

The two main files of interest are in the repro (the rest are support files), and the query that does not work is:

Paginate(
  Join(
    Match(Index("things_by_selectMe")),
    Lambda("ref", Match(Index("all_things")))
  ),
  { size: 5, after: after }
)

Specifically, after is not working (and yes, I have tried every combination of all the fields of after).

UPDATE: The basic workaround indeed will work for me, but the scalability approaches discussed above are interesting as well.