Is sorting by reference the same as sorting by ts?

Hi guys, hopefully this isn’t a duplicate post, I did try and find others with similar questions but couldn’t find anything that quite answered my question.

We have an index that returns multiple documents based on the user ID. We want to use this with a Union to return documents that match a set of user IDs ( similar to a WHERE IN SQL clause ). But we also want this to be sorted by the time it was created. I’m aware that I could create an index that takes the user ID as the term and returns the timestamp and the ref as the value, something like this:

CreateIndex({
  name: "comments-by-user-id",
  unique: false,
  serialized: true,
  source: Collection("comments"),
  terms: [
    {
      field: ["data", "userId"]
    }
  ],
  values: [
    { field: ['ts'] },
    { field: ['ref'] }
  ]
})

But, I’ve heard a few people say that if you don’t specify any values, the documents are sorted by reference, but I’m unsure what exactly that means. Can the reference be trusted to sort on time created, or should we specifically create an index to do this?

Thanks!

Hi @sandersweb,

The short answer is that it depends on how the document IDs were generated.

Fauna’s sorting is accomplished by ordering of an index’s values.

If no values are specified, then, yes, it will sort based on the Reference of each matching index entry.

When creating a document, if you do not specify a document ID (by using the Ref function), a synthetic document ID is generated.
Synthetic document ID generation is based on the Snowflake ID algorithm. So sorting on these would be the same as sorting on creation timestamp.

However, if you have specified a document ID, then it is not guaranteed that the sorted results (sorted on ref) will be in the same order as they were created. In this case, you might want to create an index like you have already noted.

(Here are examples of creating a document with and without specifying a document ID -
Create | Fauna Documentation )

Hope that helps!

Initially, yes (mostly). After a document has been updated, no.

Upon document creation, ts reports the transaction timestamp when the document was created. It is possible for multiple documents to be created in a single transaction; those documents would have a common ts but their auto-generated references might cause them to be ordered slightly differently than expressed in the creation query (reference generation involves hashing).

After a document has been updated, ts reports the transaction timestamp of the update. That’s where your ordering could be unexpected. If you need to order by creation time, include a field in your documents whose value is Now(). For example:

Create(
  Collection("blog_posts"),
  {
    data: {
      title: "My blog post",
      body: "Lorem ipsum...",
      createdAt: Now(),
    }
  }
)

And create an index whose values definition orders by the createdAt field.

Ordering by ts, especially if reverse: true is specified, could be useful for listing documents in “updated” order.

Thanks @ldavuluri and @ewan and apologies for the late reply, I completely missed this message. All of our documents are created using Fauna’s IDs, so with that for context, am I correct in assuming that for the following indexes:

CreateIndex({
  name: "comments-by-user-id-ts-asc",
  unique: false,
  serialized: true,
  source: Collection("comments"),
  terms: [
    {
      field: ["data", "userId"]
    }
  ],
  values: [
    { field: ['ts'] },
    { field: ['ref'] }
  ]
})

CreateIndex({
  name: "comments-by-user-id-asc",
  unique: false,
  serialized: true,
  source: Collection("comments"),
  terms: [
    {
      field: ["data", "userId"]
    }
  ],
})

CreateIndex({
  name: "comments-by-user-id-custom-ts-asc",
  unique: false,
  serialized: true,
  source: Collection("comments"),
  terms: [
    {
      field: ["data", "userId"]
    }
  ],
  values: [
    { field: ['data', 'ts'] },
    { field: ['ref'] }
  ]
})
  • comments-by-user-id-ts-asc will sort documents based on the last transaction time ( i.e. the latest updated ). This is essentially a “sort by last modified ASC” type query.
  • comments-by-user-id-asc will sort documents based on the creation time. This is essentially “sort by creation time ASC”
  • comments-by-user-id-custom-ts-asc will sort documents based on the time specified in data.ts. Assuming this is created with something like { ts: Now() } and is never updated, it’s pretty much the same as comments-by-user-id-asc except that data.ts could potentially be updated.

Is that right?

And just another question to follow up on that. If a Fauna generated ID contains the timestamp, is there an easy way to extract that timestamp from the ID / ref, or is it easier to just store a custom ts property on the document and work with that? I couldn’t see anything immediately in the docs and tried a couple of methods to retrieve it, but neither seemed to work.

For the index comments-by-user-id-ts-asc, yes, the results are primarily sorted by ts in ascending order, and secondarily by their reference. Whenever documents share a ts (such as when multiple documents are created or updated at the same time, or if their event history has been modified), then their references determine the final ordering.

For the index comments-by-id-asc, results are sorted in reference (document ID) order. If each document is created in separate transactions, then that is equivalent to creation time. If several documents are created per transaction, you might find some references that are in a different order than were specified in the transaction. Overall, this is similar to ordering by creation time, with some potentially unexpected order differences.

For the index comments-by-user-id-custom-ts-asc, the order is determined primarily by ts and secondarily by ref. When multiple documents have the same ts, their references will determine the final order.

The Fauna-generated document ID’s do not contain a timestamp, but the transaction time os one of the contributing factors in their generation. We use a lightly modified version of the Snowflake algorithm to generate ids, and once generated there is no opportunity to recover the timestamp.

Each document stores a ts, reporting when it was last modified. If you need to persist the creation time, or any other specific time, just store the timestamp. For example:

Create(
  Collection("posts"),
  {
    data: {
      title: "My title",
      body: "Lorem ipsum...",
      createdAt: Now(),
      post-of-the-week: Time("2021-10-22T16:51:03+00:00")
    }
  }
) 

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.